Fun and games with Apache, mod_perl, TWiki and SMF, Part 3

As I said in my last post, I wanted to use one TWiki installation to support all the virtual hosts I was setting up. Although TWiki provides its own authentication and access control, it doesn't really provide a workable way of restricting read access. The other problem was that I already needed to protect the sites using SSL and HTTP Basic Authentication, so if I used the TWiki mechanisms as well I'd end up having to duplicate and manage the access control information in both Apache and TWiki, which really wasn't acceptable.

Note however that you really can't get away from using TWiki authentication if you want to track page changes by user. The TWiki documentation on how to do this is pretty abysmal, but if you are already using HTTP Basic Authentication you can get away with controlling access to all the TWiki CGI scripts with Basic Authentication and requiring that people register themselves with TWiki before editing any pages. As all of the virtual hosts I was setting up were already access controlled I had that part of the problem covered.

My requirements were that I should be able to share some webs between all the virtual hosts, other webs between a particular subset of the virtual hosts, and yet other webs would be restricted to just a single virtual host. If I could do this, it meant I could share a single Main and TWiki web amongst all the vhosts, so for example once people had registered in TWiki they could be given access to other sites without requiring re-registration. It would also allow me to share common information such as site documentation across all the sites whilst still maintaining secure access to the non-shared webs.

The first step was to create a twiki directory under the htdocs directory for each vhost, and create symlinks from there to the common bin, lib and templates directories. Subdirectories were created for data and pub as these wouldn't be shared between vhosts, and within those directories a further set of symlinks was made to the appropriate Web subdirectories of the master twiki install. Separate Trash subdirectories were made for each vhost - if we shared Trash subdirs then pages deleted in one Web would be visible to all the others.
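The layout described above could be built with something like the following. This is only a sketch: the build_vhost_tree name, the paths and the list of webs are all made up for illustration, and the real trees were created by hand rather than by this code.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Build one vhost's TWiki tree. The master path, vhost path and web
# names passed in are hypothetical examples, not the original values.
sub build_vhost_tree {
    my ($master, $vhost_twiki, @webs) = @_;

    # data and pub are real directories, private to this vhost.
    make_path("$vhost_twiki/data", "$vhost_twiki/pub");

    # bin, lib and templates are shared with every vhost.
    for my $dir (qw(bin lib templates)) {
        my $link = "$vhost_twiki/$dir";
        symlink("$master/$dir", $link) or die "symlink $link: $!"
            unless -l $link;
    }

    # Each web this vhost is allowed to see is linked in individually.
    for my $web (@webs) {
        for my $area (qw(data pub)) {
            my $link = "$vhost_twiki/$area/$web";
            symlink("$master/$area/$web", $link) or die "symlink $link: $!"
                unless -l $link;
        }
    }

    # Trash is never shared, so deleted pages stay within this vhost.
    make_path("$vhost_twiki/data/Trash", "$vhost_twiki/pub/Trash");
    return;
}
```

A call such as build_vhost_tree('/opt/twiki', '/srv/htdocs_a/twiki', qw(Main TWiki Docs)) would then give that vhost the shared Main and TWiki webs plus a Docs web, with its own private Trash.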

The next step was to figure out how to get each vhost to use the appropriate one of the TWiki trees I'd just created. TWiki stores its configuration in lib/TWiki.cfg - the file is a set of Perl global variable assignments that specify the environment TWiki is running under, and each TWiki CGI script loads it at startup. Obviously these would need to be different for each vhost, but rather than having a separate TWiki.cfg for each site I wanted to have just one copy. My last post included the following segments from perl.conf:

        $env = [
            [ APP_ROOT => '/approot' ],
            [ TWIKI_ROOT => "$sr/htdocs_$vh/twiki" ],
        ];
            SetEnv        => $env,
            PerlSetEnv    => $env,

This was so I could grab those values from the environment and then use them to configure TWiki to point to the files and directories appropriate to the current vhost. I put the necessary changes to TWiki.cfg in place, fired up Apache and pointed my browser at the TWiki homepage of one of the vhosts - Yay! It all worked. I then fired up a separate browser window, pointed it at one of the other vhosts and bounced on the reload key a few times. It all looked OK at first, but after a few refreshes the page started showing Webs from the other site. I switched back to the original browser window, hit reload a few times and that started showing info for the other site as well - Urk!

I had a pretty good idea what was wrong. The Apache architecture consists of a pool of httpd processes that serve requests, and in a virtual hosting setup such as mine a given httpd process will potentially serve pages for multiple virtual hosts. I was also running TWiki under mod_perl, and in that setup the Perl interpreter and any code loaded into it is persistent - each httpd process has an interpreter embedded inside it, unlike the normal CGI environment where each CGI script invocation results in a separate fork/exec of the Perl interpreter. Under this environment global variables are toxic, and TWiki is riddled with them (the standard of some of the code in TWiki is less than excellent, and this is just one example). I guessed that the problem was being caused by one or more global variables that weren't being reinitialised appropriately, but which ones? I really didn't want to have to make drastic changes to the TWiki codebase to fix the problem if I could avoid it. A bit of trawling through the code revealed the root cause - although TWiki.cfg was correctly initialising everything from the environment variables passed in on the first request a given httpd process served, on subsequent invocations the globals weren't being modified to point to the current vhost.
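The failure mode is easy to reproduce outside Apache. The following is a minimal simulation, not TWiki code: load_config_once stands in for a config file whose assignments run only once per interpreter, handle_request stands in for one request served by a long-lived httpd process, and the /srv paths are invented.

```perl
use strict;
use warnings;

# Stand-in for TWiki.cfg: the assignment runs once, when first loaded.
our $dataDir;
my $config_loaded = 0;

sub load_config_once {
    return if $config_loaded;           # later calls do nothing at all
    $dataDir = "$ENV{TWIKI_ROOT}/data";
    $config_loaded = 1;
    return;
}

# Stand-in for one request handled by a persistent httpd process.
sub handle_request {
    my ($twiki_root) = @_;
    local $ENV{TWIKI_ROOT} = $twiki_root;   # per-request environment
    load_config_once();
    return $dataDir;
}

print handle_request('/srv/htdocs_a/twiki'), "\n";  # /srv/htdocs_a/twiki/data
print handle_request('/srv/htdocs_b/twiki'), "\n";  # /srv/htdocs_a/twiki/data (stale)
```

The second request gets the first vhost's data directory because the global was set once and never touched again. With a pool of httpd processes, each one "sticks" to whichever vhost it happened to serve first, which would explain why the wrong site only appeared after a few reloads.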

The fix needed two parts - I needed to separate out the bits of TWiki.cfg that were dynamic and needed to be reinitialised on each request, and I needed to make sure that the reinitialisation was actually performed on each request. The first step was to go through TWiki.cfg and move all the dynamic bits into a subroutine:

#
# Initialise the dynamic bits of TWiki's configuration.
#
sub doDynamicConfig
{
        # Fetch config from environment and untaint.
        my ($twiki_root) = $ENV{TWIKI_ROOT} =~ m{^([\w/._-]+)$};
        my ($twiki_url) = substr($ENV{SCRIPT_URI}, 0, -length($ENV{SCRIPT_URL}))
            =~ m{^([\w/:._-]+)$};

        # Set up dependent TWiki globals.
        $defaultUrlHost      = $twiki_url;
        $pubDir              = "$twiki_root/pub";
        $templateDir         = "$twiki_root/templates";
        $dataDir             = "$twiki_root/data";
        $logDir              = $dataDir;

        #
        # XXX NASTY HACK.
        # Depends on the ordering of elements in @storeSettings.
        #
        $storeSettings[1]    = $dataDir;
        $storeSettings[3]    = $pubDir;

        $wikiHomeUrl         = "$twiki_url/twiki";
        $debugFilename       = "$logDir/debug.txt";
        $warningFilename     = "$logDir/warning.txt";
        $htpasswdFilename    = "$dataDir/.htpasswd";
        $logFilename         = "$logDir/log%DATE%.txt";
        $remoteUserFilename  = "$dataDir/remoteusers.txt";
        $userListFilename    = "$dataDir/$mainWebname/$wikiUsersTopicname.txt";
}

I also separated out all the static bits into a second subroutine, which I've left out as it's not particularly interesting. The next bit was to figure out how to make sure doDynamicConfig was run on every request. Fortunately all the TWiki scripts call a common routine on startup (TWiki::initialize in lib/TWiki.pm), so all that was required was the addition of a call to doDynamicConfig, and now everything worked as it should.
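In outline the hook looks something like this. It is a self-contained sketch rather than the actual patch: initialize here is a stand-in for TWiki::initialize (the real routine does far more), doDynamicConfig is cut down to two globals, and the /srv paths are invented.

```perl
use strict;
use warnings;

our ($dataDir, $pubDir);

# Cut-down stand-in for doDynamicConfig: untaint the root, then
# recompute every global that depends on it.
sub doDynamicConfig {
    my ($twiki_root) = ($ENV{TWIKI_ROOT} // '') =~ m{^([\w/._-]+)$}
        or die "TWIKI_ROOT missing or contains unexpected characters\n";
    $dataDir = "$twiki_root/data";
    $pubDir  = "$twiki_root/pub";
    return;
}

# Stand-in for the common startup routine every script calls.
sub initialize {
    doDynamicConfig();    # the one added line: refresh per-vhost globals
    # ... the rest of the real initialisation would follow here ...
    return $dataDir;
}

# Two requests for different vhosts served by the same interpreter.
for my $root ('/srv/htdocs_a/twiki', '/srv/htdocs_b/twiki') {
    local $ENV{TWIKI_ROOT} = $root;
    print initialize(), "\n";
}
# prints /srv/htdocs_a/twiki/data then /srv/htdocs_b/twiki/data
```

Because the globals are recomputed at the top of every request, it no longer matters which vhost a given httpd process served last.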

That's the end (for the moment, at least) of my series of posts on this topic. I hope that someone out there who is trying to deploy TWiki across multiple, secure virtual hosts finds that the information here saves them some grief!
