Fun and games with Apache, mod_perl, TWiki and SMF, Part 3

As I said in my last post, I wanted to use one TWiki installation to support all the virtual hosts I was setting up. Although TWiki provides its own authentication and access control, it doesn't really provide a workable way of restricting read access. The other problem was that I already needed to protect the sites using SSL and HTTP Basic Authentication, so if I also used the TWiki mechanisms I'd end up duplicating and managing the access control information in both Apache and TWiki, which really wasn't acceptable.

Note however that you really can't get away from using TWiki authentication if you want to track page changes by user. The TWiki documentation on how to do this is pretty abysmal, but if you are already using HTTP Basic Authentication you can simply protect all the TWiki CGI scripts with it and require that people register themselves with TWiki before editing any pages. As all of the virtual hosts I was setting up were already access controlled, I had that part of the problem covered.

My requirements were that I should be able to share some webs between all the virtual hosts, share other webs between a particular subset of the virtual hosts, and restrict yet other webs to just a single virtual host. If I could do this it meant I could share a single Main and TWiki web amongst all the vhosts, so for example once people had registered in TWiki they could be given access to other sites without having to re-register. It would also allow me to share common information such as site documentation across all the sites whilst still maintaining secure access to the non-shared webs.

The first step was to create a twiki directory under the htdocs directory for each vhost, and to create symlinks from there to the common bin, lib and templates directories. Subdirectories were created for data and pub as these wouldn't be shared between vhosts, and within those directories a further set of symlinks was made to the appropriate Web subdirectories of the master twiki install. Separate Trash subdirectories were created for each vhost - if Trash was shared, pages deleted on one vhost would be visible to all the others.
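In outline, the setup for a single vhost looks something like the following runnable sketch. It works against a scratch directory with a faked-up master install so it can be run standalone; the real paths and web names differ, and 'Main' stands in for the shared webs:

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Temp qw(tempdir);

# Scratch stand-ins for the real directories under the server root.
my $root   = tempdir(CLEANUP => 1);
my $master = "$root/twiki";                 # shared master install
my $vhroot = "$root/htdocs_sitea/twiki";    # per-vhost tree

mkpath("$master/$_") for qw(bin lib templates data/Main pub/Main);
mkpath("$vhroot/$_") for qw(data pub);

# Shared code and templates: symlink straight at the master copy.
symlink("$master/$_", "$vhroot/$_") or die "symlink: $!"
    for qw(bin lib templates);

# Shared webs: symlink the per-web subdirs of data and pub.
symlink("$master/data/Main", "$vhroot/data/Main") or die "symlink: $!";
symlink("$master/pub/Main",  "$vhroot/pub/Main")  or die "symlink: $!";

# Trash is deliberately per-vhost, so deleted pages don't leak.
mkpath("$vhroot/data/Trash");

my $ok = -l "$vhroot/bin" && -d "$vhroot/data/Trash";
print $ok ? "ok\n" : "not ok\n";
```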

The next step was to figure out how to get each vhost to use the appropriate one of the TWiki trees I'd just created. TWiki stores its configuration in lib/TWiki.cfg - the file is a set of perl global variable assignments that specify the environment TWiki is running under, and each TWiki CGI script loads it at startup. Obviously these settings would need to differ for each vhost, but rather than having a separate TWiki.cfg for each site I wanted to have just one copy. My last post included the following segments from perl.conf:

        $env = [
            [ APP_ROOT => '/approot' ],
            [ TWIKI_ROOT => "$sr/htdocs_$vh/twiki" ],
        ];
            SetEnv        => $env,
            PerlSetEnv    => $env,

This was so I could grab those values from the environment and then use them to configure TWiki to point to the files and directories appropriate to the current vhost. I put the necessary changes to TWiki.cfg in place, fired up Apache and pointed my browser at the TWiki homepage of one of the vhosts - Yay! It all worked. I then fired up a separate browser window, pointed it at one of the other vhosts and bounced on the reload key a few times. It all looked OK at first, but after a few refreshes the page started showing webs from the other site. I switched back to the original browser window, hit reload a few times and that started showing info for the other site as well - Urk!

I had a pretty good idea what was wrong. The Apache architecture consists of a pool of httpd processes that serve requests, and in a virtual hosting setup such as mine a given httpd process will potentially serve pages for multiple virtual hosts. I was also running TWiki under mod_perl, and in that setup the perl interpreter and any code loaded into it are persistent - each httpd process has an interpreter embedded inside it, unlike the normal CGI environment where each CGI script invocation results in a separate fork/exec of the perl interpreter. In this environment global variables are toxic, and TWiki is riddled with them (the standard of some of the code in TWiki is less than excellent, and this is just one example). I guessed that the problem was being caused by one or more global variables that weren't being reinitialised appropriately, but which ones? I really didn't want to have to make drastic changes to the TWiki codebase to fix the problem if I could avoid it. A bit of trawling through the code revealed the root cause - although TWiki.cfg was correctly initialising everything based on the environment variables that were being passed in, on subsequent invocations the globals weren't being updated to point to the current vhost.
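The failure mode is easy to reproduce in miniature. This is a toy model, not TWiki's actual code: a config file that is require()d gets compiled once per persistent interpreter, so a "set on first load" global keeps whichever vhost's value arrived first:

```perl
use strict;
use warnings;

# Toy model of the bug - NOT TWiki's actual code. The package and
# variable names are illustrative.
package StaleCfg;
our $dataDir;

sub load {
    my ($vhost) = @_;
    $dataDir ||= "/htdocs_$vhost/twiki/data";   # only set the first time
}

package main;

# Two requests for different vhosts, served by the same httpd process:
StaleCfg::load('sitea');
my $first = $StaleCfg::dataDir;

StaleCfg::load('siteb');
my $second = $StaleCfg::dataDir;    # still sitea's path - the bug

print "$first\n$second\n";
```

Under vanilla CGI the interpreter (and hence $dataDir) dies with each request, which is why this class of bug only shows up once mod_perl keeps the interpreter alive.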

The fix needed two parts - I needed to separate out the bits of TWiki.cfg that were dynamic and needed to be reinitialised on each request, and I needed to make sure that the reinitialisation was actually performed on each request. The first step was to go through TWiki.cfg and split all the dynamic bits out into a subroutine:

#
# Initialise the dynamic bits of TWiki's configuration.
#
sub doDynamicConfig
{
        # Fetch config from environment and untaint.
        my ($twiki_root) = $ENV{TWIKI_ROOT} =~ m{^([\w/._-]+)$};
        my ($twiki_url) = substr($ENV{SCRIPT_URI}, 0, -length($ENV{SCRIPT_URL}))
            =~ m{^([\w/:._-]+)$};

        # Set up dependent TWiki globals.
        $defaultUrlHost      = $twiki_url;
        $pubDir              = "$twiki_root/pub";
        $templateDir         = "$twiki_root/templates";
        $dataDir             = "$twiki_root/data";
        $logDir              = $dataDir;

        #
        # XXX NASTY HACK.
        # Depends on the ordering of elements in @storeSettings.
        #
        $storeSettings[1]    = $dataDir;
        $storeSettings[3]    = $pubDir;

        $wikiHomeUrl         = "$twiki_url/twiki";
        $debugFilename       = "$logDir/debug.txt";
        $warningFilename     = "$logDir/warning.txt";
        $htpasswdFilename    = "$dataDir/.htpasswd";
        $logFilename         = "$logDir/log%DATE%.txt";
        $remoteUserFilename  = "$dataDir/remoteusers.txt";
        $userListFilename    = "$dataDir/$mainWebname/$wikiUsersTopicname.txt";
}

I also separated out all the static bits into a second subroutine; I've left that out as it's not particularly interesting. The next bit was to figure out how to make sure doDynamicConfig was run on every request. Fortunately all the TWiki scripts call a common routine on startup (TWiki::initialize in lib/TWiki.pm), so all that was required was the addition of a call to doDynamicConfig, and then everything worked as it should.
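In miniature, the shape of the fix looks like this - the sub and variable names follow the post, the stock TWiki::initialize body is elided, and the two "requests" are simulated by just setting the environment:

```perl
use strict;
use warnings;

our ($dataDir, $pubDir);

# Re-reads the per-vhost settings from the environment; the regex
# capture doubles as untainting, as in the real doDynamicConfig above.
sub doDynamicConfig {
    my ($twiki_root) = $ENV{TWIKI_ROOT} =~ m{^([\w/._-]+)$}
        or die "bad TWIKI_ROOT";
    $dataDir = "$twiki_root/data";
    $pubDir  = "$twiki_root/pub";
}

sub initialize {
    doDynamicConfig();    # the one-line addition
    # ... the stock TWiki::initialize body continues here ...
}

# Two simulated requests from different vhosts in one interpreter:
$ENV{TWIKI_ROOT} = '/htdocs_sitea/twiki';
initialize();
my $first = $dataDir;

$ENV{TWIKI_ROOT} = '/htdocs_siteb/twiki';
initialize();
my $second = $dataDir;    # now siteb's data dir, as it should be

print "$first\n$second\n";
```

Because every script funnels through initialize, hooking the dynamic config there catches all requests without touching each CGI script individually.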

That's the end (for the moment, at least) of my series of posts on this topic, I hope that someone out there who is trying to deploy TWiki across multiple, secure virtual hosts might find that the information therein saves them some grief!

Categories : Web, Tech, Perl, Work

Fun and games with Apache, mod_perl, TWiki and SMF, Part 2

As promised in my last post I'm going to describe how I used mod_perl to script the generation of my httpd.conf file. The script is rather large, so I'm going to break it down into chunks and explain each one separately.

use Apache;
use Apache::PerlSections;
use CGI qw(-compile);

The first step is to load the prerequisite Apache modules. In order to configure Apache from within perl, you need to use the Apache::PerlSections module. The use CGI line preloads and compiles the ubiquitous CGI module without importing it into the current package. The reason for doing this is that any CGI scripts that run under Apache::Registry will just import the symbols they need and won't need to compile the (rather large) CGI module on each invocation - i.e. they will run faster.

# Put Apache under RM control.
use Sun::Solaris::Project qw(:ALL);
$_ = getpwuid(Apache->server->uid());
setproject(getdefaultproj($_), $_, 0) == 0 || die($!);

In order to provide predictable response times from the web server it's a good idea to run it under the Solaris Fair Share Scheduler (FSS); the FSS documentation describes how to set up shares for the webserver. The block above puts the httpd processes into the project that's been set up for the webserver to run in. The setproject() call requires root privilege, so the code depends on a quirk of the way Apache is designed - in order to open low-numbered ports, Apache must be started as root. At the point at which the perl startup script is run, Apache is still running as root, so it still has the permissions necessary to issue the setproject() call to put Apache into the project that has been set up for it.

package Apache::ReadConfig;
use lib qw(/approot/conf);
use AppConf;
use Tie::DxHash;
use Net::Domain qw(hostfqdn hostdomain);
use File::Find;
use File::Path;
use POSIX qw(lchown);

# Enforce strict checking of the configuration.
$Apache::Server::StrictPerlSections = 1;

# Lexical refs for appConf variables - avoids polluting Apache::ReadConfig.
my $CC = \%AppConf::CommonConf;
my $GC = \%AppConf::GateConf;
my $SC = \%AppConf::SiteConf;

The perl Apache configuration section needs to be contained inside the package Apache::ReadConfig. Apache scans the package for any global variables and creates Apache configuration directives with the same names; for example, a perl package global named $ServerName is mapped onto the Apache ServerName configuration directive. Setting $StrictPerlSections specifies that the contents of Apache::ReadConfig should be validated; the consequence is that all package globals need to correspond to valid, well-formed Apache configuration directives. The mapping between the various types of Apache directives and the corresponding perl forms is documented in the O'Reilly Practical mod_perl book, available on-line. You need to be careful to define any non-configuration variables as lexical (my) variables so they don't get mistaken for Apache configuration directives. The definitions of $CC, $GC and $SC give us easy access to the configuration hashes without polluting the Apache::ReadConfig namespace - a normal use statement with an import list would result in the creation of package globals for the imported items, and that would cause the StrictPerlSections checking to fail.

# Apache HTTP host configuration variables.
our ($ServerName, $Port, $ServerAdmin, @Listen, %VirtualHost);

# Look up global environment items.
my $sr = Apache->server_root_relative();
$sr =~ s{/$}{};
my $dn = hostdomain();
my $fqdn = hostfqdn();

# Done here so we don't have to hard-code the hostname.
$ServerName = $fqdn;
$ServerAdmin = "webservd\@$fqdn";
push(@Listen, "$fqdn:80");

The our statement declares the Apache configuration directives that are going to be defined by the perl script. Note that multivalued configuration directives map onto perl arrays (e.g. @Listen) whereas the more complex multi-level directives map onto hashes (e.g. %VirtualHost). The block above sets up the server name and makes Apache listen on the normal non-SSL port 80.

# Instantiate the HTTPS virtual hosts.
while (my ($vh, $cfg) = each (%{$SC})) {
        my $vhfqdn = "$vh.$dn";
        push(@Listen, "$vhfqdn:443");

        # Preserve hash insertion order.
        my %vh;
        tie(%vh, 'Tie::DxHash');

        # Create the Apache sections for the virtual host.
        %vh = (
            basic_config($vh, $vhfqdn, $sr, $cfg),
            ssl_rewrite_rules($vh, $vhfqdn, $sr, $cfg),
            directory_sections($vh, $vhfqdn, $sr, $cfg),
        );
        $VirtualHost{$vhfqdn} = \%vh;

        # Create the .../gates directory for the virtual host.
        make_gate_dir($index_tmpl, $vh, $vhfqdn, $sr, $cfg);

        # Create the .../twiki directory for the virtual host.
        make_twiki_dir($vh, $vhfqdn, $sr, $cfg);

        # Change the ownership of the site htdocs tree.
        find({ wanted => sub { chowner($uid, $gid) }, no_chdir => 1 },
            "$sr/htdocs_$vh");
}

There are a couple of things of note here. The first is the somewhat puzzling use of the Tie::DxHash package. As I said earlier, the more complex Apache configuration directives map onto perl hashes. The problem is that perl hashes don't allow duplicate keys and don't preserve insertion order, whereas some of the Apache configuration directives are either order-dependent or can be repeated. Tie::DxHash fixes this by preserving insertion order and allowing duplicate keys. The calls to basic_config, ssl_rewrite_rules and directory_sections return lists of configuration directives which are then inserted into the %VirtualHost hash used to define each virtual host being set up. The rest of the code sets up the contents of the htdocs directory for each virtual host - as I said originally, a lot of the hosts share common data, so these subs create all the necessary symlinks under the per-vhost htdocs directories when the server starts up.
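To see why an ordinary hash won't do here, consider what plain core Perl does with a repeated directive (the Alias values are hypothetical, purely for illustration):

```perl
use strict;
use warnings;

# A plain hash silently collapses duplicate keys, so a config section
# that needs two Alias directives would lose one of them (and key
# iteration order is effectively random into the bargain).
my %plain = (
    Alias => '/icons/ /approot/apache/icons/',
    Alias => '/doc/ /approot/apache/htdocs/manual/',  # clobbers the first
);
my $kept = keys %plain;
print "directives kept: $kept\n";    # 1, not 2
```

A hash tied to Tie::DxHash accepts both entries and hands them back in insertion order, which is exactly what the generated Apache configuration needs.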

#
# Return the basic SSL vhost configuration.
#
sub basic_config
{
        my ($vh, $vhfqdn, $sr, $cfg) = @_;
        return (
            ServerName                  => $vhfqdn,
            SSLEngine                   => 'on',
            DocumentRoot                => "$sr/htdocs_${vh}",
            ErrorLog                    => "$sr/logs/${vh}_error_log",
            TransferLog                 => "$sr/logs/${vh}_access_log",
            SSLCertificateFile          => "$sr/conf/ssl.crt/${vhfqdn}.crt",
            SSLCertificateKeyFile       => "$sr/conf/ssl.key/${vhfqdn}.key",
            CustomLog                   => "$sr/logs/${vh}_ssl_request_log" .
                q{ "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"},
        );
}

This is an example of one of the simpler configuration blocks. It defines the host name and htdocs directory for each vhost, sets up separate log files for each virtual host, and makes sure that the correct certificate and key files are used by each vhost.

#
# Return the directory sections for the virtual host.
#
sub directory_sections
{
        my ($vh, $vhfqdn, $sr, $cfg) = @_;

        # Standard directory options.
        my @std_opts = (
            Order               => 'Deny,Allow',
            Allow               => 'From All',
            AllowOverride       => 'none',
            Options             => 'Indexes FollowSymLinks MultiViews',
        );

        # Root directory.
        my $dir_spec = {};
        tie(%$dir_spec, 'Tie::DxHash');
        %$dir_spec = (
            AuthType                    => 'Basic',
            AuthName                    => qq{"$vh project"},
            appAuthEnabled              => 'on',
            appAuthCacheLifetime        => 60,
            appAuthGroupFile            => '/approot/conf/app.group',
            Require                     => "group $vh",
            @std_opts,
        );
        $dir{"$sr/htdocs_$vh"} = $dir_spec;

This chunk sets up the root htdocs directory of the virtual host. Access control to the virtual hosts is done using HTTP Basic Authentication, implemented with an in-house Apache module that authenticates usernames and passwords against the Sun-wide LDAP infrastructure. I've left out the details of several of the other configuration blocks as they are very similar to the one above. However I'll show the TWiki configuration code below, as I'll be discussing it further in a subsequent post:

        # twiki bin.
        $env = [
            [ APP_ROOT => '/approot' ],
            [ TWIKI_ROOT => "$sr/htdocs_$vh/twiki" ],
        ];
        $dir_spec = {};
        tie(%$dir_spec, 'Tie::DxHash');
        %$dir_spec = (
            FilesMatch => {
                '^(statistics|testenv)$' => {
                    SetHandler  => 'cgi-script',
                },
                '.*' => {
                    SetHandler          => 'perl-script',
                    PerlHandler         => 'Apache::Registry',
                    PerlSendHeader      => 'On',
                },
            },
            SetEnv        => $env,
            PerlSetEnv    => $env,
            Order         => 'Deny,Allow',
            Allow         => 'From All',
            AllowOverride => 'none',
            Options       => 'ExecCGI',
        );
        $dir{"$sr/htdocs_$vh/twiki/bin"} = $dir_spec;
        push(@salias, [ '/twiki/bin' => "$sr/htdocs_$vh/twiki/bin" ]);

        return(Alias => \@alias, ScriptAlias => \@salias, Directory => \%dir);
}

Rather than installing TWiki once for each virtual host, I wanted to have a single installation that was shared between all the vhosts. The make_twiki_dir sub mentioned previously sets up the per-vhost directories and symlinks needed to do this, and the block above sets up the necessary Apache configuration directives. Note the use of Tie::DxHash on the $dir_spec hash - I want to run the statistics and testenv TWiki scripts as standard CGI scripts, so I explicitly match those first before providing a catch-all that runs everything else under mod_perl. Without Tie::DxHash it isn't possible to guarantee the ordering that's required for this to work. I also need to tell the common TWiki installation which virtual host it is running under when called. The easiest way of doing this is by passing the necessary configuration information (APP_ROOT and TWIKI_ROOT) in environment variables, as this works both when the TWiki scripts are run as vanilla CGI scripts and when they are run under Apache::Registry. Finally the sub returns the Alias, ScriptAlias and Directory blocks needed to define the layout of this particular virtual host.

In the final part of this series I'll describe the problems I had in getting TWiki to run under this configuration and how I solved them with a few simple changes to the standard TWiki configuration.

Categories : Web, Tech, Perl, Work

Fun and games with Apache, mod_perl, TWiki and SMF, Part 1

As part of the day job I've had to set up a single machine hosting several SSL-secured websites that make heavy use of mod_perl and TWiki in particular. The sites all access some shared content (e.g. source code trees) as well as having their own individual stuff, and they are all laid out in broadly the same way. Another requirement is to be able to easily and quickly add more sites in the future. Being a lazy sort of person, I didn't particularly fancy doing that all by hand, so I dug out my copy of Writing Apache Modules with Perl and C and started hacking. One of the little-known features of mod_perl is that as well as being able to use it to speed up your CGI scripts you can also use it to dynamically create the contents of your httpd.conf file. In my case, some of the configuration (e.g. icons, stylesheets and so on) was common to all the sites I was building, so it could go in httpd.conf as normal. The rest would be done dynamically, so I split that off into a separate file (more on that in a later post) and loaded it in via the following block in httpd.conf:

PerlFreshRestart        on
PerlTaintCheck          on
PerlRequire             conf/perl.conf

One of the other things I wanted to do was to be able to use the same CGI scripts running under mod_perl's Apache::Registry on each of the virtual hosts, so I went one step further and abstracted out the configuration which was common to both the webserver and the CGI scripts into a separate file. This file has a hash containing an entry for each virtual host, as well as hashes describing content shared between the various virtual hosts. I'm not going to reproduce the whole thing as it is rather large, but the following excerpts will give a flavour:

# Common configuration items.
our %CommonConf = (
    src                 => '/approot/src',
    twiki               => '/approot/apache/twiki',
    site_stylesheet     => '/stylesheets/site.css',
    src_stylesheet      => '/stylesheets/sccs.css',
    cgi_stylesheet      => '/stylesheets/cgi.css',
);

our %GateConf = (
    # ON & friends.
    'ONNV (all)' => {
        dir     => 'onnv',
        path    => 'usr/src',
        include => [ 'usr/src/include' ],
        desc    => q{
            Entire ON Nevada source hierarchy.
        },
    },
    'ONNV (uts)' => {
        dir     => 'onnv',
        path    => 'usr/src/uts',
        include => [ 'usr/src/include' ],
        desc    => q{
            Kernel-only ON Nevada source hierarchy.
        },
    },
);

our %SiteConf = (
    sitea => {
        interface       => 'bge0:1',
        wiki_main       => 'Sitea',
        wiki_webs       => [ 'Sitea' ],
        wiki_search     => '/twiki/bin/search/Sitea/SearchResult?search=',
        wiki_view       => '/twiki/bin/view/Sitea',
        gates   => [
            'ONNV (all)',
            'ONNV (uts)',
        ],
    },
);

The first things of interest are the keys of the %SiteConf hash, which are the names of the virtual hosts I was going to create, and the interface value in %SiteConf, which is the virtual interface I was going to run each host on. There's a chicken-and-egg situation when you try to set up SSL-enabled virtual hosts in Apache. Normal Apache name-based virtual hosting uses the hostname in the incoming HTTP request to figure out which virtual host you are accessing. Unfortunately that isn't possible with SSL, because the incoming request - including the virtual hostname - is encrypted, so Apache can't tell which decryption key to use. The consequence is that you need to assign each SSL virtual host its own IP address, and therefore its own (virtual) interface. The interface value is used by the SMF service script that starts up the webserver - this wraps the standard apachectl script and creates/deletes the required virtual interfaces as necessary before calling apachectl:

#!/bin/ksh -p
#
# SMF service script for Apache.
#

# Include SMF shell support.
. /lib/svc/share/smf_include.sh

# Root of application install tree.
typeset -r APP_ROOT=${APP_ROOT:-$(APP_ROOT)}

# Apache configuration
typeset -r APACHE_HOME=${APP_ROOT}/apache
typeset -r APACHE_LOG=$APACHE_HOME/logs
typeset -r APACHE_BIN=$APACHE_HOME/bin
typeset -r NETMASK=255.255.255.0

if [[ $# -ne 1 ]]; then
        echo "Invalid arguments"
        exit $SMF_EXIT_ERR_FATAL
fi

# Clear SMF environment variables.
smf_clear_env

case $1 in
start)
        /bin/rm -rf $APACHE_LOG/*
        for vhost in $(perl -I$APP_ROOT/conf -MAppConf \
            -e 'AppConf::print_hosts()'); do
                if=$(perl -I$APP_ROOT/conf -MAppConf \
                    -e "AppConf::print_interface('$vhost')")
                /usr/sbin/ifconfig $if plumb up $vhost netmask $NETMASK
        done
        $APACHE_BIN/apachectl startssl \
            && exit $SMF_EXIT_OK || exit $SMF_EXIT_ERR_FATAL
        ;;
stop)
        $APACHE_BIN/apachectl stop
        status=$?
        for vhost in $(perl -I$APP_ROOT/conf -MAppConf \
            -e 'AppConf::print_hosts()'); do
                if=$(perl -I$APP_ROOT/conf -MAppConf \
                    -e "AppConf::print_interface('$vhost')")
                /usr/sbin/ifconfig $if unplumb
        done
        [[ $status -eq 0 ]] && exit $SMF_EXIT_OK || exit $SMF_EXIT_ERR_FATAL
        ;;
refresh)
        $APACHE_BIN/apachectl graceful \
            && exit $SMF_EXIT_OK || exit $SMF_EXIT_ERR_FATAL
        ;;
*)
        echo "Invalid method $1"
        exit $SMF_EXIT_ERR_FATAL
        ;;
esac
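
The AppConf::print_hosts() and AppConf::print_interface() one-liners in the script call query subs in AppConf.pm. Their bodies aren't shown in these posts, but they amount to something like the following sketch - the %SiteConf data is inlined here so it's self-contained, and the sub bodies are my guess at the obvious implementation:

```perl
use strict;
use warnings;

package AppConf;

# Inlined stand-in for the real %SiteConf from AppConf.pm.
our %SiteConf = (
    sitea => { interface => 'bge0:1' },
    siteb => { interface => 'bge0:2' },
);

# Emit one virtual hostname per line for shell "for" loops to consume.
sub print_hosts {
    print "$_\n" for sort keys %SiteConf;
}

# Emit the virtual interface configured for a given vhost.
sub print_interface {
    my ($vhost) = @_;
    die "unknown vhost $vhost\n" unless exists $SiteConf{$vhost};
    print "$SiteConf{$vhost}{interface}\n";
}

package main;
AppConf::print_hosts();
AppConf::print_interface('sitea');
```

Keeping these lookups in the one shared perl file means the shell script never needs its own copy of the vhost list.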

The AppConf::print_hosts() and AppConf::print_interface() subs are provided by the common configuration file so that shell scripts (such as the SMF service script) can query the list of virtual hosts and interfaces. The other bit of the jigsaw is the SMF manifest that describes the Apache service:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
    start/stop/restart apache for app
-->
<service_bundle type='manifest' name='app:apache'>
<service name='application/app/apache' type='service' version='1'>
    <single_instance />
    <instance name='default' enabled='true'>
        <!-- milestones required -->
        <dependency name='milestone'
            grouping='require_all'
            restart_on='error'
            type='service'>
            <service_fmri value='svc:/milestone/multi-user-server:default' />
        </dependency>
        <dependency name='filesystem'
            grouping='require_all'
            restart_on='error'
            type='service'>
            <service_fmri value='svc:/system/filesystem/local:default' />
        </dependency>
        <!-- services required -->
        <dependency name='database'
            grouping='require_all'
            restart_on='refresh'
            type='service'>
            <service_fmri value='svc:/application/app/mysql:default' />
        </dependency>
        <!-- default method context -->
        <method_context working_directory=':default' project=':default'>
            <method_credential user='root' group='root' />
            <method_environment>
                <envvar name='PATH'
                value='/approot/apache/bin:/approot/utils/bin:/approot/perl5/bin:/usr/bin' />
            </method_environment>
        </method_context>
        <!-- methods -->
        <exec_method
            type='method'
            name='start'
            exec='apache start'
            timeout_seconds='60'
        />
        <exec_method
            type='method'
            name='stop'
            exec='apache stop'
            timeout_seconds='60'
        />
        <exec_method
            type='method'
            name='refresh'
            exec='apache refresh'
            timeout_seconds='60'
        />
    </instance>
    <stability value='Stable' />
    <template>
        <common_name>
            <loctext xml:lang='C'>
            app application - apache component
            </loctext>
        </common_name>
        <documentation>
            <doc_link name='apache'
              uri='file:///approot/apache/htdocs/manual' />
        </documentation>
    </template>
</service>
</service_bundle>

This is a pretty standard SMF manifest. Things to note are the dependency on another component of the application (the MySQL service) and the use of the little-used method_context element to factor out the common environment (UID, working directory and PATH) from the individual start/stop/refresh method definitions.

In part 2 I'll describe how I scripted the configuration of the virtual hosts, including putting them under the control of the Fair Share scheduler from within perl.

Categories : Web, Tech, Perl, Work

Jack Frost

I was out on Patrol today, and Peter was nagging me because I hadn't blogged anything in a long time - and he's right, it's nearly three months since my last post - oops! As for my wander today up Tintwistle Knarr and across to Arnfield Moor with Bob, I have nothing much to report other than that it was overcast, grey, chilly and dark very early - oh, and the eight hares we saw were all getting their snowy white winter coats. Speaking of seasonal things, I was downloading photos from my camera and I found this rather nice wintry scene, to which I've added a touch of soft focus - fame and glory to anyone who identifies the location :-)

Frosty trees