Fun and games with Apache, mod_perl, TWiki and SMF, Part 2

As promised in my last post I'm going to describe how I used mod_perl to script the generation of my httpd.conf file. The script is rather large, so I'm going to break it down into chunks and explain each one separately.

use Apache;
use Apache::PerlSections;
use CGI qw(-compile);

The first step is to load the prerequisite Apache modules. To configure Apache from within perl, you need the Apache::PerlSections module. The use CGI line preloads and compiles the ubiquitous CGI module without importing it into the current package. The reason for doing this is that any CGI scripts that run under Apache::Registry will then just import the symbols they need and won't have to compile the (rather large) CGI module on each invocation, i.e. they will run faster.

# Put Apache under RM control.
use Sun::Solaris::Project qw(:ALL);
$_ = getpwuid(Apache->server->uid());
setproject(getdefaultproj($_), $_, 0) == 0 || die($!);

In order to provide predictable response times from the web server it's a good idea to run it under the Solaris Fair Share Scheduler (FSS); the Solaris resource management documentation explains how to set up shares for the webserver. The block above puts the HTTP processes into the project that's been set up for the webserver to run in. The setproject(3PROJECT) call requires root privilege, so the code depends on a quirk of the way Apache is designed: in order to open low-numbered ports, Apache must be started as root. At the point at which the perl startup script is run, Apache is still running as root, so it still has the permissions necessary to issue the setproject() call that moves it into its project.

package Apache::ReadConfig;
use lib qw(/approot/conf);
use AppConf;
use Tie::DxHash;
use Net::Domain qw(hostfqdn hostdomain);
use File::Find;
use File::Path;
use POSIX qw(lchown);

# Enforce strict checking of the configuration.
$Apache::Server::StrictPerlSections = 1;

# Lexical refs for appConf variables - avoids polluting Apache::ReadConfig.
my $CC = \%AppConf::CommonConf;
my $GC = \%AppConf::GateConf;
my $SC = \%AppConf::SiteConf;

The perl Apache configuration section needs to be contained inside the package Apache::ReadConfig. Apache scans the package for any global variables and creates Apache configuration directives with the same name; for example, a perl package global named $ServerName is mapped onto the Apache ServerName configuration directive. Setting StrictPerlSections specifies that the contents of the Apache::ReadConfig package should be validated. The consequence is that all package globals need to correspond to valid, well-formed Apache configuration directives. The mapping between the various types of Apache directives and the corresponding perl forms is documented in the O'Reilly Practical mod_perl book, available on-line. You need to be careful to define any non-configuration variables as lexical (my) variables so they don't get mistaken for Apache configuration directives. The definitions of $CC, $GC and $SC give us easy access to the configuration hashes without polluting the Apache::ReadConfig namespace: a normal use statement with an import list would result in the creation of package globals for the imported items, and that would cause the StrictPerlSections checking to fail.
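To make the difference between package globals and lexicals concrete, here is a minimal standalone sketch (no Apache needed) of a symbol-table scan of the sort described above. The package name Demo::ReadConfig and the helper package_vars are made up for illustration; mod_perl's own scanning code is more involved than this.

```perl
use strict;
use warnings;

package Demo::ReadConfig;

# Package globals live in the package's symbol table, so a scan finds them.
our $ServerName = 'www.example.com';
our @Listen     = ('www.example.com:80');

# A lexical lives in a scratchpad, not the symbol table, so it stays hidden.
my $helper = 'not a directive';

package main;

# Hypothetical helper: list the scalars and arrays that hold a value
# in the named package, sorted by symbol name.
sub package_vars {
    my ($pkg) = @_;
    my @found;
    no strict 'refs';    # symbolic references are needed to walk the stash
    for my $name (sort keys %{"${pkg}::"}) {
        push @found, "\$$name" if defined ${"${pkg}::$name"};
        push @found, "\@$name" if @{"${pkg}::$name"};
    }
    return @found;
}

print "$_\n" for package_vars('Demo::ReadConfig');
```

Only @Listen and $ServerName are reported; $helper never shows up, which is why lexicals are the safe choice for scratch variables in the configuration script.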

# Apache HTTP host configuration variables.
our ($ServerName, $Port, $ServerAdmin, @Listen, %VirtualHost);

# Look up global environment items.
my $sr = Apache->server_root_relative();
$sr =~ s{/$}{};
my $dn = hostdomain();
my $fqdn = hostfqdn();

# Done here so we don't have to hard-code the hostname.
$ServerName = $fqdn;
$ServerAdmin = "webservd\@$fqdn";
push(@Listen, "$fqdn:80");

The our statement declares the Apache configuration directives that are going to be set by the perl script. Note that multivalued configuration variables map onto perl arrays (e.g. @Listen) whereas the more complex multi-level directives map onto hashes (e.g. %VirtualHost). The block above sets up the server name and makes Apache listen on the normal non-SSL port 80.
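As a rough picture of what these perl structures stand for, the sketch below renders scalars, arrays and hashes as httpd.conf-style text. The render sub is a hypothetical helper written for this post, not part of mod_perl, and it only approximates the mapping: scalars become one directive, arrays repeat the directive once per element, and hashes become sections keyed by their argument. Keys are sorted purely to get repeatable output from plain hashes.

```perl
use strict;
use warnings;

# Hypothetical helper: turn directive name/value pairs into config text.
sub render {
    my ($indent, @pairs) = @_;
    my $out = '';
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        if (ref $value eq 'HASH') {
            # Hash: one <Name arg> ... </Name> section per key.
            for my $arg (sort keys %$value) {
                my $body  = $value->{$arg};
                my @inner = map { ($_, $body->{$_}) } sort keys %$body;
                $out .= "$indent<$name $arg>\n"
                      . render("$indent    ", @inner)
                      . "$indent</$name>\n";
            }
        }
        elsif (ref $value eq 'ARRAY') {
            # Array: repeat the directive; an inner array holds its arguments.
            for my $item (@$value) {
                my @args = ref $item eq 'ARRAY' ? @$item : ($item);
                $out .= "$indent$name @args\n";
            }
        }
        else {
            # Scalar: a single directive with a single argument.
            $out .= "$indent$name $value\n";
        }
    }
    return $out;
}

print render('',
    ServerName  => 'www.example.com',
    Listen      => [ 'www.example.com:80', 'proj.example.com:443' ],
    VirtualHost => {
        'proj.example.com' => {
            ServerName => 'proj.example.com',
            SSLEngine  => 'on',
        },
    },
);
```

The host names are placeholders. The array-of-arrays form handled in the middle branch is the same shape the script uses later for Alias and SetEnv values.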

# Instantiate the HTTPS virtual hosts.
while (my ($vh, $cfg) = each (%{$SC})) {
        my $vhfqdn = "$vh.$dn";
        push(@Listen, "$vhfqdn:443");

        # Preserve hash insertion order.
        my %vh;
        tie(%vh, 'Tie::DxHash');

        # Create the Apache sections for the virtual host.
        %vh = (
            basic_config($vh, $vhfqdn, $sr, $cfg),
            ssl_rewrite_rules($vh, $vhfqdn, $sr, $cfg),
            directory_sections($vh, $vhfqdn, $sr, $cfg),
        );
        $VirtualHost{$vhfqdn} = \%vh;

        # Create the .../gates directory for the virtual host.
        make_gate_dir($index_tmpl, $vh, $vhfqdn, $sr, $cfg);

        # Create the .../twiki directory for the virtual host.
        make_twiki_dir($vh, $vhfqdn, $sr, $cfg);

        # Change the ownership of the site htdocs tree.
        find({ wanted => sub { chowner($uid, $gid) }, no_chdir => 1 },
            "$sr/htdocs_$vh");
}

There are a couple of things of note here. The first is the somewhat puzzling use of the Tie::DxHash package. As I said earlier, the more complex Apache configuration directives map onto perl hashes. The problem is that perl hashes don't allow duplicate keys and they also don't preserve insertion order, whereas some of the Apache configuration directives are either order-dependent or can be repeated. Tie::DxHash fixes this problem by preserving insertion order and allowing us to insert duplicate items into a perl hash. The calls to basic_config, ssl_rewrite_rules and directory_sections return lists of configuration directives, which are then inserted into the hash that defines the virtual host being set up; that hash is in turn stored in %VirtualHost. The rest of the code sets up the contents of the htdocs directory for each virtual host: as I said originally, a lot of the hosts share common data, so these subs create all the necessary symlinks under the per-vhost htdocs directories when the server starts up.
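To show why a tied hash helps, here is a minimal stand-in that keeps every key/value pair in insertion order. It is only a sketch of the idea: the Demo::OrderedHash class is invented for this example and is not how Tie::DxHash is actually implemented, and it supports just enough of the tie interface for list assignment and each.

```perl
use strict;
use warnings;

# Stand-in for Tie::DxHash: keeps every key/value pair, in insertion order.
package Demo::OrderedHash;

sub TIEHASH { my ($class) = @_; return bless { pairs => [], pos => 0 }, $class }
sub CLEAR   { my ($self) = @_; @{ $self->{pairs} } = (); $self->{pos} = 0; return }

# Append rather than overwrite, so a repeated key is kept as a second entry.
sub STORE {
    my ($self, $key, $value) = @_;
    push @{ $self->{pairs} }, [ $key, $value ];
    return;
}

# Iteration walks the pair list in order.
sub FIRSTKEY { my ($self) = @_; $self->{pos} = 0; return $self->_current_key }
sub NEXTKEY  { my ($self) = @_; $self->{pos}++;   return $self->_current_key }

sub _current_key {
    my ($self) = @_;
    my $pair = $self->{pairs}[ $self->{pos} ];
    return $pair ? $pair->[0] : undef;
}

# Prefer the pair the iterator is on, so each duplicate yields its own value.
sub FETCH {
    my ($self, $key) = @_;
    my $pair = $self->{pairs}[ $self->{pos} ];
    return $pair->[1] if $pair && $pair->[0] eq $key;
    for my $p (@{ $self->{pairs} }) {
        return $p->[1] if $p->[0] eq $key;
    }
    return undef;
}

package main;

my @directives = (
    RewriteEngine => 'on',
    RewriteRule   => '^/old$ /new',
    RewriteRule   => '^/tmp$ /scratch',
    ServerName    => 'proj.example.com',
);

my %plain = @directives;    # the second RewriteRule overwrites the first
tie my %ordered, 'Demo::OrderedHash';
%ordered = @directives;     # CLEAR, then one STORE per pair

my @seen;
while (my ($key, $value) = each %ordered) {
    push @seen, "$key $value";
}
print scalar(keys %plain), " keys survive in the plain hash\n";
print "$_\n" for @seen;
```

The plain hash ends up with three keys and no guaranteed order, while the tied one hands back all four directives in the order they were written, which is the property the virtual host configuration relies on.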

#
# Return the basic SSL vhost configuration.
#
sub basic_config
{
        my ($vh, $vhfqdn, $sr, $cfg) = @_;
        return (
            ServerName                  => $vhfqdn,
            SSLEngine                   => 'on',
            DocumentRoot                => "$sr/htdocs_${vh}",
            ErrorLog                    => "$sr/logs/${vh}_error_log",
            TransferLog                 => "$sr/logs/${vh}_access_log",
            SSLCertificateFile          => "$sr/conf/ssl.crt/${vhfqdn}.crt",
            SSLCertificateKeyFile       => "$sr/conf/ssl.key/${vhfqdn}.key",
            CustomLog                   => "$sr/logs/${vh}_ssl_request_log" .
                q{ "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"},
        );
}

This is an example of one of the simpler configuration blocks. Firstly it defines the host name and htdocs directory for each vhost. It also defines separate log files for each virtual host, and makes sure that the correct certificate and key files are used by each vhost.

#
# Return the directory sections for the virtual host.
#
sub directory_sections
{
        my ($vh, $vhfqdn, $sr, $cfg) = @_;

        # Standard directory options.
        my @std_opts = (
            Order               => 'Deny,Allow',
            Allow               => 'From All',
            AllowOverride       => 'none',
            Options             => 'Indexes FollowSymLinks MultiViews',
        );

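        # Working variables must be lexicals: package globals in
        # Apache::ReadConfig would be read as configuration directives.
        my (%dir, @alias, @salias, $dir_spec, $env);

        # Root directory.
        $dir_spec = {};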
        tie(%$dir_spec, 'Tie::DxHash');
        %$dir_spec = (
            AuthType                    => 'Basic',
            AuthName                    => qq{"$vh project"},
            appAuthEnabled              => 'on',
            appAuthCacheLifetime        => 60,
            appAuthGroupFile            => '/approot/conf/app.group',
            Require                     => "group $vh",
            @std_opts,
        );
        $dir{"$sr/htdocs_$vh"} = $dir_spec;

This chunk sets up the root htdocs directory of the virtual host. Access control to the virtual hosts is done using HTTP basic authentication, implemented with an in-house Apache module that authenticates usernames and passwords against the Sun-wide LDAP infrastructure. I've left out the details of several of the other configuration blocks as they are very similar to the one above. However, I'll show the TWiki configuration code below as I'll be discussing this further in a subsequent post:

        # twiki bin.
        $env = [
            [ APP_ROOT => '/approot' ],
            [ TWIKI_ROOT => "$sr/htdocs_$vh/twiki" ],
        ];
        $dir_spec = {};
        tie(%$dir_spec, 'Tie::DxHash');
        %$dir_spec = (
            FilesMatch => {
                '^(statistics|testenv)$' => {
                    SetHandler  => 'cgi-script',
                },
                '.*' => {
                    SetHandler          => 'perl-script',
                    PerlHandler         => 'Apache::Registry',
                    PerlSendHeader      => 'On',
                },
            },
            SetEnv        => $env,
            PerlSetEnv    => $env,
            Order         => 'Deny,Allow',
            Allow         => 'From All',
            AllowOverride => 'none',
            Options       => 'ExecCGI',
        );
        $dir{"$sr/htdocs_$vh/twiki/bin"} = $dir_spec;
        push(@salias, [ '/twiki/bin' => "$sr/htdocs_$vh/twiki/bin" ]);

        return(Alias => \@alias, ScriptAlias => \@salias, Directory => \%dir);
}

Rather than installing TWiki once for each virtual host, I wanted to have a single installation that was shared between all the vhosts. The make_twiki_dir sub mentioned previously sets up the per-vhost directories and symlinks needed to do this, and the block above sets up the necessary Apache configuration directives. Note the use of Tie::DxHash on the dir_spec hash - I want to run the statistics and testenv TWiki scripts as standard CGI scripts, so I explicitly match those first before providing a catch-all to run everything else under mod_perl. Without the use of Tie::DxHash it isn't possible to guarantee the ordering that's required for this to work. I also need to tell the common TWiki installation which virtual host it is running under when called. The easiest way of doing this is by passing the necessary configuration information (APP_ROOT and TWIKI_ROOT) in environment variables, as this will work both when the TWiki scripts are run as vanilla CGI scripts and when they are run under Apache::Registry. Finally the sub returns the Alias, ScriptAlias and Directory blocks needed to define the layout of this particular virtual host.
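On the receiving end, a script can pick those two variables up from the environment in the same way whether it runs as plain CGI or under Apache::Registry. The sketch below is only an illustration of that idea: the twiki_paths helper and the data/ and pub/ subdirectory names are assumptions made for the example, not the changes actually made to TWiki.

```perl
use strict;
use warnings;

# Hypothetical helper: derive per-vhost TWiki paths from the environment.
# The environment is passed in as a hash ref so the sub can be exercised
# without a web server; a real script would pass \%ENV.
sub twiki_paths {
    my ($env) = @_;
    for my $var (qw(APP_ROOT TWIKI_ROOT)) {
        die "$var is not set\n"
            unless defined $env->{$var} && length $env->{$var};
    }
    # The data/ and pub/ subdirectory names are assumptions for illustration.
    return (
        data => "$env->{TWIKI_ROOT}/data",
        pub  => "$env->{TWIKI_ROOT}/pub",
    );
}

# Placeholder values standing in for what SetEnv / PerlSetEnv would supply.
my %paths = twiki_paths({
    APP_ROOT   => '/approot',
    TWIKI_ROOT => '/srv/www/htdocs_demo/twiki',
});
print "$_ => $paths{$_}\n" for sort keys %paths;
```

Failing loudly when a variable is missing makes a misconfigured vhost obvious straight away, rather than letting it quietly serve another project's data.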

In the final part of this series I'll describe the problems I had in getting TWiki to run under this configuration and how I solved them with a few simple changes to the standard TWiki configuration.
