Making a perl filehandle into an object

I'm working on a project that uses both perl and MySQL, and I needed an easy way of piping data from perl into mysqlimport so that I could do fast bulk uploads to the database. mysqlimport is a bit odd in how it does this - it works out which table to load by stripping the extension from the name of the data file, so the data has to come from a file whose filename prefix is the same as the table you are loading. This precludes a vanilla perl piped open and requires the use of a named pipe - the easiest way being to create a named pipe under /tmp of the form <table name>.<pid>.
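As a rough sketch of the naming scheme, this is how such a pipe could be created with the standard POSIX module. The helper name make_loader_fifo and the table name my_table are made up for illustration, and starting mysqlimport itself is left out.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);

# Hypothetical helper: create /tmp/<table name>.<pid> as a named pipe
# and return its path.
sub make_loader_fifo {
    my ($table) = @_;
    my $fifo = "/tmp/$table.$$";    # $$ is the current process id
    mkfifo($fifo, 0600) or die "mkfifo $fifo failed: $!";
    return $fifo;
}

my $fifo    = make_loader_fifo('my_table');    # illustrative table name
my $is_fifo = (-p $fifo) ? 1 : 0;              # -p tests for a named pipe
print "created named pipe $fifo\n" if $is_fifo;
unlink($fifo) or die "unlink $fifo failed: $!";
```

Bear in mind that opening a named pipe for writing blocks until something opens it for reading, so in real use the reader has to be started before, or alongside, the open.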

I needed several loaders open simultaneously, so the obvious thing was to abstract the functionality into a class. I still wanted to be able to use normal print statements to output the table rows to the loader, so I wanted the filehandle to be an object as well. Creating objects in perl is done with the bless function, which requires a reference to bless - you can't bless a normal scalar value into an object. Fortunately, in perl 5.6 and later, passing open() an undefined scalar variable leaves you with a reference to a filehandle (strictly, a reference to a new anonymous typeglob) in that variable:

$ perl -d -e 1

Loading DB routines from version 1.25
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):	1
  DB<1> my $a;

  DB<2> open($a, '>', '/dev/null');

  DB<3> x $a
0  GLOB(0x3a40e0)
   -> *main::$a
         FileHandle({*main::$a}) => fileno(3)
  DB<4> bless($a, 'MyClass');

  DB<5> x $a
0  MyClass=GLOB(0x3a40e0)
   -> *main::$a
         FileHandle({*main::$a}) => fileno(3)

Having blessed the filehandle into the appropriate class we can then invoke methods on it (e.g. $a->do_stuff()), as well as print to it directly (e.g. print($a "hello\n");). However, having an object without any associated properties isn't really much use - in my case I needed to be able to store the name of the named pipe I was using to communicate with mysqlimport so that I could remove it when the filehandle was closed. It's possible to use the perl tie and overloading mechanisms to do this, but this is perl, so TMTOWTDI always applies, and there is in fact a simpler although less obvious way.
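To make that concrete, here is a minimal sketch of a blessed filehandle being used both ways. The describe method is invented for the example; MyClass is just the class name from the debugger session above.

```perl
use strict;
use warnings;

package MyClass;

# Hypothetical method: report the file descriptor behind the handle.
sub describe {
    my ($self) = @_;
    return 'fileno ' . fileno($self);
}

package main;

open(my $fh, '>', '/dev/null') or die "open failed: $!";
bless($fh, 'MyClass');

print {$fh} "hello\n";         # still works as an ordinary filehandle
my $desc = $fh->describe();    # and also as an object
print "$desc\n";
close($fh) or die "close failed: $!";
```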

To follow this it's first necessary to understand a little about how perl actually stores values internally. Variables come in different types, the common ones being scalars, hashes and arrays, respectively denoted by the leading $, % and @ characters on variables. For package (as opposed to lexical "my") variables, $a, %a and @a look like they are entirely different variables, but in fact they aren't - they are all slots in a single perl symbol table entry called "a". These symbol table entries are called typeglobs, or globs for short, and are accessible by using the "*" prefix on a variable, so *a refers to the typeglob where $a, %a and @a all live.
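A small sketch shows the slots side by side. The variable name "thing" is arbitrary (I've avoided "a" here only because $a is special to sort), and the *name{SLOT} syntax used to fetch a reference to each slot is described in perlref.

```perl
use strict;
use warnings;

# Three package variables that share the single symbol table entry "thing".
our $thing = 'scalar value';
our @thing = (1, 2, 3);
our %thing = (key => 'hash value');

# *thing{SCALAR}, *thing{ARRAY} and *thing{HASH} each return a reference
# to the corresponding slot of the *thing typeglob.
my $scalar_ref = *thing{SCALAR};
my $array_ref  = *thing{ARRAY};
my $hash_ref   = *thing{HASH};

print "scalar slot: $$scalar_ref\n";
print "array slot has ", scalar(@$array_ref), " elements\n";
print "hash slot: $hash_ref->{key}\n";
```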

The "a" filehandle that we opened in the example above also has a hash slot, so if we want to store additional attributes on "a" we need some way of getting the associated hash slot in its typeglob. This is actually very easy, although the syntax is a little abstruse:

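my $fh;
open($fh, '>', $fifo) or die "Can't open $fifo: $!";
my $self = \%{*$fh};      # reference to the hash slot of the filehandle's glob
$self->{fifo} = $fifo;    # remember the pipe name so it can be removed later
bless($fh, $class);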

Let's pick apart the line that assigns to $self. $fh is actually a reference to the glob that holds the filehandle, so we dereference it with *$ to get the entire glob. The %{...} says we want to access the hash slot of that glob, and the \ gets us a reference to that, so $self ends up being a reference to the hash slot associated with the filehandle. Phew. We can then assign to it as a normal hash reference. When we subsequently call a method on the blessed $fh filehandle, we can use exactly the same chant to get back the hash reference and access the data that we put in it.

This trick is used by the standard perl IO::Socket class to squirrel away socket attributes, but it's often useful to be able to associate properties with a filehandle yourself so I think this particular technique deserves to be more widely known. If you want further information on how all this stuff hangs together, you should check out the perlref manpage.
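Putting the pieces together, a complete class might look something like the sketch below. This is not my actual loader: the class name AttrHandle, its path and finish methods and the demo file name are all invented for the example, and it writes to an ordinary file rather than a named pipe so that it runs without a reader on the other end.

```perl
use strict;
use warnings;

package AttrHandle;    # hypothetical class name

sub new {
    my ($class, $path) = @_;
    open(my $fh, '>', $path) or die "open $path failed: $!";
    my $self = \%{*$fh};       # hash slot of the filehandle's glob
    $self->{path} = $path;     # stash the attribute there
    return bless($fh, $class);
}

# Use the same chant to get the attribute back out.
sub path {
    my ($fh) = @_;
    my $self = \%{*$fh};
    return $self->{path};
}

# Close the handle, then remove the file it was writing to.
sub finish {
    my ($fh) = @_;
    close($fh) or die "close failed: $!";
    return unlink($fh->path);
}

package main;

my $path = "/tmp/attrhandle_demo.$$";    # illustrative file name
my $out  = AttrHandle->new($path);
print {$out} "row1\tvalue1\n";           # plain print still works
my $stored  = $out->path;                # attribute read via a method
my $removed = $out->finish;              # number of files removed
```

The attributes live as long as the glob does, so they are still readable after close(), which is what lets finish look up the path when it cleans up.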
