For the purposes of this chapter, I’m going to label as “dynamic subroutines” anything I don’t explicitly name by typing sub some_name
or that doesn’t exist until runtime. Perl is extremely flexible in letting me figure out the code as I go along, and I can even have code that writes code. I’m going to lump a bunch of different subroutine topics in this chapter just because there’s no good home for them apart from each other.
We first showed anonymous subroutines in Learning Perl when we showed user-defined sorting, although we didn’t tell you that they were anonymous subroutines. In Intermediate Perl we used them to create closures, work with map
and grep
, and a few other things. I’ll pick up where Intermediate Perl left off to show just how powerful they can be. With any of these tricks, not knowing everything ahead of time can be very liberating.
I can store anonymous subroutines in variables. They don’t actually execute until I tell them to. Instead of storing values, I store behavior. This anonymous subroutine adds its first two arguments and returns the result, but it won’t do that until I execute it. I merely define the subroutine and store it in $add_sub
:
my $add_sub = sub { $_[0] + $_[1] };
This way, I can decide what to do simply by choosing the variable that has the behavior that I want. A simple-minded program might do this with a series of if-elsif
tests and branches because it needs to hardcode a branch for each possible subroutine call. Here I create a little calculator to handle basic arithmetic. It takes three arguments on the command line and does the calculation. Each operation gets its own branch of code:
#!/usr/bin/perl # basic-arithmetic.pl use v5.10.1; use strict; REPL: while( 1 ) { my( $operator, @operands ) = get_line(); if( $operator eq '+' ) { add( @operands ) } elsif( $operator eq '-' ) { subtract( @operands ) } elsif( $operator eq '*' ) { multiply( @operands ) } elsif( $operator eq '/' ) { divide( @operands ) } else { print "No such operator [$operator]!\n"; last REPL; } } print "Done, exiting...\n"; sub get_line { # This could be a lot more complicated, but this isn't the point print "\nprompt> "; my $line = <STDIN>; $line =~ s/^\s+|\s+$//g; ( split /\s+/, $line )[1,0,2]; } sub add { say $_[0] + $_[1] } sub subtract { say $_[0] - $_[1] } sub multiply { say $_[0] * $_[1] } sub divide { say $_[1] ? $_[0] / $_[1] : 'NaN' }
Those branches are really just the same thing; they take the two operands, perform a calculation, and print the result. The only thing that differs in each branch is the subroutine name. If I want to add more operations I have to add more nearly identical branches of code. Not only that, I have to add the code to the while
loop, obscuring the intent of the loop. If I decide to do things a bit differently, I have to change every branch. That’s just too much work.
I can turn that on its head so I don’t have a long series of branches to code or maintain. I want to extract the subroutine name from the branches so I can make one block of code that works for all operators. Ideally, the while
loop wouldn’t change and would just deal with the basics of getting the data and sending them to the right subroutine:
while( 1 ) { my( $operator, @operand ) = get_line(); my $some_sub = ....; say $some_sub->( @operands ); }
Now the subroutine is just something stored in the variable $some_sub
, so I have to decide how to get the right anonymous subroutine in there. I could use a dispatch table (a hash that stores the anonymous subroutines), then select the subroutines by their keys. In this case, I use the operator symbol as the key. I can also catch bad input because I know which operators are valid: they are the keys of the hash.
My processing loop stays the same even if I add more operators. I also label the loop REPL
(for Read-Evaluate-Print), and I’ll use that label later when I want to control the looping from one of my subroutines:
#!/usr/bin/perl # basic-arithmetic-dispatch.pl use v5.10.1; use strict; use vars qw( %Operators ); %Operators = ( '+' => sub { $_[0] + $_[1] }, '-' => sub { $_[0] - $_[1] }, '*' => sub { $_[0] * $_[1] }, '/' => sub { $_[1] ? eval { $_[0] / $_[1] } : 'NaN' }, ); REPL: while( 1 ) { my( $operator, @operand ) = get_line(); my $some_sub = $Operators{ $operator }; unless( defined $some_sub ) { say "Unknown operator [$operator]"; last REPL; } say $Operators{ $operator }->( @operand ); } print "Done, exiting...\n"; sub get_line { print "\nprompt> "; my $line = <STDIN>; $line =~ s/^\s+|\s+$//g; ( split /\s+/, $line )[1,0,2]; }
If I want to add more operators, I just add new entries to the hash. I can add completely new operators, such as the %
operator for modulus, or the x
operator as a synonym for the *
multiplication operator:
use vars qw( %Operators ); %Operators = ( '+' => sub { $_[0] + $_[1] }, '-' => sub { $_[0] - $_[1] }, '*' => sub { $_[0] * $_[1] }, '/' => sub { eval { $_[0] / $_[1] } || 'NaN' }, '%' => sub { $_[0] % $_[1] }, ); $Operators{ 'x' } = $Operators{ '*' };
That’s fine and it works, but maybe I have to change my program so that instead of the normal algebraic notation I use Reverse Polish Notation (where the operands come first and the operator comes last). That’s easy to handle because I just change the way I pick the anonymous subroutine. Instead of looking at the middle argument, I look at the last argument. That all happens in my get_line
subroutine. I rearrange that a bit and everything else stays the same:
sub get_line { print "\nprompt> "; my $line = <STDIN>; $line =~ s/^\s+|\s+$//g; my @list = split /\s+/, $line; unshift( @list, pop @list ); @list; }
Now that I’ve done that, I can make a little change to handle more than just binary operators. If I want to handle something that takes more than two arguments, I do the same thing I just did: take the last argument and use it as the operator and pass the rest of the arguments to the subroutine. I don’t really have to change anything other than adding a new operator. I define a "
operator and use the max
function from List::Util
to find the maximum value of all the arguments I pass to it. This is similar to the example we showed in Learning Perl to show that Perl doesn’t care how many arguments I pass to a subroutine:
%Operators = ( # ... same stuff as before '"' => sub { my $max = shift; foreach ( @_ ) { $max = $_ if $_ > $max } $max }, );
I can also handle a single operand because my code doesn’t really care how many there are, and a list of one element is just as good as any other list. Here’s the reason that I actually wrote this program. I often need to convert between number bases, or from unix time to a time I can read:
%Operators = ( # ... same stuff as before 'dh' => sub { sprintf "%x", $_[0] }, 'hd' => sub { sprintf "%d", hex $_[0] }, 't' => sub { scalar localtime( $_[0] ) }, );
Finally, how about an operator that works with 0 arguments? It’s just a degenerate case of what I already have. My previous programs didn’t have a way to stop the program. If I used those programs, I’d have to interrupt the program. Now I can add my q
operator, which really isn’t an operator but a way to stop the program. I cheat a little by using last
to break out of the while
loop.
I could do anything I like, though, including exit
straight away. In this case, I use last
with the loop label I gave to the while
:
%Operators = ( # ... same stuff as before 'q' => sub { last REPL }, );
If I need more operators, I simply add them to the hash with a reference to the subroutine that implements them. I don’t have to add any logic or change the structure of the program. I just have to describe the additional feature (although the description is in code).
In the last section I stored my anonymous subroutines in a variable, but a subroutine is really just another slot in the typeglob (see Chapter 8). I can store subroutines there too. When I assign an anonymous subroutine to a typeglob, Perl figures out to put it in the CODE slot. After that I use the subroutine just as if I had defined it with a name:
print "Foo is defined before\n" if defined( &foo ); *foo = sub { print "Here I am!\n" }; foo(); print "Foo is defined afterward\n" if defined( &foo );
This can be useful if I need to replace some code in another module as I’ll do in Chapter 10. I don’t want to edit the other module. I’ll leave it as it is and replace the single definition I need to change. Since subroutines live in the symbol table, I can just use the full package specification to replace a subroutine:
#!/usr/bin/perl use v5.10.1; package Some::Module; sub bar { say "I'm in " . __PACKAGE__ } package main; Some::Module::bar(); *Some::Module::bar = sub { say "Now I'm in " . __PACKAGE__ }; Some::Module::bar();
If I run this under warnings, Perl catches my suspicious activity and complains because I really shouldn’t be doing this without a good reason:
% perl -w replace_sub.pl
I'm in Some::Module
Subroutine Some::Module::bar redefined at replace_sub.pl line 11.
Now I'm in main
I change the code a bit to get around that warning. Instead of turning off all warnings, I isolate that bit of code with a naked block and turn off any warnings in the redefine
class:
{ no warnings 'redefine'; *Some::Module::bar = sub { say "Now I'm in " . __PACKAGE__ }; }
Although I did this with an existing subroutine definition, I can do it without a previous declaration too. With a little modification my main
package defines the new subroutine quux
in Some::Module
:
package Some::Module; # has no subroutines package main; { no warnings 'redefine'; *Some::Module::quux = sub { say "Now I'm in " . __PACKAGE__ }; } Some::Module::quux();
Recognize anything familiar? If I change it around it might look a bit more like something you’ve seen before as a trick to import symbols into another namespace. You’ve probably been doing this same thing for quite a while without even knowing about it:
package Some::Module; sub import { *main::quux = sub { say "I came from " . __PACKAGE__ }; } package main; Some::Module->import(); quux();
This is the same thing that the Exporter
module does to take definitions in one package and put them into another. It’s only slightly more complicated than this because Exporter
figures out who’s calling it and does some work to look in @EXPORT
and @EXPORT_OK
. Other than that, it’s a bunch of monkey programming around an assignment to a typeglob:
In the previous section, I replaced the definition of a valid subroutine name with an anonymous subroutine. I fiddled with the symbol table to make things happen. Now, I’m going to move from fiddling to abuse.
A symbolic reference, or reference to the symbol table, uses a string to choose the name of the variable and what looks like a dereference to access it:
my $name = 'foo'; my $value_in_foo = ${ $name }; # $foo
This normally isn’t a good idea, so much so that strict
prohibits it. Adding use strict
to my example, I get a fatal error:
use strict; my $name = 'foo'; my $value_in_foo = ${ $name }; # $foo
It’s the refs
portion of strict
that causes the problem:
Can't use string ("foo") as a SCALAR ref while "strict refs" in use at program.pl line 3.
I can get around that by turning off the refs
portion temporarily:
use strict; { no strict 'refs'; my $name = 'foo'; my $value_in_foo = ${ $name }; # $foo }
I could also just not turn on the refs
portion of strict
, but it’s better to turn it off only when I need it and let Perl catch unintended uses:
use strict qw(subs vars); # no 'refs'
For dynamic subroutine tricks, I want to store the subroutine name in a variable, then turn it into a subroutine.
First, I put the name foo
into the scalar $good_name
. I then dereference it as a typeglob reference so I can assign my anonymous subroutine to it. Since $good_name
isn’t a reference, Perl uses its value as a symbolic reference. The value becomes the name of the typeglob Perl should look at and affect. When I assign my anonymous subroutine to *{ $good_name }
, I’m creating an entry in the symbol table for the current package for a subroutine named &foo
. It also works with the full package specification so I can create &Some::Module::foo
too.
#!/usr/bin/perl use strict; { no strict 'refs'; my $good_name = 'foo'; *{ $good_name } = sub { say 'Hi, how are you?' }; my $remote_name = 'Some::Module::foo'; *{ $remote_name } = sub { say 'Hi, are you from Mars?' }; } foo(); # no problem Some::Module::foo(); # no problem
I can be even more abusive, though, and this is something that I shouldn’t ever do, at least not in any code that does something useful or important.
By putting an illegal name in a variable I can get around Perl’s identifier rules. Normally, I have to start a variable name with a letter or an underscore and follow it with letters, underscores, or digits. Now I get around all that to create the subroutine with the name <=>
by using a symbolic reference:
{ no strict 'refs'; my $evil_name = '<=>'; *{ $evil_name } = sub { print "How did you ever call me?\n" }; # <=>() yeah, that's not gonna happen *{ $evil_name }{CODE}->(); &{$evil_name}(); # Another way ;-) }
I don’t need the variable though:
*{ '<=>' } = sub { print "How did you ever call me?\n" }; &{ '<=>' }();
I still can’t use my illegal subroutine in the normal way, so I have to look in its typeglob or use another symbolic reference.
In my Data::Constraint
module, I needed to provide a way to validate a value in such a way that the user could build up complex requirements easily and without writing code. The validation would be a matter of configuration, not programming.
Instead of applying a validation routine to a set of values, I turned it around to apply a list of subroutines to a value. Each particular value would have its own combination of validation routines, and I’d validate each value separately (although probably still in some sort of loop). Each subroutine is a constraint on the value.
I start by defining some subroutines to check a value. I don’t know ahead of time what the values will represent or which constraints the user will place on it. I’ll make some general subroutines that the programmer can combine in any way she likes. Each subroutine returns true or false:
my %Constraints = ( is_defined => sub { defined $_[0] }, not_empty => sub { length $_[0] > 0 }, is_long => sub { length $_[0] > 8 }, has_whitespace => sub { $_[0] =~ m/\s/ }, no_whitespace => sub { $_[0] !~ m/\s/ }, has_digit => sub { $_[0] =~ m/\d/ }, only_digits => sub { $_[0] !~ m/\D/ }, has_special => sub { $_[0] =~ m/[^a-z0-9]/ }, );
The %Constraints
hash now serves as a library of validation routines that I can use. Once defined, I figure out how I want to use them.
For example, I want to write a password checker that looks for at least eight characters, no whitespace, at least one digit, and at least one special character. Since I’ve stored the subroutines in a hash, I just pull out the ones I need and pass the candidate password to each one:
chomp( my $password = <STDIN> ); my $fails = grep { ! $Constraints{ $_ }->( $password ) } qw( is_long no_whitespace has_digit has_special );
I use grep
in scalar context so it returns the number of items for which its block returns true. Since I really want the number of items that return false, I negate the return value of the subroutine call to make false turn into true, and vice versa. If $fails
is anything but zero, I know that something didn’t pass.
The benefit comes when I want to apply this to many different values, each of which might have their own constraints. The technique is the same, but I have to generalize it a bit more:
my $fails = grep { ! $Constraints{ $_ }->( $input{$key} ) } @constraint_names;
From there parameter checking is simply configuration:
password is_long no_whitespace has_digit has_special employee_id not_empty only_digits last_name not_empty
I specify that configuration however I like and load it into my program. It is especially useful for nonprogrammers who need to change the behavior of the application. They don’t need to touch any code. If I store that in a file, I read in the lines and build a data structure to hold the names and the constraints that go with them. Once I have that set up, I access everything in the right way to do the same thing I did in the previous example:
while( <CONFIG> ) { chomp; my( $key, @constraints ) = split; $Config{$key} = \@constraints; } my %input = get_input(); # pretend that does something foreach my $key ( keys %input ) { my $failed = grep { ! $Constraints{ $_ }->( $input{$key} ) } @{ $Config{$key} }; push @failed, $key if $failed; } print "These values failed: @failed\n";
My code to check them is small and constant no matter how many input parameters I have or the particular requirements for each of them.
This is the basic idea behind Data::Constraint
although it does more work to set up the situation and return a list of the constraints the value did not meet. I could change this up a little to return a list of the constraints that failed:
my @failed = grep { $Constraints{ $_ }->( $value ) ? () : $_ } @constraint_names;
Much in the same way that I went through a list of constraints in the previous example, I might want to build a processing pipeline. I do the same thing: decide which subroutines to include, then iterate through that list, applying in turn each subroutine to the value.
I can normalize a value by deciding which transformations I should perform. I store all of the transformations as subroutines in %Transformations
, then list the ones I want to use in @process
. After that, I read in lines on input and apply each subroutine to the line:
#!/usr/bin/perl # sub-pipeline.pl my %Transformations = ( lowercase => sub { $_[0] = lc $_[0] }, uppercase => sub { $_[0] = uc $_[0] }, trim => sub { $_[0] =~ s/^\s+|\s+$//g }, collapse_whitespace => sub { $_[0] =~ s/\s+/ /g }, remove_specials => sub { $_[0] =~ s/[^a-z0-9\s]//ig }, ); my @process = qw( remove_specials lowercase collapse_whitespace trim ); while( <STDIN> ) { foreach my $step ( @process ) { $Transformations{ $step }->( $_ ); print "Processed value is now [$_]\n"; } }
I might even combine this sort of thing with the constraint checking I did in the previous section. I’ll clean up the value before I check its validity. The input and processing code is very short and should stay that way. The complexity is outside of the flow of the data.
Perl v5.16 added the __SUB__
token so I can reference the current subroutine without knowing it’s name or storing a reference to it. A subroutine can call itself without knowing its name, if it even has one. Under the feature
pragma, this is known as current_sub
.
use v5.16; use feature qw(current_sub);
The perlsub example implements a factorial subroutine that gets its answer by recursion:
use v5.16; my $factorial = sub { my( $n ) = @_; return 1 if $n == 1; return( $n * __SUB__->( $n - 1 ) ); };
I don’t have to use __SUB__
as part of an immediate dereference. I can store the code reference, although I don’t know why I’d want to do that since I can just use __SUB__
when I need it:
my $current_sub = __SUB__;
I can reference the anonymous subroutine without __SUB__
if I declare the variable before I assign to it:
my $factorial; $factorial = sub { my( $n ) = @_; return 1 if $n == 1; return( $n * $factorial->( $n - 1 ) ); };
This section isn’t really like the previous two, but I always think of it when I talk about these techniques. As we told you in Intermediate Perl, I can use a scalar variable in the place of a method name as long as the value is a simple scalar (so, no references or other oddities). This works just fine as long as the object can respond to the foo
method:
my $method_name = 'foo'; $object->$method_name();
If I want to run a chain of methods on an object, I can just go through the list of method names like I did for the anonymous subroutines. It’s not really the same thing to Perl, but for the programmer it’s the same sort of thinking. I go through the method names using map
to get all of the values that I want:
my $isbn = Business::ISBN->new( '0596101058' ); my( $country, $publisher, $item ) = map { $isbn->$_ } qw( group_code publisher_code article_code );
I don’t have parallel code where I have to type the same thing many times. Again, the code to extract the values I need is very short and the complexity of choosing and listing the methods I need happens away from the important parts of the code flow.
Because subroutine references are scalars, I can pass them as arguments to other subroutines:
my $nameless_sub = sub { ... }; foo( $nameless_sub );
But I don’t want to pass these things as scalars; I want to do the fancy things that sort
, map
, and grep
do by using inline blocks:
my @odd_numbers = grep { $_ % 2 } 0 .. 100; my @squares = map { $_ * $_ } 0 .. 100; my @sorted = sort { $a <=> $b } qw( 1 5 2 0 4 7 );
To work this little bit of magic, I need to use Perl’s subroutine prototypes. Someone may have told you that prototypes are as useless as they are evil, but in this case I need them to tell Perl that the naked block of code represents a subroutine.
As an example, I want to write something that reduces a list to a single value according to the block of code that I give it. Graham Barr does this in List::Util
with the reduce
function, which takes a list and turns it into a single value according to the subroutine I give it. This snippet turns a list of numbers into its sum:
use List::Util; my $sum = reduce { $a + $b } @list;
The reduce
function is a well-known method to process a list and you’ll see it in many other languages. To seed the operation, it takes the first two arguments off of the list and computes the result according to the inline subroutine. After that, it takes the result and the next element of the list and repeats the computation, doing that until it has gone through all of the elements of the list.
As with map
, grep
, and sort
, I don’t put a comma after the inline subroutine argument to reduce
. To get this to work, though, I need to use Perl’s subroutine prototypes to tell the subroutine to expect an inline subroutine.
The List::Util
module implements its functions in XS to make them really speedy, but in case I can’t load the XS stuff for some reason, Graham has a pure Perl backup:
package List::Util; sub reduce (&@) { my $code = shift; no strict 'refs'; return shift unless @_ > 1; use vars qw($a $b); my $caller = caller; local(*{$caller."::a"}) = \my $a; local(*{$caller."::b"}) = \my $b; $a = shift; foreach (@_) { $b = $_; $a = &{$code}(); } $a; }
In his prototype, Graham specifies (&@)
. The &
tells Perl that the first argument is a subroutine and the @
says the rest is a list. The perlsub documentation has the list of prototype symbols and their meanings, but this is all I need here.
The rest of reduce
works like sort
by putting two elements into the package variables $a
and $b
. Graham defines the lexical variables with those names, and immediately assigns to the typeglobs for $a
and $b
in the calling package by using symbolic references. After that the values of $a
and $b
are the lexical versions. When he calls the subroutine argument &{$code}()
, that code looks at its package variables, which are the ones in effect when I wrote the subroutine. Got that? Inside reduce
, I’m using the lexical versions, but inside $code
, I’m using the package versions from the calling package. That’s why Graham made them aliases of each other.
I can get rid of the $a
and $b
global variables too. To do that, I can use @_
instead:
my $count = reduce { $_[0] + $_[1] } @list;
Since @_
is one of Perl’s special variables that always live in the main::
package, I don’t have to worry about the calling package. I also don’t have to worry about putting the list elements in variables. I can play with @_
directly. I call the anonymous subroutine with the first two elements in @_
and put the result back into @_
. I keep doing that until @_
has only one element, which I finally return:
sub reduce(&@) { my $sub = shift; while( @_ > 1 ) { unshift @_, $sub->( shift, shift ); } return $_[0]; }
So far this has only worked with flat lists. What if I wanted to do a similar thing with a complex data structure? In my Object::Iterate
module, I created versions of map
and grep
that I can use with arbitrary data structures in objects. I call my versions imap
and igrep
:
use Object:Iterate; my @filtered = igrep {...} $object; my @transformed = imap {...} $object;
I use the same prototype magic I used before, although this time the second argument is a scalar because I’m working with an object instead of a list. I use the prototype, (&$)
:
sub igrep (&$) { my $sub = shift; my $object = shift; $object->_check_object; my @output = (); while( $object->__more__ ) { local $_ = $object->__next__; push @output, $_ if $sub->(); } $object->__final__ if $object->can( __final__ ); wantarray ? @output : scalar @output; } sub _check_object { croak( "iterate object has no __next__ method" ) unless eval { $_[0]->can( '__next__' ) }; croak( "iterate object has no __more__ method" ) unless eval { $_[0]->can( '__more__' ) }; $_[0]->__init__ if eval { $_[0]->isa( '__init__' ) }; return 1; }
In igrep
, I put the inline subroutine argument into $sub
and the object argument into $object
. Object::Iterate
works by relying on the object to provide methods to get the next elements for the iteration. I ensure that the object can respond to those methods by calling _check_object
, which returns true if the object has the right methods.
The __more__
method lets igrep
know if there are any more elements to process. If there are more elements to process, igrep
uses the __next__
method to get the next element from the object. No matter what I’ve done to store the data in my object, igrep
doesn’t worry about it because it makes the object figure it out.
Once I have an element, I assign it to $_
, just like the normal versions of map
and grep
do. Inside my inline, I use $_
as the current element.
Here’s a short example using my Netscape::Bookmarks
module. I want to walk through its tree of categories and links to check all of the links. Once I get my $bookmarks
object, I use it with igrep
. Inside the inline subroutine, I use the check_link
function from my HTTP::SimpleLinkChecker
module to get the HTTP status of the link. If it’s 200
, the link is okay, but since I want the bad links, I igrep
for the ones that aren’t 200
. Finally, I print the number of bad links along with the list of links:
#!/usr/bin/perl # bookmark-checker.pl use HTTP::SimpleLinkChecker qw(check_link); use Netscape::Bookmarks; use Object::Iterate qw(igrep); my $bookmarks = Netscape::Bookmarks->new( $ARGV[0] ); die "Did not get Bookmarks object!" unless ref $bookmarks; my @bad_links = igrep { 200 != check_link($_); } $bookmarks; { local $/ = "\n\t"; print "There are " . @bad_links . " bad links$/@bad_links\n"; }
The magic happens later in the program where I defined the special methods to work with Object::Iterate
. I create a scope where I can define some methods in Netscape::Bookmarks::Category
and provide a scope for the lexical variable @links
. My __more__
method simply returns the number of elements in @links
, and __next__
returns the first element in @links
. I could have been more fancy to have __next__
walk through the data structure instead of using __init__
to get them all at once, but that would take a lot more room on the page. No matter what I decide to do, I just have to follow the interface for Object::Iterate
:
{ package Netscape::Bookmarks::Category; my @links = (); sub __more__ { scalar @links } sub __next__ { shift @links } sub __init__ { my $self = shift; my @categories = ( $self ); while( my $category = shift @categories ) { push @categories, $category->categories; push @links, map { $_->href } $category->links; } print "There are " . @links . " links\n"; } }
When Perl can’t find a method on a module or anywhere in its inheritance tree, it goes back to the original class and looks for the special subroutine AUTOLOAD. As a catch-all; Perl sets the package variable $AUTOLOAD
to the name of the method for which it was looking and passes AUTOLOAD the same parameter list. After that it’s up to me what I want to do.
To define a method based on AUTOLOAD
, I first have to figure out what the method name should be. Perl puts the full package specification in $AUTOLOAD
, and I usually only need the last part, which I can extract with a regular expression:
if( $AUTOLOAD =~ m/::(\w+)$/ ) { # stuff with $1 }
In some code, you’ll also see this as a substitution that discards everything but the method name. This has the disadvantage of destroying the original value of $AUTOLOAD
, which I might want later:
$AUTOLOAD =~ s/.*:://; # destructive, not preferred
Once I have the method name, I can do anything I like. Since I can assign to typeglobs to define a named subroutine (as I promised in Chapter 8), I might as well do that. I use $AUTOLOAD
, which has its original with the full package specification still, as a symbolic reference. Since $AUTOLOAD
is not a reference, Perl interprets its typeglob dereference to mean that it should define the variable with that name, access the typeglob, and make the assignment:
*{$AUTOLOAD} = sub { ... };
If $AUTOLOAD
is Foo::bar
, this turns into:
*{'Foo::bar'} = sub { ... };
That one line sets the right package, defines the subroutine name without defining the code that goes with it, and finally assigns the anonymous subroutine. If I were to code that myself ahead of time, my code would look like this:
{ package Foo; sub bar; *bar = sub { ... } }
Once I’ve defined the subroutine, I want to run it with the original arguments I tried to pass to the method name. However, I want to make it look as if AUTOLOAD
had nothing to do with it, and I don’t want AUTOLOAD
to be in the call stack. This is one of the few places where I should use a goto
. This replaces AUTOLOAD
in the subroutine stack and runs the new subroutine I’ve just defined. By using an ampersand in front of the name and nothing on the other side, Perl uses the current @_
for the argument list of my subroutine call:
goto &{$AUTOLOAD};
In Chapter 14 of Intermediate Perl we use AUTOLOAD
to define subroutines on the fly. We look in the $AUTOLOAD
variable. If the method name is the same as something in @elements
, we create an anonymous subroutine to return the value for the hash element with that key. We assign that anonymous subroutine to the typeglob with that name. That’s a symbolic reference so we wrap a naked block around it to limit the scope of our no strict 'refs'
. Finally, once we’ve made the typeglob assignment we use goto
to redispatch the method call to the subroutine we just defined. In effect, it’s as if the subroutine definition was always there and the next time I call that method Perl doesn’t have to look for it:
XXX: Check with latest Alpaca
sub AUTOLOAD { my @elements = qw(color age weight height); our $AUTOLOAD; if ($AUTOLOAD =~ /::(\w+)$/ and grep $1 eq $_, @elements) { my $field = ucfirst $1; { no strict 'refs'; *{$AUTOLOAD} = sub { $_[0]->{$field} }; } goto &{$AUTOLOAD}; } if ($AUTOLOAD =~ /::set_(\w+)$/ and grep $1 eq $_, @elements) { my $field = ucfirst $1; { no strict 'refs'; *{$AUTOLOAD} = sub { $_[0]->{$field} = $_[1] }; } goto &{$AUTOLOAD}; } die "$_[0] does not understand $method\n"; }
One of my favorite uses of AUTOLOAD
comes from the Hash::AsObject
module by Paul Hoffman. He does some fancy magic in his AUTOLOAD
routine so I access a hash’s values with its keys, as I normally would, or as an object with methods named for the keys:
use Hash::AsObject; my $hash = Hash::AsObject->new; $hash->{foo} = 42; # normal access to a hash reference print $hash->foo, "\n"; # as an object; $hash->bar( 137 ), # set a value;
It can even handle multilevel hashes:
$hash->{baz}{quux} = 149; $hash->baz->quux;
The trick is that $hash
is really just a normal hash reference that’s blessed into a package. When I call a method on that blessed reference, it doesn’t exist so Perl ends up in Hash::AsObject::AUTOLOAD
. Since it’s a pretty involved bit of code to handle lots of special cases, I won’t show it here, but it does basically the same thing I did in the previous section by defining subroutines on the fly.
Autosplitting is another variation on the AUTOLOAD
technique but I haven’t seen it used as much as it used to be. Instead of defining subroutines dynamically, AutoSplit
takes a module and parses its subroutine definitions and stores each subroutine in its own file. It loads a subroutine’s file only when I call that subroutine. In a complicated API with hundreds of subroutines I don’t have to make Perl compile every subroutine when I might just want to use a couple of them. Once I load the subroutine, Perl does not have to compile it again in the same program. Basically, I defer compilation until I need it.
To use AutoSplit
, I place my subroutine definitions after the __END__
token so Perl does not parse or compile them. I tell AutoSplit
to take those definitions and separate them into files:
% perl -e 'use AutoSplit; autosplit( "MyModule.pm", "auto_dir", 0, 1, 1 );
I usually don’t need to split a file myself, though, since ExtUtils::MakeMaker
takes care out that for me in the build process. After the module is split, I’ll find the results in one of the auto
directories in the Perl library path. Each of the .al
files holds a single subroutine definition:
ls ./site_perl/5.8.4/auto/Text/CSV _bite.al combine.al fields.al parse.al string.al autosplit.ix error_input.al new.al status.al version.al
To load the method definitions when I need them, I use the AUTOLOAD
method provided by AutoLoader
and typically use it as a typeglob assignment. It knows how to find the right file, load it, parse and compile it, then define the subroutine:
use AutoLoader; *AUTOLOAD = \&AutoLoader::AUTOLOAD;
You may have already run into AutoSplit
at work. If you’ve ever seen an error message like this, you’ve witnessed AutoLoader
looking for the missing method in a file. It doesn’t find the file, so it reports that it can’t locate the file. The Text::CSV
module uses AutoLoader
, so when I load the module and call an undefined method on the object, I get the error:
% perl -MText::CSV -e '$q = Text::CSV-
new; $q->foobar'>
Can't locate auto/Text/CSV/foobar.al in @INC ( ... ).
This sort of error almost always means that I’m using a method name that isn’t part of the interface.
I can use subroutine references to represent behavior as data, and I can use the references like any other scalar.
The documentation for prototypes is in the perlsub documentation.
Mark Jason Dominus also used the function names imap
and igrep
to do the same thing I did, although his discussion of iterators in Higher-Order Perl is much more extensive. See http://hop.perl.plover.com/. I talk about my version in “The Iterator Design Pattern” in The Perl Review 0.5 (September 2002), which you can get for free online, http://www.theperlreview.com/Issues/The_Perl_Review_0_5.pdf. Mark Jason’s book covers functional programming in Perl by composing new functions out of existing ones, so it’s entirely devoted to fancy subroutine magic.
Randy Ray writes about autosplitting modules in The Perl Journal #6. For the longest time it seemed that this was my favorite article on Perl and the one that I’ve read the most times.
Nathan Torkington’s “CryptoContext” appears in The Perl Journal #9, as well as in the TPJ compilation The Best of The Perl Journal: Computer Science & Perl Programming.