Although I don’t normally deal with typeglobs or the symbol table, I need to understand them for the tricks I’ll use in later chapters. I’ll lay the foundation for advanced topics including dynamic subroutines and jury-rigging code in this chapter.
Symbol tables organize and store Perl’s package (global) variables, and I can affect the symbol table through typeglobs. By messing with Perl’s variable bookkeeping I can do some powerful things. You’re probably already getting the benefit of some of these tricks without even knowing it.
Before I get too far, I want to review the differences between package and lexical variables. The symbol table tracks package variables but not lexical variables. When I fiddle with the symbol table or typeglobs, I’m dealing with package variables. Package variables are also known as global variables since they are visible everywhere in the program.
In Learning Perl and Intermediate Perl, we used lexical variables whenever possible. We declared lexical variables with my, and those variables could only be seen inside their scope. Since lexical variables have limited reach, I didn’t need to know all of the program to avoid a variable name collision. Lexical variables are a bit faster too since Perl doesn’t have to deal with the extra bookkeeping of the symbol table.
Lexical variables have a limited scope, and they only affect that part of the program. This little snippet declares the variable name $n twice in different scopes, creating two different variables that do not interfere with each other:
my $n = 10; # outer scope
my $square = square( 15 );
print "n is $n, square is $square\n";
sub square { my $n = shift; $n ** 2; }
This double use of $n is not a problem. The declaration inside the subroutine is a different scope and gets its own version that masks the other version. At the end of the subroutine, its version of $n disappears as if it never existed. The outer $n is still 10.
Package variables are a different story. Doing the same thing with package variables stomps on the previous definition of $n:
$n = 10;
my $square = square( 15 );
print "n is $n, square is $square\n";
sub square { $n = shift; $n ** 2; }
Perl has a way to deal with the double use of package variables, though. The local built-in temporarily moves the current value, 10, out of the way until the end of the scope, and the entire program sees the new value, 15, until the scope of local ends:
$n = 10;
my $square = square( 15 );
print "n is $n, square is $square\n";
sub square { local $n = shift; $n ** 2; }
We showed the difference in Intermediate Perl. The local version changes the value for the entire program, including code outside of its lexical scope, until its scope ends, while the lexical version only works inside its scope. Here’s a small program that demonstrates it both ways. I define the package variable $global, and I want to see what happens when I use the same variable name in different ways. To watch what happens, I use the show_me subroutine to tell me what it thinks the value of $global is. I’ll call show_me before I start, then subroutines that do different things with $global. Remember that show_me is outside of the lexical scope of any other subroutine:
#!/usr/bin/perl
# not strict clean, yet, but just wait
$global = qq(I'm the global version);
show_me('At start');
lexical();
localized();
show_me('At end');
sub show_me {
    my $tag = shift;
    print "$tag: $global\n"
}
The lexical subroutine starts by defining a lexical variable also named $global. Within the subroutine, the value of $global is obviously the one I set. However, when it calls show_me, the code jumps out of the subroutine. Outside of the subroutine, the lexical variable has no effect. In the output, the line I tagged with From lexical() shows I'm the global version.
sub lexical {
    my $global = "I'm in the lexical version";
    print "In lexical(), \$global is --> $global\n";
    show_me('From lexical()');
}
Using local is completely different since it deals with the package version of the variable. When I localize a variable name, Perl sets aside its current value for the rest of the scope. The new value I assign to the variable is visible throughout the entire program until the end of the scope. When I call show_me, even though I jump out of the subroutine, the new value for $global that I set in the subroutine is still visible.
sub localized {
    local $global = "I'm in the localized version";
    print "In localized(), \$global is --> $global\n";
    show_me('From localized');
}
The output shows the difference. The value of $global starts off with its original version. In lexical(), I give it a new value but show_me can’t see it; show_me still sees the global version. In localized(), the new value sticks even in show_me. However, after I’ve called localized(), $global comes back to its original value.
At start: I'm the global version
In lexical(), $global is --> I'm in the lexical version
From lexical(): I'm the global version
In localized(), $global is --> I'm in the localized version
From localized: I'm in the localized version
At end: I'm the global version
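The same dynamic scoping shows up when calls nest or when a block exits early. Here is a minimal sketch of that, assuming a package variable $level and two subroutines, report and descend, that I made up for the illustration. Each call to descend localizes $level, and Perl undoes every local on the way back out, even when I leave a block through die:

```perl
#!/usr/bin/perl
# local_depth.pl - a sketch; $level, report, and descend are invented names
use strict;
use warnings;

our $level = 0;    # a package variable, so local can work on it

sub report { "level is $level" }    # sees whatever value is current

sub descend {
    local $level = $level + 1;      # the old value returns when this call ends
    return $level >= 3 ? report() : descend();
}

print descend(), "\n";    # report() runs three calls deep
print report(), "\n";     # every local has been undone

eval { local $level = 99; die "bailing out\n" };
print report(), "\n";     # restored even though the block died
```

The first line of output should report level 3, and the other two should report level 0, since nothing outside the localizing scopes ever sees a permanent change.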
Hold that thought for a moment, because I’ll use it again after I introduce typeglobs.
No matter which part of my program I am in, or which package I am in, I can always get to the package variables as long as I preface the variable name with the full package name. Going back to my lexical(), I can see the package version of the variable even when that name is masked by a lexical variable of the same name. I just have to add the full package name to it, $main::global.
sub lexical {
    my $global = "I'm in the lexical version";
    print "In lexical(), \$global is --> $global\n";
    print "The package version is still --> $main::global\n";
    show_me('From lexical()');
}
The output shows that I have access to both:
In lexical(), $global is --> I'm in the lexical version
The package version is still --> I'm the global version
That’s not the only thing I can do, however. If, for some odd reason, I have a package variable with the same name as a lexical variable that’s currently in scope, I can use our (introduced in Perl 5.6) to tell Perl to use the package variable for the rest of the scope:
sub lexical {
    my $global = "I'm in the lexical version";
    our $global;
    print "In lexical with our, \$global is --> $global\n";
    show_me('In lexical()');
}
Now the output shows that I don’t ever get to see the lexical version of the variable:
In lexical with our, $global is --> I'm the global version
It seems pretty silly to use our that way since it masks the lexical version for the rest of the subroutine. If I only need the package version for part of the subroutine, I can create a scope just for it so I can use it for that part and let the lexical version take the rest:
sub lexical {
    my $global = "I'm in the lexical version";
    {
        our $global;
        print "In the naked block, our \$global is --> $global\n";
    }
    print "In lexical, my \$global is --> $global\n";
    print "The package version is still --> $main::global\n";
    show_me('In lexical()');
}
Now the output shows all of the possible ways I can use $global:
In the naked block, our $global is --> I'm the global version
In lexical, my $global is --> I'm in the lexical version
The package version is still --> I'm the global version
Each package has a special hash-like data structure called the symbol table, which comprises all of the stashes for that package. A stash is a hash that has all the variables defined in a package. It’s not a Perl hash like we showed in Learning Perl, but it looks and acts like it in some ways, and its name is the package name with two colons on the end, such as %main::.
This isn’t a normal Perl hash, but I can look in it with the keys operator. Want to see all of the symbol names defined in the main package? I simply print all the keys for this special hash:
#!/usr/bin/perl
# show_main_vars.pl
foreach my $entry ( keys %main:: ) {
    print "$entry\n";
}
I won’t show the output here because it’s rather long, but when I look at it, I have to remember that those are the variable names without sigils. When I see the identifier _, I have to remember that it has references to the variables $_, @_, and so on. Here are some special variable names that Perl programmers will recognize once they put a sigil in front of them:
/ " ARGV INC ENV $ - 0 @
If I look in another package I don’t see anything because I haven’t defined any variables yet:
#!/usr/bin/perl
# show_empty_foo_vars.pl
foreach my $entry ( keys %Foo:: ) {
    print "$entry\n";
}
If I define some variables in package Foo, I’ll then be able to see some output:
#!/usr/bin/perl
# show_foo_vars.pl
package Foo;
@n = 1 .. 5;
$string = "Hello Perl!\n";
%dict = ( 1 => 'one' );
sub add { $_[0] + $_[1] }
foreach my $entry ( keys %Foo:: ) {
    print "$entry\n";
}
The output shows a list of the identifier names without any sigils attached. The symbol table stores the identifier names:
n
add
string
dict
The %main:: symbol table also contains all of the other symbol tables, so I can also write the same program with main:: in front of Foo:::
foreach my $entry ( keys %main::Foo:: ) {
    print "$entry\n";
}
That’s just a bonus fact for you. It’s probably not useful.
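If I ever did want to use that fact, one thing it allows is finding the packages nested inside another one, since their stashes show up as keys that end in two colons. This is only a sketch: child_packages is a helper name I made up, and Foo and Foo::Bar exist here just to give it something to find:

```perl
#!/usr/bin/perl
# child_packages.pl - a sketch; child_packages is a hypothetical helper
use strict;
use warnings;

package Foo;      our $n = 1;
package Foo::Bar; our $m = 2;
package main;

# List the packages whose stashes live directly inside $package's stash
sub child_packages {
    my( $package ) = @_;
    no strict 'refs';    # the stash name is built from a string
    my @children;
    foreach my $key ( keys %{"${package}::"} ) {
        next unless $key =~ /::\z/;          # nested stashes end in ::
        ( my $child = $key ) =~ s/::\z//;
        push @children, $child;
    }
    return sort @children;
}

print "Inside Foo: @{[ child_packages( 'Foo' ) ]}\n";
print "Inside main: @{[ child_packages( 'main' ) ]}\n";
```

The list for main is long and depends on which modules are loaded, but it should include Foo along with main itself.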
I can use the other hash operators on these stashes too. I can delete all of the variables with the same name. In the next program, I define the variables $n and $m then assign values to them. I call show_foo to list the variable names in the Foo package, which I use because it doesn’t have all of the special symbols that the main package does:
#!/usr/bin/perl
# show_foo.pl
package Foo;
our $n = 10;
our $m = 20;
show_foo( "After assignment" );
delete $Foo::{'n'};
delete $Foo::{'m'};
show_foo( "After delete" );
sub show_foo {
    print "-" x 10, $_[0], "-" x 10, "\n";
    print "\$n is $n\n\$m is $m\n";
    foreach my $name ( keys %Foo:: ) {
        print "$name\n";
    }
}
The output shows me that the symbol table for Foo:: has entries for the names n and m, as well as for show_foo. Those are all of the variable names I defined: two scalars and one subroutine. After I use delete, the entries for n and m are gone:
----------After assignment----------
$n is 10
$m is 20
show_foo
n
m
----------After delete----------
$n is 10
$m is 20
show_foo
The data are still there though. The compiler had already resolved the names to their data locations. The subroutine still references those data, so it can still use them even if their names disappear.
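To see the difference between a name the compiler already resolved and a name that Perl looks up while the program runs, I can compare the two after the delete. This is a sketch with two made-up subroutines, compiled_n and runtime_lookup. The second one builds the variable name from a string at run time, so it has to go through the stash again:

```perl
#!/usr/bin/perl
# delete_lookup.pl - a sketch; compiled_n and runtime_lookup are invented names
use strict;
use warnings;

package Foo;
our $n = 10;

sub compiled_n { $n }        # the name was resolved when Perl compiled this

sub runtime_lookup {
    my( $name ) = @_;
    no strict 'refs';
    return ${"Foo::$name"};  # the name is looked up each time this runs
}

package main;

print "Before delete: ", Foo::runtime_lookup( 'n' ), "\n";

delete $Foo::{n};

my $found = Foo::runtime_lookup( 'n' );
print "Compiled access still sees: ", Foo::compiled_n(), "\n";
print "Run time lookup sees: ", ( defined $found ? $found : 'undef' ), "\n";
```

After the delete, the compiled subroutine should still return 10, while the run time lookup finds only a fresh, empty entry and returns undef.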
By default, Perl variables are global variables, meaning that I can access them from anywhere in the program as long as I know their names. Perl keeps track of them in the symbol table, which is available to the entire program. Each package has a list of defined identifiers just like I showed in the previous section. Each identifier has a pointer (although not in the C sense) to a slot for each variable type. There are also two bonus slots for the variables NAME and PACKAGE, which I’ll use in a moment. Figure 1 shows the relationship between the package, identifier, and type of variable.
Package    Identifier           Type      Thingy
                         +------> SCALAR  - $bar
                         |
                         +------> ARRAY   - @bar
                         |
                         +------> HASH    - %bar
                         |
Foo:: -----> bar --------+------> CODE    - &bar
                         |
                         +------> IO      - file and dir handle
                         |
                         +------> GLOB    - *bar
                         |
                         +------> FORMAT  - format names
                         |
                         +------> NAME
                         |
                         +------> PACKAGE
There are seven variable types. The three common ones are the SCALAR, ARRAY, and HASH, but Perl also has CODE for subroutines (Chapter 9 covers subroutines as data), IO for file and directory handles, and GLOB for the whole thing. Once I have the glob I can get a reference to a particular variable of that name by accessing the right entry. To access the scalar portion of the *bar typeglob, I access that part almost like a hash access. These return references to data for those slots:
#!/usr/bin/perl
# make_aliases.pl
use v5.10;
$bar = 'Buster';
@bar = qw(Mimi Roscoe);
# these return references or undef
$foo = *bar{SCALAR};
$baz = *bar{ARRAY};
say "\$foo is $$foo";
say "\@baz is @$baz";
If I try to use a slot for a variable I haven’t used yet, I get undef:
#!/usr/bin/perl
# make_aliases_no_hash.pl
use v5.10;
$bar = 'Buster';
@bar = qw(Mimi Roscoe);
$quux = *bar{HASH}; # returns undef
say '$quux is undefined!' unless defined $quux;
I see that $quux is undefined:
% perl make_aliases_no_hash.pl
$quux is undefined!
Curiously, this doesn’t work if I access the SCALAR slot, which returns an anonymous scalar reference even if the variable has never been used:
#!/usr/bin/perl
# make_aliases_no_scalar.pl
use v5.10;
$foo = *bar{SCALAR};
say '$foo is a reference' if ref $foo;
say '$foo is undefined!' unless defined $foo;
say '$$foo is undefined!' unless defined $$foo;
The $foo value is always defined, but it’s a reference to undef:
% perl make_aliases_no_scalar.pl
$foo is a reference
$$foo is undefined!
For everything but a scalar variable, this gives me a way to check if a package variable has been used somewhere already. I can do it this way, but later I’ll show Package::Stash:
#!/usr/bin/perl
# show_used_var_types.pl
use v5.10;
foreach my $entry ( sort keys %main:: ) {
    say $entry;
    say "\tarray is defined" if *{$entry}{ARRAY};
    say "\thash is defined" if *{$entry}{HASH};
    say "\tsub is defined" if *{$entry}{CODE};
}
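That program only runs without strict because it turns each name into a typeglob through a symbolic reference. Here is a sketch of the same check that stays strict-clean everywhere except the one line that needs the symbolic reference. The filled_slots subroutine and the @pets, %owners, and pets names are my own inventions for the example:

```perl
#!/usr/bin/perl
# filled_slots.pl - a sketch; filled_slots is a hypothetical helper
use strict;
use warnings;

our @pets   = qw(Buster Mimi);
our %owners = ( Buster => 'brian' );
sub pets { scalar @pets }

# Return the names of the non-scalar slots in use for one identifier
sub filled_slots {
    my( $package, $name ) = @_;
    no strict 'refs';                      # the glob name is a string
    my $glob = \*{"${package}::$name"};    # note: this creates the glob if it is missing
    return grep { defined *{$glob}{$_} } qw(ARRAY HASH CODE IO FORMAT);
}

foreach my $name ( qw(pets owners never_used) ) {
    my @slots = filled_slots( 'main', $name );
    print "$name: ", ( @slots ? "@slots" : 'nothing' ), "\n";
}
```

I left SCALAR out of the list on purpose since that slot always returns a reference. The helper has a side effect: asking about a name adds an empty entry for it to the stash, so it is better for poking around than for auditing.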
Although I can use these typeglob slot lookups as rvalues, I can’t use them as lvalues:
*bar{SCALAR} = 5;
I’ll get a fatal error:
Can't modify glob elem in scalar assignment ...
I can assign to a typeglob as a whole, though, and Perl will figure out the right place to put the value. I’ll show that later in the “Aliasing” section in this chapter.
I also get two bonus entries in the typeglob, PACKAGE and NAME, so I can always tell from which variable I got the typeglob. These return strings rather than references, but I don’t think this is terribly useful for anything other than deep magic:
#!/usr/bin/perl
# typeglob-name-package.pl
use v5.10;
$foo = "Some value";
$bar = "Another value";
who_am_i( *foo );
who_am_i( *bar );
sub who_am_i {
    local *glob = shift;
    say "I'm from package " . *glob{PACKAGE};
    say "My name is " . *glob{NAME};
}
Although this probably has limited usefulness, at least outside of any debugging, the output tells me more about the typeglobs I passed to the function:
I'm from package main
My name is foo
I'm from package main
My name is bar
I can alias variables by assigning one typeglob to another. In this example, all of the variables with the identifier bar become nicknames for all of the variables with the identifier foo once Perl assigns the *foo typeglob to the *bar typeglob:
#!/usr/bin/perl
# alias.pl
use v5.10;
$foo = "Foo scalar";
@foo = 1 .. 5;
sub foo { q(I'm a subroutine!) }
say "\$foo is <$foo>, \@foo is <@foo>";
*bar = *foo; # typeglob assignment
say "\$bar is <$bar>, \@bar is <@bar>";
say 'Sub returns <', bar(), '>';
$bar = 'Bar scalar';
@bar = 6 .. 10;
say "\$foo is <$foo>, \@foo is <@foo>";
When I change either the variables named bar or foo, the other is changed too because they are actually the same thing with different names. Notice that the values I see through the name foo change even though I assigned the new values through the name bar:
% perl alias.pl
$foo is <Foo scalar>, @foo is <1 2 3 4 5>
$bar is <Foo scalar>, @bar is <1 2 3 4 5>
Sub returns <I'm a subroutine!>
$foo is <Bar scalar>, @foo is <6 7 8 9 10>
I don’t have to assign an entire typeglob. If I assign a reference to a typeglob, I only affect that part of the typeglob that the reference represents. Assigning the scalar reference \$scalar to the typeglob *foo only affects the SCALAR part of the typeglob. In the next line, when I assign a \@array to the typeglob, the array reference only affects the ARRAY part of the typeglob. Having done that, I’ve made *foo a Frankenstein’s monster of values I’ve taken from other variables:
#!/usr/bin/perl
# frankenstein.pl
use strict;
use v5.10;
my $scalar = 'foo';
my @array = 1 .. 5;
*foo = \$scalar;
*foo = \@array;
{
no strict 'vars'; # or declare them
say "\$foo is $foo";
say "\@foo is @foo";
}
% perl frankenstein.pl
$foo is foo
@foo is 1 2 3 4 5
Notice that strict doesn’t complain about *foo. It will complain about $foo and @foo though. If you have to do this sort of thing, you might want to pre-declare variables instead:
#!/usr/bin/perl
use v5.10;
use strict;
use vars qw($foo @foo);
my $scalar = 'foo';
my @array = 1 .. 5;
*foo = \$scalar;
*foo = \@array;
say "\$foo is $foo";
say "\@foo is @foo";
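Combining a reference assignment with local gives me a temporary alias, which some older code uses to work on an array argument without dereferencing it everywhere. This is a minimal sketch of that idea, assuming a package array @items and a subroutine total that I invented for the example:

```perl
#!/usr/bin/perl
# local_alias.pl - a sketch; @items and total are invented names
use strict;
use warnings;

our @items;    # the package variable I will alias

sub total {
    my( $array_ref ) = @_;
    local *items = $array_ref;    # only the ARRAY slot changes, and only for now
    my $sum = 0;
    $sum += $_ foreach @items;    # no dereferencing needed
    return $sum;
}

my @prices = ( 3, 4, 5 );
print "Total is ", total( \@prices ), "\n";
print "Afterward \@items has ", scalar @items, " elements\n";
```

Since the typeglob is localized, @items goes back to being the original empty array once total returns. A lexical reference is still the cleaner choice in new code.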
This feature can be quite useful when I have a long variable name but I want to use a different name for it. This is essentially what the Exporter module does when it imports symbols into my namespace (and this doesn’t have the strict problem either). Instead of using the full package specification, I have it in my current package. Exporter takes the variables from the exporting package and assigns to the typeglob of the importing package:
package Exporter;
sub import {
    my $pkg = shift;
    my $callpkg = caller($ExportLevel);
    # ...
    *{"$callpkg\::$_"} = \&{"$pkg\::$_"} foreach @_;
}
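To make the idea concrete, here is a stripped-down import of my own. It is a sketch of the technique and not Exporter’s actual code; the My::Greeter package and its hello subroutine are invented names. It copies only subroutines, and it refuses names the package doesn’t define:

```perl
#!/usr/bin/perl
# tiny_import.pl - a sketch; My::Greeter and hello are invented names
use strict;
use warnings;

package My::Greeter;

sub hello { "Hello, $_[0]!" }

sub import {
    my( $class, @names ) = @_;
    my $caller = caller;    # the package that asked for the import
    no strict 'refs';       # glob and sub names are built from strings
    foreach my $name ( @names ) {
        die "$class does not define $name\n"
            unless defined &{"${class}::$name"};
        # assigning a code reference fills only the CODE slot
        *{"${caller}::$name"} = \&{"${class}::$name"};
    }
}

package main;

# use My::Greeter qw(hello) would do this for a module in its own file
BEGIN { My::Greeter->import( 'hello' ) }

print hello( 'Perl' ), "\n";
```

After the import, main::hello and My::Greeter::hello are the same subroutine under two names, just like the aliased variables earlier.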
Before Perl 5.6 introduced filehandle references, if I had to pass a subroutine a filehandle I’d have to use a typeglob. This is the most likely use of typeglobs that you’ll see in older code. For instance, the CGI module can read its input from a filehandle I specify, rather than using STDIN:
use CGI;
open FH, $cgi_data_file or die "Could not open $cgi_data_file: $!";
CGI->new( *FH ); # can't new( FH ), need a typeglob
This also works with references to typeglobs:
CGI->new( \*FH ); # a reference to the typeglob works too
Again, this is the older way of doing things. The newer way involves a scalar that holds the filehandle reference:
use CGI;
open my $fh, '<', $cgi_data_file or die "Could not open $cgi_data_file: $!";
CGI->new( $fh );
In the old method, the filehandles were package variables so they couldn’t be lexical variables. And, they have no sigil. Passing them to a subroutine, however, was a problem. What name do I use for them in the subroutine? I don’t want to use another name already in use because I’ll overwrite its value. I can’t use local with a filehandle either.
local( FH ) = shift; # won't work.
That line of code gives a compilation error:
Can't modify constant item in local ...
I have to use a typeglob instead. Perl figures out to assign the GLOB and IO portions of the FH typeglob:
local( *FH ) = shift; # will work.
Once I’ve done that, I use the filehandle FH just like I would in any other situation. It doesn’t matter to me that I got it through a typeglob assignment. Since I’ve localized it, any filehandle of that name anywhere in the program uses my new value, just as in my earlier local example. Nowadays, just use filehandle references, $fh, and leave this stuff to the older code (unless I’m dealing with the special filehandles STDOUT, STDERR, and STDIN).
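A subroutine that takes its filehandle as an ordinary argument handles both cases. In this sketch, write_report is a name I made up; it accepts a lexical filehandle or a reference to a typeglob for one of the special filehandles. I use an in-memory filehandle so the example doesn’t need a real file:

```perl
#!/usr/bin/perl
# pass_filehandle.pl - a sketch; write_report is a hypothetical subroutine
use strict;
use warnings;

# Print each line to whatever filehandle the caller passes in
sub write_report {
    my( $fh, @lines ) = @_;
    print {$fh} "$_\n" foreach @lines;
    return scalar @lines;
}

# a lexical filehandle that writes into a string instead of a file
open my $fh, '>', \my $buffer
    or die "Could not open in-memory filehandle: $!";
write_report( $fh, qw(alpha beta) );
close $fh;
print "Captured: $buffer";

# the special filehandles still go in as typeglob references
write_report( \*STDOUT, 'gamma' );
```

Inside write_report I never need to know which kind of handle I got, and I don’t have to localize anything.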
Using typeglob assignment I can give anonymous subroutines a name. Instead of dealing with a subroutine dereference I can deal with a named subroutine.
The File::Find module takes a callback function to select files from a list of directories:
use File::Find;
find( \&wanted, @dirs );
sub wanted { ... }
In File::Find::Closures, I have several functions that return two closures I can use with File::Find. That way, I can run common find tasks without recreating the &wanted function I need:
package File::Find::Closures;
use File::Spec::Functions qw(canonpath); # canonpath comes from here
sub find_by_name {
    my %hash = map { $_, 1 } @_;
    my @files = ();
    (
        sub { push @files, canonpath( $File::Find::name ) if exists $hash{$_} },
        sub { wantarray ? @files : [ @files ] }
    )
}
I use File::Find::Closures by importing the generator function I want to use, in this case find_by_name, and then use that function to create two anonymous subroutines: one for find and one to use afterward to get the results:
use File::Find;
use File::Find::Closures qw( find_by_name );
my( $wanted, $get_file_list ) = find_by_name( 'index.html' );
find( $wanted, @directories );
foreach my $file ( $get_file_list->() ) { ... }
Perhaps I don’t want to use subroutine references, for whatever reasons. I can assign the anonymous subroutines to typeglobs. Since I’m assigning references, I only affect the subroutine entry in the typeglob. After the assignment I can then do the same thing I did with filehandles in the last section, but this time with named subroutines. After I assign the return values from find_by_name to the typeglobs *wanted and *get_file_list, I have subroutines with those names:
( *wanted, *get_file_list ) = find_by_name( 'index.html' );
find( \&wanted, @directories );
foreach my $file ( get_file_list() ) { ... }
This is a contrived example since I could be much more clear by using subroutine references that I assign to scalar variables. If I absolutely need to use a subroutine named &wanted
because there’s another bit of code I’m not allowed to change, this sort of thing could work. In Chapter 9 I’ll use this trick with AUTOLOAD to define subroutines on the fly or to replace existing subroutine definitions.
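Here is the same trick in a form I can run without File::Find. It is a sketch: make_counter stands in for a generator like find_by_name, and increment and current are names I picked for the two closures it returns:

```perl
#!/usr/bin/perl
# named_closures.pl - a sketch; make_counter, increment, and current are invented names
use strict;
use warnings;

# Return two closures that share the same private $count
sub make_counter {
    my $count = 0;
    return ( sub { ++$count }, sub { $count } );
}

# each code reference fills only the CODE slot of its typeglob
( *increment, *current ) = make_counter();

increment() foreach 1 .. 3;
print "The count is ", current(), "\n";
```

Since the names don’t exist until the assignment runs, I call them with parentheses so Perl knows they are subroutine calls when it compiles the program.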
Now that I’ve shown you how to manipulate typeglobs and stashes, I’ll show you the easy way. The Package::Stash module reduces the number of punctuation characters that I have to type. To modify a stash, I create an object for that stash using the namespace (without trailing colons):
use Package::Stash;
my $foo_stash = Package::Stash->new( 'Animals' );
Once I have the object, I call methods to do the low-level things I was doing myself. I can add a variable:
$foo_stash->add_symbol( '$camel' );
I can give it an initial value:
$foo_stash->add_symbol( '$camel', 'Amelia' );
Earlier, I had a program to show all the names in a particular stash:
#!/usr/bin/perl
# show_main_vars.pl
foreach my $entry ( keys %main:: ) {
    print "$entry\n";
}
With Package::Stash, this changes to something slightly more complicated:
#!/usr/bin/perl
# show_main_vars_package_stash.pl
use Package::Stash;
my $main_stash = Package::Stash->new( 'main' )->get_all_symbols;
foreach my $key ( keys %$main_stash ) {
    print "$key\n";
}
Although this is more involved, I can keep the Package::Stash object in $main_stash and pass it around like any other reference instead of hardcoding the stash name:
#!/usr/bin/perl
# show_names.pl
use Package::Stash;
my $main_stash = Package::Stash->new( 'main' );
show_names( $main_stash );
sub show_names {
    my( $stash ) = @_;
    my $hash = $stash->get_all_symbols;
    foreach my $key ( keys %$hash ) {
        print "$key\n";
    }
}
The module has many other methods to manipulate or remove names from stashes, but I’ll leave it to you to read the documentation.
The symbol table is Perl’s accounting system for package variables, and typeglobs are the way I access them. In some cases, such as passing a filehandle to a subroutine, I can’t get away from the typeglob because I can’t take a reference to a filehandle package variable. To get around some of these older limitations in Perl, programmers used typeglobs to get to the variables they needed. That doesn’t mean that typeglobs are outdated, though. Modules that perform magic, such as Exporter, use them without me even knowing about it. To do my own magic, typeglobs turn out to be quite handy.
Stashes and globs have their documentation spread out among perlapi, perlref, perlmod, perlsub, and perldata.
Chapter 10 of Programming Perl, Fourth Edition, talks about symbol tables and how Perl handles them internally.
Phil Crow shows some symbol table tricks in “Symbol Table Manipulation” for Perl.com. http://www.perl.com/pub/2005/03/17/symtables.html.
Randal Schwartz talks about scopes in his Unix Review column for May 2003, http://www.stonehenge.com/merlyn/UnixReview/col46.html.
The Perl Advent Calendar for 2011 had an entry for Package::Stash, http://perladvent.org/2011/2011-12-07.html.