Chapter 8. Symbol Tables and Typeglobs

Although I don’t normally deal with typeglobs or the symbol table, I need to understand them for the tricks I’ll use in later chapters. I’ll lay the foundation for advanced topics including dynamic subroutines and jury-rigging code in this chapter.

Symbol tables organize and store Perl’s package (global) variables, and I can affect the symbol table through typeglobs. By messing with Perl’s variable bookkeeping I can do some powerful things. You’re probably already getting the benefit of some of these tricks without even knowing it.

Package and Lexical Variables

Before I get too far, I want to review the differences between package and lexical variables. The symbol table tracks package variables but not lexical variables. When I fiddle with the symbol table or typeglobs, I’m dealing with package variables. Package variables are also known as global variables since they are visible everywhere in the program.

In Learning Perl and Intermediate Perl, we used lexical variables whenever possible. We declared lexical variables with my, and those variables could only be seen inside their scope. Since lexical variables have limited reach, I didn’t need to know all of the program to avoid a variable name collision. Lexical variables are a bit faster too since Perl doesn’t have to deal with the extra bookkeeping of the symbol table.
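A minimal sketch can make that bookkeeping difference visible. It peeks ahead at the %main:: hash that I cover in “The Symbol Table” later in this chapter. The variable names and the in_symbol_table helper are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $package_var = 'tracked';      # package variable: gets a symbol table entry
my  $lexical_var = 'not tracked';  # lexical variable: never enters the symbol table

# in_symbol_table() is a hypothetical helper for this sketch
sub in_symbol_table {
    my( $name ) = @_;
    return exists $main::{$name} ? 1 : 0;
    }

print "package_var: ", in_symbol_table( 'package_var' ), "\n";
print "lexical_var: ", in_symbol_table( 'lexical_var' ), "\n";
```

Only the name I declared with our should show up as a key in the main package's symbol table.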

Lexical variables have a limited scope, and they only affect that part of the program. This little snippet declares the variable name $n twice in different scopes, creating two different variables that do not interfere with each other:

my $n = 10; # outer scope

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { my $n = shift; $n ** 2; }

This double use of $n is not a problem. The declaration inside the subroutine is a different scope and gets its own version that masks the other version. At the end of the subroutine, its version of $n disappears as if it never existed. The outer $n is still 10.

Package variables are a different story. Doing the same thing with package variables stomps on the previous definition of $n:

$n = 10;

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { $n = shift; $n ** 2; }

Perl has a way to deal with the double use of package variables, though. The local built-in temporarily moves the current value, 10, out of the way until the end of the scope, and the entire program sees the new value, 15, until the scope of local ends:

$n = 10;

my $square = square( 15 );

print "n is $n, square is $square\n";

sub square { local $n = shift; $n ** 2; }

We showed the difference in Intermediate Perl. The local version changes everything including the parts outside of its scope while the lexical version only works inside its scope. Here’s a small program that demonstrates it both ways. I define the package variable $global, and I want to see what happens when I use the same variable name in different ways. To watch what happens, I use the show_me subroutine to tell me what it thinks the value of $global is. I’ll call show_me before I start, then subroutines that do different things with $global. Remember that show_me is outside of the lexical scope of any other subroutine:

#!/usr/bin/perl

# not strict clean, yet, but just wait
$global = qq(I'm the global version);

show_me('At start');
lexical();
localized();
show_me('At end');

sub show_me {
    my $tag = shift;

    print "$tag: $global\n"
    }

The lexical subroutine starts by defining a lexical variable also named $global. Within the subroutine, the value of $global is obviously the one I set. However, when it calls show_me, the code jumps out of the subroutine. Outside of the subroutine, the lexical variable has no effect. In the output, the line I tagged with From lexical() shows I'm the global version.

sub lexical {
    my $global = "I'm in the lexical version";
    print "In lexical(), \$global is --> $global\n";
    show_me('From lexical()');
    }

Using local is completely different since it deals with the package version of the variable. When I localize a variable name, Perl sets aside its current value for the rest of the scope. The new value I assign to the variable is visible throughout the entire program until the end of the scope. When I call show_me, even though I jump out of the subroutine, the new value for $global that I set in the subroutine is still visible.

sub localized {
    local $global = "I'm in the localized version";
    print "In localized(), \$global is --> $global\n";
    show_me('From localized');
    }

The output shows the difference. The value of $global starts off with its original version. In lexical(), I give it a new value but show_me can’t see it; show_me still sees the global version. In localized(), the new value sticks even in show_me. However, after I’ve called localized(), $global comes back to its original value.

At start: I'm the global version
In lexical(), $global is --> I'm in the lexical version
From lexical(): I'm the global version
In localized(), $global is --> I'm in the localized version
From localized: I'm in the localized version
At end: I'm the global version

Hold that thought for a moment, because I’ll use it again after I introduce typeglobs.

Getting the Package Version

No matter which part of my program I am in, or which package I am in, I can always get to the package variables as long as I preface the variable name with the full package name. Going back to my lexical(), I can see the package version of the variable even when that name is masked by a lexical variable of the same name. I just have to add the full package name to it, $main::global.

sub lexical {
    my $global = "I'm in the lexical version";
    print "In lexical(), \$global is --> $global\n";
    print "The package version is still --> $main::global\n";
    show_me('From lexical()');
    }

The output shows that I have access to both:

In lexical(), $global is --> I'm in the lexical version
The package version is still --> I'm the global version

That’s not the only thing I can do, however. If, for some odd reason, I have a package variable with the same name as a lexical variable that’s currently in scope, I can use our (introduced in Perl 5.6) to tell Perl to use the package variable for the rest of the scope:

sub lexical {
    my $global = "I'm in the lexical version";
    our $global;
    print "In lexical with our, \$global is --> $global\n";
    show_me('In lexical()');
    }

Now the output shows that I don’t ever get to see the lexical version of the variable:

In lexical with our, $global is --> I'm the global version

It seems pretty silly to use our that way since it masks the lexical version for the rest of the subroutine. If I only need the package version for part of the subroutine, I can create a scope just for it so I can use it for that part and let the lexical version take the rest:

sub lexical {
    my $global = "I'm in the lexical version";

    {
    our $global;
    print "In the naked block, our \$global is --> $global\n";
    }

    print "In lexical, my \$global is --> $global\n";
    print "The package version is still --> $main::global\n";
    show_me('In lexical()');
    }

Now the output shows all of the possible ways I can use $global:

In the naked block, our $global is --> I'm the global version
In lexical, my $global is --> I'm in the lexical version
The package version is still --> I'm the global version

The Symbol Table

Each package has a special hash-like data structure called the symbol table, or stash (short for “symbol table hash”), which holds all of the typeglobs for that package, one for each identifier defined there. It’s not a Perl hash like we showed in Learning Perl, but it looks and acts like it in some ways, and its name is the package name with two colons on the end, such as %main::.

This isn’t a normal Perl hash, but I can look in it with the keys operator. Want to see all of the symbol names defined in the main package? I simply print all the keys for this special hash:

#!/usr/bin/perl
# show_main_vars.pl
foreach my $entry ( keys %main:: ) {
    print "$entry\n";
    }

I won’t show the output here because it’s rather long, but when I look at it, I have to remember that those are the variable names without sigils. When I see the identifier _, I have to remember that it has references to the variables $_, @_, and so on. Here are some special variable names that Perl programmers will recognize once they put a sigil in front of them:

/
"
ARGV
INC
ENV
$
-
0
@

If I look in another package I don’t see anything because I haven’t defined any variables yet:

#!/usr/bin/perl
# show_empty_foo_vars.pl

foreach my $entry ( keys %Foo:: ) {
    print "$entry\n";
    }

If I define some variables in package Foo, I’ll then be able to see some output:

#!/usr/bin/perl
# show_foo_vars.pl

package Foo;

@n      = 1 .. 5;
$string = "Hello Perl!\n";
%dict   = ( 1 => 'one' );

sub add { $_[0] + $_[1] }

foreach my $entry ( keys %Foo:: ) {
    print "$entry\n";
    }

The output shows a list of the identifier names without any sigils attached. The symbol table stores the identifier names:

n
add
string
dict

The %main:: symbol table also contains all of the other symbol tables, so I can also write the same program with main:: in front of Foo:::

foreach my $entry ( keys %main::Foo:: ) {
    print "$entry\n";
    }

That’s just a bonus fact for you. It’s probably not useful.
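As a small sketch of that nesting, I can check that the main stash has a key for the Foo stash and that both spellings list the same names. The names_in helper is hypothetical, not something Perl provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Foo;
our $string = "Hello Perl!\n";

package main;

# names_in() is a hypothetical helper that lists the keys of a stash
sub names_in {
    my( $stash ) = @_;
    return sort keys %$stash;
    }

# a nested stash shows up as a key with the trailing colons
print "main has an entry for Foo::\n" if exists $main::{'Foo::'};

# both spellings reach the same stash, so they list the same names
my @short = names_in( \%Foo:: );
my @long  = names_in( \%main::Foo:: );
print "Same names either way\n" if "@short" eq "@long";
```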

I can use the other hash operators on these stashes too. I can delete all of the variables with the same name. In the next program, I define the variables $n and $m then assign values to them. I call show_foo to list the variable names in the Foo package, which I use because it doesn’t have all of the special symbols that the main package does:

#!/usr/bin/perl
# show_foo.pl

package Foo;

our $n = 10;
our $m = 20;

show_foo( "After assignment" );

delete $Foo::{'n'};
delete $Foo::{'m'};

show_foo( "After delete" );

sub show_foo {
    print "-" x 10, $_[0], "-" x 10, "\n";

    print "\$n is $n\n\$m is $m\n";

    foreach my $name ( keys %Foo:: ) {
        print "$name\n";
        }
    }

The output shows me that the symbol table for Foo:: has entries for the names n and m, as well as for show_foo. Those are all of the names I defined: two scalars and one subroutine. After I use delete, the entries for n and m are gone:

----------After assignment----------
$n is 10
$m is 20
show_foo
n
m
----------After delete----------
$n is 10
$m is 20
show_foo

The data are still there though. The compiler had already resolved the names to their data locations. The subroutine still references those data, so it can still use them even if their names disappear.
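To see the difference between a name that Perl resolved at compile time and one that it looks up while the program runs, here is a sketch with two made-up subroutines. The first uses $n directly, so it stays bound to the original data. The second builds the variable name from a string at runtime, so it has to go back through the stash, where the entry is gone:

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

package Foo;

our $n = 10;

# compiled_n() was bound to the original $Foo::n when Perl compiled it
sub compiled_n { $n }

# runtime_n() builds the name from a string, so Perl looks it up
# in the stash each time the subroutine runs
sub runtime_n {
    my( $name ) = @_;
    no strict 'refs';
    ${ "Foo::$name" };
    }

delete $Foo::{'n'};

say "compiled: ", compiled_n();
say "runtime:  ", runtime_n( 'n' ) // 'undef';
```

The compiled subroutine should still report 10, while the runtime lookup finds only a fresh, undefined scalar.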

Typeglobs

By default, Perl variables are global variables, meaning that I can access them from anywhere in the program as long as I know their names. Perl keeps track of them in the symbol table, which is available to the entire program. Each package has a list of defined identifiers just like I showed in the previous section. Each identifier has a pointer (although not in the C sense) to a slot for each variable type. There are also two bonus slots for the variables NAME and PACKAGE, which I’ll use in a moment. Figure 1 shows the relationship between the package, identifier, and type of variable.

 Package    Identifier           Type    Thingy

                       +------> SCALAR - $bar
                       |
                       +------> ARRAY  - @bar
                       |
                       +------> HASH   - %bar
                       |
Foo:: -----> bar  -----+------> CODE   - &bar
                       |
                       +------> IO     - file and dir handle
                       |
                       +------> GLOB   - *bar
                       |
                       +------> FORMAT - format names
                       |
                       +------> NAME
                       |
                       +------> PACKAGE

There are seven variable types. The three common ones are the SCALAR, ARRAY, and HASH, but Perl also has CODE for subroutines (Chapter 9 covers subroutines as data), IO for file and directory handles, FORMAT for format names, and GLOB for the whole thing. Once I have the glob I can get a reference to a particular variable of that name by accessing the right entry. To access the scalar portion of the *bar typeglob, I access that part almost like a hash access. These return references to data for those slots:

#!/usr/bin/perl
# make_aliases.pl
use v5.10;

$bar = 'Buster';
@bar = qw(Mimi Roscoe);

# these return references or undef
$foo = *bar{SCALAR};
$baz = *bar{ARRAY};

say "\$foo is $$foo";
say "\@baz is @$baz";

If I try to use a slot for a variable I haven’t used yet, I get undef:

#!/usr/bin/perl
# make_aliases_no_hash.pl
use v5.10;

$bar = 'Buster';
@bar = qw(Mimi Roscoe);

$quux = *bar{HASH};  # returns undef

say '$quux is undefined!' unless defined $quux;

I see that $quux is undefined:

% perl make_aliases_no_hash.pl
$quux is undefined!

Curiously, this doesn’t work if I access the SCALAR slot, which returns an anonymous scalar reference even if the variable has never been used:

#!/usr/bin/perl
# make_aliases_no_scalar.pl
use v5.10;

$foo = *bar{SCALAR};

say '$foo is a reference' if ref $foo;
say '$foo is undefined!' unless defined $foo;
say '$$foo is undefined!' unless defined $$foo;

The $foo value is always defined, but it’s a reference to undef:

% perl make_aliases_no_scalar.pl
$foo is a reference
$$foo is undefined!

For everything but a scalar variable, this gives me a way to check if a package variable has been used somewhere already. I can do it this way, but later I’ll show Package::Stash:

#!/usr/bin/perl
# show_used_var_types.pl
use v5.10;

foreach my $entry ( sort keys %main:: ) {
    say $entry;

    say "\tarray  is defined" if *{$entry}{ARRAY};
    say "\thash   is defined" if *{$entry}{HASH};
    say "\tsub    is defined" if *{$entry}{CODE};
    }

Although I can use these typeglob slots as rvalues, I can’t use them as lvalues:

*bar{SCALAR} = 5;

I’ll get a fatal error:

Can't modify glob elem in scalar assignment ...

I can assign to a typeglob as a whole, though, and Perl will figure out the right place to put the value. I’ll show that later in the “Aliasing” section in this chapter.

I also get two bonus entries in the typeglob, PACKAGE and NAME, so I can always tell from which variable I got the typeglob. Unlike the other slots, these return strings rather than references, but I don’t think this is terribly useful for anything other than deep magic:

#!/usr/bin/perl
# typeglob-name-package.pl
use v5.10;

$foo = "Some value";
$bar = "Another value";

who_am_i( *foo );
who_am_i( *bar );

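sub who_am_i {
    # keep the typeglob in a scalar; assigning it to another glob
    # such as *glob would report that glob's own name instead
    my $glob = shift;

    say "I'm from package " . *{$glob}{PACKAGE};
    say "My name is "       . *{$glob}{NAME};
    }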

Although this probably has limited usefulness, at least outside of any debugging, the output tells me more about the typeglobs I passed to the function:

I'm from package main
My name is foo
I'm from package main
My name is bar

Aliasing

I can alias variables by assigning one typeglob to another. In this example, all of the variables with the identifier bar become nicknames for all of the variables with the identifier foo once Perl assigns the *foo typeglob to the *bar typeglob:

#!/usr/bin/perl
# alias.pl
use v5.10;

$foo = "Foo scalar";
@foo = 1 .. 5;
sub foo { q(I'm a subroutine!) }
say "\$foo is <$foo>, \@foo is <@foo>";

*bar = *foo;  # typeglob assignment

say "\$bar is <$bar>, \@bar is <@bar>";
say 'Sub returns <', bar(), '>';

$bar = 'Bar scalar';
@bar = 6 .. 10;
say "\$foo is <$foo>, \@foo is <@foo>";

When I change either the variables named bar or foo, the other is changed too because they are actually the same thing with different names. Notice the values for *foo change although I had changed values through *bar:

% perl alias.pl
$foo is <Foo scalar>, @foo is <1 2 3 4 5>
$bar is <Foo scalar>, @bar is <1 2 3 4 5>
Sub returns <I'm a subroutine!>
$foo is <Bar scalar>, @foo is <6 7 8 9 10>

I don’t have to assign an entire typeglob. If I assign a reference to a typeglob, I only affect that part of the typeglob that the reference represents. Assigning the scalar reference \$scalar to the typeglob *foo only affects the SCALAR part of the typeglob. In the next line, when I assign a \@array to the typeglob, the array reference only affects the ARRAY part of the typeglob. Having done that, I’ve made *foo a Frankenstein’s monster of values I’ve taken from other variables:

#!/usr/bin/perl
# frankenstein.pl
use strict;
use v5.10;

my $scalar = 'foo';
my @array  = 1 .. 5;

*foo = \$scalar;
*foo = \@array;

{
no strict 'vars';  # or declare them
say "\$foo is $foo";
say "\@foo is @foo";
}

% perl frankenstein.pl
$foo is foo
@foo is 1 2 3 4 5

Notice that strict doesn’t complain about *foo. It will complain about $foo and @foo though. If you have to do this sort of thing, you might want to pre-declare variables instead:

#!/usr/bin/perl
use v5.10;
use strict;
use vars qw($foo @foo);

my $scalar = 'foo';
my @array  = 1 .. 5;

*foo = \$scalar;
*foo = \@array;

say "\$foo is $foo";
say "\@foo is @foo";

This feature can be quite useful when I have a long variable name but I want to use a different name for it. This is essentially what the Exporter module does when it imports symbols into my namespace (and this doesn’t have the strict problem either). Instead of using the full package specification, I have it in my current package. Exporter takes the variables from the exporting package and assigns to the typeglob of the importing package:

package Exporter;

sub import {
    my $pkg = shift;
    my $callpkg = caller($ExportLevel);

    # ...
    *{"$callpkg\::$_"} = \&{"$pkg\::$_"} foreach @_;
    }
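To watch that assignment work without the rest of Exporter, here is a stripped-down sketch of my own. Local::Greeter and its hello subroutine are made-up names, and this import handles only subroutines, which is far less than the real module does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Local::Greeter is a made-up package for this sketch
package Local::Greeter;

sub hello { "Hello, $_[0]!" }

sub import {
    my $pkg     = shift;
    my $callpkg = caller;

    no strict 'refs';  # the glob names are built from strings
    *{"$callpkg\::$_"} = \&{"$pkg\::$_"} foreach @_;
    }

package main;

# do at compile time what "use Local::Greeter qw(hello)" would do
BEGIN { Local::Greeter->import( 'hello' ) }

print hello( 'Perl' ), "\n";
```

Since I assign a code reference, only the CODE slot of *main::hello changes. Any $hello or @hello in main is left alone.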

Filehandle Arguments in Older Code

Before Perl 5.6 introduced filehandle references, if I had to pass a subroutine a filehandle I’d have to use a typeglob. This is the most likely use of typeglobs that you’ll see in older code. For instance, the CGI module can read its input from a filehandle I specify, rather than using STDIN:

use CGI;

open FH, $cgi_data_file
    or die "Could not open $cgi_data_file: $!";

CGI->new( *FH ); # can't new( FH ), need a typeglob

This also works with references to typeglobs:

CGI->new( \*FH ); # a reference to the typeglob works too

Again, this is the older way of doing things. The newer way involves a scalar that holds the filehandle reference:

use CGI;
open my $fh, '<', $cgi_data_file
    or die "Could not open $cgi_data_file: $!";
CGI->new( $fh );

In the old method, the filehandles were package variables with no sigil, so they couldn’t be lexical variables. That made passing them to a subroutine a problem. What name do I use for them in the subroutine? I don’t want to use another name already in use because I’ll overwrite its value. I can’t use local with a bareword filehandle either:

local( FH ) = shift; # won't work.

That line of code gives a compilation error:

Can't modify constant item in local ...

I have to use a typeglob instead. Perl figures out that it needs to assign the GLOB and IO portions of the FH typeglob:

local( *FH ) = shift; # will work.

Once I’ve done that, I use the filehandle FH just like I would in any other situation. It doesn’t matter to me that I got it through a typeglob assignment. Since I’ve localized it, any filehandle of that name anywhere in the program uses my new value, just as in my earlier local example. Nowadays, just use filehandle references, $fh, and leave this stuff to the older code (unless I’m dealing with the special filehandles STDOUT, STDERR, and STDIN).
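As a rough sketch of the modern approach, a subroutine that takes its filehandle in a scalar doesn't care whether it gets a lexical filehandle, a reference to a typeglob, or a bare typeglob. The write_line subroutine is a hypothetical helper, and the in-memory filehandle is only there so the example needs no files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# write_line() is a hypothetical helper; it accepts anything
# that Perl can use as a filehandle
sub write_line {
    my( $fh, $message ) = @_;
    print {$fh} "$message\n";
    }

# a lexical filehandle that writes to an in-memory string
open my $fh, '>', \ my $buffer
    or die "Could not open in-memory filehandle: $!";

write_line( $fh,      'to a lexical filehandle' );
write_line( \*STDOUT, 'to a typeglob reference' );
write_line( *STDOUT,  'to a bare typeglob' );

close $fh;
print "buffer holds: $buffer";
```

The special filehandles still need the typeglob forms because there's no lexical variable to pass for STDOUT.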

Naming Anonymous Subroutines

Using typeglob assignment I can give anonymous subroutines a name. Instead of dealing with a subroutine dereference I can deal with a named subroutine.

The File::Find module takes a callback function to select files from a list of directories:

use File::Find;

find( \&wanted, @dirs );

sub wanted { ... }

In File::Find::Closures, I have several functions that return two closures I can use with File::Find. That way, I can run common find tasks without recreating the &wanted function I need:

package File::Find::Closures;

use File::Spec::Functions qw(canonpath);  # find_by_name calls canonpath

sub find_by_name {
    my %hash  = map { $_, 1 } @_;
    my @files = ();

    (
    sub { push @files, canonpath( $File::Find::name )
        if exists $hash{$_} },
    sub { wantarray ? @files : [ @files ] }
    )
    }

I use File::Find::Closures by importing the generator function I want to use, in this case find_by_name, and then use that function to create two anonymous subroutines: one for find and one to use afterward to get the results:

use File::Find;
use File::Find::Closures qw( find_by_name );

my( $wanted, $get_file_list ) = find_by_name( 'index.html' );

find( $wanted, @directories );

foreach my $file ( $get_file_list->() ) {
    ...
    }

Perhaps I don’t want to use subroutine references, for whatever reasons. I can assign the anonymous subroutines to typeglobs. Since I’m assigning references, I only affect the subroutine entry in the typeglob. After the assignment I can then do the same thing I did with filehandles in the last section, but this time with named subroutines. After I assign the return values from find_by_name to the typeglobs *wanted and *get_file_list, I have subroutines with those names:

( *wanted, *get_file_list ) = find_by_name( 'index.html' );

find( \&wanted, @directories );

foreach my $file ( get_file_list() ) {
    ...
    }

This is a contrived example since I could be much more clear by using subroutine references that I assign to scalar variables. If I absolutely need to use a subroutine named &wanted because there’s another bit of code I’m not allowed to change, this sort of thing could work. In Chapter 9 I’ll use this trick with AUTOLOAD to define subroutines on the fly or to replace existing subroutine definitions.
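The same trick works without File::Find, which makes it easier to try. In this sketch, make_counter is a made-up generator that returns two closures sharing one private variable, and I give them the names increment and current through typeglob assignment:

```perl
#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

# make_counter() is a made-up generator that returns two closures
sub make_counter {
    my $count = shift // 0;
    (
    sub { ++$count },
    sub { $count },
    )
    }

# the names exist only after this runs, so I call them with parentheses
( *increment, *current ) = make_counter( 5 );

increment() for 1 .. 3;
say "The count is ", current();
```

Starting from 5 and incrementing three times, this should report a count of 8. Notice that strict doesn’t complain here because I only assign to typeglobs and call the subroutines with parentheses.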

The Easy Way

Now that I’ve shown you how to manipulate typeglobs and stashes, I’ll show you the easy way. The Package::Stash module reduces the number of punctuation characters that I have to type. To modify a stash, I create an object for that stash using the namespace (without trailing colons):

use Package::Stash;

my $foo_stash = Package::Stash->new( 'Animals' );

Once I have the object, I call methods to do the low-level things I was doing myself. I can add a variable:

$foo_stash->add_symbol( '$camel' );

I can give it an initial value:

$foo_stash->add_symbol( '$camel', 'Amelia' );

Earlier, I had a program to show all the names in a particular stash:

#!/usr/bin/perl
# show_main_vars.pl
foreach my $entry ( keys %main:: ) {
    print "$entry\n";
    }

With Package::Stash, this changes to something slightly more complicated:

#!/usr/bin/perl
# show_main_vars_package_stash.pl

use Package::Stash;

my $main_stash = Package::Stash->new( 'main' )->get_all_symbols;

foreach my $key ( keys %$main_stash ) {
    print "$key\n";
    }

Although this is more involved, I have the $main_stash reference that I can pass around like any other reference instead of hardcoding the stash name:

#!/usr/bin/perl
# show_names.pl

use Package::Stash;

my $main_stash = Package::Stash->new( 'main' );

show_names( $main_stash );

sub show_names {
    my( $stash ) = @_;
    my $hash = $stash->get_all_symbols;
    foreach my $key ( keys %$hash ) {
        print "$key\n";
        }
    }

The module has many other methods to manipulate or remove names from stashes, but I’ll leave it to you to read the documentation.

Summary

The symbol table is Perl’s accounting system for package variables, and typeglobs are the way I access them. In some cases, such as passing a filehandle to a subroutine, I can’t get away from the typeglob because I can’t take a reference to a filehandle package variable. To get around some of these older limitations in Perl, programmers used typeglobs to get to the variables they needed. That doesn’t mean that typeglobs are outdated, though. Modules that perform magic, such as Exporter, use them without me even knowing about it. To do my own magic, typeglobs turn out to be quite handy.

Further Reading

Stashes and globs have their documentation spread out among perlapi, perlref, perlmod, perlsub, and perldata.

Chapter 10 of Programming Perl, Fourth Edition, talks about symbol tables and how Perl handles them internally.

Phil Crow shows some symbol table tricks in “Symbol Table Manipulation” for Perl.com, http://www.perl.com/pub/2005/03/17/symtables.html.

Randal Schwartz talks about scopes in his Unix Review column for May 2003, http://www.stonehenge.com/merlyn/UnixReview/col46.html.

The Perl Advent Calendar for 2011 had an entry for Package::Stash, http://perladvent.org/2011/2011-12-07.html.