Perl lets me hook into its variables through a mechanism it calls tying. Tied variables go back to the basics. I can decide what Perl will do when I store or fetch values from a variable. Behind the scenes, I have to implement the logic for all of the variable’s behavior. Since I can do that, I can make what look like normal variables do anything that I can program (and that’s quite a bit). Although I might use a lot of magic on the inside, at the user level, tied variables have the familiar behavior that users expect. Not only that, tied variables work throughout the Perl API. Even Perl’s internal workings with the variable use the tied behavior.
You may have already have seen tied variables in action, even without using tie
. The dbmopen
command ties a hash to a database file:
dbmopen %DBHASH, "some_file", 0644;
That’s old school Perl, though. Since then the numbers and types of these on-disk hashes proliferated and improved. Each implementation solves some problem in another one and they mostly live in CPAN modules now.
If I want to use an alternate implementation instead of the implementation Perl wants to use with dbmopen
, I use tie
to associate my hash with the right module:
tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;
There’s some hidden magic here. The programmer sees the %DBHASH
variable, which acts just like a normal hash. To make it work out, though, Perl maintains a “secret object” that it associates with the variable (%DBHASH
). I can actually get this object as the return value of tie
:
my $secret_obj = tie %DBHASH, 'SDBM_File', $filename, $flags, $mode;
If I forget to get the secret object when I called tie
, I can get it later using tied
. Either way, I end up with the normal-looking variable and the object, and I can use either one:
my $secret_obj = tied( %DBHASH );
Any time I do something with %DBHASH
, Perl will translate that action into a method call to $secret_obj
. Each variable type (scalar, arrays, and so on) has different behaviors, so they have different methods, and that’s what I have to implement to get special behavior.
Back in the day when I did a lot of HTML coding, I liked to alternate the color of table rows, something that’s much easier with CSS now. This isn’t a difficult thing to do, but it is annoying. Somewhere I have to store a list of colors to use, then I have to select the next color in the list each time I create a row:
foreach my $item ( @items ) { state $row = 0; state $colors = [ qw( AAAAAA CCCCCC EEEEEE ) ]; my $color = $colors->[ $row++ % ($#$colors + 1) ]; print qq|<tr><td bgcolor="$color">$item</td></tr>|; }
Today I’d do that with CSS by specifying one of a rotating set of class names:
use v5.10.1; foreach my $item ( @items ) { state $row = 0; state $class = [ qw( first second third ) ]; my $color = $class[ $row++ % ($#class + 1) ]; say qq|<tr><td class="$class">$item</td></tr>|; }
Those extra couple of lines with state
are really annoying since they only do anything the first time through the loop. It’s not really a problem, but aesthetically, I don’t think it looks nice. And, why should I have to deal with the mechanics of selecting a color when my loop is simply about creating a table row?
I created the Tie::Cycle
module to fix this. Instead of using an array, I create special behavior for a scalar: every time I access the special scalar, I get back the next class in the list. The tie
magic handles all of the other stuff for me. As a side benefit, I don’t have to debug those off-by-one errors I tend to get when I try to recode this operation every time I need it:
use v5.10.1; use Tie::Cycle; tie my $class, 'Tie::Cycle', [ qw( first second third ) ]; foreach my $item ( @items ) { say qq|<tr><td class="$class">$item</td></tr>|; }
I can even reuse my tied $class
variable. No matter where I stop in the cycle, I can reset it to the beginning if I’d like to start every group of rows with the same color. I get the secret object with tied
, then call the reset
method I provided when I created the module:
tied( $class )->reset; foreach my $item ( @other_items ) { say qq|<tr><td class="$class">$item</td></tr>|; }
With Tie::Cycle
, I give an array a scalar interface, but I don’t have to do something that tricky. I use the usual interface and simply place restrictions on the storage or access of the data type. I’ll show that in a moment.
Behind the scenes Perl uses an object for the tied variable. Although the user doesn’t treat the tied variable like an object, Perl figures out which methods to call and does the right thing.
At the programmer level, once I take responsibility for the variable’s behavior, I have to tell it how to do everything. The tie
mechanism uses special method names, which it expects me to implement, and without those it complains. Since each variable type acts a bit differently (I can unshift
onto an array but not a scalar, and I can get the keys of a hash but not an array), each type has its additional special tie
methods that apply only to it.
Perl 5.8 and later include base classes to help me get started. I can use Tie::Scalar
, Tie::Array
, Tie::Hash
, or Tie::Handle
as a starting point for my own Tie::*
modules. I usually find that once I decide to do something really special, I don’t get much use out of those, though.
Each variable type will have a constructor, named by prefixing TIE
to its type name (TIESCALAR
, TIEARRAY
, and so on), and optional UNTIE
and DESTROY
methods. After that, each variable type has methods specific to its behavior.
Perl calls the constructor when I use tie
. Here’s my earlier example again:
tie my $class, 'Tie::Cycle', [ qw( first second third ) ];
Perl takes the class name, Tie::Cycle
, and calls the class method, TIESCALAR
, giving it the rest of the arguments to tie
:
my $secret_object = Tie::Cycle->TIESCALAR( [ qw( first second third ) ] );
After it gets the secret object, it associates it with the variable $class
.
When $class
goes out of scope, Perl translates that into another method call on the secret object, calling its DESTROY
method:
$secret_object->DESTROY;
Or, I can decide that I don’t want my variable to be tied anymore. By calling untie
, I break the association between the secret object and the variable. Now $color
is just a normal scalar:
untie $class;
Perl translates that into the call to UNTIE
, which breaks the association between the secret object and the variable:
$secret_object->UNTIE;
Tied scalars are the easiest to implement since scalars don’t do that much. I can either store or access scalar data. For my special scalar behavior, I have to create two methods: STORE
, which Perl calls when I assign a value, and FETCH
, which Perl calls when I access the value. Along with those, I provide TIESCALAR
, which Perl calls when I use tie
, and possibly the DESTROY
or UNTIE
methods if I need them.
The TIESCALAR
method works like any other constructor. It gets the class name as its first argument, then a list of the remaining arguments. Those come directly from tie
.
In my Tie::Cycle
example, everything after the variable name that I’m tying (that is, the class name and the remaining arguments) ends up as the arguments to TIESCALAR
. Other than the method name, this looks like a normal constructor. Perl handles all the tying for me, so I don’t have to do that myself:
tie $class, 'Tie::Cycle', [ qw( first second third ) ];
That’s almost the same as calling TIESCALAR
myself:
my $object = Tie::Cycle->TIESCALAR( [ qw( first second third ) ] );
However, since I didn’t use tie
, all I get is the object, and Perl doesn’t know anything about the special interface. It’s just a normal object.
In Tie::Cycle
(available on CPAN), the start of the module is quite simple. I have to declare the package name, set up the usual module bits, and define my TIESCALAR
. I chose to set up the interface to take two arguments: the class name and an anonymous array. There isn’t anything special in that choice. TIESCALAR
is going to get all of the arguments from tie
and it’s up to me to figure out how to deal with them, including how to enforce the interface.
In this example, I’m simple-minded: I ensure that I have an array reference and that it has more than one argument. Like any other constructor, I return a blessed reference. Even though I’m tying a scalar, I use an anonymous array as my object. Perl doesn’t care what I do as long as I’m consistent. I call the internal STORE
method from TIESCALAR
so I only have that logic in one place:
package Tie::Cycle; use strict; our $VERSION = '1.19'; use Carp qw(carp); use constant CURSOR_COL => 0; use constant COUNT_COL => 1; use constant ITEM_COL => 2; sub _cursor { $_[0]->[CURSOR_COL] } sub _count { $_[0]->[COUNT_COL] } sub _item { $_[0]->[ITEM_COL][ $_[1] // $_[0]->_cursor ] } sub TIESCALAR { my( $class, $list_ref ) = @_; my $self = bless [], $class; unless( $self->STORE( $list_ref ) ) { carp "The argument to Tie::Cycle must be an array reference"; return; } return $self; }
Once I have my tied variable, I use it just like I would any other variable of that type. I use my tied scalar just like any other scalar. I already stored an anonymous array in the object, but if I wanted to change that, I simply assign to the scalar. In this case, I have to assign an anonymous array:
$colors = [ qw(FF0000 00FF00 0000FF) ];
Behind the curtain, Perl calls my STORE
method when I assign to the variable. Again, I don’t get to choose this method name, and I have to handle everything myself:
sub STORE { my( $self, $list_ref ) = @_; return unless ref $list_ref eq ref []; my @shallow_copy = map { $_ } @$list_ref; $self->[CURSOR_COL] = 0; $self->[COUNT_COL] = scalar @shallow_copy; $self->[ITEM_COL] = \@shallow_copy; }
Every time I try to get the value of the scalar, Perl calls FETCH
. As before, I have to do all of the work to figure out how to return a value. I can do anything that I like as long as I return a value. In Tie::Cycle
, I have to figure out which index I need to access, then return that value. I increment the index, figure out the index modulo the number of elements in the array, then return the right value:
sub FETCH { my( $self ) = @_; my $index = $self->[CURSOR_COL]++; $self->[CURSOR_COL] %= $self->_count; return $self->_item( $index ); }
That’s all I have to do. I could create an UNTIE (or DESTROY) method, but I didn’t create any messes I have to clean up so I don’t do that for Tie::Cycle
. There isn’t any additional magic for those. Everything that you already know about DESTROY
works the same here.
If you look in the actual Tie::Cycle
source, you’ll find additional methods. I can’t get to these through the tie
interface, but with the object form I can. They aren’t part of the tie
magic, but since it’s really just an object I can do object-oriented sorts of things, including adding methods. For example, the previous
method gets me the previous value from the list without affecting the current index. I can peek without changing anything:
my $previous = tied( $colors )->previous;
The tied
gets me the secret object and I immediately call a method on it instead of storing it in a variable. I can do the same thing, using next
to peek at the next element:
my $next = tied( $colors )->next;
And, as I showed earlier, I can reset the cycle:
tied( $colors )->reset;
I’ll create a tied scalar that sets an upper bound on the magnitude of the integer, meaning that there is some range around zero that I can store in the variable. To create the class to implement the tie
, I do the same thing I had to do for Tie::Cycle
: create TIESCALAR
, STORE
, and FETCH
routines:
package Tie::BoundedInteger; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub TIESCALAR { my $class = shift; my $value = shift; my $max = shift; my $self = bless [ 0, $max ], $class; $self->STORE( $value ); return $self; } sub FETCH { $_[0]->[0] } sub STORE { my $self = shift; my $value = shift; my $magnitude = abs $value; croak( "The [$value] exceeds the allowed limit [$self->[1]]" ) if( int($value) != $value || $magnitude > $self->[1] ); $self->[0] = $value; $value; } 1;
At the user level, I do the same thing I did before. I call tie
with the variable name, the class that implements the behavior, and finally the arguments. In this program, I want to start off with the value 1
, and set the magnitude limit to 3
. Once I do that, I’ll try to assign $number
each of the integer values between -5
and 5
, then print what happened:
#!/usr/bin/perl use Tie::BoundedInteger; tie my $number, 'Tie::BoundedInteger', 1, 3; foreach my $try ( -5 .. 5 ) { my $value = eval { $number = $try }; print "Tried to assign [$try], "; print "but it didn't work, " unless $number == $try; print "value is now [$number]\n"; }
From the output I can see that I start off with the value 1
in $number
, but when I try to assign -5
(a value with a magnitude greater than 3
), it doesn’t work and the value is still 1
. Normally my program would croak
right there, but I used an eval
to catch that error. The same thing happens for -4
. When I try -3
, it works.
Tried to assign [-5], but it didn't work, value is now [1] Tried to assign [-4], but it didn't work, value is now [1] Tried to assign [-3], value is now [-3] Tried to assign [-2], value is now [-2] Tried to assign [-1], value is now [-1] Tried to assign [0], value is now [0] Tried to assign [1], value is now [1] Tried to assign [2], value is now [2] Tried to assign [3], value is now [3] Tried to assign [4], but it didn't work, value is now [3] Tried to assign [5], but it didn't work, value is now [3]
My Tie::BoundedInteger
example changed how I could store values by limiting their values. I can also change how I fetch the values. In this example, I’ll create Tie::Timely
, which sets a lifetime on the value. After that lifetime expires, I’ll get undef
when I access the value.
The STORE
method is easy. I just store whatever value I get. I don’t care if it’s a simple scalar, a reference, an object, or anything else. Every time I store a value, though, I’ll record the current time too. That way every time I change the value I reset the countdown.
In the FETCH
routine, I have two things I can return. If I’m within the lifetime of the value, I return the value. If I’m not, I return nothing at all:
package Tie::Timely; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub TIESCALAR { my $class = shift; my $value = shift; my $lifetime = shift; my $self = bless [ undef, $lifetime, time ], $class; $self->STORE( $value ); return $self; } sub FETCH { time - $_[0]->[2] > $_[0]->[1] ? () : $_[0]->[0] } sub STORE { @{ $_[0] }[0,2] = ( $_[1], time ) } 1;
I set up tied arrays just like I do tied scalars, but I have extra methods to create since I can do more with arrays. My implementation has to handle the array operators (shift, unshift, push, pop, splice) as well as the other array operations we often take for granted:
Getting or setting the last array index
Extending the array
Checking that an index exists
Deleting a element
Clearing all the values
Once I decide that I want to implement my own array behavior, I own all of those things. I don’t really have to define methods for each of those operations, but some things won’t work unless I do. The Tie::Array
module exists as a bare-bones base class that implements most of these things, although only to croak
if a program tries to use something I haven’t implemented. Table 1 shows how some array operations translate to tie methods (and perltie has the rest). Most of the methods have the same name as the Perl operator, although in all caps.
Action | Array operation | Tie method |
Set value | $a[$i] = $n | STORE( $i, $n ) |
Get value | $n = $a[$i]; | FETCH( $i ) |
Array length | $l = $#a; | FETCHSIZE() |
Pre-extend | $#a = $n; | STORESIZE( $n ) |
Add to end | push @a, @n | PUSH( @n ); |
Remove from end | pop @a; | POP() |
When I talked about tying scalars, I showed my Tie::Cycle
module, which treated an array like a scalar. To be fair, I should go the other way by treating a scalar as an array. Instead of storing several array elements, each of which incurs all of the overhead of a scalar variable, I’ll create one scalar and chop it up as necessary to get the array values. Essentially, my example trades memory space for speed. I’ll reuse my bounded integer example since I can make a number less than 256 fit into a single character. That’s convenient, isn’t it?
package Tie::StringArray; use strict; use Carp qw(croak); use vars qw( $VERSION ); $VERSION = 1.0; sub _null { "\x00" } sub _last () { $_[0]->FETCHSIZE - 1 } sub _normalize_index { $_[1] == abs $_[1] ? $_[1] : $_[0]->_last + 1 - abs $_[1] } sub _store { chr $_[1] } sub _show { ord $_[1] } sub _string { ${ $_[0] } } sub TIEARRAY { my( $class, @values ) = @_; my $string = ''; my $self = bless \$string, $class; my $index = 0; $self->STORE( $index++, $_ ) foreach ( @values ); $self; } sub FETCH { my $index = $_[0]->_normalize_index( $_[1] ); $index > $_[0]->_last ? () : $_[0]->_show( substr( $_[0]->_string, $index, 1 ) ); } sub FETCHSIZE { length $_[0]->_string } sub STORESIZE { my $self = shift; my $new_size = shift; my $size = $self->FETCHSIZE; if( $size > $new_size ) { # truncate $$self = substr( $$self, 0, $size ); } elsif( $size < $new_size ) { # extend $$self .= join '', ($self->_null) x ( $new_size - $size ); } } sub STORE { my $self = shift; my $index = shift; my $value = shift; croak( "The magnitude of [$value] exceeds the allowed limit [255]" ) if( int($value) != $value || $value > 255 ); $self->_extend( $index ) if $index >= $self->_last; substr( $$self, $index, 1, chr $value ); $value; } sub _extend { my $self = shift; my $index = shift; $self->STORE( 0, 1 + $self->_last ) while( $self->_last >= $index ); } sub EXISTS { $_[0]->_last >= $_[1] ? 1 : 0 } sub CLEAR { ${ $_[0] } = '' } sub SHIFT { $_[0]->_show( substr ${ $_[0] }, 0, 1, '' ) } sub POP { $_[0]->_show( chop ${ $_[0] } ) } sub UNSHIFT { my $self = shift; foreach ( reverse @_ ) { substr ${ $self }, 0, 0, $self->_store( $_ ) } } sub PUSH { my $self = shift; $self->STORE( 1 + $self->_last, $_ ) foreach ( @_ ) } sub SPLICE { my $self = shift; my $arg_count = @_; my( $offset, $length, @list ) = @_; if( 0 == $arg_count ) { ( 0, $self->_last ) } elsif( 1 == $arg_count ) { ( $self->_normalize_index( $offset ), $self->_last ) } elsif( 2 <= $arg_count ) { # offset and length only ( $self->_normalize_index( $offset ), do { if( $length < 0 ) { $self->_last - $length } else { $start + $length - 1 } } ) } #@removed = map { $self->POP } $start .. $end; if( wantarray ) { @removed; } else { defined $removed[-1] ? $removed[-1] : undef; } } 1;
To make this work, I’ll treat each position in my string as an array element. To store a value, In STORE
the arguments are the index for the value and the value itself. I need to convert the value to a character and put that character in the right position in the string. If I try to store something other than a whole number between 1 and 255, I get an error.
To fetch a value I need to extract the character from the correct position in the string and convert it to a number. The argument to FETCH
is the index of the element so I need to convert that to something I can use with substr
.
Now, for the more complex array operations, I have to do a bit more work. To retrieve a splice, I have to grab several values, but splice
is also an lvalue so I have to be ready to assign those positions more values. Not only that, a user might assign fewer or more values than the splice
extracts, so I have to be ready to shrink or expand the string. That’s not scary, though, since I can already do all of that with a string by using substr
.
Deleting an element is a bit trickier. In a normal array I can have an undefined element. How am I going to handle that in the middle of a string? Amazingly, my example left me a way to handle this: I can store a undef as a null byte. If I had to store numbers between 0 and 255, I would have been in trouble. Curious how that works out.
Perl also lets me extend a tied array. In a normal array, I can extend an array to let Perl know I want it to do the work to make a certain number of elements available (thus explicitly circumventing Perl’s built-in logic to make its best guess about the proper array length). In this example, I just need to extend the string.
I contrived that last example so I could show the whole process without doing anything too tricky. I might want to store an array of characters, and that example would work quite well for that. Now I want to adapt it to store a DNA sequence. My domain changes from 256 things to something much smaller, the set { T C G A }, which represents thymine, cytosine, guanine, and adenine. If I add in the possibility of a NULL (maybe my gene sequencer can’t tell what should be in a particular position), I have five possibilities. I don’t need an entire character for that. I can actually get by with three bits and have a little to spare.
Before I get too deeply into this, let me make a guess about how much memory this can save. A typical DNA sequence has several thousand base pairs. If I used an array for that, I’d have the scalar overhead for each one of those base pairs as a separate scalar. I’ll say that’s 10 bytes, just to be kind. For 10,000 base pairs, which is just a small sequence, that’s 100,000 bytes. That scalar overhead really starts to add up! Now, instead of that, I’ll store everything in a single scalar. I’ll incur the scalar overhead once. For 10,000 base pairs at three bits a pair, that’s 30,000 bits, or 3,750 bytes. I round that off to 4,000 bytes. That’s a factor of 25! Remember, this memory parsimony comes at the expense of speed. I’ll have to do a little bit more computational work.
With six bits I have eight distinct patterns. I need to assign some of those patterns meanings. Fortunately for me, Perl makes this really easy since I can type out binary strings directly as long as I’m using Perl 5.6 or later (see Chapter 16 for more on bit operations):
use constant N => 0b000; use constant T => 0b001; use constant C => 0b100; use constant G => 0b110; use constant A => 0b011; use constant RESERVED1 => 0b111; use constant RESERVED2 => 0b101;
Also, since I’m not using characters anymore, I can’t use substr
. For vec
, I’d have to partition the bits by powers of two, but I’d have to waste another bit for that (and I’m already wasting two). If I do that, I end up with 10 unused patterns. That might be nice if we eventually meet aliens with more complex hereditary encodings, but for now I’ll just stick with what we have.
Before you get scared off by this code, remember what I’m doing. It’s exactly the same problem as the last example where I stored digits as characters in a long string. This time I’m doing it at the bit level with a some more math. My specific example doesn’t matter as much as the concept that I can make anything, and I mean anything, look like an array if I’m willing to do all the work.
package Tie::Array::DNA; use strict; use parent qw(Tie::Array); use Carp qw(croak carp); use vars qw( $VERSION ); $VERSION = 1.0; use constant BITS_PER_ELEMENT => 3; use constant BIT_PERIOD => 24; # 24 bits use constant BYTE_LENGTH => 8; use constant BYTE_PERIOD => 3; # 24 bits my %Patterns = ( T => 0b001, A => 0b011, C => 0b100, G => 0b110, N => 0b000, ); my @Values = (); foreach my $key ( keys %Patterns ) { $Values[ $Patterns{$key} ] = $key } sub _normalize { uc $_[1] } sub _allowed { length $_[1] eq 1 and $_[1] =~ tr/TCGAN// } my %Last; sub TIEARRAY { my( $class, @values ) = @_; my $string = \''; my $self = bless $string, $class; $$self = "\x00" x 10_000; $Last{ "foo" } = -1; my $index = 0; $self->STORE( $index++, $_ ) foreach ( @values ); $self; } sub _get_start_and_length { my( $self, $index ) = @_; my $bytes_to_start = int( $index * BITS_PER_ELEMENT / BYTE_LENGTH ); my $byte_group = int( $bytes_to_start / BYTE_PERIOD ); my $start = $byte_group * BYTE_PERIOD; ( $start, BYTE_PERIOD ) } sub _get_bytes { my( $self, $index ) = @_; my( $start, $length ) = $self->_get_start_and_length( $index ); my @chars = split //, substr( $$self, $start, $length ); (ord( $chars[0] ) << 16) + (ord( $chars[1] ) << 8) + ord( $chars[2] ); } sub _save_bytes { my( $self, $index, $bytes ) = @_; my( $start, $length ) = $self->_get_start_and_length( $index ); my $new_string = join '', map { chr( ( $bytes & ( 0xFF << $_ ) ) >> $_ ) } qw( 16 8 0 ); substr( $$self, $start, $length, $new_string ); } sub _get_shift { BIT_PERIOD - BITS_PER_ELEMENT - ($_[1] * BITS_PER_ELEMENT % BIT_PERIOD); } sub _get_clearing_mask { ~ ( 0b111 << $_[0]->_get_shift( $_[1] ) ) } sub _get_setting_mask { $_[0]->_get_pattern_by_value( $_[2] ) << $_[0]->_get_shift( $_[1] ) } sub _get_selecting_mask { 0b111 << $_[0]->_get_shift( $_[1] ) } sub _get_pattern_by_value { $Patterns{ $_[1] } } sub _get_null_pattern { $Patterns{ 'N' } } sub _get_value_by_pattern { $Values [ $_[1] ] } sub _string { $_[0] } sub _length { length ${$_[0]} } sub _add_to_string { ${$_[0]} .= $_[1] } sub STORE { my( $self, $index, $value ) = @_; $value = $self->_normalize( $value ); carp( qq|Cannot store unallowed element "$value"| ) unless $self->_allowed( $value ); $self->_extend( $index ) if $index > $self->_last; # get the mask my $clear_mask = $self->_get_clearing_mask( $index ); my $set_mask = $self->_get_setting_mask( $index, $value ); # clear the area my $result = ( $self->_get_bytes( $index ) & $clear_mask ) | $set_mask; # save the string my( $start, $length ) = $self->_get_start_and_length( $index ); my $new_string = join '', map { chr( ( $result & ( 0xFF << $_ ) ) >> $_ ) } qw( 16 8 0 ); substr( $$self, $start, $length, $new_string ); $self->_set_last( $index ) if $index > $self->_last; $value } sub FETCH { my( $self, $index ) = @_; # get the right substr my $bytes = $self->_get_bytes( $index ); # get the mask my $select_mask = $self->_get_selecting_mask( $index ); my $shift = $self->_get_shift( $index ); # clear the area my $pattern = 0 + ( ( $bytes & $select_mask ) >> $shift ); $self->_get_value_by_pattern( $pattern ); } sub FETCHSIZE { $_[0]->_last + 1 } sub STORESIZE { $_[0]->_set_last( $_[1] ) } sub EXTEND { } sub CLEAR { ${ $_[0] } = '' } sub EXISTS { $_[1] < $Last{ "foo" } } sub DESTROY { } __PACKAGE__;
This code gets a bit complicated because I have to implement my own array. Since I’m storing everything in a single string, and using the string as a long string of bits instead of characters, I have to come up with a way to get the information that I need.
I’m using three bits per element and characters come with eight bits. To make everything simpler, I decide to deal with everything in three-byte (24-bit) chunks because that’s the lowest common denominator between three-bit and eight-bit chunks of data. I do that in _get_bytes
and _save_bytes
, which figure out which three characters they need to grab. The _get_bytes
method turns the three characters into a single number so I can later use bit operations on it, and the _save_bytes
method goes the other way.
Once I have the number, I need to know how to pull out the three bits. There are eight elements in each group, so _get_selecting_mask
figures out which of those elements I want and returns the right bit mask to select it. That bit mask is just 0b111
shifted up the right number of places. The _get_shift
method handles that in general by using the constants BIT_PERIOD
and BITS_PER_ELEMENT
.
Once I got all of that in place, my FETCH
method can use it to return an element. It gets the bit pattern then looks up that pattern with _get_value_by_pattern
to turn the bits into the symbolic version (i.e. T, A, C, G ).
The STORE
method does all that but the other way around. It turns the symbols into the bit pattern, shifts that up the right amount, and does the right bit operations to set the value. I ensure that I clear the target bits first using the mask, I get back from _get_clearing_mask
. Once I clear the target bits I can use the bit mask from _get_setting_mask
to finally store the element.
Whew! Did you make it this far? I haven’t even implemented all of the array features. How am I going to implement SHIFT
, UNSHIFT
, or SPLICE
? Here’s a hint: remember that Perl has to do this for real arrays and strings. Instead of moving things over every time I affect the front of the data, it keeps track of where it should start, which might not be the beginning of the data. If I wanted to shift off a single element, I just have to add that offset of three bits to all of my computations. The first element would be at bits 3 to 5 instead of 0 to 2. I’ll leave that up to you, though.
Tied hashes are only a bit more complicated than tied arrays, but like all tied variables, I set them up in the same way. I need to implement methods for all of the actions I want my tied hash to handle. Table 2 shows some of the hash operations and their corresponding tied methods.
Action | Hash operation | Tie method |
Set value | $h{$str} = $val; | STORE( $str, $val ) |
Get value | $val = $h{$str}; | FETCH( $str ) |
Delete a key | delete $h{$str}; | DELETE( $str ) |
Check for a key | exists $h{$str}; | EXISTS( $str ) |
Next key | each %h; | NEXTKEY( $str ) |
Clear the hash | %h = (); | CLEAR( $str ) |
One common task, at least for me, is to accumulate a count of something in a hash. One of my favorite examples to show in Perl courses is a word frequency counter. By the time students get to the third day of the Learning Perl course, they know enough to write a simple word counter:
my %hash = (); while( <> ) { chomp; my @words = split; foreach my $word ( @words ) { $hash{$word}++ } } foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) { printf "%4d %-20s\n", $hash{$word}, $word; }
When students actually start to use this, they discover that it’s really not as simple as all that. Words come in different capitalizations, with different punctuation attached to them, and possibly even misspelled. I could add a lot of code to that example to take care of all of those edge cases, but I can also fix that up in the hash assignment itself. I replace my hash declaration with a call to tie
and leave the rest of the program alone:
# my %hash = (); # old way tie my( %hash ), 'Tie::Hash::WordCounter'; while( <> ) { chomp; my @words = split; foreach my $word ( @words ) { $hash{$word}++ } } foreach my $word ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) { printf "%4d %-20s\n", $hash{$word}, $word; }
I can make a tied hash do anything that I like, so I can make it handle those edge cases by normalizing the words I give it when I do the hash assignment. My tiny word counter program doesn’t have to change that much and I can hide all the work behind the tie interface.
I’ll handle most of the complexity in the STORE
method. Everything else will act just like a normal hash, and I’m going to use a hash behind the scenes. I should also be able to access a key by ignoring the case and punctuation issues so my FETCH
method normalizes its argument in the same way:
package Tie::Hash::WordCounter; use strict; use Tie::Hash; use parent qw(Tie::StdHash); our $VERSION = 1.0; sub TIEHASH { bless {}, $_[0] } sub _normalize { my( $self, $key ) = @_; $key =~ s/^\s+//; $key =~ s/\s+$//; $key = lc( $key ); $key =~ s/[\W_]//g; return $key } sub STORE { my( $self, $key, $value ) = @_; $key = $self->_normalize( $key ); $self->{ $key } = $value; } sub FETCH { my( $self, $key ) = @_; $key = $self->_normalize( $key ); $self->{ $key }; } __PACKAGE__;
By now you know what I’m going to say: tied filehandles are like all the other tied variables. Table 3 shows selected file operations and their corresponding tied methods. I simply need to provide the methods for the special behavior I want.
Action | File operation | Tie method |
Print to a filehandle | print FH “…”; | PRINT( @a ) |
Formatted print to a filehandle | printf FH “%s”, @a; | PRINTF( “%s”, @a ) |
Read from a filehandle | $line = <FH>; | READLINE() |
Close a filehandle | close FH; | CLOSE() |
For a small example, I create Tie::File::Timestamp
, which appends a timestamp to each line of output. Suppose I start with a program that already has several print statements. I didn’t write this program, but my task is to add a timestamp to each line:
# old program open LOG, '>>', 'log.txt' or die "Could not open output.txt! $!"; print LOG "This is a line of output\n"; print LOG "This is some other line\n";
I could do a lot of searching and a lot of typing, or I could even get my text editor to do most of the work for me. I’ll probably miss something, and I’m always nervous about big changes. I can make a little change by replacing the filehandle. Instead of open
, I’ll use tie
, leaving the rest of the program as it is:
# new program #open LOG, '>>', 'log.txt' or die "Could not open output.txt! $!"; tie *LOG, 'Tie::File::Timestamp', 'log.txt' or die "Could not open output.txt! $!"; print LOG "This is a line of output\n"; print LOG "This is some other line\n";
Now I have to make the magic work. It’s fairly simple since I only have to deal with four methods. In TIEHANDLE
, I open the file. If I can’t do that, I simply return, triggering the die
in the program since tie
doesn’t return a true value. Otherwise, I return the filehandle reference, which I’ve blessed into my tied class. That’s the object I’ll get as the first argument in the rest of the methods.
My output methods are simple. They’re simple wrappers around the built-in print
and printf
. I use the tie object as the filehandle reference (wrapping it in braces as Perl Best Practices recommends to signal to other people that’s what I mean to do). In PRINT
, I simply add a couple of arguments to the rest of the stuff I pass to print
. The first additional argument is the timestamp, and the second is a space character to make it all look nice. I do a similar thing in PRINTF
, although I add the extra text to the $format
argument:
package Tie::File::Timestamp; use strict; use vars qw($VERSION); use Carp qw(croak); $VERSION = 0.01; sub _timestamp { "[" . localtime() . "]" } sub TIEHANDLE { my $class = shift; my $file = shift; open my $fh, '>>', $file or return; bless $fh, $class; } sub PRINT { my( $self, @args ) = @_; print { $self } $self->_timestamp, " ", @args; } sub PRINTF { my( $self, $format, @args ) = @_; $format = $self->_timestamp . " " . $format; printf { $self } $format, @args; } sub CLOSE { close $_[0] } __PACKAGE__;
Tied filehandles have a glaring drawback, though: I can only do this with filehandles. Since Learning Perl, I’ve been telling you that bareword filehandles are the old way of doing things and that storing a filehandle reference in a scalar is the new and better way.
If I try to use a scalar variable, tie
looks for TIESCALAR
method, along with the other tied scalar methods. It doesn’t look for PRINT
, PRINTF
, and all of the other input/output methods I need. I can get around that with a little black magic that I don’t recommend. I start with a glob reference, *FH
, which creates an entry in the symbol table. I wrap a do
block around it to form a scope and to get the return value (the last evaluated expression). Since I only use the *FH
once, unless I turn off warnings in that area, Perl will tell me that I’ve only used *FH
once. In the tie
, I have to dereference $fh
as a glob reference so tie
looks for TIEHANDLE
instead of TIESCALAR
. Look scary? Good. Don’t do this!
my $fh = \do{ no warnings; local *FH }; my $object = tie *{$fh}, $class, $output_file;
I’ve showed you a lot of tricky code to reimplement Perl data types in Perl. The tie interface lets me do just about anything that I want, but I also then have to do all of the work to make the variables act like people expect them to act. With this power comes great responsibility and a lot of work.
For more examples, inspect the Tie
modules on CPAN. You can peek at the source code to see what they do and steal ideas for your own.
Teodor Zlatanov writes about “Tied Variables” for IBM developerWorks, January 2003, http://www.ibm.com/developerworks/linux/library/l-cptied/index.html.
Phil Crow uses tied filehandles to implement some design patterns in Perl in “Perl Design Patterns” for Perl.com, http://www.perl.com/pub/2003/06/13/design1.html.
Dave Cross writes about tied hashes in “Changing Hash Behaviour with tie” for Perl.com, http://www.perl.com/pub/2001/09/04/tiedhash.html.
Abhijit Menon-Sen uses tied hashes to make fancy dictionaries in “How Hashes Really Work” for Perl.com, http://www.perl.com/2002/10/01/hashes.html.
Randal Schwartz discusses tie
in “Fit to be tied (Parts 1 & 2) for Linux Magazine, March and April 2005, http://www.stonehenge.com/merlyn/LinuxMag/col68.html, and http://www.stonehenge.com/merlyn/LinuxMag/col69.html.
There are several Tie
modules on CPAN, and you can peek at the source code to see what they do and steal ideas for your own. I have a special fondness for Tie::Cycle::Sinewave
although I don’t have a need for it.