Part of mastering Perl is controlling the source code, no matter who gives it to you. People can usually read the code that they wrote, and usually complain about the code that other people wrote. In this chapter I’ll take that code and make it readable. This includes the output of so-called Perl obfuscators, which do much of their work by simply removing whitespace. You’re the programmer and it’s the source and you need to show it who’s boss.
I’m not going to give any advice about code style, where to put the braces, or how many spaces to put where. These things are the sparks for heated debates that really do nothing to help you get work done. The perl interpreter doesn’t really care, nor does the computer. But, after all, we write code for people first and computers second.
Good code, in my mind, is something that a skilled practitioner can easily read. It's important to note that good code is not something that just anyone could read. Code isn't bad because a novice Perl programmer can't read it, just as a novel isn't bad because I don't know the language it's written in. The first assumption has to be that the audience for any code is people who know the language, and if they don't, know how to look up the parts they need to learn. Along with that, a good programmer should be able to easily deal with source written in the handful of major coding styles.
After that, consistency is a major part of good code. Not only should I try to do the same thing in the same way each time (and that might mean everyone on the team doing it in the same way), but I should format it in the same way each time. I should use the same variable names for the same data structure in different parts of the code. Of course, there are edge cases and special situations, but for the most part, doing things the same way each time helps the new reader recognize what I'm trying to do.
Lastly, I like a lot of whitespace in my code, and I did even before my eyesight started to go bad. Spaces separate tokens and blank lines separate groups of lines that go together, just as if I were writing prose. This book would certainly be hard to read without paragraph breaks; code has the same problem.
I have my own particular style that I like, but I’m not opposed to using another style. If I edit code or create a patch for somebody else’s code, I try to mimic their style. Remember, consistency is the major factor in good style. Adding my own style to existing code makes it inconsistent.
If you haven’t developed your own style or haven’t had one forced on you, the perlstyle documentation as well as Perl Best Practices can help you set standards for you and your coding team.
The perltidy program reformats Perl programs to make them easier to read. Given a mess of code with odd indentation styles (or no indentation at all), little or no whitespace between tokens, and all other manner of obfuscation, perltidy creates something readable.
Here’s a short piece of code that I’ve intentionally written with bad style (actually, I wrote it normally then removed all of the good formatting). I haven’t done anything to obfuscate the program other than remove all the whitespace I could without breaking things:
#!/usr/bin/perl
# yucky

use strict;use warnings;my %Words;while(<>){chomp;s{^\s+}{};s{\s+$}{};
my $line=lc;my @words=split/\s+/,$line;foreach my $word(@words){
$word=~s{\W}{}g;next unless length $word;$Words{$word}++;}}foreach my $word(sort{$Words{$b}<=>$Words{$a}}keys %Words){last if $Words{$word}<10;printf"%5d %s\n",$Words{$word},$word;}
If somebody else handed me this program, could I tell what the program does? I might know what it does, but not how it does it. Certainly I could read it slowly and carefully keep track of things in my head, or I could start to add newlines between statements. That’s work, though, and too much work even for this little program.
I save this program in a file I name yucky and run it through perltidy using its default options. perltidy won't overwrite my file, but instead creates yucky.tdy with the reformatted code.
% perltidy yucky
Here's the result of perltidy's reformatting, which uses the suggestions from the perlstyle documentation:
#!/usr/bin/perl
# yucky

use strict;
use warnings;
my %Words;
while (<>) {
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    my $line = lc;
    my @words = split /\s+/, $line;
    foreach my $word (@words) {
        $word =~ s{\W}{}g;
        next unless length $word;
        $Words{$word}++;
    }
}
foreach my $word ( sort { $Words{$b} <=> $Words{$a} } keys %Words ) {
    last if $Words{$word} < 10;
    printf "%5d %s\n", $Words{$word}, $word;
}
Maybe I'm partial to the GNU coding style, though, so I want that format instead. I give perltidy the -gnu switch:
% perltidy -gnu yucky
Now the braces and indentation are a bit different, but it’s still more readable than the original:
#!/usr/bin/perl
# yucky

use strict;
use warnings;
my %Words;
while (<>)
{
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    my $line = lc;
    my @words = split /\s+/, $line;
    foreach my $word (@words)
    {
        $word =~ s{\W}{}g;
        next unless length $word;
        $Words{$word}++;
    }
}
foreach my $word (sort { $Words{$b} <=> $Words{$a} } keys %Words)
{
    last if $Words{$word} < 10;
    printf "%5d %s\n", $Words{$word}, $word;
}
I can get a bit fancier by asking perltidy to format the program as HTML. The -html option doesn't reformat the program but just adds HTML markup and applies a stylesheet to it. To get the fancy output on the reformatted program, I convert the yucky.tdy to HTML:
% perltidy yucky
% perltidy -html yucky.tdy
perltidy can do quite a bit more, too. It has options to control the formatting in minute detail to suit personal preference, and many options to send the output from one place to another, including in-place editing with the -b switch.
Some people have the odd notion that they should make their Perl code harder to read. Sometimes they do this because they want to hide secrets, such as code to handle license management, or they don’t want people to distribute the code without their permission. Whatever their reason, they end up doing work that gets them nothing. The people who don’t know how to get the source back aren’t worrisome, and those who do will just be more interested in the challenge.
Perl code is very easy to reverse engineer since no matter what a code distributor does to the source, Perl still has to be able to run it. There isn’t a step where I can compile the code and get an object or bytecode file that I can distribute without the original source.
If Perl can get to the source, so can I with a little work. If you’re spending your time trying to hide your source from the people you’re giving it to, you’re wasting your time.
A favorite tactic of Perl obfuscators is also the favorite tactic of people who like to win Obfuscated Perl contests. That is, the Perl community does for sport what people try to sell you, and it has plenty of tricks for undoing the damage so it can understand the contest entries.
I'll show you the technique working forward first. Once you know the trick, it's just monkey coding to undo it (annoying, but still tractable). Here's a file japh-plaintext.pl:
#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";
I want to take that file and transpose all of the characters so they become some other character. I'll use ROT-13, which moves every letter 13 places along and wraps around at the end of the alphabet. A real obfuscator will be more robust and handle special cases such as delimiters, but I don't need to worry about that. I'm interested in defeating ones that have already done that work. I just read a file named on the command line and output an encoded version:
#!/usr/bin/perl
# japh-encoder-rot13.pl

my $source = do {
    local $/;
    open my $fh, $ARGV[0] or die "$!";
    <$fh>
};

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print $source;
What I get out looks like what I imagine might be some extraterrestrial language:
% perl japh-encoder-rot13.pl japh-p*
#/hfe/ova/crey
# wncu-cynvagrkg.cy
cevag "Whfg nabgure Crey unpxre,\a";
I can't run this program because it's no longer Perl. I need to add some code that will turn it back into Perl source. That code has to undo the transformation, then use the string form of eval to execute the decoded string as (the original) code:
#!/usr/bin/perl
# japh-encoder-decoder-rot13.pl

my $source = do {
    local $/;
    open my $fh, $ARGV[0] or die "$!";
    <$fh>
};

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval \$v;
HERE
Now my encoded program comes with the code to undo the damage. A real obfuscator would also compress whitespace and remove other aids to reading, but my output will do fine for this demonstration:
% perl japh-encoder-decoder-rot13.pl japh-plaintext.pl
my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy
cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;
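To see the two mappings on their own, here's a minimal sketch that works on one line held in memory instead of a file (the variable names are mine, not part of the encoder). It applies the encoder's tr/// mapping, then the reverse mapping that the generated wrapper uses to decode:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One line of japh-plaintext.pl; q{} keeps the \n as a literal backslash-n
my $plain = q{print "Just another Perl hacker,\n";};

# Encode with the same character mapping the encoder uses
( my $encoded = $plain ) =~ tr/a-zA-Z/n-za-mN-ZA-M/;

# Decode with the reverse mapping, as the generated wrapper does
( my $decoded = $encoded ) =~ tr/n-za-mN-ZA-M/a-zA-Z/;

# ROT-13 is its own inverse, so applying the encoder's mapping twice also works
( my $twice = $encoded ) =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print "$encoded\n";    # cevag "Whfg nabgure Crey unpxre,\a";
print "$decoded\n";    # print "Just another Perl hacker,\n";
```

Because ROT-13 undoes itself, a decoder could reuse the encoder's mapping; writing the reverse table, as the wrapper does, simply makes the intent clearer.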
That's the basic idea. The output still has to be Perl code; the only question is how much work goes into encoding the source. That might be as trivial as my example or use some sort of secret such as a license key to decrypt it. Some things might even use several transformations. Here's an encoder that works like ROT-13 except over the entire 8-bit range, moving each byte 128 places (the filename calls it ROT-255):
#!/usr/bin/perl
# japh-encoder-decoder-rot255.pl

my $source = do {
    local $/;
    open my $fh, $ARGV[0] or die "$!";
    <$fh>
};

$source =~ tr/\000-\377/\200-\377\000-\177/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/\200-\377\000-\177/\000-\377/;
eval \$v;
HERE
I take the already-encoded output from my ROT-13 program and encode it again. The output is mostly gobbledygook, and I can't even see some of it on the screen because some 8-bit characters aren't printable.
% perl japh-encoder-decoder-rot13.pl japh-p* |
perl japh-encoder-decoder-rot255.pl -
my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
q(îáâçõòå Ãòåù õîðøòå¬Üᢻ©»¤ö ½þ ôò¯îúáíÎÚÁͯáúÁÚ¯»).
q(åöáì ¤ö»);
$v =~ tr/-ÿ-/-ÿ/;
eval $v;
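Moving each byte 128 places sends plain ASCII source entirely into the upper half of the byte range, which is why so much of that output won't display. Here's a minimal sketch of the round trip on an in-memory string, using the same two tr/// mappings as japh-encoder-decoder-rot255.pl (the variable names are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $plain = qq{print "Just another Perl hacker,\\n";\n};

# Move every byte 128 places, wrapping around the 8-bit range
( my $encoded = $plain ) =~ tr/\000-\377/\200-\377\000-\177/;

# Count the bytes that now have the high bit set
my $high_bytes = () = $encoded =~ /[\200-\377]/g;
printf "%d of %d bytes now have the high bit set\n",
    $high_bytes, length $encoded;

# The wrapper's reverse mapping restores the original text
( my $decoded = $encoded ) =~ tr/\200-\377\000-\177/\000-\377/;
print $decoded;
```

Since every byte of the ASCII input lands at 128 or above, the counts match, and the decoded text is the original line.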
Now that I've shown you the trick, I'll work backwards. In the last output, I see the string eval. I'll just change that to a print:
my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
q(îáâçõòå Ãòåù õîðøòå¬Üᢻ©»¤ö ½þ ôò¯îúáíÎÚÁͯáúÁÚ¯»).
q(åöáì ¤ö»);
$v =~ tr/-ÿ-/-ÿ/;
print $v;
I run that program and get the next layer of encoding:
my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;
I change that eval to a print, and I'm back to the original source:
#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";
I've now defeated the encoding tactic by intercepting the string that it wanted to send to eval. That's not the only trick out there. I'll show some more in a moment.
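Hand-editing each layer gets tedious, so here's a minimal sketch that automates the same interception. It assumes every layer ends with the literal statement eval $v; as my encoders produce (a real obfuscator may not be so regular). It swaps that statement for a bare $v; so the string eval hands back the decoded text instead of running it, and repeats until no layer is left. It still runs each layer's own decoding code, so I'd only do this with code I'm willing to run anyway:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The wrapper that japh-encoder-decoder-rot13.pl printed earlier
my $code = <<'WRAPPER';
my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;
WRAPPER

# Peel off layers for as long as the code ends by eval-ing $v. Each pass
# replaces "eval $v;" with "$v;", so the string eval returns the decoded
# source as its last value instead of executing it.
while ( $code =~ /\beval \$v;/ ) {
    ( my $unwrapped = $code ) =~ s/\beval \$v;/\$v;/;
    $code = eval $unwrapped;
    die "Could not unwrap: $@" unless defined $code;
}

print $code;
```

Feeding it the doubly encoded output instead should peel both layers, since the ROT-255 wrapper ends with the same statement.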
Not all of these techniques are about looking at other people's code. Sometimes I can't figure out why Perl is doing something, so I compile it and then decompile it to see what Perl is thinking. The B::Deparse module takes some code, compiles it into Perl's internal compiled structure, then works backward to get back to the source. The output won't be the same as the original source, since the compiled form doesn't preserve comments, formatting, or all of the original syntax.
Here’s a bit of code that demonstrates an obscure Perl feature. I know that I can use an alternative delimiter for the substitution operator, so I try to be a bit clever and use the dot as a delimiter. Why doesn’t this do what I expect? I want to get rid of the dot in the middle of the string:
$_ = "foo.bar";

s.\...;

print "$_\n";
I don't get rid of the dot, however. The f disappears instead of the dot. I've escaped the dot, so what's the problem? Using B::Deparse, I see that Perl sees something different:
% perl -MO=Deparse test
$_ = 'foo.bar';
s/.//;
print "$_\n";
test syntax OK
The backslash only stops Perl from treating that dot as the delimiter. Perl removes the backslash before it hands the pattern to the regex engine, so the engine sees a bare dot, which matches any character: the f, in this case.
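Here's a minimal sketch of the two behaviors side by side; picking a delimiter that doesn't appear in the pattern avoids the whole problem:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dot as the delimiter: the backslash only marks the middle dot as "not the
# delimiter" and is then removed, so the pattern is a bare dot (any character)
my $dotted = "foo.bar";
$dotted =~ s.\...;

# Braces as the delimiter: \. reaches the regex engine and means a literal dot
my $braced = "foo.bar";
$braced =~ s{\.}{};

print "$dotted\n";    # oo.bar
print "$braced\n";    # foobar
```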
Here's an example from http://www.stunnix.com/prod/po/overview.shtml. It takes Perl source and makes it harder to read by changing variable names, converting strings to hex escapes, and converting numbers to arithmetic. It can also use the encoding trick I showed in the previous section, although this example doesn't:
#!/usr/bin/perl

=head1 SYNOPSYS

A small program that does trivial things.

=cut

sub zc47cc8b9f5 { ( my ( $z9e1f91fa38 ) = @_ ) ; print ( ( ( "\x69\x74\x27\x73\x20" . ( $z9e1f91fa38 + time ) ) . "\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a" ) ) ; } zc47cc8b9f5 ( (0x1963+ 433-0x1b12) ) ;
It's trivial to get around most of that with B::Deparse. Its output decodes the strings and numbers and shows them as their readable equivalents:
% perl -MO=Deparse stunnix-do-it-encoded.pl
sub zc47cc8b9f5 {
my($z9e1f91fa38) = @_;
print q[it's ] . ($z9e1f91fa38 + time) . " seconds since epoch\n";
}
zc47cc8b9f5 2;
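I don't have to deparse a whole file, either. B::Deparse also has an object interface, and its coderef2text method turns a single subroutine reference back into source. Here's a minimal sketch with the encoded subroutine pasted in unchanged; the exact text can vary between perl versions, and it may include pragma lines such as use strict:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

# The Stunnix-encoded subroutine from the example, pasted in as-is
sub zc47cc8b9f5 { ( my ( $z9e1f91fa38 ) = @_ ) ; print ( ( ( "\x69\x74\x27\x73\x20" . ( $z9e1f91fa38 + time ) ) . "\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a" ) ) ; }

# Deparse just this subroutine instead of the whole file
my $deparse = B::Deparse->new;
my $body    = $deparse->coderef2text( \&zc47cc8b9f5 );

print "sub zc47cc8b9f5 $body\n";
```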
The Stunnix program thinks it's clever by choosing apparently random strings for identifier names, but Joshua ben Jore's B::Deobfuscate extends B::Deparse to take care of that too. I can't get back the original variable names, but I can get something easy to read and match up. Joshua chose to take identifier names from a list of flowers' names:
% perl -MO=Deobfuscate stunnix-do-it-encoded.pl
sub SacramentoMountainsPricklyPoppy {
my($Low) = @_;
print q[it's ] . ($Low + time) . " seconds since epoch\n";
}
SacramentoMountainsPricklyPoppy 2;
B::Deparse doesn't stop there, either. Can't remember what those Perl one-liners do? Add -MO=Deparse to the command and see what comes out:
% perl -MO=Deparse -naF: -le 'print $F[2]'
The deparser adds the code that I specified with the command-line switches. The -n adds the while loop, the -a adds the split, and the -F changes the split pattern to the colon. The -l is one of my favorites because it chomps each input line and automatically adds a newline to the end of print, which is where the chomp and the $\ = "\n" come from:
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(/:/, $_, 0);
    print $F[2];
}
In Perl Best Practices, Damian Conway laid out 256 suggestions for writing readable and maintainable code. Jeffrey Thalhammer created Perl::Critic by combining Damian's suggestions with Adam Kennedy's PPI, a Perl parser, to create a way for people to find style violations in their code. This isn't just a tool for cleaning up Perl; it can keep me honest as I develop new code. I don't have to wait until I'm done to use this. I should check myself (and my coworkers) frequently.
Once I install the Perl::Critic module, I can use the perlcritic command. I looked through my own programs for something that I could use as an example. When I first wrote this book in 2005, I didn't have a problem finding a program that perlcritic would complain about. For this edition I had to work a little harder, but I found one. Here's a small program I wrote to find the retweeters of one of my Twitter posts so I could pick one of them to receive something I was giving away.
This program is exactly as I originally wrote it, without any cleanup to make me look better:
#!/usr/bin/env perl5.14.2
# retweeter.pl
# https://gist.github.com/briandfoy/5478591
use Net::Twitter;
use v5.10;

die "Specify the original tweet id!\n" unless defined $ARGV[0];

# get your own credentials at https://dev.twitter.com/apps/new
my $nt = Net::Twitter->new(
	traits => [qw/OAuth API::RESTv1_1/],
	map { $_ => $ENV{"twitter_$_"} || die "ENV twitter_$_ not set" }
		qw(
			consumer_secret
			consumer_key
			access_token
			access_token_secret
			)
	);

die "Could not make Twitter object!\n" unless defined $nt;

my $retweets = $nt->retweets( { id => $ARGV[0], count => 100 } );
say "Found " . @$retweets . " retweets for $ARGV[0]";

my @retweet_users = map { $_->{user}{screen_name} } @$retweets;

my $chosen = int rand( @retweet_users );
say "The winner is $retweet_users[$chosen]!";
The violation I get tells me what’s wrong, gives me a page reference for Perl Best Practices, and tells me the severity of the violation. Higher numbers are more severe, with 5 being the most severe. By default, perlcritic only shows what it thinks are the worst problems:
% perlcritic retweeter.pl
Code before strictures are enabled at line 7, column 1. See page 429 of PBP. (Severity: 5)
I might feel pretty good that perlcritic only warns me about strictures, something I don’t always add to small programs like this one.
Every Perl::Critic warning is implemented as a policy, which is a Perl module that checks for a particular coding practice. If I don't understand the short warning I get, I can get more with the --verbose switch and an argument from 1 to 11 to specify the amount of information I want:
% perlcritic --verbose 9 retweeter.pl
[TestingAndDebugging::RequireUseStrict] Code before strictures are enabled at line 7,
near 'die "Specify the original tweet id!\n" unless defined $ARGV[0];'. (Severity: 5)
This shows the policy name, the warning, the line of code, and the severity. I can customize that output by giving --verbose a format string. The format looks similar to those I use with printf, and the %p placeholder stands in for the policy name:
% perlcritic --verbose '%p\n' retweeter.pl
TestingAndDebugging::RequireUseStrict
Now that I know the policy name, I can disable it in a .perlcriticrc file that I put in my home directory. I enclose the policy name in square brackets, and prepend a - to the name to signal that I want to exclude it from the analysis:
# perlcriticrc
[-TestingAndDebugging::RequireUseStrict]
When I run perlcritic again, I get the all clear:
% perlcritic --verbose '%p\n' retweeter.pl
retweeter.pl source OK
That taken care of, I can start to look at less severe problems. I step down a level using the --severity switch. As with other debugging work, I take care of the most severe problems before moving on to the lesser ones, since at lower levels a few severe problems can be swamped by many minor violations. At the next level, perlcritic tells me I haven't used Perl's warnings in this program:
% perlcritic --severity 4 retweeter.pl
Code before warnings are enabled at line 7, column 1. See page 431 of PBP. (Severity: 4)
I can also specify the severity levels by name. Table 1 shows the perlcritic levels. Severity level 4, one level below the most severe, is -stern:
% perlcritic -stern retweeter.pl
Code before warnings are enabled at line 7, column 1. See page 431 of PBP. (Severity: 4)
Number       | Name
--severity 5 | -gentle
--severity 4 | -stern
--severity 3 | -harsh
--severity 2 | -cruel
--severity 1 | -brutal
I find out that the policy responsible for this is TestingAndDebugging::RequireUseWarnings, but I'm neither testing nor debugging, so I leave warnings off. My .perlcriticrc is now a bit longer:
# perlcriticrc
[-TestingAndDebugging::RequireUseStrict]
[-TestingAndDebugging::RequireUseWarnings]
I can continue the descent in severity to get pickier and pickier warnings. The lower I go, the more obstinate perlcritic gets. For instance, it starts to complain about using die instead of croak, although croak does nothing for me here. croak reports an error from the perspective of the caller, but I use die at the top level of the program rather than in subroutines, so there is no caller:
% perlcritic --severity 3 retweeter.pl
Version string used at line 5, column 1. Use a real number instead. (Severity: 3)
"die" used instead of "croak" at line 12, column 36. See page 283 of PBP. (Severity: 3)
If I want to keep using perlcritic, I need to adjust my configuration for this program, but I probably don't want to disable these lower-severity policies across all of my perlcritic analyses. Most of my programs shouldn't get away with these violations. I copy my .perlcriticrc to retweeter-critic-profile and tell perlcritic where to find the new configuration with the --profile switch:
% perlcritic --profile retweeter-critic-profile retweeter.pl
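A profile is an ordinary .perlcriticrc-style file. Here's a sketch of what retweeter-critic-profile might contain: the two exclusions from my home-directory file, a global severity setting, and exclusions for the two severity 3 complaints this program trips. The names ValuesAndExpressions::ProhibitVersionStrings and ErrorHandling::RequireCarping are the policies I expect behind those messages; I'd confirm them with --verbose '%p\n' before relying on this:

```
# retweeter-critic-profile
severity = 3

[-TestingAndDebugging::RequireUseStrict]
[-TestingAndDebugging::RequireUseWarnings]
[-ValuesAndExpressions::ProhibitVersionStrings]
[-ErrorHandling::RequireCarping]
```

Global settings such as severity go before the first bracketed section.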
Completely turning off a policy might not always be the best thing to do. There's a policy that complains about using eval with a string argument, and that's generally a good idea. I do use the string eval for dynamic module loading, though, because require converts a module name such as Foo::Bar into a filename only when it's a bareword; given a variable, it treats the value as a filename:
eval "require $module";
Normally, Perl::Critic complains about that because it can't tell this deliberate use apart from a dangerous one. Ricardo Signes created Perl::Critic::Lax for just these situations. It adds policies that complain about a construct except in specific uses that are a good idea, such as my eval-require. His policy Perl::Critic::Policy::Lax::ProhibitStringyEval::ExceptForRequire takes care of this one. String evals are still bad, just not in this case.
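Here's a sketch of the matching .perlcriticrc entries, assuming Perl::Critic::Lax is installed and that BuiltinFunctions::ProhibitStringyEval is the name of the stock policy I'm replacing (the --verbose '%p\n' trick confirms the name). The first section turns off the strict policy and the second names the lax one in its place:

```
# .perlcriticrc
[-BuiltinFunctions::ProhibitStringyEval]
[Lax::ProhibitStringyEval::ExceptForRequire]
```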
You can also tell perlcritic to ignore a line by putting a ## no critic comment on that line:
use v5.10; ## no critic
This causes its own violation to replace the one I turned off:
% perlcritic --severity 3 retweeter.pl
Unrestricted '## no critic' annotation at line 5, column 12. Only disable the Policies you really need to disable. (Severity: 3)
...
I can tell the comment which policy to turn off by putting the policy name in parentheses (and separating multiple policies with commas).
use v5.10; ## no critic (ValuesAndExpressions::ProhibitVersionStrings)
This is just annoying enough to make me want to change the code instead, even if I don't like the style that Perl::Critic recommends.
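Changing the code is an option for the eval-require from earlier, too. If I do require's bareword conversion myself, turning Foo::Bar into Foo/Bar.pm, I can hand require that filename and skip the string eval entirely. Here's a minimal sketch; module_to_file is a hypothetical helper of mine, and it rejects anything that doesn't look like a module name before loading it (on CPAN, Module::Runtime packages this idea up as require_module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a module name such as "List::Util"
# into the relative path "List/Util.pm" that require searches for in @INC
sub module_to_file {
    my ($module) = @_;
    die "Not a module name: $module\n"
        unless $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    ( my $file = $module ) =~ s{::}{/}g;
    return "$file.pm";
}

my $module = 'List::Util';    # stands in for a name known only at run time
my $file   = module_to_file($module);

# Given an expression instead of a bareword, require loads a file path,
# so there's no string eval for Perl::Critic to complain about
require $file;

print "Loaded $module from $INC{$file}\n";
```

Unlike the bareword form, require with a path doesn't call the module's import method, so I'd call $module->import myself if I needed it.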
That's just the beginning of Perl::Critic. I've already shown how to change the way it works by disabling some policies, but I can also add policies of my own. Every policy is simply a Perl module. The policy modules live under the Perl::Critic::Policy::* namespace and inherit from the Perl::Critic::Policy module. Here's one that complains whenever return is followed directly by an integer literal, as in return 1:
package Perl::Critic::Policy::Subroutines::ProhibitMagicReturnValues;

use strict;
use warnings;

# :severities supplies $SEVERITY_HIGHEST; :classification supplies is_hash_key
use Perl::Critic::Utils qw{ :severities :classification };
use parent 'Perl::Critic::Policy';

our $VERSION = 0.01;

my $desc = q{returning magic values};

sub default_severity { return $SEVERITY_HIGHEST   }
sub default_themes   { return qw(pbp danger)      }
sub applies_to       { return 'PPI::Token::Word'  }

sub violates {
    my( $self, $elem ) = @_;

    # only look at the word "return", and not when it's a hash key
    return unless $elem eq 'return';
    return if is_hash_key( $elem );

    # the next significant token must be a plain integer literal
    my $sib = $elem->snext_sibling();
    return unless $sib;
    return unless $sib->isa('PPI::Token::Number');
    return unless $sib =~ m/^\d+\z/;

    return $self->violation( $desc, [ 'n/a' ], $elem );
}

1;
Once it's written, I run perlcritic on the policy's own source and see that I've written it without violations:
% perlcritic ProhibitMagicReturnValues.pm
ProhibitMagicReturnValues.pm source OK
There's much more that I can do with Perl::Critic. With the Test::Perl::Critic module, I can add its analysis to my automated testing, so every time I run make test I find out if I've violated the local style. The criticism pragma adds a warnings-like feature to my programs so I get Perl::Critic warnings (if there are any) when I run the program.
Although I might disagree with certain policies, that does not diminish the usefulness of Perl::Critic. It's configurable and extendable so I can make it fit the local situation. Check the references at the end of this chapter for more information.
Code might come to me in all sorts of formats, encodings, and other tricks that make it hard to read, but I have many tools to clean it up and figure out what it’s doing. With a little work I can be reading nicely formatted code instead of suffering from the revenge of the programmers who came before me.
See the perltidy site for more details and examples: http://perltidy.sourceforge.net/. You can install perltidy by installing the Perl::Tidy module. It also has plug-ins for Vim and Emacs, as well as other editors.
The perlstyle documentation is a collection of Larry Wall’s style points. You don’t have to follow his style, but most Perl programmers seem to. Damian Conway gives his own style advice in Perl Best Practices.
Josh McAdams wrote “Perl Critic” for The Perl Review 2.3 (Summer 2006), http://www.theperlreview.com.
Perl::Critic has its own website where you can upload code for it to analyze: http://perlcritic.com/. It also has a project page hosted at Tigris: http://perlcritic.tigris.org/.