Chapter 7. Cleaning Up Perl

Part of mastering Perl is controlling the source code, no matter who gives it to you. People can usually read the code that they wrote, and usually complain about the code that other people wrote. In this chapter I’ll take that code and make it readable. This includes the output of so-called Perl obfuscators, which do much of their work by simply removing whitespace. You’re the programmer and it’s the source and you need to show it who’s boss.

Good Style

I’m not going to give any advice about code style, where to put the braces, or how many spaces to put where. These things are the sparks for heated debates that really do nothing to help you get work done. The perl interpreter doesn’t really care, nor does the computer. But, after all, we write code for people first and computers second.

Good code, in my mind, is something that a skilled practitioner can easily read. It’s important to note that good code is not something that just anyone could read. Code isn’t bad because a novice Perl programmer can’t read it, just like a novel isn’t bad because I don’t know the language. The first assumption has to be that the audience for any code is people who know the language or, if they don’t, know how to look up the parts they need to learn. Along with that, a good programmer should be able to easily deal with source written in the handful of major coding styles.

After that, consistency is a major part of good code. Not only should I try to do the same thing in the same way each time (and that might mean everyone on the team doing it in the same way), but I should format it in the same way each time. I should use the same variable names for the same data structure in different parts of the code. Of course, there are edge cases and special situations, but for the most part, doing things the same way each time helps the new reader recognize what I’m trying to do.

Lastly, I like a lot of whitespace in my code, and I did even before my eyesight started to get bad. Spaces separate tokens and blank lines separate groups of lines that go together, just as if I were writing prose. This book would certainly be hard to read without paragraph breaks; code has the same problem.

I have my own particular style that I like, but I’m not opposed to using another style. If I edit code or create a patch for somebody else’s code, I try to mimic their style. Remember, consistency is the major factor in good style. Adding my own style to existing code makes it inconsistent.

If you haven’t developed your own style or haven’t had one forced on you, the perlstyle documentation as well as Perl Best Practices can help you set standards for you and your coding team.

perltidy

The perltidy program reformats Perl programs to make them easier to read. Given a mess of code with odd indentation styles (or no indentation at all), little or no whitespace between tokens, and all other manner of obfuscation, perltidy creates something readable.

Here’s a short piece of code that I’ve intentionally written with bad style (actually, I wrote it normally then removed all of the good formatting). I haven’t done anything to obfuscate the program other than remove all the whitespace I could without breaking things:

#!/usr/bin/perl
# yucky
use strict;use warnings;my %Words;while(<>){chomp;s{^\s+}{};s{\s+$}{};
my $line=lc;my @words=split/\s+/,$line;foreach my $word(@words){
$word=~s{\W}{}g;next unless length $word;$Words{$word}++;}}foreach
my $word(sort{$Words{$b}<=>$Words{$a}}keys %Words){last
if $Words{$word}<10;printf"%5d  %s\n",$Words{$word},$word;}

If somebody else handed me this program, could I tell what the program does? I might know what it does, but not how it does it. Certainly I could read it slowly and carefully keep track of things in my head, or I could start to add newlines between statements. That’s work, though, and too much work even for this little program.

I save this program in a file I name yucky and run it through perltidy using its default options. perltidy won’t overwrite my file, but instead creates yucky.tdy with the reformatted code.

% perltidy yucky

Here’s the result of perltidy’s reformatting, which uses the suggestions from the perlstyle documentation:

#!/usr/bin/perl
# yucky
use strict;
use warnings;
my %Words;
while (<>) {
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    my $line = lc;
    my @words = split /\s+/, $line;
    foreach my $word (@words) {
        $word =~ s{\W}{}g;
        next unless length $word;
        $Words{$word}++;
    }
}
foreach my $word ( sort { $Words{$b} <=> $Words{$a} } keys %Words ) {
    last
      if $Words{$word} < 10;
    printf "%5d  %s\n", $Words{$word}, $word;
}

Maybe I’m partial to the GNU coding style, though, so I want that format instead. I give perltidy the -gnu switch:

% perltidy -gnu yucky

Now the braces and indentation are a bit different, but it’s still more readable than the original:

#!/usr/bin/perl
# yucky
use strict;
use warnings;
my %Words;
while (<>)
{
    chomp;
    s{^\s+}{};
    s{\s+$}{};
    my $line = lc;
    my @words = split /\s+/, $line;
    foreach my $word (@words)
    {
        $word =~ s{\W}{}g;
        next unless length $word;
        $Words{$word}++;
    }
}
foreach my $word (sort { $Words{$b} <=> $Words{$a} } keys %Words)
{
    last
      if $Words{$word} < 10;
    printf "%5d  %s\n", $Words{$word}, $word;
}

I can get a bit fancier by asking perltidy to format the program as HTML. The -html option doesn’t reformat the program but just adds HTML markup and applies a stylesheet to it. To get the fancy output on the reformatted program, I convert the yucky.tdy to HTML:

% perltidy yucky
% perltidy -html yucky.tdy

perltidy can do quite a bit more, too. It has options to minutely control the formatting to suit personal preference, and many options to send the output from one place to another, including an in-place editing feature.
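The perltidy command is a front end to the Perl::Tidy module, so I can also call the formatter from my own program. This is only a sketch: it assumes Perl::Tidy is installed, the one-line sample source is made up for the demonstration, and I pass -npro so that a personal .perltidyrc doesn’t change the result:

```perl
#!/usr/bin/perl
# tidy-string.pl (hypothetical name), a sketch that assumes Perl::Tidy is installed
use strict;
use warnings;

# A made-up scrap of cramped code to reformat
my $source = 'my %h=(a=>1);foreach my $k(sort keys %h){print"$k\n";}';
my $tidied = '';

if ( eval { require Perl::Tidy; 1 } ) {
    # perltidy() returns 0 on success; anything else means it hit a problem
    my $error = Perl::Tidy::perltidy(
        source      => \$source,
        destination => \$tidied,
        argv        => '-npro -gnu',    # ignore .perltidyrc, use the GNU style
    );
    die "perltidy could not process the source\n" if $error;
    print $tidied;
}
else {
    print "Perl::Tidy is not installed\n";
}
```

Reading from and writing to scalar references keeps everything in memory, which is handy when the code to reformat never lives in a file of its own.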

De-obfuscation

Some people have the odd notion that they should make their Perl code harder to read. Sometimes they do this because they want to hide secrets, such as code to handle license management, or they don’t want people to distribute the code without their permission. Whatever their reason, they end up doing work that gets them nothing. The people who don’t know how to get the source back aren’t worrisome, and those who do will just be more interested in the challenge.

De-encoding Hidden Source

Perl code is very easy to reverse engineer since no matter what a code distributor does to the source, Perl still has to be able to run it. There isn’t a step where I can compile the code and get an object or bytecode file that I can distribute without the original source.

If Perl can get to the source, so can I with a little work. If you’re spending your time trying to hide your source from the people you’re giving it to, you’re wasting your time.

A favorite tactic of Perl obfuscators is also the favorite tactic of people who like to win Obfuscated Perl contests. That is, the Perl community does for sport what people try to sell you, so the Perl community has a lot of tricks to undo the damage so they can understand the contest entries.

I’ll show you the technique working forward first. Once you know the trick, it’s just monkey coding to undo it (annoying, but still tractable). Here’s a file japh-plaintext.pl:

#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";

I want to take that file and translate all of the characters so they become some other character. I’ll use ROT-13, which moves each letter over 13 places and wraps around at the end of the alphabet. A real obfuscator will be more robust and handle special cases such as delimiters, but I don’t need to worry about that. I’m interested in defeating ones that have already done that work. I just read a file named on the command line and output an encoded version:

#!/usr/bin/perl
# japh-encoder-rot13.pl

my $source = do {
    local $/; open my $fh,
    $ARGV[0] or die "$!"; <$fh>
    };

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print $source;

What I get out looks like what I imagine might be some extraterrestrial language:

% perl japh-encoder-rot13.pl japh-p*
#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";

I can’t run this program because it’s no longer Perl. I need to add some code at the end that will turn it back into Perl source. That code has to undo the transformation, then use the string form of eval to execute the decoded string as (the original) code:

#!/usr/bin/perl
# japh-encoder-decoder-rot13.pl

my $source = do {
    local $/; open my $fh,
    $ARGV[0] or die "$!"; <$fh>
    };

$source =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval \$v;
HERE

Now my encoded program comes with the code to undo the damage. A real obfuscator would also compress whitespace and remove other aids to reading, but my output will do fine for this demonstration:

% perl japh-encoder-decoder-rot13.pl japh-plaintext.pl
my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;

That’s the basic idea. The output still has to be Perl code, and it’s only a matter of the work involved to encode the source. That might be as trivial as my example or use some sort of secret such as a license key to decrypt it. Some things might even use several transformations. Here’s an encoder that works like ROT-13 except over the entire 8-bit range (I call it ROT-255, although the tr actually shifts each byte by 128 places, half of the range):

#!/usr/bin/perl
# japh-encoder-decoder-rot255.pl

my $source = do {
    local $/; open my $fh,
    $ARGV[0] or die "$!"; <$fh>
    };

$source =~ tr/\000-\377/\200-\377\000-\177/;

print <<"HERE";
my \$v = q($source);
\$v =~ tr/\200-\377\000-\177/\000-\377/;
eval \$v;
HERE
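A quick way to see what that tr does is to run every byte value through it. This sketch (the variable names are mine, not part of the encoder) shows that each byte moves 128 places and that a second pass restores the original, so the transformation is its own inverse just as ROT-13 is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every byte value from 0 to 255, once each
my $bytes = join '', map { chr } 0 .. 255;

# The encoder's transliteration
( my $rotated = $bytes ) =~ tr/\000-\377/\200-\377\000-\177/;

# Applying the same transliteration a second time undoes the first
( my $restored = $rotated ) =~ tr/\000-\377/\200-\377\000-\177/;

printf "byte %d becomes byte %d\n",
    ord('A'), ord( substr( $rotated, ord('A'), 1 ) );
print "a second pass restores the original\n" if $restored eq $bytes;
```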

I take the already-encoded output from my ROT-13 program and encode it again. The output is mostly gobbledygook, and I can’t even see some of it on the screen because some 8-bit characters aren’t printable.

% perl japh-encoder-decoder-rot13.pl japh-p* |
    perl japh-encoder-decoder-rot255.pl -
my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
        q(îáâçõòå Ãòåù õîðøòå¬Üá¢»©»¤ö ½þ ôò¯îúáíÎÚÁÍ¯áúÁÚ¯»).
        q(åöáì ¤ö»);
$v =~ tr/-ÿ-/-ÿ/;
eval $v;

Now that I’ve shown you the trick, I’ll work backwards. From the last output there, I see the string eval. I’ll just change that to a print:

my $v = q(íù ¤ö ½ ñ¨£¯èæå¯ïöá¯ãòåù£ ÷îãõãùîöáçòëç®ãùãåöáç ¢×èæç).
        q(îáâçõòå Ãòåù õîðøòå¬Üá¢»©»¤ö ½þ ôò¯îúáíÎÚÁÍ¯áúÁÚ¯»).
        q(åöáì ¤ö»);
$v =~ tr/-ÿ-/-ÿ/;
print $v;

I run that program and get the next layer of encoding:

my $v = q(#/hfe/ova/crey
# wncu-cynvagrkg.cy

cevag "Whfg nabgure Crey unpxre,\a";
);
$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;
eval $v;

I change that eval to a print and I’m back to the original source.

#/usr/bin/perl
# japh-plaintext.pl

print "Just another Perl hacker,\n";

I’ve now defeated the encoding tactic by intercepting the string that it wanted to send to eval. That’s not the only trick out there. I’ll show some more in a moment.
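Swapping eval for print by hand gets tedious when there are many layers, so the same idea can be scripted. The following is a minimal sketch, not a general de-obfuscator: wrap_rot13 and peel_layer are names I made up, and peel_layer assumes every layer ends in exactly the eval $v; line that my encoders emit. It still runs the decoding code in each layer, so I’d only try this on code I’m willing to execute:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap source in one ROT-13 layer, as japh-encoder-decoder-rot13.pl does
sub wrap_rot13 {
    my ($source) = @_;
    ( my $encoded = $source ) =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return join "\n",
        "my \$v = q($encoded);",
        '$v =~ tr/n-za-mN-ZA-M/a-zA-Z/;',
        'eval $v;',
        '';
}

# Remove one layer: turn the final "eval $v;" into a plain "$v;" so that
# evaluating the layer returns the decoded string instead of running it
sub peel_layer {
    my ($layer) = @_;
    ( my $decoder = $layer ) =~ s/^eval \$v;\s*\z/\$v;/m
        or return;
    return eval $decoder;
}

my $original = qq{print "Just another Perl hacker,\\n";\n};
my $wrapped  = wrap_rot13( wrap_rot13($original) );

# Keep peeling until there is no trailing eval left
my $source = $wrapped;
while ( defined( my $inner = peel_layer($source) ) ) {
    $source = $inner;
}
print $source;
```

The loop stops by itself because the plain source doesn’t end in eval $v;, so peel_layer returns nothing and the last decoded string is what I print.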

Unparsing Code with B::Deparse

Not all of these techniques are about looking at other people’s code. Sometimes I can’t figure out why Perl is doing something, so I compile it then decompile it to see what Perl is thinking. The B::Deparse module takes some code, compiles it into Perl’s internal compiled structure, then works backward to get back to the source. The output won’t be the same as the original source since the compiled form doesn’t preserve comments or formatting.

Here’s a bit of code that demonstrates an obscure Perl feature. I know that I can use an alternative delimiter for the substitution operator, so I try to be a bit clever and use the dot as a delimiter. Why doesn’t this do what I expect? I want to get rid of the dot in the middle of the string:

$_ = "foo.bar";
s.\...;
print "$_\n";

I don’t get rid of the dot, however. The f disappears instead of the dot. I’ve escaped the dot, so what’s the problem? Using B::Deparse, I see that Perl sees something different:

% perl -MO=Deparse test
$_ = 'foo.bar';
s/.//;
print "$_\n";
test syntax OK

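The backslash does its first job, which is protecting the character I used as a delimiter so it doesn’t end the pattern, and Perl then strips it. What’s left is a bare dot, the metacharacter that matches any character, instead of a literal dot, so the substitution removes the first character in the string.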
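One way out is to pick a delimiter that doesn’t appear in the pattern, so the backslash is free to do its regular expression job. Here’s a small sketch of both versions side by side (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $clever = "foo.bar";
$clever =~ s.\...;      # Perl compiles this as s/.//, so the f goes away

my $plain = "foo.bar";
$plain =~ s{\.}{};      # braces as delimiters leave \. as a literal dot

print "$clever\n";      # oo.bar
print "$plain\n";       # foobar
```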

Here’s an example from the Stunnix obfuscator at http://www.stunnix.com/prod/po/overview.shtml. It takes Perl source and makes it harder to read by changing variable names, converting strings to hex escapes, and converting numbers to arithmetic. It can also use the encoding trick I showed in the previous section, although this example doesn’t:

#!/usr/bin/perl

=head1 SYNOPSYS

A small program that does trivial things.

=cut
 sub zc47cc8b9f5 { ( my ( $z9e1f91fa38 ) = @_ ) ; print ( ( (
"\x69\x74\x27\x73\x20" . ( $z9e1f91fa38 + time ) ) .
"\x20\x73\x65\x63\x6f\x6e\x64\x73\x20\x73\x69\x6e\x63\x65\x20\x65\x70\x6f\x63\x68\x0a"
 ) ) ; } zc47cc8b9f5 ( (0x1963+ 433-0x1b12) ) ;

It’s trivial to get around most of that with B::Deparse. Its output un-encodes the strings and numbers and outputs them as their readable equivalents:

% perl -MO=Deparse stunnix-do-it-encoded.pl
sub zc47cc8b9f5 {
    my($z9e1f91fa38) = @_;
    print q[it's ] . ($z9e1f91fa38 + time) . " seconds since epoch\n";
}
zc47cc8b9f5 2;

The Stunnix program thinks it’s clever by choosing apparently random strings for identifier names, but Joshua ben Jore’s B::Deobfuscate extends B::Deparse to take care of that too. I can’t get back the original variable names, but I can get something easy to read and match up. Joshua chose to take identifier names from a list of flowers’ names:

% perl -MO=Deobfuscate stunnix-do-it-encoded.pl
sub SacramentoMountainsPricklyPoppy {
    my($Low) = @_;
    print q[it's ] . ($Low + time) . " seconds since epoch\n";
}
SacramentoMountainsPricklyPoppy 2;

B::Deparse doesn’t stop there, either. Can’t remember what those Perl one-liners do? Add the -MO=Deparse to the command and see what comes out:

% perl -MO=Deparse -naF: -le 'print $F[2]'

The deparser adds the code that I specified with the command line switches. The -n adds the while loop, the -a adds the split, and the -F changes the split pattern to the colon. The -l is one of my favorites because it automatically adds a newline to the end of print, and that’s how I get the $\ = "\n":

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(/:/, $_, 0);
    print $F[2];
}
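B::Deparse works from inside a program, too. Its coderef2text method returns the deparsed body of a code reference as a string. In this sketch the anonymous subroutine is something I made up in the style of the obfuscated example above, and the exact text that comes back can vary between Perl versions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

# A made-up subroutine with hex-escaped strings and a number hidden in arithmetic
my $obscured = sub {
    my ($offset) = @_;
    return "\x69\x74\x27\x73\x20"
        . ( $offset + ( 0x1963 + 433 - 0x1b12 ) )
        . "\x20\x73\x65\x63\x6f\x6e\x64\x73\x0a";
};

# coderef2text returns the block, braces included, without the "sub" keyword
my $text = B::Deparse->new->coderef2text($obscured);
print "sub $text\n";
```

Because Perl folds the constant arithmetic and decodes the escapes when it compiles the subroutine, the deparsed text should show the readable string and a plain number rather than what I typed.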

Perl::Critic

In Perl Best Practices, Damian Conway laid out 256 suggestions for writing readable and maintainable code. Jeffrey Thalhammer created Perl::Critic by combining Damian’s suggestions with Adam Kennedy’s PPI, a Perl parser, to create a way for people to find style violations in their code. This isn’t just a tool for cleaning up Perl; it can keep me honest as I develop new code. I don’t have to wait until I’m done to use this. I should check myself (and my coworkers) frequently.

Once I install the Perl::Critic module, I can use the perlcritic command. I looked through my own programs for something that I could use as an example. When I first wrote this book in 2005, I didn’t have a problem finding a program that perlcritic would complain about. For this edition I had to work a little harder, but I found one. Here’s a small program I wrote to find the retweeters of one of my Twitter posts so I could choose one to receive what I am giving away.

This program is exactly like I originally programmed it without any cleansing to make me seem any better:

#!/usr/bin/env perl5.14.2
# retweeter.pl
# https://gist.github.com/briandfoy/5478591
use Net::Twitter;
use v5.10;

die "Specify the original tweet id!\n" unless defined $ARGV[0];

# get your own credentials at https://dev.twitter.com/apps/new
my $nt = Net::Twitter->new(
  traits   => [qw/OAuth API::RESTv1_1/],
    map { $_ => $ENV{"twitter_$_"} || die "ENV twitter_$_ not set" }
        qw(
            consumer_secret
            consumer_key
            access_token
            access_token_secret
            )
    );
die "Could not make Twitter object!\n" unless defined $nt;

my $retweets = $nt->retweets( { id => $ARGV[0], count => 100 } );

say "Found " . @$retweets . " retweets for $ARGV[0]";

my @retweet_users =
    map  { $_->{user}{screen_name} }
    @$retweets;

my $chosen = int rand( @retweet_users );
say "The winner is $retweet_users[$chosen]!";

The violation I get tells me what’s wrong, gives me a page reference for Perl Best Practices, and tells me the severity of the violation. Higher numbers are more severe, with 5 being the most severe. By default, perlcritic only shows what it thinks are the worst problems:

% perlcritic retweeter.pl
Code before strictures are enabled at line 7, column 1.  See page 429 of PBP.  (Severity: 5)

I might feel pretty good that perlcritic only warns me about strictures, something I don’t always add to small programs like this one.

Every Perl::Critic warning is implemented as a policy, which is a Perl module that checks for a particular coding practice. If I don’t understand the short warning I get, I can get more with the --verbose switch and an argument from 1 to 11 to select one of the predefined report formats:

% perlcritic --verbose 9 retweeter.pl
[TestingAndDebugging::RequireUseStrict] Code before strictures are enabled at line 7,
near 'die "Specify the original tweet id!\n" unless defined $ARGV[0];'.  (Severity: 5)

This shows the policy name, the warning, the line of code, and the severity. I can customize that output by giving --verbose a format string. The format looks similar to those I use with printf, and the %p placeholder stands in for the policy name:

% perlcritic --verbose '%p\n' retweeter.pl
TestingAndDebugging::RequireUseStrict

Now that I know the policy name, I can disable it in a .perlcriticrc file that I put in my home directory. I enclose the policy name in square brackets, and prepend a - to the name to signal that I want to exclude it from the analysis:

# perlcriticrc
[-TestingAndDebugging::RequireUseStrict]

When I run perlcritic again, I get the all clear:

% perlcritic --verbose '%p\n' retweeter.pl
retweeter.pl source OK

That taken care of, I can start to look at less severe problems. I step down a level using the --severity switch. As with other debugging work, I take care of the most severe problems before moving on to the lesser ones, so the serious violations aren’t swamped by the picky ones. At the next level, perlcritic tells me I haven’t used Perl’s warnings in this program:

% perlcritic --severity 4 retweeter.pl
Code before warnings are enabled at line 7, column 1.  See page 431 of PBP.  (Severity: 4)

I can also specify the severity levels according to their names. Table 7-1 shows the perlcritic levels. Severity level 4, which is one level below the most severe level, is -stern:

% perlcritic -stern retweeter.pl
Code before warnings are enabled at line 7, column 1.  See page 431 of PBP.  (Severity: 4)

Table 7-1. perlcritic can take a severity number or a name

Number	Name
--severity 5	-gentle
--severity 4	-stern
--severity 3	-harsh
--severity 2	-cruel
--severity 1	-brutal

I find out that the policy responsible for this is TestingAndDebugging::RequireUseWarnings, but I’m neither testing nor debugging, so I have warnings turned off. My .perlcriticrc is now a bit longer:

# perlcriticrc
[-TestingAndDebugging::RequireUseStrict]
[-TestingAndDebugging::RequireUseWarnings]

I can continue the descent in severity to get pickier and pickier warnings. The lower I go, the more obstinate I get. For instance, perlcritic starts to complain about using die instead of croak, although in my program croak does nothing I need since I use die at the top level of the code rather than in subroutines. croak can adjust the report for the caller, but in this case there is no caller:

% perlcritic --severity 3 retweeter.pl
Version string used at line 5, column 1.  Use a real number instead.  (Severity: 3)
"die" used instead of "croak" at line 12, column 36.  See page 283 of PBP.  (Severity: 3)

If I want to keep using perlcritic, I need to adjust my configuration file for this program, but with these lower severity items, I probably don’t want to disable them across all of my perlcritic analyses. Most of my programs shouldn’t get away with these violations. I copy my .perlcriticrc to retweeter-critic-profile and tell perlcritic where to find my new configuration using the --profile switch:

% perlcritic --profile retweeter-critic-profile retweeter.pl

Completely turning off a policy might not always be the best thing to do. There’s a policy that complains about the string form of eval, and avoiding that form is generally a good idea. I do need the string eval for dynamic module loading, though. I need it to use a variable with require, which only takes a string or a bareword:

eval "require $module";

Normally, Perl::Critic complains about that because it doesn’t know that this particular use is the only way to do this. Ricardo Signes created Perl::Critic::Lax for just these situations. It adds a bunch of policies that complain about a construct except in particular uses, such as my eval-require, where it’s a good idea. His policy Perl::Critic::Policy::Lax::ProhibitStringyEval::ExceptForRequire takes care of this one. String evals are still bad, just not in this case.

You can also tell perlcritic to ignore a line by putting a ## no critic comment on that line:

use v5.10; ## no critic

This causes its own violation to replace the one I turned off:

% perlcritic --severity 3 retweeter.pl
Unrestricted '## no critic' annotation at line 5, column 12.  Only disable the Policies you really need to disable.  (Severity: 3)
...

I can tell the comment which policy to turn off by putting the policy name in parentheses (and separating multiple policies with commas).

use v5.10; ## no critic (ValuesAndExpressions::ProhibitVersionStrings)

This is just annoying enough to make me want to change the code instead, even if I don’t like the style that Perl::Critic recommends.

Creating My Own Perl::Critic Policy

That’s just the beginning of Perl::Critic. I’ve already seen how to change its behavior by disabling some policies, but I can also add policies of my own. Every policy is simply a Perl module. The policy modules live under the Perl::Critic::Policy::* namespace and inherit from the Perl::Critic::Policy module. Here’s one that complains when a subroutine returns a literal whole number:

package Perl::Critic::Policy::Subroutines::ProhibitMagicReturnValues;

use strict;
use warnings;
use Perl::Critic::Utils;
use parent 'Perl::Critic::Policy';

our $VERSION = 0.01;

my $desc = q{returning magic values};

sub default_severity  { return $SEVERITY_HIGHEST  }
sub default_themes    { return qw(pbp danger)     }
sub applies_to        { return 'PPI::Token::Word' }


sub violates {
    my( $self, $elem ) = @_;
    return unless $elem eq 'return';
    return if is_hash_key( $elem );

    my $sib = $elem->snext_sibling();

    return unless $sib;
    return unless $sib->isa('PPI::Token::Number');
    return unless $sib =~ m/^\d+\z/;

    return $self->violation( $desc, [ 'n/a' ], $elem );
    }

1;

Once written, I test my policy with perlcritic and I see that I’ve written it without violations:

% perlcritic ProhibitMagicReturnValues.pm
ProhibitMagicReturnValues.pm source OK
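Checking the policy’s own source only shows that the module is clean. It doesn’t show what the policy catches. Reading violates, I’d expect it to flag only a return followed directly by a bare whole number. Here’s a made-up file to try it on, with my expectations in the comments. I don’t show any perlcritic output for it because that depends on the policy being installed where Perl::Critic can find it:

```perl
#!/usr/bin/perl
# magic-returns.pl (hypothetical input for ProhibitMagicReturnValues)
use strict;
use warnings;

sub is_ready { return 1 }                 # expect a violation: bare whole number
sub answer   { return 42 }                # expect a violation
sub nothing  { return }                   # fine: nothing follows return
sub count    { my $n = 3; return $n }     # fine: a variable, not a number token
sub label    { return 'ready' }           # fine: a quoted string
sub ratio    { return 1.5 }               # fine: the \d+ pattern rejects the dot

print join( ' ', is_ready(), answer(), count(), label(), ratio() ), "\n";
```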

There’s much more that I can do with Perl::Critic. With the Test::Perl::Critic module, I can add its analysis to my automated testing. Every time I run make test I find out if I’ve violated the local style. The criticism pragma adds a warnings-like feature to my programs so I get Perl::Critic warnings (if there are any) when I run the program.

Although I might disagree with certain policies, that does not diminish the usefulness of Perl::Critic. It’s configurable and extendable so I can make it fit the local situation. Check the references at the end of this chapter for more information.

Summary

Code might come to me in all sorts of formats, encodings, and other tricks that make it hard to read, but I have many tools to clean it up and figure out what it’s doing. With a little work I can be reading nicely formatted code instead of suffering from the revenge of the programmers who came before me.
