Chapter 15. Working with Pod

Perl has a default documentation format called Plain Old Documentation, or Pod for short. I can use it directly in my programs, and even between segments of code. Other programs can easily pick out the Pod and translate it into more familiar formats, such as HTML, text, or even PDF. I’ll show some of the most used features of Pod, how to test my Pod, and how to create my own Pod translator.

The Pod Format

Sean Burke, the same person responsible for most of what I’ll cover in this chapter, completely specified the Pod format in the perlpodspec documentation page. This is the gory-details version of the specification and how to parse it, which I’ll do in this chapter. The material in Learning Perl and Intermediate Perl is just the basics covered in the higher-level perlpod documentation page.

Directives

A Pod directive starts with an equal sign, =, at the beginning of a line at any point where Perl is expecting a new statement (so not in the middle of a statement). When Perl is trying to parse a new statement but sees that =, it switches to parsing Pod. Perl continues to parse the Pod until it reaches the =cut directive or the end of the file:

#!/usr/bin/perl

=encoding utf8

=head1 First level heading

Here's a line of code that won't execute:

    print "How'd you see this!?\n";

=over 4

=item First item

=item Second item

=back

=cut

print "This line executes\n";

My Pod doesn’t have to show up in one chunk, either, so I can intersperse code and Pod. For instance, I like to put the documentation for my subroutines right next to the subroutine:

#!/usr/bin/perl

=encoding utf8

=head1 First level heading

Here's a line of code that won't execute:

    print "How'd you see this!?\n";

=cut

=over 4

=item foo

=cut

sub foo { ... }

=item bar

=cut

sub bar { ... }

=back

=cut

print "This line executes\n";

This makes Pod the easiest way to comment out multiple lines of code:

print "This line runs\n";

=pod

open my $fh, '>:utf8', $filename or die ...;
print { $fh } Dumper( $big_data_structure );
close $fh;

=cut

print "This line runs too\n";

Encoding

I can tell a Pod translator which encoding I used, with UTF-8 (and its subset ASCII) being common:

=encoding utf8

I can declare other encodings, and some parsers can recognize certain encodings from their byte order mark. If a parser recognizes one encoding but I tell it something else, it might not do the correct thing.

I should put the =encoding at the start of my document and use it only once. The Pod specification says that translators should treat a second =encoding as an error.

Since the modern Pod translators can handle UTF-8, I can use literal characters instead of interior sequences, which I show later in this chapter.

Body Elements

Inside the text of the Pod, interior sequences specify nonstructural markup: text that should be displayed in a particular typeface, or special characters. Each of these starts with a letter, which specifies the type of sequence, and has its content in angle brackets. For instance, in Pod I use E<lt> to specify a literal <. If I want italic text (if the formatter supports that) I use I<...>:

=encoding utf8

=head1

Alberto Simões helped review I<Mastering Perl>.

In HTML, I would write E<lt>iE<gt>Mastering PerlE<lt>/iE<gt> to
get italics.

=cut
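To watch a translator resolve these sequences, here is a minimal sketch that feeds a short Pod string to Pod::Simple::Text, one of the formatters in the Pod::Simple distribution. The render_pod_as_text helper and the sample sentence are my own inventions for illustration; only the Pod::Simple methods are real:

```perl
use strict;
use warnings;

use Pod::Simple::Text;

# Hypothetical helper: turn a string of Pod into plain text.
sub render_pod_as_text {
    my( $pod ) = @_;

    my $parser = Pod::Simple::Text->new;
    $parser->output_string( \my $text );   # collect the output in a string
    $parser->parse_string_document( $pod );

    return $text;
    }

my $pod = join "\n\n",
    '=pod',
    'In HTML I would write E<lt>iE<gt>Mastering PerlE<lt>/iE<gt>.',
    "=cut\n";

print render_pod_as_text( $pod );
```

The E<lt> and E<gt> sequences should come out as literal angle brackets in the text. How a formatter shows I<...> or B<...> is up to that formatter, so I don't rely on any particular rendering of those here.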

Translating Pod

I have two ways to turn Pod into some other format: use a ready-made translator or write my own. I might even do both at once by modifying something that already exists. If I need to add something extra to the basic Pod format, I’ll have to create something to parse it.

Fortunately, Sean Burke has already done most of the work by creating Pod::Simple, which can parse normal Pod as well as my personal extensions to it, as long as I follow the basic ideas and extend Pod::Simple with a subclass.

Pod Translators

Perl comes with several Pod translators already. You’ve probably used one without even knowing it; the perldoc command is really a tool to extract the Pod from a document and format it for you. Typically it formats it for your terminal settings, perhaps using color or other character features:

% perldoc Some::Module

That’s not all that perldoc can do, though. Since it’s formatting its output for the terminal window, when I redirect the output to a file it doesn’t look right. The headings, for one thing, come out weird.

% perldoc CGI > cgi.txt
% more cgi.txt
CGI(3)     User Contributed Perl Documentation     CGI(3)

NNAAMMEE
       CGI - Simple Common Gateway Interface Class

Using the -t switch, I can tell perldoc to output plaintext instead of formatting it for the screen.

% perldoc -t CGI > cgi.txt
% more cgi.txt

NAME
    CGI - Simple Common Gateway Interface Class

Stepping back even further, perldoc can decide not to format anything. The -m switch simply outputs the source file, which can be handy if I want to see the source but don’t want to find the file myself; perldoc searches through @INC looking for it. perldoc can do all of this because it’s really just an interface to other Pod translators. The perldoc program is really simple because it’s just a wrapper around Pod::Perldoc, which I can see by using perldoc to look at its own source:

% perldoc -m perldoc
#!/usr/bin/perl
    eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
        if 0;

# This "perldoc" file was generated by "perldoc.PL"

require 5;
BEGIN { $^W = 1 if $ENV{'PERLDOCDEBUG'} }
use Pod::Perldoc;
exit( Pod::Perldoc->run() );

The Pod::Perldoc module is just code to parse the command-line options and dispatch to the right subclass, such as Pod::Perldoc::ToText. What else is there? To find the directory for these translators, I use the -l switch:

% perldoc -l Pod::Perldoc::ToText
/usr/local/lib/perl5/5.8.4/Pod/Perldoc/ToText.pm

% ls /usr/local/lib/perl5/5.8.4/Pod/Perldoc
BaseTo.pm       ToChecker.pm    ToNroff.pm      ToRtf.pm        ToTk.pm
GetOptsOO.pm    ToMan.pm        ToPod.pm        ToText.pm       ToXml.pm

Want all that as a Perl one-liner?

% perldoc -l Pod::Perldoc::ToText | perl -MFile::Basename=dirname \
    -e 'print dirname( <> )' | xargs ls

I could make that a bit shorter on my Unix machines since they have a dirname utility already (but it’s not a Perl program):

% perldoc -l Pod::Perldoc::ToText | xargs dirname  | xargs ls

If you don’t have a dirname utility, here’s a quick Perl program that does the same thing, and it looks quite similar to the dirname program in the Perl Power Tools. It’s something I use often when moving around the Perl library directories:

#!/usr/bin/perl
use File::Basename qw(dirname);
print dirname( $ARGV[0] );

Just from that, I can see that I can translate Pod to nroff (that’s the stuff going to my terminal), to text, RTF, XML, and a bunch of other formats. In a moment I’ll create another one.

perldoc doesn’t have switches to go to all of those formats, but its -o switch can specify a format. Here I want it in XML format, so I use -oxml, and add the -T switch, which just tells perldoc to dump everything to standard output. I could have also used -d to send it to a file.

% perldoc -T -oxml CGI

I don’t have to stick to those formatters, though. I can make my own. I could use my own formatting module with the -M switch to pull in Pod::Perldoc::ToRtf for instance:

% perldoc -MPod::Perldoc::ToRtf CGI

Pod::Perldoc::ToToc

Now I have everything in place to create my own Pod formatter. For this example, I want a table of contents from the Pod input. I can discard everything else, but I want the text from the =head directives, and I want the text to be indented in outline style. I’ll follow the naming sequence of the existing translators and name mine Pod::Perldoc::ToToc. I’ve even put it on CPAN. I actually used this module to help me write this book.

The start of my own translator is really simple. I look at one of the other translators and do what they do until I need to do something differently. This turns out to be really easy because most of the hard work happens somewhere else:

package Pod::Perldoc::ToToc;
use strict;
use parent qw(Pod::Perldoc::BaseTo);

use Pod::TOC;

use warnings;
no warnings;

our $VERSION = '1.10';

sub is_pageable        { 1 }
sub write_with_binmode { 0 }
sub output_extension   { 'toc' }

sub parse_from_file {
    my( $self, $file, $output_fh ) = @_; # Pod::Perldoc object

    my $parser = Pod::TOC->new();

    $parser->output_fh( $output_fh );

    $parser->parse_file( $file );
    }

1;

For my translator I inherit from Pod::Perldoc::BaseTo. This handles almost everything that is important. It connects what I do in parse_from_file to perldoc’s user interface. When perldoc tries to load my module, it checks for parse_from_file because it will try to call it once it finds the file it will parse. If I don’t have that subroutine, perldoc will move on to the next formatter in its list. That -M switch I used earlier doesn’t tell perldoc which formatter to use; it just adds it to the front of the list of formatters that perldoc will try to use.

In parse_from_file, the first argument is a Pod::Perldoc object. I don’t use that for anything. Instead I create a new parser object from my Pod::TOC module, which I’ll show in the next section. That module inherits from Pod::Simple and most of its interface comes directly from Pod::Simple.

The second argument is the filename I’m parsing, and the third argument is the filehandle, which should get my output. After I create the parser, I set the output destination with $parser->output_fh(). The Pod::Perldoc::BaseTo module expects output on that filehandle and will be looking for it. I shouldn’t simply print to STDOUT, which would bypass the Pod::Perldoc output mechanism, and the module will complain that I didn’t send it any output. Again, I get the benefit of all of the inner workings of the Pod::Perldoc infrastructure. If the user wanted to save the output in a file, that’s where $output_fh points. Once I have that set up, I call $parser->parse_file(), and all the magic happens.

Pod::Simple

I didn’t have to actually parse the Pod in my TOC creator because I use Pod::Simple behind the scenes. It gives me a simple interface that allows me to do things when certain events occur. All of the other details about breaking apart the Pod and determining what those pieces represent happen somewhere else, where I don’t have to deal with them. Here’s the complete source for my Pod::TOC module to extract the table of contents from a Pod file.

package Pod::TOC;
use strict;

use parent qw( Pod::Simple );

our $VERSION = '1.10';

BEGIN {
    my @Head_levels = 0 .. 4;

    my %flags = map { ( "head$_", $_ ) } @Head_levels;

    foreach my $directive ( keys %flags ) {
        no strict 'refs';

        *{"_start_$directive"} = sub {
            $_[0]->_set_flag( "_start_$directive" );
            print { $_[0]->output_fh } "\t" x ( $_[0]->_get_flag - 1 )
            };

        *{"_end_$directive"}   = sub {
            $_[0]->_set_flag( "_end_$directive" );
            print { $_[0]->output_fh } "\n"
            };
        }

    sub _is_valid_tag { exists $flags{ $_[1] } }
    sub _get_tag      {        $flags{ $_[1] } }
    }

sub _handle_element {
    my( $self, $element, $args ) = @_;

    my $caller_sub = ( caller(1) )[3];
    return unless $caller_sub =~ s/.*_(start|end)$/_${1}_$element/;

    my $sub = $self->can( $caller_sub );

    $sub->( $self, $args ) if $sub;
    }

sub _handle_element_start {
    my $self = shift;
    $self->_handle_element( @_ );
    }

sub _handle_element_end {
    my $self = shift;
    $self->_handle_element( @_ );
    }

sub _handle_text {
    return unless $_[0]->_get_flag;

    print { $_[0]->output_fh } $_[1];
    }


{
my $Flag;

sub _get_flag { $Flag }

sub _set_flag {
    my( $self, $caller ) = @_;

    return unless $caller;

    my $on  = $caller =~ m/\A_start_/ ? 1 : 0;
    my $off = $caller =~ m/\A_end_/   ? 1 : 0;

    unless( $on or $off ) { return };

    my( $tag ) = $caller =~ m/\A_.*?_(.*)/g;

    return unless $self->_is_valid_tag( $tag );

    $Flag = do {
           if( $on  ) { $self->_get_tag( $tag ) } # set the flag if we're on
        elsif( $off ) { undef }                   # clear if we're off
        };

    }
}

1;

The Pod::TOC module inherits from Pod::Simple. Most of the action happens when Pod::Simple parses the module. I don’t have a parse_file subroutine that I need for Pod::Perldoc::ToToc because Pod::Simple already has it and I don’t need it to do anything different.

What I need to change, however, is what Pod::Simple will do when it runs into the various bits of Pod. Allison Randal wrote Pod::Simple::Subclassing to show the various ways to subclass the module, and I’m only going to use the easiest one. When Pod::Simple runs into a Pod element, it calls a subroutine named _handle_element_start with the name of the element, and when it finishes processing that element, it calls _handle_element_end in the same way. When it encounters text within an element, it calls _handle_text. Behind the scenes, Pod::Simple figures out how to join all the text so I can handle them as logical units (e.g. a whole paragraph) instead of layout units (e.g. a single line with possibly more lines to come later).
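To see those three callbacks fire, here is a minimal sketch of a subclass that only records the events it receives. The Local::EventLogger name, its events method, and the local_events key are hypothetical names I chose for illustration; they are not part of Pod::TOC or Pod::Simple:

```perl
use strict;
use warnings;

package Local::EventLogger;   # hypothetical name, for illustration only
use parent qw( Pod::Simple );

# Pod::Simple calls these three methods as it walks the document.
sub _handle_element_start { push @{ $_[0]{local_events} }, "start:$_[1]" }
sub _handle_element_end   { push @{ $_[0]{local_events} }, "end:$_[1]"   }
sub _handle_text          { push @{ $_[0]{local_events} }, "text:$_[1]"  }

# Return the recorded events in the order they arrived.
sub events { @{ $_[0]{local_events} || [] } }

package main;

my $parser = Local::EventLogger->new;
$parser->parse_string_document( "=head1 NAME\n\nSome body text.\n" );

print "$_\n" for $parser->events;
```

Among the lines this prints, the heading shows up as start:head1, then text:NAME, then end:head1. Nothing in the start or end events carries the heading text, which is why a handler has to remember where it is.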

My _handle_element_start and _handle_element_end are just wrappers around _handle_element. I’ll figure out which one it is by looking at caller. In _handle_element, I take the calling subroutine stored in $caller_sub and pick out either start or end. I put that together with the element name, which is in $element. I end up with things such as _start_head1 and _end_head3 in $caller_sub. I need to show a little more code to see how I handle those subroutines.
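The caller trick is easier to follow in isolation. This sketch uses a hypothetical Local::CallerDemo package with made-up helper names (who_called_me and method_name_for), and applies the same substitution that _handle_element uses:

```perl
use strict;
use warnings;

package Local::CallerDemo;   # hypothetical package, for illustration only

# Report the fully qualified name of whichever subroutine called me.
sub who_called_me { ( caller(1) )[3] }

sub _handle_element_start { who_called_me() }
sub _handle_element_end   { who_called_me() }

# Apply the same substitution as _handle_element to build a method name.
sub method_name_for {
    my( $caller_sub, $element ) = @_;
    return unless $caller_sub =~ s/.*_(start|end)$/_${1}_$element/;
    return $caller_sub;
    }

package main;

# prints Local::CallerDemo::_handle_element_start
print Local::CallerDemo::_handle_element_start(), "\n";

# prints _end_head3
print Local::CallerDemo::method_name_for(
    Local::CallerDemo::_handle_element_end(), 'head3' ), "\n";
```

The greedy .* swallows the package name and everything up to the last underscore, so only the start or end part survives into the new name.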

When I get the begin or end event, I don’t get the text inside that element, so I have to remember what I’m processing so _handle_text knows what to do. Every time Pod::Simple runs into text, no matter if it’s a =headN directive, a paragraph in the body, or something in an item list, it calls _handle_text. For my table of contents, I only want to output text when it’s from a =head directive. That’s why I have a bit of indirection in _handle_text.

In the foreach loop, I go through the different levels of the =head directive. Inside that loop, I make two subroutines for every one of those levels: _start_head0, _end_head0, _start_head1, _end_head1, and so on. I use a symbolic reference (see Chapter 8) to create the subroutine names dynamically, and assign an anonymous subroutine to the typeglob for that name (see Chapter 9).
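Stripped of the Pod handling, the technique of assigning anonymous subroutines to computed typeglob names looks like this. The Local::Generated package and the strings its subroutines return are placeholders I made up for the sketch:

```perl
use strict;
use warnings;

package Local::Generated;   # hypothetical package, for illustration only

BEGIN {
    foreach my $level ( 1 .. 3 ) {
        no strict 'refs';   # allow a string to act as a symbol name

        # Each closure remembers its own copy of $level.
        *{"_start_head$level"} = sub { "start of level $level" };
        *{"_end_head$level"}   = sub { "end of level $level"   };
        }
    }

package main;

# prints "start of level 2"
print Local::Generated->_start_head2, "\n";

# prints "found", since can() sees the generated subroutine
print Local::Generated->can( '_end_head3' ) ? "found\n" : "missing\n";
```

Because the generated names land in the package's symbol table, can() finds them just as it finds subroutines defined the ordinary way, which is what lets _handle_element look them up by name.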

Each of those subroutines is simply going to set a flag. When a _start_headN subroutine runs, it turns on the flag, and when the _end_headN subroutine runs, it turns off the same flag. That all happens in _set_flag, which sets $Flag.

My _handle_text routine looks at $Flag to decide what to do. If it’s a true value, it outputs the text, and if it’s false, it doesn’t. This is what I can use to turn off output for all of the text that doesn’t belong to a heading. Additionally, I’ll use $Flag to determine the indentation level of my table of contents by putting the =head level in it.

So, in order of execution: when I run into =head1, Pod::Simple calls _handle_element_start. From that, I immediately dispatch to _handle_element, which figures out that it’s the start, and it knows it just encountered a =head1. From that, _handle_element figures out it needs to call the _start_head1 subroutine I dynamically created. _start_head1 calls _set_flag( '_start_head1' ), which figures out based on the argument to turn on $Flag. Next, Pod::Simple runs into a bit of text, so it calls _handle_text, which checks _get_flag and gets a true value. It keeps going and prints to the output filehandle. After that, Pod::Simple is done with =head1, so it calls _handle_element_end, which dispatches to _handle_element, which then calls _end_head1. When _end_head1 runs, it calls _set_flag, which turns off $Flag. This sequence happens every time Pod::Simple encounters =head directives.
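The same flag idea can be sketched without the generated subroutines or the caller lookup. This is a deliberately simplified variant, not the Pod::TOC module itself: the Local::SimpleTOC name, its toc method, and the local_* hash keys are all hypothetical, and it keeps the flag in the parser object and collects lines in an array instead of printing to the output filehandle:

```perl
use strict;
use warnings;

package Local::SimpleTOC;   # hypothetical name; not the Pod::TOC module
use parent qw( Pod::Simple );

sub _handle_element_start {
    my( $self, $element ) = @_;
    return unless $element =~ /\Ahead([1-4])\z/;

    $self->{local_level} = $1;    # the flag doubles as the heading level
    $self->{local_text}  = '';
    }

sub _handle_text {
    my( $self, $text ) = @_;
    return unless $self->{local_level};   # ignore text outside headings
    $self->{local_text} .= $text;
    }

sub _handle_element_end {
    my( $self, $element ) = @_;
    return unless $element =~ /\Ahead[1-4]\z/;

    my $indent = "\t" x ( $self->{local_level} - 1 );
    push @{ $self->{local_toc} }, $indent . $self->{local_text};
    $self->{local_level} = undef;         # turn the flag off again
    }

# Return the collected table of contents lines.
sub toc { @{ $_[0]{local_toc} || [] } }

package main;

my $parser = Local::SimpleTOC->new;
$parser->parse_string_document(
    "=head1 NAME\n\nIntro text.\n\n=head2 Details\n\nMore text.\n"
    );

print "$_\n" for $parser->toc;
```

For that input, the table of contents has NAME at the left margin and Details indented by one tab. The body paragraphs never appear because the flag is off whenever their text arrives.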

Subclassing Pod::Simple

I wrote this book using the Pod format, but one that O’Reilly Media has extended to meet its publishing needs. For instance, O’Reilly added an N<> sequence for footnotes. Pod::Simple can still handle those, but it needs to know what to do when it finds them.

Allison Randal created Pod::PseudoPod as an extension of Pod::Simple. It handles those extra things O’Reilly added, and serves as a much longer example of a subclass. I subclassed her module to create Local::DocBook, which I used to create the XML sources that the O’Reilly Atlas publishing system uses.

Pod in Your Web Server

Andy Lester wrote the Apache::Pod module (based on Apache::Perldoc by Rich Bowen) so he could serve the Perl documentation from his Apache web server and read it with his favorite browser. I certainly like this more than paging to a terminal, and I get the benefits of everything the browser gives me, including display styling, search, and links to the modules or URLs the documentation references.

Sean Burke’s Pod::Webserver makes its own web server to translate Pod for the web. It uses Pod::Simple to do its work and should run anywhere that Perl will run. If I don’t want to install Apache I can still have my documentation server.

Testing Pod

Once I’ve written my Pod, I can check it to ensure that I’ve done everything correctly. When other people read my documentation, they shouldn’t get any warnings about formatting, and a Pod error shouldn’t keep them from reading it because the parser gets confused. What good is the documentation if the user can’t even read it?

Checking Pod

Pod::Checker is another sort of Pod translator, although instead of spitting out the Pod text in another format, it watches the Pod and text go by. When it finds something suspicious, it emits warnings. Perl already comes with podchecker, a ready-to-use program similar to perl -c, but for Pod. The program is really just a command-line version of Pod::Checker, which is just another subclass of Pod::Parser.

% podchecker Module.pm

The podchecker program is good for manual use, and I guess that somebody might want to use it in a shell script, but I can also check errors directly through Pod::Simple. While parsing the input, Pod::Simple keeps track of the errors it encounters. I can look at these errors later:

*** WARNING: preceding non-item paragraph(s) at line 47 in file test.pod
*** WARNING: No argument for =item at line 153 in file test.pod
*** WARNING: previous =item has no contents at line 255 in file test.pod
*** ERROR: =over on line 23 without closing =back (at head2) at line 255 in file test.pod
*** ERROR: empty =head2 at line 283 in file test.pod
Module.pm has 2 pod syntax errors.
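Checking for errors directly through Pod::Simple might look something like this sketch. The pod_errors helper is my own invention. The any_errata_seen method is part of Pod::Simple, but reading the errata hash out of the parser object is an assumption on my part: it is how Test::Pod gets at the messages, not a documented interface:

```perl
use strict;
use warnings;

use Pod::Simple;

# Hypothetical helper: return the messages Pod::Simple recorded for a
# string of Pod, sorted by line number.
sub pod_errors {
    my( $pod ) = @_;

    my $checker = Pod::Simple->new;
    $checker->output_string( \my $ignored );   # discard formatted output
    $checker->parse_string_document( $pod );

    return unless $checker->any_errata_seen;

    # Assumption: the errata live in a hash keyed by line number, which
    # is how Test::Pod reads them; this is not a documented interface.
    my $errata = $checker->{errata} || {};

    my @messages;
    foreach my $line ( sort { $a <=> $b } keys %$errata ) {
        push @messages, map { "line $line: $_" } @{ $errata->{$line} };
        }

    return @messages;
    }

# A =back with no matching =over is a Pod error.
print "$_\n" for pod_errors( "=head1 NAME\n\n=back\n" );
```

I use the helper in list context, so an empty list means the Pod parsed cleanly.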

A long time ago, I wanted to do this automatically for all of my modules, so I created Test::Pod. It’s been almost completely redone by Andy Lester, who now maintains the module. I can drop a t/pod.t file into my test directory:

use Test::More;
eval "use Test::Pod 1.00";
plan skip_all => "Test::Pod 1.00 required for testing POD" if $@;
all_pod_files_ok();

Pod Coverage

After I’ve checked the format of my documentation, I also want to ensure that I’ve actually documented everything. The Pod::Coverage module finds all of the functions in a package and tries to match those to the Pod it finds. After skipping any special function names and excluding the function names that start with an underscore, the Perl convention for indicating private methods, it complains about anything left undocumented.

The easiest invocation is directly from the command line. For instance, I use the -M switch to load the CGI module. I also use the -M switch to load Pod::Coverage but I tack on the =CGI to tell it which package to check. Finally, since I don’t really want to run any program, I use -e 1 to give perl a dummy program:

% perl -MCGI -MPod::Coverage=CGI -e 1

The output gives the CGI module a rating, then lists all of the functions for which it didn’t see any documentation:

CGI has a Pod::Coverage rating of 0.04
The following are uncovered: add_parameter, all_parameters, binmode, can,
cgi_error, compile, element_id, element_tab, end_form, endform, expand_tags,
init, initialize_globals, new, param, parse_params, print, put, r,
save_request, self_or_CGI, self_or_default, to_filehandle, upload_hook

I can write my own program, which I’ll call podcoverage, to go through all of the packages I specify on the command line. That rating comes from the coverage method, which either returns a number between 0 and 1, or undef if it couldn’t rate the module:

#!/usr/bin/perl

use Pod::Coverage;

foreach my $package ( @ARGV ) {
    my $checker = Pod::Coverage->new(
        package => $package
        );

    my $rating = $checker->coverage;

    if( defined $rating and $rating == 1 ) {
        print "$package gets a perfect score!\n\n";
        }
    elsif( defined $rating ) {
        print "$package gets a rating of $rating\n",
            "Uncovered functions:\n\t",
            join( "\n\t", sort $checker->uncovered ),
            "\n\n";
        }
    else {
        print "$package can't be rated: ", $checker->why_unrated, "\n";
        }
    }

When I use this to test Module::NotThere and HTML::Parser, my program tells me that it can’t rate the first because it can’t find any Pod and it finds a couple of undocumented functions in HTML::Parser.

% podcoverage Module::NotThere HTML::Parser
Module::NotThere can't be rated: couldn't find pod
HTML::Parser gets a rating of 0.925925925925926
Uncovered functions:
    init
    netscape_buggy_comment

My podcoverage program really isn’t all that useful, though. It might help me find hidden functions in modules, but I don’t really want to depend on those since they might disappear in later versions. I can use podcoverage to check my own modules to ensure I’ve explained all of my functions, but that would be tedious.

Fortunately, Andy Lester automated the process with Test::Pod::Coverage, which is based on Pod::Coverage. By creating a test file that I drop into the t directory of my module distribution, I automatically test the Pod coverage each time I run make test. I lift this snippet right out of the documentation. It first tests for the presence of Test::Pod::Coverage before it tries anything, making the whole thing optional for the user who doesn’t have that module installed, just like the Test::Pod module.

use Test::More;
eval "use Test::Pod::Coverage 1.00";
plan skip_all => "Test::Pod::Coverage 1.00 required for testing POD coverage" if $@;
all_pod_coverage_ok();

Hiding and Ignoring Functions

I mentioned earlier that I could hide functions from these Pod checks. Perl doesn’t have a way to distinguish between public functions that I should document and other people should use, and private functions that I don’t intend users to see. The Pod coverage tests just see functions.

That’s not the whole story, though. Inside Pod::Coverage is the wisdom of which functions it should ignore. For instance, all of the special Tie:: functions (see Chapter 17) are really private functions. By convention, all functions starting with an underscore (e.g. _init) are private functions for internal use only, so Pod::Coverage ignores them. If I want to create private functions, I put an underscore in front of their names.

I can’t always hide functions, though. Consider my earlier Pod::Perldoc::ToToc subclass. I had to override the parse_from_file method so it would call my own parser. I don’t really want to document that function because it does the same thing as the method in the parent class but with a different formatting module. Most of the time the user doesn’t call it directly, and the documentation for parse_from_file in the Pod::Perldoc::BaseTo superclass already covers what it does. I can tell Pod::Coverage to ignore certain names or names that match a regular expression.

my $checker = Pod::Coverage->new(
    package => $package,
    private      => [ qr/\A_/ ],
    also_private => [ qw(init import DESTROY AUTOLOAD) ],
    trustme      => [ qr/\Aget_/ ],
    );

The private key takes a list of regular expressions. It’s intended for the truly private functions. The also_private key is just a list of strings for the same thing so I don’t have to write a regular expression when I already know the names. The trustme key is a bit different. I use it to tell Pod::Coverage that even though I apparently didn’t document those public functions, I’m not going to. In my example, I used the regular expression qr/\Aget_/. Perhaps I documented a series of functions in a single shot instead of giving them all individual entries. Those might even be something that AUTOLOAD creates. The Test::Pod::Coverage module uses the same interface to ignore functions.

Summary

Pod is the standard Perl documentation format, and I can easily translate it to other formats with the tools that come with Perl. When that’s not enough, I can write my own Pod translator to go to a new format or provide new features for an existing format. When I use Pod to document my software, I also have several tools to check its format and ensure I’ve documented everything.
