Perl has a default documentation format called Plain Old Documentation, or Pod for short. I can use it directly in my programs, and even between segments of code. Other programs can easily pick out the Pod and translate it into more familiar formats, such as HTML, text, or even PDF. I’ll show some of the most used features of Pod, how to test my Pod, and how to create my own Pod translator.
Sean Burke, the same person responsible for most of what I’ll cover in this chapter, completely specified the Pod format in the perlpodspec documentation page. This is the gory-details version of the specification and how to parse it, which I’ll do in this chapter. The stuff we showed you in Learning Perl and Intermediate Perl is just the basics covered in the higher-level perlpod documentation page.
Pod directives can appear at any point where Perl is expecting a new statement (so not in the middle of a statement). Each directive starts with an equal sign, =, at the beginning of a line. When Perl is trying to parse a new statement but sees that =, it switches to parsing Pod. Perl continues to parse the Pod until it reaches the =cut directive or the end of the file:
#!/usr/bin/perl

=encoding utf8

=head1 First level heading

Here's a line of code that won't execute:

    print "How'd you see this!?\n";

=over 4

=item First item

=item Second item

=back

=cut

print "This line executes\n";
My Pod doesn’t have to show up in one chunk, either, so I can intersperse code and Pod. For instance, I like to put the documentation for my subroutines right next to the subroutine:
#!/usr/bin/perl

=encoding utf8

=head1 First level heading

Here's a line of code that won't execute:

    print "How'd you see this!?\n";

=cut

=over 4

=item foo

=cut

sub foo { ... }

=item bar

=cut

sub bar { ... }

=back

=cut

print "This line executes\n";
This makes Pod the easiest way to comment out multiple lines of code:
print "This line runs\n";

=pod

open my $fh, '>:utf8', $filename or die ...;
print { $fh } Dumper( $big_data_structure );
close $fh;

=cut

print "This line runs too\n";
I can tell a Pod translator which encoding I used, with UTF-8 (and its subset ASCII) being common:
=encoding utf8
I can name other encodings, but some parsers can recognize certain encodings by their byte order mark. If they recognize one encoding but I tell them something else, they might not do the correct thing.
I should put the =encoding at the start of my document and use it only once. The Pod specification says that translators should treat a second =encoding as an error.
Since the modern Pod translators can handle UTF-8, I can use literal characters instead of the interior sequences for special characters.
Inside the text of the Pod, interior sequences specify nonstructural markup: text that should be displayed in a particular typeface, or special characters. Each of these starts with a letter, which specifies the type of sequence, and has its content in angle brackets. For instance, in Pod I use E<lt> to specify a literal <. If I want italic text (if the formatter supports that) I use I<...>:
=encoding utf8

=head1 Alberto Simões helped review I<Mastering Perl>.

In HTML, I would write E<lt>iE<gt>Mastering PerlE<lt>/iE<gt> to
get italics.

=cut
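To see what a translator does with interior sequences, here is a minimal sketch that runs a short Pod string through Pod::Simple::Text, one of the formatters in the Pod::Simple distribution. The pod_to_text helper and the sample sentence are mine, made up for illustration; how the italic text shows up depends on the formatter, so I only look for the literal angle brackets that E<lt> and E<gt> produce.

```perl
use strict;
use warnings;

use Pod::Simple::Text;

# pod_to_text is a hypothetical helper, not part of any module
sub pod_to_text {
    my( $pod ) = @_;

    my $parser = Pod::Simple::Text->new;

    my $text = '';
    $parser->output_string( \$text );        # collect the output in a string
    $parser->parse_string_document( $pod );  # parse Pod held in a string

    return $text;
}

# Build the Pod from a list so no directive starts a line of this program
my $pod = join "\n",
    '=pod',
    '',
    'I<Pod> uses E<lt>iE<gt>PodE<lt>/iE<gt> where HTML uses tags.',
    '',
    '=cut',
    '';

print pod_to_text( $pod );
```

The E<lt> and E<gt> sequences come out as literal < and > characters, so the text output contains <i>Pod</i>.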
I have two ways to turn Pod into some other format: use a ready-made translator or write my own. I might even do both at once by modifying something that already exists. If I need to add something extra to the basic Pod format, I’ll have to create something to parse it.
Fortunately, Sean Burke has already done most of the work by creating Pod::Simple, which can parse normal Pod as well as my personal extensions to it, as long as I follow the basic ideas and extend Pod::Simple with a subclass.
Perl comes with several Pod translators already. You’ve probably used one without even knowing it; the perldoc command is really a tool to extract the Pod from a document and format it for you. Typically it formats it for your terminal settings, perhaps using color or other character features:
% perldoc Some::Module
That’s not all that perldoc can do, though. Since it’s formatting its output for the terminal window, when I redirect the output to a file it doesn’t look right. The headings, for one thing, come out weird.
% perldoc CGI > cgi.txt
% more cgi.txt
CGI(3)       User Contributed Perl Documentation       CGI(3)

NNAAMMEE
       CGI - Simple Common Gateway Interface Class
Using the -t switch, I can tell perldoc to output plaintext instead of formatting it for the screen:
% perldoc -t CGI > cgi.txt
% more cgi.txt
NAME
    CGI - Simple Common Gateway Interface Class
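To get a feel for what a plaintext translation looks like without going through perldoc at all, here is a minimal sketch that calls the Pod::Text formatter directly on a Pod string. The render_pod_as_text helper and the sample Pod are mine, and I’m assuming the result is similar to what perldoc -t shows; this is not a description of what perldoc does internally.

```perl
use strict;
use warnings;

use Pod::Text;

# render_pod_as_text is a hypothetical helper for this illustration
sub render_pod_as_text {
    my( $pod ) = @_;

    my $parser = Pod::Text->new( width => 72 );

    my $text = '';
    $parser->output_string( \$text );        # send the output to a string
    $parser->parse_string_document( $pod );

    return $text;
}

print render_pod_as_text(
    "=head1 NAME\n\nDemo - a sample module that does nothing\n\n=cut\n"
);
```

The heading comes out on a line of its own with the paragraph indented beneath it, without the doubled characters that the terminal formatting leaves behind.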
Stepping back even further, perldoc can decide not to format anything. The -m switch simply outputs the source file, which can be handy if I want to see the source but don’t want to find the file myself; perldoc searches through @INC looking for it. perldoc can do all of this because it’s really just an interface to other Pod translators. The perldoc program is really simple because it’s just a wrapper around Pod::Perldoc, which I can see by using perldoc to look at its own source:
% perldoc -m perldoc
#!/usr/bin/perl
eval 'exec /usr/local/bin/perl -S $0 ${1+"$@"}'
if 0;
# This "perldoc" file was generated by "perldoc.PL"
require 5;
BEGIN { $^W = 1 if $ENV{'PERLDOCDEBUG'} }
use Pod::Perldoc;
exit( Pod::Perldoc->run() );
The Pod::Perldoc module is just code to parse the command-line options and dispatch to the right subclass, such as Pod::Perldoc::ToText. What else is there? To find the directory for these translators, I use the -l switch:
% perldoc -l Pod::Perldoc::ToText
/usr/local/lib/perl5/5.8.4/Pod/Perldoc/ToText.pm
% ls /usr/local/lib/perl5/5.8.4/Pod/Perldoc
BaseTo.pm       ToChecker.pm    ToNroff.pm      ToRtf.pm        ToTk.pm
GetOptsOO.pm    ToMan.pm        ToPod.pm        ToText.pm       ToXml.pm
Want all that as a Perl one-liner?
% perldoc -l Pod::Perldoc::ToText | perl -MFile::Basename=dirname \
    -e 'print dirname( <> )' | xargs ls
I could make that a bit shorter on my Unix machines since they have a dirname utility already (but it’s not a Perl program):
% perldoc -l Pod::Perldoc::ToText | xargs dirname | xargs ls
If you don’t have a dirname utility, here’s a quick Perl program that does the same thing, and it looks quite similar to the dirname program in the Perl Power Tools[7]. It’s something I use often when moving around the Perl library directories:
#!/usr/bin/perl
use File::Basename qw(dirname);

print dirname( $ARGV[0] );
Just from that, I can see that I can translate Pod to nroff (that’s the stuff going to my terminal), to text, RTF, XML, and a bunch of other formats. In a moment I’ll create another one.
perldoc doesn’t have switches to go to all of those formats, but its -o switch can specify a format. Here I want it in XML format, so I use -oxml, and add the -T switch, which just tells perldoc to dump everything to standard output. I could have also used -d to send it to a file.
% perldoc -T -oxml CGI
I don’t have to stick to those formatters, though. I can make my own. I could use my own formatting module with the -M switch, which here pulls in Pod::Perldoc::ToRtf, for instance:
% perldoc -MPod::Perldoc::ToRtf CGI
Now I have everything in place to create my own Pod formatter. For this example, I want a table of contents from the Pod input. I can discard everything else, but I want the text from the =head directives, and I want the text to be indented in outline style. I’ll follow the naming sequence of the existing translators and name mine Pod::Perldoc::ToToc. I’ve even put it on CPAN. I actually used this module to help me write this book.
The start of my own translator is really simple. I look at one of the other translators and do what they do until I need to do something differently. This turns out to be really easy because most of the hard work happens somewhere else:
package Pod::Perldoc::ToToc;
use strict;

use parent qw(Pod::Perldoc::BaseTo);

use Pod::TOC;
use warnings;
no warnings;

our $VERSION = '1.10';

sub is_pageable        { 1 }
sub write_with_binmode { 0 }
sub output_extension   { 'toc' }

sub parse_from_file {
    my( $self, $file, $output_fh ) = @_; # Pod::Perldoc object

    my $parser = Pod::TOC->new();

    $parser->output_fh( $output_fh );

    $parser->parse_file( $file );
    }

1;
For my translator I inherit from Pod::Perldoc::BaseTo. This handles almost everything that is important. It connects what I do in parse_from_file to perldoc’s user interface. When perldoc tries to load my module, it checks for parse_from_file because it will try to call it once it finds the file it will parse. If I don’t have that subroutine, perldoc will move on to the next formatter in its list. That -M switch I used earlier doesn’t tell perldoc which formatter to use; it just adds it to the front of the list of formatters that perldoc will try to use.
In parse_from_file, the first argument is a Pod::Perldoc object. I don’t use that for anything. Instead I create a new parser object from my Pod::TOC module, which I’ll show in the next section. That module inherits from Pod::Simple and most of its interface comes directly from Pod::Simple.
The second argument is the filename I’m parsing, and the third argument is the filehandle, which should get my output. After I create the parser, I set the output destination with $parser->output_fh(). The Pod::Perldoc::BaseTo module expects output on that filehandle and will be looking for it. I shouldn’t simply print to STDOUT, which would bypass the Pod::Perldoc output mechanism, and the module will complain that I didn’t send it any output. Again, I get the benefit of all of the inner workings of the Pod::Perldoc infrastructure. If the user wanted to save the output in a file, that’s where $output_fh points. Once I have that set up, I call $parser->parse_file(), and all the magic happens.
I didn’t have to actually parse the Pod in my TOC creator because I use Pod::Simple behind the scenes. It gives me a simple interface that allows me to do things when certain events occur. All of the other details about breaking apart the Pod and determining what those pieces represent happen somewhere else, where I don’t have to deal with them. Here’s the complete source for my Pod::TOC module to extract the table of contents from a Pod file.
package Pod::TOC;
use strict;

use parent qw( Pod::Simple );

our $VERSION = '1.10';

BEGIN {
    my @Head_levels = 0 .. 4;

    my %flags = map { ( "head$_", $_ ) } @Head_levels;

    foreach my $directive ( keys %flags ) {
        no strict 'refs';

        *{"_start_$directive"} = sub {
            $_[0]->_set_flag( "_start_$directive" );
            print { $_[0]->output_fh } "\t" x ( $_[0]->_get_flag - 1 )
            };

        *{"_end_$directive"} = sub {
            $_[0]->_set_flag( "_end_$directive" );
            print { $_[0]->output_fh } "\n"
            };
        }

    sub _is_valid_tag { exists $flags{ $_[1] } }
    sub _get_tag      {        $flags{ $_[1] } }
    }

sub _handle_element {
    my( $self, $element, $args ) = @_;

    my $caller_sub = ( caller(1) )[3];
    return unless $caller_sub =~ s/.*_(start|end)$/_${1}_$element/;

    my $sub = $self->can( $caller_sub );

    $sub->( $self, $args ) if $sub;
    }

sub _handle_element_start {
    my $self = shift;
    $self->_handle_element( @_ );
    }

sub _handle_element_end {
    my $self = shift;
    $self->_handle_element( @_ );
    }

sub _handle_text {
    return unless $_[0]->_get_flag;

    print { $_[0]->output_fh } $_[1];
    }

{
my $Flag;

sub _get_flag { $Flag }

sub _set_flag {
    my( $self, $caller ) = @_;

    return unless $caller;

    my $on  = $caller =~ m/\A_start_/ ? 1 : 0;
    my $off = $caller =~ m/\A_end_/   ? 1 : 0;

    unless( $on or $off ) { return };

    my( $tag ) = $caller =~ m/\A_.*?_(.*)/g;

    return unless $self->_is_valid_tag( $tag );

    $Flag = do {
        if( $on )     { $self->_get_tag( $tag ) } # set the flag if we're on
        elsif( $off ) { undef }                   # clear if we're off
        };
    }
}

1;
The Pod::TOC module inherits from Pod::Simple. Most of the action happens when Pod::Simple parses the module. I don’t define the parse_file method that Pod::Perldoc::ToToc calls because Pod::Simple already has it and I don’t need it to do anything different.
What I need to change, however, is what Pod::Simple will do when it runs into the various bits of Pod. Allison Randal wrote Pod::Simple::Subclassing to show the various ways to subclass the module, and I’m only going to use the easiest one. When Pod::Simple runs into a Pod element, it calls a subroutine named _handle_element_start with the name of the element, and when it finishes processing that element, it calls _handle_element_end in the same way. When it encounters text within an element, it calls _handle_text. Behind the scenes, Pod::Simple figures out how to join all the text so I can handle it as logical units (e.g., a whole paragraph) instead of layout units (e.g., a single line with possibly more lines to come later).
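To watch those three callbacks fire, here is a minimal sketch of a Pod::Simple subclass that only records the events it sees. The Local::EventLogger package, the local_events hash key, and the pod_events helper are names I made up for illustration; only the three _handle_* methods come from the Pod::Simple interface.

```perl
package Local::EventLogger;   # hypothetical name for this sketch
use strict;
use warnings;

use parent qw( Pod::Simple );

# Each handler appends a description of the event to a list in the object
sub _handle_element_start {
    my( $self, $element ) = @_;
    push @{ $self->{local_events} }, "start $element";
    }

sub _handle_text {
    my( $self, $text ) = @_;
    push @{ $self->{local_events} }, "text $text";
    }

sub _handle_element_end {
    my( $self, $element ) = @_;
    push @{ $self->{local_events} }, "end $element";
    }

package main;

# pod_events is a hypothetical helper: parse a Pod string, return the events
sub pod_events {
    my( $pod ) = @_;

    my $parser = Local::EventLogger->new;
    $parser->parse_string_document( $pod );

    return @{ $parser->{local_events} || [] };
    }

print "$_\n" for pod_events( "=head1 NAME\n\nSome text\n\n=cut\n" );
```

The list shows a start event, a text event, and an end event for the heading, and then the same trio for the paragraph, all wrapped in the events for the document itself.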
My _handle_element_start and _handle_element_end are just wrappers around _handle_element. I figure out which one called it by looking at caller. In _handle_element, I take the calling subroutine stored in $caller_sub and pick out either start or end. I put that together with the element name, which is in $element. I end up with names such as _start_head1 and _end_head3 in $caller_sub. I need to look at a little more of the code to see how I handle those subroutines.
When I get the start or end event, I don’t get the text inside that element, so I have to remember what I’m processing so _handle_text knows what to do. Every time Pod::Simple runs into text, no matter if it’s a =headN directive, a paragraph in the body, or something in an item list, it calls _handle_text. For my table of contents, I only want to output text when it’s from a =head directive. That’s why I have a bit of indirection in _handle_text.
In the foreach loop, I go through the different levels of the =head directive. Inside the loop, I want to make two subroutines for every one of those levels: _start_head0, _end_head0, _start_head1, _end_head1, and so on. I use a symbolic reference (see Chapter 8) to create the subroutine names dynamically, and assign an anonymous subroutine to the typeglob for that name (see Chapter 9).
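The typeglob assignment is easier to see in isolation. This is a minimal sketch of the same technique with nothing else going on; the Local::Levels package and the strings its subroutines return are mine. I qualify the glob name with __PACKAGE__ to make the target package explicit, which is a small departure from the unqualified names in Pod::TOC.

```perl
package Local::Levels;   # hypothetical package for this sketch
use strict;
use warnings;

BEGIN {
    foreach my $level ( 1 .. 3 ) {
        # Symbolic references are off limits under strict, so I relax
        # that one check for the smallest scope I can
        no strict 'refs';

        # Assigning a code reference to a glob installs a named subroutine.
        # Each anonymous subroutine closes over its own copy of $level.
        *{ __PACKAGE__ . "::_start_head$level" } = sub {
            return "start of level $level";
            };
        }
    }

package main;

# can() returns a code reference for subroutines that exist
my $sub = Local::Levels->can( '_start_head2' );
print $sub->(), "\n";
```

Since I only created levels 1 through 3, asking can for _start_head9 returns false, which is the same test _handle_element relies on before it dispatches.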
Each of those subroutines is simply going to set a flag. When a _start_headN subroutine runs, it turns on the flag, and when the _end_headN subroutine runs, it turns off the same flag. That all happens in _set_flag, which sets $Flag.
My _handle_text routine looks at $Flag, through _get_flag, to decide what to do. If it’s a true value, it outputs the text, and if it’s false, it doesn’t. This is what I can use to turn off output for all of the text that doesn’t belong to a heading. Additionally, I’ll use $Flag to determine the indentation level of my table of contents by putting the =head level in it.
So, in order of execution: when I run into =head1, Pod::Simple calls _handle_element_start. From that, I immediately dispatch to _handle_element, which figures out that it’s the start, and it knows it just encountered a =head1. From that, _handle_element figures out it needs to call the _start_head1 subroutine I dynamically created. _start_head1 calls _set_flag( '_start_head1' ), which figures out based on the argument to turn on $Flag. Next, Pod::Simple runs into a bit of text, so it calls _handle_text, which checks _get_flag and gets a true value. It keeps going and prints to the output filehandle. After that, Pod::Simple is done with =head1, so it calls _handle_element_end, which dispatches to _handle_element, which then calls _end_head1. When _end_head1 runs, it calls _set_flag, which turns off $Flag. This sequence happens every time Pod::Simple encounters =head directives.
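The same start, text, end sequence can be sketched without the generated subroutines. This is a simplified variant, not the Pod::TOC source: the Local::MiniTOC package, the local_toc_level key, and the toc_for helper are hypothetical names, and I keep the flag in the parser object instead of in a closure.

```perl
package Local::MiniTOC;   # hypothetical, simplified variant of Pod::TOC
use strict;
use warnings;

use parent qw( Pod::Simple );

sub _handle_element_start {
    my( $self, $element ) = @_;
    return unless $element =~ /\Ahead([1-4])\z/;

    my $level = $1;
    $self->{local_toc_level} = $level;                # turn the flag on
    print { $self->output_fh } "\t" x ( $level - 1 ); # outline indentation
    }

sub _handle_text {
    my( $self, $text ) = @_;
    return unless $self->{local_toc_level};           # only heading text
    print { $self->output_fh } $text;
    }

sub _handle_element_end {
    my( $self, $element ) = @_;
    return unless $element =~ /\Ahead[1-4]\z/;

    print { $self->output_fh } "\n";
    $self->{local_toc_level} = 0;                     # turn the flag off
    }

package main;

# toc_for is a hypothetical helper: Pod string in, outline string out
sub toc_for {
    my( $pod ) = @_;

    my $toc = '';
    open my $fh, '>', \$toc or die "Could not open string filehandle: $!";

    my $parser = Local::MiniTOC->new;
    $parser->output_fh( $fh );
    $parser->parse_string_document( $pod );
    close $fh;

    return $toc;
    }

print toc_for( "=head1 NAME\n\nSome text\n\n=head2 Details\n\nMore text\n" );
```

The paragraphs never reach the output because the flag is off when their text events arrive, and the =head2 text gets one tab of indentation because its level is 2.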
I wrote this book using the Pod format, but one that O’Reilly Media has extended to meet its publishing needs. For instance, O’Reilly added an N directive for footnotes[8]. Pod::Parser can still handle those, but it needs to know what to do when it finds them.
Allison Randal created Pod::PseudoPod as an extension of Pod::Simple. It handles those extra things O’Reilly added, and serves as a much longer example of a subclass. I subclassed her module to create Local::DocBook, which I used to create the XML sources that the O’Reilly Atlas publishing system uses.
Andy Lester wrote the Apache::Pod module (based on Apache::Perldoc by Rich Bowen) so he could serve the Perl documentation from his Apache web server and read it with his favorite browser. I certainly like this more than paging to a terminal, and I get the benefits of everything the browser gives me, including display styling, search, and links to the modules or URLs the documentation references.
Sean Burke’s Pod::Webserver makes its own web server to translate Pod for the web. It uses Pod::Simple to do its work and should run anywhere that Perl will run. If I don’t want to install Apache, I can still have my documentation server.
Once I’ve written my Pod, I can check it to ensure that I’ve done everything correctly. When other people read my documentation, they shouldn’t get any warnings about formatting, and a Pod error shouldn’t keep them from reading it because the parser gets confused. What good is the documentation if the user can’t even read it?
Pod::Checker is another sort of Pod translator, although instead of spitting out the Pod text in another format, it watches the Pod and text go by. When it finds something suspicious, it emits warnings. Perl already comes with podchecker, a ready-to-use program similar to perl -c, but for Pod. The program is really just a program version of Pod::Checker, which is just another subclass of Pod::Parser.
% podchecker Module.pm
The podchecker program is good for manual use, and I guess that somebody might want to use it in a shell script, but I can also check errors directly through Pod::Simple. While parsing the input, Pod::Simple keeps track of the errors it encounters. I can look at these errors later:
*** WARNING: preceding non-item paragraph(s) at line 47 in file test.pod
*** WARNING: No argument for =item at line 153 in file test.pod
*** WARNING: previous =item has no contents at line 255 in file test.pod
*** ERROR: =over on line 23 without closing =back (at head2) at line 255 in file test.pod
*** ERROR: empty =head2 at line 283 in file test.pod
Module.pm has 2 pod syntax errors.
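Checking through Pod::Simple directly might look like this minimal sketch. The pod_has_errors helper is a name I made up; any_errata_seen, no_errata_section, output_string, and parse_string_document are Pod::Simple methods. I throw away the parser’s output since I only care whether it noticed a problem.

```perl
use strict;
use warnings;

use Pod::Simple;

# pod_has_errors is a hypothetical helper: true if Pod::Simple complained
sub pod_has_errors {
    my( $pod ) = @_;

    my $parser = Pod::Simple->new;

    my $ignored = '';
    $parser->output_string( \$ignored );  # discard any output
    $parser->no_errata_section( 1 );      # don't generate a POD ERRORS section

    $parser->parse_string_document( $pod );

    return $parser->any_errata_seen ? 1 : 0;
    }

my %samples = (
    'clean Pod'      => "=head1 NAME\n\nDemo - nothing wrong here\n",
    'unclosed =over' => "=over 4\n\n=item foo\n\nNo matching back\n",
    );

foreach my $label ( sort keys %samples ) {
    printf "%-15s %s\n", $label,
        pod_has_errors( $samples{$label} ) ? 'has errors' : 'ok';
    }
```

To check a file instead of a string, I would call parse_file with the filename in place of parse_string_document.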
A long time ago, I wanted to do this automatically for all of my modules, so I created Test::Pod. It’s been almost completely redone by Andy Lester, who now maintains the module. I can drop a t/pod.t file into my test directory:
use Test::More;
eval "use Test::Pod 1.00";
plan skip_all => "Test::Pod 1.00 required for testing POD" if $@;
all_pod_files_ok();
After I’ve checked the format of my documentation, I also want to ensure that I’ve actually documented everything. The Pod::Coverage module finds all of the functions in a package and tries to match those to the Pod it finds. After skipping any special function names and excluding the function names that start with an underscore, the Perl convention for indicating private methods, it complains about anything left undocumented.
The easiest invocation is directly from the command line. For instance, I use the -M switch to load the CGI module. I also use the -M switch to load Pod::Coverage, but I tack on the =CGI to tell it which package to check. Finally, since I don’t really want to run any program, I use -e 1 to give perl a dummy program:
% perl -MCGI -MPod::Coverage=CGI -e 1
The output gives the CGI module a rating, then lists all of the functions for which it didn’t see any documentation:
CGI has a Pod::Coverage rating of 0.04
The following are uncovered: add_parameter, all_parameters, binmode, can, cgi_error, compile, element_id, element_tab, end_form, endform, expand_tags, init, initialize_globals, new, param, parse_params, print, put, r, save_request, self_or_CGI, self_or_default, to_filehandle, upload_hook
I can write my own program, which I’ll call podcoverage, to go through all of the packages I specify on the command line. That rating comes from the coverage method, which either returns a number between 0 and 1, or undef if it couldn’t rate the module:
#!/usr/bin/perl
use strict;
use warnings;

use Pod::Coverage;

foreach my $package ( @ARGV ) {
    my $checker = Pod::Coverage->new( package => $package );

    # coverage() returns undef when it can't rate the package,
    # so check for that before comparing the rating as a number
    my $rating = $checker->coverage;

    if( ! defined $rating ) {
        print "$package can't be rated: ", $checker->why_unrated, "\n";
        }
    elsif( $rating == 1 ) {
        print "$package gets a perfect score!\n\n";
        }
    else {
        print "$package gets a rating of $rating\n",
            "Uncovered functions:\n\t",
            join( "\n\t", sort $checker->uncovered ),
            "\n\n";
        }
    }
When I use this to test Module::NotThere and HTML::Parser, my program tells me that it can’t rate the first because it can’t find any Pod, and it finds a couple of undocumented functions in HTML::Parser.
% podcoverage Module::NotThere HTML::Parser
Module::NotThere can't be rated: couldn't find pod
HTML::Parser gets a rating of 0.925925925925926
Uncovered functions:
	init
	netscape_buggy_comment
My podcoverage program really isn’t all that useful, though. It might help me find hidden functions in modules, but I don’t really want to depend on those since they might disappear in later versions. I can use podcoverage to check my own modules to ensure I’ve explained all of my functions, but that would be tedious.
Fortunately, Andy Lester automated the process with Test::Pod::Coverage, which is based on Pod::Coverage. By creating a test file that I drop into the t directory of my module distribution, I automatically test the Pod coverage each time I run make test. I lift this snippet right out of the documentation. It first tests for the presence of Test::Pod::Coverage before it tries anything, making the whole thing optional for the user who doesn’t have that module installed, just like the Test::Pod module.
use Test::More;
eval "use Test::Pod::Coverage 1.00";
plan skip_all => "Test::Pod::Coverage 1.00 required for testing POD coverage" if $@;
all_pod_coverage_ok();
I mentioned earlier that I could hide functions from these Pod checks. Perl doesn’t have a way to distinguish between public functions that I should document and other people should use, and private functions that I don’t intend users to see. The Pod coverage tests just see functions.
That’s not the whole story, though. Inside Pod::Coverage is the wisdom of which functions it should ignore. For instance, all of the special Tie:: functions (see Chapter 17) are really private functions. By convention, all functions starting with an underscore (e.g., _init) are private functions for internal use only, so Pod::Coverage ignores them. If I want to create private functions, I put an underscore in front of their names.
I can’t always hide functions, though. Consider my earlier Pod::Perldoc::ToToc subclass. I had to override the parse_from_file method so it would call my own parser. I don’t really want to document that function because it does the same thing as the method in the parent class but with a different formatting module. Most of the time the user doesn’t call it directly, and anything I wrote would just repeat the documentation for parse_from_file in the Pod::Perldoc::BaseTo superclass. I can tell Pod::Coverage to ignore certain names or names that match a regular expression.
my $checker = Pod::Coverage->new(
    package      => $package,
    private      => [ qr/\A_/ ],
    also_private => [ qw(init import DESTROY AUTOLOAD) ],
    trustme      => [ qr/\Aget_/ ],
    );
The private key takes a list of regular expressions. It’s intended for the truly private functions. The also_private key is just a list of strings for the same thing so I don’t have to write a regular expression when I already know the names. The trustme key is a bit different. I use it to tell Pod::Coverage that even though I apparently didn’t document those public functions, I’m not going to. In my example, I used the regular expression qr/\Aget_/. Perhaps I documented a series of functions in a single shot instead of giving them all individual entries. Those might even be something that AUTOLOAD creates. The Test::Pod::Coverage module uses the same interface to ignore functions.
Pod is the standard Perl documentation format, and I can easily translate it to other formats with the tools that come with Perl. When that’s not enough, I can write my own Pod translator to go to a new format or provide new features for an existing format. When I use Pod to document my software, I also have several tools to check its format and ensure I’ve documented everything.
The perlpod documentation outlines the basic Pod format, and the perlpodspec documentation gets into the gory implementation details.
Allison Randal shows other ways to subclass Pod::Simple in Pod::Simple::Subclassing.
Pod::Webserver shows up as Hack #3 in Perl Hacks.
I wrote about subclassing Pod::Simple to output HTML in “Playing with Pod” for The Perl Journal, December 2005, http://www.ddj.com/dept/lightlang/184416231.
I wrote about Test::Pod in “Better Documentation Through Testing” for The Perl Journal, November 2002.