Recycled Perl: An Introduction to Writing Modules

Being able to reuse code is a Good Thing, and Perl makes it easy. It's easy to use other people's code, and it's not too hard to write code so that you (and maybe others) can reuse it. This discussion will concentrate on how to reuse your own code; once you understand that, it will be simple to use other people's. As with most of Perl, there's more than one way to do it; we'll start from the least complex, where that means something like ``least fiddling with existing code'', and work up to full-fledged modules.

This is just an introduction, so be warned that we won't cover all there is to know about modules. See the references section for more. Although many modules are Object-Oriented, you don't need to know Perl OO to write modules. We'll only discuss ``regular'' non-OO modules.

Terminology: I'll refer to the code we want to reuse as the ``source'' (library is a better choice but that already has a meaning in Perl). The code we want to use the source in will be referred to as the ``script.''

To motivate the discussion, imagine this scenario: we're writing a package that will parse lots of different log files. Each type of log file gets its own script, but the scripts will have many parts in common, and so we want to put those common bits in a separate file and reuse them when necessary. For instance, we want to use a common config file format in each script. The sample code implements this function.


Sample Code: Reading Config File

We'll refer to this sample code throughout the discussion. It's large enough that I wouldn't want to cut and paste it into each script! More importantly, I don't want to make changes in each and every script. Note that there are two subroutines and a global variable. You'd use it like this: %config=&read_config('/etc/blah.rc'). Here's the source:

   $Mode = 022;     # Test for writeability; use 066 for read or write

   sub read_config {          # parse file with "var=value" format
     my ($file) = @_;
     my %hash = ();           # left side of = will be key
     if (is_safe($file)) {    # check permissions and ownership
          open(FH, $file) or die "Can't open $file: $!\n";
          while (<FH>) {
               next if /^#/;  # ignore comments
               s/#.*//;       # remove trailing comments
               s/^\s*//;      # remove leading space
               s/\s*$//;      # remove trailing space
               $hash{$1}=$2 if (/^(.*?)\s*=\s*(.*)$/);  # key has no trailing space
          }
          close FH;
     }
     return %hash;            # return this to the caller
   }

   sub is_safe {    # test file for group or other write permissions
     return 1 unless defined $Mode;      # undef $Mode to skip this check
     my ($file) = @_;
     my ($file_mode, $uid) = (stat($file))[2,4]   # stat file
        or die "Can't stat $file: $!\n";
     return 0 if ($file_mode & 0+$Mode);          # bitwise and
     return 0 if ( ($uid != 0) && ($uid != $<) ); # $< is current UID
     return 1;                          # If get here, it's ok
   }
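
To see what read_config() hands back, here is a minimal sketch that runs the same parsing steps against a throwaway file. The file contents, the temporary file, and the name parse_config are assumptions made up for the demonstration. The is_safe() check is left out, and the pattern matches the key non-greedily so that spaces before the = sign do not end up in the key.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample config file (contents are invented for this demo).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} <<'END_CONF';
# global settings
logdir = /var/log/app   # where the logs live
rotate=7
END_CONF
close $fh or die "Can't close $path: $!\n";

# Same parsing steps as read_config(), minus the is_safe() check.
sub parse_config {
    my ($file) = @_;
    my %hash;
    open(my $in, '<', $file) or die "Can't open $file: $!\n";
    while (<$in>) {
        next if /^#/;       # ignore comment lines
        s/#.*//;            # remove trailing comments
        s/^\s+//;           # remove leading space
        s/\s+$//;           # remove trailing space (and the newline)
        $hash{$1} = $2 if /^(.*?)\s*=\s*(.*)$/;
    }
    close $in;
    return %hash;
}

my %config = parse_config($path);
print "$_ => $config{$_}\n" for sort keys %config;
```

This should print one "key => value" line for logdir and one for rotate, with the comments and surrounding whitespace stripped.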


Iteration 1: Rudimentary Modules


Preparing source file

The simplest way to turn some subroutines into a source file is to put that code into its own file, and add 1; as the last line. Any true value would work, but 1; is traditional. This is necessary because the last statement executed in the imported file must return a true value. Thus the last few lines of the above code would read:

     return 0 if ($file_mode & 0+$Mode);
     return 0 if ( ($uid != 0) && ($uid != $<) );
     return 1;
   }

   1;
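
As a rough illustration of why the trailing 1; matters, the sketch below writes a source file that deliberately ends in a false value and then tries to load it. The file name NoTrue.pl and the temporary directory are hypothetical, chosen only for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical source file; its last statement is deliberately false.
my $file = "$dir/NoTrue.pl";
open(my $out, '>', $file) or die "Can't write $file: $!\n";
print {$out} "sub hello { return 'hi' }\n0;\n";
close $out or die "Can't close $file: $!\n";

# Wrap require in eval so the failure can be reported instead of fatal.
my $ok  = eval { require $file; 1 };
my $err = $@;
print $ok ? "loaded\n" : "require failed: $err";
```

The message Perl reports here should say that the file did not return a true value; changing the final 0; to 1; lets the load succeed.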


Naming the source file

What you call your source file depends upon how you will import it into the script. The 'use' directive requires that the file name end in .pm -- 'require' has that restriction only when given a bare module name, not when given a quoted file name. It's safest and most flexible to use the .pm suffix, so we'll call the sample code ``Conf.pm''. The first letter is capitalized because traditionally only pragmas like 'use strict' or 'use vars' are all lower-case.

Save it either in the system Perl directories or in your own private library, like /home/fprice/perllib/.

Realize that your source code must compile first. You can check this with perl -c Conf.pm.


Use or Require

Now that the source file is prepared, we need to modify the script to import it. For this you can use 'use' or 'require' (You can also use 'do', but don't!). We can load /home/fprice/perllib/Conf.pm like this:

   use lib '/home/fprice/perllib'; # Add private library to path
   require Conf;            # Either load source at runtime ...
   use Conf;                # ... or load it at compile time (pick one)

Neither directive will reload a file that has already been loaded; Perl records each loaded file in %INC. Both look in @INC to find files; the 'use lib' directive prepends a directory to @INC so Perl will look in non-standard places. Sometimes you see a use directive that looks like this:

   use Conf::Read;

In this case, Perl would look in the @INC locations for a directory named Conf, and then try to load a file named Read.pm. If you want to create such a ``hierarchical'' module, see the section on h2xs for an easy way to set it up.
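
The lookup described above can be sketched with a throwaway directory. Everything here is an assumption for the demonstration: the temporary directory, the one-line body of Conf/Read.pm, and the $Loads counter. The sketch uses require and unshift @INC so that it can build the file at runtime first; 'use lib' and 'use' would do the same work at compile time.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our $Loads = 0;         # counts how often the module file's body runs

# Build <tmpdir>/Conf/Read.pm on the fly.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/Conf" or die "Can't mkdir $dir/Conf: $!\n";

my $file = "$dir/Conf/Read.pm";
open(my $out, '>', $file) or die "Can't write $file: $!\n";
print {$out} <<'END_MODULE';
package Conf::Read;
$main::Loads++;         # runs each time the file is really loaded
1;
END_MODULE
close $out or die "Can't close $file: $!\n";

unshift @INC, $dir;     # what "use lib" does, but at runtime

require Conf::Read;     # found as Conf/Read.pm under a directory in @INC
require Conf::Read;     # already recorded in %INC, so not loaded again

print "loaded $Loads time(s)\n";
print "from $INC{'Conf/Read.pm'}\n";
```

The counter ends up at 1 even though require is called twice, and %INC maps the key Conf/Read.pm to the full path that was found.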

In either case, call the function just as if it was explicitly written in the script:

   %config = &read_config('/etc/blah.rc');

The primary difference between require and use is that require loads the source at runtime, while use loads it at compile time. So if your source file doesn't compile or can't be found, require will start running the script and then die; whereas use won't compile at all. For these reasons, it's recommended to use 'use'.
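
One way to picture the timing difference: a 'use' line behaves like a require plus an import call wrapped in a BEGIN block, which Perl runs as soon as it has compiled it. The sketch below spells that out by hand. List::Util (a core module) and the @events array are stand-ins picked only for the demonstration.

```perl
use strict;
use warnings;

our @events;            # records the order in which things happen

push @events, 'first runtime statement';

# Roughly what "use List::Util qw(sum);" expands to.
BEGIN {
    push @events, 'BEGIN block (compile time)';
    require List::Util;
    List::Util->import('sum');
}

push @events, 'second runtime statement';

print "$_\n" for @events;
print "sum = ", sum(1, 2, 3), "\n";
```

Although the BEGIN block sits in the middle of the file, its entry is recorded first, because it ran while the rest of the script was still being compiled.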

It's important to realize that there is a possibility of variable conflicts with this method. For instance, if you have a variable $Mode in your script, it will be redefined by (or overwrite) the value of $Mode in the source. But this is a quick and dirty way to import a function into a script.


Interlude: package and namespace

To keep variables distinct, Perl has a concept called ``namespace.'' Every global symbol, by default and until you say otherwise, is in the namespace called 'main'. You can change the namespace by issuing the 'package' declaration. A namespace is in effect until another package declaration is encountered, or until the end of the block or file in which package is declared.

For example, switch namespaces to get distinct variables with the same apparent names:

   #!/usr/bin/perl -w

   $blah = 20;          # by default, in namespace main
   package foo;         # switch to namespace foo
   print $blah;         # undefined; no var $blah in namespace foo
   $blah = 100;         # NOT redefining $blah in namespace main
   package main;        # back in namespace main
   print $blah;         # yields 20
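
A package declaration can also be confined to a block, in which case it stops applying at the closing brace. Here is a small sketch of that case. It declares the variables with 'our' so that it runs under 'use strict', which is an addition not used in the example above.

```perl
use strict;
use warnings;

our $blah = 20;                     # $main::blah

{
    package foo;                    # in effect only until the closing brace
    our $blah = 100;                # $foo::blah
    print __PACKAGE__, ": $blah\n";
}

# Back in package main without a second package declaration.
print __PACKAGE__, ": $blah\n";
```

The first line printed comes from package foo and shows 100; the second comes from main and shows 20.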

But you don't have to explicitly switch namespaces to access a symbol in a different namespace. Use the syntax <namespace>::<variable>, and prefix it with the appropriate notation for the symbol type ($ for scalars, @ for arrays, etc). We can call this the Fully Qualified Name (FQN). So for instance:

   $blah = 20;          # created in namespace main
   $foo::blah = 100;    # creates $blah in namespace foo
   print $blah;         # prints 20
   print $foo::blah;    # FQN; prints 100
   $foo::blah = 200;    # FQN; change value
   print $main::blah;   # FQN; prints 20, redundant but ok.

Note that lexical variables -- those created with 'my' -- are outside any namespace and thus cannot be referred to with a FQN. Also note that special Perl variables like $_ and $| are visible and modifiable from any namespace; use 'local' if you want to give one a temporary value.

Finally, realize that packages just provide a grouping mechanism, not enforced privacy! It is quite possible from any namespace to use, create, and modify a variable declared in another package. Packages just make it harder to step on your own toes without realizing it.
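
A short sketch of that lack of privacy follows. The inline Conf package and the helper mode_string() are assumptions for the demonstration, not part of the sample module.

```perl
use strict;
use warnings;

{
    package Conf;
    our $Mode = 022;                # package variable $Conf::Mode
    sub mode_string { return sprintf '%03o', $Mode }
}

print "before: ", Conf::mode_string(), "\n";
$Conf::Mode = 066;                  # changed from package main; nothing stops us
print "after:  ", Conf::mode_string(), "\n";
```

The second line printed reflects the new value, since the script reached straight into the Conf namespace with the FQN.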


Iteration 2: Full modules

Packages form the heart of most Perl modules. The key idea is for each module to declare its own namespace, and then to carefully declare to the script which variables are meant to be used. In turn, the calling script cooperates by only importing recommended variables into its operating namespace. Of course, the module writer needs to document the intended usage of the module!


Declare a package

An obvious modification we can make to Conf.pm is to put its code into a package. Since the file is called Conf.pm, it's stylistically nice to call the namespace ``Conf''. We could use another name -- Perl itself doesn't enforce a connection between file names and package names -- but that might confuse you as the author, not to mention those who use your module! Moreover, if the script is going to import symbols with 'use', the package name in the module and the name on the 'use' line must match exactly (including :: if there are any). Here's the change to Conf.pm:

   package Conf;    # start new namespace; scope extends to EOF

   $Mode = 022;     # Test for writeability; use 066 for read or write

   sub read_config { # parse file with "var=value" format

Once we've done this, we must change the way we call read_config() from the calling script. Since it is now in a different package from main::, the Conf package must be explicitly specified, either by switching to the Conf namespace or by using the FQN:

   use lib '/home/fprice/perllib';      # Add private library to path
   use Conf;                            # Load module at compile time

   $Conf::Mode = 066;                   # Change value of $Conf::Mode

   # Call read_config in Conf package
   %config = &Conf::read_config('/etc/blah.rc');

Now we've minimized the chances of variable conflict between the script and the module.


Selective Importing

Sometimes you may want to import certain symbols from a module into your script's namespace (typically into main). You can give an optional list of symbols to the 'use' directive in your script, and then just access those symbols as if they were defined in the current package. Although this does reintroduce the problem of variable conflict, the fact that you specifically request symbols tends to minimize it.

   use lib '/home/fprice/perllib';
   use Conf qw($Mode);          # load Conf.pm and import $Mode into
                                # namespace main
   $Mode = 066;                 # $Mode is not FQN, but is from Conf::

In turn, the module must be equipped to export this symbol. The easiest way to do this is to tell the module to inherit from the Exporter module. Add this to Conf.pm:

   package Conf;        # start new namespace; scope extends to EOF
   use Exporter;        # load Exporter module
   @ISA=qw(Exporter);   # Inherit from Exporter
   # Export $Mode and read_config on request
   @EXPORT_OK=qw($Mode read_config);

   $Mode = 022;         # Test for writeability; use 066 for read or write

   sub read_config {    # parse file with "var=value" format

Putting a symbol in @EXPORT_OK means that the script must specifically request it. If instead you'd like some symbols to be automatically imported when the module is loaded, use @EXPORT:

   @EXPORT=qw(read_config);   # export by default

And then in the script:

   use Conf;        # loads Conf.pm and imports all symbols in @EXPORT

Note that if you give a list to 'use', thus requesting symbols to be imported, only those specific symbols will be imported no matter what is in @EXPORT. If you put a symbol in @EXPORT, you don't have to also put it in @EXPORT_OK to request it specifically. But don't go overboard with @EXPORT since the importing script might not realize that all those symbols will be imported.

To request that no symbols be imported into your script, give an empty list to 'use'. You can still access these symbols with their FQN.

   use Conf ();          # load Conf.pm but don't import ANY symbols
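
The three ways of calling 'use' can be compared side by side. In this sketch the module Demo, its functions alpha and beta, and the three Wants... packages are all hypothetical. Demo is defined inline and marked as loaded in %INC so the example fits in a single file; a real module would live in its own .pm file.

```perl
use strict;
use warnings;

BEGIN {
    package Demo;                   # hypothetical stand-in for Conf
    use Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(alpha);     # imported by default
    our @EXPORT_OK = qw(beta);      # imported only on request
    sub alpha { return 'alpha' }
    sub beta  { return 'beta' }
    $INC{'Demo.pm'} = __FILE__;     # mark as loaded so "use Demo" works
}

package WantsDefaults; use Demo;            # gets everything in @EXPORT
package WantsBeta;     use Demo qw(beta);   # gets only what it asked for
package WantsNothing;  use Demo ();         # gets nothing

package main;

for my $pkg (qw(WantsDefaults WantsBeta WantsNothing)) {
    my @got = grep { $pkg->can($_) } qw(alpha beta);
    print "$pkg imported: ", (@got ? "@got" : '(nothing)'), "\n";
}
```

WantsDefaults ends up with alpha only, WantsBeta with beta only (the explicit list replaces the defaults), and WantsNothing with neither, though it could still call Demo::alpha() by its FQN.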

Another symbol you can put in a module that uses Exporter is $VERSION. This represents the version number of the module, and should be something like 1.01 (not 1.0.1). Then the script can specify a minimum version number, and 'use' will fail if the module's $VERSION is lower.

   use Conf 2.0 qw( $Mode read_config);    # fail if $VERSION < 2.0
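
Here is a minimal sketch of the version check in action. The inline module Demo and its version number are assumptions for the demonstration. The 'use' lines are wrapped in a string eval only so that the compile-time failure can be caught and printed rather than stopping the script.

```perl
use strict;
use warnings;

BEGIN {
    package Demo;                   # hypothetical module at version 1.01
    our $VERSION = '1.01';
    $INC{'Demo.pm'} = __FILE__;     # mark as loaded so "use Demo" works
}

# Asking for a newer version than the module provides fails.
my $ok  = eval "use Demo 2.0; 1";
my $err = $@;
print $ok ? "2.0: loaded\n" : "2.0: refused: $err";

# Asking for an older version succeeds.
my $ok_old = eval "use Demo 1.0; 1";
print $ok_old ? "1.0: loaded\n" : "1.0: refused: $@";
```

The first request is refused with a message naming the version required and the version found; the second goes through.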


Private Symbols in Modules

To ensure that variables in your module can't be accessed or imported into a script, declare them as lexical with 'my'. Since lexical variables are never part of any namespace, and since their scope cannot extend past file boundaries, they will be private to that module or subroutine.

For example, we could make $Mode usable only by the code inside Conf.pm (here, is_safe() reads it) by changing Conf.pm like so:

   package Conf;        # start new namespace; scope extends to EOF
   use Exporter;        # load Exporter module
   @ISA=qw(Exporter);   # Inherit from Exporter
   @EXPORT_OK=qw(read_config); # Export read_config() upon request

   my $Mode = 022;      # $Mode is now lexical and can't be exported or
                        # changed from the script.
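
The effect can be checked with a small sketch. Here a bare block stands in for the file boundary of Conf.pm, and the package name Demo and the accessor mode() are hypothetical.

```perl
use strict;
use warnings;

{
    package Demo;                   # hypothetical stand-in for Conf.pm
    my $Mode = 022;                 # lexical: visible in this block only
    sub mode { return $Mode }       # code in the same scope can still read it
}

printf "via sub: %03o\n", Demo::mode();
print  "via FQN: ", (defined $Demo::Mode ? $Demo::Mode : 'undef'), "\n";
```

The subroutine still sees the value, but $Demo::Mode is a different (and undefined) package variable, so the script has no way to read or change the lexical directly.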


Exporting Guidelines

If you are creating an object-oriented module, it is recommended that you not export any symbols; i.e., leave @EXPORT and @EXPORT_OK blank. Instead, you should provide access methods for your object attributes.

If you are creating a regular old module -- a set of functions -- it's probably best to put symbols in @EXPORT_OK rather than @EXPORT. Then the module user must explicitly import symbols into her namespace, and presumably won't be surprised by variable conflicts.


Module Setup and Cleanup

Sometimes you'd like to execute some code when your module loads; or perhaps when it exits. The setup code is easy: just put regular Perl commands outside any subroutine definitions, and they will be executed when the module is loaded.

For exit code, use an END subroutine. This subroutine will run when the script finishes, or on an error (like a call to die). It is defined just like any other subroutine, except that it must be called 'END' (all upper-case) and you can omit the 'sub' part of the definition. For example:

    END {
        print "Executing module cleanup now ...\n";
        # Do more interesting things here
    }


Module skeletons with h2xs

Perl comes with a script called h2xs which can create a standard module skeleton for you to fill out with your code. To start a new module, run h2xs -XA -n mod_name. This makes stubs for the named module, including Makefile.PL. It is the easiest and safest way to get all the details correct! The '-X' option keeps it from creating C extension stubs, while the '-A' option leaves out the AutoLoader code.

To create a skeleton for the Conf module, run h2xs -XA -n Conf, which creates a directory called Conf containing these files:

   Changes      -- revision history
   Conf.pm      -- skeleton for the module
   MANIFEST     -- list of files in the module
   Makefile.PL  -- Perl to make a Makefile
   test.pl      -- skeleton for testing the module

If you want to distribute this module, first edit Conf.pm and fill in the stubs with the working code. If you want people to be able to test the module, put some testing code in test.pl. Once you're done, run these commands from your shell:

   perl Makefile.PL     # creates a Makefile
   make dist            # tars everything up in a nice package.  
                        # Uses the value of $VERSION for naming.


References

Everything that was mentioned here is covered in available documentation. Good references include: