This paper is split into two sections, local and universal modifiers.
The running example parses a configuration file whose lines look like
key = val
into a hash. To make it a bit more realistic, we will also allow a
hash mark (#) to mark the beginning of a comment.
Yes, I could use require but that could cause an entire script to fail due to an improperly formatted file. This way, I can allow my users to edit configuration files in a way more natural to their thinking.
To start, my first pass at a parser like this would look something like:
There are a number of commonly used local modifiers:
There are a variety of less commonly used local modifiers that I have had
occasion to use. One of my favourites is a conditional regex. The template
looks like:
Using what we have learned so far, we could write our file parser like:
The most commonly used universal modifiers are:
open FILE, "/some/path/to/some/file" or die "$!";
while ( <FILE> ) {
next if ( /^#/ ); # Skip comment lines
s/#.*$//; # Remove any trailing comments
( $key, @val ) = split /=/; # Split the pair
$key =~ s/^\s+//; # Strip leading
$key =~ s/\s+$//; # and trailing spaces
$val = join( '=', @val ); # Rejoin on '=' so embedded '=' signs survive
$val =~ s/^\s+//; # Strip leading but leave trailing
$hash{$key} = $val; # Save the value in a hash.
}
close( FILE );
By way of explanation, the reason I use an array in the second half of the
split is in case the config file contains something like:
key = (foo = bar)
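To make the point about the array concrete, here is a minimal sketch of what split returns for such a line. The sample line and the variable names are only illustrative. It also shows an alternative that is not used in the parser above: passing a limit of 2 as the third argument to split, so the line is cut only at the first '='.

```perl
use strict;
use warnings;

# Illustrative input: a value that itself contains an '='.
my $line = "key = (foo = bar)";

# Splitting on every '=' yields three pieces, so the value arrives in two parts.
my ( $key, @val ) = split /=/, $line;

# Alternative: a limit of 2 splits only at the first '=', keeping the value whole.
my ( $key2, $val2 ) = split /=/, $line, 2;

print scalar(@val), " value pieces without a limit\n";
print "with a limit of 2 the value is '$val2'\n";
```

Either way the key still carries its trailing space, so the whitespace stripping in the parser is needed in both versions.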
Local modifiers
I use the term local to indicate something that appears within the
regex itself and modifies the behaviour of a small portion of the regex. To
reiterate, there are a lot of local modifiers I don't use, such as look-aheads
and look-behinds, so I will not spend any time on them.
(?(condition)yes-pattern)
Confusing, but consider trying to match
telephone numbers where the area code may or may not be in (). If it is, you
need to make sure the parentheses balance. We could use two regexes: one
would look for parentheses and the other wouldn't. Or, we could say this:
The regex would be:
m{ (\()?       # 0 or 1 open paren, saved in group 1
   \d\d\d      # three digits
   (?(1)\))    # if group 1 matched, look for a closing paren
 }x
Which, IMHO, is much more the Perl way.
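As a quick check of the conditional, here is a small sketch that tries the pattern on balanced and unbalanced area codes. The ^ and $ anchors, the helper name area_code_ok, and the sample strings are my additions for illustration. Without the anchors the engine is free to skip an unmatched parenthesis and match just the three digits.

```perl
use strict;
use warnings;

# Anchored variant of the area-code pattern. The anchors are an assumption
# of this sketch; they force the whole string to be the area code.
my $area_code = qr{ ^ (\()? \d\d\d (?(1)\)) $ }x;

# Hypothetical helper: returns 1 if the string is a well-formed area code.
sub area_code_ok {
    my ($candidate) = @_;
    return $candidate =~ $area_code ? 1 : 0;
}

for my $candidate ( '(555)', '555', '(555', '555)' ) {
    printf "%-6s %s\n", $candidate,
        area_code_ok($candidate) ? 'ok' : 'rejected';
}
```

The first two candidates are accepted and the last two are rejected, because the closing parenthesis is required exactly when the opening one was captured.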
open FILE, "/path/to/some/file" or die "$!";
while ( <FILE> ) {
next if ( /^#/ );
$hash{$1} = $2 if ( /^\s*(.+?)\s*=\s*([^#\n]+)$/ );
}
close( FILE ); # 9 seconds to parse the same file
This will ignore incorrectly formatted lines but it will not remove whitespace
from the end of the line. But notice we have reduced the number of lines it
took and our Perl is beginning to look like line noise.
Universal modifiers
A universal modifier exists outside of the regex and affects the
behaviour of the entire regex. There are fewer of these than the local
modifiers and I have personally used them more frequently. The only one I
will not discuss is the 'c' modifier. Read your man
pages for an explanation.
There are three less commonly used modifiers:
Finally, I will cover one rarely used modifier, o.
o - whenever you use variables in a regex, perl will recompile the regex
every time it is used. If the variable will not change in the lifetime of
the program ( i.e., the value is set after you have parsed an options file
:) this can be very expensive. To avoid this, the o flag will cause the
regex to be compiled only once.
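A minimal sketch of the o flag in use follows. The helper name line_matches and the sample patterns are hypothetical. It also shows the catch that comes with compiling only once: if the variable does change later, the match keeps using the pattern from the first call.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern text arrives in a variable, and /o tells
# perl to compile the regex the first time this match runs and then reuse it.
sub line_matches {
    my ( $pattern, $text ) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

my $first  = line_matches( 'foo', 'foo' );    # compiles the regex from 'foo'
my $second = line_matches( 'bar', 'bar' );    # still tested against 'foo'

print "first call matched: $first\n";
print "second call matched: $second\n";
```

The second call does not match, which is why the flag is only safe when the variable really is fixed for the life of the program.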
Using this little bit more knowledge, we can rewrite our file parser in a very
simple and elegant fashion. To make this a little more understandable, I am
clearing the input record separator ( IRS ) so the entire file will be read
into one string.
undef $/; # unset IRS ( an empty string would mean paragraph mode )
open FILE, "/some/path/to/some/file" or die "$!";
$line = <FILE>;
close( FILE );
%hash = ( $line =~ /^\s*(.+?)\s*=\s*([^#]+?)$/mg );
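To see why a single match can fill the hash, here is a short sketch that runs the same regex over an in-memory string instead of a file. The sample text and its keys are made up for illustration. In list context a match with the g modifier returns every captured group from every match, so the list alternates key, value, key, value.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the slurped file contents.
my $text = "name = perl\ncolor = blue\n";

# The m modifier lets ^ and $ match at each line; the g modifier repeats the
# match, and list context hands back all the captures as one flat list.
my %hash = ( $text =~ /^\s*(.+?)\s*=\s*([^#]+?)$/mg );

for my $key ( sort keys %hash ) {
    print "$key => $hash{$key}\n";
}
```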
Notes
Since there seems to be a great deal of emphasis in the perl community on
benchmarks as a measure of correctness ( if it runs fast, it must be good ) I
decided to run my three versions of the parser through the Benchmark module.
To get reliable numbers, I ran 10,000 iterations of each method over a 13 line
file. The results, in order of appearance, were:
So, the final example is not only elegant perl but it is fast perl as well.
References
If your sysadmin has installed them, the online man pages for perl are an
excellent if overwhelming resource. Everything I have discussed can be found
in either perlre or perlop.
If you haven't purchased it already, I cannot recommend O'Reilly's Mastering Regular Expressions, by Jeffrey Friedl, strongly enough. It is an excellent overview of Regular Expressions in general and the Perl regex engine in particular.
[ Lexington.pm ]