"Linux Gazette...making Linux just a little more fun!"

Learning Perl, part 4

By Ben Okopnik

The Internet Revolution was founded on open systems; an open system is one whose software you can look at, a box you can unwrap and play with. It's not about secret binaries or crippleware or brother-can-you-spare-a-dime shareware. If everyone always had hidden software, you wouldn't have 1/100th the useful software you have right now.
And you wouldn't have Perl.
-- Tom Christiansen

Overview

If you have been following this series, you now have a few tools - perhaps you've even experimented with them - which can be used to build scripts. So, this month we're going to take a look at actually building some, particularly by using the "open" function which allows us to assign filehandles to files, sockets, and pipes. "open" is a major building block in using Perl, so we'll give it a good long look.

Exercises

Last time, I mentioned writing a few scripts for practice. Let's take a look at a few possible ways to do that.

The first one was a script that would take a number as input, and print "Hello!" that many times. It would also test the input for illegal (non-numeric) characters. Here is a good example, sent in by David Zhuwao:

#! /usr/bin/perl -w
#@author David Zhuwao
#@since Apr/19/'01
print "Enter number of times to loop: ";
#get input and assign it to a variable.
chomp ($input = <>);
# check the input for non-numeric characters.
if ($input !~ m/\D/ && length($input) > 0) {
    for ($i = 0; $i < $input; $i++) {
        print "Hello!\n";
    }
} else {
    print "Non-numeric input.\n";
}

First, to point out good coding practices: David has used the "-w" switch so that Perl will warn him about questionable constructs, both at compile time and at run time - an excellent habit. He has also used whitespace (blank lines and tabs) effectively to make the code easy to read, as well as commenting it liberally. Also, rather than checking for the presence of a number (which would create a problem with input like "1A"), he is testing for non-numerical characters and a length greater than zero - good thinking!

Minor points (note that none of these are problems as such, simply observations): in using the match operator, "m//", the "m" is unnecessary unless the delimiter is something other than "/". As well, the Perl "for/foreach" loop would be more compact than the C-like "for" loop, while still fulfilling the function:

print "Hello!\n" for 1 .. $input;

It would also render "$i" unnecessary. Other than those minor nits - well done, David!

Here's another way:

#!/usr/bin/perl -w
print "Please enter a number: ";
chomp ( $a = <> );
print "Hello!\n" x $a if $a =~ /^\d+$/;

Unlike David's version, mine does not print a failure message; it simply returns you to the command prompt if the input is not numeric. Instead of testing for non-numerical characters, I'm testing the string from its beginning to its end for only numerical content; either of these techniques will work fine. And instead of using an explicit loop, I'm using Perl's "x" operator, which repeats the string on its left "$a" times, so that a single "print" outputs the whole lot.
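
To make the comparison concrete, here is a minimal sketch that runs both validation styles over a few sample strings and shows the "x" operator at work. The helper name "hello_times" and the sample inputs are made up for illustration; the input is assumed to have been chomped already.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: returns "Hello!\n" repeated $count times,
# or an empty string if $count is not a string of digits.
sub hello_times {
    my ($count) = @_;
    return '' unless defined $count && $count =~ /^\d+$/;
    return "Hello!\n" x $count;
}

# Run both tests over a few (already chomped) sample inputs.
for my $input ( '3', '1A', '', '007' ) {
    my $no_nondigits = ( $input !~ /\D/ && length($input) > 0 ) ? 'ok' : 'bad';
    my $all_digits   = ( $input =~ /^\d+$/ )                     ? 'ok' : 'bad';
    print "'$input': no-non-digits test: $no_nondigits, all-digits test: $all_digits\n";
}

# "x" builds one long string; "print" is called only once.
print hello_times('3');
```

On these samples the two tests agree with each other; "1A" and the empty string are rejected by both.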

...And, One More Time...

Let's break down another one, the second suggestion from last month: a script that takes an hour (0-23) as input and says "Good morning", "Dobriy den'", "Guten Abend", or "Buenas noches" as a result (I'll cheat here and use all English to avoid confusion.)

#!/usr/bin/perl -w
$_ = <>;
if    ( /^[0-6]$/          )   { print "Good night\n";     }
elsif ( /^[7-9]$|^1[0-2]$/ )   { print "Good morning\n";   }
elsif ( /^1[3-8]$/         )   { print "Good day\n";       }
elsif ( /^19$|^2[0-3]$/    )   { print "Good evening\n";   }
else                           { print "Invalid input!\n"; }

On the surface, this script seems pretty basic - and, really, it is - but it contains a few hidden considerations that I'd like to mention. First, why do we need the "beginning of line" and "end of line" tests for everything? Obviously, we want to avoid confusing "1" and "12" - but what could go wrong with /1[3-8]/?

What could go wrong is a mis-type. Not that it matters too much in this case, but being paranoid about your tests is a good idea in general. :) What happens if a user, while trying to type "14", typed "114"? Without those "limits", it would match "11" - and we'd get a wrong answer.
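
A quick way to see the difference is to try the mistyped value against both forms of the pattern. This is only a sketch; the variable names are mine.

```perl
#!/usr/bin/perl -w
use strict;

my $typo = '114';    # the user meant to type "14"

# Unanchored: the pattern only has to match somewhere inside the string,
# and the "14" inside "114" is enough.
my $loose = ( $typo =~ /1[3-8]/ ) ? 'match' : 'no match';

# Anchored: the pattern has to account for the whole string.
my $anchored = ( $typo =~ /^1[3-8]$/ ) ? 'match' : 'no match';

print "unanchored: $loose\n";
print "anchored:   $anchored\n";
```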

OK - why didn't I use numeric tests instead of matching? I mean, after all, we're just dealing with numbers... wouldn't it be easier and more obvious? Yes, but. What happens if we do a numeric test and the user types in "joe"? We'd get an error along with our "Invalid input!":

Argument "joe\n" isn't numeric in gt at -e line 5, <> chunk 1.

As a matter of good coding practice, we want the user to see only the output that we generate (or expect); there should not be any errors caused by the program itself. A regex match isn't going to be "surprised" by non-digit input; it will simply return false (no match) and pass on to the next "elsif" or "else", which is the "catchall" clause. Anything that does not match one of the first four tests is invalid input - and that's what we want reported.
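
If you do prefer numeric comparisons, one possible compromise is to let a regex act as the gatekeeper and compare numbers only after the input has passed it. The sketch below assumes chomped input and uses a made-up helper name, "greeting"; note that, unlike the script above, it also accepts a leading zero such as "07".

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: validate with a regex first, then compare numerically.
sub greeting {
    my ($hour) = @_;
    # One or two digits only; anything else never reaches the numeric
    # tests, so Perl has no reason to complain about non-numeric arguments.
    return 'Invalid input!' unless defined $hour && $hour =~ /^\d{1,2}$/;
    return 'Good night'   if $hour <= 6;
    return 'Good morning' if $hour <= 12;
    return 'Good day'     if $hour <= 18;
    return 'Good evening' if $hour <= 23;
    return 'Invalid input!';
}

print "$_: ", greeting($_), "\n" for qw/0 7 13 19 23 24 joe/;
```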

Handling Files

An important capability in any language is that of dealing with files. In Perl, this is relatively easy, but there are a couple of places where you need to be careful.

# The right way
open FILE, "/etc/passwd" or die "Can't open /etc/passwd: $!\n";

Here are some wrong or questionable ways to do this:

# Doesn't test for the return result
open FILE, "/etc/passwd";
# Ignores the reason for the failure, which the system reports via the '$!' variable
open FILE, "/etc/passwd" or die "Can't open /etc/passwd\n";
# Uses "||" instead of "or" - "||" binds more tightly than the comma, so it
# tests the filename string rather than the result of "open", and "die"
# never runs
open FILE, "/etc/passwd" || die "Can't open /etc/passwd: $!\n";
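
The precedence problem is easy to demonstrate. In the sketch below, "die" is replaced by a simple assignment so that both forms can run in the same script, and the filename is a made-up one that is assumed not to exist.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical name, assumed not to exist on your system.
my $missing = '/nonexistent/dir/no-such-file';

# "||" is evaluated against the filename string, which is always true,
# so the right-hand side never runs even though "open" fails.
my $with_pipes = 'silent';
open FILE, $missing || ( $with_pipes = 'caught' );

# "or" has very low precedence, so it tests the result of "open" itself.
my $with_or = 'silent';
open FILE, $missing or ( $with_or = 'caught' );

print "Using ||: failure was $with_pipes\n";
print "Using or: failure was $with_or\n";
```

Wrapping the arguments in parentheses, as in open(FILE, $missing) || ..., also gets "||" to test the return value of "open".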

By default, files are open for reading. Other methods are specified by adding a rather obvious "modifier" to the specified filename:

# Open for writing - anything written will overwrite file contents
open FILE, ">/etc/passwd" or die "Can't open /etc/passwd: $!\n";
# Open for appending - data will be added to the end of the file
open FILE, ">>/etc/passwd" or die "Can't open /etc/passwd: $!\n";
# Open for reading and writing - note that "+>" empties the file first;
# use "+<" to read and write without clobbering the existing contents
open FILE, "+>/etc/passwd" or die "Can't open /etc/passwd: $!\n";
# Open for reading and appending
open FILE, "+>>/etc/passwd" or die "Can't open /etc/passwd: $!\n";

Having created the filehandle ("FILE", in the above case), you can now use it in the following manner:

while ( <FILE> ) {
    print; # This will loop through the file and print every line
}

Or you can do it this way, if you just want to print out the contents in one shot:

print <FILE>;

Writing to the file is just as easy:

print FILE "This line will be written to the file.\n";
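
Since experimenting on "/etc/passwd" is not a great idea, here is a small sketch that walks through the write, append, and read modes on a scratch file instead. It assumes the File::Temp module is installed (it ships with current Perl distributions); the variable names are mine.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Create a scratch file so that nothing important gets clobbered.
my ( $tmp_fh, $file ) = tempfile();
close $tmp_fh or die "Can't close $file: $!\n";

# ">" empties the file, then writes
open FILE, ">$file" or die "Can't open $file: $!\n";
print FILE "first line\n";
close FILE or die "Can't close $file: $!\n";

# ">>" keeps what is there and adds to the end
open FILE, ">>$file" or die "Can't open $file: $!\n";
print FILE "second line\n";
close FILE or die "Can't close $file: $!\n";

# "<" reads; in list context, <FILE> returns all the remaining lines
open FILE, "<$file" or die "Can't open $file: $!\n";
my @lines = <FILE>;
close FILE or die "Can't close $file: $!\n";

print @lines;
unlink $file or die "Can't remove $file: $!\n";
```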

Remember that the default open method is "read". I usually like to emphasize this by writing the statement this way:

open FILE, "</etc/passwd" or die "Can't open /etc/passwd: $!\n";

Note the "<" sign in front of the filename: Perl has no problem with this, and it makes a good visual reminder. The phrase "leaving breadcrumbs" describes this methodology, and has to do with the idea of making what you write as obvious as possible to anyone who may follow. Don't forget that the person "following" might be you, a couple of years after you've written the code...

Perl automatically closes filehandles when the script exits... or, at least, is supposed to. From what I've been told, some OSs have a problem with this - so, it's not a bad idea (though not a necessity) to perform an explicit "close" operation on open filehandles:

close FILE or die "Can't close FILE: $!\n";

By the way, the effect of the "die" function should be relatively obvious: it prints the specified string to STDERR and exits the program with a non-zero status.

Don't do this, unless you're at the last line of your script:

close;

With no argument, "close" acts on the currently selected filehandle - STDOUT, unless you have changed it with "select" - which leaves your program unable to print anything. Also, you cannot specify multiple handles in one close, so you do indeed have to close them one at a time:

close Fh1 or die "Can't close Fh1: $!\n";
close Fh2 or die "Can't close Fh2: $!\n";
close Fh3 or die "Can't close Fh3: $!\n";
close Fh4 or die "Can't close Fh4: $!\n";

You could, of course, do this:

for ( qw/Fh1 Fh2 Fh3 Fh4/ ) { close $_ or die "Can't close $_: $!\n"; }

That's Perl for you; There's More Than One Way To Do It...

Using Those Handles

Let's say that you have two files with some financial data - loan rates in one, the type and amount of your loans in the other - and you want to calculate how much interest you'll be paying, and write the result out to a file. Here is the data:

rates.txt

House    9%
Car     16%
Boat    19%
Misc    21%

loans.txt

Chevy   CAR     8000
BMW     car     22000
Scarab  BOAT    150000
Pearson boat    8000
Piano   Misc    4000

All right, let's make this happen:

#!/usr/bin/perl -w
open Rates, "<rates.txt" or die "Can't open rates.txt: $!\n";
open Loans, "<loans.txt" or die "Can't open loans.txt: $!\n";
open Total, ">total.txt" or die "Can't open total.txt: $!\n";
while ( <Rates> ) {
    # Get rid of the '%' signs
    tr/%//d;
    # Split each line into an array
    @rates = split;
    # Create hash with loan types as keys and percentages as values
    $r{lc $rates[0]} = $rates[1] / 100;
}
while ( <Loans> ) {
    # Split the line into an array
    @loans = split;
    # Print the loan and the amount of interest to the "Total" handle;
    # calculate by multiplying the total amount by the value returned
    # by the hash key.
    print Total "$loans[0]\t\t\$", $loans[2] * $r{lc $loans[1]}, "\n";
}
# Close the filehandles - not a necessity, but can't hurt
for ( qw/Rates Loans Total/ ) {
    close $_ or die "Can't close $_: $!\n";
}

Rather obviously, Perl is very good at this kind of thing: we've done the job in a dozen lines of code. The comments took up most of the space. :)
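
If you would like to check the arithmetic without creating any files, the following sketch does the same calculation with the sample data held in two lists. The variable names and the %interest hash are my own additions; the logic follows the script above.

```perl
#!/usr/bin/perl -w
use strict;

# The sample data from above, held in lists instead of files.
my @rate_lines = ( 'House    9%', 'Car     16%', 'Boat    19%', 'Misc    21%' );
my @loan_lines = (
    'Chevy   CAR     8000',
    'BMW     car     22000',
    'Scarab  BOAT    150000',
    'Pearson boat    8000',
    'Piano   Misc    4000',
);

# Loan type (lowercased) => rate as a fraction
my %r;
for ( @rate_lines ) {
    my ( $type, $percent ) = split;
    $percent =~ tr/%//d;
    $r{ lc $type } = $percent / 100;
}

# Loan name => amount multiplied by the rate for its type
my %interest;
for ( @loan_lines ) {
    my ( $name, $type, $amount ) = split;
    $interest{$name} = $amount * $r{ lc $type };
    print "$name\t\t\$", $interest{$name}, "\n";
}
```

The Chevy line, for instance, works out to 8000 * 0.16, and the Scarab to 150000 * 0.19.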

Here's another example, one that came about as a result of one of my articles about procmail ("No More Spam!" in LG#62). The original "blacklist" script that was invoked from Mutt pulled out the spammer's e-mail address via "formail", then parsed the result down to the actual "user@host" address with a one-line Perl script. It took the entire spam mail as piped input. Martin Bock, however, suggested doing the whole thing with Perl; after exchanging a bit of e-mail with him, I came up with the following script based on his idea:

#!/usr/bin/perl -wln
# The '-n' switch makes the script read the input one line at a time--
# the entire script is executed for each line;
# the '-l' enables line processing, which strips the newline from each
# input line and appends one to every line that is printed out.
# If the line matches the expression, then...
if ( s/^From: .*?(\w\S+@\S+\w).*/$1/ ) {
    # Open the "blacklist" with the "OUT" filehandle in append mode
    open OUT, ">>$ENV{HOME}/.mutt/blacklist" or die "Aargh: $!\n";
    # Print $_ to that filehandle
    print OUT;
    # Close
    close OUT or die "Aargh: $!\n";
    # Exit the loop
    last;
}

The substitution operator in the first line is not perfect - I can write some rather twisted e-mail addresses which it would not parse correctly - but it works well with variations like

one-two@three-four.net
<one-two@three-four.net>
joe.blow.from.whatever@whoever.that-might-be.com (Joe Blow)
Joe Blow <joe.blow.from.whatever@whoever.that-might-be.com>
[ The artist formerly known as squiggle ] <prince@loco.net>
(Joe) joe-blow.wild@hell.and.gone.com ["Wildman"]

To "decode" what the regular expression in it says, consult the "perlre" manpage. It's not that complex. Hint: look for the word "greed" to understand that ".*?", and look for the word "capture" to understand the "(...) / $1" construct. Both of them are very important concepts, and both have been mentioned in this series.
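
As a starting point for that exploration, here is a sketch that wraps the substitution in a small function and runs it over the sample "From: " lines, with a greedy variant alongside for comparison. The function names "extract_address" and "extract_greedy" are made up for this illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: same substitution as in the script above.
# Returns the bare address, or undef if the line doesn't match.
sub extract_address {
    my ($line) = @_;
    return $line =~ s/^From: .*?(\w\S+@\S+\w).*/$1/ ? $line : undef;
}

# Same thing with a greedy ".*" in front of the capture.
sub extract_greedy {
    my ($line) = @_;
    return $line =~ s/^From: .*(\w\S+@\S+\w).*/$1/ ? $line : undef;
}

my @samples = (
    'From: one-two@three-four.net',
    'From: <one-two@three-four.net>',
    'From: joe.blow.from.whatever@whoever.that-might-be.com (Joe Blow)',
    'From: Joe Blow <joe.blow.from.whatever@whoever.that-might-be.com>',
    'From: [ The artist formerly known as squiggle ] <prince@loco.net>',
    'From: (Joe) joe-blow.wild@hell.and.gone.com ["Wildman"]',
);

print "non-greedy: ", extract_address($_), "\n" for @samples;

# The greedy version swallows as much of the line as it can before the
# capture, leaving only the last two characters ahead of the "@".
print "greedy:     ", extract_greedy( $samples[0] ), "\n";
```

The greedy ".*" grabs as much as possible and gives back only what the rest of the pattern strictly needs, so most of the user name is lost; the non-greedy ".*?" stops at the first point where the rest of the pattern can match.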

Here's a somewhat more compact (and that much less readable) version of the above; note that the mechanism here is somewhat different:

#!/usr/bin/perl -wln
BEGIN { open OUT, ">>$ENV{HOME}/.mutt/blacklist" or die "Aargh: $!\n"; }
if ( s/^From: .*?(\w\S+@\S+\w).*/$1/ ) { print OUT; close OUT; last; }

The BEGIN block on the first line of the script runs only once during execution, despite the fact that the script loops multiple times; it's very similar to the same construct in Awk.

Next Time

Next month, we'll be looking at a few nifty ways to save ourselves work by using modules: useful code that other people have written, available from the Comprehensive Perl Archive Network (CPAN). We'll also take a look at how Perl can be used to implement CGI, the Common Gateway Interface - the mechanisms that "hew the wood and draw the water" behind the scenes of the Web. Until then, here are a few things to play with:

Write a script that opens "/etc/services" and counts how many ports are listed as supporting UDP operation, and how many support TCP. Write the service names into files called "udp.txt" and "tcp.txt", and print the totals to the screen.

Open two files and exchange their contents.

Read "/var/log/messages" and print out any line that contains the word "fail", "terminated/terminating", or " no " in it. Make it case-insensitive.

Until then -

perl -we 'print "See you next month!"'

Ben Okopnik
perl -we'print reverse split//,"rekcah lreP rehtona tsuJ"'

References:

Relevant Perl man pages (available on any pro-Perl-y configured system):

perl      - overview                perlfaq   - Perl FAQ
perltoc   - doc TOC                 perldata  - data structures
perlsyn   - syntax                  perlop    - operators/precedence
perlrun   - execution               perlfunc  - builtin functions
perltrap  - traps for the unwary    perlstyle - style guide

"perldoc", "perldoc -q" and "perldoc -f"

Ben Okopnik

A cyberjack-of-all-trades, Ben wanders the world in his 38' sailboat, building networks and hacking on hardware and software whenever he runs out of cruising money. He's been playing and working with computers since the Elder Days (anybody remember the Elf II?), and isn't about to stop any time soon.

"Linux Gazette...making Linux just a little more fun!"

Copyright © 2001, Ben Okopnik. Copying license http://www.linuxgazette.net/copying.html Published in Issue 67 of Linux Gazette, June 2001