Using the HTML::Template module

Recently, I needed to generate a Web page - the Linux Gazette's "Mirrors and Translations" page, actually - based on the contents of a database. Perl is famous for its ability to connect to almost any database via a common interface, given its DBI module and the DBD driver modules that plug into it; however, the challenge in this case came from the front end, the HTML generation. Sure, I could use the CGI module to output whatever I needed - but in this case, I already had the static page that I wanted to create, and saw no reason to rewrite all the static content in CGI. Also, the final product was not to be a CGI script but a generated HTML page. In fact, everything in this case hinted at templating, a process in which I would use the static HTML with a few special tags and a script which would then apply processing based on those tags. This made especially good sense since it drew a clean separating line between writing HTML and writing code - very different tasks, and ones for which I have different mental states (layout designer vs. programmer).

As with anything in Perl, TMTOWTDI ("There's More Than One Way To Do It") - there were a number of modules available on CPAN (the Comprehensive Perl Archive Network) that could do the job. However, I had used the HTML::Template module in the past, and the job wasn't particularly complicated (although HTML::Template can handle some very complex jobs indeed), so that's what I settled on. My first task was to hunt through the HTML, removing the dozens of repetitive stanzas and replacing them with the appropriate tag framework that the module would utilize later. We had also made the decision not to display the maintainers' email addresses, even in the munged form that I use to deter spammers; those of you who use our mirrors and want to thank these fine folks for making LG available should be able to find an address link on the mirror site without much trouble.

Fragment of the old page (there were several dozen entries like this):


...
<A name="AU"></A>
<DT><B><font color="maroon">AUSTRALIA (AU)</font></B></DT>
<DD>
<STRONG><FONT COLOR="green"><TT>[WWW]</TT></FONT></STRONG>
<A HREF="http://www.localnet.com.au/lg/index.html">http://www.localnet.com.au/lg/index.html</A>
<BR>
<SMALL>
Maintainer: Jim McGregor &lt;<A HREF="mailto:nospam@here.please">nospam@here.please</A>&gt; &nbsp;
</SMALL>
<P>
</DD>

<DD>
<STRONG><FONT COLOR="green"><TT>[WWW]</TT></FONT></STRONG>
<A HREF="http://www.eastwood.apana.org.au/Linux/LinuxGazette/">http://www.eastwood.apana.org.au/Linux/LinuxGazette/</A>
<BR>
<SMALL>
Maintainer: Mick Stock &lt;<A HREF="mailto:nospam@here.please">nospam@here.please</A>&gt; &nbsp;
</SMALL>
<P>
</DD>

...

Single-stanza replacement for all the entries (new template):


...
<a name="<TMPL_VAR NAME=TLD>"></a>
<dt><b><font color="maroon"><TMPL_VAR NAME=FQDN> (<TMPL_VAR NAME=TLD>)</font></b></dt>
<dd><strong><font color="green"><tt>[WWW]</tt></font></strong>
<a href="<TMPL_VAR NAME=HTTP>"><TMPL_VAR NAME=HTTP></a>
<br>
<strong><font color="red"><tt>[FTP]</tt></font></strong>
<a href="<TMPL_VAR NAME=FTP>"><TMPL_VAR NAME=FTP></a>
<br>
<small>
Maintainer: <TMPL_VAR NAME=MAINT>
</small>
<p>
</dd>

...

Now, the challenge had shifted away from generating the HTML to just dealing with code. What I needed to do was sort the data into groups and subgroups - that is, there would be some number of "country" headings, some number of "mirror" headings under each of those, and either one or two (WWW, FTP, or both) hosts plus a maintainer under each "mirror" heading. In programmatic terms, these are known as "nested loops", and are not that difficult to code. However, translating that into HTML terms could be an exercise in the kind of language of which your mother would not approve... if it wasn't for HTML::Template.

Note: Using HTML::Template is normally very simple; in fact, learning the basics of using it usually takes only a minute or two (see the example at the top of "perldoc HTML::Template".) In this instance, however, we're creating nested lists - a rather more complex issue than simple variable/tag replacement - and thus, the coding issues get a bit deeper. This isn't due to HTML::Template, though; if you think about the issues inherent in modeling what is already a complex data structure and then transferring that structure into a "passive" layout language... truth to tell, I'm somewhat surprised that it can be done at all. Kudos and my hat's off to Sam Tregar (author of the module) and Jesse Erlbaum (the man responsible for TMPL_LOOP.)
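For those who haven't seen the simple case, here is a minimal sketch of plain variable/tag replacement. The inline template, the TITLE tag, and the values are made up for illustration; the "scalarref" option just lets new() read the template from a string rather than from a file, so that the example is self-contained.

```perl
use strict;
use warnings;
use HTML::Template;

# A tiny inline template (hypothetical; the real one lives in a file)
my $text = "<h1><TMPL_VAR NAME=TITLE></h1>\n"
         . "<p>Maintainer: <TMPL_VAR NAME=MAINT></p>\n";

# 'scalarref' reads the template from a string instead of a file
my $t = HTML::Template->new( scalarref => \$text );

# Parameter names are matched case-insensitively by default
$t->param( title => "Mirrors", maint => "Rene Pfeiffer" );

# output() returns the filled-in page as a string
my $html = $t->output;
print $html;
```

Everything else in this article builds on that same three-step pattern: new(), param(), output().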

References

The area that seems to strike fear into the hearts of neophyte programmers, more so than anything else, is the topic of references - particularly in Perl, where everything is supposed to be warm, fuzzy, and easy to understand. However, an understanding of references and objects is - in my opinion - the very thing that takes one from being a Perl user to being a Perl programmer. I'm going to simply show how references work with a bit of an explanation, but the real comprehensive reference for references :) is included with the standard Perl documentation. Simply type "perldoc perlreftut" at your command line for a good introduction, and be sure to take a look at "perldoc perlref" for the complete documentation.

First, in order to understand how the data structure must be laid out to create the pattern that we need, let's take a look at that pattern. Fortunately, in Perl it's easy to lay out the data structures to match what they represent (whitespace is arbitrary, so you can follow your preferences - but see "perldoc perlstyle".) What we'll want to do here is build the structure that contains all the values we want to assign within the loop as well as the names which are associated with those values. Those of you with a little Perl experience are nodding and smiling already: the word "associated" points very clearly to the type of variable we need - a hash! Taking a single "row" (per-country entry) - Austria, as a random example - here is how it looks:


my %row = (
	tld	=> "AT",
	fqdn	=> "Austria",
	sites	=> [
			{ 
			  http	=>  "http://www.luchs.at/linuxgazette/",
			  maint =>  "Rene Pfeiffer"
			},

			{ 
			  http  =>  "http://info.ccone.at/lg/",
			  maint =>  "Gerhard Beck"
			},

			{
			  http	=>  "http://linuxgazette.tuwien.ac.at/",
			  ftp	=>  "ftp://linuxgazette.tuwien.ac.at/pub/linuxgazette/",
			  maint	=>  "Tony Sprinzl"
			}
		   ]
);

The above hash, %row, matches our requirements exactly: its keys will be used to match (case-insensitively) the tag names in the HTML, and the values associated with those keys will be used to replace those tags. That is, every instance of <TMPL_VAR NAME=FQDN> in the template will be replaced by "Austria" while this entry is being processed. Here are some of the less-obvious points of the above structure:

The anonymous hash constructor, defined by the curly braces surrounding each group, stores all the data in an anonymous hash and returns a reference to it.
In turn, the anonymous array constructor, defined by the square brackets surrounding the list of groups, stores all of the above references in an anonymous array and returns a reference to it.
The sites key points to (is associated with) the reference to the above anonymous array, and is the name of the loop that we'll use within the HTML to iterate through all of the above data.
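If you'd like to poke at such a structure yourself, here is a small sketch that uses a trimmed-down copy of the Austria row. It relies on nothing beyond core Perl; the ref() calls and the loop are only there for inspection and are not part of the mirrors script.

```perl
use strict;
use warnings;

# Trimmed-down copy of the Austria row shown above
my %row = (
    tld   => "AT",
    fqdn  => "Austria",
    sites => [
        { http => "http://www.luchs.at/linuxgazette/", maint => "Rene Pfeiffer" },
        { http => "http://info.ccone.at/lg/",          maint => "Gerhard Beck"  },
    ],
);

# ref() reports what kind of thing a reference points to
print ref( $row{sites} ),    "\n";    # ARRAY
print ref( $row{sites}[0] ), "\n";    # HASH

# Arrows between subscripts are optional: same as $row{sites}->[1]->{maint}
print $row{sites}[1]{maint}, "\n";    # Gerhard Beck

# Dereference the whole array to walk the inner list
for my $site ( @{ $row{sites} } ) {
    print "$row{tld}: $site->{http} ($site->{maint})\n";
}
```

The walk at the end is, roughly speaking, what the inner loop in the template will do for us later.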

As we create a "row" for each country, we will need to store all of them in a list. Each entry in this list must, of course, be a reference to one of the rows that we have built:


# Add the hashref to the end of the array
push @mirrors, \%row;

Note the '\' preceding the %row; this stores a reference to %row rather than the hash itself (stuffing a hash into an array would result in a generally unusable mess - key/value pairs in effectively random order as array elements.) This is a standard mechanism for creating multidimensional arrays, lists-of-hashes, etc. in Perl.
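To see the difference for yourself, here is a quick sketch with a two-key hash (the array names are invented for the demonstration):

```perl
use strict;
use warnings;

my %row = ( tld => "AT", fqdn => "Austria" );

my ( @flat, @refs );
push @flat, %row;     # the hash is flattened into a key/value list
push @refs, \%row;    # a single element: a reference to the hash

print scalar(@flat), "\n";     # 4
print scalar(@refs), "\n";     # 1

# The reference still leads back to the original hash
print $refs[0]{fqdn}, "\n";    # Austria
```

With the flattened version there is no telling where one row ends and the next begins; with the reference, each array element is exactly one row.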

And - one more time, with gusto - HTML::Template's param() method, like most Perl subroutines that take a complex data structure as an argument, expects a reference to the array rather than the array itself:


use HTML::Template;

# Create a new HTML::Template object
my $t = HTML::Template -> new( filename => "mirrors.tmpl" );

# Pass the listref to param()
$t -> param( MIRR => \@mirrors );

"And", as Austin Powers would say, "Oi'm spent." Those of you scared of the Big Bad References may come out from under the bed now. :)

Looking at it from the other end, the matching part of the template for this loop looks like this:


<dl>
<TMPL_LOOP NAME=MIRR>

<dt><b><font color="maroon">
<a name="<TMPL_VAR NAME=TLD>">
<img src="gx/flags/<TMPL_VAR NAME=TLD>.jpg" border="1">
</a>
<TMPL_VAR NAME=FQDN> [<TMPL_VAR NAME=TLD>]
</font></b></dt>

<TMPL_LOOP NAME=SITES>

<dd>
<TMPL_IF NAME="HTTP">
<strong><font color="green"><tt>[WWW]</tt></font></strong>
<a href="<TMPL_VAR NAME=HTTP>">
<TMPL_VAR NAME=HTTP>
</a><br>
</TMPL_IF>

<TMPL_IF NAME="FTP">
<strong><font color="green"><tt>[FTP]</tt></font></strong>
<a href="<TMPL_VAR NAME=FTP>">
<TMPL_VAR NAME=FTP>
</a><br>
</TMPL_IF>
<small>
Maintainer:
<TMPL_VAR NAME=MAINT>
</small>
<p>
</dd>

</TMPL_LOOP>
</TMPL_LOOP>

</dl>

To recap what we're looking at, there are two loops defined in the above template, one inside the other: <TMPL_LOOP NAME=MIRR> and <TMPL_LOOP NAME=SITES>. Note that the outside loop corresponds to the name of the parameter key that we assigned when passing the data construct to param(), and the name of the inside loop is the same as the key associated with the groups inside the hash we created.
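Putting both ends together, the following is a minimal end-to-end sketch rather than the actual mirrors script: the template is a cut-down, plain-text version of the one above, passed in as a string via "scalarref", and the data is a single hand-typed row.

```perl
use strict;
use warnings;
use HTML::Template;

# Cut-down inline version of the nested-loop template
my $text = <<'END';
<TMPL_LOOP NAME=MIRR><TMPL_VAR NAME=FQDN> [<TMPL_VAR NAME=TLD>]
<TMPL_LOOP NAME=SITES><TMPL_IF NAME=HTTP>  WWW: <TMPL_VAR NAME=HTTP>
</TMPL_IF><TMPL_IF NAME=FTP>  FTP: <TMPL_VAR NAME=FTP>
</TMPL_IF></TMPL_LOOP></TMPL_LOOP>
END

# One outer row (country) holding two inner rows (sites)
my @mirrors = (
    {
        tld   => "AT",
        fqdn  => "Austria",
        sites => [
            { http => "http://info.ccone.at/lg/" },
            { http => "http://linuxgazette.tuwien.ac.at/",
              ftp  => "ftp://linuxgazette.tuwien.ac.at/pub/linuxgazette/" },
        ],
    },
);

my $t = HTML::Template->new( scalarref => \$text );

# The outer loop name is the param() key; the inner one is a hash key
$t->param( MIRR => \@mirrors );

my $html = $t->output;
print $html;
```

Note that the first site has no "ftp" key, so its TMPL_IF block is simply skipped. Also keep in mind that, by default, HTML::Template dies if a row contains a key that the template doesn't mention (see the die_on_bad_params option), so the hash keys and the tag names have to agree.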

However, fine as the above may be for static data that we can simply type into those anonymous hashes in the 'sites' listref, static data isn't often what we get in the real world. Databases are updated, file contents change - and we obviously need to reflect this in our HTML. So, let's take a look at a code fragment that does this:


for my $tld ( @tlds ){
	# Set some temporary (per-loop) variables
	my @sites;
	my %row;

	# Here's the inner loop! Pick out the records for this TLD
	for ( grep /^$tld,/, @mirr ){
		# Parse the CSV into fields
		my @rec;
		my %site;
		s/\\,/&comma;/g;
		@rec = split /,/;
		s/&comma;/,/g for @rec;

		# Mirror listings don't require much data
		$site{ http  } = $rec[2];
		$site{ ftp   } = $rec[3];
		$site{ maint } = $rec[4];
		# Load it up!
		push @sites, \%site;
	}
	
	# Outer loop vars
	$row{ tld } = $tld;
	# The key must match the <TMPL_VAR NAME=FQDN> tag in the template
	$row{ fqdn } = $country{ $tld };
	# Ref to the inner loop, attached
	$row{ sites } = \@sites;
	
	# ...and load up the total into the array to be passed to param()
	push @mirrs, \%row;
}

# Feed the data to the hungry HTML::Template object
$t -> param( MIRR => \@mirrs );

By the way, the data we're reading in looks like this:


AT,,http://www.luchs.at/linuxgazette/,,Rene Pfeiffer,nospam@here.please,
AT,,http://info.ccone.at/lg/,,Gerhard Beck,nospam@here.please,
BE,,http://linuxgazette.linuxbe.org/,,Cedric Gavage,nospam@here.please,
CA,,http://blue7green.crosswinds.net/hobbies/lg/,,Jim Pierce,nospam@here.please,
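The substitute-split-substitute sequence in the inner loop exists to protect commas that are escaped with a backslash inside a field. Here is a minimal sketch of that trick on its own; the record is made up (none of the sample lines above actually contains an escaped comma), and it works on a copy of the line rather than on $_.

```perl
use strict;
use warnings;

# Hypothetical record: the maintainer field contains an escaped comma
my $line = 'AT,,http://info.ccone.at/lg/,,Beck\, Gerhard,nospam@here.please,';

( my $copy = $line ) =~ s/\\,/&comma;/g;   # hide the escaped commas
my @rec = split /,/, $copy;                # split on the real separators
s/&comma;/,/g for @rec;                    # put the commas back, per field

print scalar(@rec), "\n";    # 6 (split drops the trailing empty field)
print "$rec[4]\n";           # Beck, Gerhard
```

For anything more demanding than this (quoted fields, embedded newlines), a dedicated CSV parser would be the safer choice.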

Now we have a highly dynamic chunk of code that will process the data that we give it, generate the necessary data structure on the fly, and feed it out to the template. Voila!
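One detail of the loop above is worth a second look: the "my" declarations sit inside the loop body, so every pass gets a brand-new hash to take a reference to. A small sketch (the array names here are invented for the demonstration) shows what would happen otherwise:

```perl
use strict;
use warnings;

my ( @fresh, @shared );

# Declared inside the loop: each pass gets a brand-new hash
for my $tld (qw(AT BE)) {
    my %row = ( tld => $tld );
    push @fresh, \%row;
}

# Declared outside the loop: every reference points at the same hash
my %row;
for my $tld (qw(AT BE)) {
    %row = ( tld => $tld );
    push @shared, \%row;
}

print join( ",", map { $_->{tld} } @fresh ),  "\n";    # AT,BE
print join( ",", map { $_->{tld} } @shared ), "\n";    # BE,BE
```

In the second case, every country on the generated page would show the data of whichever one was processed last.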

If you want to see the complete script that I wrote for this project, go here; the template can be found here. If you would like to see the latest generated page, go here. If you would like to change the way the page looks and do something great for the Linux community, join the folks on the list and become a mirror maintainer: commit some of your disk space and bandwidth and let the Linux Gazette "mirrors and translations" person - that's me! - know about it here.

Happy Linuxing to all!

Source material: (I was going to write "References"... :)

perldoc perlreftut
perldoc perlref
perldoc HTML::Template

Motivation:

My annoyance at the lack of good documentation for nested loops under HTML::Template. :)

Ben is the Editor-in-Chief for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in electricity at the tender age of six, promptly demonstrated it by sticking a fork into a socket and starting a fire, and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the recurrent nightmares.

His subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. After a seven-year Atlantic/Caribbean cruise under sail and passages up and down the East coast of the US, he is currently anchored in St. Augustine, Florida. He works as a technical instructor for Sun Microsystems and a private Open Source consultant/Web developer. His current set of hobbies includes flying, yoga, martial arts, motorcycles, writing, and Roman history; his Palm Pilot is crammed full of alarms, many of which contain exclamation points.

He has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.

Copyright © 2003, Ben Okopnik. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 97 of Linux Gazette, December 2003
