...making Linux just a little more fun!

HTML obfuscation

By Ben Okopnik

Whoops. Darn it, I meant to publish this article in the April issue, for obvious reasons... but between being sicker than I have been for years (I'm usually healthy as a bull), driving 700+ miles in that condition, teaching an intensive 3-day class, and then driving back - and publishing LG in the middle of all that - somehow, I managed to forget it. Perhaps perfectly appropriate for a guy with an April 1st birthday, but... well, I hope you enjoy it anyway, despite the delay.
-- Ben


There are times when you want the content of your web page to be a deep, dark secret, but you still want it to be visible. [1] Or, perhaps, you want to express your deepest, darkest feelings and yearnings in your blog, but you don't want anyone without a browser to be able to read them (take that, you browserless wimps! Oh, and you wimpettes, too.) "How," you cry, "shall this desperate need be addressed?"

Nil desperandum, my friends - it's (ta-daa!!!) Geekman to the rescue! And Perl, of course. [2]

First, you need to install the HTML::Parser module. Don't worry about the name, really; behind that deceptively innocent-sounding moniker, it's got all the proper telepathy and black-helicopter-shielding functions built in (although they're really well hidden; there's a password involved, and deadly traps in the walls, and beautiful half-naked girls in golden cages, and all that stuff, so you know that it's The Real Thing. Just like Indiana Jones and the Temple of Doom. I mean, you'd never doubt the authenticity of that, right?) The important bit is where we do all that fancy, computer-science-sounding stuff, like Object Oriented Programming, and Subclassing, and Using Methods. After all, when you give big, important, and capitalized names like that to things, they can't help but work - or at least be really, really magical!

(Next month, we'll build a time machine and a hyperspace transporter using this same technique. No, it'll be a time machine inside a hyperspace transporter - or maybe the other way around. Anyway, it'll be really exciting, trust me!)
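Stripped of the black helicopters, the Subclassing and Using Methods part is plain Perl inheritance: a base class does the parsing and calls methods such as start, end, and text as it goes, and a subclass overrides only the ones it cares about. The toy sketch below shows that pattern with no CPAN modules at all. Toy::Parser and Toy::Parser::Shout are hypothetical classes made up for this illustration, and splitting on a regex is nowhere near a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class standing in for HTML::Parser: it walks a string
# and calls callback methods that subclasses are expected to override.
package Toy::Parser;
sub new  { my $class = shift; return bless {}, $class }
sub text { }    # default callbacks do nothing
sub tag  { }
sub parse {
    my ($self, $html) = @_;
    # Split into tags and the text between them (toy-grade only!)
    for my $piece (split /(<[^>]*>)/, $html) {
        next unless length $piece;
        if ($piece =~ /^</) { $self->tag($piece) }
        else                { $self->text($piece) }
    }
}

# The subclass inherits new() and parse(); it overrides only the callbacks.
package Toy::Parser::Shout;
our @ISA = ('Toy::Parser');
sub tag  { print $_[1] }       # markup passes through untouched
sub text { print uc $_[1] }    # text gets transformed

package main;
Toy::Parser::Shout->new->parse("<p>psst, secret</p>\n");
# prints: <p>PSST, SECRET</p>
```

Calling Toy::Parser::Shout->new->parse(...) works because new() and parse() are found in the parent class through @ISA, while tag() and text() are found in the subclass first.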

The Code

So, OK. Here we go with the magical code:

#!/usr/bin/perl -w
# Created by Ben Okopnik on Thu Feb 27 22:00:12 PST 2003
use strict;

die "Usage: ", $0 =~ /([^\/]+)$/, " <file_to_mung.html>\n"
	unless @ARGV && -r $ARGV[0];

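package HTML::Parser::Mung;
use HTML::Parser;
use HTML::Entities qw(decode_entities);   # ships with HTML::Parser
our @ISA = ('HTML::Parser');              # HTML::Parser::Mung is-a HTML::Parser

# Markup passes through exactly as written...
sub start       { print $_[4] }           # original text of the start tag
sub end         { print $_[2] }           # original text of the end tag
sub declaration { print "<!$_[1]>" }      # keep the <!DOCTYPE ...> line

# ...but every visible character turns into a numeric entity.
sub text {
    my ($self, $text, $is_cdata) = @_;
    unless ($is_cdata) {                  # leave <script>/<style> bodies alone
        decode_entities($text);           # &amp; &nbsp; &#169; ... -> characters
        $text =~ s/([^ \t\r\n])/sprintf "&#%d;", ord $1/eg;
    }
    print $text;
}

package main;
my $p = HTML::Parser::Mung->new();
$p->parse_file($ARGV[0]);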

Yay! Since it's all magical, we'll just apply it to this article. Poof! Go ahead, view the source. (It wasn't like that until you read to this point in the article. That's all part of the magic.)
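For the skeptics, here's the core trick in isolation: a minimal, module-free sketch of what the text handler does to one string, plus the reverse mapping that a browser effectively performs when it renders the page. The names mung_text() and unmung_text() are hypothetical, invented for this example, and the sketch assumes its input contains no entities (the real handler decodes those first).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: encode every character except ASCII whitespace
# as a decimal numeric entity. Assumes the input holds no entities yet.
sub mung_text {
    my ($text) = @_;
    $text =~ s/([^ \t\r\n])/sprintf "&#%d;", ord $1/eg;
    return $text;
}

# Hypothetical helper: roughly what a browser does when rendering it.
sub unmung_text {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr $1/eg;
    return $text;
}

my $secret = "Hi, CIA!";
my $munged = mung_text($secret);
print "$munged\n";                   # &#72;&#105;&#44; &#67;&#73;&#65;&#33;
print unmung_text($munged), "\n";    # Hi, CIA!
```

Only the bytes in the page source change; the rendered text is identical, which is the whole (dubious) point.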

Explaining the Code

Are you kidding??? Does David Copperfield explain his illusions? Does Houdini tell you how he did it? Does... well, if I told you, I'd have to kill you. The important part is, you can download this script and do it too. Be satisfied, and go away.

Otherwise, when I'm testing next month's project, I'll accidentally transport a Tyrannosaurus rex into your living room.


[1] This is in case the CIA is following you and tapping your phone, or your Web page is being targeted for assassination by Colombian drug lords, or the aliens are tapping your thoughts and the foil-lined beanie with the titanium propeller is Just Not Working. Since these things are so commonplace that all three often occur at the same time, this article seeks to redress the injustice.

[2] I mean, if you're going to either 1) perform the most amazing feats of text manipulation that will stun and amaze the most jaded audience, or 2) turn perfectly sensible text into completely senseless gibberish, there's only one possible answer. (And it's not "dadadodo" for #1 and "Microsoft Word spell-check" for #2, either.)



Ben is the Editor-in-Chief for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in electricity at the tender age of six, promptly demonstrated it by sticking a fork into a socket and starting a fire, and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory (the recurring nightmares have almost faded, actually.)

His subsequent experiences include creating software in more than two dozen languages, network and database maintenance during the approach of a hurricane, writing articles for publications ranging from sailing magazines to technological journals, and teaching on a variety of topics ranging from Soviet weaponry and IBM hardware repair to Solaris and Linux administration, engineering, and programming. He also has the distinction of setting up the first Linux-based public access network in St. George's, Bermuda as well as one of the first large-scale Linux-based mail servers in St. Thomas, USVI.

After a seven-year Atlantic/Caribbean cruise under sail and passages up and down the East Coast of the US, he is currently anchored in northern Florida. His consulting business presents him with a variety of challenges such as teaching professional advancement courses for Sun Microsystems and providing Open Source solutions for local companies.

His current set of hobbies includes flying, yoga, martial arts, motorcycles, writing, Roman history, and mangling (er, playing with) his Ubuntu-based home network, in which he is ably assisted by his wife, son and daughter; his Palm Pilot is crammed full of alarms, many of which contain exclamation points.

He has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.


Copyright © 2010, Ben Okopnik. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 174 of Linux Gazette, May 2010
