
...making Linux just a little more fun!

How do you sort an IP address list?

Rick Moen [rick at linuxmafia.com]
Tue, 7 Nov 2006 15:48:08 -0800

Thread quoted below could be grist for the TAG mill, or the makings of a 2 cent tip, or something else.

Date: Tue, 7 Nov 2006 12:16:25 -0800
To: conspire@linuxmafia.com
X-Mas: Bah humbug.
User-Agent: Mutt/1.5.11+cvs20060403
From: Rick Moen <rick@linuxmafia.com>
Subject: [conspire] Puzzle: How do you sort IP address lists?
There's a maintenance task I have to do occasionally that is very much The Wrong Thing over the long term, but necessary in the short term: I keep a blocklist of IP addresses that my SMTP server shouldn't accept mail from. SVLUG's server, on which I'm interim sysadmin, has a list just like it. Since I maintain both lists, it's logical to combine them, sort the result, and run it through 'uniq' (which only eliminates duplicates that sit next to each other, hence sorting first) -- to benefit both sites.

That's where the 'puzzle' bit comes in. But first, why it's The Wrong Thing:

Security author Marcus J. Ranum has a dictum that 'enumerating badness' is dumb (http://www.ranum.com/security/computer_security/editorials/dumb/):

  Back in the early days of computer security, there were only a
  relatively small number of well-known security holes. That had a lot
  to do with the widespread adoption of "Default Permit" because, when
  there were only 15 well-known ways to hack into a network, it was
  possible to individually examine and think about those 15 attack
  vectors and block them. So security practitioners got into the habit
  of "Enumerating Badness" - listing all the bad things that we know
  about.  Once you list all the badness, then you can put things in
  place to detect it, or block it.
 
  Why is "Enumerating Badness" a dumb idea? It's a dumb idea because
  sometime around 1992 the amount of Badness in the Internet began to
  vastly outweigh the amount of Goodness. For every harmless,
  legitimate, application, there are dozens or hundreds of pieces of
  malware, worm tests, exploits, or viral code. Examine a typical
  antivirus package and you'll see it knows about 75,000+ viruses that
  might infect your machine. Compare that to the legitimate 30 or so apps
  that I've installed on my machine, and you can see it's rather dumb to
  try to track 75,000 pieces of Badness when even a simpleton could track
  30 pieces of Goodness.  [...]
So, in keeping blocklists of IP addresses that have been zombified and used for mass-mailed spam, 419-scammail, etc., I'm aware of doing something a bit dumb. It's a losing strategy. I'm doing it on linuxmafia.com because the site is badly short on RAM and disk space in the short term (still need to migrate to that VA Linux 2230), and so software upgrades are deferred. Similarly, the SVLUG host has a scarily broken package system, and is therefore to be migrated rather than worked on in place, as well. So, we limp by on both machines with some long-term losing anti-spam methods because they're short-term palliatives.

Getting back to the puzzle, you'd think that GNU sort would be easily adaptable to a list like this, right? Consider this 11-address chunk of linuxmafia.com's blocklist:

4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
Just 'sort' as a filter with no options does this:

10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
4.3.76.194
8.10.33.176
Hmm, fine up until the last three lines, but then it becomes apparent that 'sort' is using strict ASCII order. So, you hit the manpage. '-n' for 'compare according to string numerical value' seems promising, as does '-g' for 'compare according to general numerical value'. Those get you:

4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
and

4.3.76.194
8.10.33.176
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
No cigar.
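Why does '-n' get so close and still miss? As far as I can tell, '-n' (and '-g') read only the leading number on each line, decimal point included, so 12.30.72.162 is keyed as 12.30, which is larger than 12.221. The Ruby below is a rough model of that behaviour under that assumption, not GNU sort's actual code; the helper name leading_number is made up for illustration:

```ruby
# Rough model of `sort -n`: key each line by its leading numeric prefix.
# The prefix stops at the second '.', so "12.30.72.162" is keyed as 12.30.
ips = %w[4.3.76.194 8.10.33.176 10.123.189.105 12.30.72.162 12.149.177.21
         12.154.4.213 12.159.232.66 12.205.7.190 12.206.142.76
         12.214.50.126 12.221.163.162]

def leading_number(line)
  line[/\A\d+(?:\.\d+)?/].to_f   # "8.10.33.176" -> 8.1, "12.221.163.162" -> 12.221
end

# Ties on the numeric key fall back to comparing the whole line.
numeric_order = ips.sort_by { |ip| [leading_number(ip), ip] }
puts numeric_order
```

This reproduces the '-n' output above: 12.30.72.162 lands last because 12.3 > 12.221.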

Personally, I played with these things for a while, gave up and switched to awk, and had the problem mostly solved with a rather ghastly script when I thought 'Wait a second! That's absurd. We should be able to do this using just GNU sort. If it can't sort IP addresses, what the hell good is it?'

So, I went back and eventually figured it out -- and I'm wondering if any other subscriber has either already solved this problem or cares to take a crack at it.

(I'll also really admire someone's elegant solution in, e.g., Python, Perl, or Ruby -- but I'm just boggling at how non-obvious my 'sort' solution seems, and want to compare notes.)

-- 
Cheers,
Rick Moen                                    Ita erat quando hic adveni.
rick@linuxmafia.com

Date: Tue, 7 Nov 2006 12:33:23 -0800 (PST)
From: Tom Macke <macke@scripps.edu>
To: Rick Moen <rick@linuxmafia.com>
Cc: conspire@linuxmafia.com
Subject: [conspire] sort -t.
Use -t. to break the lines into fields on ., then sort 4 ints from left to right:
	sort -t. +0n -1 +1n -2 +2n -3 +3n -4 <ip.list > ip.list.sort
Input:
10.123.189.105
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
12.30.72.162
4.3.76.194
8.10.33.176
Output:
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
cheers, tom

Date: Tue, 7 Nov 2006 20:37:29 +0000
From: Nick Moffitt <nick@zork.net>
To: conspire@linuxmafia.com
Subject: Re: [conspire] Puzzle: How do you sort IP address lists?
Rick Moen:

> Personally, I played with these things for a while, gave up and
> switched to awk, and had the problem mostly solved with a rather
> ghastly script when I thought 'Wait a second!  That's absurd.  We
> should be able to do this using just GNU sort.  If it can't sort IP
> addresses, what the hell good is it?'
>
> So, I went back and eventually figured it out -- and I'm wondering if
> any other subscriber has either already solved this problem or cares
> to take a crack at it.

I have run into this in the past, actually, only with timestamps. I ended up doing a first pass sort, then breaking it up and sorting within using -t and -k to set the field separator and sort starting field, respectively. I ended up doing three successive runs, in reverse order, and naming the files with the prefixes. I then did a final sort to get the filenames in the order I wanted and catted them together. It was a one or two-liner that has since expired from my bash_history, but I was cursing and spitting the whole time.

But as I look now, it seems that you can specify multiple -k entries, and force it to sort on the column alone by specifying an end to the sort criterion as well:

sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4
I'm kind of alarmed that GNU sort hasn't picked up more sort contexts. I found myself in dire need of a hex sort a while ago, and ended up resorting to python.

On a vaguely related tangent, Ryan Finnie has packaged cidrgrep in sid, and it should be in testing by now. Hooray for grep using CIDR ranges instead of regexes! Why doesn't grep have this already? It makes me wonder if there are more pattern specifications we use that would be useful as options to common tools.

-- 
"Don't like your car?                                   Nick Moffitt
Well then, the time has come to burn it."               nick@teh.entar.net
	-- Mark Jaroski

Date: Tue, 7 Nov 2006 12:57:08 -0800
From: Rick Moen <rick@linuxmafia.com>
To: conspire@linuxmafia.com
X-Mas: Bah humbug.
User-Agent: Mutt/1.5.11+cvs20060403
Subject: Re: [conspire] sort -t.
Quoting Tom Macke (macke@scripps.edu):

> Use -t. to break the lines into fields on ., then sort 4 ints from
> left to right:
> 
> 	sort -t. +0n -1 +1n -2 +2n -3 +3n -4 <ip.list > ip.list.sort
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Huh! I did figure out that I needed '-t.' (or, equivalently, '--field-separator=.') -- but I can't find your field-specification strings described in the manpage or texinfo docs. When I dug into the latter, what I found instead was a suggestion to use the '-k' (key) option. Looking more closely, I now find this reference in the info docs:

   On older systems, `sort' supports an obsolete origin-zero syntax
   `+POS1 [-POS2]' for specifying sort keys.  POSIX 1003.1-2001 (*note
   Standards conformance::) does not allow this; use `-k' instead.
Your Unix-greybeard credentials are showing, Tom. ;->

My solution, using '-k', did end up being uglier than yours by a fair measure. The info docs say:

`-k POS1[,POS2]'
`--key=POS1[,POS2]'
     Specify a sort field that consists of the part of the line between
     POS1 and POS2 (or the end of the line, if POS2 is omitted),
     inclusive.  Fields and character positions are numbered starting
     with 1.  So to sort on the second field, you'd use `--key=2,2'
     (`-k 2,2').  See below for more examples.
I thus ended up with:
$ sort -u -n -t. -k 1,1 -k 2,2 -k 3,3 -k 4,4 ip > ip.sorted
The '-u' is for uniq-ing on the fly. '-n' is a numeric-value sort appropriate for most types of numbers (that don't use leading plus characters or exponential notation, blessedly unlikely in IP addresses). '-t.' specifies that period is the applicable field separator (rather than whitespace).

Which leaves the '-k' string (equivalent to your origin-zero specifiers): It says, 'Hey, stupid sort program! Now that I've handed you a clue about where the fields begin and end, and told you to sort by numeric value, please also be aware that I'd like you to find a number, then a second number, then a third number, then a fourth number. Kindly use all four _as numbers_ when you sort this puppy.'

What a hassle. First, you have to say 'Use numbers', then you have to add '...and I mean, specifically, four of them.'
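The four '-k' options can be modelled as a chained comparison: compare field 1 numerically, and only if that ties move on to field 2, and so on. GNU sort then falls back to comparing whole lines (unless '-s' is given), and '-u' keeps just the first of each run of lines whose keys compare equal. A Ruby sketch of those semantics, with hypothetical helper names, assuming every line is a well-formed dotted quad:

```ruby
# Hypothetical model of: sort -u -n -t. -k 1,1 -k 2,2 -k 3,3 -k 4,4
def compare_by_fields(a, b)
  fa = a.split('.').map(&:to_i)   # -t. splits on '.', -n compares numbers
  fb = b.split('.').map(&:to_i)
  4.times do |i|                  # -k 1,1 first; later keys only break ties
    c = fa[i] <=> fb[i]
    return c unless c.zero?
  end
  0                               # all four keys are equal
end

def sort_unique_ips(lines)
  sorted = lines.sort do |a, b|
    c = compare_by_fields(a, b)
    c.zero? ? a <=> b : c         # last-resort comparison of whole lines
  end
  # -u: keep only the first line of each run whose keys compare equal
  sorted.chunk_while { |a, b| compare_by_fields(a, b).zero? }.map(&:first)
end

puts sort_unique_ips(%w[12.30.72.162 4.3.76.194 12.149.177.21 4.3.76.194])
```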

Date: Tue, 7 Nov 2006 13:16:52 -0800
From: Don Marti <dmarti@zgp.org>
To: Nick Moffitt <nick@zork.net>, conspire@linuxmafia.com
User-Agent: Mutt/1.5.9i
Subject: Re: [conspire] Puzzle: How do you sort IP address lists?
Alternate approach:

tr '.' ' ' < address_list | xargs printf '%03d.%03d.%03d.%03d\n' \
| sort -u | sed -re 's/\b0+//g' ) < address_list
Another way would be to multiply each address out into an int, sort, and re-format.
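A minimal Ruby sketch of that idea, assuming well-formed dotted-quad input: treat the four octets as the digits of a base-256 number, sort the resulting 32-bit integers, and format them back. The names ip_to_int, int_to_ip and sort_ips are illustrative only:

```ruby
# a.b.c.d -> a*2**24 + b*2**16 + c*2**8 + d
def ip_to_int(ip)
  ip.split('.').map(&:to_i).inject(0) { |acc, octet| (acc << 8) | octet }
end

# Unpack the integer back into four octets, most significant first.
def int_to_ip(n)
  [24, 16, 8, 0].map { |shift| (n >> shift) & 0xff }.join('.')
end

def sort_ips(lines)
  lines.map { |ip| ip_to_int(ip) }.sort.map { |n| int_to_ip(n) }
end

puts sort_ips(%w[12.30.72.162 4.3.76.194 12.149.177.21 10.0.0.1])
```

Because plain integers are compared, no field splitting happens inside the sort itself; the cost is one conversion on the way in and one on the way out.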

-- 
Don Marti                    
http://zgp.org/~dmarti/
dmarti@zgp.org

Date: Tue, 7 Nov 2006 14:39:02 -0800
To: conspire@linuxmafia.com
X-Mas: Bah humbug.
User-Agent: Mutt/1.5.11+cvs20060403
From: Rick Moen <rick@linuxmafia.com>
Subject: Re: [conspire] Puzzle: How do you sort IP address lists?
Quoting Don Marti (dmarti@zgp.org):

> Alternate approach: 
> 
> tr '.' ' ' < address_list | xargs printf '%03d.%03d.%03d.%03d\n' \
> | sort -u | sed -re 's/\b0+//g' ) < address_list
^

Works, after you lose the errant parenthesis. I like it; it's a little messy but logical.
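One caveat with the padding trick: s/\b0+//g deletes every leading run of zeros in a field, so an octet that is exactly 0 (padded to 000) disappears, and 10.0.0.1 would come back as 10...1. A Ruby sketch of the same pad, sort -u, unpad idea that avoids this by converting each field back with to_i (the method name is hypothetical):

```ruby
# Pad each octet to three digits so string order equals numeric order,
# de-duplicate and sort, then strip the padding by converting back to Integer.
def pad_sort_unpad(lines)
  padded = lines.map { |ip| ip.split('.').map { |o| format('%03d', o.to_i) }.join('.') }
  padded.uniq.sort.map { |p| p.split('.').map(&:to_i).join('.') }
end

puts pad_sort_unpad(%w[10.0.0.1 4.3.76.194 10.0.0.1 12.30.72.162])
```

Calling to_i before format also sidesteps Ruby treating a field such as "08" as an invalid octal literal.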

Date: Tue, 7 Nov 2006 14:44:22 -0800
To: Rick Moen <rick@linuxmafia.com>
User-Agent: Mutt/1.5.9i
From: Tim Utschig <tim@tetro.net>
Cc: conspire@linuxmafia.com
Subject: Re: [conspire] Puzzle: How do you sort IP address lists?
On Tue, Nov 07, 2006 at 12:16:25PM -0800, Rick Moen wrote:

> 
> (I'll also really admire someone's elegant solution in, e.g., Python,
> Perl, or Ruby -- but I'm just boggling at how non-obvious my 'sort'
> solution seems, and want to compare notes.)
> 

I wouldn't call mine elegant, but the last time I tried to figure out how to do it using sort I gave up and used Perl...

:r!grep ipsort ~/.bashrc
 
    alias ipsort='perl -MSocket -lne '\''$ips{inet_aton($_)}++; END { for (sort keys %ips) { while($ips{$_}--) { print inet_ntoa($_); } } }'\'
 
    alias ipsortu='perl -MSocket -lne '\''$ips{inet_aton($_)} = 1; END { print inet_ntoa($_) for sort keys %ips }'\'
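Both aliases lean on inet_aton returning the address as four bytes in network (big-endian) order, so a plain string sort of those bytes is already numeric order. Ruby's standard ipaddr library has the same pair of conversions in IPAddr#hton and IPAddr.ntop; this sketch mirrors the two aliases (method names borrowed from them), assuming one IPv4 address per line:

```ruby
require 'ipaddr'

# Like the ipsort alias: count each 4-byte key, then emit each address as often as seen.
def ipsort(lines)
  counts = Hash.new(0)
  lines.each { |s| counts[IPAddr.new(s.strip).hton] += 1 }
  counts.keys.sort.flat_map { |key| [IPAddr.ntop(key)] * counts[key] }
end

# Like the ipsortu alias: each distinct address once.
def ipsortu(lines)
  lines.map { |s| IPAddr.new(s.strip).hton }.uniq.sort.map { |key| IPAddr.ntop(key) }
end

puts ipsort(%w[12.30.72.162 4.3.76.194 12.30.72.162])
```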
-- 
   - Tim Utschig <tim@tetro.net>




Benjamin A. Okopnik [ben at linuxgazette.net]
Wed, 8 Nov 2006 00:00:11 -0500

On Tue, Nov 07, 2006 at 03:48:08PM -0800, Rick Moen wrote:

> Thread quoted below could be grist for the TAG mill, or the makings of a
> 2 cent tip, or something else.
Mmmm... 2-Cent Tip, I think. It's a common-enough problem that we should have a good answer for our readers.

Sorting IPs is a classic problem for budding Perl hackers to sharpen their brains on. :) One of the better solutions (highly efficient and relatively short) is a modified Schwartzian Transform:

ben@Fenrir:~$ cat iplist
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
ben@Fenrir:~$ perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
For those interested in the details - the IP is parsed into the numerical fields by 'split'; the result is converted to a 4-byte char string which is prepended to the line. This is now sorted using the default lexical sort (much like the one in the shell) - which will now actually work due to the prepended string - and displayed after clipping the prefix. /Voila/! ... I only wish that I'd thought of it. :)
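The same decorate, sort, undecorate trick carries over to Ruby almost word for word: Array#pack('C4') builds the four-byte prefix, the default String sort compares it byte by byte, and byteslice strips it off again. A sketch, assuming each line holds exactly one dotted quad with no trailing newline (the method name is illustrative):

```ruby
# Decorate:   "4.3.76.194" -> "\x04\x03\x4C\xC2" + "4.3.76.194"
# Sort:       ordinary string comparison, which now sees the packed bytes first
# Undecorate: drop the first four bytes again
def packed_prefix_sort(lines)
  lines.map { |ip| ip.split('.').map(&:to_i).pack('C4') + ip }
       .sort
       .map { |s| s.byteslice(4, s.bytesize - 4) }
end

puts packed_prefix_sort(%w[12.154.4.213 4.3.76.194 12.30.72.162 10.123.189.105])
```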
-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *




Paul Sephton [paul at inet.co.za]
Wed, 08 Nov 2006 15:18:40 +0200

That's a bit confusing. Why go perl & regex if there's a perfectly good

           cat iplist | sort -k1.1n,2.1n,3.1n,4.1n -t'.'
which gives you exactly what you want anyway, unless you absolutely have to use perl, of course :)

btw: anyone know of people who use perl as their default shell? <grin>

Paul




Thomas Adam [thomas.adam22 at gmail.com]
Wed, 8 Nov 2006 20:25:31 +0000

Hi --

On 08/11/06, Paul Sephton <paul@inet.co.za> wrote:

> That's a bit confusing.  Why go perl & regex if there's a perfectly good
>
>            cat iplist | sort -k1.1n,2.1n,3.1n,4.1n -t'.'

The above is inaccurate (don't top post). What you probably meant (and was already mentioned in the thread Rick bounced to TAG) was:

[n6tadam@workstation ~]% sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 < ./test
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
The use of cat in the above example (don't top post) was also OTT. I suppose you win a UUoC award (don't top post).

I'd admit the perl solution (don't top post) is way OTT, but YMMV, TMTOWTDI, etc.

-- Thomas Adam




Benjamin A. Okopnik [ben at linuxgazette.net]
Wed, 8 Nov 2006 15:31:32 -0500

[ Hi, Paul - please don't top-post; this severely decreases people's ability to read things in order. Also, please clip content that you're not replying to; see "Asking Questions of The Answer Gang" at http://linuxgazette.net/tag/ask-the-gang.html for details. I've restored the correct sequence, re-added correct attribution, and clipped extraneous material. ]

On Wed, Nov 08, 2006 at 03:18:40PM +0200, Paul Sephton wrote:

>    On Wed, 2006-11-08 at 07:00, Benjamin A. Okopnik wrote:
> > On Tue, Nov 07, 2006 at 03:48:08PM -0800, Rick Moen wrote:
> > 
> > > Thread quoted below could be grist for the TAG mill, or the makings of a
> > > 2 cent tip, or something else.
> > 
> > Mmmm... 2-Cent Tip, I think. It's a common-enough problem that we should
> > have a good answer for our readers.
> > 
> > Sorting IPs is a classic problem for budding Perl hackers to sharpen
> > their brains on. :) One of the better solutions (highly efficient and
> > relatively short) is a modified Schwartzian Transform:
> 
> [ snip ] 
> 
> > ben@Fenrir:~$ perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>'
> 
>    That's a bit confusing.

To me, it's perfectly readable and illustrates a powerful sorting algorithm that's worth knowing.

>    Why go perl & regex if there's a perfectly good
>               cat iplist | sort -k1.1n,2.1n,3.1n,4.1n -t'.'
>    which gives you exactly what you want anyway- unless you absolutely have to
>    use perl, of course :)

In that case, Paul, why go 'cat' when there's a perfectly good filespec option to 'sort'?

sort -k1.1n,2.1n,3.1n,4.1n -t'.' iplist
Worse than that, your solution apparently fails:

ben@Fenrir:/tmp$ cat iplist | sort -k1.1n,2.1n,3.1n,4.1n -t'.'
sort: stray character in field spec: invalid field specification `1.1n,2.1n,3.1n,4.1n'
I would imagine that there's some simple solution to the above, but I'll let you troubleshoot it. :)

The answer in general, however, is the Perl motto: TMTOWTDI (There's more than one way to do it.) Your way is not better than mine or vice versa; if it works, and it's what you prefer, Linux - and Unix in general - provides you with options in how you choose to do it. We don't need to compete for whose way is better (although healthy comparisons are useful; I'm always willing to steal^Wadapt someone else's method if it's a significant improvement on what I'm doing.)

>    btw: anyone know of people who use perl as their default shell? <grin>

However low that number may be, there are fewer using 'sort' as one. :)

http://sourceforge.net/projects/psh/

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *




Faber J. Fedor [faber at linuxnj.com]
Wed, 8 Nov 2006 15:43:27 -0500

On 08/11/06 15:31 -0500, Benjamin A. Okopnik wrote:

> On Wed, Nov 08, 2006 at 03:18:40PM +0200, Paul Sephton wrote:
> >    btw: anyone know of people who use perl as their default shell? <grin>
> 
> However low that number may be, there are fewer using 'sort' as one. :)
> 
> http://sourceforge.net/projects/psh/

And all this time I had you pegged as a Futurama fan, Ben.

http://zoidberg.student.utwente.nl/

-- 
 
Regards,
 
Faber Fedor
President
Linux New Jersey, Inc.
908-320-0357
800-706-0701




Paul Sephton [paul at inet.co.za]
Thu, 09 Nov 2006 00:08:24 +0200

Hi;

I am sorry if I offended anyone in my previous post; I was honestly not trying to be contrary. I personally use perl, and quite like the language. What I was trying to show is a command that works perfectly well with GNU sort v5.0 but, as you point out, fails with GNU sort v5.2.1 (as evidenced by my testing on two machines).

The documentation for 'sort' shows the following:

 -k, --key=POS1[,POS2]
              start a key at POS1, end it at POS 2 (origin 1)
and
 -t, --field-separator=SEP use SEP instead of non-blank to blank transition
also, further down,
       POS  is  F[.C][OPTS],  where  F is the field number and C the character
       position in the field.  OPTS is  one  or  more  single-letter  ordering
       options,  which  override  global ordering options for that key.  If no
       key is given, use the entire line as the key.
So my command line is perfectly valid according to the documentation. As tested earlier on my older machine, it also provides a very valid result for GNU sort v5.0. The command line -k1.1 -k2.2 -k3.3 etc. is invalid, as the [.C] is an offset into the field for the first character from which to commence the ordering. By going -k3.3, you drop the first two characters of the field.

I can only conclude that there is a programming error (read BUG) in the later sort v5.2.1 as that version does not function according to documentation as embedded (sort --help) or the man pages.

Apologies again for not testing on a later machine prior to posting. On the other hand, I find myself pulling my hair out sometimes at the way some utilities (for example nslookup) which I have used for many years are deprecated seemingly at someone's whim, or others (such as ps) have their arguments changed (again at someone's whim) breaking all sorts of scripts. Upgrading production machines is a nightmare, and I cannot bring myself to believe that there is a valid reason behind this practice. Standards seem to be things that apply only to those without a sense for adventure, such as SCO Unix.

Perhaps there is no "standard" answer to the query as to how lists of IPs may be sorted. Clearly, any production system which used my method would have broken the moment the binutils and textutils were updated. Who knows; Perl keeps morphing as well. I can hardly recognise the original language anymore, although backward compatibility seems to have been retained, however astounding that might seem.

Regarding my previous comments, which were intended to be humorous (as indicated by the smileys), I do realise that those comments were severely lacking in substance, and likely to be misconstrued. Clearly, no-one would use "sort" as a shell, although it is perfectly valid to use Perl, Tcl or even Python as shells. A previous acquaintance of mine actually lived in Tcl, eschewing Bash as a creation from Hell.

Regards, Paul




Rick Moen [rick at linuxmafia.com]
Wed, 8 Nov 2006 15:03:39 -0800

Quoting Paul Sephton (paul@inet.co.za):

> I am sorry if I offended anyone in my previous post; I was honestly
> not trying to be contrary. 

If you don't even try to be contrary, how on earth will you fit in? ;->

Seriously, nobody's offended, and please accept our cheery welcome. (We just tend to have to frequently remind people to not accidentally drop the mailing list, on follow-ups.)

> I personally use perl, and quite like the language.  What I was trying
> to show, is a command which works perfectly well for GNU sort v5.0,
> but as you point out fails for GNU sort v5.2.1 (as evidenced by my
> testing on two machines).
> 
> The documentation for 'sort' shows the following:
> 
>  -k, --key=POS1[,POS2]
>               start a key at POS1, end it at POS 2 (origin 1)

Near as I can tell, the bracket syntax ("zero or more of these") is a little misleading, as it appears that you need no more than a pair of POSn numbers, and then need to use additional "-k" / "--key=" options for the second and following keys. So, yes, the sort(1) v.5.2.1 manpage is buggy, but only to the extent of not making that clear.
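One way to see why the combined spec gets rejected: a single '-k' takes POS1 and at most one POS2, each of the form F[.C][OPTS]. The hypothetical Ruby parser below models only that documented grammar (it is not GNU sort's code, and it accepts any letters as OPTS). Splitting '1.1n,2.1n,3.1n,4.1n' on the first comma leaves '2.1n,3.1n,4.1n' as POS2, which is not a valid position, whereas four separate '-k' options each parse cleanly:

```ruby
# Simplified model of POS = F[.C][OPTS]: field number, optional
# character offset, optional ordering-option letters.
KEY_POS = /\A(\d+)(?:\.(\d+))?([A-Za-z]*)\z/

# Parse the argument of ONE -k option, documented as POS1[,POS2].
def parse_key(spec)
  pos1, pos2 = spec.split(',', 2)          # at most two positions
  [pos1, pos2].compact.map do |pos|
    m = KEY_POS.match(pos) or raise ArgumentError, "invalid field specification '#{spec}'"
    { field: m[1].to_i, char: m[2] && m[2].to_i, opts: m[3] }
  end
end

p parse_key('1,1')
p parse_key('4.1n')
begin
  parse_key('1.1n,2.1n,3.1n,4.1n')
rescue ArgumentError => e
  puts e.message
end
```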

Also, your syntax omitted "-u" (uniq) and the numeric-sort option. In short, you probably meant something more like my GNU sort example:

   sort -u -n -t. -k 1,1 -k 2,2 -k 3,3 -k 4,4 iplist
...but somehow it came out as this (rearranged per Ben's suggestion to eliminate "cat"), which doesn't quite work:
   sort -k1.1n,2.1n,3.1n,4.1n -t'.' iplist

> I find myself pulling my hair out sometimes at the way some utilities
> (for example nslookup) which I have used for many years are deprecated
> seemingly at someone's whim, or others (such as ps) have their
> arguments changed (again at someone's whim) breaking all sorts of
> scripts.

In the case of nslookup(1), its eclipse by dig(1) turns out to have ample justification: nslookup relies on some BIND8-specific implementation features (though that may have been fixed in recent cleanup), carries out some unintended network lookups, conceals critical data from its output results, tends to issue non-helpful error messages, and in general is just buggy and ready for the scrap heap.

The change to "ps" options owes, if I remember correctly, to some infamous BSD / SysV trainwreck, such that it provides for both syntaxes while making nobody particularly happy.

-- 
Cheers,                  Higgeldy Piggeldy             "Phooey on Freud and his 
Rick Moen                Hamlet of Elsinore            Psychoanalysis -- 
rick@linuxmafia.com      Ruffled the critics by        Oedipus, Schmoedipus,    
                         Dropping this bomb:           I just loved Mom."       




Paul Sephton [paul at inet.co.za]
Thu, 09 Nov 2006 08:33:11 +0200

On Wed, 2006-11-08 at 15:03 -0800, Rick Moen wrote:

> Quoting Paul Sephton (paul@inet.co.za):
> 
> > I am sorry if I offended anyone in my previous post; I was honestly
> > not trying to be contrary. 
> 
> If you don't even try to be contrary, how on earth will you fit in?  ;-> 
> 
> Seriously, nobody's offended, and please accept our cheery welcome.
> (We just tend to have to frequently remind people to not accidentally
> drop the mailing list, on follow-ups.)
> 
> > I personally use perl, and quite like the language.  What I was trying
> > to show, is a command which works perfectly well for GNU sort v5.0,
> > but as you point out fails for GNU sort v5.2.1 (as evidenced by my
> > testing on two machines).
> > 
> > The documentation for 'sort' shows the following:
> > 
> >  -k, --key=POS1[,POS2]
> >               start a key at POS1, end it at POS 2 (origin 1)
> 
> Near as I can tell, the bracket syntax ("zero or more of these") is a
> little misleading, as it appears that you need no more than a pair 
> of POSn numbers, and then need to use additional "-k" / "--key=" options
> for the second and following keys.  So, yes, the sort(1) v.5.2.1 manpage
> is buggy, but only to the extent of not making that clear.
> 

The man page is unaltered between v5.0 and v5.2.1 of sort. The difference is in operation.

Interpreting the syntax, the [] brackets simply mean "optional" (see BNF). Therefore, --key=POS1[,POS2] simply means "one or more POS separated by comma". Looking at the definition for POS, we see POS=F[.C][OPTS] where F is the field number. Optionally, the field may be specified as 'F', or as 'F.C' or as 'F.COPTS' where F is a field number, C is an offset into that specific field from which the sort starts, and OPTS are field-specific options (in my case 1.1n means field 1, offset 1, numeric sort for field 1).

> Also, your syntax omitted "-u" (uniq) and the numeric-sort option.  In
> short, you probably meant something more like my GNU sort example:
> 
>    sort -u -n -t. -k 1,1 -k 2,2 -k 3,3 -k 4,4 iplist
> 
> ...but somehow it came out as this (rearranged per Ben's suggestion to 
> eliminate "cat"), which doesn't quite work:
> 
>    sort -k1.1n,2.1n,3.1n,4.1n -t'.' iplist
> 

Um, no. I did not mean that.

I am sorry to say that I did not refer to your example before replying to the thread. As I said, the command line which I provided does indeed work with GNU sort v5.0. The syntax breaks with GNU sort v5.2.1, but I believe that is the fault of an incorrect implementation: an incorrect interpretation of the documentation for sort on the part of the developer, not on mine.

A very highly recommended book, Unix Power Tools http://www.oreilly.com/catalog/upt3/ describes the use of sort in great detail. GNU sort is well specified and not a candidate playing ground for someone's innovation.

> > I find myself pulling my hair out sometimes at the way some utilities
> > (for example nslookup) which I have used for many years are deprecated
> > seemingly at someone's whim, or others (such as ps) have their
> > arguments changed (again at someone's whim) breaking all sorts of
> > scripts.
> 
> In the case of nslookup(1), its eclipse by dig(1) turns out to have ample
> justification:  nslookup relies on some BIND8-specific implementation 
> features (though that may have been fixed in recent cleanup), carries
> out some unintended network lookups, conceals critical data from its
> output results, tends to issue non-helpful error messages, and in
> general is just buggy and ready for the scrap heap.
> 
> The change to "ps" options owes, if I remember correctly to some
> infamous BSD / SysV trainwreck, such that it provides for both syntaxes
> while making nobody particularly happy.

Indeed, nslookup does rely on BIND8 features. Again, a change to those features led to the demise of nslookup, which had existed (in its pristine form) for some 15 years prior to its demise.

Understand me well; I am not against change. As a CTO, Architect and skilled programmer, I am in fact concerned with introducing and managing change every day of my working life. What frustrates me is that some people don't seem to have a handle on when to change something and when to leave well alone. Adoption of Linux (the OS) is very much influenced by the stability and associated adoption of core binary tools.

It is not good enough to say that the "open source choice will end up in the right thing being adopted" when a core tool is in effect superseded and replaced, or deprecated through introducing a new tool- possibly with the same name as the old.

Changing the binary interface to the OS as presented to the user through the set of core GNU tools in a way which is not backward-compatible should be taboo. Whereas change is not always bad, it is not always good either.

Regards, Paul




Jason Creighton [jcreigh at gmail.com]
Wed, 8 Nov 2006 23:44:16 -0700

On Wed, Nov 08, 2006 at 12:00:11AM -0500, Benjamin A. Okopnik wrote:

> On Tue, Nov 07, 2006 at 03:48:08PM -0800, Rick Moen wrote:
> > Thread quoted below could be grist for the TAG mill, or the makings of a
> > 2 cent tip, or something else.
>  
> Mmmm... 2-Cent Tip, I think. It's a common-enough problem that we should
> have a good answer for our readers.
> 
> Sorting IPs is a classic problem for budding Perl hackers to sharpen
> their brains on. :) One of the better solutions (highly efficient and
> relatively short) is a modified Schwartzian Transform:
> 
> ``
> ben@Fenrir:~$ cat iplist
> 12.154.4.213
> 12.159.232.66
> 12.205.7.190
> 12.206.142.76
> 12.214.50.126
> 12.221.163.162
> 4.3.76.194
> 8.10.33.176
> 10.123.189.105
> 12.30.72.162
> 12.149.177.21
> ben@Fenrir:~$ perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
                                                              ^^^^
Hmm... I don't really understand how that works. By the time Perl sees that, it's not quoted anymore:

~/tmp$ ruby -e 'p ARGV' perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
["perl", "-weprint map substr($_,4),sort map pack(C4,split/\\./).$_,<>", "iplist"]
Which Perl seems to happily accept as a bareword. I had thought that -w and/or "use strict" caused Perl to say "YOU FOOL! NO BAREWORDS ALLOWED!", but just playing around with a test script, I can't get a warning to fire with either -w or "use strict". (Perl 5.8.8, Debian etch). But it's been a long time since I've actually tried to code anything in Perl, so I'm probably mistaken.

> 4.3.76.194
> 8.10.33.176
> 10.123.189.105
> 12.30.72.162
> 12.149.177.21
> 12.154.4.213
> 12.159.232.66
> 12.205.7.190
> 12.206.142.76
> 12.214.50.126
> 12.221.163.162
> ''

"Me too!" Ruby implementation: (same input file):

~/tmp$ ruby -e "puts readlines().sort_by { |ip| ip.split('.').map { |d| d.to_i } }" iplist 
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
~/tmp$ 
sort_by does a Schwartzian transform for you, so just map the ip to an array of integers ("4.3.76.194" -> [4, 3, 76, 194]) which will then sort correctly.
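Written out by hand, the transform that sort_by performs looks roughly like this: compute each key once, sort the [key, line] pairs, then discard the keys. Arrays compare element by element, which is why [4, 3, 76, 194] sorts ahead of [10, 123, 189, 105]. A Ruby sketch with an illustrative method name:

```ruby
def schwartzian_ip_sort(lines)
  lines.map { |ip| [ip.split('.').map(&:to_i), ip] }   # decorate: [[4, 3, 76, 194], "4.3.76.194"]
       .sort                                           # Array#<=> compares the keys element by element
       .map { |_key, ip| ip }                          # undecorate: keep only the original line
end

puts schwartzian_ip_sort(%w[12.30.72.162 4.3.76.194 12.149.177.21 10.123.189.105])
```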

Jason Creighton




Benjamin A. Okopnik [ben at linuxgazette.net]
Thu, 9 Nov 2006 07:49:07 -0500

On Wed, Nov 08, 2006 at 11:44:16PM -0700, Jason Creighton wrote:

> On Wed, Nov 08, 2006 at 12:00:11AM -0500, Benjamin A. Okopnik wrote:
> > 
> > ``
> > ben@Fenrir:~$ cat iplist
> > 12.154.4.213
> > 12.159.232.66
> > 12.205.7.190
> > 12.206.142.76
> > 12.214.50.126
> > 12.221.163.162
> > 4.3.76.194
> > 8.10.33.176
> > 10.123.189.105
> > 12.30.72.162
> > 12.149.177.21
> > ben@Fenrir:~$ perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
>                                                               ^^^^
> Hmm...I don't really understand how that works. By the time Perl sees
> that, it's not quoted anymore:

Oh, right. It seems that 'pack' will accept a template argument without it being quoted. I didn't know that. Doing this with, say, 'N*' would create a bit of a problem, though.

The reason it works, of course - despite my senior moment at the keyboard - is that the shell evaluates 'C4' as a string and returns it literally. So, on the one hand, it is double-plus-ungood that I managed to type the wrong quotes - but on the other hand, I've just learned a cute trick that I could use (at least under some shells) to do more low, nasty, mean things with Perl golf. :)

> ``
> ~/tmp$ ruby -e 'p ARGV' perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist

[glower] Young man, if you're going to act smarter than me regularly, we're going to Have A Talk. A simple capo does not do that to Il Padrino, capish?

Nicely done. :)

In Perl, of course, that would be

perl -we'print "@ARGV"' !!
or, better yet - with clearer formatting -

ben@Fenrir:/tmp$ perl -wle'print for @ARGV' !!
The latter gives you each argument on a line by itself.

> ``
> ~/tmp$ ruby -e "puts readlines().sort_by { |ip| ip.split('.').map { |d| d.to_i } }" iplist 
> 4.3.76.194
> 8.10.33.176
> 10.123.189.105
> 12.30.72.162
> 12.149.177.21
> 12.154.4.213
> 12.159.232.66
> 12.205.7.190
> 12.206.142.76
> 12.214.50.126
> 12.221.163.162
> ~/tmp$ 
> ''
> 
> sort_by does a Schwartzian transform for you, so just map the ip to an
> array of integers ("4.3.76.194" -> [4, 3, 76, 194]) which will then sort
> correctly.

Sweet! It's nice that somebody has implemented it as a fixed routine. Does Ruby do GRTs (Guttman-Rosler Transforms) as well?

Incidentally, I've been occasionally glancing at "Why's (Poignant) Guide to Ruby" (http://poignantguide.net/). That is one seriously bent individual. I like him. :) And it's a fairly nice language from what I can see so far; if I'm going to add another scripting language to my kit, that's a good candidate. Python just leaves me cold and slightly queasy - unsurprising, perhaps, considering its poikilothermic and venomous nature... [1]

[1] Why, yes, this is intended to poke Mike Orr. Why do you ask? :)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Neil Youngman [ny at youngman.org.uk]
Thu, 9 Nov 2006 13:19:46 +0000

On or around Thursday 09 November 2006 12:49, Benjamin A. Okopnik reorganised a bunch of electrons to form the message: <SNIP>

> Python just leaves me cold and slightly
> queasy - unsurprising, perhaps, considering its poikilothermic and
> venomous nature... [1]

I thought pythons were constrictors and constrictors generally ain't venomous.

Neil



Benjamin A. Okopnik [ben at linuxgazette.net]
Thu, 9 Nov 2006 08:34:39 -0500

On Thu, Nov 09, 2006 at 01:19:46PM +0000, Neil Youngman wrote:

> On or around Thursday 09 November 2006 12:49, Benjamin A. Okopnik reorganised 
> a bunch of electrons to form the message:
> <SNIP>
> 
> > Python just leaves me cold and slightly
> > queasy - unsurprising, perhaps, considering its poikilothermic and
> > venomous nature... [1]
> 
> I thought pythons were constrictors and constrictors generally ain't venomous.

Yes, but we're talking about the language. ;)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Thomas Adam [thomas.adam22 at gmail.com]
Thu, 9 Nov 2006 13:42:13 +0000

On Thu, 9 Nov 2006 07:49:07 -0500 "Benjamin A. Okopnik" <ben@linuxgazette.net> wrote:

> Sweet! It's nice that somebody has implemented it as a fixed routine.
> Does Ruby do GRTs (Gutman-Rossler Transforms) as well?

Not by default, no.

> Incidentally, I've been occasionally glancing at "Why's (Poignant)
> Guide to Ruby" (http://poignantguide.net/). That is one seriously
> bent individual. I like him. :) And it's a fairly nice language
> from what I can see so far; if I'm going to add another scripting
> language to my kit, that's a good candidate. Python just leaves me
> cold and slightly queasy - unsurprising, perhaps, considering its
> poikilothermic and venomous nature... [1]

It is a somewhat seminal piece, although its style of writing isn't one I can read for long without getting frustrated. But _why is cool -- he wrote the YAML bindings for Ruby.

-- Thomas Adam



Jason Creighton [jcreigh at gmail.com]
Thu, 9 Nov 2006 23:59:54 -0700

On Thu, Nov 09, 2006 at 07:49:07AM -0500, Benjamin A. Okopnik wrote:

> On Wed, Nov 08, 2006 at 11:44:16PM -0700, Jason Creighton wrote:
> > ``
> > ~/tmp$ ruby -e 'p ARGV' perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
> 
> [glower] Young man, if you're going to act smarter than me regularly,
> we're going to Have A Talk. A simple /capo/ does not does not do that to 
> Il Padrino, capish?
> 
> Nicely done. :)
> 
> In Perl, of course, that would be
> 
> ``
> perl -we'print "@ARGV"' !!
> ''
> 
> or, better yet - with clearer formatting -
> 
> ``
> ben@Fenrir:/tmp$ perl -wle'print for @ARGV' !!
> ''
> 
> The latter gives you each argument on a line by itself.

One thing I forgot to mention is that I often use that trick to figure out what the heck the shell is doing. For example, I have this in my .bashrc:

alias putargs='ruby -e "p ARGV" --'
Or the equivalent Perl, of course. :)

Anyway, with that in place, you can play around with how the shell interprets command lines:

~/tmp$ ls
another_file  filename with spaces  some_file
~/tmp$ putargs *
["another_file", "filename with spaces", "some_file"]
~/tmp$ var='hello *'
~/tmp$ putargs $var 
["hello", "another_file", "filename with spaces", "some_file"]
~/tmp$ putargs "$var"
["hello *"]
~/tmp$ putargs '$var'
["$var"]
~/tmp$ putargs `/bin/ls`
["another_file", "filename", "with", "spaces", "some_file"]
~/tmp$ putargs "`/bin/ls`"
["another_file\nfilename with spaces\nsome_file"]
~/tmp$ putargs '`/bin/ls`'
["`/bin/ls`"]

> > ``
> > ~/tmp$ ruby -e "puts readlines().sort_by { |ip| ip.split('.').map { |d| d.to_i } }" iplist 
> > 4.3.76.194
> > 8.10.33.176
> > 10.123.189.105
> > 12.30.72.162
> > 12.149.177.21
> > 12.154.4.213
> > 12.159.232.66
> > 12.205.7.190
> > 12.206.142.76
> > 12.214.50.126
> > 12.221.163.162
> > ~/tmp$ 
> > ''
> > 
> > sort_by does a Schwartzian transform for you, so just map the ip to an
> > array of integers ("4.3.76.194" -> [4, 3, 76, 194]) which will then sort
> > correctly.
> 
> Sweet! It's nice that somebody has implemented it as a fixed routine.
> Does Ruby do GRTs (Gutman-Rossler Transforms) as well?

What's the Guttman-Rosler transform? Google is unusually unenlightening.

> Incidentally, I've been occasionally glancing at "Why's (Poignant) Guide
> to Ruby" (http://poignantguide.net/). That is one seriously bent
> individual. I like him. :) And it's a fairly nice language from what I
> can see so far; if I'm going to add another scripting language to my
> kit, that's a good candidate. Python just leaves me cold and slightly
> queasy - unsurprising, perhaps, considering its poikilothermic and
> venomous nature... [1]

As Thomas mentioned, _why is the author of Syck, a C YAML parser with bindings for Ruby and a couple of other languages. And Hpricot, a nice HTML parser for Ruby. And RedCloth, an implementation of the Textile markup language. And a handful of other libraries. And, of course, the aforementioned "(Poignant) Guide". If life were Slashdot, _why would be +5 Productive.

Jason Creighton



Paul Sephton [paul at inet.co.za]
Fri, 10 Nov 2006 10:03:25 +0200

On Thu, 2006-11-09 at 23:59 -0700, Jason Creighton wrote:

> On Thu, Nov 09, 2006 at 07:49:07AM -0500, Benjamin A. Okopnik wrote:
> > On Wed, Nov 08, 2006 at 11:44:16PM -0700, Jason Creighton wrote:
> > > ``
> > > ~/tmp$ ruby -e 'p ARGV' perl -we'print map substr($_,4),sort map pack('C4',split/\./).$_,<>' iplist
> > 
> > [glower] Young man, if you're going to act smarter than me regularly,
> > we're going to Have A Talk. A simple /capo/ does not does not do that to 
> > Il Padrino, capish?
> > 
> > Nicely done. :)
> > 
> > In Perl, of course, that would be
> > 
> > ``
> > perl -we'print "@ARGV"' !!
> > ''
> > 
> > or, better yet - with clearer formatting -
> > 
> > ``
> > ben@Fenrir:/tmp$ perl -wle'print for @ARGV' !!
> > ''
> > 
> > The latter gives you each argument on a line by itself.
> 
> One thing I forgot to mention is that I often use that trick to figure
> out what the heck the shell is doing. For example, I have this in my
> .bashrc:
> 
> ``
> alias putargs='ruby -e "p ARGV" --'
> ''
> 
> Or the equivalent Perl, of course. :)
> 
> Anyway, with that in place, you can play around with how the shell
> interprets command lines:
> 
> ``
> ~/tmp$ ls
> another_file  filename with spaces  some_file
> ~/tmp$ putargs *
> ["another_file", "filename with spaces", "some_file"]
> ~/tmp$ var='hello *'
> ~/tmp$ putargs $var 
> ["hello", "another_file", "filename with spaces", "some_file"]
> ~/tmp$ putargs "$var"
> ["hello *"]
> ~/tmp$ putargs '$var'
> ["$var"]
> ~/tmp$ putargs `/bin/ls`
> ["another_file", "filename", "with", "spaces", "some_file"]
> ~/tmp$ putargs "`/bin/ls`"
> ["another_file\nfilename with spaces\nsome_file"]
> ~/tmp$ putargs '`/bin/ls`'
> ["`/bin/ls`"]
> ''
> 

That's a really cool trick for getting a list from args. I think I could use that.

> > > ``
> > > ~/tmp$ ruby -e "puts readlines().sort_by { |ip| ip.split('.').map { |d| d.to_i } }" iplist 
> > > 4.3.76.194
> > > 8.10.33.176
> > > 10.123.189.105
> > > 12.30.72.162
> > > 12.149.177.21
> > > 12.154.4.213
> > > 12.159.232.66
> > > 12.205.7.190
> > > 12.206.142.76
> > > 12.214.50.126
> > > 12.221.163.162
> > > ~/tmp$ 
> > > ''
> > > 

Like a bulldog that can't let go of a blanket, I just had to see if there was another, more perverse approach to this. I came up with the idea of turning each IP address into a number, sorting the numbers, and then converting them back to dotted quads for display.

Python has some standard-library functions, socket.inet_aton(ip_string) and socket.inet_ntoa(packed_ip), that could do the packing and unpacking: pack each address, sort the list, and unpack the result. Perhaps someone could do that as an exercise.

However, although Python would indubitably be more readable, using just bash and standard tools we could do:

paul@wart:~$ ((IFS=`echo -e"\n."`; \
while read a b c d; do echo $[((a*256 +b)*256+c)*256+d]; done) | \
sort -n -u | \
while read ip; do \
echo $[ip/0x1000000].$[ip%0x1000000/0x10000].\
$[ip%0x10000/0x100].$[ip%0x100]; done) < iplist
4.3.76.194
8.10.33.176
10.123.189.105
12.30.72.162
12.149.177.21
12.154.4.213
12.159.232.66
12.205.7.190
12.206.142.76
12.214.50.126
12.221.163.162
paul@wart:~$ 
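The Python exercise suggested above might look roughly like this minimal sketch. It uses socket.inet_aton() and socket.inet_ntoa() as described, plus struct to read the 4 packed bytes as one integer, mirroring the ((a*256+b)*256+c)*256+d arithmetic in the bash version; a set plays the role of sort -u. The helper names are hypothetical, and the sketch assumes well-formed dotted quads (inet_aton() also accepts some shorthand forms, such as "10.1", that a stricter parser would reject).

```python
import socket
import struct


def ip_to_int(ip):
    # inet_aton() gives 4 bytes in network (big-endian) order; "!I" reads them
    # as one unsigned 32-bit integer, i.e. ((a*256 + b)*256 + c)*256 + d.
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def int_to_ip(n):
    # The reverse trip: 32-bit integer -> 4 packed bytes -> dotted quad.
    return socket.inet_ntoa(struct.pack("!I", n))


def sort_unique_ips(lines):
    # Like "sort -n -u": convert, drop duplicates, sort numerically, convert back.
    numbers = {ip_to_int(line.strip()) for line in lines if line.strip()}
    return [int_to_ip(n) for n in sorted(numbers)]


if __name__ == "__main__":
    sample = ["12.30.72.162\n", "4.3.76.194\n", "12.30.72.162\n", "10.123.189.105\n"]
    print("\n".join(sort_unique_ips(sample)))
```

Since the packed 4-byte strings already compare in address order, the integer step could be skipped and the inet_aton() results sorted directly.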

> > > sort_by does a Schwartzian transform for you, so just map the ip to an
> > > array of integers ("4.3.76.194" -> [4, 3, 76, 194]) which will then sort
> > > correctly.
> > 
> > Sweet! It's nice that somebody has implemented it as a fixed routine.
> > Does Ruby do GRTs (Gutman-Rossler Transforms) as well?
> 
> What's the Guttman-Rossler transform? Google is unusually
> unenlightening.
> 

I would also like to know, please?

Paul Sephton



Benjamin A. Okopnik [ben at linuxgazette.net]
Fri, 10 Nov 2006 08:09:21 -0500

On Thu, Nov 09, 2006 at 11:59:54PM -0700, Jason Creighton wrote:

> On Thu, Nov 09, 2006 at 07:49:07AM -0500, Benjamin A. Okopnik wrote:
> > 
> > ``
> > ben@Fenrir:/tmp$ perl -wle'print for @ARGV' !!
> > ''
> > 
> > The latter gives you each argument on a line by itself.
> 
> One thing I forgot to mention is that I often use that trick to figure
> out what the heck the shell is doing. For example, I have this in my
> .bashrc:
> 
> ``
> alias putargs='ruby -e "p ARGV" --'
> ''
> 
> Or the equivalent Perl, of course. :)

[laugh] Or you could use Bash. The usage and the output would vary slightly, of course:

ben@Fenrir:/tmp/foo$ touch another_file "filename with spaces" some_file
ben@Fenrir:/tmp/foo$ function putargs() { local IFS="|"; echo "$*"; }
ben@Fenrir:/tmp/foo$ putargs *
another_file|filename with spaces|some_file
etc.

> What's the Guttman-Rossler transform? Google is unusually
> unenlightening.

There's an excellent paper by Uri Guttman and Larry Rosler, "A Fresh Look at Efficient Perl Sorting", that covers the ST (Schwartzian Transform), the GRT, the Orcish Maneuver, and more:

http://www.sysarch.com/Perl/sort_paper.html
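As a rough illustration of the GRT idea (pack a sortable key in front of each record, run a plain default sort with no comparison function, then strip the key back off), here is a minimal Python sketch. Ben's Perl one-liner earlier in the thread, with pack('C4', ...) and substr($_,4), follows the same pattern. The function name is hypothetical, and it assumes each line is a well-formed dotted-quad IPv4 address.

```python
def grt_sort_ips(lines):
    """Sort IPv4 address lines with a Guttman-Rosler-style packed default sort.

    Assumes every non-blank line is a well-formed dotted-quad address.
    """
    packed = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        # Key: the four octets as raw bytes, like Perl's pack('C4', ...).
        # bytes() raises ValueError for any octet outside 0..255.
        key = bytes(int(octet) for octet in text.split("."))
        if len(key) != 4:
            raise ValueError(f"not a dotted quad: {text!r}")
        # Record = fixed-width key followed by the original text.
        packed.append(key + text.encode("ascii"))
    # Plain bytewise sort: no key function, no comparator.
    packed.sort()
    # Strip the 4-byte key to recover the original text (like substr($_, 4)).
    return [record[4:].decode("ascii") for record in packed]


if __name__ == "__main__":
    sample = ["12.154.4.213", "4.3.76.194", "10.123.189.105", "12.30.72.162"]
    print("\n".join(grt_sort_ips(sample)))
```

The fixed 4-byte width of the key is what makes the final slice safe; a variable-width key would need a separator or length prefix.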

> > Incidentally, I've been occasionally glancing at "Why's (Poignant) Guide
> > to Ruby" (http://poignantguide.net/). That is one seriously bent
> > individual. I like him. :) And it's a fairly nice language from what I
> > can see so far; if I'm going to add another scripting language to my
> > kit, that's a good candidate. Python just leaves me cold and slightly
> > queasy - unsurprising, perhaps, considering its poikilothermic and
> > venomous nature... [1]
> 
> As Thomas mentioned, _why is the author of Syck, a C YAML parser with
> bindings in Ruby and a couple other languages. And Hpricot, a nice HTML
> parser for Ruby. And RedCloth, an implementation of the Textile markdown
> language. And a handful of other libraries. And, of course, the
> aforementioned "(Poignant) Guide". If life were Slashdot, _why would be
> +5 Productive.

Wow. Another Fabrice Bellard... if such a thing is possible. All kudos.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Benjamin A. Okopnik [ben at linuxgazette.net]
Fri, 10 Nov 2006 09:51:47 -0500

On Fri, Nov 10, 2006 at 10:03:25AM +0200, Paul Sephton wrote:

> 
> Like a bulldog that can't let go of a blanket, I just had to see if
> there was another more perverse approach to this.  I came up with the
> idea of turning the IP address into a number, sorting and then
> displaying the result.
> 
> Python has some built-in methods [socket.inet_aton(ip_string) and
> socket.inet_ntoa(packed_ip)] that could do this, sort the list of
> numbers and unpack; perhaps someone could do that as an exercise.
> 
> However, where Python would indubitably be more readable,

...that being my reason for demonstrating the algorithm in Perl... :)

> just using
> bash, and standard tools, we could do:
> 
> ``
> paul@wart:~$ ((IFS=`echo -e"\n."`; \
> while read a b c d; do echo $[((a*256 +b)*256+c)*256+d]; done) | \
> sort -n -u | \
> while read ip; do \
> echo $[ip/0x1000000].$[ip%0x1000000/0x10000].\
> $[ip%0x10000/0x100].$[ip%0x100]; done) < iplist
> 4.3.76.194
> 8.10.33.176
> 10.123.189.105
> 12.30.72.162
> 12.149.177.21
> 12.154.4.213
> 12.159.232.66
> 12.205.7.190
> 12.206.142.76
> 12.214.50.126
> 12.221.163.162
> paul@wart:~$ 
> ''

Nice, Paul! The double conversion strikes me as a little unnecessary, but - TMTOWTDI, as I'd mentioned before. Speaking of which:

IFS=`echo -e"\n."`
is a bit unnecessary (especially since 'echo' is horribly broken in a number of shells, and the above will fail in many situations); you can just do

IFS='
.'
and accomplish the same thing.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Thomas Adam [thomas.adam22 at gmail.com]
Fri, 10 Nov 2006 19:21:48 +0000

On 10/11/06, Benjamin A. Okopnik <ben@linuxgazette.net> wrote:

> ``
> IFS=`echo -e"\n."`
> ''
>
> is a bit unnecessary (especially since 'echo' is horribly broken in a
> number of shells, and the above will fail in many situations); you can
> just do
>
> ``
> IFS='
> .'
> ''
>
> and accomplish the same thing.

As does:

IFS=$'\n'
-- Thomas Adam



Benjamin A. Okopnik [ben at linuxgazette.net]
Fri, 10 Nov 2006 15:13:04 -0500

On Fri, Nov 10, 2006 at 07:21:48PM +0000, Thomas Adam wrote:

> On 10/11/06, Benjamin A. Okopnik <ben@linuxgazette.net> wrote:
> > ``
> > IFS=`echo -e"\n."`
> > ''
> >
> > is a bit unnecessary (especially since 'echo' is horribly broken in a
> > number of shells, and the above will fail in many situations); you can
> > just do
> >
> > ``
> > IFS='
> > .'
> > ''
> >
> > and accomplish the same thing.
> 
> As does:
> 
> ``
> IFS=$'\n'
> ''

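Not exactly, although the error is understandable: what's needed is a newline followed by a period, i.e. IFS=$'\n.' in Bash. However, $'...' quoting also fails in other shells; the ksh below evidently doesn't recognize it, so IFS ends up holding the literal characters $, \ and n, and words are split at every "n":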

ben@Fenrir:/tmp/foo$ ls -1		# Bash
another_file
filename with spaces
some_file
ben@Fenrir:/tmp/foo$ ksh		# KSH
$ for n in `ls`; do echo $n; done
another_file
filename
with
spaces
some_file
$ IFS=$'\n'
$ for n in `ls`; do echo $n; done
a
other_file
file
ame with spaces
some_file
$ 
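Word splitting on non-whitespace IFS characters can be modeled roughly in Python. This is a simplified sketch under stated assumptions: each IFS character is treated as a one-character delimiter, and the special rules real shells apply to IFS whitespace (space, tab, newline) are ignored. The function name split_fields is hypothetical. If IFS held the literal characters $, \ and n, which is what the output above suggests, the model cuts the words exactly where ksh did.

```python
import re


def split_fields(text, ifs):
    """Simplified model of shell word splitting with non-whitespace IFS chars.

    Assumption: every character in ifs acts as a single-character delimiter;
    IFS-whitespace rules (collapsing runs, ignoring leading/trailing) are not modeled.
    """
    if not text:
        return []
    fields = re.split("[" + re.escape(ifs) + "]", text)
    # A single trailing delimiter does not produce an extra empty field.
    if fields[-1] == "":
        fields.pop()
    return fields


# An IFS containing a period splits a dotted quad into its octets.
print(split_fields("4.3.76.194", "."))
# IFS holding the literal characters $, \ and n splits at every letter "n".
print(split_fields("another_file", "$\\n"))
print(split_fields("filename with spaces", "$\\n"))
```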
-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Benjamin A. Okopnik [ben at linuxgazette.net]
Fri, 10 Nov 2006 15:51:09 -0500

On Wed, Nov 08, 2006 at 03:43:27PM -0500, Faber Fedor wrote:

> On 08/11/06 15:31 -0500, Benjamin A. Okopnik wrote:
> > On Wed, Nov 08, 2006 at 03:18:40PM +0200, Paul Sephton wrote:
> > >    btw: anyone know of people who use perl as their default shell? <grin>
> > 
> > However low that number may be, there are fewer using 'sort' as one. :)
> > 
> > http://sourceforge.net/projects/psh/
> 
> And all this time I had you pegged as a Futurama fan, Ben.
> 
> http://zoidberg.student.utwente.nl/

Hadn't even heard of that one, believe it or not (or I've forgotten about it if I did). Zoinks and zounds! It looks very nice, and quite mature. Although I think I'll stick with Bash - I'd hate to get out of the habit. :)

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Rick Moen [rick at linuxmafia.com]
Fri, 10 Nov 2006 13:16:33 -0800

Quoting Paul Sephton (paul@inet.co.za):

> Interpreting the syntax, the [] brackets simply means "optional" ( refer
> BNF ).  Therefore, --key=POS1[,POS2] simply means "one or more POS
> separated by comma".

I've actually been minutely aware of exactly how Backus-Naur Form works since Algol days -- and had, thank you, already read the sort(1) documentation (such as it is), in some detail. What I was suggesting is that the documentation is inaccurate and misleading. That's just a surmise based on experimentation, however.

I can't speak to how ancient versions such as v5.0 and v5.2.1 work; modern versions such as v5.94 do function as I described.

> A very highly recommended book, Unix Power Tools
> http://www.oreilly.com/catalog/upt3/ describes the use of sort in
> great detail.

Yes, I of course have had a copy since ancient days -- but invariably not near me when I'm dealing with e-mail.

> Indeed, nslookup does rely on BIND8 features.  Again, a change to those
> features led to the demise of nslookup, which had existed (in its
> pristine form) for some 15 years prior to its demise.

It was not a change to BIND8's features, exactly, but rather the (richly deserved) demise of BIND8 itself -- not to mention a large number of other serious implementation errors in nslookup that have jointly necessitated switching to a better tool.



Paul Sephton [paul at inet.co.za]
Fri, 10 Nov 2006 23:43:30 +0200

On Fri, 2006-11-10 at 13:16 -0800, Rick Moen wrote:

> Quoting Paul Sephton (paul@inet.co.za):
> 
> documentation (such as it is), in some detail.  What I was suggesting is
> that the documentation is inaccurate and misleading.  That's just a
> surmise based on experimentation, however.
> 
> I can't speak to how ancient versions such as v.5.0 and v5.2.1 work;
> modern versions such as v5.94 do function as I described.

OK, I think there is room enough here for both of our beliefs to be at least partially accurate. Certainly, the documentation does not reflect the current behaviour of GNU sort. What I pointed out is that it once did, as of GNU sort v5.0. Somewhere either with or prior to v5.2.1, the behaviour changed, and the new behaviour apparently persists up to v5.94?

Yes, the documentation is inaccurate in that it does not describe the actual behaviour, and yes, GNU sort is broken when measured against that documentation (the man page, the embedded documentation and the formal documentation).

I think this situation is unacceptable.

-- Paul Sephton

