

2-cent Tips

By Kat Tanaka Okopnik


2-cent tip: Finding clunky files
2-cent tip: ethereal became wireshark
2-cent Tip: Annotating PDF
2-cent Tip: Real editing of PDF Forms
2-cent tip: Renaming music files

2-cent tip: Finding clunky files

Teal (teal at mailshack.com)
Sun 3 Sep 2006 13:58:28 PDT

Followed up by: Ben, Neil, Rick

What's eating up your hard-drive?

Most Linux users familiar with the bash shell know that df is good for finding out how much space is used on a partition. They may also know that du lists each directory below the current one, along with the total size of that directory's contents.

Those are neat commands, but not that informative. The latter inspired me to come up with a more helpful shell one-liner that points out, clear as day, the files which are sucking up your space. I keep it handy to clean out my tiny 40GB hard drive every now and then. I also shared it with someone who runs a 160GB personal server, and he was very thankful. So if it's useful for me, and useful for him, I can be moderately sure that it'll be useful for you, too. Here it is:

cd ~; du -Sa --block-size=MB | sed -r '/^0/d' | sort -nr | less

You may have to wait a minute while it totals up the sizes of all the files (with my small hard drive, it takes about 20 seconds).

This only scans your home directory for big files. To scan from the root directory, change the ~ at the beginning to /. If you want to stop while it's still scanning, press Ctrl+C and then 'q' to quit. Once it's done and the results are shown, just press 'q' to leave the pager and go back to your prompt.
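For repeated use, Teal's pipeline can be wrapped in a small bash function. This is only a sketch. The name bigfiles is made up, and GNU du is assumed. Three things are additions that are not in the original tip. The first is the -x (--one-file-system) flag, which keeps a scan of / from wandering into /proc, /sys and other mounted filesystems. The second is the count argument. The third is using head instead of less.

```shell
# Hypothetical helper built on Teal's du pipeline (GNU du assumed).
# Usage: bigfiles [DIR] [COUNT]
bigfiles() {
    local dir=${1:-$HOME}   # where to start (default: home directory)
    local count=${2:-20}    # how many entries to show
    # -S: don't fold subdirectory sizes into their parents
    # -a: report files as well as directories
    # -x: stay on the starting filesystem (an addition to the original)
    # sed drops zero-size entries; sort puts the biggest first.
    du -Sax --block-size=MB "$dir" 2>/dev/null |
        sed -r '/^0/d' | sort -nr | head -n "$count"
}
```

For example, "bigfiles / 30" would list the 30 largest entries on the root filesystem only.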

[Neil] - That's an interesting variation on the usual approach. Most people use 'find' to pick out large files, which I find preferable, e.g.

  find ~ -size +250k -ls

will list every file under your home directory larger than 250kB. If you want it sorted by size (the seventh field of the -ls output)

  find ~ -size +250k -ls | sort -nr -k 7 

will do that.

As the saying goes "there's more than one way to do it" and your approach works just fine.
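If your find is GNU find, its -printf action can print just the size in bytes and the path. That gives sort a simple numeric first field to work on, with no column counting. Here is a sketch under that assumption; the function name largefiles is made up for illustration.

```shell
# Hypothetical helper: list regular files above a size threshold,
# largest first. Requires GNU find for -printf.
# Usage: largefiles [DIR] [MIN]   (MIN in find's -size syntax, e.g. 250k, 10M)
largefiles() {
    local dir=${1:-$HOME}
    local min=${2:-250k}
    # %s = size in bytes, %p = path; -xdev stays on one filesystem
    find "$dir" -xdev -type f -size +"$min" -printf '%s\t%p\n' | sort -nr
}
```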

[Ben] - It may be that one solution is significantly faster than another (although I rather doubt it); I'd certainly like to find out. I wish I knew how to flush the page cache that 'find', etc. use to keep the relevant info ('du' uses the same one); I'd have liked to compare the speed of the two solutions, as well as perhaps 'ls -lR|sort -nrk5'. However, no matter what, Teal's is a good, useful approach to solving (or at least reporting) a common problem. Heck, I just cleaned out a bunch of thumbnails (187MB!) going back to... umm, given that I've been just carrying my '~' structure forward all along, back to when I started using Linux, probably.

ben@Fenrir:~$ time find ~ -size +250k -ls | sort -nr -k 7 > /dev/null
real    0m45.453s
user    0m0.120s
sys     0m0.500s

Maybe I'll remember to test one of the others when I next turn this laptop on.
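On Linux 2.6.16 and later, root can run sync and then write 3 to /proc/sys/vm/drop_caches. This empties the page cache along with the dentry and inode caches that find and du lean on. Below is a sketch of a bash timing helper built on that. The name coldtime is hypothetical, and without root the helper just warns and times a warm run.

```shell
# Hypothetical helper: time a command after trying to drop the kernel caches.
# Dropping caches needs root; otherwise a warning is printed and the run is warm.
coldtime() {
    sync    # write out dirty pages first so they can be dropped
    if ! { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        echo "coldtime: could not drop caches; timing a warm run" >&2
    fi
    time "$@"    # bash's time keyword; reports on stderr
}
```

Comparing the approaches from this thread would then look something like:

  coldtime sh -c 'find ~ -size +250k -ls | sort -nr -k 7 > /dev/null'
  coldtime sh -c 'ls -lR ~ | sort -nrk5 > /dev/null'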

[Rick] - Here's my own favourite solution to that problem:

:r /usr/local/bin/largest20

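#!/usr/bin/perl -w
# largest20: list the 20 largest files under the given directories
# (default: the current directory).
# You can alternatively just do:
# find . -xdev -type f -print0 | xargs -r0 ls -l | sort -rn -k5,5 | head -20
use strict;
use File::Find;

my %size;
@ARGV = $ENV{ PWD } unless @ARGV;
# Record the size in bytes of every plain file found
find ( sub { $size{ $File::Find::name } = -s if -f; }, @ARGV );
# Largest first; keep only the top 20
my @sorted = sort { $size{ $b } <=> $size{ $a } } keys %size;
splice @sorted, 20 if @sorted > 20;
printf "%10d %s\n", $size{$_}, $_ for @sorted;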

[Ben] - [smile] Why, thank you. Nice to see it making the rounds. Original credit to Randal Schwartz, of course, but I've mangled the thing quite a bit since then.

[Neil] - The advantages of the find solution are

  1. It is somewhat more portable; the options to du used in Teal's solution aren't available on some old distros I can't escape from.
  2. It's easier to fine-tune the file size threshold.
  3. When sorted, it sorts by exact file size (but not exact disk usage). Because du rounds each size up to a whole megabyte, the du-based solution won't sort a set of 1.2MB, 1.8MB and 1.6MB files into order of size: they all show up as 2MB.

In terms of speed, there may be an advantage in not having to remove small files from the initial list, but I would expect that difference to be lost in the noise.

[Nate (Teal)] - Hrm... the 'du' tool can sort on a finer scale; you'd just have to set the block size to, say, KB, or stick with bytes like find does. And you can fine-tune which files 'du' shows, based on size, with grep. But of course, neither of those is as intuitive or as easy to use as the find solution, so 'du' is still worse in that respect.

I have to say, I'm pretty humbled. It'd probably be better to include the 'find' solution, or Moen's Perl-based solution, in the Gazette rather than my 'du' cruft.
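Nate's refinement can be sketched like this. One caveat: grep matches text, not numbers, so the size filter below uses awk for a numeric comparison instead. The sketch assumes GNU du, whose --block-size=1 reports disk usage in bytes. That is allocated space, so it can differ from the exact file size find reports. The function name dubig is hypothetical.

```shell
# Hypothetical helper: du in bytes, filtered by a numeric threshold.
# Usage: dubig [DIR] [MIN_BYTES]   (GNU du assumed)
dubig() {
    local dir=${1:-$HOME}
    local min=${2:-256000}   # threshold in bytes
    # awk keeps only lines whose first field (the size) exceeds the threshold
    du -Sa --block-size=1 "$dir" 2>/dev/null |
        awk -v min="$min" '$1 > min' | sort -nr
}
```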

[Ben] - Heck no, Nate. The point of all those tools in Linux is well represented by the motto of Perl, "TMTOWTDI": There's More Than One Way To Do It. It was nice to see someone else applying some brainpower to solving a common problem in a useful way.

[Nate (Teal)] - Good stuff, there.

[Ben] - Yep. Yours included.

[Rick] - As Ben reminded me, he's one of the most recent people to polish up that Perl gem ('largest20'): I'm merely one of the many people passing around variations of it -- and grateful for their craftsmanship.

2-cent tip: ethereal became wireshark

Peter Knaggs (peter.knaggs at gmail.com)
Thu Sep 7 19:02:15 PDT 2006

Old news to frequent ethereal users, I guess, but back in July 2006 ethereal became "wireshark". It seems that the company Ethereal, Inc. is keeping the old name.

If you've been using the command-line version, tethereal, you're probably wondering what to call it now. Well, tethereal has become "tshark".

2-cent Tip: Annotating PDF

Kapil Hari Paranjape (kapil at imsc.res.in)
Tue Sep 12 20:07:35 PDT 2006

Followed up by: Ben


If you have ever wanted to do the Guardian sudoku without wasting trees, then you need a way to annotate PDF files on your computer.

"flpsed" (FL toolkit PostScript EDitor) to the rescue.

Install "flpsed" and import any PDF file for annotation. The interface is simple and intuitive.

This can also be used to fill forms which are not quite in the PDF form format. More about that in the next tip.

Of course, it can also be used to annotate PS files.



[Ben] - That's a great tool, Kapil. I've needed something like that for ages - many of the contracts that I get sent by my clients are in PDF, and up until now, I've been converting them to PS, editing them in Gimp, and reconverting them to PDF before shipping them back. This will save me tons of time - thanks! I hope others will find it at least as useful.

[Kapil] - Don't shoot (as in photograph) the messenger :)

I too am extremely grateful to the author of "flpsed" (Morten Brix Pedersen, morten at wtf.de).

Glad to have been of help.

2-cent Tip: Real editing of PDF Forms

Kapil Hari Paranjape (kapil at imsc.res.in)
Tue Sep 12 23:49:11 PDT 2006


"Real" PDF forms are quite common nowadays. How does one edit them with a "real" editor like vi (OK, also emacs :))?

"pdftk" (PDF ToolKit) to the rescue.

Suppose that "form.pdf" is your PDF form.

1. Extract the form information:

	pdftk form.pdf generate_fdf output form.fdf

2. This only gets the text fields. To get an idea of all the fields, do:

	pdftk form.pdf dump_data_fields output form.fields

3. Sometimes the field names are cryptic. It helps to also view the form:

	xpdf form.pdf

or

	pdftotext -layout form.pdf; less form.txt

(if you insist on text-mode)

4. You can now edit the file form.fdf and fill in the empty values, marked '/V ()', that go with each '/T (field name)' entry.

Once you have edited form.fdf, you can generate the filled-in form with:

	pdftk form.pdf fill_form form.fdf output filled.pdf

or

	pdftk form.pdf fill_form form.fdf output filled.pdf flatten

to get a non-editable pdf.

Some additional hints:

1. If your form.fdf file contains no field entries (no '/T (field name)' lines), then you are out of luck. It means your PDF form is only a printable form and cannot be filled on the computer (but see the hint about "flpsed").

2. Checkboxes/buttons will not appear in the fdf file. You can use form.fields to find out what these fields are called and add entries to the fdf file of the form (replace FN with the field name)

   	 <</V (Yes) /T (FN) >>

or

   	 <</V (Off) /T (FN) >>

3. It helps to have three windows open: one for editing, one for viewing form.fields, and one for viewing the filled pdf file.

4. You may also want to regenerate the filled form periodically to check that the filling works.


Clearly this is crying out for someone to write a nice interface. Why don't I, you ask? I will ... but don't hold your breath.

You can skip all of this and use Adobe's Distiller, but most readers should be able to guess why I don't want to use that!
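The procedure above can be strung together into a small bash function. This is a sketch that assumes pdftk is installed. The name fillform, the fallback to vi when $EDITOR is unset, and the NAME-filled.pdf output file are all made up for illustration. The function only generates the FDF when none exists yet. That means rerunning it after each round of edits simply refreshes the filled PDF, in the spirit of hint 4.

```shell
# Hypothetical helper for the pdftk round trip described above.
# Usage: fillform FORM.pdf [flatten]
fillform() {
    local form=$1
    local base=${form%.pdf}
    # Step 1: extract the form data, unless an FDF is already being edited
    if [ ! -f "$base.fdf" ]; then
        pdftk "$form" generate_fdf output "$base.fdf" || return
    fi
    # Step 2: list every field, including checkboxes and buttons
    pdftk "$form" dump_data_fields output "$base.fields" || return
    # Step 4: fill in the values in your editor
    "${EDITOR:-vi}" "$base.fdf" || return
    # Merge the edited FDF back into the original form; any second argument
    # adds pdftk's flatten option for a non-editable PDF
    pdftk "$form" fill_form "$base.fdf" output "$base-filled.pdf" ${2:+flatten}
}
```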

2-cent tip: Renaming music files

Benjamin A. Okopnik (ben at linuxgazette.net)
Wed 27 Sep 2006 11:24:37 PDT

Much of the available CD-ripping software out there produces files with names like 'trackname_01.wav' or '01_track.wav' instead of actual song names. Yes, there's software available that will look up CDDB entries... but what if your CD isn't in the CDDB, or you don't have a net connection readily available?

'wavren' to the rescue. :)

This script, when executed in a directory containing the 'standard' track names, takes the name of a file that contains the names of the songs on that album and returns a paired list of the current track name and the line in the file that it will be renamed to. It will exit with an error message if the lists aren't the same length, and it will not actually rename anything until you specify a '-rename' argument. Example:

ben@Fenrir:/tmp/foo$ ls
01.wav  02.wav  03.wav  04.wav  05.wav  06.wav  07.wav  08.wav
09.wav  10.wav  names
ben@Fenrir:/tmp/foo$ cat names
01. Hells Bells
02. Shoot To Thrill
03. What Do You Do For Money Honey
04. Given The Dog A Bone
05. Let Me Put My Love Into You
06. Back In Black
07. You Shook Me All Night Long
08. Have A Drink On Me
09. Shake A Leg
10. Rock And Roll Ain't Noise Pollution
ben@Fenrir:/tmp/foo$ wavren names
"01.wav" will be "01. Hells Bells.wav"
"02.wav" will be "02. Shoot To Thrill.wav"
"03.wav" will be "03. What Do You Do For Money Honey.wav"
"04.wav" will be "04. Given The Dog A Bone.wav"
"05.wav" will be "05. Let Me Put My Love Into You.wav"
"06.wav" will be "06. Back In Black.wav"
"07.wav" will be "07. You Shook Me All Night Long.wav"
"08.wav" will be "08. Have A Drink On Me.wav"
"09.wav" will be "09. Shake A Leg.wav"
"10.wav" will be "10. Rock And Roll Ain't Noise Pollution.wav"

If the lineup isn't exactly how you want it, you can either renumber the original files, or change the order of the lines in the "names" file. Also note that you can rename mp3 files, etc., just by changing the 'ext' variable at the top of the script to reflect the extension that you're looking for.
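The wavren listing itself isn't reproduced in this column. Purely as an illustration of the behavior described above, here is a hypothetical bash reimplementation; wavren_sketch is a made-up name, and this is not Ben's script. It keeps the same interface: a names file, an optional -rename argument, and an ext variable at the top. It needs bash 4 or later for mapfile, and song names must not contain '/'.

```shell
# Hypothetical sketch of wavren's described behavior; NOT Ben's script.
# Usage: wavren_sketch NAMESFILE [-rename]
wavren_sketch() {
    local ext=wav              # change to mp3, ogg, etc.
    local names=$1 doit=${2:-}
    local -a files titles
    local i

    [ -r "$names" ] || { echo "Can't read '$names'" >&2; return 1; }
    # Glob results come back sorted, so 01.wav ... 10.wav stay in track order
    files=( *."$ext" )
    [ -e "${files[0]}" ] || { echo "No .$ext files here" >&2; return 1; }
    mapfile -t titles < "$names"    # one song name per line

    if [ "${#files[@]}" -ne "${#titles[@]}" ]; then
        echo "Error: ${#files[@]} .$ext files but ${#titles[@]} names" >&2
        return 1
    fi

    for i in "${!files[@]}"; do
        if [ "$doit" = "-rename" ]; then
            mv -i -- "${files[i]}" "${titles[i]}.$ext"
        else
            echo "\"${files[i]}\" will be \"${titles[i]}.$ext\""
        fi
    done
}
```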



Kat likes to tell people she's one of the youngest people to have learned to program using punchcards on a mainframe (back in '83); but the truth is that since then, despite many hours in front of various computer screens, she's a computer user rather than a computer programmer.

When away from the keyboard, her hands have been found full of knitting needles, various pens, henna, red-hot welding tools, upholsterer's shears, and a pneumatic scaler.

Copyright © 2006, Kat Tanaka Okopnik. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 131 of Linux Gazette, October 2006
