
...making Linux just a little more fun!

Talkback:152/lg_tips.html

[ In reference to "2-Cent Tips" in LG#152 ]

Greg Metcalfe [metcalfegreg at qwest.net]


Fri, 25 Jul 2008 11:35:02 -0700

Regarding "2-cent tip: Removing the comments out of a configuration file":

I don't like to invoke Yet Another Interpreter (Perl, Python, etc.) for simple problems, when I've already got a perfectly good one (the bash shell) running, and all those wonderful GNU programs. So I often view 'classic' config files (for httpd, sshd, etc) via the following, which I store as ~/bin/dense:

!#/bin/bash
# Tested on GNU grep 2.5.1
grep -Ev ^\([[:space:]]*#\)\|\(^$\) $1

~/bin is in my path, so the command is simply 'dense PATH/FILE'. This code strips comments, indented comments, and blank lines.

Of course, if you need this frequently, and bash is your login shell, a better approach might be to just add:

alias dense='grep -Ev ^\([[:space:]]*#\)\|\(^$\)'
to your ~/.bashrc, since it's so small, thus loading it into your environment at login. Don't forget to source the file via:
'. ~/.bashrc'
after the edit, if you need it immediately!
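
[ A minimal sketch of the same idea written as a shell function rather than an alias, assuming bash and a grep that understands -E. An alias body is parsed again by the shell each time it is expanded, so the pattern inside it still needs its own quoting; a function avoids that second pass. The function body below is an illustration, not Greg's actual code. ]

```shell
# Hypothetical ~/.bashrc fragment: 'dense' as a function instead of an alias.
# Assumes a grep with -E support (GNU grep, or any POSIX grep).
dense() {
    # Drop comment lines (optionally indented) and empty lines.
    # Single quotes hand the pattern to grep exactly as written.
    grep -Ev '^([[:space:]]*#|$)' "$@"
}
```

Usage is unchanged: 'dense PATH/FILE', with any number of file names.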

Regards, Greg Metcalfe




Ben Okopnik [ben at linuxgazette.net]


Sat, 26 Jul 2008 00:02:51 -0400

On Fri, Jul 25, 2008 at 11:35:02AM -0700, Greg Metcalfe wrote:

> Regarding "2-cent tip: Removing the comments out of a configuration file":
> 
> I don't like to invoke Yet Another Interpreter (Perl, Python, etc.) for simple 
> problems, when I've already got a perfectly good one (the bash shell) 
> running, and all those wonderful GNU programs. 

You know, I'm often puzzled when people say that. Whether you have Bash running or not, your script launches another instance of it - *as an interpreter.* The memory footprint of bash plus grep is not going to be much smaller than that of Perl, either. You also lose the capability of (easily) writing the result back to the original file. In what way is this better?

As far as I'm concerned, if the process does not take some huge amount of resources and runs at a comparable speed, it's a case of TMTOWTDI (There's More Than One Way To Do It.)

> So I often view 'classic' 
> config files (for httpd, sshd, etc) via the following, which I store as 
> ~/bin/dense:
> 
> !#/bin/bash
> # Tested on GNU grep 2.5.1
> grep -Ev ^\([[:space:]]*#\)\|\(^$\) $1

I just finished teaching a class on shell scripting, so I just can't resist. :) You should always escape the command string that you send to a program - usually by single-quoting it - to avoid having the shell interpret it. For example, it's quite conceivable that a shell could interpret '$\)' as a variable - or that it could treat the '#' in the string as the beginning of a comment. Bash can parse your construct correctly - but 'csh' can't:

% grep -Ev ^\([[:space:]]*#\)\|\(^$\) /etc/passwd
Illegal variable name.

> ~/bin is in my path, so the command is simply 'dense PATH/FILE'. This code 
> strips comments, indented comments, and blank lines.

Is there any reason to have those parentheses in the first part of the regex? It works just fine without them. I would also extend the definition of 'blank lines' to include lines consisting solely of whitespace, and get rid of all those useless backslashes:

egrep -v '^[[:space:]]*(#|$)' /etc/hosts

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *




Greg Metcalfe [metcalfegreg at qwest.net]


Sat, 26 Jul 2008 11:02:39 -0700

On Friday 25 July 2008 21:02:51 Ben Okopnik wrote:

> On Fri, Jul 25, 2008 at 11:35:02AM -0700, Greg Metcalfe wrote:
> > Regarding "2-cent tip: Removing the comments out of a configuration
> > file":
> >
> > I don't like to invoke Yet Another Interpreter (Perl, Python, etc.) for
> > simple problems, when I've already got a perfectly good one (the bash
> > shell) running, and all those wonderful GNU programs.
>
> You know, I'm often puzzled when people say that. Whether you have Bash
> running or not, your script launches another instance of it - *as an
> interpreter.* The memory footprint of bash plus grep is not going to be
> much smaller than that of Perl, either. You also lose the capability of
> (easily) writing the result back to the original file. In what way is
> this better?
>
The bit about launching another interpreter is forehead-slappingly correct, of course. I have systems where bash is the only interpreter present. Sendmail's restricted shell is disabled, etc. I wonder if I'm not subconsciously making peace with that situation, rather than growling about it.

My 'dense' actually does other things via switches, such as reporting on local modifications to config files via a mandated '# LOCALMOD date name reason' standard, etc.

But it just growed, and most of it is far too nasty (non-standard switches, etc.) to ever be seen by the public. I just grabbed the two most relevant lines, and pasted. You'll have seen my shebang typo, for instance. The 'dense' that's really in use does have the advantage of actually being runnable...
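
[ A rough sketch of what reporting on local modifications might look like, assuming only the '# LOCALMOD date name reason' comment format mentioned above. The function name 'localmods' and the exact pattern are made up for illustration; the real 'dense' switches are not shown in this thread. ]

```shell
# Hypothetical helper: list LOCALMOD markers with their line numbers.
# Assumes comments of the form '# LOCALMOD date name reason'.
localmods() {
    # -n prefixes each match with its line number; the pattern tolerates
    # indentation before '#' and whitespace between '#' and LOCALMOD.
    grep -En '^[[:space:]]*#[[:space:]]*LOCALMOD[[:space:]]' "$@"
}
```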

Regarding the bit about writing back to the original file: I suppose that could be useful, but it's not something I'd want (or be allowed) to do in the case of classic config files. When a saved copy is wanted, redirection is all that's needed.
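
[ A small sketch of the redirection approach, assuming the stripped copy goes to a separate file. The temporary file stands in for a real config file and the '.dense' suffix is only a placeholder. ]

```shell
# Sketch: save a comment-free copy and leave the original untouched.
src=$(mktemp)    # stands in for a real config file
printf '# comment\nListen 80\n\n' > "$src"

# Redirect to a *different* file: '> "$src"' would truncate the input
# before grep ever got to read it.
grep -Ev '^[[:space:]]*(#|$)' "$src" > "$src.dense"

cat "$src.dense"
```

The original keeps its comments; only the copy is condensed.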

> As far as I'm concerned, if the process does not take some huge amount
> of resources and runs at a comparable speed, it's a case of TMTOWTDI
> (There's More Than One Way To Do It.)
>
> > So I often view 'classic'
> > config files (for httpd, sshd, etc) via the following, which I store as
> > ~/bin/dense:
> >
> > !#/bin/bash
> > # Tested on GNU grep 2.5.1
> > grep -Ev ^\([[:space:]]*#\)\|\(^$\) $1
>
> I just finished teaching a class on shell scripting, so I just can't
> resist. :) You should always escape the command string that you send
> to a program - usually by single-quoting it - to avoid having the shell
> interpret it. For example, it's quite conceivable that a shell could
> interpret '$\)' as a variable - or that it could treat the '#' in the
> string as the beginning of a comment. Bash can parse your construct
> correctly - but 'csh' can't:
>
> ```
> % grep -Ev ^\([[:space:]]*#\)\|\(^$\) /etc/passwd
> Illegal variable name.
> ```

No need to resist, as it's an excellent point. As mentioned above, portability isn't currently as much of an issue here as it might be at other installations. But greater portability still wins--especially when it's achieved at the expense of a pair of single quotes. Things do change (though glacially slowly, for good reasons, on the server set I'm thinking of), and the machine(s) I personally use are fairly standard Linux installs.
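
[ A short sketch of what that pair of single quotes buys, assuming a Bourne-style shell such as bash. The helper 'show_arg' is made up for this example and stands in for grep, so that the argument the program actually receives can be seen. Unquoted, the shell strips the backslashes, and the brackets and asterisk are also open to filename expansion. ]

```shell
# Hypothetical helper: print the first argument exactly as received.
show_arg() {
    printf '%s\n' "$1"
}

# Single-quoted: the pattern reaches the program verbatim.
quoted=$(show_arg '^\([[:space:]]*#\)\|\(^$\)')

# Unquoted: the shell removes the backslashes before the program runs.
unquoted=$(show_arg ^\([[:space:]]*#\)\|\(^$\))

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

This is also why the unquoted version happens to work with 'grep -E': by the time grep sees the pattern, the backslashes are already gone.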

> > ~/bin is in my path, so the command is simply 'dense PATH/FILE'. This
> > code strips comments, indented comments, and blank lines.
>
> Is there any reason to have those parentheses in the first part of the
> regex? It works just fine without them. I would also extend the
> definition of 'blank lines' to include lines consisting solely of
> whitespace, and get rid of all those useless backslashes:
>
The parentheses and backslashes may be an artifact of dragging this over from some (possibly ancient) version of HP-UX. Provenance might be hard to trace, particularly as it likely predates the last two version control systems.

> ```
> egrep -v '^[[:space:]]*(#|$)' /etc/hosts
> ```

Simpler, more portable, more powerful. What's not to like? Well, other than needing to work late a couple of nights next week, grovelling through 'dense' and a couple of other scripts that are likely just as old. There won't be time during what I laughingly refer to as normal hours, so the lesson here is plain. Do not get involved in discussions like this when the weather is nice, for you may find yourself working late. :)
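
[ One way the reworked script might look, as a sketch only: it combines the corrected shebang, the single-quoted pattern from Ben's reply, and quoted file arguments. The function name 'strip_comments' is an assumption for this example, and 'grep -E' is used as the POSIX spelling of 'egrep'. This is not Greg's actual 'dense'. ]

```shell
#!/bin/bash
# Sketch of a revised ~/bin/dense (not the script actually in use).
# Prints each named file without comment lines, indented comment lines,
# or lines that are empty or contain only whitespace.

strip_comments() {
    # '--' ends option parsing, so odd file names are not taken as options.
    grep -Ev '^[[:space:]]*(#|$)' -- "$@"
}

# With no arguments, do nothing; otherwise treat them as file names.
if [ "$#" -gt 0 ]; then
    strip_comments "$@"
fi
```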

