...making Linux just a little more fun!

Version control for /etc

Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 26 Oct 2007 08:53:02 +0530

Hello,

I was looking at version control mechanisms to handle /etc on the machines here. If people on TAG have used such systems I would appreciate feedback.

CVS: seems to be the classic solution. Cons: People say it is old and unmaintained code which is "end-of-life".

Mercurial: One of the modern VC systems considered "notable" on Rick Moen's knowledge base. One difficulty with "hg" is that it insists on the "distributed" model. Putting the version control history outside /etc (a la CVS) would require convoluted mounts.

GIT: Another modern VC system (though not "notable" as per Rick's kb). It is rather similar to Mercurial in many ways. One difference is that one can use the environment variable GIT_DIR to point to a different directory for storing VC history.

Some reasons to keep VC history outside /etc:
	1. This way one can easily check for "cruft" without adding an explicit "ignore" for ".hg" or ".git" ...
	2. Uses less space in "/etc".
	3. Can keep the history on an "archival" disk safe from potential corruption.

Any thoughts/suggestions by people on TAG are welcome as usual!

Regards,

Kapil.
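
Kapil's GIT_DIR idea above can be sketched roughly as follows. This is only an illustration in Python, not something anyone in the thread ran: it relies on git's GIT_DIR and GIT_WORK_TREE environment variables, while the helper names and the history path /var/lib/repos/git/etc.git are hypothetical.

```python
import os
import subprocess


def external_history_env(work_tree, history_dir, base_env=None):
    """Return an environment that makes git keep its history outside work_tree."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_DIR"] = os.path.abspath(history_dir)      # where the history lives
    env["GIT_WORK_TREE"] = os.path.abspath(work_tree)  # the files being tracked
    return env


def git(args, work_tree, history_dir):
    """Run one git command on work_tree with its history kept in history_dir."""
    result = subprocess.run(
        ["git", *args],
        cwd=work_tree,
        env=external_history_env(work_tree, history_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical usage (paths are assumptions; would need to run as root):
#   git(["init"], "/etc", "/var/lib/repos/git/etc.git")
#   git(["add", "-A"], "/etc", "/var/lib/repos/git/etc.git")
#   git(["commit", "-m", "snapshot of /etc"], "/etc", "/var/lib/repos/git/etc.git")
```

With this arrangement no ".git" directory appears inside /etc, which is the first of the three reasons listed above.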




Rick Moen [rick at linuxmafia.com]


Thu, 25 Oct 2007 21:46:20 -0700

Quoting Kapil Hari Paranjape (kapil at imsc.res.in):

> I was looking at version control mechanisms to handle /etc
> on the machines here. If people on TAG have used such systems I would
> appreciate feedback.

Oddly enough, I was just talking about that exact problem with my friends in Victoria, Australia:

From rick Thu Oct 25 14:41:45 2007
Date: Thu, 25 Oct 2007 14:41:46 -0700
To: luv-main at luv.asn.au
Subject: Re: Directory crud

Quoting Craig Sanders (cas at taz.net.au):

> I still use RCS in various /etc sub-directories at home (postfix
> and others), mostly because I'm too lazy to change something that
> works. RCS does have the annoying habit of locking or changing the
> permissions of files when they are checked in until you check them out
> again....quite annoying when what you are checking in are executable
> scripts.

Permissions, ownership, special files (symlinks being the pointy end of that problem), and pretty nearly all such metadata -- but RCS remains useful anyway.

Oh, and leaving little RCS droppings all over in-working-tree locations: right in your working tree itself unless you create an "RCS" subtree to receive them. This avoidable misfeature has been tragically replicated by several of the otherwise excellent next-generation SCMs, too (bzr, darcs, Mercurial). Ugh, hate that.

Only G.B. Shaw's fabled "unreasonable man" would seek something markedly better for sysadmin housekeeping, and the *buntu boys are working the bugs out of adapting bzr for that purpose:

https://wiki.ubuntu.com/VersionControlledEtc

Worth watching, I think. (All hail the unreasonable man!)

I should note that sharp-eyed mailing list member Trent W. Buck demurred slightly:

> Uh, all three of the SCMs you mentioned have one top-level metadata
> directory in the root project directory.

Which is a good point, and is indeed an improvement on RCS having a metadata directory in each working directory. This matter, by the way, was extensively explored in Kevin Smith's blog in http://web.archive.org/web/20061206073756/http://blog.fxa.org/articles/category/scm .

> CVS: seems to be the classic solution. Cons: People say it is old and
> unmaintained code which is "end-of-life".

CVS should never be deployed afresh, at this point: If nothing else, use svn instead, as a drop-in replacement. It fixes all of CVS's implementation problems, while retaining (for good or bad) the same design aspirations, no more and no less.

> Mercurial: One of the modern VC systems considered "notable" on
> Rick Moen's knowledge base. One difficulty with "hg" is that it
> insists on the "distributed" model. Putting the version control
> history outside /etc (a la CVS) would require convoluted mounts.

You're actually mixing together a good thing and a bad one: The distributed model is the good part, and nothing about it inherently prevents what one wishes to achieve, here.

> GIT: Another modern VC system (though not "notable" as per Rick's
> kb). 

I think, since you mention it, that git is overdue for that marker. It's rapidly moved from being a raw, brand-new, and evolving project to a mature tool and major success story.

> It is rather similar to Mercurial in many ways. One difference
> is that one can use the environment variable GIT_DIR to point to a
> different directory for storing VC history.

Fantastic! See, I didn't know that. I've learned something useful today. Thank you, Kapil.

Oddly enough, this is precisely the issue I stubbornly insisted was significant, in my conversation with Trent -- and is the very solution I suggested:

> Where else could metadata be kept? Remember that distributed SCMs
> only have a working tree -- there is no "server" which can hang onto
> metadata.

I would suggest: Ideally wherever specified via an appropriate dotfile entry, environment variable, or system-wide /etc datum.

My exploration of SCMs has been extremely scattershot, so it's always possible that darcs, Hg (Mercurial), Monotone, bzr (Bazaar), Codeville, or ArX has also sprouted this facility, and I just haven't yet heard about it.




Thomas Adam [thomas.adam22 at gmail.com]


Fri, 26 Oct 2007 05:59:31 +0100

On 26/10/2007, Kapil Hari Paranjape <kapil at imsc.res.in> wrote:

> Hello,
>
> I was looking at version control mechanisms to handle /etc
> on the machines here. If people on TAG have used such systems I would
> appreciate feedback.

Avoid them. They become problematic, since you enforce something on /etc for which it was never designed. At best you can try and tie in the package manager (http://www.isisetup.ch/wiki/IsiSetupRevisionControl) but even that relies on so many things.

> GIT: Another modern VC system (though not "notable" as per Rick's
> kb). It is rather similar to Mercurial in many ways. One difference

Heh. I've been using GIT for a long time, it's a tool I migrated to at work away from SVN. It's very good, but not so good for something like /etc -- since its permissions model leaves something to be desired.

-- Thomas Adam




Rick Moen [rick at linuxmafia.com]


Thu, 25 Oct 2007 22:09:26 -0700

Quoting Thomas Adam (thomas.adam22 at gmail.com):

> Heh.  I've been using GIT for a long time, it's a tool I migrated to
> at work away from SVN.  It's very good, but not so good for something
> like /etc -- since its permissions model leaves something to be desired.

I had the impression that all checked-in file metadata gets accurately stored and versioned. I could be wrong (or misreading your point) -- and I've not actually tried this sort of thing. I've only (slightly) considered the concept, so far.

Mind you, although I think it a capital idea to store a full history of /etc contents in an SCM, I would have to think long and hard before using /etc checkouts to populate /etc directly, e.g., files in /etc existing only after SCM checkout into /etc.

The big win would tend to come with, e.g., routinely tracking changes to /etc/apache and /etc/bind configuration files, since both Apache HTTPd and BIND9 (especially the latter) are horrifically uninformative in the ways they fall over and die from syntax errors in their conffiles. Much detective work can be saved from the simple ability to be able to retrace diff histories in, say, DNS zonefiles.




Thomas Adam [thomas.adam22 at gmail.com]


Fri, 26 Oct 2007 06:28:26 +0100

On 26/10/2007, Rick Moen <rick at linuxmafia.com> wrote:

> Quoting Thomas Adam (thomas.adam22 at gmail.com):
>
> > Heh.  I've been using GIT for a long time, it's a tool I migrated to
> > at work away from SVN.  It's very good, but not so good for something
> > like /etc -- since its permissions model leaves something to be desired.
>
> I had the impression that all checked-in file metadata gets accurately
> stored and versioned.  I could be wrong (or misreading your point) --

No -- it does not. GIT only tracks the execute bit on files; and the requisite of using isisetup is that the metastore has to be updated each time something changes within this supposed version controlled /etc -- that's not good.

> The big win would tend to come with, e.g., routinely tracking changes to
> /etc/apache and /etc/bind configuration files, since both Apache HTTPd
> and BIND9 (especially the latter) are horrifically uninformative in
> the ways they fall over and die from syntax errors in their conffiles.
> Much detective work can be saved from the simple ability to be able to
> retrace diff histories in, say, DNS zonefiles.

See git-bisect.

-- Thomas Adam
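
For readers who have not met git-bisect: the idea behind it is a binary search over the history for the first revision that breaks. The sketch below shows only that idea in Python and is not how git implements it. The is_good callback is hypothetical (it could, for instance, run a syntax check on a zone file), and the code assumes the list runs oldest to newest, starts good, ends bad, and stays bad once broken.

```python
def first_bad_revision(revisions, is_good):
    """Binary-search an oldest-to-newest list for the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    low, high = 0, len(revisions) - 1  # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(revisions[mid]):
            low = mid   # breakage happened after mid
        else:
            high = mid  # breakage happened at or before mid
    return revisions[high]
```

Each test halves the remaining range, so even a long history of zonefile edits needs only a handful of checks.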




Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 26 Oct 2007 11:17:45 +0530

Dear Rick,

Thanks for these links:

On Thu, 25 Oct 2007, Rick Moen wrote:

>   https://wiki.ubuntu.com/VersionControlledEtc
> http://web.archive.org/web/20061206073756/http://blog.fxa.org/articles/category/scm

And thanks for clearing up any vestiges of doubt about choosing CVS.

> > Mercurial: One of the modern VC systems considered "notable" on
> > Rick Moen's knowledge base. One difficulty with "hg" is that it
> > insists on the "distributed" model. Putting the version control
> > history outside /etc (a la CVS) would require convoluted mounts.
> 
> You're actually mixing together a good thing and a bad one:  The
> distributed model is the good part, and nothing about it inherently 
> prevents what one wishes to achieve, here.  

I used "distributed model" rather loosely and you are right.

> Fantastic!  See, I didn't know that.  I've learned something useful
> today.  Thank you, Kapil.

Pleased to be of help.

I just noticed that there is mention of this approach on the ubuntu wiki link that you sent me.

As I mentioned (and you are aware) there are some convoluted mechanisms available to achieve almost this goal with the other SCMs. For example,
	mount --bind /var/lib/repos/hg/etc.hg /etc/.hg
(for this the /etc/.hg directory has to exist which is a small bit of cruft). After this Mercurial should work as expected.

A different approach would be to use /etc as a target rather than as a working directory. The folks at infrastructures.org seem to believe in this approach which uses a "Gold Server" as explained in http://www.infrastructures.org/bootstrap/gold.shtml

But basically, (as you suggest) keeping the working directory separate from its VC history data --- but tied using a configuration file of some kind would be the best. Even the cloning operation would be much faster to set up and would have the (somewhat dubious) advantage of working fast on file-systems which do not have hardlinks (FAT file systems!).

Regards,

Kapil.




Rick Moen [rick at linuxmafia.com]


Thu, 25 Oct 2007 22:58:43 -0700

Quoting Kapil Hari Paranjape (kapil at imsc.res.in):

> As I mentioned (and you are aware) there are some convoluted
> mechanisms available to achieve almost this goal with the other SCMs.
> For example,
> 	mount --bind /var/lib/repos/hg/etc.hg /etc/.hg
> (for this the /etc/.hg directory has to exist which is a small bit of
> cruft). After this Mercurial should work as expected.
> 
> A different approach would be to use /etc as a target rather than as
> a working directory. The folks at infrastructures.org seem to believe
> in this approach which uses a "Gold Server" as explained in
> 	http://www.infrastructures.org/bootstrap/gold.shtml

One of those two approaches may end up being the best compromise.

It seems that -- predictably in retrospect -- the design of "git" and quite possibly a number of the other next-generation SCMs is narrowly focussed on the needs of developers, judging by what Thomas is now revealing, e.g., the git developers' reportedly not bothering to version permissions, ownership, etc. (if I read Thomas right).

I'll freely admit to having delved only into shallow waters on this topic, and not drunk deeply of the Pierian Spring: I've been hoping someone else would prototype, for example, some good way of keeping /etc routinely version-tracked, and write up a well-debugged tutorial. (I hate being a pioneer: There are all of those pesky arrows.)

Don Marti has had perhaps a somewhat more steely-eyed assessment of the likelihood of that happening: He'd like IDG / LinuxWorld.com to pay for it (http://www.linuxworld.com/community/?q=node/444):

What the world needs now (besides an ATX power supply with a built-in USB ammeter) is good Git tutorials.

Ted Ts'o points out (http://tytso.livejournal.com/29467.html) that git is powerful but needs help in the documentation department.

Donnie Berkholz points (http://www.linuxworld.com/community/?q=node/275#comment-3542) to X.org's UsingGit (http://freedesktop.org/wiki/UsingGit), but we need more tutorials, especially "git for webmasters" and "git for system administrators." Git gives you so many options starting off that a good set of "cookbook" pages for different types of projects would be really helpful.

LinuxWorld.com would run (and pay for) a good git tutorial, and the author contract allows an author to contribute the tutorial to the project docs right away.




Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 26 Oct 2007 12:24:31 +0530

Hello,

On Thu, 25 Oct 2007, Rick Moen wrote:

> Quoting Kapil Hari Paranjape (kapil at imsc.res.in):
> 
> > As I mentioned (and you are aware) there are some convoluted
> > mechanisms available to achieve almost this goal with the other SCMs.
> > For example,
> > 	mount --bind /var/lib/repos/hg/etc.hg /etc/.hg
> > (for this the /etc/.hg directory has to exist which is a small bit of
> > cruft). After this Mercurial should work as expected.
> > 
> > A different approach would be to use /etc as a target rather than as
> > a working directory. The folks at infrastructures.org seem to believe
> > in this approach which uses a "Gold Server" as explained in
> > 	http://www.infrastructures.org/bootstrap/gold.shtml
> 
> One of those two approaches may end up being the best compromise.

Thinking about this some more, another approach is to use some sort of patch manager like "quilt" or "mercurial queues" --- the patches can be maintained "outside the working area".

This avoids the permissions issue that Thomas mentioned. However, I am not sure whether these mechanisms can handle (addition/removal of) symlinks and file movement.

One problem with patching is that there are wheels within wheels within wheels. One needs some sort of "consolidation step" which is triggered by a threshold like size of patch or age of patch. (Mercurial is supposed to have this feature.)

Using "/etc" as a target can also resolve the permissions issue but there are other complications ...

> I'll freely admit to having delved only into shallow waters on
> this topic, and not drunk deeply of the Pierian Spring:  I've been
> hoping someone else would prototype, for example, some good way of
> keeping /etc routinely version-tracked, and write up a well-debugged
> tutorial. (I hate being a pioneer:  There are all of those pesky arrows.)

... and we can always trust Rick to come out with quotable quotes! It took me a while to figure out the last line :-)

Regards,

Kapil.




Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 26 Oct 2007 15:39:42 +0530

Hello,

On Fri, 26 Oct 2007, Thomas Adam wrote:

> [git]'s very good, but not so good for something
> like /etc -- since its permissions model leaves something to be desired.

A solution to this problem which is used in a slightly different context (user home directories rather than /etc) is to use an update hook.

	http://kitenet.net/~joey/cvshome/ (using CVS)
	http://www.onlamp.com/pub/a/onlamp/2005/01/06/svn_homedir.html (using SVN)

Basically, the idea is to run a script at the end which fixes permissions. The config file that dictates the permissions (and possibly empty directories) can be one of the files in your configuration which will be updated. The hook-script is part of the VC infrastructure and resides in $GIT_DIR.

Regards,

Kapil.
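
The hook idea Kapil describes above might look something like this. It is a sketch under assumptions, not the cvshome or svn_homedir scripts: the manifest format (octal mode, a tab, then a relative path) and the function names are invented for illustration, and only permission bits are handled (ownership would additionally need os.chown and root privileges).

```python
import os
import stat


def record_modes(root, manifest_path):
    """Write 'octal-mode<TAB>relative-path' lines for everything under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks; their own mode bits are not useful
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            entries.append((os.path.relpath(full, root), mode))
    entries.sort()  # stable order keeps manifest diffs small
    with open(manifest_path, "w") as handle:
        for relative, mode in entries:
            handle.write("%04o\t%s\n" % (mode, relative))


def restore_modes(root, manifest_path):
    """Re-apply the recorded modes to files that still exist under root."""
    with open(manifest_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            mode_text, relative = line.split("\t", 1)
            target = os.path.join(root, relative)
            if os.path.exists(target) and not os.path.islink(target):
                os.chmod(target, int(mode_text, 8))
```

One would presumably call record_modes before each commit and restore_modes from a hook that runs after a checkout, with the manifest itself being one of the tracked files, as suggested above.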




Paul Sephton [paul at inet.co.za]


Fri, 26 Oct 2007 12:37:19 +0200

On Fri, 2007-10-26 at 15:39 +0530, Kapil Hari Paranjape wrote:

> Hello,
> 
> On Fri, 26 Oct 2007, Thomas Adam wrote:
> > [git]'s very good, but not so good for something
> > like /etc -- since its permissions model leaves something to be desired.
> 
> A solution to this problem which is used in a slightly different
> context (user home directories rather than /etc) is to use an update
> hook.
> 
> 	http://kitenet.net/~joey/cvshome/ (using CVS)
> 	http://www.onlamp.com/pub/a/onlamp/2005/01/06/svn_homedir.html (using SVN)
> 

I have been following this thread, and thought to share an interesting approach with you.

Back in the days of the PDP-11, RSX-11M OS used to version each file as it changed. Only once you did a 'purge' command, would the earlier versions be removed. That was hugely useful. The latest file would be named 'filename', and all other versions of the same file would be 'filename;1', 'filename;2', etc.

A version control system such as rcs is cool enough, I suppose, and common to most platforms as well. It does take a bit of knowledge of ci/co etc. to drive, and has the disadvantage that you only see the currently checked-out file at any given stage.

In the early days of playing Linux, I wrote a script that traversed a directory using 'find', and created hard links to each file in a target directory and subdirectories. The hard links and duplicate directory structure were saved in a separate base directory on the same volume. Yup, you guessed it- I used file name extensions ;1, ;2 etc. for their unique versions.

So long as your editor created a new file rather than just rewriting the same inode, a regular cron of the script ended up in keeping my versioned files for me. I never lost a thing after that, and could always go back to see change history.

As the file was always linked in the backup directory, you could delete the original and restore it again effortlessly. The use of hard links meant that hardly any additional disk space was used unless you actually edited a file, resulting in a new inode and hence a new version (when cron got around to it).

Paul
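
Paul's original script is not shown, so what follows is only a guess at its general shape, written in Python rather than as a 'find'-based shell script. The function names are hypothetical, versions are numbered upward from ;1, and the backup directory is assumed to sit outside the tree being walked but on the same volume (hard links cannot cross volumes).

```python
import os


def saved_versions(target_dir, name):
    """Return the sorted version numbers already saved for name."""
    prefix = name + ";"
    numbers = []
    for entry in os.listdir(target_dir):
        if entry.startswith(prefix) and entry[len(prefix):].isdigit():
            numbers.append(int(entry[len(prefix):]))
    return sorted(numbers)


def snapshot(source_root, backup_root):
    """Hard-link every regular file into backup_root as 'name;N'.

    A new version is added only when the file's inode differs from the
    newest saved version, i.e. when an editor replaced the file rather
    than rewriting it in place. Returns the list of links created.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        relative_dir = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(backup_root, relative_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            if os.path.islink(source) or not os.path.isfile(source):
                continue  # only regular files are versioned
            versions = saved_versions(target_dir, name)
            if versions:
                newest = os.path.join(target_dir, "%s;%d" % (name, versions[-1]))
                if os.stat(newest).st_ino == os.stat(source).st_ino:
                    continue  # same inode: this content is already linked
            next_number = versions[-1] + 1 if versions else 1
            link = os.path.join(target_dir, "%s;%d" % (name, next_number))
            os.link(source, link)
            created.append(link)
    return created
```

Run regularly from cron, this reproduces the behaviour described above: an unchanged file costs no extra space, and an edit that produces a new inode shows up as the next numbered version.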




Kapil Hari Paranjape [kapil at imsc.res.in]


Fri, 26 Oct 2007 20:00:08 +0530

On Fri, 26 Oct 2007, Paul Sephton wrote:

> I have been following this thread, and thought to share an interesting
> approach with you.
> 
> Back in the days of the PDP-11, RSX-11M OS used to version each file as
> it changed.  Only once you did a 'purge' command, would the earlier
> versions be removed.  That was hugely useful.  The latest file would be
> named 'filename', and all other versions of the same file would be
> 'filename;1', 'filename;2', etc.

The VAX/VMS system that I once used had a similar feature which was quite useful on more than one occasion. Nowadays, I think systems like "Veritas" have similar features for Linux but are proprietary.

> A version control system such as rcs is cool enough, I suppose, and
> common to most platforms as well. It does take a bit of knowledge of
> ci/co etc. to drive, and has the disadvantage that you only see the
> currently checked-out file at any given stage.

Actually, you can see older versions at the same time with systems which have lightweight cloning like "mercurial" and "git". It is just a little more complex ( ;-) ) than something like
	vimdiff 'a' 'a;1'

So versioned file-systems are interesting but somehow I don't see many implementations.

Wonder why?!

Kapil.




Neil Youngman [Neil.Youngman at youngman.org.uk]


Fri, 26 Oct 2007 18:57:12 +0100

On Friday 26 October 2007 15:30, Kapil Hari Paranjape wrote:

> On Fri, 26 Oct 2007, Paul Sephton wrote:
> > I have been following this thread, and thought to share an interesting
> > approach with you.
> >
> > Back in the days of the PDP-11, RSX-11M OS used to version each file as
> > it changed.  Only once you did a 'purge' command, would the earlier
> > versions be removed.  That was hugely useful.  The latest file would be
> > named 'filename', and all other versions of the same file would be
> > 'filename;1', 'filename;2', etc.
>
> The VAX/VMS system that I once used had a similar feature which
> was quite useful on more than one occasion. Nowadays, I think systems
> like "Veritas" have similar features for Linux but are proprietary.

Yeah, I remember purge /keep=5 and all that.

Emacs will do numbered versions for you. My .emacs has

(setq version-control 't)       ; make numbered backups
(setq vc-make-backup-files 't)  ; keep backups for files in version control
(setq delete-old-versions 't)   ; delete excess backups without asking
(setq kept-old-versions 2)      ; keep the 2 oldest backups
(setq kept-new-versions 3)      ; keep the 3 newest backups

Neil




Samuel Bisbee-vonKaufmann [sbisbee at computervip.com]


Sun, 28 Oct 2007 20:08:00 -0400

Kapil Hari Paranjape wrote:

> Any thoughts/suggestions by people on TAG are welcome as usual!
> 

This method has already been hinted at by Rick, but I'll try to go a step or two deeper.

I would only use version control systems for specific directories in /etc, instead of tracking the entire tree. For example, track apache2/, [mail server here]/, etc.

This has the advantage of being much easier to branch. You could either have separate trees for each program config directory (apache2/, etc.) or a broader root directory for all of /etc and then config directories below that. I suppose it would depend on how you like to configure your version control systems, repository user permissions (give developers and operations access to the apache2 config, but only operations get to touch the bind config), etc.

So maybe your svn (so shoot me, I like the method in their madness) tree would look like...

/                          top /etc
/config_dir                a specific config directory
/config_dir/trunk          working copy
/config_dir/branch         configs known to be stable
/config_dir/branch/arch    oh sexy, we just broke down by arch type

Also, tracking all of /etc's contents doesn't make sense to me. If you were to pull from the repository to a different distro or the same distro running different versions of the same software, then you are going to be in a bad way. Also, if you are like me, then you love installing new programs on your home-not-for-production-or-work machine, which would create MANY insertion and deletion calls to your revision tree(s).

Yes, I am starting to get complicated. Yes, this may be more than a home user needs for their desktop machine. I say all of this because at the end of the day I don't see a point to keeping a revision tree for a simple, stay at home machine. Revision control systems are inherently robust [and complicated], and are therefore more than you probably need for even a home mail/web server. I would stick to running a script via cron to save diffs for you. Then, if you make multiple copies before cron runs again and want to save your new copy, you can still run the script yourself. Much simpler.

-- 
Samuel Kotel Bisbee-vonKaufmann
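
Samuel does not show his script, so the following is only one possible reading of "a script via cron to save diffs", sketched in Python. The function name, the state directory layout, and the file naming are all assumptions made for illustration.

```python
import difflib
import os
import shutil
import time


def save_diff(config_path, state_dir):
    """Save a unified diff of config_path against the last saved copy.

    Returns the path of the new diff file, or None if nothing changed.
    """
    os.makedirs(state_dir, exist_ok=True)
    name = os.path.basename(config_path)
    last_copy = os.path.join(state_dir, name + ".last")

    with open(config_path) as handle:
        current = handle.readlines()
    previous = []
    if os.path.exists(last_copy):
        with open(last_copy) as handle:
            previous = handle.readlines()

    diff = list(difflib.unified_diff(previous, current,
                                     fromfile=name + " (saved)",
                                     tofile=name + " (current)"))
    if not diff:
        return None

    # Name the diff after the time it was taken; add a serial number if
    # two runs land in the same second so nothing is overwritten.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    diff_path = os.path.join(state_dir, "%s.%s.diff" % (name, stamp))
    serial = 1
    while os.path.exists(diff_path):
        diff_path = os.path.join(state_dir, "%s.%s-%d.diff" % (name, stamp, serial))
        serial += 1

    with open(diff_path, "w") as handle:
        handle.writelines(diff)
    shutil.copyfile(config_path, last_copy)  # becomes the baseline for next time
    return diff_path
```

The same function can be called by hand between cron runs, which matches the "you can still run the script yourself" point above.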




Ramon van Alteren [ramon at forgottenland.net]


Mon, 29 Oct 2007 17:09:39 +0100

Kapil Hari Paranjape wrote:

> I was looking at version control mechanisms to handle /etc
> on the machines here. If people on TAG have used such systems I would
> appreciate feedback.
>
> CVS: seems to be the classic solution. Cons: People say it is old and
> unmaintained code which is "end-of-life".
>
> Mercurial: One of the modern VC systems considered "notable" on
> Rick Moen's knowledge base. One difficulty with "hg" is that it
> insists on the "distributed" model. Putting the version control
> history outside /etc (a la CVS) would require convoluted mounts.
>
> GIT: Another modern VC system (though not "notable" as per Rick's
> kb). It is rather similar to Mercurial in many ways. One difference
> is that one can use the environment variable GIT_DIR to point to a
> different directory for storing VC history.
>
> Some reasons to keep VC history outside /etc:
> 	1. This way one can easily check for "cruft" without
> 	   adding an explicit "ignore" for ".hg" or ".git" ...
> 	2. Uses less space in "/etc".
> 	3. Can keep the history on an "archival" disk safe from
> 	   potential corruption.
>
> Any thoughts/suggestions by people on TAG are welcome as usual!
>   
Mildly related but not quite...

We use puppet to manage the configurations on all machines in our park (>1000 nodes) and version-control our filestore and manifests.

Manifests are puppet-recipes for configuration; the filestore is the associated configfile/template storage. The puppet client on the host can be configured to hold a versioned backup copy of changed configfiles if it pleases you to do so, but we have found that it is easier to roll back our change on the puppet-master server, which holds all config parts for one or more servers in a servergroup.

Totally unsuitable for single desktop use.

Very effective for large-ish computerparks. As far as I can gather from the puppet irc channel and mailing lists the tool is used by desktop and server administrators alike.

I've seen happy deployment reports from people starting from 15 hosts and upwards.

Puppet is a newcomer on the block and definitely has rough edges and a very fast development turnover, but it has been (and still is) invaluable in growing our serverpark from 400+ hosts to the staggering 1000+ hosts we manage today.

Regards,

Ramon




Kapil Hari Paranjape [kapil at imsc.res.in]


Tue, 30 Oct 2007 08:55:29 +0530

Hello,

I agree overall with the remarks made by Samuel but I have one bone to pick:

On Sun, 28 Oct 2007, Samuel Bisbee-vonKaufmann wrote:

> I would stick to running a script via
> cron to save diffs for you. Then, if you make multiple copies before
> cron runs again and want to save your new copy, you can still run the
> script yourself. Much simpler.

Let us assume that a person who uses 1-5 machines also wants to keep track of daily work in a systematic way. Using version control is a good way to do this. I can imagine "formal" letters which are represented by different branches for different people as an example. (Using a LaTeX style file is probably a better way!).

So a "systematic" user of a small number of machines is likely to learn version control. In which case it may be easier to use version control wherever different versions of data need to be tracked.[*]

There is of course a risk that everything will start looking like a nail --- with VC as the hammer!

Regards,

Kapil.

[*] Which is why I liked "git"'s description of itself as "content tracker" rather than "version control".

