From Milton bradley on Tue, 02 May 2000
Don't really know if you'll answer my questions but it doesn't hurt to give it a try. If you can all I can say is thanks. Well here goes
the situation is this
You and your friends have decided that e-mail is the easiest way to get your homework done for you?
[I got another question from a different address at Hotmail yesterday. It had a similarly "Do my homework for me" tone to it.]
Directory trees can include large numbers of files. Referencing a file by full path name can be burdensome on the user. Consequently, in UNIX there is an environment variable $PATH (e.g. .:/bin:/usr/bin) which tells the system which directories to search for an executable file. All non-executable files are looked for only in the current working directory (.).
Actually this set of propositions is full of minor inaccuracies. First the $PATH environment variable is not a feature of UNIX per se. It is not unique to UNIX, and it is not necessitated by UNIX. However it is a widely used convention --- and it's probably required by POSIX in the implementation of shells and possibly some standard libraries.
Non-executable files are found according to the semantics of the program doing the opening. Usually this is a path (either an absolute path from the root directory or one that is relative to the current working directory, which the shell tracks in $PWD).
The main flaw in your propositions is the assumption that the PATH exists primarily for convenience. There is actually a more important reason for things to use the PATH.
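To make the executable search concrete, here is a minimal sketch of the kind of lookup a shell performs. The function names (find_executable, _is_executable) are mine, invented for illustration; a real shell does this in C and simply tries to exec each candidate. The one detail worth noticing is that an empty entry on the path is treated like "." (the current directory).

```python
import os

def find_executable(name, path_value):
    """Return the first executable file called `name` along a colon-separated
    search path, or None if no directory on the path holds one."""
    if "/" in name:
        # A name containing a slash is used as given; no search takes place.
        return name if _is_executable(name) else None
    for entry in path_value.split(":"):
        directory = entry or "."  # an empty entry stands for the current directory
        candidate = os.path.join(directory, name)
        if _is_executable(candidate):
            return candidate  # first match wins, so the order of the path is policy
    return None

def _is_executable(path):
    # Hypothetical helper: a regular file that this user may execute.
    return os.path.isfile(path) and os.access(path, os.X_OK)

if __name__ == "__main__":
    print(find_executable("ls", os.environ.get("PATH", "")))
```

Nothing here looks at data files at all. A program that opens "foo" just hands that name to the kernel, which resolves it relative to the current directory.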
1) Why shouldn't other non-executable file be referenced by this mechanism?
Why should they?
2) SuperUsers are cautioned that the shell should not look in the current working directory first (e.g. /bin:/usr/bin:.) for security reasons. Why?
All users are cautioned that adding . (CWD, the current working directory) to their PATH carries some risk.
Let's say that you put . on your path. If you put it at the beginning of your path you've implemented a policy that any executable in the current directory takes precedence over any other executable by that name. So I'm an evil user and I just create a program named 'ls' which does "bad things(TM)"
(I'll leave the exact nature of "bad things(TM)" to your imagination).
When 'root' or any other user then does a 'cd' into my directory and types 'ls' (a very common situation) then my program runs in their security context. I effectively can do anything that they could do. I can access any file they can access. I can completely subvert their account.
So let's put that . at the end of the PATH. That'll solve the problem. Now the /bin/ls or /usr/bin/ls will be executed in preference to my copy of 'ls.'
So now the user "evil" has to get more clever. He makes a number of useful links to his "bad things(TM)" script. These are carefully crafted strings like: "sl" and "ls-al" (common typos that the hurried user might make while visiting his directory).
Quod erat demonstrandum.
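If you want to audit a PATH for this problem, a sketch like the following will do. The function name risky_path_entries is my own invention, and the rule it applies is an assumption on my part: anything that does not start with a slash (including ".", an empty entry, or a relative directory like "bin") gets resolved against whatever directory you happen to be standing in.

```python
def risky_path_entries(path_value):
    """Return (position, entry) pairs for every PATH entry that is resolved
    relative to the current directory instead of naming a fixed directory."""
    risky = []
    for position, entry in enumerate(path_value.split(":")):
        # ".", "" (an empty entry) and relative names all depend on the CWD.
        if not entry.startswith("/"):
            risky.append((position, entry))
    return risky

if __name__ == "__main__":
    for value in (".:/bin:/usr/bin", "/bin:/usr/bin:.", "/bin::/usr/bin", "/bin:/usr/bin"):
        print(value, "->", risky_path_entries(value))
```

Note that an empty entry (two colons in a row, or a leading or trailing colon) is the sneaky case. It is easy to create by accident when one variable in a PATH assignment turns out to be unset.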
3) The c-shell creates a hash table of the files in $PATH on start-up. Give one advantage of this scheme:
The hash table is basically an index of all executables on the path. Thus the shell can find out, in O(1) expected time, whether an executable exists and where it is, without searching every directory on the PATH each time. (Look up "big O notation" in any textbook on computational complexity analysis to understand that notation.)
4) Give one disadvantage of the above mentioned scheme:
I'll give two.
- The shell will need to malloc more memory than a non-hash version would require. It needs to build the hash table and keep it in core. Moreover this data is not shareable memory --- it is private to each instance of the shell.
- The hash table may get out of sync with the real list of executables on the disk. Some additional binaries may be added and the shell has no way of detecting it. (Shells that support PATH hashing generally also offer some command to update their hash table --- 'rehash' and 'hash -r' are common).
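Both the advantage and the second disadvantage show up in a few lines. This is only a sketch of the idea using a Python dictionary; it is not how csh actually lays out its table, and build_command_table is a name I made up for the illustration.

```python
import os

def build_command_table(path_value):
    """Scan every directory on the path once and remember, for each command
    name, the first executable found under that name."""
    table = {}
    for directory in path_value.split(":"):
        directory = directory or "."  # an empty entry means the current directory
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # skip entries that are missing or unreadable
        for name in names:
            full = os.path.join(directory, name)
            if name not in table and os.path.isfile(full) and os.access(full, os.X_OK):
                table[name] = full  # earlier PATH entries keep precedence
    return table

if __name__ == "__main__":
    table = build_command_table(os.environ.get("PATH", ""))
    print(len(table), "commands indexed")
    print(table.get("ls"))  # one dictionary lookup, no directory scanning
```

Once the table is built, a lookup never touches the disk. That is also exactly why a program installed after the scan stays invisible until the table is rebuilt, which is the job 'rehash' does.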
5) Since the system can easily maintain a list of files referenced in the course of a login session, one could also maintain a REFERENCE FILE TABLE and use it as part of a scheme to locate files. Give one advantage of this scheme:
Which "one" could do this? Would this be a new API? What programs would support it? How?
Ergo I unask your question.
6) Give one disadvantage of this scheme:
Commands with the same name are presumed to provide compatible semantics. Ambiguity among data files is likely to have severe consequences.
One could use expressions like `locate foo` in each case where one wished to refer to "the first file named 'foo' on my data search path." One could certainly implement an API that took filenames, perhaps of the form: ././foo and resolved them via a search mechanism.
(Note: GNU systems, such as Linux, often have the "updatedb" or "slocate" packages installed. These provide a prebuilt index of all files on the system which are linked through publicly readable directories. Thus the `locate` command expression could be used already, though the user wouldn't be able to implement a policy over how many file names were returned and in which order. It would be a simple matter of programming to write one's own shell function or script which read a DPATH environment variable, called the 'locate' command and searched the returned list for matches in a preferential order).
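Here is roughly what that "simple matter of programming" might look like. Everything in it is hypothetical: DPATH is not a real convention, pick_by_dpath is a name I made up, and I've written it to take the list of candidate paths as an argument (in practice you'd feed it the lines printed by `locate foo`) so the ordering policy is easy to see.

```python
import os

def pick_by_dpath(name, candidates, dpath_value):
    """From a list of full paths, keep the files called `name` whose directory
    is listed in the colon-separated DPATH, ordered by DPATH position."""
    rank = {}
    for position, directory in enumerate(dpath_value.split(":")):
        if directory:
            # The first mention of a directory decides its rank.
            rank.setdefault(os.path.normpath(directory), position)

    def directory_of(path):
        return os.path.normpath(os.path.dirname(path))

    matches = [
        path for path in candidates
        if os.path.basename(path) == name and directory_of(path) in rank
    ]
    return sorted(matches, key=lambda path: rank[directory_of(path)])

if __name__ == "__main__":
    found = ["/usr/share/doc/foo", "/tmp/foo", "/home/me/data/foo"]
    print(pick_by_dpath("foo", found, "/home/me/data:/tmp"))
```

The first element of the result is "the first file named 'foo' on my data search path." Whether you'd really want a program to open whichever file that turns out to be is another matter.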
BTW: Some shells implement a CDPATH environment setting.
Here's an excerpt from the 'bash' man page:
CDPATH The search path for the cd command. This is a colon-separated list of directories in which the shell looks for destination directories specified by the cd command. A sample value is ".:~:/usr".
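A rough sketch of how a lookup along CDPATH might go, for the curious. This is my own approximation and not bash's code: resolve_cd is an invented name, I'm assuming that targets starting with "/", "./" or "../" skip the search, that a failed search falls back to the current directory, and I expand "~" at lookup time just to keep the example short.

```python
import os

def resolve_cd(target, cdpath_value, cwd="."):
    """Return the directory a 'cd target' would land in when searching the
    colon-separated CDPATH, or None if nothing matches."""
    explicit = target.startswith(("/", "./", "../")) or target in (".", "..")
    if not explicit:
        for entry in cdpath_value.split(":"):
            # An empty entry means the current directory; "~" means home.
            base = os.path.join(cwd, os.path.expanduser(entry or "."))
            candidate = os.path.join(base, target)
            if os.path.isdir(candidate):
                return os.path.normpath(candidate)
    # Explicit targets, and names not found on CDPATH, are tried from the CWD.
    fallback = os.path.join(cwd, target)
    return os.path.normpath(fallback) if os.path.isdir(fallback) else None

if __name__ == "__main__":
    print(resolve_cd("bin", ".:~:/usr"))
```

Notice that this searches for directories only, and only on behalf of one interactive command. Nobody is opening data files this way.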
As I see it the main reason for UNIX to implement support for an executable search PATH is to allow scripts to be more portable, while allowing users and administrators to implement their own policies and preferences among multiple versions of executables by the same name.
Thus when I use 'awk' or 'sed' in a script I don't care which 'awk' or 'sed' it is and where this particular version of UNIX keeps its version of these utilities. All I care about is that these utilities provide the same semantics as the rest of my scripts and commands require.
If I find that the system default 'awk' or 'sed' is deficient in some way (and if I'm a "mere mortal user") I can still serve my needs by installing a personal copy of a better 'awk' (gawk or mawk) and/or a better 'sed' (such as the GNU version). PATHs are the easiest way to accomplish this.
So, the disadvantage of implementing some sort of "data path" feature in the UNIX shells and libraries would basically be:
IT'S A STUPID IDEA!