The Advantage of Linux
Setting up Apache
Starting and Testing Apache
Recently an article was published in Linux Gazette entitled Web Page Design Under Linux. This article produced some criticism in later issues. The main criticism seems to have been of the authors preference for hand coding HTML rather than using a HTML editor like the Windows HotDog editor. This is an argument I do not really want to get involved with. Neither do I want to spend much time on style. Whilst in most cases users want simple fast loading, clear pages, there will always be a place for garish eye candy, huge graphics and all kinds of complexities that take forever to download on a 28k modem. What I do want to address are the great things that linux offers. Great things that are free and would cost a fortune to implement on other operating systems. In particular I shall explain how to set your linux box up to be your own intranet server, and thereby fully exploit the abilities Linux offers for designing applications for the Web.
One point I think needs making, and which does not fit in with the rest of this article, is the Plugger Plug-in for Netscape Navigator. In the past many people have complained that Netscape plug-ins are not generally available for Linux. Plugger from http://fredrik.hubbe.net/plugger.html, seeks to address this by providing support for many audio/video/image types.
By way of introduction though, I will put my two penny worth into the 'editor argument'. I have never yet found a HTML editor that I like! I am writing this article in StarOffice 5.0. I have never used it to write HTML so this is something of a test. I expect I'll have to edit the source when I finish writing. Another editor that seems as good as any other I have tried is the composer part of Netscape Communicator. I find this irritating, very very irritating. Why? Because I like my text to be fully justified. OK I know that some people think that full justification 'goes against the spirit of HTML', but personally I would rather read text that is fully justified than text which is not. I do not believe I am alone in this preference.
What happens with Netscape is that after I have spent a couple of hours designing some pages until I am happy with them, I load them all into vi and change every occurrence of <P> into <P align=justify>, which can take some time if I've written a lot of text. Now a little later I want to make some changes, so I load the pages into Netscape Composer and I make some changes. But whist Communicator understands <P align=justify>, Composer does not. In fact Composer does not allow <P align=justify> and changes each occurrence back to <P>.... Bummer... I have to re-edit all the source by hand again. If I thought there was some advantage to using Composer, rather than hand writing my HTML I guess I would write a little program to search HTML files for <P> and replace with <P align=justify>. But this is not the only short coming of HTML wysiwyg editors. They just don't seem able to do exactly what I want, how I want.
OK in fairness I am now impressed with StarOffice! Although there is no button to give full justification, it is easy to edit the Text Body style so that full justification is automatic. It is also easy to automatically indent the first line of a paragraph, set double line spacing etc. etc. Maybe I will be converted to using a wysiwyg editor for my HTML after all.
One feature that seems to be missing from StarOffice 5.0, is any easy way to define lists. Tables are well supported, but lists are not. I guess that it should be possible to define some new styles to allow the use of different kinds of list, but one would have thought that a button should be available for them. Also given the different kinds of list available for HTML, one might find that the styles menu becomes cumbersome and more difficult than it should be.
OK simple layouts are quicker with a HTML editor, but if you want full control you have to hand edit at some point. So to my way of thinking if you want to write good HTML you must learn HTML. It is a very bad idea to to think you can skip learning HTML by getting an editor that works like a word processor. You will not have the skills you need to produce good web pages. HTML is very easy to learn. Once you know it then you might find that Netscape or StarOffice provide useful tools to help you. But please do not think such tools replace the need to be able to hand code HTML.
The essential document to read if you want to produce great Web-Pages efficiently is HTML 4.0 (W3C: HTML 4.0 Specification), this is the full Document Type Definition for HTML and SGML. For once I have taken my own advice and read it! The problems I mention above regarding text formatting have all been solved for me! I look at the HTML source StarOffice has given, whilst I am impressed, I am not happy. Again I think that an editor like vi or emacs really is better and more efficient than using a wysiwyg editor.
The reason is that HTML 4.0 allows the use of Style Sheets. This article depends upon the use of a style sheet, special.css. This is a document that says how a browser should render my document. An important feature is that browsers that cannot display certain things (e.g. graphics) are not disadvantaged. All browsers can access this page in the way I intend them to. In the past authors have been forced to use techniques to format their pages that cannot be displayed correctly on all browsers. Propriety HTML extensions, the conversion of text into graphics, the use of images for white space control, the use of tables for layout and even the use of programs, have all been used to format text, all these methods cause difficulty for users and extra work for developers. The correct use of style sheets avoids these problems.
Once you are familiar with the use of style sheets, it will not matter how badly Netscape Composer performs, or how unfamiliar you are with StarOffice, using an editor like vi, really can be simpler than using something like Hotdog. Load my style sheet into your favorite editor and see for yourself how easy it is to change the look and feel of this document (this link and the one above are to an identical copy called special.txt, so that you can see the source without the browser parsing it).
Even as I am writing this document, I have found yet another web browser for linux! This one is worh some attention since it is produced by the W3 consortium, the same people who define the HTML specification. In fact this is the browser they use to test their specification. The following text is displayed when you start it for the first time:-
This browser certainly has some advantages. The version I have is still beta (1.3b), so there are some short comings. I found that the File - Open Document dialog can resize its file box so it is non-functional. Also for some reason not all directories can appear in the directory box. At least one can specify the required file in the URL box! The fact that the manual does not come with the package is a definate minus for me.
What is nice about the browser is the pleasent way it renders pages. This page, for instance, uses full text justification, Amaya can actually split words in the traditional manner when required.
The really nice thing about this browser is the fact that you can edit files as you browse them. So if you are creating a document with many pages it is easy to switch between them. The down side of this is that there seems to be no way to to edit or view document source. Something that I would like to see in other browsers is the ability to create a "Table of Contents", with Amaya you can generate one based on the <H...> elements in your document. This will pop up as a seperate window and allow you to easily navigate through a document that has no links of its own.
At about 4.5 Megbytes, this is probably a very good alternative to StarOffice if you do not have the disk space required for StarOffice. I am certainly interested in seeing how this browser develops in the future. If you want to give it a try you can obtain it from the Amaya homepage. Additionaly there was a review of an earlier release of Amaya in Linux Gazette some years ago see issue 15. All I have to add to that review is that improvements must have been made. It seems the same in appearence as the screen shots show. Amaya displays the old style of Linux Gazette Contents pages quite well, but the new style in the last three or four issues is completey garbled. When Amaya starts up it no longer looks for a page on its home site, and I have not seen it seg fault as described. On the whole it does a very good job.
Now I've got that out of my system I'll get on to my main point. Drum roll please..... With linux it is simple to build a system you can gain http access to. Trumpet fanfare please.
Even if, like me, you are a stand alone machine, with no kind of network, it is easy to start up your favorite browser and http:// yourself. This means you can get into the wonderful worlds of cgi scripts, client server applications, java. etc. etc. etc. Without the need to access a 'real' network you can test any network application you care to develop for the Internet. You can test every aspect of your web design without wasting a phone bill. You can test applications safe in the knowledge that no matter what mistakes are in your code, only the machine you are using will be at risk, the "real" network will be unaffected until you decide your code is working correctly.
Web page design is not just about putting text/graphics and links onto the Internet. Increasingly it is about providing good user interfaces to network applications and providing an efficient means of communication. In the past only the largest corporations could afford to implement a WAN (Wide area network). Today anybody with a modem and pc can join the Internet, or implement their own intranet (a private network that acts in the same way as the Internet).
To illustrate my points consider the following scenario. You own a small tobacconists and live in a village called Tiny. Because the village is small you do not have many customers, so you don't sell items in vast numbers. That means you do not buy in large quantities from your suppliers and you cannot get the kind of discounts larger shops would get. But you have many relations and friends in other, similar villages who also run small tobacconists. If you all clubbed together and ordered your supplies as one entity you could take the discount advantages of bulk buying from your suppliers. The only problem is knowing which shop needs what items at any given time. You know that the discounts you would get would allow you to employ a van driver to deliver to all the shops and still leave each shop a significant saving.
How can web design under linux help you solve this problem?
The 'man with a van' needs information, what to buy in what quantity and where to deliver it. This sounds like a classic database application. Linux offers many sql database solutions. We want to keep costs to a minimum, we also want to maximize security and reliability. So good choices might be ingres or postgreSQL. If we look at these DBMS's we find that postgreSQL comes with a java interface. So lets say we design a suitable database with postgreSQL. This database will be held on a box that will be our server.
What we need is the ability for each shop to communicate with the server to tell it what stock we need to buy in. Shop keepers do not have to be computer literate. They also do not want to spend much money on computer systems. At least at this time it is unlikely that they could be persuaded to learn a UNIX operating system like linux. Cheap boxes already have Windows. An ideal solution is one where each shop can dial into the server, the manager can start up his/her favorite browser and use it to enter information to the server. It should not matter what operating system each shop uses.
What does our server need to do?
The first thing is to get Apache set up and running. Apache is a web server and comes with most if not all linux distributions as standard. What is not always clear is how to set it up correctly. This is something an installation program cannot do (easily) and needs to be done by hand. It is Apache that allows us to http ourselves. Of course, we will also need to allow remote machines to dial into our server, but that is a matter outside the scope of this document.
Once Apache is running we can design a java application to act as a user interface to our database.
We can test both the client and the server parts of our application on our server until we are certain it performs as required.
Then all we need to do is allow the shopkeepers to be able to dial into the server and gain access via their browsers to the java database interface.
The wonderful thing is that at the test stage we only need to use one linux box which acts as both client and server at the same time.
If you do not already know, then Apache is one of the most common http servers in existence. A great many ISP's (Internet Service Providers) use Apache to give their clients (i.e. You) access to the world wide web.
This document does not attempt to address the requirements of a true Internet or intranet server. All I am concerned with here is getting Apache up and running on a standalone machine so that client/server software can be tested. In particular I am not concerned with security issues here. If you do not intend to have a permanent network connection then all should be well. If you intend other machines to have access to your http server then you should read all the relevant documentation. Complete configuration of Apache can be a very complex issue which does not fall within the scope of this document.
Modern Linux distributions, such as S.u.S.E., have special requirements for setting up Apache correctly. To avoid confusion please read the documentation that came with both your linux distribution and your Apache distribution. The following steps will work for any Linux distribution, but be warned, if your distribution has special requirements I cannot be responsible for getting your system startup files in a mess.
For instance I shall describe how to start Apache automatically at boot time by adding a line to your /etc/inittab. Whilst some Slackware users will benefit from this approach S.u.S.E. users should find that it is better to edit their /etc/rc.config file in the appropriate manner.
These steps will prepare your machine for the installation of Apache. You might find that Apache is already installed, following the above steps will not hurt such installations.
127.0.0.1 localhost 127.0.0.2 Hawklord.Varteg Hawklord
ALL: 127.0.0.1 ALL: 0.0.0.0 ALL: localhost ALL: Hawklord.Varteg
If Apache is not already installed, find a pre-compiled version and install it as per the instructions. You should find that configuration files are placed under /etc/httpd, and other files are installed under /usr/local/httpd.
The directory /usr/local/httpd/htdocs should contain the Apache user manual in html format. Actually this directory will become the root directory of our http site, so you may want to move this documentation elsewhere eg. /usr/doc/Apache.
When you log into a http site, eg http://linux.org, you find yourself at the root of what can be a very complicated directory structure. You can think of a http site as being a file system just like your own root file system. Whilst it is true that to a user the http site will look like a regular file system, the reality on the servers hard disk(s) can be very different. It is important to understand the differences and use them to your advantage.
On my system the document root is at /usr/local/httpd/htdocs, and this is the directory a user lands in when they access http://Hawklord.Varteg. But there is only one file and no sub-directories on my hard disk. I only keep index.html in the physical location /usr/local/httpd/htdocs. All the documentation users can access is held in other locations on my hard disks.
Looking again at /usr/local/httpd you should find other sub-directories, in particular cgi-bin and icons. These directories should seem to be located under your document root because they will contain files that should be available to any html file on your site that requires them. Though a user should not be able to directly access these directories. Much of my documentation is under /usr/doc, so I make that directory appear as /doc to the http server.
What this means is that you can store all your documentation on the server in locations that seem logical to you, you do not need to copy files or even make symbolic links to /usr/local/httpd/htdocs. Instead plan how you want your documentation to appear to a user. Also you can have directories that users cannot directly access, but which html documents can access.
For instance, the directory /usr/doc/ contains
Linux_gazette Howto Ldp java-documentation
I also want to access files under /usr/hobbies/literature and /usr/src/java/applets
I want my site to have the following structure:
/ ---> cgi-bin docs ---> Linux_gazette Howto Ldp java-documentation literature icons java_applets
Planning your http site in this way will save you headaches in the future!
/etc/httpd/httpd.conf is the main configuration file for Apache. Some versions of Apache and/or Linux distributions recommend that all configuration information is kept in this file. Other versions recommend that you use all three files I shall mention below. If you want to keep all information in one file, simply put all the information in one file, there is no real difference between the two methods. You will find that the example files will contain sufficient comments to enable you to make the best choices for your system. I am only going to describe the changes you need to make to get Apache to work for you. Careful reading of the files will let you configure Apache better for your needs.
I am aware that a TCL configuration utility called Comanche exists for Apache. However, this is still in an early stage of development, so I do not recommend it for beginners. I found in practice the utility would not function correctly if you use only httpd.conf to configure your system. However it could prove useful for experimenting with different configurations.
For each line in the configuration files you can assume that your example file has a correct or sensible entry, unless I specifically mention it. Back up the examples before you make any changes!
This file contains site specific information. It is where we define how our site will look to a user.
Alias /cgi-bin/ /usr/local/httpd/cgi-bin/ Alias /docs/ /usr/doc/ Alias /docs/Linux_gazette/ /usr/doc/Linux_gazette/ Alias /docs/Howto/ /usr/doc/Howto/ Alias /docs/LDP/ /usr/doc/LDP/ Alias /docs/java-documentation/ /usr/doc/java-documentation/ Alias /docs/literature/ /usr/hobbies/literature/ Alias /icons/ /usr/local/httpd/icons/ Alias /java_applets/ /usr/src/java/compiled/
This file contains permissions for our sites directories. If. when you test your configuration by starting httpd and pointing your browser to (eg.) http://Hawklord, or http://localhost (both will work for the above example), you get a file access error you will need to alter this file. Each directory in your site should have its own entry.
By default Apache has a very restricted set of permissions for the root directory, I have found that changing to:
<Directory /> Options All Order allow,deny Allow from all Options FollowSymLinks </Directory>
solved some problems for me. It is important to realize that a directory inherits its permissions from its parent directory. So if you want to allow outside access to your site you need to take great care when setting up your directory permissions.
Once you are satisfied that you have correctly installed and configured Apache, you will want to test it! Log into your machine as root. At the prompt type:
#: httpd &
Now you can log into your machine as any user, start your favorite browser and enter the URL http://localhost. If all goes well you should load the Apache site file index.html. That is unless you moved the Apache documentation and provided your own index.html in /usr/local/httpd/htdocs
Once you ar satisfied that all is well, you will want to have httpd start at system boot time. Some Linux distributions, such as Red-Hat or S.u.S.E. will have a script to start Apache in their init.d directory. If this is the case then you just need to enable the script for sys V init in the normal manner.
As an alternative you can put the following line in your /etc/inittab
'ap' must be a unique identifier. '45' refers to the runlevels for which the command will be executed. Once is probably safer to use than 'respawn', since if there is a mistake in this line you will see a lot of error messages ;-(
The final part of the line '/bin/su --command=/usr/sbin/httpd', is intended to start up Apache as a process owned by wwwrun. It would be wise to test this command before you put it in your /etc/inittab.
If you have Apache running, and a large linux installation, then you might want to consider implementing a search engine. S.u.S.E. Linux provides htdig, in fact to gain full benefit from the S.u.S.E. Help System you need to use something like htdig. The only problem is the disk space you will need. I have a 1Gig partition devoted to documentation, this may seem a lot to many users! I have a lot of personal documentation, program documentation (increasingly this is HTML), all issues of Linux Gazette, Gimp documentation, java documentation etc. This takes about 500 Meg. The database htdig uses is between 200 - 300 Meg on my system. To update the database I need 200 - 300 Meg spare under /tmp. Actually when I update the database I change the location of /tmp since I do not have enough space on my root partition. Now since I have arranged all the documentation to be available to Apache, it is all referenced in htdig's database. If I have a question about any aspect of linux, or any of my personal subjects, all I have to do is formulate a suitable search pattern. I cannot adequately describe the savings in time this has given me! In the past I would have needed to access newsgroups to find answers to my problems. With htdig I can avoid this 99.9% of the time! Given the low cost of hard disk space, the fact that current program documentation is usually given as html, that most documentation of any kind is available as html, then it makes good sense to use Apache in conjunction with a search engine in order to have a most efficient information retrieval system.
Htdig may not be perfect, if you are used to Infoseek or lycos, it is a bit annoying because you cannot search for a phrase e.g. "starting the x server". Rather a document is searched for that contains all the words you enter. An advantage is that related words are searched for as well, e.g. if you search for 'god' you can also get results for 'gods' and 'godly'. Once you get used to htdig it becomes an indispensable tool. The time it saves you in looking for information is well worth the cost in terms of disk space. (on my system the real cost is about 250 Meg, though I need another temporary 250Meg when re-building the database).
Finally I shall mention Linux's SGML (Standard Generalized Markup Language) support, this is not normally concidered part of web page design since most home users will simpy want to be able to create their own HTML home pages and have no other use for such documents.
However, a great many people will want to produce documents in many formats. The same document might need to be available for publication as a book, or as an info page as well as being available as web pages. The linux documentation project contains many documents that are available in different formats according to users needs.
SGML allows a single source to be used to produce many different kinds of
text format. The following package descriptions are taken directly from
the S.u.S.E. 6.0 distribution, though they should all be available for
I have no real experience with SGML so I will leave the appraisal of these packages to the reader. For some people these will prove indespensible tools for
producing HTML pages.