Linux, Java and XML

Eoin Lane (eoinlane@esatclear.ie)

This article is a basic introduction to the new web markup language XML and the transformation language XSL. Here I show how the Apache web server can be configured, using the servlet engine JServ, to do server-side XML/XSL transformation with Apache's Cocoon servlet. Future updates for this article will be located at http://www.inconn.ie/article/cocoon.htm.
Introduction

The eXtensible Markup Language (XML) is a new web markup language (a W3C Recommendation since February 1998) and a powerful way of separating web content from style. A lot has been written about XML, but to use it effectively in web design you must understand the technologies behind it. To this end I have added my own two pence' worth to the already vast literature on the subject. This article is not a place to learn XML, nor is it a place where the capabilities of XML are explored to their fullest, but it is a place where the technologies behind XML can be put into practice immediately.

Before I go any further, I should recommend the two sites where definitive information on XML can be obtained. The first is the World Wide Web Consortium (W3C) site, http://www.w3.org/. The W3C is responsible for the XML specification. The second is the XML Frequently Asked Questions site (http://www.ucc.ie/xml/), which will answer most other questions. I also recommend the XML pages hosted by IBM, http://www.ibm.com/xml/, where you will find a wide range of excellent tutorials and articles on XML.

SGML (an ISO standard since 1986) is the mother of all mark-up languages. SGML can be used to document any conceivable system, from complex aeronautical design to ancient Chinese dialects. However, it is too complex and unwieldy for routine web applications. HTML is essentially a very simple application of SGML, originally designed with the scientific publishing community in mind. It is a simple mark-up language (it has been said that "anyone with a pulse can learn it"), and with the explosion of the web it is clear that the people with pulses have spoken. Since its foundation the web has grown in complexity and has long outgrown its humble beginnings in the scientific community.
Today web pages need to be dynamic, interactive, backed by databases, secure and eye-catching to compete in an ever more crowded cyberspace. Enter XML, a new mark-up language designed to deal with the complexities of modern web design. XML is often described as only 20 percent as complex as SGML while handling 80 percent of SGML situations (believe me, when you are talking about coding ancient Chinese dialects, 80 percent is plenty). In the following section I will briefly compare two markup examples, one in HTML and the other in XML, to demonstrate the benefits of the XML approach. In the final section I will show you how to set up an Apache web server to serve XML documents, so that you can start using XML in your web design immediately.
HTML

The following example is a very simple HTML document that everyone will be familiar with:

Two important points can be made about this document. First, the content and the style are tied together in the document. Second, it would be very difficult for a search program to search this document and extract Eoin Lane's mail address. XML addresses both of these issues.
XML

The XML equivalent is as follows:

The first thing to note is that this document, like every XML document, is well formed. To be well formed, every element must have a start tag and a matching end tag (or be written as a single empty-element tag such as <br/>), and elements must be properly nested. A program searching for the mail address then only has to locate the text between the opening and closing mail tags.

The second and crucial point is that this XML document contains just data. There is nothing in it that dictates how to display the author's name or his mail address. In practice it is easier to think about web design in terms of data and presentation separately. For medium to large web sites, where all the pages have the same look and only the data changes from page to page, this is clearly a better solution. It also allows a division of labour in which style and content are handled by two different departments working independently, and it makes it possible to present one set of data in several different ways.

An XML document can be presented in two ways. The first is to use a Cascading Style Sheet (CSS) (see http://www.w3.org/style/css/), which styles the XML elements directly. The second is to use a transformation language called XSL, which converts the XML document into HTML, XML, PDF, PostScript or LaTeX. As to which one to use, the W3C (the people responsible for these specifications) has this to say:

    Use CSS when you can, use XSL when you must.

They go on to say:

    The reason is that CSS is much easier to use, easier to learn, thus easier to maintain and cheaper. There are WYSIWYG editors for CSS and in general there are more tools for CSS than for XSL. But CSS's simplicity means it has its limitations. Some things you cannot do with CSS, or with CSS alone. Then you need XSL, or at least the transformation part of XSL. So what are the things you cannot do with CSS? In general everything that needs transformations. For example, if you have a list and want it displayed in lexicographical order, or if words have to be replaced by other words, or if empty elements have to be replaced by text. CSS can do some text generation, but only for generating small things, such as numbers of section headers.
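To make the searching point concrete, here is a small sketch, in Java to match the rest of this article, of a program that finds the mail address in an XML document. It assumes a modern JDK, since the javax.xml.parsers API it uses is not part of the 1.1.8 JDK installed later, and the element names author, name and mail are my own assumption rather than a quotation of the document above. Because the parser rejects documents that are not well formed, the program can simply ask for the text of the mail element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FindMail {

    // Returns the text between the first <mail> start tag and its end tag,
    // or null if the document has no <mail> element.
    public static String findMail(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() throws a SAXException if the document is not well formed
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList mails = doc.getElementsByTagName("mail");
        return mails.getLength() == 0 ? null : mails.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup: the element names author, name and mail are assumptions
        String xml = "<?xml version=\"1.0\"?>"
                + "<author><name>Eoin Lane</name>"
                + "<mail>eoinlane@esatclear.ie</mail></author>";
        System.out.println(findMail(xml)); // prints eoinlane@esatclear.ie
    }
}
```

Nothing in the program depends on how the address is displayed, which is exactly what the HTML version cannot offer.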
XSL

XSL (eXtensible Stylesheet Language, http://www.w3.org/style/xsl/) is the language used to transform and display XML documents. It is not yet finished, so beware! It is a complex document formatting language that is itself written in XML. It can be further subdivided into two parts: transformation (XSLT) and formatting objects (sometimes referred to as FO, XSL-FO or simply XSL). For the sake of simplicity I will only deal with XSLT here.

XSL Transformations (XSLT)

On 16 November 1999 the World Wide Web Consortium published XSLT as a W3C Recommendation. This basically means that XSLT is stable and can be relied upon. The above XML document can be transformed into an HTML document, and subsequently displayed on any browser, using the following XSLT file. To learn more about XSLT, I recommend the XSLINFO site (http://www.xslinfo.com/) as a good starting point. I also found the revised Chapter 14 of the XML Bible (http://metalab.unc.edu/xml/books/bible/updates/14.html) to be very good; this revision is based on the specifications that eventually became the Recommendation.

With the arrival of the next generation of browsers, i.e. Netscape 5 (currently under construction, see http://www.mozilla.org/), this transformation will be done client side: when an XML file is requested, the corresponding XSL file will be sent along with it, and the browser will do the transformation. Until then, because many browsers can only display HTML, the transformation must be done server side. This can be accomplished using Java servlets (server-side Java programs). The Cocoon servlet is such a servlet, written by some very clever people at Apache (http://www.apache.org/). It basically takes an XML document and transforms it using an XSL document.
An example of such a transformation would be to convert the XML document into HTML so that the browser can display it. So if your web server is configured to run servlets and you include the Cocoon servlet, you can start designing your web pages in XML. The rest of this article shows exactly how to do this.
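The following is a minimal sketch of the kind of transformation Cocoon performs, written against the standard JAXP transformation API (javax.xml.transform) of a modern JDK rather than against Cocoon itself. The stylesheet and the authors/author/name/mail structure are hypothetical. It also shows one of the things the W3C says CSS cannot do: xsl:sort lists the authors in lexicographical order of their names before turning them into an HTML list.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml {

    // Hypothetical XSLT 1.0 stylesheet: turns <authors><author><name/><mail/></author>...</authors>
    // into an HTML list, sorted by name.
    public static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/authors\">"
        + "<html><body><ul>"
        + "<xsl:for-each select=\"author\">"
        + "<xsl:sort select=\"name\"/>"
        + "<li><xsl:value-of select=\"name\"/>: <xsl:value-of select=\"mail\"/></li>"
        + "</xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML document and returns the serialized result.
    public static String transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<authors>"
                + "<author><name>Eoin Lane</name><mail>eoinlane@esatclear.ie</mail></author>"
                + "<author><name>Ann Other</name><mail>ann@example.org</mail></author>"
                + "</authors>";
        System.out.println(transform(xml, XSL));
    }
}
```

Cocoon does this work on the server for each request, so the browser only ever sees the resulting HTML.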
How do I do it?

I have tested the following instructions on a fresh installation of Red Hat 6.0, so I know they work.

Apache Web Server

First set up the Apache web server. On Red Hat it comes preinstalled, but I want you to blow it away using:

    rpm -e --nodeps apache

and do not worry about the error messages. Next, get hold of the most recent Apache (http://www.apache.org/), currently version 1.3.9, and copy it somewhere handy. I put mine in /usr/local/src. Unpack the file using:

    tar zxvf apache_1.3.9.tar.gz

This will expand the distribution into the directory /usr/local/src/apache_1.3.9. Change into this directory, then configure, build and install Apache using the following:

    ./configure --prefix=/usr/local/apache --mandir=/usr/local/man --enable-shared=max
    make
    make install

This installs Apache into the directory /usr/local/apache. The important file to note here is httpd.conf, which can be found in the directory /usr/local/apache/conf. This file contains most of the information necessary to run Apache correctly, including where to serve the web documents from, virtual web servers and folder aliases. We will be returning to this file shortly, so become familiar with its general layout. At this stage I had to reboot Linux and then start Apache using:

    /usr/local/apache/bin/apachectl start

To test it, point your web browser to http://localhost/ and you're in business, hopefully! For good web design and planning I would refer you to an article that I found invaluable in setting up my own web site: Better Web Site Design under Linux (http://www.linuxgazette.com/issue43/gibbs/Web_Design.html).

Java and JSDK

As of October, IBM has released the Java Development Kit 1.1.8 for Linux. IBM claims it is faster than the corresponding JDKs from Blackdown (http://www.blackdown.org/) and Sun.
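If you prefer to check the server from a program rather than a browser, the sketch below asks a URL for its HTTP status code using java.net.HttpURLConnection. It needs a JDK newer than 1.1.8, and the ServerCheck class and its statusOf method are my own hypothetical names; it reports 200 when Apache serves the default page and -1 when no HTTP response could be obtained.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ServerCheck {

    // Returns the HTTP status code served at the address (200 means OK),
    // or -1 if no HTTP response could be obtained, e.g. because Apache is not running.
    public static int statusOf(String address) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(2000); // milliseconds
            conn.setReadTimeout(2000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "http://localhost/";
        int status = statusOf(address);
        if (status == 200) {
            System.out.println("Server is up at " + address);
        } else {
            System.out.println("No 200 response from " + address + " (status " + status + ")");
        }
    }
}
```

Run it as java ServerCheck, or pass another address, such as http://localhost/example/Hello, to check the JServ installation later on.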
Download the IBM JDK (see http://www.ibm.com/java/). Again, unpack it, this time into the /usr/local/src/jdk118 directory. Next, download JavaSoft's JSDK 2.0 (http://java.sun.com/products/servlet/), the Solaris version (not JSDK 2.1 or any other flavour you might be tempted to get), and unpack it; again I put it in /usr/local/src/JSDK2.0. Add the following, or its equivalent, to /etc/profile to make them available to your system:

    JAVA_HOME="/usr/local/src/jdk118"
    JSDK_HOME="/usr/local/src/JSDK2.0"
    CLASSPATH="$JAVA_HOME/lib/classes.zip:$JSDK_HOME/lib/jsdk.jar"
    PATH="$JAVA_HOME/bin:$JSDK_HOME/bin:$PATH"
    export PATH CLASSPATH JAVA_HOME JSDK_HOME

To test them, run:

    java -version

at the command prompt, and you should get back the following message:

    java version "1.1.8"

To test the servlet development kit, run:

    servletrunner

and if all goes well you should get back the following:

    servletrunner starting with settings:
      port = 8080
      backlog = 50
      max handlers = 100
      timeout = 5000
      servlet dir = ./examples
      document dir = ./examples
      servlet propfile = ./examples/servlet.properties

We are now ready to install Apache's servlet engine, ApacheJServ.

ApacheJServ

Again, download the latest ApacheJServ (version 1.0 at this time, although version 1.1 is in its final beta stage) from Apache's Java site (http://java.apache.org/) and expand it into /usr/local/src/ApacheJServ-1.0/.
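As a programmatic companion to java -version, the following sketch reports which of the entries set in /etc/profile are missing from the CLASSPATH environment variable. It is written for a newer JDK (System.getenv and generics are not available in 1.1.8), so the java.version it prints is that of the JDK running the check, and the ClasspathCheck class is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathCheck {

    // Returns the required entries that do not appear in the classpath string.
    // Entries are separated by File.pathSeparator (":" on Linux); a null classpath counts as empty.
    public static List<String> missing(String classpath, String... required) {
        List<String> entries = classpath == null
                ? new ArrayList<String>()
                : Arrays.asList(classpath.split(File.pathSeparator));
        List<String> absent = new ArrayList<>();
        for (String entry : required) {
            if (!entries.contains(entry)) {
                absent.add(entry);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        String javaHome = System.getenv("JAVA_HOME");
        String jsdkHome = System.getenv("JSDK_HOME");
        // The two entries that the /etc/profile lines put on the CLASSPATH
        List<String> absent = missing(System.getenv("CLASSPATH"),
                javaHome + "/lib/classes.zip",
                jsdkHome + "/lib/jsdk.jar");
        System.out.println(absent.isEmpty()
                ? "CLASSPATH contains both entries"
                : "Missing from CLASSPATH: " + absent);
    }
}
```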
Configure, build and install it using the following:

    ./configure --with-apache-install=/usr/local/apache --with-jsdk=/usr/local/src/JSDK2.0
    make
    make install

When this has completed successfully, add the following line to the end of the httpd.conf file that I referred to earlier during the Apache web server installation:

    Include /usr/local/src/ApacheJServ-1.0/example/jserv.conf

and restart the web server using:

    /usr/local/apache/bin/apachectl restart

Now comes the moment of truth: point your web browser to http://localhost/example/Hello, and if you get back the following two lines:

    Example Apache JServ Servlet
    Congratulations, Apache JServ is working!

then you are almost home.

Cocoon

Finally, download the latest version of Cocoon (version 1.5 at this time) from Apache's Java site (http://java.apache.org/). Cocoon is distributed as a Java jar file and can be extracted using the jar command. First create the directory /usr/local/src/cocoon, then change into it and expand the Cocoon jar file there (here I assume you downloaded the jar into /usr/local/src, like the other packages):

    mkdir /usr/local/src/cocoon
    cd /usr/local/src/cocoon
    jar -xvf ../Cocoon_1.5.jar

Now comes the tricky part: configuring the JServ engine to recognise files with a .xml extension and to have the Cocoon servlet process and serve them. Locate the file jserv.properties, which you will find in the directory /usr/local/src/ApacheJServ-1.0/example/, and at the end of the section that begins

    # CLASSPATH environment value passed to the JVM

add the following:

    wrapper.classpath=/usr/local/src/cocoon/bin/xxx.jar

In the case of Cocoon 1.5 this means adding the following three lines:

    wrapper.classpath=/usr/local/src/cocoon/bin/fop.0110.jar
    wrapper.classpath=/usr/local/src/cocoon/bin/openxml.106-fix.jar
    wrapper.classpath=/usr/local/src/cocoon/bin/xslp.19991017-fix.jar

These file names will change with different versions.
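Because the jar names in Cocoon's bin directory change from version to version, it can help to generate the wrapper.classpath lines rather than type them. The sketch below, for a modern JDK, lists a directory and prints one line per jar, skipping Cocoon.jar, which the article loads through the repositories setting of example.properties instead. The CocoonClasspath class name and the alphabetical ordering are my own choices.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CocoonClasspath {

    // Builds one wrapper.classpath line per jar in alphabetical order, skipping
    // Cocoon.jar itself and any file that is not a jar.
    public static List<String> wrapperLines(String binDir, String[] fileNames) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted); // predictable order for the properties file
        List<String> lines = new ArrayList<>();
        for (String name : sorted) {
            if (name.endsWith(".jar") && !name.equals("Cocoon.jar")) {
                lines.add("wrapper.classpath=" + binDir + "/" + name);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String binDir = args.length > 0 ? args[0] : "/usr/local/src/cocoon/bin";
        String[] names = new File(binDir).list(); // null if the directory does not exist
        if (names == null) {
            System.err.println("No such directory: " + binDir);
            return;
        }
        for (String line : wrapperLines(binDir, names)) {
            System.out.println(line); // paste these into jserv.properties
        }
    }
}
```

Given the Cocoon 1.5 file names, this prints the same three lines listed above.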
The next file to locate is example.properties, again found in the /usr/local/src/ApacheJServ-1.0/example/ directory, where /usr/local/src/cocoon/bin/Cocoon.jar must be added to the repositories setting. In my example.properties file this meant changing the line

    repositories=/usr/local/src/ApacheJServ-1.0/example

to the following:

    repositories=/usr/local/src/ApacheJServ-1.0/example,/usr/local/src/cocoon/bin/Cocoon.jar

Also add the following line to the end of the example.properties file:

    servlet.org.apache.cocoon.Cocoon.initArgs=properties=/usr/local/src/cocoon/bin/cocoon.properties

The JServ engine is now properly configured, and all that is left to do is to tell Apache to direct any request for an XML file (or any other file you want Cocoon to process) to the Cocoon servlet. For this we need the JServ configuration file, jserv.conf, mentioned earlier (again in the same directory). Include the following line:

    ApJServAction .xml /example/org.apache.cocoon.Cocoon

In order to access the Cocoon documentation and examples, add the following lines to the alias section of your httpd.conf file:

    Alias /xml/ "/usr/local/src/cocoon/"
    Alias /xml/example/ "/usr/local/src/cocoon/example/"

Restart the web server for this to take effect:

    /usr/local/apache/bin/apachectl restart

Now point your browser to http://localhost/xml/ to browse the documentation and to http://localhost/xml/example/ to try out the examples. If Cocoon complains about exceeding a memory limit, open the file cocoon.properties found in the /usr/local/src/cocoon/bin/ directory, find the line

    store.memory = 150000

and change it to something lower, like 15000. To try out the PDF examples, which I think are very cool, you need Acrobat Reader installed as a Netscape plug-in, but it is worth the extra effort to get this working.
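Editing these properties files by hand is easy, but if you script the installation, note that java.util.Properties is a poor fit: store() discards comments and reorders entries, and it cannot represent a key that appears several times, as wrapper.classpath does in jserv.properties. The sketch below instead edits key=value lines in place, leaving comments and blank lines untouched. PropertyEdit is a hypothetical helper for a modern JDK; it only understands the key=value form used in this article (not the ':' or whitespace separators that Java properties files also allow) and is meant for keys that appear once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PropertyEdit {

    // Key of a "key=value" line, or null for comments, blank lines and anything else.
    private static String keyOf(String line) {
        String trimmed = line.trim();
        int sep = trimmed.indexOf('=');
        if (trimmed.startsWith("#") || sep <= 0) {
            return null;
        }
        return trimmed.substring(0, sep).trim();
    }

    // Value of the first line whose key matches, or null if the key is not set.
    public static String get(List<String> lines, String key) {
        for (String line : lines) {
            if (key.equals(keyOf(line))) {
                String trimmed = line.trim();
                return trimmed.substring(trimmed.indexOf('=') + 1).trim();
            }
        }
        return null;
    }

    // Copy of the lines with every "key=..." line rewritten as "key=value";
    // appends "key=value" at the end if the key is absent.
    public static List<String> set(List<String> lines, String key, String value) {
        List<String> out = new ArrayList<>();
        boolean found = false;
        for (String line : lines) {
            if (key.equals(keyOf(line))) {
                out.add(key + "=" + value);
                found = true;
            } else {
                out.add(line);
            }
        }
        if (!found) {
            out.add(key + "=" + value);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java PropertyEdit <file> <key> <value>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Properties files are traditionally ISO-8859-1
        List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
        Files.write(file, set(lines, args[1], args[2]), StandardCharsets.ISO_8859_1);
    }
}
```

For example, java PropertyEdit /usr/local/src/cocoon/bin/cocoon.properties store.memory 15000 lowers the memory limit, and calling set(lines, "repositories", get(lines, "repositories") + ",/usr/local/src/cocoon/bin/Cocoon.jar") makes the repositories change to example.properties.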
Cocoon 2

The Cocoon 1.x series has basically been a work in progress. What started out as a simple servlet for static XSL transformation has grown into something much more. With this ongoing development, design decisions taken at the beginning of the project are now hampering further development as the scale and scope of the project become apparent. To add to this, XSL is also a work in progress, although the current version of XSLT became a W3C Recommendation on 16 November 1999. Cocoon 2 intends to address these issues and provide a servlet for XML transformations that scales to handle large quantities of web traffic. Web design of medium to large sites will in future be based entirely around XML, as its benefits become apparent, and the Cocoon 2 servlet will hopefully provide us with a way to use it effectively.
Conclusions

Even as I have been writing this article, Apache has opened a new site dedicated exclusively to XML (see http://xml.apache.org/). The Cocoon project has obviously grown beyond all expectations, and with the coming of Cocoon 2 it should become a commercially viable servlet that makes designing web sites in XML a reality. The people at Apache deserve a lot of credit for this, so write to them and thank them, join the mailing list and generally lend your support. After all, this is open source code, and this is what Linux is all about.