Displaying XML Documents
by Michel Rodriguez
Boardwatch Magazine
XML is the way to go: It allows creation, storage, processing and exchange of documents in a much more powerful way than HTML. It makes it easy to handle documents that include "live" information that can be easily updated, retrieved and processed for different purposes.
One thing HTML is better at, though, is displaying that information in a Web browser. HTML is a display-oriented markup: a Web browser knows how it should process each of the tags it includes. With XML the browser has no such knowledge. Chances are it doesn't even know what the tags are, much less how to display them.
So this column will explain the options available to display XML documents and give some examples of what some of the most common ones look like.
The Easy Way
The easiest way to display these documents is to create XML documents based on the XHTML DTD and to let the browser display them as HTML.
XHTML, which can be found at www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, is "a reformulation of HTML 4 as an XML application". Simply put, it defines a way to write HTML that is also XML.
The rules are quite simple, and basically come down to:
- All tags should be properly opened and closed,
- Tags should all be written in lower case,
- Empty tags should be written in an XML style that still displays properly in browsers, with a trailing space and slash before the closing bracket (e.g. <br />).
This ensures that your documents can both be displayed in a Web browser and still be processed by XML tools.
What's the point? An XHTML document has just the same information content as an HTML one, so why bother?
The point is that new tags can now be added to those XHTML documents, identifying specific items. Take a <price> tag, or a <rating> tag for example. These tags will be ignored by the browser, so the document will display just fine, but they can be used by XML tools for any processing, such as updating the value from a database and extracting it to include in a report.
Admittedly, this easy approach is severely limited in the sense that any additional tag will not be formatted in the browser, so the <rating> element will be in the same font as the rest of the paragraph in which it is included. Better formatting can be achieved either by inserting the appropriate HTML tags in the document or by using one of the other methods discussed below.
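To give an idea of what "processing by XML tools" can mean here, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The XHTML fragment, the product and the new price are invented for the example; a real application would presumably take the value from a database.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment with two extra, non-HTML tags.
XHTML = (
    '<p>The <b>Widget</b> costs <price>19.95</price> '
    'and is rated <rating>4 stars</rating>.</p>'
)

def update_price(source, new_price):
    """Return the fragment with the text of every <price> element replaced."""
    root = ET.fromstring(source)
    for price in root.iter('price'):
        price.text = new_price
    return ET.tostring(root, encoding='unicode')

print(update_price(XHTML, '17.50'))
```

The rest of the markup is left untouched, so the updated document can still be served to a browser as before.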
Client-side formatting
There are two main style sheet languages for XML: Cascading Style Sheets (CSS), which comes from HTML, and the Extensible Stylesheet Language (XSL), created specifically for XML.
CSS
While CSS1 is a style sheet language used for HTML documents, CSS2, which can be found at www.w3.org/TR/REC-CSS2/, can also be used for XML. The major browsers on the market (Netscape, Internet Explorer and Opera) offer support for CSS.
Linking an XML document to a CSS2 style sheet is done by adding the following tag (actually called a processing instruction, note the <? ... ?> syntax):
<?xml-stylesheet type="text/css" href="css_style.css"?>
A CSS2 style sheet includes instructions for displaying tags found in the document in the form selector { properties;} such as this example:
my_para { display: block; }
rating { display: inline; font-weight: bold; }
XSL
XSL is a W3C recommendation defining a style language for XML documents. To link a document to an XSL style sheet the following processing instruction should be added:
<?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
An XSL style sheet is built from blocks of the form
<xsl:template match="pattern"> <!-- formatting objects --> </xsl:template>
and looks like this:
<xsl:template match="doc">
  <html>
    <head><title>Document</title></head>
    <body><xsl:apply-templates/></body>
  </html>
</xsl:template>
<xsl:template match="p/rating">
  <b><xsl:apply-templates/></b>
</xsl:template>
XSL is more complex and more powerful than CSS. For example, it can be used to create a table of contents, or even to generate several HTML documents from a single XML file. A companion standard, XSLT (XSL Transformations), allows even more complex transformations, such as extracting data from a database and integrating it into the file or sorting a table on various criteria.
XSL's most severe limitation is that, at the moment, it is supported by only one browser, Microsoft's Internet Explorer. But don't give up on it yet. There might not be much support for XSL in browsers right now, but there is certainly a plethora of server-side implementations.
At the moment, client-side formatting can be achieved only on a browser-specific basis. Furthermore, most "old" browsers do not allow it at all. So although it can certainly be implemented in an intranet using only one browser, running on only one platform (does such an environment really exist?), for general Web delivery the browsers in use cannot yet support it.
Server-side formatting
Server-side formatting is based on a simple idea: If the server cannot rely on the client browsers' style sheet implementation, or even on their implementing style sheets at all, then the best answer is to convert the XML to HTML and serve HTML files.
A number of tools allow server-side conversion from XML to HTML:
- XSL (and XSLT) processors are available in all of the major languages: Java, Perl, Python and C,
- DOM (document object model) implementations can also be used to process XML, using a language-independent (but very Java-ish) API.
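Deleting the <rating> elements with the Perl version of the DOM:
# $doc is an already-parsed DOM document
my $ratings= $doc->getElementsByTagName( 'rating');
my $n= $ratings->getLength;
for( my $i=0; $i < $n; $i++)
  { my $rating= $ratings->item( $i);       # the element to delete
    my $parent= $rating->getParentNode;    # removal goes through its parent
    $parent->removeChild( $rating);
  }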
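Because the DOM API is language-independent, the same operation looks much the same elsewhere. As an illustration only, here is a sketch using xml.dom.minidom from the Python standard library; the sample document is invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for this example.
XML = '<doc><p>A fine album <rating>4/5</rating>, well worth it.</p></doc>'

def remove_ratings(source):
    """Return the document with every <rating> element removed."""
    doc = parseString(source)
    # Copy the node list first, in case the DOM implementation returns
    # a live list that shrinks as nodes are removed.
    for rating in list(doc.getElementsByTagName('rating')):
        rating.parentNode.removeChild(rating)
    return doc.documentElement.toxml()

print(remove_ratings(XML))
```

The method names (getElementsByTagName, removeChild) come from the DOM specification itself, which is why they read the same in both languages.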
Any number of custom Perl, Python, Java or even Omnimark programs can be used to generate HTML from XML.
Server-side formatting is probably the most practical way to display XML today. It doesn't even have to be complex: just changing the names of document-specific tags such as <price> to HTML ones, say <b>, can easily be done with most methods.
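A simple renaming of this kind might look like the following sketch, written with Python's standard xml.etree.ElementTree module. The tag mapping and the sample paragraph are assumptions made up for the example, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from document-specific tags to HTML tags.
TAG_MAP = {'price': 'b', 'rating': 'i'}

def to_html(source, tag_map=TAG_MAP):
    """Rename document-specific elements so a browser can format them."""
    root = ET.fromstring(source)
    for elt in root.iter():
        # Tags without an entry in the mapping are kept as they are.
        elt.tag = tag_map.get(elt.tag, elt.tag)
    return ET.tostring(root, encoding='unicode', method='html')

print(to_html('<p>Only <price>19.95</price> today, rated <rating>4/5</rating>.</p>'))
```

The XML source keeps its <price> and <rating> tags; only the copy sent to the browser is converted.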
Cocoon and AxKit
The most powerful tools for serving XML documents are both based on Apache: Cocoon (http://xml.apache.org/cocoon/), an Apache project developed in Java, and AxKit (http://xml.sergeant.org/axkit/), a Perl environment to be used with mod_perl.
Those tools create an XML content delivery infrastructure: XML documents can be processed by one or more XSLT transformations, then XSL style sheets are applied to them. Which transformation and style sheets should be applied to each document can be decided at the document level, at the document class level or depending on the target (HTML, WML ...).
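The idea of choosing a style sheet at several levels can be pictured with a small lookup, sketched here in Python. This is not how Cocoon or AxKit are configured: the tables, the file names and the order of precedence (document first, then document class, then target) are all assumptions made for the illustration.

```python
# Hypothetical configuration: all paths, classes and file names are invented.
BY_DOCUMENT = {('news/today.xml', 'html'): 'today_special.xsl'}
BY_CLASS = {('article', 'html'): 'article_html.xsl',
            ('article', 'wml'): 'article_wml.xsl'}
BY_TARGET = {'html': 'default_html.xsl', 'wml': 'default_wml.xsl'}

def choose_style_sheet(path, doc_class, target):
    """Pick the most specific style sheet: document, then class, then target."""
    for table, key in ((BY_DOCUMENT, (path, target)),
                       (BY_CLASS, (doc_class, target)),
                       (BY_TARGET, target)):
        if key in table:
            return table[key]
    raise LookupError('no style sheet for target %r' % target)

print(choose_style_sheet('reviews/cd.xml', 'article', 'wml'))
```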
Best of all, they handle caching for the server, only reformatting documents that have changed and serving the already-created HTML for the others.
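The caching principle itself is easy to sketch. The Python fragment below compares file modification times and regenerates the HTML only when the XML source is newer; it is a simplified illustration under that assumption, not a description of how Cocoon or AxKit actually implement their caches. The convert argument stands for any hypothetical function that turns XML text into HTML text.

```python
import os

def is_stale(xml_path, html_path):
    """True if the HTML is missing or older than its XML source."""
    if not os.path.exists(html_path):
        return True
    return os.path.getmtime(xml_path) > os.path.getmtime(html_path)

def serve(xml_path, html_path, convert):
    """Return HTML for xml_path, calling convert() only when needed.

    convert is a hypothetical function taking XML text and returning HTML text.
    """
    if is_stale(xml_path, html_path):
        with open(xml_path, encoding='utf-8') as src:
            html = convert(src.read())
        with open(html_path, 'w', encoding='utf-8') as out:
            out.write(html)
        return html
    # The cached copy is still current: serve it without reformatting.
    with open(html_path, encoding='utf-8') as cached:
        return cached.read()
```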
Cocoon uses a dedicated language called XSP to embed logic in XML documents and to call XSL style sheets. AxKit, developed by Matt Sergeant, is written in Perl and tightly integrates with mod_perl and Apache. It is highly configurable: although it comes with two XSLT engines, other plug-ins can be added.
AxKit and Cocoon are powerful tools that allow the creation and maintenance of whole XML-based Web sites. Although they are still very young, they are definitely worth looking at, if only to get ideas on what the process flow for such a Web site should be.
Conclusion
This brief survey of the different ways to display XML documents shows that the range of options is pretty wide, from the "do-nothing" one to those using a full-featured publishing framework. The key is that whatever system is used it can later be replaced by another, perhaps more powerful, one that will still use the same XML files.
One last piece of information that can be of interest: Everything described in this article can be implemented using Open Source tools.
Note: this article was published in 2000 in Boardwatch magazine. More recent articles about XML and especially Perl & XML can be found on www.xmltwig.com