Article Summary: This article demonstrates a simple way to separate content from presentation, using PHP’s SAX-based XML processing functions to read a categorized list of links stored in an XML file and display it as HTML.
PHP APIs used: file, implode, xml_parser_create, xml_parser_free,
xml_parser_set_option, xml_set_element_handler, xml_parse
A little while back I wanted to add a new feature to my site’s home page that displayed a list of links to useful sites categorized by topic. It seemed like a simple enough job: just put together some <ul>, <li>, and <a> tags and I would be done.
The more I thought about it, however, the more I wanted to be able to accomplish with this list of links. First, I wanted to minimize the amount of editing I would need to do in order to update the list. Second, I wanted to keep the content of the list separate from its presentation, so that if I ever changed my mind about how it should look, I wouldn’t have to change a lot of repetitive HTML and CSS code. Finally, I wanted to make the list accessible from more places than just the HTML version of the site’s home page.
The solution I ended up with was to use XML to store the link data, and to use PHP’s support for SAX, the Simple API for XML, to process it and output the result. In this article I’ll review how the solution works and discuss some of the decisions made along the way.
It didn’t take long to figure out that the right solution was to use XML to store the link data. However, the decision involved more than just choosing to use XML as the storage medium — choosing how to process the XML information and display it in the browser was just as important. In this case, I decided to use the SAX method to process the XML data.
PHP4 provides two main APIs for processing XML: the SAX and DOM interfaces. Each has its own advantages and disadvantages in particular situations. The DOM is the preferred method to use when the XML document must be modified in place or when you need to keep the document around in memory to do more advanced processing of the XML data, but is more complicated to use than the SAX method. The SAX API is better when you only need to run through the XML file once and process each XML tag individually, but does not provide a way to edit the contents of the document. For my purpose, since I only needed to run through the document once and output the results, SAX was the way to go.
The SAX model of XML processing works by sequentially running through the entire XML file from beginning to end and calling event handler functions for each type of element that is encountered in the XML file. Your code tells the SAX parser what types of XML elements it is interested in, such as tags, character data, entities, etc., and then defines functions that will be called when the parser encounters that type of element in the XML file. You then register these functions with the XML parser, and give it some XML to parse.
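As a rough illustration of that registration step, the sketch below registers handlers for element tags and for character data, then hands the parser a small XML string. The function name collectEvents and the sample XML are made up for this example, and the anonymous functions it uses require PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not part of the article's code): records the element
// names and the character data that the parser reports for an XML string.
function collectEvents($xml)
{
    $elements = array();
    $text = '';

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Handlers for opening and closing tags.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$elements) {
            $elements[] = $name;
        },
        function ($parser, $name) {
            // nothing to do when a tag closes in this sketch
        }
    );

    // Handler for character data. The parser may deliver one run of text
    // in several pieces, so the handler appends rather than assigns.
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data;
        }
    );

    xml_parse($parser, $xml, true); // true: this is the last chunk of data
    return array('elements' => $elements, 'text' => $text);
}

print_r(collectEvents('<title>Hello &amp; welcome</title>'));
```

Only the handlers you register are ever called, so anything you do not register a handler for is simply skipped.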
While simple (that is, after all, what the "S" in SAX stands for), some of the major downsides of this method should be readily apparent. First, you can’t re-visit a part of the XML file that has already been processed. If you need to go back again, you have to start the processing all over from the beginning. Second, you can’t modify the XML document that is being processed. Third, the SAX parser doesn’t keep track of context for you. For example, you can’t ask the parser whether the tag you’re about to process is inside of another tag, or whether some other tag has already been processed. That information is long gone by the time your handler gets called, so if you need to keep track of things like that, you have to do it yourself.
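One common way to keep track of context yourself is to maintain a stack of the currently open tags. The sketch below is only an illustration of that idea: elementPaths is a hypothetical helper name, the sample XML is invented, and anonymous functions need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns the path of open tags for every element,
// using a stack that the handlers maintain themselves.
function elementPaths($xml)
{
    $stack = array(); // names of the tags that are currently open
    $paths = array();

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$stack, &$paths) {
            $stack[] = $name;                // entering a tag: push it
            $paths[] = implode('/', $stack); // the stack is the context
        },
        function ($parser, $name) use (&$stack) {
            array_pop($stack);               // leaving a tag: pop it
        }
    );
    xml_parse($parser, $xml, true);
    return $paths;
}

print_r(elementPaths('<a><b/><c><d/></c></a>'));
```

Inside the start handler, the last entry on the stack before the push is the parent tag, which is exactly the information the parser will not give you.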
Still, the SAX method is highly efficient when all you need to do is examine the contents of the XML file and things like order of processing aren’t very important. Since that describes the current need for processing this links file, SAX fits the bill rather well.
The format of the XML file that contains the links is fairly straightforward. The <links> tag is the root element of the XML file, which contains a series of <category> tags. Each category tag contains one or more <link> tags, each of which defines a single link. The url attribute contains the URL for the link, and the desc attribute contains the link’s description:
<?xml version="1.0" encoding="iso-8859-1"?>
<links>
  <category desc="XAML Related">
    <link url="http://longhorn.msdn.microsoft.com" desc="Longhorn SDK Home"/>
    <link url="http://longhornblogs.com" desc="Longhorn Blogs"/>
    <link url="http://www.xamlon.com" desc="Xamlon.com"/>
    <link url="http://www.zaml.com" desc="Zaml.com"/>
    <link url="http://www.xaml.net" desc="XAML.net"/>
  </category>
  <!-- more categories would follow here -->
</links>
When processed, each of these links will be displayed to the user as a clickable hyperlink on the web page. The url attribute becomes the destination of an <a> tag, and the desc attribute becomes the <a> tag’s text content.
Processing the links file consists mainly of converting each of the XML tags into a corresponding HTML tag with the right content. Since we’re only interested in the content of element tags, the PHP code only needs to define handlers for elements.
The complete code to do the job is shown here:
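<?php
// Called for each opening tag; $attribs maps attribute names to values.
function startElemHandler($parser, $name, $attribs)
{
    if (strcasecmp($name, "links") == 0) {
        echo "<div id='linksList'>\n";
    }
    if (strcasecmp($name, "category") == 0) {
        // escape attribute values before writing them into the HTML
        $desc = htmlspecialchars($attribs["desc"], ENT_QUOTES);
        echo "<p>$desc</p>\n<ul>\n";
    }
    if (strcasecmp($name, "link") == 0) {
        $linkRef = htmlspecialchars($attribs["url"], ENT_QUOTES);
        // desc is optional; fall back to the URL when it is missing or empty
        $desc = isset($attribs["desc"]) ? htmlspecialchars($attribs["desc"], ENT_QUOTES) : "";
        if ($desc == "")
            echo "\t<li><a href='$linkRef' target='_blank'>$linkRef</a></li>\n";
        else
            echo "\t<li><a href='$linkRef' target='_blank'>$desc</a></li>\n";
    }
}

// Called for each closing tag; closes the HTML opened by startElemHandler.
function endElemHandler($parser, $name)
{
    if (strcasecmp($name, "links") == 0) {
        echo "</div>\n";
    }
    if (strcasecmp($name, "category") == 0) {
        echo "</ul>\n";
    }
}

/* create the parser and register the element handlers by name */
$parser = xml_parser_create();
xml_set_element_handler($parser, "startElemHandler", "endElemHandler");
/* keep tag and attribute names exactly as written in the file */
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
// read the contents of the links file
$strXML = implode("", file('links.xml'));
// output each link; the whole document is passed in one call, so mark it final
xml_parse($parser, $strXML, true);
// clean up - we're done
xml_parser_free($parser);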
The first two functions, startElemHandler and endElemHandler, are the functions that will be called by the XML parser to handle the beginning and ending of the XML tags that it encounters during parsing. I’ll get back to those in a moment.
The XML parser itself is created by the call to xml_parser_create(). This returns a parser handle that you pass to the other XML parser functions. After the parser is created, the call to xml_set_element_handler() supplies the two element handler functions to the parser.
The call to xml_parser_set_option() turns off the case folding option. Case folding means converting names to upper case before they are passed to your handlers. This option is on by default in the parser, so the code turns it off. If your element processing code doesn’t care about the case of the tag names that it handles, you can just leave it on, but bear in mind that folding is applied to attribute names as well, so the handlers would then have to look up "DESC" and "URL" rather than "desc" and "url".
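To see what case folding does in practice, the following sketch parses the same tag with the option on and off and records the names the start handler receives. The helper name namesSeen is hypothetical and exists only for this demonstration.

```php
<?php
// Hypothetical helper: returns "tagName:attributeNames" for each opening
// tag, exactly as the start handler receives them.
function namesSeen($xml, $caseFolding)
{
    $seen = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$seen) {
            $seen[] = $name . ':' . implode(',', array_keys($attribs));
        },
        function ($parser, $name) {
            // closing tags are not interesting here
        }
    );
    xml_parse($parser, $xml, true);
    return $seen;
}

$xml = '<link url="http://www.xaml.net"/>';
print_r(namesSeen($xml, true));  // folding on, the parser's default
print_r(namesSeen($xml, false)); // folding off, as in the article's code
```

With folding on, both the tag name and the attribute key arrive in upper case, which is why the article's handlers, which read $attribs["url"] and $attribs["desc"], rely on the option being turned off.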
The XML file is then loaded into the $strXML variable: the file() function reads the links file into an array of lines, and implode() concatenates those lines into a single string. This string is then passed to the xml_parse() function, which starts the parsing process and results in the handler functions being called.
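The listing above does not check whether parsing succeeded. A minimal sketch of how that check could look is shown here; parseOrExplain is a hypothetical helper name, and the exact wording of the error message depends on the XML library PHP was built with.

```php
<?php
// Hypothetical helper: returns null when the XML parses cleanly,
// or a short description of the first error otherwise.
function parseOrExplain($xml)
{
    $parser = xml_parser_create();

    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }

    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// The <category> tag is never closed, so this document is not well-formed.
$problem = parseOrExplain('<links><category></links>');
if ($problem !== null) {
    echo $problem, "\n";
}
```

Because SAX handlers run as the document is read, any output produced before the error point has already been sent by the time the failure is detected.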
The XML parser will call the element handler functions at two distinct points: once when the opening tag has been processed, and again when the matching closing tag is processed. The handler for the opening tag is called when the ">" character that ends the tag is reached. At this point, the parser knows the name of the tag, and has also collected all of the attributes and their values into an array. Both of these are sent to the handler function as arguments.
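The sequence of calls can be made visible by recording each one. In this sketch, traceEvents is a hypothetical helper and the XML fragment is a cut-down version of the links format. Note that an empty tag such as <link .../> triggers both the start and the end handler.

```php
<?php
// Hypothetical helper: records every handler call together with the
// arguments the parser supplied.
function traceEvents($xml)
{
    $events = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$events) {
            // $attribs is an associative array: attribute name => value
            $events[] = array('start', $name, $attribs);
        },
        function ($parser, $name) use (&$events) {
            // the end handler receives the tag name only
            $events[] = array('end', $name);
        }
    );
    xml_parse($parser, $xml, true);
    return $events;
}

print_r(traceEvents('<category desc="X"><link url="u" desc="d"/></category>'));
```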
The startElemHandler function transforms each of the incoming XML tags into a snippet of HTML code that will display the list. The top-level <links> tag in the XML file is converted into a <div> here, which will contain the entire list. Each <category> tag is transformed into a <p> with the value of the category’s desc attribute serving as the text content, along with an opening <ul> tag to start the list. Finally, each <link> tag is turned into a <li> with a nested <a> tag that holds the value of the url attribute as the link destination. If there is a desc attribute for the link, its value becomes the text content of the <a> tag; otherwise the value of the url attribute is used.
The endElemHandler function is used to close off the HTML tags as the end of each XML tag in the links file is processed.
When the processing is finished, the xml_parser_free() function call frees the parser from memory and disposes of it.
Although this is a relatively simple example, the concepts presented here carry over to more complex situations as well. The content of the links file has now been effectively separated from how it will be presented to the user. This decoupling allows the XML and presentation parts to be changed independently from one another. We’ve also seen how the Simple API for XML (SAX) can be used to quickly process an XML data file when it isn’t necessary to maintain the contents of the XML document in memory or to edit the document’s contents.
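To make the decoupling concrete, here is a sketch of a second presentation of the same data: a plain-text rendering that could be used somewhere other than the HTML home page. The function name renderLinksAsText and the output layout are assumptions made for this example, and the XML file itself does not change.

```php
<?php
// Hypothetical alternative renderer: same links format, plain-text output.
function renderLinksAsText($xml)
{
    $out = '';
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$out) {
            if ($name === 'category') {
                $out .= $attribs['desc'] . ":\n";
            } elseif ($name === 'link') {
                // same fallback as the HTML version: use the URL
                // when there is no description
                $label = !empty($attribs['desc']) ? $attribs['desc'] : $attribs['url'];
                $out .= '  - ' . $label . ' (' . $attribs['url'] . ")\n";
            }
        },
        function ($parser, $name) use (&$out) {
            if ($name === 'category') {
                $out .= "\n"; // blank line between categories
            }
        }
    );
    xml_parse($parser, $xml, true);
    return $out;
}

$xml = '<links><category desc="XAML Related">'
     . '<link url="http://www.xaml.net" desc="XAML.net"/>'
     . '<link url="http://www.zaml.com"/>'
     . '</category></links>';
echo renderLinksAsText($xml);
```

Only the handlers differ from the HTML version; the parser setup and the data file are shared.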
Copyright © Joe Marini. All rights reserved.