XML


The first thing you should know about XML is that it's over-hyped. When it first started becoming big news around the turn of the millennium, everybody seized on it as the successor to HTML. It really isn't; today, even XML proponents acknowledge that XML is a supplement to HTML, not a replacement. Furthermore, XML doesn't do anything from the user's perspective; Web surfers won't actually notice that a site uses XML.

XML is really just another step in a long series of computer innovations that make things easier on developers, further working to encapsulate data so that programmers are isolated from the machines they work on and allowed to concentrate more on the actual data they use. That's the sort of thing that object-oriented coder types love, and the kind of thing that tends to annoy computer engineer types like me. XML also continues the tradition of new protocols that expand the current set of Web technologies to make the Web do things it was never intended to do with HTTP.

Okay, so what is XML good for? Essentially, XML is a system that allows you to define your own tags instead of being dependent on the standard HTML tags that are defined by the W3C. HTML tags mostly define how data should look; XML tags define what data actually is. A typical snippet of XML sample code looks something like this:

<buscard>
<name>John Smith</name>
<phonenum>555-1234</phonenum>
<email>jsmith@somewhere.com</email>
</buscard>

This XML code defines a data set called "buscard", representing someone's business card. Within this business card are data fields for the person's name, phone number, and e-mail address. You can format a whole website in this way, indicating what all the data is, rather than just how it's supposed to look. There are two basic reasons why you might want to do this over the standard HTML approach:

1. You can create a template for how each of these elements is supposed to look. For example, you could make it standard so that the e-mail address is rendered in a specific font. Once you define this for the <email> tag, you don't need to define it again; all you need to do is use that tag within the site code. If you ever want to change how the e-mail addresses look, you can do it by simply changing the definition for the tag, rather than changing every instance of an e-mail address on the site.

2. Knowing what type of data is being given is useful for non-human surfers. A wide variety of bots search web sites looking for data, and using XML tags that specify exactly what kind of data is being represented helps these Web bots. For example, if you have a bot that's looking for phone numbers, all the bot has to do is look for the <phonenum> tag and it knows it's seeing a phone number. This would be more difficult to do with HTML, which doesn't specify what each piece of data actually is. XML makes it easier to interface websites with databases, since the database can easily identify what each piece of data is.
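The bot scenario in point 2 can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration under assumptions: the function name find_phone_numbers is made up for this example, and the page is assumed to be well-formed XML using the buscard tags shown earlier.

```python
import xml.etree.ElementTree as ET

# Sample page content, reusing the business card from the article.
PAGE = """<buscard>
<name>John Smith</name>
<phonenum>555-1234</phonenum>
<email>jsmith@somewhere.com</email>
</buscard>"""


def find_phone_numbers(xml_text):
    """Return the text of every <phonenum> element in the document.

    Hypothetical helper for illustration; not part of any real bot.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [elem.text for elem in root.iter("phonenum")]


if __name__ == "__main__":
    print(find_phone_numbers(PAGE))
```

The bot never has to guess which string of digits is a phone number; the tag name tells it.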

Okay, so that's what XML does. The question that probably comes to your mind first is: I can create these customized tags to state what each piece of data is, but where do I define these tags? Surely I need to create (for example) the phonenum tag before I can use it. If that's what you're thinking, you're partly right. An XML parser will accept any well-formed tag without complaint, but to say which tags a document may contain and how they should look, XML offers two important concepts: Extensible Stylesheets and Document Type Definitions (DTDs). We'll discuss extensible stylesheets first.

If you've ever used CSS (Cascading Style Sheets) for Web design, you're already familiar with the concept of using a style definition to describe how each part of a website looks. Extensible stylesheets apply the same concept to XML: with an extensible stylesheet, you can specify how a person's name, phone number, and e-mail address are formatted. Extensible stylesheets are typically made using Extensible Stylesheet Language (XSL). To use XSL, you create a reference to an XSL file from within your XML file, and then the XSL file contains the style descriptions for each element in the XML file.

To create a very simple example of how XSL might work, consider the following XML file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="xmlstyle.xsl"?>

<bigblueheader>
</bigblueheader>

The first line of this file simply identifies the file as an XML file, using version 1.0 of XML. The second line is a reference to an XSL file called "xmlstyle.xsl" which we'll create in just a moment. Finally, the file contains a "bigblueheader" tag, to create a big blue header. As you can probably guess, this is a custom-made tag, and to explain just what this big blue header looks like, we'll create an XSL file. Our xmlstyle.xsl file might look something like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="bigblueheader">
   <html>
   <body><h1 style="color: blue">This is my header!</h1></body>
   </html>
</xsl:template>

</xsl:stylesheet>

Again, the first line of this file just identifies it as an XML file. The second line begins an XSL stylesheet, and the last line of the file concludes the XSL stylesheet. The five lines in the middle of the file are the key: They create a template for an element called "bigblueheader", and when such an element is found in an XML document, this stylesheet specifies how that element should look. In this case, we're making it look like a blue, large-font header that says "This is my header!" Inane, maybe, but it works.
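If it helps to think of it in programming terms, a template is a bit like an entry in a lookup table keyed by element name. Here is a rough Python sketch of that idea. It is not an XSLT processor; TEMPLATES and transform are names invented for this example, and the output only approximates what a real stylesheet would produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical "stylesheet": maps an element name to the HTML it should become.
TEMPLATES = {
    "bigblueheader": '<h1 style="color: blue">This is my header!</h1>',
}


def transform(xml_text):
    """Replace each known element with its template; skip unknown ones."""
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element and every element beneath it.
    parts = [TEMPLATES[elem.tag] for elem in root.iter() if elem.tag in TEMPLATES]
    return "<html>" + "".join(parts) + "</html>"


if __name__ == "__main__":
    print(transform("<bigblueheader></bigblueheader>"))
```

Changing the one entry in TEMPLATES changes every header on the site, which is the same benefit the stylesheet gives us.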

Now that we've seen an XML stylesheet, we want to get familiar with DTDs (Document Type Definitions), which are much like variable declarations in programming languages. A DTD declares each data element, including which other elements it may contain and what kind of data it holds. For example, suppose we want to make an XML file for a business card. Within this business card, we want to contain a phone number. Our XML file might look something like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="xmlstyle.xsl"?>

<!DOCTYPE buscard [

   <!ELEMENT buscard (phonenum)>

   <!ELEMENT phonenum (#PCDATA)>

]>

<buscard>

  <phonenum>555-1234</phonenum>

</buscard>

We already know that the first two lines just introduce the XML file and point to our XSL file. The DOCTYPE block that follows constitutes the DTD; in it, we state that the document is of a type called "buscard", that a buscard element contains one phonenum element, and that phonenum elements contain a string of text ("#PCDATA" is the DTD keyword for parsed character data, that is, plain text). Finally, after declaring the buscard and phonenum tags in the DTD, we go ahead and create a buscard which contains one phonenum.
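Python's standard-library XML parsers do not validate documents against a DTD, so as a rough illustration of what the DTD is saying, the same rules can be written out by hand. This sketch is not a DTD validator; check_buscard and the rules it encodes (a buscard root whose phonenum children hold only text) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<buscard>
  <phonenum>555-1234</phonenum>
</buscard>"""


def check_buscard(xml_text):
    """Return a list of problems; an empty list means the card looks right.

    Hypothetical hand-written check, standing in for real DTD validation.
    """
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "buscard":
        problems.append("root element is <%s>, expected <buscard>" % root.tag)
    for child in root:
        if child.tag != "phonenum":
            problems.append("unexpected element <%s>" % child.tag)
        elif len(child) > 0:
            # (#PCDATA) means text only, so nested elements are not allowed.
            problems.append("<phonenum> must contain only text")
    return problems


if __name__ == "__main__":
    print(check_buscard(DOC))
    print(check_buscard("<buscard><fax>555-0000</fax></buscard>"))
```

A validating parser does this kind of checking for you, driven by the DTD instead of by hand-written code.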

We've declared our data types so that they exist, but we still haven't stated how they're supposed to look. Remember, defining how XML tags are supposed to look on the webpage is the job of the stylesheet. Our XSL file might look like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
   <html><body><xsl:apply-templates /></body></html>
</xsl:template>

<xsl:template match="buscard">

        <xsl:for-each select="phonenum">
                <b>Phone number: </b><xsl:value-of select="."/>
        </xsl:for-each>

</xsl:template>

</xsl:stylesheet>

The key elements of this file are the ones for "buscard"; notice that once the parser finds a "buscard" element, we use an "xsl:for-each" tag. This tag specifies that for each element matched by its "select" attribute, we should do something. In this case, the selection is "phonenum", so for each phonenum element in the buscard, something should be shown. The next line states that what should be shown is a bold block of text saying "Phone number: ", and the <xsl:value-of select="."/> tag means that the actual value of the current phonenum element should be printed there. (The first template, which matches "/", just wraps the result in a minimal HTML page.)
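For readers more at home in a programming language, the for-each loop behaves roughly like an ordinary loop over the matching child elements. The Python sketch below mimics it with the standard library; render_phone_numbers is a name invented for this example, and the result only approximates what an XSL processor would emit.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CARD = "<buscard><phonenum>555-1234</phonenum></buscard>"


def render_phone_numbers(xml_text):
    """Mimic the for-each loop: one labelled chunk per <phonenum> child."""
    buscard = ET.fromstring(xml_text)
    chunks = []
    for phone in buscard.findall("phonenum"):  # direct children only
        # escape() keeps characters such as & and < safe in the HTML output.
        chunks.append("<b>Phone number:</b> " + escape(phone.text or ""))
    return "".join(chunks)


if __name__ == "__main__":
    print(render_phone_numbers(CARD))
```

If the card holds no phonenum elements, the loop body never runs and nothing is printed, which is also how the stylesheet behaves.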

Of course, our original example of a business card contained more than just a phone number; it also contained a person's name and e-mail address. These elements can be easily added to our XML functionality by simply adding their data types to the DTD, and also adding what they look like to our stylesheet. Our XML file, then, might look like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="xmlstyle.xsl"?>

<!DOCTYPE buscard [

   <!ELEMENT buscard (name, phonenum, email)>

   <!ELEMENT name (#PCDATA)>

   <!ELEMENT phonenum (#PCDATA)>

   <!ELEMENT email (#PCDATA)>

]>

<buscard>

  <name>John Smith</name>

  <phonenum>555-1234</phonenum>

  <email>jsmith@somewhere.com</email>

</buscard>

We have created name, phonenum, and email elements. All that's left is to define what they look like in our XSL file:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
   <html><body><xsl:apply-templates /></body></html>
</xsl:template>

<xsl:template match="buscard">

        <xsl:for-each select="name">
                <b>Name: </b><xsl:value-of select="."/><br/>
        </xsl:for-each>

        <xsl:for-each select="phonenum">
                <b>Phone number: </b><xsl:value-of select="."/><br/>
        </xsl:for-each>

        <xsl:for-each select="email">
                <b>E-mail address: </b><xsl:value-of select="."/><br/>
        </xsl:for-each>

</xsl:template>

</xsl:stylesheet>

Now we have style definitions for each of our data elements. (The "br/" tag on the end of each style definition is HTML's line-break tag, written in self-closing form because the stylesheet itself must be well-formed XML. After each element is printed, we move to the next line so everything doesn't appear on one line.)

We're not limited to just using each of these data elements once. Since our XSL file says to do something for each instance of a data element, we can include (for example) multiple phone numbers if a person has more than one. (In the DTD, the "+" after phonenum means "one or more".) Our XML file, then, might look like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="xmlstyle.xsl"?>

<!DOCTYPE buscard [

   <!ELEMENT buscard (name, phonenum+, email)>

   <!ELEMENT name (#PCDATA)>

   <!ELEMENT phonenum (#PCDATA)>

   <!ELEMENT email (#PCDATA)>

]>

<buscard>

  <name>John Smith</name>

  <phonenum>555-1234</phonenum>
  <phonenum>555-4321</phonenum>
  <phonenum>555-5555</phonenum>

  <email>jsmith@somewhere.com</email>

</buscard>

...And each phone number will be listed on its own line, with the appropriate "Phone number: " label. Changing our XSL file isn't necessary to make this work.
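To see why no stylesheet change is needed, here is the whole card rendered by a small Python stand-in for the stylesheet. It is a sketch under assumptions, not the output of a real XSL processor: LABELS and render_card are names invented for this example, and the labels simply mirror the three for-each blocks.

```python
import xml.etree.ElementTree as ET

CARD = """<buscard>
  <name>John Smith</name>
  <phonenum>555-1234</phonenum>
  <phonenum>555-4321</phonenum>
  <phonenum>555-5555</phonenum>
  <email>jsmith@somewhere.com</email>
</buscard>"""

# Hypothetical label table, in the same order as the for-each blocks.
LABELS = [
    ("name", "Name"),
    ("phonenum", "Phone number"),
    ("email", "E-mail address"),
]


def render_card(xml_text):
    """Return one HTML line per matching element, however many there are."""
    buscard = ET.fromstring(xml_text)
    lines = []
    for tag, label in LABELS:
        # findall() returns every direct child with this tag, so extra
        # phone numbers are picked up without any change to the code.
        for elem in buscard.findall(tag):
            lines.append("<b>%s:</b> %s<br/>" % (label, elem.text))
    return lines


if __name__ == "__main__":
    for line in render_card(CARD):
        print(line)
```

The loop over phonenum runs three times instead of once, and nothing else has to know about it.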

So, there you go. That's what XML is for, and how to make it do something.
