Minibases: Transitionary Use of Schema-validated XML Data Islands
2003-03-25 :: Glenn Slayden
Q: How do I create, load, validate, and access XML data islands?
A: An XML data island is a fragment of well-formed (and perhaps valid) XML on an HTML page. It's very easy to create and access this data, although I found it quite difficult to accomplish validation against an XML schema. This was probably due more to the fact that I was learning about namespaces, schemas, and validation for the first time during this exercise, rather than any inherent difficulty with data islands. I can't be sure how difficult it would have been if I had been a schema definition expert.
For the purpose of this blurb, I'll be using the W3C XML Schema 1.0 Recommendation (May 2001). Note that this was (at the time of writing, 2003) the most modern schema definition mechanism, as opposed to the two other commonly used systems: DTD and a stillborn Microsoft device. There are numerous articles on the web which compare these three systems, and there are also several other fringe systems, but without going into too much detail, suffice it to say that the W3C XML Schema definition language is the most comprehensive and capable. An official overview of it is available in this primer.
In the example case, a website contains pages in which a subset of a large database is displayed, but individual items within the subset may be used on the page many times. It may be possible to reduce the size of the page downloads through normalization: sending only a single copy of each data item (a set which I'll call the minibase), and then using a client-side script such as JavaScript to build the final page from the minibase. As a bonus, processing cycles are also distributed away from the server to the clients. Rather than spending time formatting HTML, the server now just analyzes the data dependencies for a particular page, removes duplicates from the list, and prepares the minibase, an XML data island containing just the data that the client will need to build the page.
Typically, the client-side script would not be subject to change as the minibase changes, so isolating that code as a <script> element with an external source would reduce download size even more.
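The server-side step described above (collect the items a page needs, drop the duplicates, emit the minibase) could be sketched as follows. This is only an illustration, not the code behind the site: it assumes a JavaScript server environment such as Node.js, and the names escapeXml, buildMinibase, and lookup, as well as the record shape, are hypothetical. It also treats every field as plain text, so inline markup inside a field would need separate handling.

```javascript
// Hypothetical server-side helper (Node.js). Function names and the record
// shape { thai, xlit, xid } are assumptions made for this sketch.

// Escape the characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// usedIds: every teid the page refers to, repeats included.
// lookup:  function returning the record for one teid (stands in for the database).
function buildMinibase(usedIds, lookup) {
  // A Set drops duplicates and keeps first-seen order.
  const uniqueIds = Array.from(new Set(usedIds));
  const entries = uniqueIds.map(function (teid) {
    const rec = lookup(teid);
    return "  <te teid='" + escapeXml(teid) + "'>" +
      "<thai>" + escapeXml(rec.thai) + "</thai>" +
      "<xlit>" + escapeXml(rec.xlit) + "</xlit>" +
      "<xid>" + escapeXml(rec.xid) + "</xid></te>";
  });
  return '<tl:minibase xmlns:tl="urn:thai-language-schema">\n' +
    entries.join("\n") + "\n</tl:minibase>";
}
```

Calling buildMinibase([12345, 555333, 12345], lookup) emits two te elements rather than three, which is the whole point of the normalization: an item referenced many times on the page travels over the wire once.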
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<BASE HREF="http://www.glennslayden.com/XML_data_islands.htm">
</head><body>
<xml id="xroot">
<tl:minibase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:tl="urn:thai-language-schema"
             xsi:schemaLocation="urn:thai-language-schema http://www.thai-language.com/tl.xsd">
  <te teid='12345'>
    <thai>เก่า</thai>
    <xlit>khaao<span class="tt">F</span></xlit>
    <xid>333333</xid>
  </te>
  <te teid='555333'>
    <thai>เก่า</thai>
    <xlit>xxxyxx<span class="tt">F</span></xlit>
    <xid>444444</xid>
  </te>
  <te teid='556777'>
    <thai>เก่า</thai>
    <xlit>xxxyxx<span class="tt">F</span></xlit>
    <xid>1</xid>
  </te>
</tl:minibase>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tl="urn:thai-language-schema"
            targetNamespace="urn:thai-language-schema">
  <xsd:element name="minibase">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="te">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="thai"/>
              <xsd:element name="xlit"/>
              <xsd:element name="xid"/>
            </xsd:sequence>
            <xsd:attribute name="teid" type="xsd:unsignedInt" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
I strongly suggest that you run your schema through a schema validator such as W3C's, and fix all the problems before attempting to use MSXML/XMLDOM to try to validate XML against it.
Notice that the attribute 'teid' is defined as an unsigned integer (xsd:unsignedInt), one of the W3C XML Schema built-in datatypes. This means that in the XPath statement of selectSingleNode, we don't have to put quotes around the number we're looking for. However, even though it's a numeric value, it must have quotes in the XML data island, since XML requires all attribute values to be quoted.
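As a rough illustration of the unquoted lookup, the sketch below builds the XPath string and hands it to selectSingleNode. The helper names teidXPath and findEntry are hypothetical, and the code assumes that doc is the data island's DOM document (in Internet Explorer, something like xroot.XMLDocument) and that the '//te' path suits the island shown above. It is written in old-style JavaScript so that it could plausibly run in that environment.

```javascript
// Hypothetical helpers; the names are assumptions made for this sketch.

// Build the XPath for one entry. Because teid is numeric, the value is
// written as a bare number: //te[@teid=12345], not //te[@teid='12345'].
function teidXPath(teid) {
  var text = String(teid);
  // Accept only plain decimal digits within the unsignedInt range.
  if (!/^\d+$/.test(text) || Number(text) > 4294967295) {
    throw new RangeError("teid must be an unsigned 32-bit integer: " + text);
  }
  return "//te[@teid=" + Number(text) + "]";
}

// doc is assumed to be the island's DOM document, e.g. xroot.XMLDocument.
// Returns whatever selectSingleNode returns (null when nothing matches).
function findEntry(doc, teid) {
  return doc.selectSingleNode(teidXPath(teid));
}
```

Rejecting non-numeric input before building the expression also keeps stray text out of the XPath string, which matters if the teid ever comes from user input.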
We can be sure that validation against the schema is actually occurring by changing one of those numeric values for teid in the data island to a non-numeric value, say by inserting an 'x' into the middle of the number. When you refresh the HTML page, you should get an error that the validation failed because of a type problem.
One maddening aspect of developing this code was that validation against the schema appears to be finicky and fragile. If the XML processor doesn't like the slightest thing about your namespaces and the "hook-up" between the data and the schema, it will silently skip the validation, and the parse error object will still report success. The only way I found to make sure that the validation is actually happening is to "break" it, as described in the previous paragraph, and see if the error is reported.
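The "break it and see" check can be written down as a small helper. The sketch below is an assumption-laden illustration rather than the page's actual script: describeParseResult and validationIsLive are hypothetical names, and both expect MSXML-style document objects whose parseError property exposes errorCode, line, linepos, and reason.

```javascript
// Hypothetical helpers; doc objects are assumed to be MSXML-style DOM
// documents (for example xroot.XMLDocument) exposing a parseError property.

// Summarize a document's parseError in one line.
function describeParseResult(doc) {
  var err = doc.parseError;
  if (err.errorCode === 0) {
    // Success here does not prove that schema validation actually ran.
    return "no error reported";
  }
  var reason = String(err.reason).replace(/\s+$/, ""); // drop trailing newline
  return "error " + err.errorCode + " at line " + err.line +
    ", column " + err.linepos + ": " + reason;
}

// goodDoc:   the real data island.
// brokenDoc: a copy with one teid deliberately corrupted (e.g. '12x45').
// Validation is only known to be live if the good copy passes AND the
// broken copy fails.
function validationIsLive(goodDoc, brokenDoc) {
  return goodDoc.parseError.errorCode === 0 &&
    brokenDoc.parseError.errorCode !== 0;
}
```

If validationIsLive returns false while the good document reports no error, the likely cause is the silent failure described above: the processor never connected the data to the schema, so nothing was checked.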