Checking Your XML Document's Syntax

It's easiest to learn the rules of creating your own XML vocabulary if you have a program that will quickly point out your errors while you are learning. There are three tools that will identify problems.

First, JDeveloper 3.1's built-in XML syntax checking can be used by selecting Check XML Syntax... on any XML file in your project. In the blink of an eye, JDeveloper finds any problems, prints the error messages in the XML Errors tab, and positions the cursor in the file at the location of the first error.

Second, if you are a fan of command-line tools, use the oraxml command-line utility that comes with the Oracle XML Parser to check your XML syntax. Just type:

oraxml filename

and either you'll be told that your file is well-formed, or the errors will be printed to the console.

If the oraxml command does not work, try the following instead:

java oracle.xml.parser.v2.oraxml filename.xml

to explicitly run the oraxml command-line utility as a Java class. If this also fails, make sure you have done the following:

  1. You must have a Java runtime environment properly set up.
     
  2. You must list the fully qualified path of the Java archive containing the Oracle XML Parser for Java in your CLASSPATH environment variable.

If JDeveloper 3.1 from the CD-ROM has been installed into the C:\JDev directory, for example, first run the following command to set up your Java runtime environment:

C:\> c:\jdev\bin\setvars c:\jdev

then run:

C:\> set CLASSPATH=c:\jdev\lib\xmlparserv2_2027.jar;%CLASSPATH%

to add the Oracle XML Parser for Java to your CLASSPATH.
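If you are unsure whether the CLASSPATH setting took effect, a small Java check along the following lines may help. This is only a sketch that assumes a current JDK: the class name ClasspathCheck and the method isOnClasspath are made up for illustration, and the only name taken from the text above is oracle.xml.parser.v2.oraxml.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name of the oraxml utility, as given in the text
        String oraxml = "oracle.xml.parser.v2.oraxml";
        System.out.println(oraxml + (isOnClasspath(oraxml)
                ? " was found on the CLASSPATH"
                : " was NOT found; check the CLASSPATH setting"));
    }
}
```

Run it from the same command prompt in which you set CLASSPATH, so that it sees the same environment as the oraxml command would.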

Third, simply attempt to browse your file with Microsoft's Internet Explorer version 5.0 or later. IE5 features built-in support for visualizing the structure of any XML document. Simply type its URL or filename in the Address bar. If there are any syntax errors, they show up immediately in your browser window.

While initially presenting a few new rules to get used to, XML's unforgiving syntax is actually one of its core strengths. It means that an XML file is either well-formed or ill-formed. There is no gray area. Programs needing to process XML documents are easy to write because they don't have to account for lots of loopholes, endless exceptions to the rules, or oodles of optional features. They check the ten elements of style presented earlier, and don't waste their time processing the document if any of the rules is broken.
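This all-or-nothing behavior can be observed with any conforming parser. The following sketch uses the standard JAXP SAX parser that ships with the JDK rather than the Oracle XML Parser, and the names WellFormedCheck and firstSyntaxError are illustrative, not part of any library. It hands a document to the parser and reports the first syntax error, if any.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns null if the text is well-formed XML, otherwise the parser's first error message. */
    static String firstSyntaxError(String xml) {
        try {
            // The factory is non-validating by default, so only well-formedness is checked.
            // DefaultHandler rethrows fatal errors, which stops the parse at the first one.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the document's syntax
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        // Properly nested tags: no error is reported
        System.out.println(firstSyntaxError("<FAQ><Question>Easy?</Question></FAQ>"));
        // <Question> is never closed: the parser gives up with an error message
        System.out.println(firstSyntaxError("<FAQ><Question>Easy?</FAQ>"));
    }
}
```

Note that the parser does not try to recover from the missing end tag. It stops at the first broken rule, which is exactly what keeps XML-processing programs simple.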

In practice, the ten elements of style are the main things you need to know. For an exhaustive list of all XML syntax arcana, you can refer to the XML 1.0 specification at:

http://www.w3.org/TR/1998/REC-xml-19980210

or to Tim Bray's helpful annotated version at:

http://www.xml.com 

Validating Your XML Against a DTD

To be sure that an XML document's usage of elements and attributes is consistent with their intended use with respect to a particular XML vocabulary, some additional information is needed. A document type definition (DTD) specifies all of the valid element names that are part of a particular XML vocabulary. In addition, it stipulates the valid combinations of elements that are allowed (what can appear nested within what, and how many times) as well as what attributes each element is allowed to have. It is important to understand two key points about using a DTD:

  • How to associate an XML document with a DTD
     
  • How to validate the document by that DTD

Since JDeveloper does not include built-in support for visually creating or inspecting DTDs, a tool like XML Authority from Extensibility can be used. Figure 1 illustrates the document type definition for the frequently asked questions document seen in Creating XML. The figure shows a graphical view of the element structure that it defines.

The diagram in the figure illustrates that:

  • The <FAQ-List> element consists of one or more <FAQ> elements.
     
  • A <FAQ> element consists of one or more pairs of <Question> followed by <Answer>.
     
  • A <FAQ> element has attributes named Submitter and Level.
     
  • The <Question> and <Answer> elements contain text.

Using another view, the tool shows some additional information about the attributes, as shown in Figure 2.

Here the Level attribute has a default value of Intermediate, and must be one of the values Beginner, Intermediate, or Advanced. If you open the DTD file in vi, Emacs, or JDeveloper, you'll see this text document:

<!ELEMENT FAQ-List  (FAQ+ )>
<!ELEMENT FAQ  (Question , Answer )+>
<!ATTLIST FAQ  Submitter CDATA  #IMPLIED
               Level (Beginner | Intermediate | Advanced )  'Intermediate' >
<!ELEMENT Question  (#PCDATA )>
<!ELEMENT Answer  (#PCDATA )>

Even a cursory read of this very simple DTD shows why tools that help with the process are so valuable! To associate an XML document with a particular DTD, add one extra line to the top of the XML document, called the Document Type Declaration, which looks like this:

<!DOCTYPE DocumentElementName SYSTEM "DTDFilename">

This line goes between the XML declaration and the document element, as follows:

<?xml version="1.0"?>
<!DOCTYPE FAQ-List SYSTEM "FAQ-List.dtd">
<FAQ-List>
  <FAQ Submitter="smuench@oracle.com">
    <Question>Is it easy to get started with XML?</Question>
    <Answer>Yes!</Answer>
  </FAQ>
</FAQ-List>

Though the document does not specify a Level attribute on the FAQ element, IE5 shows Level="Intermediate". This happens because the DTD defined a default value for Level. An XML processor that conforms to the XML 1.0 standard treats the document as if it had specified the attribute with its default value. So default attribute values are one effect a DTD can have on an XML document that refers to it in its <!DOCTYPE> declaration.
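The same defaulting can be observed programmatically. The sketch below uses the JDK's standard JAXP DOM parser, not IE5 or the Oracle XML Parser, and makes one simplifying assumption: the DTD rules are inlined as an internal subset inside the <!DOCTYPE> declaration so that no FAQ-List.dtd file is needed on disk. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DefaultAttributeDemo {
    // The FAQ document from the text. Assumption for this sketch: the DTD is
    // inlined as an internal subset instead of referenced via SYSTEM "FAQ-List.dtd".
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE FAQ-List [\n"
        + "<!ELEMENT FAQ-List (FAQ+)>\n"
        + "<!ELEMENT FAQ (Question, Answer)+>\n"
        + "<!ATTLIST FAQ Submitter CDATA #IMPLIED\n"
        + "              Level (Beginner | Intermediate | Advanced) 'Intermediate'>\n"
        + "<!ELEMENT Question (#PCDATA)>\n"
        + "<!ELEMENT Answer (#PCDATA)>\n"
        + "]>\n"
        + "<FAQ-List>\n"
        + "  <FAQ Submitter=\"smuench@oracle.com\">\n"
        + "    <Question>Is it easy to get started with XML?</Question>\n"
        + "    <Answer>Yes!</Answer>\n"
        + "  </FAQ>\n"
        + "</FAQ-List>\n";

    /** Parses the document and returns the first <FAQ> element. */
    static Element firstFaq() {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getElementsByTagName("FAQ").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    /** Returns the Level attribute as the parser reports it. */
    static String levelOfFirstFaq() {
        return firstFaq().getAttribute("Level");
    }

    public static void main(String[] args) {
        // The document never mentions Level, yet the parser supplies the DTD default
        System.out.println("Level = " + levelOfFirstFaq());
        // getSpecified() reports whether the attribute was written in the document itself
        System.out.println("Written in the document? "
                + firstFaq().getAttributeNode("Level").getSpecified());
    }
}
```

An application reading the Level attribute therefore cannot tell, from the value alone, whether the author typed it or the DTD supplied it.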

Let's look at an example of the other kind of effect: validation errors. Suppose we extend the file to have a couple of questions like this:

<?xml version="1.0"?>
<!DOCTYPE FAQ-List SYSTEM "FAQ-List.dtd">
<FAQ-List>
  <FAQ Submitter="smuench@oracle.com">
    <Question>Is it easy to get started with XML?</Question>
    <Answer>Yes!</Answer>
  </FAQ>
  <FAQ Submitter="derek@spinaltap.com" Level="Silly">
    <Question>Are we going to play Stonehenge?</Question>
  </FAQ>
</FAQ-List>

If we try the oraxml command on this file, we get the message:

The input file parsed without errors

So the file is well-formed, but is it valid with respect to the DTD? Let's find out. Use the oraxml command-line tool again—this time with the -v flag—to validate the document against its DTD:

oraxml -v FAQWithTwoQuestions.xml

We immediately get two errors:

FAQWithTwoQuestions.xml <Line 8, Column 53>
XML-0141: (Error) Attribute value 'Silly' should be one of 
                  the declared enumerated values.

FAQWithTwoQuestions.xml <Line 10, Column 9>
XML-0150: (Error) Element FAQ not complete, expected elements '[Answer]'.

Error occurred while parsing

The oraxml tool consulted the rules in the associated FAQ-List.dtd file and validated the contents of FAQWithTwoQuestions.xml to find two inconsistencies. You can correct them by:

  1. Changing the attribute value Silly to a valid value like Advanced
     
  2. Adding the expected <Answer> element to go with the <Question> inside the second <FAQ> element

This produces the modified document:

<?xml version="1.0"?>
<!DOCTYPE FAQ-List SYSTEM "FAQ-List.dtd">
<FAQ-List>
  <FAQ Submitter="smuench@oracle.com">
    <Question>Is it easy to get started with XML?</Question>
    <Answer>Yes!</Answer>
  </FAQ>
  <FAQ Submitter="derek@spinaltap.com" Level="Advanced">
    <Question>Are we going to play Stonehenge?</Question>
    <Answer>But of course</Answer>
  </FAQ>
</FAQ-List>

which now passes validation if the oraxml -v command is repeated on it.
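For readers who want to run the same kind of check from their own code, here is a minimal sketch using the JDK's standard JAXP validating parser. It is not how oraxml itself is implemented, and its error messages are worded differently from the XML-0141 and XML-0150 messages shown above. The names DtdValidationDemo, validate, and document are illustrative, and the DTD text is served from a string through an entity resolver so that the example needs no files on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationDemo {
    // Contents of FAQ-List.dtd as listed in the text
    static final String DTD =
          "<!ELEMENT FAQ-List (FAQ+)>\n"
        + "<!ELEMENT FAQ (Question, Answer)+>\n"
        + "<!ATTLIST FAQ Submitter CDATA #IMPLIED\n"
        + "              Level (Beginner | Intermediate | Advanced) 'Intermediate'>\n"
        + "<!ELEMENT Question (#PCDATA)>\n"
        + "<!ELEMENT Answer (#PCDATA)>\n";

    /** Builds the two-FAQ document, varying the second FAQ's Level and its answer. */
    static String document(String level, String answerElement) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE FAQ-List SYSTEM \"FAQ-List.dtd\">\n"
             + "<FAQ-List>\n"
             + "  <FAQ Submitter=\"smuench@oracle.com\">\n"
             + "    <Question>Is it easy to get started with XML?</Question>\n"
             + "    <Answer>Yes!</Answer>\n"
             + "  </FAQ>\n"
             + "  <FAQ Submitter=\"derek@spinaltap.com\" Level=\"" + level + "\">\n"
             + "    <Question>Are we going to play Stonehenge?</Question>\n"
             + answerElement
             + "  </FAQ>\n"
             + "</FAQ-List>\n";
    }

    /** Validates the document against the DTD and returns the problems found. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check DTD rules, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the DTD from the string above for any external entity request
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(DTD)));
            // Collect validity errors instead of stopping at the first one
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    problems.add("Line " + e.getLineNumber() + ": " + e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness errors and parser setup failures end up here
            problems.add("Parsing stopped: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        // The invalid version: Level="Silly" and no <Answer> in the second FAQ
        validate(document("Silly", "")).forEach(System.out::println);
        // The corrected version should report no problems
        System.out.println("Problems after the fix: "
                + validate(document("Advanced", "    <Answer>But of course</Answer>\n")).size());
    }
}
```

Collecting the errors in a list, rather than throwing on the first one, mirrors the way the validation run above reported both inconsistencies in a single pass.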

DTDs can be invaluable tools for ensuring an additional level of consistency in the XML information you'll be working with or exchanging with others. As more and more web-based repositories of DTDs (also known as schemas) emerge, the likelihood of finding existing domain-specific vocabularies increases. This bodes well for a future of reuse with less need for custom DTD development.