Since XML tags are strictly hierarchical, an XML document can be mapped to a tree. XML parsing is the process of producing an XML parse tree from an XML document. As stated earlier, parsing is one of two low-level, basic operations one often performs on an XML document; the other is XSL transformation. In fact, a transformation includes a parsing step, which is normally hidden.
There are two basic approaches to XML parsing. One is based on the Document Object Model (DOM) standard and results in a literal tree. The second is based on the Simple API for XML (SAX) and results in a virtual tree. A DOM parser reads the entire document and produces a complete parse tree before the program can work with it, regardless of the size of the document. Thus, both document and parse tree must reside in main memory. By contrast, SAX parsers provide access to a virtual parse tree through callbacks. As a SAX parser encounters the various elements that make up a document, it calls methods that are defined in an interface and supplied by the user's program to process the data associated with those elements. Thus, it works incrementally and does not need the entire document in memory. SAX parsers are generally faster than DOM parsers.
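The difference between the two approaches can be sketched with the DOM and SAX parsers that ship with the JDK (the JAXP classes in javax.xml.parsers). This is only an illustration under stated assumptions: the class name DomVersusSax, the sample document, and the method names are hypothetical and are not taken from the original discussion. Both methods extract the same data; the DOM version walks a finished tree, while the SAX version collects the data inside callbacks as the parser reads.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVersusSax {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
            "<library><book>Emma</book><book>Ulysses</book></library>";

    // DOM: the complete tree is built in memory first, then the program walks it.
    static List<String> titlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        NodeList books = doc.getElementsByTagName("book");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(books.item(i).getTextContent());
        }
        return titles;
    }

    // SAX: the parser calls the handler's methods as it encounters each
    // element; no tree is kept, only the data the handler chooses to store.
    static List<String> titlesWithSax() throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <book>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("book".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("book".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithDom());
        System.out.println(titlesWithSax());
    }
}
```

The SAX version needs more code for the same result, which is the usual trade-off: the program must keep track of where it is in the document, because the parser does not retain a tree for it.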
Numerous parsers are available, so one normally does not write an XML parser but, rather, downloads a parsing package and uses a parser included in it. The parsers used in this discussion are part of the Xerces distribution.
Working with parsed XML documents is greatly simplified by using the JDOM packages. These packages are optimized for Java and include several significant simplifications that make writing Java programs that deal with parse trees much more intuitive. Downloads as well as information are available from the JDOM project site. Two excellent JDOM tutorials are Jason Hunter and Brett McLaughlin's Easy Java/XML Integration with JDOM, Part 1 and Part 2. Part 1 focuses on traversing a parse tree and extracting data from it. Part 2 focuses on creating and modifying a parse tree. Also see Hunter's JDOM in the Real World, parts 1, 2, and 3.
Processing an Existing XML Document
Parse the XML document
    SAXBuilder builder = new SAXBuilder();
    Document document = builder.build( aURL );
Note that although JDOM provides a view of the document as if it had been produced by a DOM parser, it actually works with a SAX parser and thus brings most of that parser's greater efficiency to the task.
Note also that the parse tree is provided as an object of type Document.
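How a tree can be assembled from SAX callbacks may be easier to see in a small sketch. The code below is not JDOM's implementation; it is a minimal illustration that uses only the JDK's SAX parser, and the names SaxTreeBuilder and SimpleNode are hypothetical. A stack holds the elements that are currently open: each start tag creates a node and attaches it to the node on top of the stack, and each end tag pops the stack.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTreeBuilder {
    // Hypothetical minimal node type; JDOM's own Element class is far richer.
    static class SimpleNode {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name) {
            this.name = name;
        }
    }

    // Parses the given XML text with SAX and returns the root of the tree
    // assembled from the callbacks.
    static SimpleNode build(String xml) throws Exception {
        Deque<SimpleNode> open = new ArrayDeque<>(); // elements not yet closed
        SimpleNode[] root = new SimpleNode[1];       // holder the handler can write to

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                SimpleNode node = new SimpleNode(qName);
                if (open.isEmpty()) {
                    root[0] = node;                  // first element is the root
                } else {
                    open.peek().children.add(node);  // attach to current parent
                }
                open.push(node);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();                          // element is complete
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        SimpleNode root = build("<root><child_1>first child</child_1><child_2/></root>");
        System.out.println(root.name + " has " + root.children.size() + " children");
        System.out.println(root.children.get(0).name + ": " + root.children.get(0).text);
    }
}
```

The sketch ignores attributes, namespaces, comments, and processing instructions, all of which a real builder has to handle.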
Get the root Element
    Element root = document.getRootElement();
Get an Element's children
    List children = root.getChildren();
Gets all of the children of an element as a list. This method is valid for both root and non-root elements.
    List someChildren = root.getChildren( "someName" );
Gets all of the children with the specified name.
    Element thatChild = root.getChild( "someName" );
Gets the first child with the specified name.
Get the attributes of an Element
    String attributeString = thatChild.getAttributeValue( "attributeName" );
Get the content of an Element
    String contentString = thatChild.getText();
Gets the text content of an element.
    List mixedContent = thatChild.getMixedContent();
Gets all of the potential content of an Element, including String, Comment, Element, and ProcessingInstruction objects. Subsequent processing can then differentiate among the types.
Dealing with Namespaces
    Namespace nameSpace = Namespace.getNamespace( "prefix", "url" );
Builds a Namespace object.
    List someChildren = root.getChildren( "someName", nameSpace );
    Element thatChild = root.getChild( "someName", nameSpace );
Uses the namespace as a qualifier to get the children, or a specific child, of an Element.
Output XML document from the parse tree.
    XMLOutputter outputter = new XMLOutputter();
    outputter.output( document, System.out );
Outputs an XML document produced by walking the parse tree.
Creating and Editing an XML Document
Create a root Element
    Element root = new Element( "root" );
Creates a free-standing node (Element).
Create a new XML Document
    Document document = new Document( root );
Creates a new XML Document (parse tree) rooted at root.
Add a child Element
    Element child1 = new Element( "child_1" );
    child1.setText( "first child" );
    root.addContent( child1 );
Creates a new child Element, adds text to it, and then adds the child to its parent (root).
    List children = element.getChildren();
    children.add( 1, child1 );
Gets a List of the children of the Element named element and adds child1 as the second child by inserting it at index 1, that is, after the first item.
Delete a child Element
    List children = element.getChildren();
    children.remove( 1 );
Removes the second child (the item at index 1; List indices start at 0) from the List, children.
    children.removeAll( element.getChildren( "someName" ) );
    element.removeChildren( "someName" );
Two ways, List and non-List, of removing all children named someName.
Set attribute of an Element
    element.addAttribute( "attributeName", "attributeValue" );
Delete attribute of an Element
    element.removeAttribute( "attributeName" );
Add content to an Element
    String contentString = "content string";
    element.setText( contentString );
Sets the text content of an element.
    List mixedContent = element.getMixedContent();
    // ... manipulate mixedContent ...
    element.setMixedContent( mixedContent );
Gets the mixed content of an Element, including String, Comment, Element, and ProcessingInstruction objects, manipulates the list, and then returns the list to the element as its (updated) mixed content.
Dealing with Namespaces
    Namespace nameSpace = Namespace.getNamespace( "prefix", "url" );
Builds a Namespace object.
    Element element = new Element( "child_1", nameSpace );
Uses the namespace as a qualifier to create an Element.
    Attribute attribute = new Attribute( "attributeName", "attributeValue", nameSpace );
Uses the namespace as a qualifier to create an Attribute.
    element.removeChildren( "someName", nameSpace );
Uses the namespace as a qualifier to remove Elements.
Output XML document from the parse tree.
    XMLOutputter outputter = new XMLOutputter();
    outputter.output( document, System.out );
Outputs an XML document produced by walking the newly constructed or edited parse tree.