Chitter: XML Parsers

The five styles of XML parser API:

Event based push: SAX, XNI
Event based pull: StAX, XMLPull
Tree: DOM, JDOM, dom4j
Data Binding: JAXB, Castor
Query API: TrAX

Source: http://www.artima.com/intv/xmlapis2.html

Push APIs have a number of advantages. They are very fast. You don't need to read to the end of the document before you start working with the beginning of the document. They use very little memory, because the entire document isn't in memory at once. Instead, you just see sort of a peephole into the document, just the current thing you're looking at. Typically in a push API the work goes into building up some data structure and gradually filling it from the input document until there is enough information there to act on. If your document is, for example, a collection of articles, a list of records, something for which there are clear chunks in the data and you can process each chunk individually, a push API works very well.
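To make the push style concrete, here is a minimal sketch using the SAX parser that ships with the JDK. The class name PushDemo, the helper titles, and the sample articles document are my own illustrations and do not come from the interview.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: PushDemo and titles() are hypothetical names.
class PushDemo {
    // Collects the text of every <title> element as the parser pushes events at the handler.
    static List<String> titles(String xml) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<articles><article><title>Push</title></article>"
                   + "<article><title>Pull</title></article></articles>";
        System.out.println(titles(xml));
    }
}
```

Notice that the handler never sees the whole document. It only keeps the small piece of state (the current title text) that it needs until there is enough to act on.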

On the other hand, the whole callback interface observer design pattern can be less than ideal for some developers. This brings us to the second major style of XML API, and the newest style: a pull API. A pull API is still streaming, still very fast, still very memory efficient. But instead of the parser being in control, telling the client application when it has some new information, the client application is in control, and it asks the parser to give it the next piece of information when it wants it. But the basic advantages of a pull API are the same as with a push API, except maybe the pull API is a little simpler. The implementations of the various pull APIs are not very mature yet. When you actually look at the ones out there (NekoPull, XMLPULL), they have a lot of idiosyncrasies both with respect to Java and XML. That's mostly just a function of maturity. There's nothing fundamentally wrong with the idea of a pull API. They're just not fully baked yet. With a little time, in a year or two, I expect pull APIs will be a very popular style of XML parsing.
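For comparison, here is a rough sketch of the same kind of task in the pull style, using StAX (javax.xml.stream) from the JDK instead of the NekoPull or XMLPULL implementations named in the quote. PullDemo and titles are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: PullDemo and titles() are hypothetical names.
class PullDemo {
    // The application drives the loop and asks the parser for the next event.
    static List<String> titles(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads the text content and stops at the end tag.
                    found.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<articles><article><title>Push</title></article>"
                   + "<article><title>Pull</title></article></articles>";
        System.out.println(titles(xml));
    }
}
```

There is no handler object and no state carried between callbacks here. The control flow is an ordinary loop, which is probably what makes the pull style feel a little simpler.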

The third style of XML parsing, and perhaps the most obvious style to most programmers, is a tree-based API. In a tree-based API, an XML document is read by a parser, and the parser constructs an object model, typically around a tree with nodes for elements, attributes, comments, processing instructions, text, and so forth. The entire document is stored in memory. You use the methods of the object to query the document, to navigate the document, to change and modify the document, and so forth. There are more tree-based APIs than any other kind of API: DOM, JDOM, dom4j, Sparta, ElectricXML, and my own XOM are all tree-based APIs.
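A small sketch of the tree style, assuming the W3C DOM implementation bundled with the JDK (JDOM, dom4j, and XOM offer their own, different APIs). TreeDemo, parse, and upperCaseTitles are made-up names for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative only: TreeDemo and its methods are hypothetical names.
class TreeDemo {
    // Reads the entire document into an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Navigates the tree and modifies it in place; returns how many titles were changed.
    static int upperCaseTitles(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            Element title = (Element) titles.item(i);
            title.setTextContent(title.getTextContent().toUpperCase(Locale.ROOT));
        }
        return titles.getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<articles><article><title>Push</title></article>"
                           + "<article><title>Pull</title></article></articles>");
        System.out.println(upperCaseTitles(doc) + " titles changed");
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

Unlike the streaming styles, the document can be revisited and edited after parsing, at the cost of holding all of it in memory.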

The fourth style of API, which is also a fairly recent style, is a data-binding API. It is similar to tree APIs in that the entire document is parsed and an object model is built, but in a data-binding API rather than having classes which represent XML concepts, like elements and processing instructions, you have classes that represent the concepts the XML represents. So a book element might become a Book object. An employee element might become an Employee object. Typically, some form of schema is compiled to produce these classes automatically: either a W3C XML Schema Language schema, a DTD, or a special-purpose binding schema written just for that purpose in some special-purpose schema language.
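JAXB and Castor generate or map such classes for you. Since JAXB is no longer bundled with recent JDKs, the sketch below binds by hand on top of DOM just to show the idea of a book element becoming a Book object. It is a stand-in and not how JAXB or Castor work internally; the Book class, its fields, and the sample XML layout (a year attribute and a title child) are all assumptions of mine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative only: a hand-written stand-in for what a binding tool would generate.
class BindingDemo {
    // Hypothetical domain class: it models a book, not an XML element.
    static final class Book {
        final String title;
        final int year;

        Book(String title, int year) {
            this.title = title;
            this.year = year;
        }
    }

    // Maps each <book year="..."><title>...</title></book> onto a Book object.
    static List<Book> bind(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<Book> books = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("book");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String title = e.getElementsByTagName("title").item(0).getTextContent();
            books.add(new Book(title, Integer.parseInt(e.getAttribute("year"))));
        }
        return books;
    }

    public static void main(String[] args) {
        List<Book> books = bind("<library><book year=\"2001\"><title>Example</title></book></library>");
        // Client code works with Book objects and never touches XML nodes.
        System.out.println(books.get(0).title + " (" + books.get(0).year + ")");
    }
}
```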

And then finally the fifth kind of API is what I would refer to as a query API. These would typically be things like TrAX for transforming with XSLT, or various APIs like Jaxen for searching with XPath. There are no real standards here, but there is some interesting work being done. Generally the real focus there, the real code, goes into the XPath or XSLT query, which we merely call from Java or some other language. It's like using SQL from inside a Java program using JDBC.
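As a sketch of the query style, the example below uses the javax.xml.xpath package from the JDK in place of Jaxen or TrAX. QueryDemo and evaluate are hypothetical names, and the sample document is mine. The Java code is only a thin wrapper; the real logic is in the XPath string, much like SQL passed through JDBC.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative only: QueryDemo and evaluate() are hypothetical names.
class QueryDemo {
    // Evaluates an XPath expression against the XML and returns the result as a string.
    static String evaluate(String xml, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression or XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<articles><article id=\"1\"><title>Push</title></article>"
                   + "<article id=\"2\"><title>Pull</title></article></articles>";
        // The query, not the Java, decides what is selected.
        System.out.println(evaluate(xml, "/articles/article[@id='2']/title"));
    }
}
```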
