Processing XML with Perl | Michel Rodriguez |
Example: extracting information from an XML document | Example: updating an XML document |
XML::Parser Styles
A style is a predefined bundle of handlers. XML::Parser defines five styles (Subs, Tree, Objects, Stream and Debug); users can create others.
Pre-defined styles
Subs
Each time an element starts, a sub with that element's name, in the package given by the Pkg option, is called with the same parameters that the Start handler receives.
Each time an element ends, a sub with that name followed by an underscore ("_") is called with the same parameters that the End handler receives.
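The naming rule above can be sketched as follows. The package name MySubs, the @log array and the sample document are assumptions made for this illustration only; elements without a matching sub (here, list) are simply skipped.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical handler package: one sub per element name we care about.
package MySubs;

our @log;    # records the order in which the handlers fire

# Called when <item ...> opens; receives the Start handler's parameters.
sub item {
    my ($expat, $element, %attr) = @_;
    push @log, "start $element id=$attr{id}";
}

# Called when </item> closes; receives the End handler's parameters.
sub item_ {
    my ($expat, $element) = @_;
    push @log, "end $element";
}

package main;

my $parser = XML::Parser->new(Style => 'Subs', Pkg => 'MySubs');
$parser->parse('<list><item id="1">one</item><item id="2">two</item></list>');

print "$_\n" for @MySubs::log;
```

Keeping the handlers in their own package avoids clashes between element names and subs that already exist in main.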
Tree
Parse will return a parse tree for the document. Each node in the tree takes the form of a tag, content pair. Text nodes are represented with a pseudo-tag of "0" and the string that is their content. For elements, the content is an array reference. The first item in the array is a (possibly empty) hash reference containing attributes.
The remainder of the array is a sequence of tag-content pairs representing the content of the element.
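As an illustration of that layout, here is a minimal sketch that walks one level of the tree. The sample document and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Style => 'Tree');
my $tree   = $parser->parse('<doc><title lang="en">Hello</title>world</doc>');

# The top level is a single tag/content pair for the root element.
my ($root_tag, $root_content) = @$tree;

# Element content: the attribute hash first, then tag/content pairs.
my ($attrs, @pairs) = @$root_content;

my @summary;
while (my ($tag, $content) = splice(@pairs, 0, 2)) {
    if ($tag eq '0') {
        # Text node: the content is the string itself.
        push @summary, "text: $content";
    }
    else {
        # Element node: the content is an array ref, attributes at index 0.
        my $lang = $content->[0]{lang};
        push @summary, "element: $tag (lang=$lang)";
    }
}

print "$_\n" for $root_tag, @summary;
```

A full traversal would recurse into each element's content array in the same way.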
Objects
This is similar to the Tree style, except that a hash object is created for each element. The corresponding object will be in the class whose name is created by appending "::" and the element name to the package set with the Pkg option. Non-markup text will be in the ::Characters class. The contents of the corresponding object will be in an anonymous array that is the value of the Kids property for that object.
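A minimal sketch of what the Objects style produces. The package name MyDoc and the sample document are assumptions for this example. It also relies on three details of the stock implementation that are not stated above: parse returns an array reference whose first item is the root object, attributes are stored as keys of the element's hash, and a Characters object keeps its string under the Text key.

```perl
use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Style => 'Objects', Pkg => 'MyDoc');
my $tree   = $parser->parse('<doc><title lang="en">Hello</title></doc>');

# Assumption: the returned array ref holds the root object at index 0.
my $root  = $tree->[0];
my $title = $root->{Kids}[0];     # first child of <doc>
my $text  = $title->{Kids}[0];    # text node inside <title>

print ref($root),  "\n";          # class built from Pkg and the element name
print ref($title), " lang=$title->{lang}\n";
print ref($text),  " text=$text->{Text}\n";
```

Because every element is blessed into its own class, methods can be added per element type by defining subs in, for instance, MyDoc::title.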
Stream
If none of the subs that this style looks for is defined, parsing with this style prints a canonical copy of the document, without comments or declarations. Every sub receives the Expat instance for the document being parsed as its first parameter.
The style looks for the following routines:
- StartDocument: called at the start of the parse.
- StartTag: called for every start tag with a second parameter of the element type. The $_ variable will contain a copy of the tag and the %_ variable will contain attribute values supplied for that element.
- EndTag: called for every end tag with a second parameter of the element type. The $_ variable will contain a copy of the end tag.
- Text: called just before start or end tags with accumulated non-markup text in the $_ variable.
- PI: called for processing instructions. The $_ variable will contain a copy of the PI, and the target and data are passed as the second and third parameters respectively.
- EndDocument: called at conclusion of the parse.
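The routines listed above can be sketched as follows. The package name MyStream, the @events array and the sample document are assumptions for this example. PI is deliberately left undefined; the sample contains no processing instructions, so nothing falls through to the default printing.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical package holding the routines the Stream style looks for.
package MyStream;

our @events;    # records each callback as it fires

sub StartDocument { push @events, 'begin' }

sub StartTag {
    my ($expat, $type) = @_;
    # %_ holds the attributes of the current element; $_ holds the tag text.
    push @events, "open $type" . (exists $_{id} ? " id=$_{id}" : '');
}

sub EndTag {
    my ($expat, $type) = @_;
    push @events, "close $type";
}

sub Text {
    # The accumulated text is in $_; whitespace-only runs are ignored here.
    push @events, "text '$_'" if /\S/;
}

sub EndDocument { push @events, 'finish' }

package main;

my $parser = XML::Parser->new(Style => 'Stream', Pkg => 'MyStream');
$parser->parse('<list><item id="7">seven</item></list>');

print "$_\n" for @MyStream::events;
```

Note that the text event is recorded just before the end tag that follows it, as the description of Text indicates.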
Debug
This just prints out the document in outline form. Nothing special is returned by parse.