XML::DOM::Twig - a Perl module that adds an XML::Twig-like interface to XML::DOM
use strict;
use warnings;
use XML::DOM;
use TwigDOM;
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parsefile( "doc.xml");
my $root   = $doc->root;                # the DOM method is getDocumentElement
my $para   = $root->next_elt( 'para');  # first para element in document order
The DOM is a standard interface to XML documents, but it is very low-level, not to mention downright ugly, at least as implemented in XML::DOM.
XML::DOM::Twig adds nicer (and more Perlish) method names, plus a whole slew of extra methods, most of them straight out of XML::Twig, that make XML::DOM easier and safer to use.
The navigation methods, for example, accept an optional condition that specifies what the tag of the returned element should be, instead of you either hoping that the DOM methods will return the right one (they won't!) or having to write a similar function yourself.
aliases
XML::DOM::Twig creates aliases for a number of XML::DOM methods:
get_elements_by_tag_name = getElementsByTagName;
doc = getOwnerDocument;
att = getAttribute;
get_attribute = getAttribute;
sprint = toString;
gi
Returns the gi (tag name) of the node if it is of type ELEMENT_NODE, and 0 otherwise.
parent ($opt_cond)
Returns the parent, or the first (innermost) ancestor that passes the condition. The condition can be a node type, a gi, a regexp (applied to the gi) or a code ref (applied to the element).
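A minimal sketch of the condition-driven ancestor walk, written against a hypothetical hash-based node model (hash refs with type, gi, parent and children keys) rather than XML::DOM objects. It is not the module's own code, and it covers the gi string, qr// regexp and code ref kinds of condition, but not node types:

```perl
use strict;
use warnings;

# Hypothetical node model for illustration (NOT XML::DOM): hash refs with
# type, gi, parent and children keys.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { type => 'ELEMENT', gi => $gi, children => [] };
    for my $k (@kids) { $k->{parent} = $e; push @{ $e->{children} }, $k }
    return $e;
}

# True if $node passes $cond: a code ref is called with the node, a qr//
# regexp is matched against the gi, a plain string is compared to the gi.
# No condition means every node passes. (Node-type conditions are omitted.)
sub passes {
    my ($node, $cond) = @_;
    return 1 unless defined $cond;
    return $cond->($node) ? 1 : 0 if ref($cond) eq 'CODE';
    return 0 unless $node->{type} eq 'ELEMENT';
    return $node->{gi} =~ $cond ? 1 : 0 if ref($cond) eq 'Regexp';
    return $node->{gi} eq $cond ? 1 : 0;
}

# Start at the parent and climb until an ancestor passes $cond; when we run
# out of ancestors, undef is returned.
sub parent {
    my ($node, $cond) = @_;
    my $p = $node->{parent};
    $p = $p->{parent} while $p && !passes($p, $cond);
    return $p;
}

my $bold = elt('b');
my $doc  = elt('doc', elt('section', elt('para', $bold)));
print parent($bold)->{gi}, "\n";            # para
print parent($bold, qr/^sec/)->{gi}, "\n";  # section
```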
descendants ($opt_cond)
Returns the list of all descendants of the element, optionally only those that pass the condition (for example those with a given gi). This is the equivalent of the DOM's getElementsByTagName.
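A minimal sketch of what descendants returns, using a hypothetical pure-Perl node model (hash refs with gi, parent and children keys) rather than XML::DOM objects, and a plain gi string as the only kind of condition. Results come out in document order (pre-order), and the element itself is not included:

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM): hash refs with gi, parent, children.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { gi => $gi, children => [] };
    for my $k (@kids) { $k->{parent} = $e; push @{ $e->{children} }, $k }
    return $e;
}

# Pre-order walk: each child is listed before its own descendants, which
# gives document order. With $gi, only nodes whose gi equals it are kept.
sub descendants {
    my ($elt, $gi) = @_;
    my @found;
    for my $child (@{ $elt->{children} }) {
        push @found, $child if !defined $gi || $child->{gi} eq $gi;
        push @found, descendants($child, $gi);
    }
    return @found;
}

my $doc = elt('doc', elt('sec', elt('p'), elt('list', elt('p'))), elt('p'));
print join(' ', map { $_->{gi} } descendants($doc)), "\n";  # sec p list p p
print scalar(descendants($doc, 'p')), "\n";                 # 3
```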
text
Returns a string consisting of all the PCDATA and CDATA in an element, without any tags. The text is not XML-escaped: base entities such as & and < are not escaped.
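The following sketch, on a hypothetical hash-based node model rather than XML::DOM, shows the behaviour described above: PCDATA and CDATA are concatenated in document order, other node types (such as comments) add nothing of their own, and characters such as < and & come back unescaped. It illustrates the semantics only and is not the module's implementation:

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM); plain strings become PCDATA nodes.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { type => 'ELEMENT', gi => $gi, children => [] };
    for my $k (@kids) {
        $k = { type => 'PCDATA', text => $k, children => [] } unless ref $k;
        $k->{parent} = $e;
        push @{ $e->{children} }, $k;
    }
    return $e;
}

# PCDATA/CDATA return their own text; any other node returns the joined text
# of its children (so a childless comment contributes ''). No escaping.
sub text {
    my ($node) = @_;
    return $node->{text} if $node->{type} eq 'PCDATA' || $node->{type} eq 'CDATA';
    return join '', map { text($_) } @{ $node->{children} };
}

my $para = elt('para', 'a < b ', elt('b', 'and'), ' c & d');
print text($para), "\n";    # a < b and c & d
```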
first_child ($optional_cond)
Returns the first child of the element, or the first child matching the condition.
field ($optional_cond)
Returns the text of the first child of the element, or of the first child matching the condition. If there is no such child, it returns ''. This avoids getting the child, checking that it exists, then getting its text in trivial cases.
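A sketch of how field relates to first_child and text, assuming a hypothetical hash-based node model and a gi string as the only condition (again an illustration, not the module's own code):

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM); plain strings become PCDATA nodes.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { type => 'ELEMENT', gi => $gi, children => [] };
    for my $k (@kids) {
        $k = { type => 'PCDATA', text => $k, children => [] } unless ref $k;
        $k->{parent} = $e;
        push @{ $e->{children} }, $k;
    }
    return $e;
}

sub text {
    my ($node) = @_;
    return $node->{text} if $node->{type} eq 'PCDATA';
    return join '', map { text($_) } @{ $node->{children} };
}

# First child, or first child that is an element with the given gi.
sub first_child {
    my ($elt, $gi) = @_;
    for my $child (@{ $elt->{children} }) {
        return $child if !defined $gi
            || ($child->{type} eq 'ELEMENT' && $child->{gi} eq $gi);
    }
    return;
}

# Text of that child, or '' when there is no such child.
sub field {
    my ($elt, $gi) = @_;
    my $child = first_child($elt, $gi);
    return defined $child ? text($child) : '';
}

my $item = elt('item', elt('name', 'foo'), elt('price', '12'));
print field($item, 'price'), "\n";   # 12
```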
last_child ($optional_gi)
Returns the last child of the element, or the last child whose gi is $optional_gi (i.e. the last of the element's children with that gi).
next_sibling($optional_cond)
Returns the next sibling of the element, or the first one matching the condition.
prev_sibling($optional_cond)
Returns the previous sibling of the element, or the previous sibling matching the condition.
set_atts ( $optional_atts, @list_of_elt_and_strings)
Sets the content of the element from a list of strings and elements: it cuts all the children of the element, then pastes the list items as the new children. A PCDATA element is created for each string in the list.
The optional_atts argument is a reference to a hash of attributes. If it is given, the previous attributes are deleted; otherwise they are left untouched.
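A sketch of the behaviour described above (setting the content from a list of strings and elements) on a hypothetical node model. One assumption deserves flagging: nodes are blessed into a made-up Sketch::Node class so that the optional attribute hash can be recognised as a plain, unblessed hash ref; how XML::DOM::Twig itself tells the two apart is not documented here.

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM): hash refs blessed into the made-up
# class Sketch::Node, so a plain unblessed hash ref can only be attributes.
sub new_node { my (%f) = @_; return bless { children => [], %f }, 'Sketch::Node' }

sub set_content {
    my ($elt, @list) = @_;
    # Optional leading plain hash ref: replaces all existing attributes.
    $elt->{atts} = { %{ shift @list } } if @list && ref($list[0]) eq 'HASH';
    # Cut every current child: detach it from this element.
    delete $_->{parent} for @{ $elt->{children} };
    $elt->{children} = [];
    # Paste the new list as children; plain strings become PCDATA nodes.
    for my $item (@list) {
        my $node = ref $item ? $item : new_node(type => 'PCDATA', text => $item);
        $node->{parent} = $elt;
        push @{ $elt->{children} }, $node;
    }
    return $elt;
}

my $p = new_node(type => 'ELEMENT', gi => 'p', atts => { id => 'p1' });
set_content($p, { class => 'note' }, 'Hello ', new_node(type => 'ELEMENT', gi => 'b'));
# $p now has attributes { class => 'note' } and two children: PCDATA, <b>
```

Without a leading hash ref, for example set_content($p, 'text'), the attributes are left as they were.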
set_content
paste ($optional_position, $ref)
Pastes a (previously cut or newly generated) element. Dies if the element already belongs to a tree.
The optional position argument can be:
The element is pasted as the first child of the element object this method is called on.
The element is pasted as the last child of the element object this method is called on.
The element is pasted before the element object, as its previous sibling.
The element is pasted after the element object, as its next sibling.
In this case an extra argument, $offset, should be supplied. The element will be pasted in the reference element (or in its first text child) at the given offset. To achieve this the reference element will be split at the offset.
cut
Cuts the element from the tree. The element still exists and can be copied or pasted somewhere else; it is simply no longer attached to the tree.
children($optional_cond)
Returns the list of children of the element (optionally only those matching the condition), in document order.
next_elt($optional_elt, $optional_cond)
Returns the next element (optionally matching the condition) after the current one. This is defined as the next element that opens after the current element opens, which usually means the first child of the element. Counter-intuitive as it may look, this lets you loop through the whole document by starting from the root.
The $optional_elt argument is the root of a subtree. When the next element would fall outside the subtree, the method returns undef. You can then walk a subtree with:
my $elt= $subtree_root;
while( $elt= $elt->next_elt( $subtree_root))
{ # insert processing code here
}
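To make the document-order definition concrete, here is a sketch of next_elt on a hypothetical hash-based node model (not XML::DOM). It omits the $optional_cond argument and only implements the subtree bound: descend to the first child if there is one, otherwise move to the next sibling of the node or of its nearest ancestor that has one, stopping when the subtree root is reached.

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM): hash refs with gi, parent, children.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { gi => $gi, children => [] };
    for my $k (@kids) { $k->{parent} = $e; push @{ $e->{children} }, $k }
    return $e;
}

# Next sibling of $node, or undef if it is the last child (or has no parent).
sub next_sibling {
    my ($node) = @_;
    my $p = $node->{parent} or return;
    my $kids = $p->{children};
    for my $i (0 .. $#{$kids} - 1) {
        return $kids->[ $i + 1 ] if $kids->[$i] == $node;
    }
    return;
}

# Next element to open after $node opens, never leaving the subtree of $root.
sub next_elt {
    my ($node, $root) = @_;
    return $node->{children}[0] if @{ $node->{children} };
    while ($node) {
        return if defined $root && $node == $root;   # subtree exhausted
        my $sib = next_sibling($node);
        return $sib if $sib;
        $node = $node->{parent};                     # climb and retry
    }
    return;                                          # end of document
}

my $doc = elt('doc', elt('sec', elt('p'), elt('p')), elt('sec', elt('p')));
my @order;
for (my $e = $doc; $e; $e = next_elt($e)) { push @order, $e->{gi} }
print "@order\n";   # doc sec p p sec p
```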
prev_elt ($optional_cond)
Returns the previous element (optionally matching the condition). This is the first element that opens before the current one: usually either the last descendant of the previous sibling or simply the parent.
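A sketch of prev_elt on a hypothetical hash-based node model (not XML::DOM), without the condition argument: if the element has a previous sibling, return that sibling's last descendant (or the sibling itself when it has no children), otherwise return the parent.

```perl
use strict;
use warnings;

# Hypothetical node model (NOT XML::DOM): hash refs with gi, parent, children.
sub elt {
    my ($gi, @kids) = @_;
    my $e = { gi => $gi, children => [] };
    for my $k (@kids) { $k->{parent} = $e; push @{ $e->{children} }, $k }
    return $e;
}

# Previous sibling of $node, or undef if it is the first child.
sub prev_sibling {
    my ($node) = @_;
    my $p = $node->{parent} or return;
    my $kids = $p->{children};
    for my $i (1 .. $#{$kids}) {
        return $kids->[ $i - 1 ] if $kids->[$i] == $node;
    }
    return;
}

# Deepest last descendant of $node ($node itself if it has no children).
sub last_descendant {
    my ($node) = @_;
    $node = $node->{children}[-1] while @{ $node->{children} };
    return $node;
}

# Element that opened just before $node: the last descendant of the previous
# sibling if there is one, otherwise the parent (undef at the root).
sub prev_elt {
    my ($node) = @_;
    my $sib = prev_sibling($node);
    return $sib ? last_descendant($sib) : $node->{parent};
}

my $last = elt('p');
my $doc  = elt('doc', elt('sec', elt('p'), elt('p')), elt('sec', $last));
my @order;
for (my $e = $last; $e; $e = prev_elt($e)) { push @order, $e->{gi} }
print "@order\n";   # p sec p p sec doc
```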
get_xpath ($xpath_like_expression, $optional_offset)
Returns a list of elements satisfying the $xpath_like_expression.
A subset of the XPATH abbreviated syntax is covered:
gi
gi[1] (or any other positive number)
gi[last()]
gi[@att] (the attribute exists for the element)
gi[@att="val"]
gi[att1="val1" and att2="val2"]
gi[att1="val1" or att2="val2"]
gi[string()="toto"] (returns gi elements whose text is toto)
gi[string()=~/regexp/] (returns gi elements whose text matches regexp)
expressions can start with / (search starts at the document root)
expressions can start with . (search starts at the current element, optional)
// can be used to get all descendants instead of just direct children
* matches any gi
So the following examples from the XPATH recommendation (http://www.w3.org/TR/xpath.html#path-abbrev) work:
para selects the para element children of the context node
* selects all element children of the context node
para[1] selects the first para child of the context node
para[last()] selects the last para child of the context node
*/para selects all para grandchildren of the context node
/doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc
chapter//para selects the para element descendants of the chapter element children of the context node
//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
//olist/item selects all the item elements in the same document as the context node that have an olist parent
.//para selects the para element descendants of the context node
.. selects the parent of the context node
para[@type="warning"] selects all para children of the context node that have a type attribute with value warning
employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute
The elements will be returned in the document order.
If $optional_offset is used, only one element is returned: the one at that offset in the list, starting at 0.
Quoting and interpolating variables can be a pain when the Perl syntax and the XPATH syntax collide, so here are some more examples to get you started:
my $p1= "p1";
my $p2= "p2";
my @res= $t->get_xpath( "p[string( '$p1') or string( '$p2')]");
my $v= "a1";
my @res= $t->get_xpath( "//*[\@att=\"$v\"]");
my $val= "a1";
my $exp= "//p[ \@att='$val']"; # you need to use \@ or you will get a warning
my @res= $t->get_xpath( $exp);
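The quoting rules can be checked without a document at all. This sketch only builds the expression strings (the variable names are illustrative) and shows three equivalent ways to get a literal @ and quotes into them; each string could then be passed to get_xpath as in the examples above.

```perl
use strict;
use warnings;

my $val = 'a1';

# \@ stops Perl from interpolating an array called @att;
# \" puts a literal double quote inside a double-quoted Perl string.
my $exp1 = "//*[\@att=\"$val\"]";

# Single quotes inside the Perl string need no escaping.
my $exp2 = "//p[\@att='$val']";

# qq{} changes the Perl delimiter, so the XPath double quotes are literal.
my $exp3 = qq{//*[\@att="$val"]};

print "$exp1\n$exp2\n$exp3\n";
# //*[@att="a1"]
# //p[@att='a1']
# //*[@att="a1"]
```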
XML::DOM::Twig does not provide full XPATH support. If that's what you want then look no further than the XML::XPath module on CPAN.
find_nodes
alias to get_xpath
Most of the navigation functions accept a condition as an optional argument. The first element (or all elements, for children or ancestors) that passes the condition is returned.
The condition can be
return a "real" element (not a PCDATA, CDATA, comment or pi element)
return a PCDATA or CDATA element
return an element whose gi is equal to the string
return an element whose gi matches the regexp. The regexp has to be created with qr// (hence this is available only on perl 5.005 and above)
applies the code, passing the current element as argument: if the code returns true, the element is returned; if it returns false, the code is applied to the next candidate.
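A sketch of how such a condition could be evaluated. The labels of the first two items are missing from this page; the code assumes the strings '#ELT' and '#TEXT', as used by XML::Twig, and a hypothetical hash-based node model, so treat it as an illustration rather than the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical node model: hash refs with a type key ('ELEMENT', 'PCDATA',
# 'CDATA', 'COMMENT', ...) and, for elements, a gi key.
# '#ELT' and '#TEXT' are assumed spellings (borrowed from XML::Twig).
sub passes {
    my ($node, $cond) = @_;
    my $type   = $node->{type};
    my $is_elt = $type eq 'ELEMENT';
    return $cond->($node) ? 1 : 0 if ref($cond) eq 'CODE';
    return $is_elt && $node->{gi} =~ $cond ? 1 : 0 if ref($cond) eq 'Regexp';
    return $is_elt ? 1 : 0 if $cond eq '#ELT';
    return $type eq 'PCDATA' || $type eq 'CDATA' ? 1 : 0 if $cond eq '#TEXT';
    return $is_elt && $node->{gi} eq $cond ? 1 : 0;   # plain gi string
}

# Return the first candidate that passes, as a navigation method would.
sub first_passing {
    my ($cond, @candidates) = @_;
    for my $node (@candidates) {
        return $node if passes($node, $cond);
    }
    return;
}

my @kids = (
    { type => 'PCDATA',  text => "\n  " },
    { type => 'COMMENT', text => 'draft' },
    { type => 'ELEMENT', gi   => 'title' },
    { type => 'ELEMENT', gi   => 'para' },
);
print first_passing('#ELT', @kids)->{gi}, "\n";    # title
print first_passing(qr/^pa/, @kids)->{gi}, "\n";   # para
```

Note how '#ELT' skips both the whitespace PCDATA and the comment, which is exactly the case where the plain DOM sibling methods tend to return the "wrong" node.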
adds a prefix to an element
new_elt
children
descendants
get_xpath
Michel Rodriguez <m.v.rodriguez@ieee.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Bug reports and comments to m.v.rodriguez@ieee.org.
The XML::DOM::Twig web site is http://www.xmltwig.com/domtwig/
XML::Parser XML::DOM XML::Twig
The XML::Twig page is at http://www.xmltwig.com/xmltwig/ and includes examples and a tutorial at http://www.xmltwig.com/xmltwig/tutorial/index.html