Simple Perl XML Benchmark
This article presents a simple benchmark using various Perl XML modules (and some non-Perl solutions).
You will need a whole bunch of modules installed to test all of them. The test framework itself needs Getopt::Long, Benchmark, Text::Diff, XML::SemanticDiff and XML::Twig (and probably some more I have forgotten!). In order to run the XSLT examples you will need libxml2 installed (which you will also need to run all the XML::LibXML-based examples).
Remember that performance is just one (small) part of why you choose a solution: ease of use and power are also very important (for example, the regexp-based solutions are not generic and would not work for different XML files).
The tests
4 different tests are performed:
- nothing: just parse the document and output it back,
- extract: extract the content (text only) of all elements with a given name (message),
- replace: prefix the content of all elements with a given name (message) with a text (including the element number), then output the document back,
- complex: an element (process) has an attribute action used to perform an action on the element: delete, duplicate, change_tag (for a new, fixed, one), erase (the tags, not their content), prefix (with a new element) or add_att (add a fixed new attribute); the document is then output back as XML.
Not all modules are used for all tests, but hey... there's already a good number of them.
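To make the extract and replace tests concrete, here is a minimal sketch using XML::LibXML. The miniature document, the helper names (extract_messages, prefix_messages) and the "N: " prefix format are assumptions made for illustration; the actual scripts in the tarball may well do this differently.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature document; the real test.xml comes from gen_benchmark.
my $xml = '<doc><message>first</message><other>skip</other><message>second</message></doc>';

# extract: collect the text content of every message element, in document order
sub extract_messages {
    my ($doc) = @_;
    return map { $_->textContent } $doc->findnodes('//message');
}

# replace: prefix the text of each message with its 1-based position
# (the "N: " format is an assumption, not the benchmark's actual prefix)
sub prefix_messages {
    my ($doc) = @_;
    my $n = 0;
    for my $message ($doc->findnodes('//message')) {
        $n++;
        my $text = $message->textContent;
        $message->removeChildNodes;
        $message->appendText("$n: $text");
    }
    return $doc;
}

my $doc      = XML::LibXML->load_xml(string => $xml);
my @messages = extract_messages($doc);
print "$_\n" for @messages;

prefix_messages($doc);
my $output = $doc->documentElement->toString;
print "$output\n";
```

Because the whole tree is loaded in memory, this style is convenient but uses more memory than the stream-based solutions, which is consistent with the sizes reported for libxml in the results.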
The size is computed by taking the process size (VmSize from /proc/$$/status) in an END block.
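The size measurement could look roughly like the sketch below. It is Linux-specific, the helper name vm_size_kb is hypothetical, and the message printed when the file cannot be read is an assumption; the real framework may report the value differently.

```perl
use strict;
use warnings;

# Return the VmSize of a process in kB (default: the current process),
# or undef when /proc/<pid>/status cannot be read (e.g. on non-Linux systems).
sub vm_size_kb {
    my ($pid) = @_;
    $pid = $$ unless defined $pid;
    open my $fh, '<', "/proc/$pid/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
    }
    return;
}

# The END block runs just before the interpreter exits, once the work is done,
# so the figure reflects the process after parsing and output.
END {
    my $size = vm_size_kb();
    print STDERR defined $size ? "size: $size kB\n" : "size not available\n";
}
```

Note that VmSize is the total virtual memory of the process, so it includes the perl interpreter and any loaded libraries, not just the parsed document.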
Conclusions
No conclusion yet... but you can draw your own ;--)
Just note that I am not advocating any of the tested solutions. They all work for the problems at hand (except that XML::Simple properly extracts all messages but does not return them in the right order), but that's all. In particular, a number of them (mostly those based on regexps, but some of the SAX and XML::Parser ones too) depend on the original XML file being very simple: no entities, no non-7-bit-ascii characters, no comments, PIs, CDATA sections, no DTD, no namespaces, no > in attribute values (which is legal!), no nested message or process elements...
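The fragility of the regexp approach is easy to demonstrate. The sketch below uses a deliberately naive pattern (an assumption, not the regexp used in the benchmark) and shows it working on simple input, then going wrong on a > in an attribute value and on a CDATA section.

```perl
use strict;
use warnings;

# Naive regexp-based extraction: it assumes that a start tag ends at the
# first '>' and that the element content contains no markup at all.
sub extract_naive {
    my ($xml) = @_;
    my @texts = $xml =~ m{<message[^>]*>([^<]*)</message>}g;
    return @texts;
}

# Simple input: works as expected
my @simple = extract_naive('<doc><message>hello</message></doc>');

# Legal '>' inside an attribute value: the tag is cut short and the
# rest of the attribute leaks into the extracted text
my @tricky = extract_naive('<doc><message note="a > b">hello</message></doc>');

# CDATA section: the pattern does not match, so the message is silently lost
my @cdata = extract_naive('<doc><message><![CDATA[a < b]]></message></doc>');

printf "simple: [%s]\n", join '|', @simple;
printf "tricky: [%s]\n", join '|', @tricky;
printf "cdata:  [%s]\n", join '|', @cdata;
```

A real parser handles all three inputs correctly, which is part of what the extra CPU time in the results pays for.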
Contributors
I would like to thank Alberto Simões for contributing the XML::DT examples (in record time!) and Robin Berjon, Matt Sergeant and Barrie Slaymaker for their comments.
TODO
- comment the examples, especially the non-obvious ones
- add more!
Instructions
- download simple_benchmark.tar.gz,
- tar zxvf simple_benchmark.tar.gz
- perl run_all
- if you want to test XML::SAX::PurePerl, you can add the --slow option to run_all; just be aware that it might take a while...
The test file (by default test.xml) is generated by gen_benchmark; its size is about 3 MB. You can tweak gen_benchmark to generate other sizes and other types of documents.
Note that you will need a whole bunch of modules and libraries installed to run all the tests:
- external libraries: expat, libxml2 and libxslt,
- XML modules: XML::DOM, XML::Filter::BufferText, XML::Filter::Dispatcher, XML::LibXML, XML::LibXML::SAX, XML::Parser, XML::Parser::Lite, XML::SAX::Base, XML::SAX::Expat, XML::SAX::Machines, XML::SAX::PurePerl, XML::SAX::Writer, XML::Simple, XML::TreeBuilder, XML::Twig,
- modules used by run_all: Getopt::Long, Benchmark, Text::Diff, Digest::MD5, XML::SemanticDiff, File::stat, Number::Format, File::ReadBackwards, Memoize, YAML, File::Slurp.
Results
Results on my machine:
Module versions: XML::DOM 1.43 - XML::DT 0.24 - XML::Filter::BufferText 1.01 - XML::Filter::Dispatcher 0.52 - XML::LibXML has no version in module - XML::LibXML::SAX 1.00 - XML::Parser 2.34 - XML::Parser::Lite 0.55 - XML::SAX::Base 1.04 - XML::SAX::Expat 0.35 - XML::SAX::Machines 0.4 - XML::SAX::PurePerl 0.90 - XML::SAX::Writer 0.44 - XML::SemanticDiff 0.95 - XML::Simple 2.12 - XML::TreeBuilder 3.08 - XML::Twig 3.16
XML document test.xml (3.03M)
test | status | time | process size |
read and output the document
perl | OK | 0 wallclock secs ( 0.06 cusr 0.06 csys = 0.12 CPU) | 8 444 kB |
xmllint | OK | 0 wallclock secs ( 0.28 cusr 0.03 csys = 0.32 CPU) | size na |
xslt | OK | 1 wallclock secs ( 0.51 cusr 0.10 csys = 0.61 CPU) | size na |
libxml | OK | 1 wallclock secs ( 0.55 cusr 0.08 csys = 0.64 CPU) | 19 404 kB |
parser | OK | 1 wallclock secs ( 0.67 cusr 0.04 csys = 0.71 CPU) | 6 788 kB |
parser_lite | OK | 1 wallclock secs ( 0.94 cusr 0.09 csys = 1.03 CPU) | 8 636 kB |
parser_stream | OK | 1 wallclock secs ( 0.99 cusr 0.05 csys = 1.04 CPU) | 6 792 kB |
tree_builder | OK | 2 wallclock secs ( 2.45 cusr 0.07 csys = 2.53 CPU) | 23 392 kB |
twig | OK | 3 wallclock secs ( 2.88 cusr 0.09 csys = 2.98 CPU) | 22 040 kB |
dt | OK | 3 wallclock secs ( 2.83 cusr 0.22 csys = 3.06 CPU) | 43 180 kB |
sax_base_libxml | OK | 6 wallclock secs ( 5.51 cusr 0.10 csys = 5.62 CPU) | 10 964 kB |
sax_base_expat | OK | 7 wallclock secs ( 6.78 cusr 0.14 csys = 6.93 CPU) | 9 064 kB |
xml_pp | OK | 9 wallclock secs ( 8.19 cusr 0.18 csys = 8.37 CPU) | size na |
sax_base_pureperl | OK | 33 wallclock secs ( 30.38 cusr 0.31 csys = 30.69 CPU) | 10 376 kB |
extracting the text of all message elements
regexp | OK | 0 wallclock secs ( 0.10 cusr 0.04 csys = 0.14 CPU) | 8 356 kB |
xslt | OK | 1 wallclock secs ( 0.24 cusr 0.06 csys = 0.31 CPU) | size na |
parser | OK | 1 wallclock secs ( 0.37 cusr 0.03 csys = 0.40 CPU) | 6 788 kB |
libxml | OK | 1 wallclock secs ( 0.37 cusr 0.06 csys = 0.43 CPU) | 16 708 kB |
parser_lite | OK | 1 wallclock secs ( 0.85 cusr 0.04 csys = 0.89 CPU) | 8 636 kB |
parser_stream | OK | 1 wallclock secs ( 0.89 cusr 0.05 csys = 0.94 CPU) | 6 796 kB |
sax_libxml | OK | 1 wallclock secs ( 1.17 cusr 0.01 csys = 1.18 CPU) | 9 856 kB |
sax_base_libxml | OK | 1 wallclock secs ( 1.17 cusr 0.02 csys = 1.19 CPU) | 9 908 kB |
tree_builder | OK | 1 wallclock secs ( 1.20 cusr 0.05 csys = 1.26 CPU) | 16 244 kB |
twig | OK | 2 wallclock secs ( 1.50 cusr 0.05 csys = 1.56 CPU) | 11 108 kB |
xml_grep | OK | 1 wallclock secs ( 1.66 cusr 0.06 csys = 1.73 CPU) | size na |
sax_base_expat | OK | 3 wallclock secs ( 2.36 cusr 0.09 csys = 2.45 CPU) | 8 104 kB |
sax_expat | OK | 3 wallclock secs ( 2.41 cusr 0.06 csys = 2.47 CPU) | 8 100 kB |
filter_dispatcher | OK | 2 wallclock secs ( 2.45 cusr 0.05 csys = 2.50 CPU) | size na |
simple | NOK | 3 wallclock secs ( 2.53 cusr 0.09 csys = 2.62 CPU) | 22 116 kB |
dt | OK | 3 wallclock secs ( 2.73 cusr 0.15 csys = 2.88 CPU) | 37 772 kB |
dom | OK | 3 wallclock secs ( 3.60 cusr 0.12 csys = 3.72 CPU) | 41 140 kB |
sax_base_pureperl | OK | 23 wallclock secs ( 22.80 cusr 0.09 csys = 22.89 CPU) | 9 236 kB |
prefixing the text of all message elements with the message number
regexp | OK | 1 wallclock secs ( 0.16 cusr 0.06 csys = 0.22 CPU) | 14 680 kB |
libxml | OK | 1 wallclock secs ( 0.65 cusr 0.09 csys = 0.75 CPU) | 19 528 kB |
parser_lite | OK | 1 wallclock secs ( 0.94 cusr 0.09 csys = 1.03 CPU) | 8 636 kB |
parser_stream | OK | 2 wallclock secs ( 1.11 cusr 0.05 csys = 1.16 CPU) | 6 796 kB |
twig | OK | 2 wallclock secs ( 2.00 cusr 0.18 csys = 2.19 CPU) | 11 228 kB |
tree_builder | OK | 3 wallclock secs ( 2.61 cusr 0.05 csys = 2.67 CPU) | 23 480 kB |
dt | OK | 3 wallclock secs ( 2.91 cusr 0.14 csys = 3.06 CPU) | 43 680 kB |
sax_base_libxml | OK | 5 wallclock secs ( 5.57 cusr 0.15 csys = 5.73 CPU) | 10 968 kB |
xslt | OK | 6 wallclock secs ( 5.97 cusr 0.12 csys = 6.10 CPU) | size na |
dom | OK | 6 wallclock secs ( 6.01 cusr 0.20 csys = 6.22 CPU) | 44 176 kB |
filter_dispatcher | OK | 9 wallclock secs ( 9.20 cusr 0.18 csys = 9.38 CPU) | size na |
complex transformation
regexp | OK | 0 wallclock secs ( 0.22 cusr 0.16 csys = 0.39 CPU) | 27 072 kB |
xslt | OK | 1 wallclock secs ( 0.58 cusr 0.04 csys = 0.63 CPU) | size na |
libxml | OK | 1 wallclock secs ( 0.62 cusr 0.08 csys = 0.70 CPU) | 20 172 kB |
tree_builder | OK | 3 wallclock secs ( 2.57 cusr 0.11 csys = 2.68 CPU) | 23 308 kB |
twig_small | OK | 3 wallclock secs ( 2.97 cusr 0.10 csys = 3.07 CPU) | 18 968 kB |
dt_compact | OK | 3 wallclock secs ( 2.94 cusr 0.16 csys = 3.11 CPU) | 43 804 kB |
twig_smallest | OK | 4 wallclock secs ( 3.07 cusr 0.05 csys = 3.12 CPU) | 10 200 kB |
dt | OK | 3 wallclock secs ( 2.98 cusr 0.18 csys = 3.16 CPU) | 44 296 kB |
twig | OK | 4 wallclock secs ( 3.39 cusr 0.10 csys = 3.49 CPU) | 22 156 kB |
libxml_by_step | OK | 4 wallclock secs ( 3.37 cusr 0.44 csys = 3.81 CPU) | 30 200 kB |
sax_base_libxml | OK | 7 wallclock secs ( 6.85 cusr 0.09 csys = 6.94 CPU) | 12 728 kB |
Notes:
[1] In the extraction test XML::Simple returns all the messages, but not in the right order (hence the NOK status for simple).
[2] The version of XML::SAX::PurePerl is the CVS version, which performs about 3 times faster than the CPAN one.
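The times in the tables look like the strings produced by timestr from the core Benchmark module, with only child-process times (cusr, csys) shown, which suggests that each test script is run as a separate process. The sketch below shows one way to get such a figure; the helper name time_command and the timed one-liner are hypothetical, and the exact string format varies between Benchmark versions.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

# Time an external command. Its CPU usage is accounted as child time
# (cusr/csys) in the parent once system() has waited for it.
sub time_command {
    my (@command) = @_;
    my $start = Benchmark->new;
    system(@command);
    my $end = Benchmark->new;
    # the 'nop' ("no parent") style reports only the child-process times
    return timestr(timediff($end, $start), 'nop');
}

# $^X is the running perl; the one-liner is a stand-in for a real test script
my $report = time_command($^X, '-e', '1 for 1 .. 100_000');
print "$report\n";
```

Running each test in its own process keeps the timings and the memory footprints of the different modules from interfering with one another.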
updated Mon Nov 15 14:18:58 2004 | Copyright © 2003, Michel Rodriguez