pugixml documentation
pugixml is just another XML parser. It is a successor to pugxml (well, to be honest, the only part left as is is the wildcard matching code; the rest was either heavily refactored or rewritten from scratch). The main features are:
Okay, you might ask: what's the catch? Everything is so cute: it's a small, fast, robust, clean solution for parsing XML. What is missing? Well, we are fair developers, so here is a list of misfeatures:
Here is a small collection of code snippets to help the reader start using pugixml.
For everything you can do with pugixml, you need a document. There are several ways to obtain it:
```cpp
#include <cstring>   // strcpy
#include <fstream>
#include <iostream>
#include "pugixml.hpp"

using namespace std;
using namespace pugi;

int main()
{
    // Several ways to get an XML document
    {
        // Load from string
        xml_document doc;
        cout << doc.load("<sample-xml>some text <b>in bold</b> here</sample-xml>") << endl;
    }

    {
        // Load from file
        xml_document doc;
        cout << doc.load_file("sample.xml") << endl;
    }

    {
        // Load from any input stream (STL)
        xml_document doc;
        std::ifstream in("sample.xml");
        cout << doc.load(in) << endl;
    }

    {
        // More advanced: parse the specified string without duplicating it;
        // the document takes ownership of the buffer
        xml_document doc;
        char* s = new char[100];
        strcpy(s, "<sample-xml>some text <b>in bold</b> here</sample-xml>");
        cout << doc.parse(transfer_ownership_tag(), s) << endl;
    }

    {
        // Even more advanced: assume manual lifetime control
        xml_document doc;
        char* s = new char[100];
        strcpy(s, "<sample-xml>some text <b>in bold</b> here</sample-xml>");
        cout << doc.parse(s) << endl;
        delete[] s; // <-- after this point, all string contents of the document are invalid!
    }

    {
        // Or just create a document from code
        xml_document doc;
        // add nodes to the document (see next samples)
    }
}
```
This sample should print a series of 1s, one per line, meaning that all load/parse functions returned true (of course, if sample.xml does not exist or is malformed, the corresponding lines will show 0).
Once you have your document, there are several ways to extract data from it.
```cpp
#include <iostream>
#include "pugixml.hpp"

using namespace std;
using namespace pugi;

struct bookstore_traverser: public xml_tree_walker
{
    virtual bool for_each(xml_node& n)
    {
        for (int i = 0; i < depth(); ++i) cout << "  "; // indentation

        if (n.type() == node_element) cout << n.name() << endl;
        else cout << n.value() << endl;

        return true; // continue traversal
    }
};

int main()
{
    xml_document doc;
    doc.load("<bookstore><book title='ShaderX'><price>3</price></book><book title='GPU Gems'><price>4</price></book></bookstore>");

    // If you want to iterate through nodes...
    {
        // Get a bookstore node
        xml_node bookstore = doc.child("bookstore");

        // Iterate through books
        for (xml_node book = bookstore.child("book"); book; book = book.next_sibling("book"))
        {
            cout << "Book " << book.attribute("title").value() << ", price " << book.child("price").first_child().value() << endl;
        }

        // Output:
        // Book ShaderX, price 3
        // Book GPU Gems, price 4
    }

    {
        // Alternative way to get a bookstore node (wildcards)
        xml_node bookstore = doc.child_w("*[sS]tore"); // this will select bookstore, anyStore, Store, etc.

        // Iterate through books with STL-compatible iterators
        for (xml_node::iterator it = bookstore.begin(); it != bookstore.end(); ++it)
        {
            // Note the use of the helper function child_value()
            cout << "Book " << it->attribute("title").value() << ", price " << it->child_value("price") << endl;
        }

        // Output:
        // Book ShaderX, price 3
        // Book GPU Gems, price 4
    }

    {
        // You can also traverse the whole tree (or a subtree)
        bookstore_traverser t;

        doc.traverse(t);

        // Output (each line is indented according to its depth):
        // bookstore
        // book
        // price
        // 3
        // book
        // price
        // 4

        doc.first_child().traverse(t);

        // Output (each line is indented according to its depth):
        // book
        // price
        // 3
        // book
        // price
        // 4
    }

    // If you want a distinct node...
    {
        // You can specify the way to it through child() functions
        cout << doc.child("bookstore").child("book").next_sibling().attribute("title").value() << endl;

        // Output:
        // GPU Gems

        // You can use a sometimes convenient path function
        cout << doc.first_element_by_path("bookstore/book/price").child_value() << endl;

        // Output:
        // 3

        // And you can use powerful XPath expressions
        cout << doc.select_single_node("/bookstore/book[@title = 'ShaderX']/price").node().child_value() << endl;

        // Output:
        // 3

        // Of course, XPath is much more powerful

        // Compile a query that computes the total price of all Gems books in the store
        xpath_query query("sum(/bookstore/book[contains(@title, 'Gems')]/price)");

        cout << query.evaluate_number(doc) << endl;

        // Output:
        // 4

        // You can apply the same XPath query to any document. For example, let's add another Gems
        // book (more details about modifying the tree in the next sample):
        xml_node book = doc.child("bookstore").append_child();
        book.set_name("book");
        book.append_attribute("title") = "Game Programming Gems 2";

        xml_node price = book.append_child();
        price.set_name("price");

        xml_node price_text = price.append_child(node_pcdata);
        price_text.set_value("5.3");

        // Now let's reevaluate the query
        cout << query.evaluate_number(doc) << endl;

        // Output:
        // 9.3
    }
}
```
Finally, let's look at tree modification and saving in more detail.
```cpp
#include <iostream>
#include "pugixml.hpp"

using namespace std;
using namespace pugi;

int main()
{
    // For this example, we'll start with an empty document and create nodes in it from code
    xml_document doc;

    // Append several children and set values/names at once
    doc.append_child(node_comment).set_value("This is a test comment");
    doc.append_child().set_name("application");

    // Let's add a few modules
    xml_node application = doc.child("application");

    // Save node wrapper for convenience
    xml_node module_a = application.append_child();
    module_a.set_name("module");

    // Add an attribute, immediately setting its value
    module_a.append_attribute("name").set_value("A");

    // You can use operator=
    module_a.append_attribute("folder") = "/work/app/module_a";

    // Or even assign numbers
    module_a.append_attribute("status") = 85.4;

    // Let's add another module
    xml_node module_c = application.append_child();
    module_c.set_name("module");
    module_c.append_attribute("name") = "C";
    module_c.append_attribute("folder") = "/work/app/module_c";

    // Oh, we missed module B. Not a problem, let's insert it before module C
    xml_node module_b = application.insert_child_before(node_element, module_c);
    module_b.set_name("module");
    module_b.append_attribute("folder") = "/work/app/module_b";

    // We can do the same thing for attributes
    module_b.insert_attribute_before("name", module_b.attribute("folder")) = "B";

    // Let's add some text in module A
    module_a.append_child(node_pcdata).set_value("Module A description");

    // Well, there's not much left to do here. Let's output our document to file using several formatting options
    doc.save_file("sample_saved_1.xml");

    // Contents of file sample_saved_1.xml (tab size = 4):
    // <?xml version="1.0"?>
    // <!--This is a test comment-->
    // <application>
    //     <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
    //     <module name="B" folder="/work/app/module_b" />
    //     <module name="C" folder="/work/app/module_c" />
    // </application>

    // Let's use two spaces for indentation instead of a tab character
    doc.save_file("sample_saved_2.xml", "  ");

    // Contents of file sample_saved_2.xml:
    // <?xml version="1.0"?>
    // <!--This is a test comment-->
    // <application>
    //   <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
    //   <module name="B" folder="/work/app/module_b" />
    //   <module name="C" folder="/work/app/module_c" />
    // </application>

    // Let's save a raw XML file
    doc.save_file("sample_saved_3.xml", "", format_raw);

    // Contents of file sample_saved_3.xml:
    // <?xml version="1.0"?><!--This is a test comment--><application><module name="A" folder="/work/app/module_a" status="85.4">Module A description</module><module name="B" folder="/work/app/module_b" /><module name="C" folder="/work/app/module_c" /></application>

    // Finally, you can print a subtree to any output stream (including cout)
    xml_writer_stream writer(cout);
    doc.child("application").child("module").print(writer);

    // Output:
    // <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
}
```
Note that these examples do not cover the whole pugixml API. For further information, see the reference section.
pugixml is a library for parsing XML files: you give it XML data in some way, and it gives you a DOM tree along with ways to traverse it and extract useful information from it. The library source consists of two headers, pugixml.hpp and pugiconfig.hpp, and two source files, pugixml.cpp and pugixpath.cpp. You can either compile the .cpp files as part of your project or build a static library. All library classes reside in namespace pugi, so you can either use fully qualified names (pugi::xml_node) or write a using directive or declaration (using namespace pugi; or using pugi::xml_node;) and use plain names. All classes have either an xml_ or an xpath_ prefix.
By default, it is assumed that you compile the source files with your project (add them to your project, add the relevant entries to your Makefile, or do whatever your build environment requires). The library is written in standard-conformant C++ and was tested on the following platforms:
The documentation for pugixml classes, functions and constants is available here.
pugixml is not a compliant XML parser. The main reason is that it does not reject most malformed XML files. A more or less complete list of incompatibilities follows (it applies to the parse_w3c mode):
This table compares the time and memory consumption of pugixml and other parsers. For DOM parsers (all except Expat, irrXML and the SAX parser of XercesC), the process is as follows:
For SAX parsers, the parse step is skipped (hence the N/A in the relevant table cells), and the structure is filled during the 'walk' step.
For all parsers, the 'total time' column is the total time spent on the whole process, 'total allocs' is the total allocation count, and 'total memory' is the peak memory consumption for the whole process.
The tests were performed on a 1 MB XML file with a small amount of text. They were compiled with the Microsoft Visual C++ 8.0 (2005) compiler in Release mode, with checked iterators/secure STL turned off. The test system is an AMD Sempron 2500+ with 512 MB of RAM.
parser | parse time (Mclocks) | parse allocs | parse memory (KB) | walk time (Mclocks) | walk allocs | total time (Mclocks) | total allocs | total memory (KB) |
---|---|---|---|---|---|---|---|---|
irrXML | N/A | N/A | N/A | 352 | 697 245 | 356 | 697 284 | 906 |
Expat | N/A | N/A | N/A | 97 | 19 | 97 | 23 | 1028 |
TinyXML | 168 | 50 163 | 5447 | 37 | 0 | 242 | 50 163 | 5447 |
PugXML | 100 | 106 597 | 2747 | 38 | 0 | 206 | 131 677 | 2855 |
XercesC SAX | N/A | N/A | N/A | 411 | 70 380 | 411 | 70 495 | 243 |
XercesC DOM | 300 | 30 491 | 9251 | 65 | 1 | 367 | 30 492 | 9251 |
pugixml | 17 | 40 | 2154 | 14 | 0 | 32 | 40 | 2154 |
pugixml (test of non-destructive parsing) | 12 | 51 | 1632 | 21 | 0 | 34 | 51 | 1632 |
Note that the non-destructive parsing mode was only an experiment and is not yet available in pugixml.
Q: I do not have/want STL support. How can I compile pugixml without STL?
A: There is an undocumented define, PUGIXML_NO_STL. If you uncomment the relevant line in the pugixml header file, the library compiles without any STL classes. It is undocumented because it makes some documented functions unavailable (specifically, xml_document::load that operates on std::istream, the xml_node::path function, XPath-related functions and classes, and the as_utf16/as_utf8 conversion functions). Otherwise, it works fine.
Q: Do paths that are accepted by first_element_by_path have to end with delimiter?
A: Either way works; both /path/to/node/ and /path/to/node are fine.
I'm always open to questions; feel free to send them to arseny.kapoulkine@gmail.com.
I'm always open to bug reports; feel free to send them to arseny.kapoulkine@gmail.com. Please provide as much information as possible: the version of pugixml, your compiling and OS environment (compiler and its version, STL version, OS version, etc.), a description of the situation in which the bug arises, the code and data files that reproduce the bug, etc. The more, the better. However, please do not send executable files.
Note that you can also submit bug reports and suggestions on the project page.
Here are some improvements that will be made in future versions (they are sorted by priority; the ones at the top will arrive sooner).
The pugixml parser is distributed under the MIT license:
Copyright (c) 2006-2010 Arseny Kapoulkine Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Revised 25 May, 2010
© Copyright Arseny Kapoulkine 2006-2010. All Rights Reserved.