From 26ab424b0302f73704c58b3b6deb62a85bfacba8 Mon Sep 17 00:00:00 2001
From: "arseny.kapoulkine"
pugixml is just another XML parser. This is a successor to
-pugxml (well, to be honest, the only part
-that is left as is is the wildcard matching code; the rest was either heavily refactored or rewritten
-from scratch). The main features are:
-
-Okay, you might ask: what's the catch? Everything is so cute: it's a small, fast, robust, clean solution
-for parsing XML. What is missing? OK, we are fair developers, so here is a misfeature list:
-
-Here is a small collection of code snippets to help the reader begin using pugixml.
-
-For everything you can do with pugixml, you need a document. There are several ways to obtain it:
-
-This sample should print a row of 1s, meaning that all load/parse functions returned true (of course, if sample.xml does not exist or is malformed, there will be 0s).
-
-Once you have your document, there are several ways to extract data from it.
-
-Finally, let's get into more details about tree modification and saving.
-
-Note that these examples do not cover the whole pugixml API. For further information, look into the reference section.
-
-pugixml is a library for parsing XML files, which means that you give it XML data in some way,
-and it gives you the DOM tree and the ways to traverse it and to get some useful information from it.
-The library source consists of two headers, pugixml.hpp and pugiconfig.hpp, and two source
-files, pugixml.cpp and pugixpath.cpp.
-You can either compile the cpp files in your project, or build a static library.
-All library classes reside in namespace pugi, so you can either use fully qualified
-names (pugi::xml_node) or write a using directive or declaration (using namespace pugi;, using
-pugi::xml_node;) and use plain names. All classes have either an xml_ or an xpath_ prefix.
-
-By default it is assumed that you compile the source file with your project (add it to your
-project, or add the relevant entry to your Makefile, or do whatever you need to do with your compilation
-environment).
-
-The library is written in standard-conformant C++ and was tested on the following platforms:
-
-
-
-
-
-
- pugixml documentation
-
-Contents
-
-
-
-
-
-
-Introduction
-
-
-
-
-
-
-
-
-1 The tests were done on a 1 Mb XML file with a tree 4 levels deep
-and a small amount of text. The times are those of building the DOM tree. pugixml was run in the default
-parsing mode, so the differences in speed are even bigger with minimal settings.
-2 Obviously, you can't measure the time of building a DOM tree for a
-SAX parser, so the times of reading the data into storage that closely represents the structure of
-an XML file were measured instead.
-
-
-
-
-Quick start
-
-
-
-
-#include <cstring>
-#include <fstream>
-#include <iostream>
-
-#include "pugixml.hpp"
-
-using namespace std;
-using namespace pugi;
-
-int main()
-{
- // Several ways to get XML document
-
- {
- // Load from string
- xml_document doc;
-
- cout << doc.load("<sample-xml>some text <b>in bold</b> here</sample-xml>") << endl;
- }
-
- {
- // Load from file
- xml_document doc;
-
- cout << doc.load_file("sample.xml") << endl;
- }
-
- {
- // Load from any input stream (STL)
- xml_document doc;
-
- std::ifstream in("sample.xml");
- cout << doc.load(in) << endl;
- }
-
- {
- // More advanced: parse the specified string without duplicating it
- xml_document doc;
-
- char* s = new char[100];
- strcpy(s, "<sample-xml>some text <b>in bold</b> here</sample-xml>");
- cout << doc.parse(transfer_ownership_tag(), s) << endl;
- }
-
- {
- // Even more advanced: assume manual lifetime control
- xml_document doc;
-
- char* s = new char[100];
- strcpy(s, "<sample-xml>some text <b>in bold</b> here</sample-xml>");
- cout << doc.parse(s) << endl;
-
- delete[] s; // <-- after this point, all string contents of the document are invalid!
- }
-
- {
- // Or just create document from code?
- xml_document doc;
-
- // add nodes to document (see next samples)
- }
-}
-
-
-
-#include <iostream>
-
-#include "pugixml.hpp"
-
-using namespace std;
-using namespace pugi;
-
-struct bookstore_traverser: public xml_tree_walker
-{
- virtual bool for_each(xml_node& n)
- {
- for (int i = 0; i < depth(); ++i) cout << " "; // indentation
-
- if (n.type() == node_element) cout << n.name() << endl;
- else cout << n.value() << endl;
-
- return true; // continue traversal
- }
-};
-
-int main()
-{
- xml_document doc;
- doc.load("<bookstore><book title='ShaderX'><price>3</price></book><book title='GPU Gems'><price>4</price></book></bookstore>");
-
- // If you want to iterate through nodes...
-
- {
- // Get a bookstore node
- xml_node bookstore = doc.child("bookstore");
-
- // Iterate through books
- for (xml_node book = bookstore.child("book"); book; book = book.next_sibling("book"))
- {
- cout << "Book " << book.attribute("title").value() << ", price " << book.child("price").first_child().value() << endl;
- }
-
- // Output:
- // Book ShaderX, price 3
- // Book GPU Gems, price 4
- }
-
- {
- // Alternative way to get a bookstore node (wildcards)
- xml_node bookstore = doc.child_w("*[sS]tore"); // this will select bookstore, anyStore, Store, etc.
-
- // Iterate through books with STL compatible iterators
- for (xml_node::iterator it = bookstore.begin(); it != bookstore.end(); ++it)
- {
- // Note the use of helper function child_value()
- cout << "Book " << it->attribute("title").value() << ", price " << it->child_value("price") << endl;
- }
-
- // Output:
- // Book ShaderX, price 3
- // Book GPU Gems, price 4
- }
-
- {
- // You can also traverse the whole tree (or a subtree)
- bookstore_traverser t;
-
- doc.traverse(t);
-
- // Output:
- // bookstore
- // book
- // price
- // 3
- // book
- // price
- // 4
-
- doc.first_child().traverse(t);
-
- // Output:
- // book
- // price
- // 3
- // book
- // price
- // 4
- }
-
- // If you want a distinct node...
-
- {
- // You can specify the way to it through child() functions
- cout << doc.child("bookstore").child("book").next_sibling().attribute("title").value() << endl;
-
- // Output:
- // GPU Gems
-
- // You can use a sometimes convenient path function
- cout << doc.first_element_by_path("bookstore/book/price").child_value() << endl;
-
- // Output:
- // 3
-
- // And you can use powerful XPath expressions
- cout << doc.select_single_node("/bookstore/book[@title = 'ShaderX']/price").node().child_value() << endl;
-
- // Output:
- // 3
-
- // Of course, XPath is much more powerful
-
- // Compile a query that computes the total price of all Gems books in the store
- xpath_query query("sum(/bookstore/book[contains(@title, 'Gems')]/price)");
-
- cout << query.evaluate_number(doc) << endl;
-
- // Output:
- // 4
-
- // You can apply the same XPath query to any document. For example, let's add another Gems
- // book (more details about modifying the tree in the next sample):
- xml_node book = doc.child("bookstore").append_child();
- book.set_name("book");
- book.append_attribute("title") = "Game Programming Gems 2";
-
- xml_node price = book.append_child();
- price.set_name("price");
-
- xml_node price_text = price.append_child(node_pcdata);
- price_text.set_value("5.3");
-
- // Now let's reevaluate query
- cout << query.evaluate_number(doc) << endl;
-
- // Output:
- // 9.3
- }
-}
-
-
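The child_w call in the sample above selects nodes by a wildcard pattern such as *[sS]tore. As a rough illustration of how such a pattern can be matched, here is a minimal standalone sketch. Note that wildcard_match is a hypothetical helper written for this example, not pugixml's own matching code, and the supported syntax (*, ? and a simple [set]) is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of pugixml: matches `text` against a glob
// `pattern` supporting '*' (any run of characters, possibly empty),
// '?' (any single character) and '[...]' (one character from the listed set).
bool wildcard_match(const std::string& pattern, const std::string& text,
                    std::size_t p = 0, std::size_t t = 0)
{
    while (p < pattern.size())
    {
        const char c = pattern[p];

        if (c == '*')
        {
            // Try the rest of the pattern at every remaining offset,
            // including the empty remainder at text.size().
            for (std::size_t k = t; k <= text.size(); ++k)
                if (wildcard_match(pattern, text, p + 1, k)) return true;
            return false;
        }

        // Every other pattern element consumes exactly one character.
        if (t >= text.size()) return false;

        if (c == '[')
        {
            const std::size_t close = pattern.find(']', p + 1);
            if (close == std::string::npos) return false; // malformed set: no match

            // text[t] must occur between '[' and ']'.
            if (pattern.find(text[t], p + 1) >= close) return false;
            p = close + 1;
        }
        else
        {
            if (c != '?' && c != text[t]) return false;
            ++p;
        }

        ++t;
    }

    // The pattern is exhausted; succeed only if the text is exhausted too.
    return t == text.size();
}
```

With this sketch, wildcard_match("*[sS]tore", name) accepts bookstore, anyStore and Store, which are the names listed in the sample's comment, and rejects a name such as stored.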
-
-#include <iostream>
-
-#include "pugixml.hpp"
-
-using namespace std;
-using namespace pugi;
-
-int main()
-{
- // For this example, we'll start with an empty document and create nodes in it from code
- xml_document doc;
-
- // Append several children and set values/names at once
- doc.append_child(node_comment).set_value("This is a test comment");
- doc.append_child().set_name("application");
-
- // Let's add a few modules
- xml_node application = doc.child("application");
-
- // Save node wrapper for convenience
- xml_node module_a = application.append_child();
- module_a.set_name("module");
-
- // Add an attribute, immediately setting its value
- module_a.append_attribute("name").set_value("A");
-
- // You can use operator=
- module_a.append_attribute("folder") = "/work/app/module_a";
-
- // Or even assign numbers
- module_a.append_attribute("status") = 85.4;
-
- // Let's add another module
- xml_node module_c = application.append_child();
- module_c.set_name("module");
- module_c.append_attribute("name") = "C";
- module_c.append_attribute("folder") = "/work/app/module_c";
-
- // Oh, we missed module B. Not a problem, let's insert it before module C
- xml_node module_b = application.insert_child_before(node_element, module_c);
- module_b.set_name("module");
- module_b.append_attribute("folder") = "/work/app/module_b";
-
- // We can do the same thing for attributes
- module_b.insert_attribute_before("name", module_b.attribute("folder")) = "B";
-
- // Let's add some text in module A
- module_a.append_child(node_pcdata).set_value("Module A description");
-
- // Well, there's not much left to do here. Let's output our document to file using several formatting options
-
- doc.save_file("sample_saved_1.xml");
-
- // Contents of file sample_saved_1.xml (tab size = 4):
- // <?xml version="1.0"?>
- // <!--This is a test comment-->
- // <application>
- //     <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
- //     <module name="B" folder="/work/app/module_b" />
- //     <module name="C" folder="/work/app/module_c" />
- // </application>
-
- // Let's use two spaces for indentation instead of a tab character
- doc.save_file("sample_saved_2.xml", "  ");
-
- // Contents of file sample_saved_2.xml:
- // <?xml version="1.0"?>
- // <!--This is a test comment-->
- // <application>
- //   <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
- //   <module name="B" folder="/work/app/module_b" />
- //   <module name="C" folder="/work/app/module_c" />
- // </application>
-
- // Let's save a raw XML file
- doc.save_file("sample_saved_3.xml", "", format_raw);
-
- // Contents of file sample_saved_3.xml:
- // <?xml version="1.0"?><!--This is a test comment--><application><module name="A" folder="/work/app/module_a" status="85.4">Module A description</module><module name="B" folder="/work/app/module_b" /><module name="C" folder="/work/app/module_c" /></application>
-
- // Finally, you can print a subtree to any output stream (including cout)
- xml_writer_stream writer(cout);
- doc.child("application").child("module").print(writer);
-
- // Output:
- // <module name="A" folder="/work/app/module_a" status="85.4">Module A description</module>
-}
-
-
-
-Reference
-
-
-
-
The documentation for pugixml classes, functions and constants is available here.
pugixml is not a compliant XML parser. The main reason for that is that it does not reject most malformed XML files. The more or less complete list of incompatibilities follows (I will be talking about the ones that occur in parse_w3c mode):
This table summarizes the comparison in terms of time and memory consumption between pugixml and other parsers. For DOM parsers (all except Expat, irrXML and the SAX parser of XercesC), the process is as follows:
For SAX parsers, the parse step is skipped (hence the N/A in the relevant table cells); the structure is filled during the 'walk' step.
For all parsers, the 'total time' column means the total time spent on the whole process, 'total allocs' is the total allocation count, and 'total memory' is the peak memory consumption for the whole process.
The tests were performed on a 1 Mb XML file with a small amount of text. They were compiled with the Microsoft Visual C++ 8.0 (2005) compiler in Release mode, with checked iterators/secure STL turned off. The test system is an AMD Sempron 2500+ with 512 Mb RAM.
| parser | parse time | parse allocs | parse memory | walk time | walk allocs | total time | total allocs | total memory |
|---|---|---|---|---|---|---|---|---|
| irrXML | N/A | N/A | N/A | 352 Mclocks | 697 245 | 356 Mclocks | 697 284 | 906 kb |
| Expat | N/A | N/A | N/A | 97 Mclocks | 19 | 97 Mclocks | 23 | 1028 kb |
| TinyXML | 168 Mclocks | 50 163 | 5447 kb | 37 Mclocks | 0 | 242 Mclocks | 50 163 | 5447 kb |
| PugXML | 100 Mclocks | 106 597 | 2747 kb | 38 Mclocks | 0 | 206 Mclocks | 131 677 | 2855 kb |
| XercesC SAX | N/A | N/A | N/A | 411 Mclocks | 70 380 | 411 Mclocks | 70 495 | 243 kb |
| XercesC DOM | 300 Mclocks | 30 491 | 9251 kb | 65 Mclocks | 1 | 367 Mclocks | 30 492 | 9251 kb |
| pugixml | 17 Mclocks | 40 | 2154 kb | 14 Mclocks | 0 | 32 Mclocks | 40 | 2154 kb |
| pugixml (test of non-destructive parsing) | 12 Mclocks | 51 | 1632 kb | 21 Mclocks | 0 | 34 Mclocks | 51 | 1632 kb |
Note that non-destructive parsing mode was just a test and is not yet in pugixml.
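The parse/walk/total split used in the table can be reproduced with a small timing harness. The following is only a sketch under stated assumptions: time_phase_ms, phase_times and measure are hypothetical names, and std::chrono::steady_clock (wall-clock milliseconds) stands in for the clock-cycle counts ("Mclocks") reported above. It does not count allocations or memory.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `phase` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Phase>
double time_phase_ms(Phase&& phase)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Phase>(phase)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical result record mirroring the 'parse', 'walk' and 'total' columns.
struct phase_times
{
    double parse_ms;
    double walk_ms;
    double total_ms;
};

// Times the parse step, then the walk step, and the whole process around both.
// For a SAX-style parser, `parse` can be an empty callable, since the
// structure is filled during the walk step.
template <typename Parse, typename Walk>
phase_times measure(Parse&& parse, Walk&& walk)
{
    phase_times r{};
    r.total_ms = time_phase_ms([&] {
        r.parse_ms = time_phase_ms(parse);
        r.walk_ms = time_phase_ms(walk);
    });
    return r;
}
```

A caller would pass two callables, for example one that loads a document and one that traverses it, and read the three fields of the returned record.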
Q: I do not have/want STL support. How can I compile pugixml without STL?
A: There is an undocumented define PUGIXML_NO_STL. If you uncomment the relevant line in the pugixml header file, it will compile without any STL classes. The reason it is undocumented is that it makes some documented functions unavailable (specifically, xml_document::load, which operates on std::istream, the xml_node::path function, XPath-related functions and classes, and the as_utf16/as_utf8 conversion functions). Otherwise, it will work fine.
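As a rough illustration of the answer above, the uncommented line and a typical guard around STL-dependent code might look like the fragment below. The exact location of the line in the header is not shown in this document, so treat the placement as an assumption; stl_support_enabled is a hypothetical flag added only for this example.

```cpp
// Sketch of the uncommented configuration line, as it might appear in the
// pugixml header (the surrounding context is an assumption).
#define PUGIXML_NO_STL

// STL-dependent pieces can be guarded so they are compiled only when the
// define is absent.
#ifndef PUGIXML_NO_STL
#include <istream> // needed only for the std::istream-based load
#endif

// Hypothetical compile-time flag that application code could check.
#ifdef PUGIXML_NO_STL
const bool stl_support_enabled = false;
#else
const bool stl_support_enabled = true;
#endif
```

The define has to be seen both when the library sources are compiled and when the header is included from application code, which is why the answer suggests changing the header itself.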
Q: Do paths that are accepted by first_element_by_path have to end with a delimiter?
A: Either way will work; both /path/to/node/ and /path/to/node are fine.
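One way to see why a trailing delimiter makes no difference is to split the path into components and ignore the empty ones. The sketch below is a standalone illustration, not pugixml's implementation: split_path is a hypothetical helper, and dropping a leading delimiter as well is a simplification made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not pugixml code: splits `path` on `delimiter` and
// drops empty components, so leading, trailing and doubled delimiters do
// not produce extra steps.
std::vector<std::string> split_path(const std::string& path, char delimiter = '/')
{
    std::vector<std::string> parts;
    std::string current;

    for (char c : path)
    {
        if (c == delimiter)
        {
            // End of a component; keep it only if it is non-empty.
            if (!current.empty()) parts.push_back(current);
            current.clear();
        }
        else
        {
            current += c;
        }
    }

    // The last component has no delimiter after it when the path does not
    // end with one.
    if (!current.empty()) parts.push_back(current);

    return parts;
}
```

Both split_path("/path/to/node/") and split_path("/path/to/node") yield the same three components: path, to and node.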
I'm always open to questions; feel free to write them to arseny.kapoulkine@gmail.com.
I'm always open to bug reports; feel free to write them to arseny.kapoulkine@gmail.com. Please provide as much information as possible: the version of pugixml, the compiling and OS environment (compiler and its version, STL version, OS version, etc.), a description of the situation in which the bug arises, the code and data files that show the bug, and so on. The more, the better. However, please do not send executable files.
Note that you can also submit bug reports/suggestions at the project page.
Here are some improvements that will be made in future versions (they are sorted by priority; the upper ones will get there sooner).
The pugixml parser is distributed under the MIT license:
Copyright (c) 2006-2010 Arseny Kapoulkine

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

Revised 25 May, 2010
© Copyright Arseny Kapoulkine 2006-2010. All Rights Reserved.

--
cgit v1.2.3