From 186e491d1e7f7bddc04d5169084b224a648aa457 Mon Sep 17 00:00:00 2001
From: "arseny.kapoulkine"
Date: Sun, 31 Oct 2010 07:45:27 +0000
Subject: docs: Regenerated HTML documentation

git-svn-id: http://pugixml.googlecode.com/svn/trunk@790 99668b35-9821-0410-8761-19e4c4f06640
---
 docs/manual/loading.html | 183 +++++++++++++++++++++++++----------------------
 1 file changed, 98 insertions(+), 85 deletions(-)

(limited to 'docs/manual/loading.html')

diff --git a/docs/manual/loading.html b/docs/manual/loading.html
index a3c1515..5b5576b 100644
--- a/docs/manual/loading.html
+++ b/docs/manual/loading.html
@@ -4,14 +4,15 @@
 Loading document
- - + + -
pugixml 0.9 manual | + +pugixml 1.0 manual | Overview | Installation | Document: @@ -44,11 +45,11 @@ non-validating parser. This parser is not fully W3C conformant - it can load any valid XML document, but does not perform some well-formedness checks. While considerable effort is made to reject invalid XML documents, some validation - is not performed because of performance reasons. Also some XML transformations - (i.e. EOL handling or attribute value normalization) can impact parsing speed - and thus can be disabled. However for vast majority of XML documents there - is no performance difference between different parsing options. Parsing options - also control whether certain XML nodes are parsed; see Parsing options for + is not performed for performance reasons. Also some XML transformations (i.e. + EOL handling or attribute value normalization) can impact parsing speed and + thus can be disabled. However for vast majority of XML documents there is no + performance difference between different parsing options. Parsing options also + control whether certain XML nodes are parsed; see Parsing options for more information.

@@ -65,43 +66,36 @@

-

- The most common source of XML data is files; pugixml provides a separate - function for loading XML document from file: +

+ The most common source of XML data is files; pugixml provides dedicated functions + for loading an XML document from file:

xml_parse_result xml_document::load_file(const char* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+xml_parse_result xml_document::load_file(const wchar_t* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
 

- This function accepts file path as its first argument, and also two optional - arguments, which specify parsing options (see Parsing options) and - input data encoding (see Encodings). The path has the target + These functions accept the file path as its first argument, and also two + optional arguments, which specify parsing options (see Parsing options) + and input data encoding (see Encodings). The path has the target operating system format, so it can be a relative or absolute one, it should - have the delimiters of target system, it should have the exact case if target - file system is case-sensitive, etc. File path is passed to system file opening - function as is. + have the delimiters of the target system, it should have the exact case if + the target file system is case-sensitive, etc. +

+

+ File path is passed to the system file opening function as is in case of + the first function (which accepts const + char* path); the second function either uses + a special file opening function if it is provided by the runtime library + or converts the path to UTF-8 and uses the system file opening function.

load_file destroys the existing document tree and then tries to load the new tree from the specified file. - The result of the operation is returned in an xml_parse_result - object; this object contains the operation status, and the related information + The result of the operation is returned in an xml_parse_result + object; this object contains the operation status and the related information (i.e. last successfully parsed position in the input file, if parsing fails). See Handling parsing errors for error handling details.

-
- - - - - -
[Note]Note

- As of version 0.9, there is no function for loading XML document from wide - character path. Unfortunately, there is no portable way to do this; the - version 1.0 will provide such function only for platforms with the corresponding - functionality. You can use stream-loading functions as a workaround if - your STL implementation can open file streams via wchar_t - paths. -

This is an example of loading XML document from file (samples/load_file.cpp):

@@ -122,7 +116,7 @@ Loading document from memory

- Sometimes XML data should be loaded from some other source than file, i.e. + Sometimes XML data should be loaded from some other source than a file, i.e. HTTP URL; also you may want to load XML data from file using non-standard functions, i.e. to use your virtual file system facilities or to load XML from gzip-compressed files. All these scenarios require loading document @@ -177,12 +171,12 @@

It is equivalent to calling load_buffer - with size = - strlen(contents). - This function assumes native encoding for input data, so it does not do any - encoding conversion. In general, this function is fine for loading small - documents from string literals, but has more overhead and less functionality - than buffer loading functions. + with size being either strlen(contents) + or wcslen(contents) * sizeof(wchar_t), + depending on the character type. This function assumes native encoding for + input data, so it does not do any encoding conversion. In general, this function + is fine for loading small documents from string literals, but has more overhead + and less functionality than the buffer loading functions.

This is an example of loading XML document from memory using different functions @@ -246,7 +240,7 @@ Loading document from C++ IOstreams

- For additional interoperability pugixml provides functions for loading document + To enhance interoperability, pugixml provides functions for loading document from any object which implements C++ std::istream interface. This allows you to load documents from any standard C++ stream (i.e. file stream) or any third-party compliant implementation (i.e. Boost @@ -267,10 +261,10 @@

load with std::wstream argument treats the stream contents as a wide character stream (encoding - is always encoding_wchar). - Because of this, using load - with wide character streams requires careful (usually platform-specific) - stream setup (i.e. using the imbue + is always encoding_wchar). Because + of this, using load with + wide character streams requires careful (usually platform-specific) stream + setup (i.e. using the imbue function). Generally use of wide streams is discouraged, however it provides you the ability to load documents from non-Unicode encodings, i.e. you can load Shift-JIS encoded data if you set the correct locale. @@ -330,7 +324,7 @@

  • status_io_error is returned by load_file function and by load functions with std::istream/std::wstream arguments; it means that some - I/O error has occured during reading the file/stream. + I/O error has occurred during reading the file/stream.
  • status_out_of_memory means that @@ -407,11 +401,11 @@ member, which contains the offset of last successfully parsed character if parsing failed because of an error in source data; otherwise offset is 0. For parsing efficiency reasons, pugixml does not track the current line during parsing; this offset is in - units of pugi::char_t (bytes for character mode, wide - characters for wide character mode). Many text editors support 'Go To Position' - feature - you can use it to locate the exact error position. Alternatively, - if you're loading the document from memory, you can display the error chunk - along with the error description (see the example code below). + units of pugi::char_t (bytes for character + mode, wide characters for wide character mode). Many text editors support + 'Go To Position' feature - you can use it to locate the exact error position. + Alternatively, if you're loading the document from memory, you can display + the error chunk along with the error description (see the example code below).

    @@ -490,9 +484,15 @@
  • parse_declaration determines if XML document declaration (node with type node_declaration) - are to be put in DOM tree. If this flag is off, it is not put in the - tree, but is still parsed and checked for correctness. This flag is - off by default.

    + is to be put in DOM tree. If this flag is off, it is not put in the tree, + but is still parsed and checked for correctness. This flag is off by default.

    + +
  • +
  • + parse_doctype determines if XML document + type declaration (node with type node_doctype) + is to be put in DOM tree. If this flag is off, it is not put in the tree, + but is still parsed and checked for correctness. This flag is off by default.

  • @@ -525,13 +525,13 @@ the cost of allocating and storing such nodes (both memory and speed-wise) can be significant. For example, after parsing XML string <node> <a/> </node>, <node> element will have three children when parse_ws_pcdata - is set (child with type node_pcdata + is set (child with type node_pcdata and value " ", - child with type node_element - and name "a", and - another child with type node_pcdata - and value " "), - and only one child when parse_ws_pcdata + child with type node_element and + name "a", and another + child with type node_pcdata and value + " "), and only + one child when parse_ws_pcdata is not set. This flag is off by default.
  • @@ -551,7 +551,7 @@ that as pugixml does not handle DTD, the only allowed entities are predefined ones). If character/entity reference can not be expanded, it is left as is, so you can do additional processing later. Reference expansion - is performed in attribute values and PCDATA content. This flag is on by default.

    + is performed on attribute values and PCDATA content. This flag is on by default.

  • @@ -569,9 +569,9 @@ if attribute value normalization should be performed for all attributes. This means, that whitespace characters (new line, tab and space) are replaced with space (' '). - New line characters are always treated as if parse_eol + New line characters are always treated as if parse_eol is set, i.e. \r\n - is converted to single space. This flag is on + is converted to a single space. This flag is on by default.

  • @@ -579,10 +579,10 @@ parse_wnorm_attribute determines if extended attribute value normalization should be performed for all attributes. This means, that after attribute values are normalized as - if parse_wconv_attribute + if parse_wconv_attribute was set, leading and trailing space characters are removed, and all sequences of space characters are replaced by a single space character. The value - of parse_wconv_attribute + of parse_wconv_attribute has no effect if this flag is on. This flag is off by default. @@ -595,24 +595,25 @@

    parse_wconv_attribute option performs transformations that are required by W3C specification for attributes - that are declared as CDATA; parse_wnorm_attribute + that are declared as CDATA; parse_wnorm_attribute performs transformations required for NMTOKENS attributes. - In the absence of document type declaration all attributes behave as if - they are declared as CDATA, thus parse_wconv_attribute + In the absence of document type declaration all attributes should behave + as if they are declared as CDATA, thus parse_wconv_attribute is the default option.

    - Additionally there are two predefined option masks: + Additionally there are three predefined option masks:

    • parse_minimal has all options turned off. This option mask means that pugixml does not add declaration nodes, - PI nodes, CDATA sections and comments to the resulting tree and does - not perform any conversion for input data, so theoretically it is the - fastest mode. However, as discussed above, in practice parse_default is usually equally fast. -

      + document type declaration nodes, PI nodes, CDATA sections and comments + to the resulting tree and does not perform any conversion for input data, + so theoretically it is the fastest mode. However, as mentioned above, + in practice parse_default is usually + equally fast.

    • @@ -622,7 +623,18 @@ entity reference expansion, replacing whitespace characters with spaces in attribute values and performing EOL handling. Note, that PCDATA sections consisting only of whitespace characters are not parsed (by default) - for performance reasons. + for performance reasons.

      + +
    • +
    • + parse_full is the set of flags which adds + nodes of all types to the resulting tree and performs default conversions + for input data. It includes parsing CDATA sections, comments, PI nodes, + document declaration node and document type declaration node, performing + character and entity reference expansion, replacing whitespace characters + with spaces in attribute values and performing EOL handling. Note, that + PCDATA sections consisting only of whitespace characters are not parsed + in this mode.

    @@ -705,36 +717,36 @@

  • encoding_utf8 corresponds to UTF-8 encoding - as defined in Unicode standard; UTF-8 sequences with length equal to - 5 or 6 are not standard and are rejected. + as defined in the Unicode standard; UTF-8 sequences with length equal + to 5 or 6 are not standard and are rejected.
  • encoding_utf16_le corresponds to - little-endian UTF-16 encoding as defined in Unicode standard; surrogate + little-endian UTF-16 encoding as defined in the Unicode standard; surrogate pairs are supported.
  • encoding_utf16_be corresponds to - big-endian UTF-16 encoding as defined in Unicode standard; surrogate + big-endian UTF-16 encoding as defined in the Unicode standard; surrogate pairs are supported.
  • encoding_utf16 corresponds to UTF-16 - encoding as defined in Unicode standard; the endianness is assumed to - be that of target platform. + encoding as defined in the Unicode standard; the endianness is assumed + to be that of the target platform.
  • encoding_utf32_le corresponds to - little-endian UTF-32 encoding as defined in Unicode standard. + little-endian UTF-32 encoding as defined in the Unicode standard.
  • encoding_utf32_be corresponds to - big-endian UTF-32 encoding as defined in Unicode standard. + big-endian UTF-32 encoding as defined in the Unicode standard.
  • encoding_utf32 corresponds to UTF-32 - encoding as defined in Unicode standard; the endianness is assumed to - be that of target platform. + encoding as defined in the Unicode standard; the endianness is assumed + to be that of the target platform.
  • encoding_wchar corresponds to the encoding @@ -823,7 +835,8 @@

  • -
pugixml 0.9 manual | + +pugixml 1.0 manual | Overview | Installation | Document:
--
cgit v1.2.3