From ce4ac177801e31ffd309c91cb9e464d8cab205a3 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Thu, 13 Aug 2015 14:07:19 +0100
Subject: docs: Clarify UTF-8 vs wchar_t memory efficiency

---
 docs/manual.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 (limited to 'docs/manual.adoc')

diff --git a/docs/manual.adoc b/docs/manual.adoc
index cd3d8f8..af48a10 100644
--- a/docs/manual.adoc
+++ b/docs/manual.adoc
@@ -420,7 +420,7 @@ bool xml_node::set_name(const wchar_t* value);
 [[char_t]][[string_t]]
 There is a special type, `pugi::char_t`, that is defined as the character type and depends on the library configuration; it will be also used in the documentation hereafter. There is also a type `pugi::string_t`, which is defined as the STL string of the character type; it corresponds to `std::string` in char mode and to `std::wstring` in wchar_t mode.
 
-In addition to the interface, the internal implementation changes to store XML data as `pugi::char_t`; this means that these two modes have different memory usage characteristics. The conversion to `pugi::char_t` upon document loading and from `pugi::char_t` upon document saving happen automatically, which also carries minor performance penalty. The general advice however is to select the character mode based on usage scenario, i.e. if UTF-8 is inconvenient to process and most of your XML data is non-ASCII, wchar_t mode is probably a better choice.
+In addition to the interface, the internal implementation changes to store XML data as `pugi::char_t`; this means that these two modes have different memory usage characteristics - generally UTF-8 mode is more memory and performance efficient, especially if `sizeof(wchar_t)` is 4. The conversion to `pugi::char_t` upon document loading and from `pugi::char_t` upon document saving happen automatically, which also carries minor performance penalty. The general advice however is to select the character mode based on usage scenario, i.e. if UTF-8 is inconvenient to process and most of your XML data is non-ASCII, wchar_t mode is probably a better choice.
 
 [[as_utf8]][[as_wide]]
 There are cases when you'll have to convert string data between UTF-8 and wchar_t encodings; the following helper functions are provided for such purposes:
-- 
cgit v1.2.3
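
The memory claim in the added sentence can be illustrated with a small sketch that compares the storage needed for the same text as `char` (UTF-8) data and as `wchar_t` data. It uses only the C++ standard library rather than pugixml itself, and the helper names `utf8_bytes` and `wide_bytes` are hypothetical, introduced here purely for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of pugixml): bytes occupied by the
// character data of a string, ignoring the terminator and allocator overhead.

// char mode: one byte per UTF-8 code unit.
std::size_t utf8_bytes(const std::string& s)
{
    return s.size() * sizeof(char);
}

// wchar_t mode: sizeof(wchar_t) bytes per code unit (platform dependent).
std::size_t wide_bytes(const std::wstring& s)
{
    return s.size() * sizeof(wchar_t);
}
```

For an ASCII-only name such as `"node"`, `utf8_bytes("node")` is 4, while `wide_bytes(L"node")` is `4 * sizeof(wchar_t)`: 8 bytes where `wchar_t` is 2 bytes wide (as on Windows) and 16 bytes where it is 4 bytes wide (as is typical on Linux and macOS). The gap narrows for non-ASCII text, since UTF-8 then needs two or more bytes per character.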