:mod:`xml.parsers.expat` --- Fast XML parsing using Expat
=========================================================

.. module:: xml.parsers.expat
   :synopsis: An interface to the Expat non-validating XML parser.
.. moduleauthor:: Paul Prescod <paul@prescod.net>


.. Markup notes:

   Many of the attributes of the XMLParser objects are callbacks. Since
   signature information must be presented, these are described using the method
   directive. Since they are attributes which are set by client code, in-text
   references to these attributes should be marked using the :member: role
   and should not include the parentheses used when marking functions and
   methods.

.. versionadded:: 2.0

.. index:: single: Expat

The :mod:`xml.parsers.expat` module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
:class:`xmlparser`, that represents the current state of an XML parser. After
an :class:`xmlparser` object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.
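
The handler mechanism described above can be sketched as follows. This is a
minimal illustration that assumes a Python 3 interpreter; the function names
``start_element``, ``end_element``, and ``char_data``, the ``events`` list, and
the sample document are made up for the example.

```python
import xml.parsers.expat

events = []  # collected event tuples, in document order


def start_element(name, attrs):
    # attrs is a mapping of attribute names to values
    events.append(("start", name, dict(attrs)))


def end_element(name):
    events.append(("end", name))


def char_data(data):
    events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()

# Handlers are plain attributes on the parser object.
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# A true second argument marks this as the final chunk of input.
parser.Parse('<root><child id="1">hello</child></root>', True)
```

After the call, ``events`` holds one tuple per callback, beginning with the
start of ``root`` and ending with its close.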

.. index:: module: pyexpat

Direct use of the :mod:`pyexpat` module is deprecated.

This module provides one exception and one type object:


.. exception:: ExpatError

   The exception raised when Expat reports an error. See section
   :ref:`expaterror-objects` for more information on interpreting Expat errors.


.. exception:: error

   Alias for :exc:`ExpatError`.

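As a rough sketch of how :exc:`ExpatError` might be handled, the helper below
parses a document and reports where Expat stopped. The name ``try_parse`` is
hypothetical; the ``code``, ``lineno``, and ``offset`` attributes and the
:func:`ErrorString` function belong to this module but are described outside
this section. A Python 3 interpreter is assumed.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError


def try_parse(document):
    """Return None if the document parses, else a short error description."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except ExpatError as err:
        # err.code is Expat's numeric error code; ErrorString() explains it.
        return "line %d, column %d: %s" % (
            err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code))
    return None


ok = try_parse("<a></a>")          # well-formed input
broken = try_parse("<a><b></a>")   # <b> is never closed
```

Because :exc:`error` is an alias, ``except xml.parsers.expat.error`` catches
the same exceptions.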

.. data:: XMLParserType


.. function:: ParserCreate([encoding[, namespace_separator]])

   Creates and returns a new :class:`xmlparser` object. *encoding*, if specified,
   must be a string naming the encoding used by the XML data. Expat doesn't
   support as many encodings as Python does, and its repertoire of encodings can't
   be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
   *encoding* [1]_ is given it will override the implicit or explicit encoding of the
   document.

   Expat can optionally do XML namespace processing for you, enabled by providing a
   value for *namespace_separator*. The value must be a one-character string; a
   :exc:`ValueError` will be raised if the string has an illegal length (``None``
   is considered the same as omission). When namespace processing is enabled,
   element type names and attribute names that belong to a namespace will be
   expanded. The element name passed to the element handlers

Parses the contents of the string *data*, calling the appropriate handler
functions to process the parsed data. *isfinal* must be true on the final call
to this method. *data* can be the empty string at any time.

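The *isfinal* contract described above lends itself to incremental parsing.
The sketch below feeds a document in arbitrary pieces and finishes with an
empty string; the ``chunks`` and ``names`` variables and the sample markup are
assumptions for illustration, and a Python 3 interpreter is assumed.

```python
import xml.parsers.expat

names = []  # element names in the order their start tags are seen

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)

# Chunk boundaries may fall anywhere, even in the middle of a tag.
chunks = ['<doc><it', 'em/><item', '/></doc>']
for chunk in chunks:
    parser.Parse(chunk, False)   # more data may follow

# An empty string with a true isfinal closes the document.
parser.Parse('', True)
```

Handlers fire as soon as Expat has seen enough input to recognise a construct,
so the caller never needs to hold the whole document in memory.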

.. method:: xmlparser.ParseFile(file)

   Parse XML data reading from the object *file*. *file* only needs to provide
   the ``read(nbytes)`` method, returning the empty string when there's no more
   data.

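To illustrate how little the *file* argument has to provide, here is a sketch
with a hypothetical ``ChunkSource`` class that implements nothing but
``read(nbytes)``. It assumes a Python 3 interpreter, where the data and the
final empty result are bytes objects rather than text strings.

```python
import xml.parsers.expat


class ChunkSource:
    """Hypothetical file-like object exposing only read(nbytes)."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def read(self, nbytes):
        # Return at most nbytes; the result is empty once the data runs out.
        chunk = self._payload[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return chunk


seen = []  # element names in the order their start tags are seen

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)
parser.ParseFile(ChunkSource(b'<feed><entry/><entry/></feed>'))
```

Any object opened in binary mode, such as the result of ``open(path, 'rb')``,
can be passed in the same way.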

.. method:: xmlparser.SetBase(base)

   Sets the base to be used for resolving relative URIs in system identifiers in
   declarations. Resolving relative identifiers is left to the application: this
   value will be passed through as the *base* argument to the

.. versionadded:: 2.3

:class:`xmlparser` objects have the following attributes:


.. attribute:: xmlparser.buffer_size

   The size of the buffer used when :attr:`buffer_text` is true.
   A new buffer size can be set by assigning a new integer value
   to this attribute.
   When the size is changed, the buffer will be flushed.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      The buffer size can now be changed.
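
The effect of the text buffer can be observed by counting calls to the
character data handler with and without :attr:`buffer_text` set. The sketch
below is illustrative only: ``collect_text`` is a hypothetical helper, the
buffer size of 4096 is an arbitrary choice, and a Python 3 interpreter is
assumed.

```python
import xml.parsers.expat


def collect_text(document, buffered):
    """Return the list of chunks passed to CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffered
    if buffered:
        # Assigning a new size is allowed; it flushes any buffered text.
        parser.buffer_size = 4096
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks


doc = '<t>one\ntwo\nthree</t>'
unbuffered = collect_text(doc, False)   # several small chunks
buffered = collect_text(doc, True)      # text delivered in a single call
```

Both lists join to the same text; only the number of handler calls differs.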

.. attribute:: xmlparser.buffer_text

   Setting this to true causes the :class:`xmlparser` object to buffer textual
   content returned by Expat to avoid multiple calls to the
   :meth:`CharacterDataHandler` callback whenever possible. This can improve
   performance substantially since Expat normally breaks character data into chunks
   at every line ending. This attribute is false by default, and may be changed at

The requested operation was made on a parser that had already finished parsing
input, which isn't allowed. This includes attempts to provide additional input
or to stop the parser.

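The condition described above can be provoked by feeding input to a parser
after its final call. The following sketch assumes a Python 3 interpreter and
relies on the ``xml.parsers.expat.errors`` submodule and :func:`ErrorString`,
which are part of this package but described outside this section; the
``message`` variable is illustrative.

```python
import xml.parsers.expat
from xml.parsers.expat import errors

parser = xml.parsers.expat.ParserCreate()
parser.Parse('<done/>', True)        # final call: the parser is now finished

try:
    parser.Parse('<more/>', True)    # additional input is not allowed
except xml.parsers.expat.ExpatError as err:
    # Translate the numeric code into Expat's explanatory string.
    message = xml.parsers.expat.ErrorString(err.code)
else:
    message = None
```

Comparing ``message`` with ``errors.XML_ERROR_FINISHED`` is one way to tell
this condition apart from other parse errors; a fresh parser from
:func:`ParserCreate` is needed to process another document.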

.. data:: XML_ERROR_SUSPEND_PE
   :noindex:


.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .
