| |
| :mod:`sgmllib` --- Simple SGML parser |
| ===================================== |
| |
| .. module:: sgmllib |
| :synopsis: Only as much of an SGML parser as needed to parse HTML. |
| :deprecated: |
| |
| .. deprecated:: 2.6 |
| The :mod:`sgmllib` module has been removed in Python 3.0. |
| |
| .. index:: single: SGML |
| |
| This module defines a class :class:`SGMLParser` which serves as the basis for |
| parsing text files formatted in SGML (Standard Generalized Mark-up Language). |
| In fact, it does not provide a full SGML parser --- it only parses SGML insofar |
| as it is used by HTML, and the module only exists as a base for the |
| :mod:`htmllib` module. Another HTML parser which supports XHTML and offers a |
| somewhat different interface is available in the :mod:`HTMLParser` module. |
| |
| |
| .. class:: SGMLParser() |
| |
| The :class:`SGMLParser` class is instantiated without arguments. The parser is |
| hardcoded to recognize the following constructs: |
| |
| * Opening and closing tags of the form ``<tag attr="value" ...>`` and |
| ``</tag>``, respectively. |
| |
| * Numeric character references of the form ``&#name;``. |
| |
| * Entity references of the form ``&name;``. |
| |
| * SGML comments of the form ``<!--text-->``. Note that spaces, tabs, and |
| newlines are allowed between the trailing ``>`` and the immediately preceding |
| ``--``. |
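The four constructs listed above can be watched in action with a short script. Because :mod:`sgmllib` is not available in Python 3, this sketch assumes the ``html.parser`` module (the Python 3 name of :mod:`HTMLParser`) as a stand-in; ``ConstructLogger`` and its ``events`` list are hypothetical names used only for illustration, and the callbacks shown belong to ``html.parser``, not to :class:`SGMLParser`.

```python
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class ConstructLogger(HTMLParser):
    """Record each construct the parser reports (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False reports character and entity references
        # as separate events instead of folding them into text data.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))


parser = ConstructLogger()
# One tag pair, one numeric character reference, one entity reference,
# and one comment.
parser.feed('<a href="x">&#65;&amp;</a><!--note-->')
parser.close()
for event in parser.events:
    print(event)
```

Each construct arrives through its own callback, which is the same event-driven style that :class:`SGMLParser` subclasses rely on.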
| |
| A single exception is defined as well: |
| |
| |
| .. exception:: SGMLParseError |
| |
| overridden by a derived class; the base class implementation does nothing. |
| |
| |
| .. method:: SGMLParser.handle_charref(ref) |
| |
| This method is called to process a character reference of the form ``&#ref;``. |
| The base implementation uses :meth:`convert_charref` to convert the reference to |
| a string. If that method returns a string, it is passed to :meth:`handle_data`, |
| otherwise ``unknown_charref(ref)`` is called to handle the error. |
| |
| .. versionchanged:: 2.5 |
| Use :meth:`convert_charref` instead of hard-coding the conversion. |
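The dispatch that ``handle_charref`` performs can be sketched without importing :mod:`sgmllib` at all. The class below is a hypothetical stand-in, not the library's implementation: the rule in its ``convert_charref`` (decimal ASCII code points only) is an assumption chosen for illustration, and the ``data`` and ``unknown`` lists exist only to make the outcome visible.

```python
class CharrefDispatchSketch:
    """Stand-in for the dispatch described above; not the sgmllib class."""

    def __init__(self):
        self.data = []      # strings passed to handle_data
        self.unknown = []   # references passed to unknown_charref

    def convert_charref(self, ref):
        # Assumption for illustration: accept decimal ASCII code points only.
        try:
            code = int(ref)
        except ValueError:
            return None
        if not 0 <= code <= 127:
            return None
        return chr(code)

    def handle_charref(self, ref):
        # Convert first; fall back to the error hook when conversion fails.
        replacement = self.convert_charref(ref)
        if replacement is None:
            self.unknown_charref(ref)
        else:
            self.handle_data(replacement)

    def handle_data(self, data):
        self.data.append(data)

    def unknown_charref(self, ref):
        self.unknown.append(ref)


sketch = CharrefDispatchSketch()
for ref in ("65", "9999", "bogus"):
    sketch.handle_charref(ref)
print(sketch.data)     # ['A']
print(sketch.unknown)  # ['9999', 'bogus']
```

Keeping the conversion in its own method means a subclass can widen the accepted range by overriding ``convert_charref`` alone, leaving the dispatch untouched.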
| |
| |
| .. method:: SGMLParser.convert_charref(ref) |
| |
| Convert a character reference to a string, or ``None``. *ref* is the reference |