
HTMLParser Objects
------------------

In addition to tag methods, the :class:`HTMLParser` class provides some
methods and instance variables for use within tag methods.


.. attribute:: HTMLParser.formatter

   This is the formatter instance associated with the parser.


.. attribute:: HTMLParser.nofill

   Boolean flag that should be true when whitespace should not be collapsed and
   false when it should be. In general, it should be true only when character
   data is to be treated as "preformatted" text, as within a ``<PRE>`` element.
   The default value is false. This flag affects the operation of
   :meth:`handle_data` and :meth:`save_end`.


.. method:: HTMLParser.anchor_bgn(href, name, type)

   This method is called at the start of an anchor region. The arguments
   correspond to the attributes of the ``<A>`` tag with the same names. The
   default implementation maintains a list of hyperlinks (defined by the ``HREF``
   attribute for ``<A>`` tags) within the document. The list of hyperlinks is
   available as the data attribute :attr:`anchorlist`.


.. method:: HTMLParser.anchor_end()

   This method is called at the end of an anchor region. The default
   implementation adds a textual footnote marker using an index into the list of
   hyperlinks created by :meth:`anchor_bgn`.
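
   The sketch below approximates the default bookkeeping described for
   :meth:`anchor_bgn` and :meth:`anchor_end`. It uses a hypothetical stand-in
   class rather than :mod:`htmllib` itself, so it needs no formatter and runs
   under both Python 2 and 3. The ``anchor`` and ``output`` attributes and the
   1-based ``[n]`` marker format are assumptions made for illustration.

   .. code-block:: python

      class AnchorFootnoteSketch(object):
          """Hypothetical model of the default anchor handling; not htmllib."""

          def __init__(self):
              self.anchorlist = []    # HREF values seen so far, in document order
              self.anchor = None      # HREF of the currently open anchor, if any
              self.output = []        # stands in for data sent to the formatter

          def handle_data(self, data):
              self.output.append(data)

          def anchor_bgn(self, href, name, type):
              # name and type are accepted for signature parity but ignored here.
              self.anchor = href
              if href:
                  self.anchorlist.append(href)

          def anchor_end(self):
              # Emit a marker whose number is the anchor's 1-based index in anchorlist.
              if self.anchor:
                  self.handle_data("[%d]" % len(self.anchorlist))
                  self.anchor = None


      p = AnchorFootnoteSketch()
      p.handle_data("See ")
      p.anchor_bgn("http://www.python.org/", "", "")
      p.handle_data("Python")
      p.anchor_end()
      print("".join(p.output))   # See Python[1]
      print(p.anchorlist)        # ['http://www.python.org/']

   In this sketch, an ``<A>`` tag that carries only a ``NAME`` (an empty *href*)
   adds nothing to :attr:`anchorlist` and produces no marker.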


.. method:: HTMLParser.handle_image(source, alt[, ismap[, align[, width[, height]]]])

   This method is called to handle images. The default implementation simply
   passes the *alt* value to the :meth:`handle_data` method.


.. method:: HTMLParser.save_bgn()

   Begins saving character data in a buffer instead of sending it to the formatter
   object. Retrieve the stored data via :meth:`save_end`. Calls to the
   :meth:`save_bgn` / :meth:`save_end` pair may not be nested.


.. method:: HTMLParser.save_end()

   Ends buffering character data and returns all data saved since the preceding
   call to :meth:`save_bgn`. If the :attr:`nofill` flag is false, whitespace is
   collapsed to single spaces. A call to this method without a preceding call to
   :meth:`save_bgn` will raise a :exc:`TypeError` exception.
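
   The following sketch models the :meth:`save_bgn` / :meth:`save_end` contract
   and shows how :attr:`nofill` affects the saved text. The class and its
   ``savedata`` and ``flowed`` attributes are hypothetical and chosen only for
   illustration. Where this sketch appends unsaved data to a list, the real
   :class:`HTMLParser` sends it to its formatter.

   .. code-block:: python

      class SaveBufferSketch(object):
          """Hypothetical model of the save_bgn()/save_end() contract."""

          def __init__(self, nofill=False):
              self.nofill = nofill    # true: keep whitespace as-is ("preformatted")
              self.savedata = None    # None means no save is in progress
              self.flowed = []        # stands in for data sent to the formatter

          def handle_data(self, data):
              if self.savedata is not None:
                  self.savedata += data    # buffer instead of formatting
              else:
                  self.flowed.append(data)

          def save_bgn(self):
              # Saves may not be nested; this sketch simply restarts the buffer.
              self.savedata = ""

          def save_end(self):
              if self.savedata is None:
                  raise TypeError("save_end() called without a preceding save_bgn()")
              data, self.savedata = self.savedata, None
              if not self.nofill:
                  data = " ".join(data.split())    # collapse whitespace runs
              return data


      p = SaveBufferSketch()
      p.save_bgn()
      p.handle_data("  Chapter\n   One  ")
      print(repr(p.save_end()))   # 'Chapter One'

   With ``SaveBufferSketch(nofill=True)``, the same call sequence returns the
   buffered text unchanged, whitespace included.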


:mod:`htmlentitydefs` --- Definitions of HTML general entities
==============================================================

.. module:: htmlentitydefs
   :synopsis: Definitions of HTML general entities.
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

.. note::

   The :mod:`htmlentitydefs` module has been renamed to :mod:`html.entities` in
   Python 3.0. The :term:`2to3` tool will automatically adapt imports when
   converting your sources to 3.0.


This module defines three dictionaries, ``name2codepoint``, ``codepoint2name``,
and ``entitydefs``. ``entitydefs`` is used by the :mod:`htmllib` module to
provide the :attr:`entitydefs` attribute of the :class:`HTMLParser` class. The
definition provided here contains all the entities defined by XHTML 1.0 that
can be handled using simple textual substitution in the Latin-1 character set
(ISO-8859-1).
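
The short sketch below shows how the three dictionaries relate by looking up
entities in both directions. The import fallback to :mod:`html.entities` lets
it run under Python 3 as well, and the ``decode_entity`` helper is
hypothetical. Note that ``entitydefs`` values differ between versions. Under
Python 2, entities outside Latin-1 map to character references such as
``'&#8212;'``, while Python 3 maps them to the characters themselves.

.. code-block:: python

   try:
       from htmlentitydefs import name2codepoint, codepoint2name, entitydefs
   except ImportError:  # Python 3: the module is html.entities
       from html.entities import name2codepoint, codepoint2name, entitydefs

   try:
       _chr = unichr  # Python 2: chr() only handles code points below 256
   except NameError:
       _chr = chr  # Python 3: chr() returns a Unicode string

   def decode_entity(name):
       """Return the character for a named entity such as 'eacute', or None."""
       codepoint = name2codepoint.get(name)
       if codepoint is None:
           return None
       return _chr(codepoint)

   print(name2codepoint['copy'])   # 169
   print(codepoint2name[169])      # copy
   print(entitydefs['amp'])        # &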