| Unlike the parser in :mod:`htmllib`, this parser is not based on the SGML parser |
| in :mod:`sgmllib`. |
| |
| |
| .. class:: HTMLParser() |
| |
| The :class:`HTMLParser` class is instantiated without arguments. |
| |
n | An HTMLParser instance is fed HTML data and calls handler functions when tags |
n | An :class:`HTMLParser` instance is fed HTML data and calls handler functions when tags |
| begin and end. The :class:`HTMLParser` class is meant to be overridden by the |
| user to provide a desired behavior. |
| |
| Unlike the parser in :mod:`htmllib`, this parser does not check that end tags |
| match start tags or call the end-tag handler for elements which are closed |
| implicitly by closing an outer element. |
| |
| An exception is defined as well: |
| attributes can be preserved, etc.). |
| |
| |
| .. method:: HTMLParser.handle_starttag(tag, attrs) |
| |
| This method is called to handle the start of a tag. It is intended to be |
| overridden by a derived class; the base class implementation does nothing. |
| |
n | The *tag* argument is the name of the tag converted to lower case. The *attrs* |
n | The *tag* argument is the name of the tag converted to lower case. The *attrs* |
| argument is a list of ``(name, value)`` pairs containing the attributes found |
t | inside the tag's ``<>`` brackets. The *name* will be translated to lower case |
t | inside the tag's ``<>`` brackets. The *name* will be translated to lower case, |
| and double quotes and backslashes in the *value* have been interpreted. For |
| and quotes in the *value* have been removed, and character and entity references |
| instance, for the tag ``<A HREF="http://www.cwi.nl/">``, this method would be |
| have been replaced. For instance, for the tag ``<A |
| HREF="http://www.cwi.nl/">``, this method would be called as |
| called as ``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``. |
| ``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``. |
| |
| .. versionchanged:: 2.6 |
| All entity references from :mod:`htmlentitydefs` are now replaced in the attribute |
| values. |
| |
| |
| .. method:: HTMLParser.handle_startendtag(tag, attrs) |
| |
| Similar to :meth:`handle_starttag`, but called when the parser encounters an |
| XHTML-style empty tag (``<a .../>``). This method may be overridden by |
| subclasses which require this particular lexical information; the default |
| implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`. |