| :mod:`urlparse` --- Parse URLs into components |
| ============================================== |
| |
| .. module:: urlparse |
| :synopsis: Parse URLs into or assemble them from components. |
| |
| |
| |
| .. index:: |
| single: WWW |
| single: World Wide Web |
| single: URL |
| pair: URL; parsing |
| pair: relative; URL |
| |
| .. note:: |
| The :mod:`urlparse` module is renamed to :mod:`urllib.parse` in Python 3.0. |
| The :term:`2to3` tool will automatically adapt imports when converting |
| your sources to 3.0. |
| |
| |
| This module defines a standard interface to break Uniform Resource Locator (URL) |
| strings up in components (addressing scheme, network location, path etc.), to |
| combine the components back into a URL string, and to convert a "relative URL" |
| to an absolute URL given a "base URL." |
| |
| The module has been designed to match the Internet RFC on Relative Uniform |
| Resource Locators (and discovered a bug in an earlier draft!). It supports the |
| following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``, |
| .. function:: urlparse(urlstring[, default_scheme[, allow_fragments]]) |
| |
| Parse a URL into six components, returning a 6-tuple. This corresponds to the |
| general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``. |
| Each tuple item is a string, possibly empty. The components are not broken up in |
| smaller parts (for example, the network location is a single string), and % |
| escapes are not expanded. The delimiters as shown above are not part of the |
| result, except for a leading slash in the *path* component, which is retained if |
| present. For example: |
| |
| >>> from urlparse import urlparse |
| >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') |
| >>> o # doctest: +NORMALIZE_WHITESPACE |
| ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', |
| params='', query='', fragment='') |
| >>> o.scheme |
| 'http' |
| >>> o.port |
| 80 |
| >>> o.geturl() |
| 'http://www.cwi.nl:80/%7Eguido/Python.html' |
| |
| If the *default_scheme* argument is specified, it gives the default addressing |
| | :attr:`password` | | Password | :const:`None` | |
| +------------------+-------+--------------------------+----------------------+ |
| | :attr:`hostname` | | Host name (lower case) | :const:`None` | |
| +------------------+-------+--------------------------+----------------------+ |
| | :attr:`port` | | Port number as integer, | :const:`None` | |
| | | | if present | | |
| +------------------+-------+--------------------------+----------------------+ |
| |
| See section :ref:`urlparse-result-object` for more information on the result |
| object. |
| |
| .. versionchanged:: 2.5 |
| Added attributes to return value. |
| |
| .. function:: parse_qs(qs[, keep_blank_values[, strict_parsing]]) |
| |
| Parse a query string given as a string argument (data of type |
| :mimetype:`application/x-www-form-urlencoded`). Data are returned as a |
| dictionary. The dictionary keys are the unique query variable names and the |
| values are lists of values for each name. |
| |
| The optional argument *keep_blank_values* is a flag indicating whether blank |
| values in URL encoded queries should be treated as blank strings. A true value |
| indicates that blanks should be retained as blank strings. The default false |
| value indicates that blank values are to be ignored and treated as if they were |
| not included. |
| |
| The optional argument *strict_parsing* is a flag indicating what to do with |
| parsing errors. If false (the default), errors are silently ignored. If true, |
| errors raise a :exc:`ValueError` exception. |
| |
| Use the :func:`urllib.urlencode` function to convert such dictionaries into |
| query strings. |
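The two flags above can be sketched as follows. This is only an illustration: the query string is made up, and the fallback import is an assumption added so the snippet also runs on Python 3, where the function lives in :mod:`urllib.parse`.

```python
try:
    from urlparse import parse_qs          # Python 2
except ImportError:
    from urllib.parse import parse_qs      # assumption: Python 3 fallback

# Illustrative query string; 'lang' repeats and 'empty' has a blank value.
query = 'name=guido&lang=python&lang=c&empty='

# By default, blank values are ignored, so 'empty' does not appear.
default = parse_qs(query)

# A true keep_blank_values retains 'empty' with a blank string.
with_blanks = parse_qs(query, keep_blank_values=True)

# With strict_parsing true, a field without '=' raises ValueError.
try:
    parse_qs('name=guido&novalue', strict_parsing=True)
    strict_error = False
except ValueError:
    strict_error = True

print(sorted(default.items()))
print(sorted(with_blanks.items()))
print(strict_error)
```

Each value is a list even when a name occurs only once, so callers that expect a single value need to index into it.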
| |
| |
| .. function:: parse_qsl(qs[, keep_blank_values[, strict_parsing]]) |
| |
| Parse a query string given as a string argument (data of type |
| :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of |
| name, value pairs. |
| |
| The optional argument *keep_blank_values* is a flag indicating whether blank |
| values in URL encoded queries should be treated as blank strings. A true value |
| indicates that blanks should be retained as blank strings. The default false |
| value indicates that blank values are to be ignored and treated as if they were |
| not included. |
| |
| The optional argument *strict_parsing* is a flag indicating what to do with |
| parsing errors. If false (the default), errors are silently ignored. If true, |
| errors raise a :exc:`ValueError` exception. |
| |
| Use the :func:`urllib.urlencode` function to convert such lists of pairs into |
| query strings. |
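A short sketch of the round trip between :func:`parse_qsl` and :func:`urllib.urlencode`, assuming an illustrative query string. The fallback import is an assumption for Python 3, where both functions live in :mod:`urllib.parse`.

```python
try:
    from urlparse import parse_qsl         # Python 2
    from urllib import urlencode
except ImportError:
    from urllib.parse import parse_qsl, urlencode  # assumption: Python 3 fallback

# Illustrative query string with a repeated name.
query = 'a=1&b=2&a=3'

# The pairs come back in the order they appear, repeats included.
pairs = parse_qsl(query)

# urlencode accepts a sequence of pairs and rebuilds the query string.
rebuilt = urlencode(pairs)

print(pairs)
print(rebuilt)
```

Unlike the dictionary returned by :func:`parse_qs`, the list of pairs keeps the original ordering of repeated names.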
| |
| .. function:: urlunparse(parts) |
| |
| Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument |
| can be any six-item iterable. This may result in a slightly different, but |
| equivalent URL, if the URL that was parsed originally had unnecessary delimiters |
| (for example, a ? with an empty query; the RFC states that these are |
| equivalent). |
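As a minimal sketch of the behaviour described above, the following parses a URL that ends in a bare ``?`` and reassembles it, then builds the same URL from a plain list. The URL reuses the one from the :func:`urlparse` example, and the fallback import is an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlunparse      # Python 2
except ImportError:
    from urllib.parse import urlparse, urlunparse  # assumption: Python 3 fallback

# The trailing '?' introduces an empty query.
original = 'http://www.cwi.nl/%7Eguido/Python.html?'

# Reassembling drops the unnecessary delimiter.
rebuilt = urlunparse(urlparse(original))

# Any six-item iterable works, not only a parse result.
from_list = urlunparse(['http', 'www.cwi.nl', '/%7Eguido/Python.html', '', '', ''])

print(rebuilt)
print(from_list)
```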
| |
| |
| .. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]]) |
| |
| This is similar to :func:`urlparse`, but does not split the params from the URL. |
| | :attr:`password` | | Password | :const:`None` | |
| +------------------+-------+-------------------------+----------------------+ |
| | :attr:`hostname` | | Host name (lower case) | :const:`None` | |
| +------------------+-------+-------------------------+----------------------+ |
| | :attr:`port` | | Port number as integer, | :const:`None` | |
| | | | if present | | |
| +------------------+-------+-------------------------+----------------------+ |
| |
| See section :ref:`urlparse-result-object` for more information on the result |
| object. |
| |
| .. versionadded:: 2.2 |
| |
| .. versionchanged:: 2.5 |
| Added attributes to return value. |
| |
| |
| .. function:: urlunsplit(parts) |
| |
| Combine the elements of a tuple as returned by :func:`urlsplit` into a complete |
| URL as a string. The *parts* argument can be any five-item iterable. This may |
| result in a slightly different, but equivalent URL, if the URL that was parsed |
| originally had unnecessary delimiters (for example, a ? with an empty query; the |
| RFC states that these are equivalent). |
| |
| .. versionadded:: 2.2 |
| |
| |
| .. function:: urljoin(base, url[, allow_fragments]) |
| |
| Construct a full ("absolute") URL by combining a "base URL" (*base*) with |
| another URL (*url*). Informally, this uses components of the base URL, in |
| particular the addressing scheme, the network location and (part of) the path, |
| to provide missing components in the relative URL. For example: |
| |
| >>> from urlparse import urljoin |
| >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') |
| 'http://www.cwi.nl/%7Eguido/FAQ.html' |
| |
| The *allow_fragments* argument has the same meaning and default as for |
| :func:`urlparse`. |
| |
| .. note:: |
| |
| If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``), |
| the *url*'s host name and/or scheme will be present in the result. For example: |
| |
| .. doctest:: |
| |
| >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', |
| ... '//www.python.org/%7Eguido') |
| 'http://www.python.org/%7Eguido' |
| |
| If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and |
| :func:`urlunsplit`, removing possible *scheme* and *netloc* parts. |
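One possible way to do that preprocessing is sketched below. The helper name ``join_ignoring_host`` is hypothetical and not part of this module, and the fallback import is an assumption for Python 3.

```python
try:
    from urlparse import urljoin, urlsplit, urlunsplit      # Python 2
except ImportError:
    from urllib.parse import urljoin, urlsplit, urlunsplit  # assumption: Python 3 fallback


def join_ignoring_host(base, url):
    """Hypothetical helper: join *url* to *base* after discarding
    any scheme and network location that *url* carries."""
    parts = urlsplit(url)
    # Blank out scheme and netloc; keep path, query and fragment.
    stripped = urlunsplit(('', '', parts[2], parts[3], parts[4]))
    return urljoin(base, stripped)


base = 'http://www.cwi.nl/%7Eguido/Python.html'
result = join_ignoring_host(base, '//www.python.org/%7Eguido')
print(result)
```

With the scheme and network location removed, the remaining path is resolved against the host of *base* rather than replacing it.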
| |
| |
| .. function:: urldefrag(url) |
| |
| If *url* contains a fragment identifier, returns a modified version of *url* |
| with no fragment identifier, and the fragment identifier as a separate string. |
| If there is no fragment identifier in *url*, returns *url* unmodified and an |
| empty string. |
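A brief sketch of both cases, using illustrative URLs. The fallback import is an assumption for Python 3.

```python
try:
    from urlparse import urldefrag      # Python 2
except ImportError:
    from urllib.parse import urldefrag  # assumption: Python 3 fallback

# With a fragment: the URL and the fragment come back separately.
url, fragment = urldefrag('http://www.python.org/doc/#intro')

# Without a fragment: the URL is unchanged and the fragment is empty.
plain_url, plain_fragment = urldefrag('http://www.python.org/doc/')

print(url, fragment)
print(plain_url, repr(plain_fragment))
```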
| .. method:: ParseResult.geturl() |
| |
| Return the re-combined version of the original URL as a string. This may differ |
| from the original URL in that the scheme will always be normalized to lower case |
| and empty components may be dropped. Specifically, empty parameters, queries, |
| and fragment identifiers will be removed. |
| |
| The result of this method is a fixpoint if passed back through the original |
| parsing function: |
| |
| >>> import urlparse |
| >>> url = 'HTTP://www.Python.org/doc/#' |
| |
| >>> r1 = urlparse.urlsplit(url) |
| >>> r1.geturl() |
| 'http://www.Python.org/doc/' |
| |
| >>> r2 = urlparse.urlsplit(r1.geturl()) |
| >>> r2.geturl() |
| 'http://www.Python.org/doc/' |
| |
| .. versionadded:: 2.5 |
| |
| The following classes provide the implementations of the parse results: |
| |
| |
| .. class:: BaseResult |
| |
| Base class for the concrete result classes. This provides most of the attribute |
| definitions. It does not provide a :meth:`geturl` method. It is derived from |
| :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__` |
| methods. |