Convert HTML to Markdown-formatted text.

html2text


html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [filename [encoding]]

Option             Description
--version          Show program’s version number and exit
-h, --help         Show this help message and exit
--ignore-links     Don’t include any formatting for links
--escape-all       Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links  Use reference links instead of inline links to create Markdown
--mark-code        Mark preformatted and code blocks with [code]…[/code]

For a complete list of options, see the docs.

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Leave link formatting out of the converted output
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Stop ignoring links; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
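
To give a feel for how conversions like the ones above can work, here is a minimal sketch built only on the standard library's html.parser. It is not html2text's implementation and handles just a handful of tags; the class name TinyMarkdown, its convert method, and the choice of markers are assumptions made for illustration, with ignore_links named after the option shown above.

```python
from html.parser import HTMLParser


class TinyMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical; not html2text itself)."""

    # Markdown markers for the few inline tags this sketch understands.
    MARKERS = {"strong": "**", "b": "**", "em": "_", "i": "_"}

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.out = []    # output fragments, joined at the end
        self.hrefs = []  # hrefs of the <a> tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            if not self.ignore_links:
                self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if not self.ignore_links:
                self.out.append("](" + href + ")")
        elif tag == "p":
            # A closed paragraph becomes a blank line.
            self.out.append("\n\n")

    def handle_data(self, data):
        # Plain text is copied through unchanged.
        self.out.append(data)

    def convert(self, html):
        # Reset state so one instance can convert several documents.
        self.out, self.hrefs = [], []
        self.reset()
        self.feed(html)
        self.close()
        return "".join(self.out).strip()


link_html = "<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"
print(TinyMarkdown().convert(link_html))
print(TinyMarkdown(ignore_links=True).convert(link_html))
```

The first call keeps the link as Markdown and the second drops the link formatting, mirroring the ignore_links toggle. Wrapping, escaping, lists, and the many other cases a real converter has to deal with are deliberately left out.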

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.org/project/html2text/

$ pip install html2text
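
One way to confirm the install from Python is the standard library's importlib.metadata (Python 3.8 or later). This is only a sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if html2text is installed, otherwise None.
print(installed_version("html2text"))
```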

How to run unit tests

tox

To see the coverage results:

coverage html

then open the ./htmlcov/index.html file in your browser.

Documentation

Documentation lives here