Convert HTML to Markdown-formatted text.
Alireza Savand 3697acd581
Merge pull request #270 from jdufresne/posargs
6 months ago
docs Remove unused method unknown_decl 10 months ago
html2text Fix description of --no-wrap-links in help message 6 months ago
test Merge branch 'master' into black 10 months ago
.gitignore Add tox.ini to easily test all platforms locally 10 months ago
.travis.yml Introduce black to automate Python code formatting 10 months ago
AUTHORS.rst Add myself as a contributor 10 months ago
COPYING add COPYING (fix #31) 8 years ago
ChangeLog.rst Handle LEFT-TO-RIGHT MARK after a `<b>` tag 10 months ago
ISSUE_TEMPLATE Update ISSUE_TEMPLATE 3 years ago
MANIFEST.in Add tox.ini to easily test all platforms locally 10 months ago
README.md Add tox.ini to easily test all platforms locally 10 months ago
requirements-dev.txt Remove unused py from requirements-dev.txt 11 months ago
setup.cfg Introduce black to automate Python code formatting 10 months ago
setup.py Introduce black to automate Python code formatting 10 months ago
tox.ini Pass extra tox cli args to tox 6 months ago

README.md

html2text


html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [filename [encoding]]

Option             Description
--version          Show program’s version number and exit
-h, --help         Show this help message and exit
--ignore-links     Don’t include any formatting for links
--escape-all       Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links  Use reference-style links instead of inline links to create Markdown
--mark-code        Mark preformatted and code blocks with [code]…[/code]

For a complete list of options, see the docs.

Or you can use it from within Python:

>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
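
The command-line form accepts a filename and an optional encoding, and roughly the same flow can be reproduced from Python by decoding the file yourself before calling html2text.html2text(). The sketch below is only an illustration: read_html and convert_file are hypothetical helper names, not part of the library, and the UTF-8 default is an assumption rather than documented behaviour.

```python
from pathlib import Path


def read_html(path, encoding="utf-8"):
    """Read a file as bytes and decode it with the given encoding.

    The UTF-8 default is an assumption made for this sketch.
    """
    return Path(path).read_bytes().decode(encoding)


def convert_file(path, encoding="utf-8"):
    """Return the Markdown-formatted text for the HTML file at `path`."""
    # Imported here so this module still loads where html2text is missing.
    import html2text  # third-party: pip install html2text

    return html2text.html2text(read_html(path, encoding))
```

Calling convert_file("page.html", "latin-1") would then play the role of html2text page.html latin-1 on the command line, with the decoding step kept explicit so a wrong encoding fails loudly instead of producing garbled text.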

Or with some configuration options:

>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Skip link formatting when converting the HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
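
The command-line options listed earlier have counterparts as attributes on the HTML2Text object, in the same way that --ignore-links corresponds to h.ignore_links above. The following is a rough sketch of that mapping, not the project's own code: settings_for and make_converter are hypothetical helpers, and the attribute names escape_snob, inline_links, and mark_code are assumptions that should be checked against the docs for the installed version.

```python
# Command-line flag -> (attribute name, value) on an HTML2Text instance.
# Only ignore_links appears in the README itself; the other attribute
# names are assumptions to verify against the project's documentation.
FLAG_TO_ATTRIBUTE = {
    "--ignore-links": ("ignore_links", True),
    "--escape-all": ("escape_snob", True),
    "--reference-links": ("inline_links", False),
    "--mark-code": ("mark_code", True),
}


def settings_for(flags):
    """Translate CLI-style flags into a dict of attribute settings."""
    unknown = [flag for flag in flags if flag not in FLAG_TO_ATTRIBUTE]
    if unknown:
        raise ValueError(f"unsupported flags: {unknown}")
    return dict(FLAG_TO_ATTRIBUTE[flag] for flag in flags)


def make_converter(flags=()):
    """Build an HTML2Text instance configured like the given flags."""
    # Imported here so this module still loads where html2text is missing.
    import html2text  # third-party: pip install html2text

    converter = html2text.HTML2Text()
    for name, value in settings_for(flags).items():
        setattr(converter, name, value)
    return converter
```

With this in place, make_converter(["--reference-links"]).handle(html) should emit reference-style links rather than inline ones, provided the assumed attribute names hold.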

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.org/project/html2text/

$ pip install html2text

How to run unit tests

$ tox

To see the coverage results:

$ coverage html

then open the ./htmlcov/index.html file in your browser.

Documentation

Documentation lives here