html2md

html2md is a Python script that converts HTML (complete or fragments) into Markdown.

html2md was inspired by Aaron Swartz's html2text and adds support for missing elements that are common in HTML pages, without compromising the Markdown format.

Usage

html2md.py [-h] [-a] [-f] [--fenced_code {github,php}] [-e ENCODING]
                  [infile]

Transform HTML file to Markdown

positional arguments:
  infile

optional arguments:
  -h, --help            show this help message and exit
  -a, --attrs           Enable element attributes in the output (custom
                        Markdown extension)
  -f, --footnotes       Enabled footnote processing (custom Markdown
                        extension)
  --fenced_code {github,php}, --fencedcode {github,php}, --fenced {github,php}
                        Enabled fenced code output
  -e ENCODING, --encoding ENCODING
                        Provide an encoding for reading the input
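
If you drive the script from another Python program, the flags above can be assembled into a command line. The following is a minimal sketch, not part of html2md: the helper name `build_html2md_command` and the `script` parameter are hypothetical, and it assumes `html2md.py` is reachable at the given path.

```python
import sys


def build_html2md_command(infile=None, attrs=False, footnotes=False,
                          fenced_code=None, encoding=None,
                          script="html2md.py"):
    """Build an argv list for html2md.py from the documented flags.

    Hypothetical helper: only the flags shown in the usage text are used.
    """
    cmd = [sys.executable, script]
    if attrs:
        cmd.append("-a")            # element attributes (custom extension)
    if footnotes:
        cmd.append("-f")            # footnote processing (custom extension)
    if fenced_code is not None:
        # The usage text lists exactly these two choices for the flag.
        if fenced_code not in ("github", "php"):
            raise ValueError("fenced_code must be 'github' or 'php'")
        cmd += ["--fenced_code", fenced_code]
    if encoding:
        cmd += ["-e", encoding]     # encoding used to read the input
    if infile:
        cmd.append(infile)          # positional argument goes last
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(build_html2md_command("page.html", footnotes=True,
                                fenced_code="github", encoding="utf-8"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the script path is correct for your setup.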

Using it from your code:

import html2md
print(html2md.html2md("<p>Getting rid of HTML with html2md. Yey!</p>"))

You can pass in different options:

  • footnotes: True|False (default: False). Convert footnotes.
  • fenced_code: default|github|php (default: default). Convert code snippets into fenced code.
  • attrs: Convert HTML attributes. This is a custom extension and should not be used.
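
The options listed above can be checked before they reach the converter. This is only a sketch: `build_options` is a hypothetical helper, and it assumes that `html2md.html2md` accepts the options as keyword arguments, which this README implies but does not show.

```python
ALLOWED_FENCED_CODE = ("default", "github", "php")


def build_options(footnotes=False, fenced_code="default"):
    """Validate the documented options and return them as a dict.

    Hypothetical helper; `attrs` is left out on purpose because the
    README advises against using it.
    """
    if fenced_code not in ALLOWED_FENCED_CODE:
        raise ValueError(
            "fenced_code must be one of %s" % ", ".join(ALLOWED_FENCED_CODE)
        )
    return {"footnotes": bool(footnotes), "fenced_code": fenced_code}


# Assumed usage (keyword-argument passing is an assumption, not confirmed here):
#
#   import html2md
#   options = build_options(footnotes=True, fenced_code="github")
#   print(html2md.html2md("<pre><code>x = 1</code></pre>", **options))
```

Validating up front gives a clear error for a misspelled `fenced_code` value instead of relying on however the converter treats unknown values.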

License

Short version: OK for open source projects. OK for commercial projects with my signed agreement only.

Long version: see the License file in the project.