If you’re a content creator, you may occasionally need to convert HTML to Markdown. There are plenty of online HTML-to-Markdown converters, such as Turndown, but converting locally on your computer is usually faster, especially when you have a lot of files to process. In this article, you’ll learn how to convert HTML to Markdown in Python.
To convert HTML to Markdown, I recommend using the Markdownify package by Matthew Dapena-Tretter. Use pip to install Markdownify.
pip install markdownify
After installing Markdownify, converting HTML to Markdown is super easy. Here’s a simple example that converts the HTML string <h1>Hello, World!</h1>.
from markdownify import markdownify
# Convert the HTML string; headings use the underlined style by default
markdown = markdownify("<h1>Hello, World!</h1>")
print(markdown)
## Hello, World!
## =============
Markdownify supports a number of options including HTML tag stripping, HTML tag conversion, Markdown heading styles, and more. Here’s an example of an HTML to Markdown conversion using the ATX Markdown header style.
from markdownify import markdownify
# heading_style="ATX" produces "#"-prefixed headings instead of underlined ones
markdown = markdownify("<h1>Hello, World!</h1>", heading_style="ATX")
print(markdown)
## # Hello, World!
You can also use Markdownify to convert HTML in a file. This method is useful if you’re bulk converting a bunch of HTML files into Markdown – just iterate over a list of HTML files and save them to Markdown files.
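from markdownify import markdownify
# Read the HTML file; the with block closes the file when it's done
with open("./hello-world.html", "r", encoding="utf-8") as file:
    html = file.read()
markdown = markdownify(html, heading_style="ATX")
print(markdown)
## # Hello, World!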
As you can see, converting HTML to Markdown in Python is very simple. With the excellent Markdownify package, the conversion process only requires a few lines of code. If you have any questions about this article, feel free to reach out to me on Twitter or send me an email.