I’m not a developer by trade, but I know just enough Python to automate repetitive tasks. Today at work, I saw an opportunity to speed up a task by writing a few lines of Python. Without going into too much detail, the task involved finding all the blog posts without a table of contents section.
We have a ton of content on our blog, so going through each post manually would’ve taken forever. To speed things up, I spent a few minutes writing the Python script below. To create the list of posts to check, I grabbed all the URLs from our sitemap, formatted them into a Python list, and assigned the list to the blog_urls variable. Finally, I ran the script. A few minutes later, I had a complete list of all the blog posts that don’t have a table of contents section.
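A quick aside before the script itself: the sitemap step could probably be automated too. The sketch below is not what I actually ran. It assumes the sitemap follows the standard sitemaps.org XML format, and the function name urls_from_sitemap and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap XML string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    # Made-up sample; a real sitemap would be downloaded or read from a file.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/blog/post-1/</loc></url>"
        "<url><loc>https://example.com/blog/post-2/</loc></url>"
        "</urlset>"
    )
    blog_urls = urls_from_sitemap(sample)
    print(blog_urls)
```

If the sitemap also lists non-blog pages, the resulting list would still need filtering before it is handed to the checker.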
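import requests
from bs4 import BeautifulSoup
from multiprocessing import Pool

# Placeholder: paste the URLs collected from the sitemap here.
blog_urls = [(LIST OF URLS)]


def check_toc_status(url):
    # Download the post and parse its HTML.
    r = requests.get(url)
    html = r.text
    soup = BeautifulSoup(html, features="html.parser")
    # A post has a table of contents if it contains <aside class="toc">.
    if soup.find_all("aside", {"class": "toc"}):
        print(f"OK: {url}")
    else:
        print(f"NO: {url}")


if __name__ == '__main__':
    # Check the URLs in parallel with 8 worker processes.
    with Pool(8) as pool:
        result = pool.map(check_toc_status, blog_urls)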
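One thing worth noting: check_toc_status only prints, so result ends up as a list of None values and the OK and NO lines are mixed together in the output. A possible variation, sketched below, has the checker return a true/false value and collects only the URLs that fail. This is a sketch under assumptions, not the script I ran: find_missing and fake_has_toc are hypothetical names, and it swaps the process pool for the standard library's thread pool, which is usually enough for network-bound work.

```python
from multiprocessing.pool import ThreadPool


def find_missing(urls, has_toc, workers=8):
    """Return the URLs for which has_toc(url) is falsy, in input order.

    has_toc is any callable that takes a URL and returns True or False,
    for example a version of check_toc_status that returns instead of prints.
    """
    with ThreadPool(workers) as pool:
        flags = pool.map(has_toc, urls)
    return [url for url, ok in zip(urls, flags) if not ok]


if __name__ == "__main__":
    # Stand-in checker with made-up data so the sketch runs without network access.
    fake_results = {
        "https://example.com/blog/post-1/": True,
        "https://example.com/blog/post-2/": False,
    }

    def fake_has_toc(url):
        return fake_results[url]

    for url in find_missing(list(fake_results), fake_has_toc):
        print(f"NO: {url}")
```

Passing the checker in as an argument keeps the collecting logic separate from the HTTP request, which makes it easy to try out on a handful of URLs first.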