I was fixing some broken links on our blog at work when I decided it would be fun to make my own broken link checker. It didn’t end up being very complicated at all, and I’m glad that I no longer need to open a web browser and navigate to an ad-infested website to check if a page has broken links.

Here’s the code if you want to use it.

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

def get_broken_links(url):

	# Set root domain (swap in your own site's domain).
	root_domain = "domain.com"
	
	# Internal function for validating HTTP status code.
	def _validate_url(url):
		try:
			r = requests.head(url, timeout=10)
			if r.status_code == 404:
				broken_links.append(url)
		except requests.RequestException:
			# A URL that doesn't respond at all is broken too.
			broken_links.append(url)
			
	# Make request to URL.		
	data = requests.get(url).text
	
	# Parse HTML from request.
	soup = BeautifulSoup(data, features="html.parser")
	
	# Create a list of all links that point at the root domain,
	# skipping <a> tags that have no href at all.
	links = [link["href"] for link in soup.find_all("a", href=True) if f"//{root_domain}" in link["href"]]
	
	# Initialize list for broken links.
	broken_links = []
	
	# Check the links concurrently, appending any broken ones to the list.
	with ThreadPoolExecutor(max_workers=8) as executor:
		executor.map(_validate_url, links)
		
	return broken_links
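
To run it, just call the function with the page you want to check. The URL below is a placeholder; point it at one of your own pages:

if __name__ == "__main__":
	# Example usage; swap in the page you actually want to check.
	for link in get_broken_links("https://example.com/blog"):
		print(link)

One thing to note: requests.head only fetches headers, which keeps the check fast, but a handful of servers reject HEAD requests. If you run into that, swapping requests.head for requests.get will still work, at the cost of downloading each page.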