One of my biggest pet peeves with Google Webmaster Tools is its lack of a complete API for programmatically submitting updated URLs for indexing. At the moment, Google’s Indexing API only allows submission of pages with specific kinds of structured data – JobPosting or BroadcastEvent embedded in a VideoObject. Hopefully Google will open up the Indexing API to all pages in the future, but I’m not holding my breath.

After some digging around, I discovered that Bing Webmaster Tools has a complete API that allows for up to 10,000 URL submissions per day. Instead of going to bed, I decided to stay up a little later to develop a script that checks my GitHub repo for content changes in the most recent commit and submits any changed URLs to Bing. Hopefully I can use this script for Google Webmaster Tools in the future as well, if Google ever opens up the Indexing API.

For this script, I used Python 3. Let’s go through the dependencies. The script uses base64, json, re, and requests. The only non-standard library is requests, so be sure you have that installed.

import base64
import json
import re
import requests

Next, I specified some information and credentials for GitHub. Since my Hugo repository is private, I needed to access it with a personal access token. After the credentials have been specified, gh_session creates a session and gh_session.auth authorizes the session with the provided username and token.

## GitHub Info
gh_username = "username"
gh_token = "token"
gh_repos_url = "https://api.github.com/repos/username/repo/commits/master"
gh_session = requests.Session()
gh_session.auth = (gh_username, gh_token)

Next, commit gets the data for the latest commit. At this point, submission_url_list is initialized with “https://brianli.com”. While this isn’t 100% necessary, I thought it wouldn’t hurt to re-submit my homepage in case the latest commit contained a new post.

## Get latest commit info.
commit = json.loads(gh_session.get(gh_repos_url).text)

## Initialize list of URLs to submit to Bing.
submission_url_list = ["https://brianli.com"]

The code block below loops through all the changed files in the latest commit and grabs each filename. After that, I wrote a check to make sure the rest of the code only affects files in the content folder. To do this, I just wrote an if statement that checks whether the first seven characters of the file path are “content” – if changed_file[0:7] == "content":. I also added a second check to filter out draft posts. Finally, I used a regex to find the post slug to build the submission URL – re.findall(r"slug:\s?(\S+)", file_content). The last if statement checks whether a URL returns a 200 status code. If it does, the URL is appended to submission_url_list.

## Check latest commit, find changed files, check files for slug, build URL, append to URL list.
for x in range(len(commit["files"])):
	changed_file = commit["files"][x]["filename"]
	#print(changed_file)
	#print(changed_file[0:7])
	if changed_file[0:7] == "content": #Only match files in the content folder.
		r = gh_session.get(f"https://api.github.com/repos/username/repo/contents/{changed_file}")
		json_string = json.loads(r.text)
		file_content = base64.b64decode(json_string["content"]).decode("utf-8")
		if "draft:true" in file_content or "draft: true" in file_content: #Ignore drafts because they are not published.
			pass
		else:
			file_content_slug = re.findall(r"slug:\s?(\S+)", file_content)[0]
			submission_url = f"https://brianli.com/{file_content_slug}/"
			r = requests.get(submission_url)
			if r.status_code == 200: #Check URL is valid (200 response code) before appending to list.
				submission_url_list.append(submission_url)
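To illustrate how the slug extraction behaves, here’s a quick standalone example run against some made-up Hugo front matter (the post content below is invented purely for the demo):

```python
import re

# Hypothetical front matter from a Hugo content file.
file_content = """---
title: "My Example Post"
slug: my-example-post
draft: false
---

Post body goes here.
"""

# \s? allows an optional space after "slug:", and (\S+) captures
# the slug itself (any run of non-whitespace characters).
matches = re.findall(r"slug:\s?(\S+)", file_content)
print(matches[0])  # my-example-post
```

Note that if a file has no slug line, findall returns an empty list and indexing [0] raises an IndexError, so this approach assumes every published post defines a slug.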

In this last section, I specified some credentials for Bing Webmaster Tools' API. If you want to submit URLs programmatically, you’ll need to create an API key in your dashboard. Finally, I built a POST request containing the list of URLs for submission. After executing, the script returns “Submission to Bing was successful” if the request returns a 200 status code.

## Bing Info
bing_submission_url = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlBatch?apikey="
bing_api_key = "apikey"
bing_submission_urls = { "siteUrl":"https://brianli.com", "urlList":submission_url_list } #Create URL list to submit to Bing.
headers = { "Content-Type": "application/json; charset=utf-8" } #Request headers for Bing.

## Make request to Bing.
submission_request = requests.post(f"{bing_submission_url}{bing_api_key}", headers=headers, json=bing_submission_urls)

if submission_request.status_code == 200:
	print("Submission to Bing was successful.")
else:
	print("Submission was not successful. Please try again.")

Check the GIF below to see the script in action. After the script runs, you can see the submission URL counter in Bing’s dashboard increase from 23 to 26.

A Python script to submit URLs to Bing.

Now that the script is complete, I’m thinking about the best way to deploy it. There are a few options that come to mind.

  1. Run the script manually after each site build.
  2. Set up a cron job to run the script every few hours.
  3. Deploy the script to Google Cloud Functions and instruct Netlify (my web host) to ping the trigger URL after each site build.

Yeah, I’ll probably go with the third option because it sounds the coolest.
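If I do go the Cloud Functions route, the wrapper would be pretty small. Here’s a rough sketch, assuming the script above is packaged into a submit_urls() function – the function names, the SECRET_TOKEN check, and the token parameter are my own placeholders, not anything Netlify or Google requires:

```python
# Hypothetical HTTP-triggered Cloud Function (Python runtime).
# Netlify would be configured to ping this function's URL after each build.

SECRET_TOKEN = "replace-me"  # shared secret so random requests can't trigger submissions

def submit_urls():
    """Placeholder for the Bing submission script above."""
    pass

def bing_submit(request):
    # Reject pings that don't carry the shared secret.
    if request.args.get("token") != SECRET_TOKEN:
        return ("Forbidden", 403)
    submit_urls()
    return ("Submission triggered.", 200)
```

The build hook URL would then look something like https://REGION-PROJECT.cloudfunctions.net/bing_submit?token=replace-me, with the secret kept out of version control.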