Automatically Generate Your Meta Descriptions Using Python and BERT

If you want a quick and dirty way to programmatically generate meta descriptions at scale using Python, this is the tutorial for you. A Jupyter notebook and a step-by-step process are included.

Here is what this article will cover: 

  • why meta descriptions are important
  • what is BERT
  • how to identify pages with missing meta descriptions and scrape site content using Screaming Frog
  • how to generate meta descriptions in bulk automatically using BERT and Python
  • how to optimize meta descriptions for better CTR
  • other automation approaches for generating meta descriptions

Meta Descriptions: Crash Course and Important Considerations

Meta descriptions are part of a page’s metadata and are shown in SERPs to give search engine users a brief summary of the page.

They can be of any length, though Google truncates snippets to ~155–160 characters. Shorter meta descriptions are also permitted, e.g. ~50–100 characters, as long as they are descriptive. 
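As a rough illustration of the length guidance above, here is a minimal sketch that classifies a description by its character count. The function name `check_length` and the exact thresholds (50 and 160) are my own assumptions based on the approximate ranges mentioned above; they are not official Google limits.

```python
def check_length(description, min_chars=50, max_chars=160):
    """Classify a meta description by character count (thresholds are assumptions)."""
    length = len(description.strip())
    if length == 0:
        return "missing"
    if length < min_chars:
        return "too short"
    if length > max_chars:
        return "likely truncated"
    return "ok"
```

You could run this over an exported list of descriptions to spot the ones worth rewriting first.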

Meta descriptions don’t directly impact search rankings. They are not a built-in component of the ranking algorithm and haven’t been for many years. 

Nonetheless, they impact search rankings indirectly. By being visible in the SERPs, meta descriptions can impact the click-through rate (CTR).

Here are some best practices from our friends at Google, regarding meta descriptions: 

  • Use the “description” meta tag by placing it within the <head> element of your HTML document.
<html>
<head>
<title>Brandon's Baseball Cards - Buy Cards, Baseball News, Card Prices</title>
<meta name="description" content="Brandon's Baseball Cards provides a large selection of vintage and modern baseball cards for sale. We also offer daily baseball news and events.">
</head>
<body>
...
  • Accurately summarize the page content

Avoid keyword stuffing. 

Don’t use generic or non-descriptive meta descriptions, such as ‘This is a blog post’.

Don’t use blanket meta descriptions for sections of your website. I commonly see this in blog sections, where all pages have the same meta descriptions. 

  • Use unique descriptions for each page

To emphasize my previous point: Duplication is lazy. Avoid it at all costs.

Unique meta descriptions help users make more informed decisions and provide search engines with more context off the bat. 

Imagine how lazy you look to a potential advanced Google Search user, who is using the site: operator to find content on your site, related to a particular product or service you offer. Not a good look.

  • Programmatically generate meta descriptions for multiple pages

If you had any doubts about the legitimacy of the proposed programmatic approach, rest assured.

If your site has thousands or even millions of pages, hand-crafting description meta tags probably isn’t feasible. In this case, you could automatically generate description meta tags based on each page’s content.

Google
  • Always provide a meta description, even though Google might sometimes ignore it

It’s no news that Google dynamically generates meta descriptions to make the SERPs more attractive to its users. Google may choose to use a relevant section of your page’s visible text if it does a good job of matching up with a user’s query. 

It’s a common misconception amongst site owners that this translates to not needing to provide pages with a meta description. Nothing could be further from the truth. 

Having a unique, high-quality, descriptive, clear, and optimized meta description is part of good site housekeeping. 
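If you'd like to check a handful of pages for missing or duplicated descriptions yourself, here is a minimal sketch that uses only Python's standard library. The names `MetaDescriptionParser`, `extract_description` and `audit` are hypothetical, and the sketch assumes you already have each page's HTML as a string.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()


def extract_description(html):
    """Return the meta description, or None if it is absent or empty."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description or None


def audit(pages):
    """pages: dict of URL -> HTML. Returns (missing URLs, duplicated descriptions)."""
    descriptions = {url: extract_description(html) for url, html in pages.items()}
    missing = [url for url, desc in descriptions.items() if desc is None]
    counts = Counter(desc for desc in descriptions.values() if desc is not None)
    duplicates = {desc for desc, n in counts.items() if n > 1}
    return missing, duplicates
```

For a full site, a crawler is the more practical option, but this shows what the check boils down to.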

Background to BERT

Bidirectional Encoder Representations from Transformers (BERT) is a machine learning model for natural language processing, developed and pre-trained by Google.

Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, whereas BERT takes into account the context for each occurrence of a given word.

This enables better contextualization of the model’s output, which considers not only the importance of the word and how frequently it’s used but also the context of the use. 

In 2019, Google Search started applying BERT models for search queries in English. In three months’ time, BERT had already expanded to over 70 languages. A year after its introduction, almost every single English-based query was processed by BERT.

Why BERT? To quote the scientists who developed it:

BERT is conceptually simple and empirically powerful.

Text summarization is the task of compressing a long text into a short one while preserving its central idea. This can be achieved via:

  • extractive summarization: selects the top N sentences that best represent the key points of the article
  • abstractive summarization: seeks to reproduce the key points of the article in new words

In this tutorial, we will use the BERT extractive summarizer model to programmatically generate meta descriptions.
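To make the idea of extractive summarization concrete, here is a deliberately simple sketch that scores each sentence by the average frequency of its words and keeps the top N. To be clear, this is not BERT and not how the summarizer used in this tutorial works; the function name `extractive_summary` and the scoring rule are illustrative assumptions only.

```python
import re
from collections import Counter


def extractive_summary(text, top_n=1):
    """Return the top_n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically
        return sum(frequencies[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:top_n])
    return " ".join(sentences[i] for i in keep)
```

The key property is the same as in any extractive approach: every sentence in the output is copied verbatim from the input, nothing is rewritten.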


How to Auto-Generate Meta Descriptions Using Python and BERT (step-by-step tutorial)

Requirements

Before we begin with the tutorial, a couple of notes on preparation and necessary requirements. 

  1. Identify all pages on your site with missing meta descriptions.

There are multiple ways of doing this:

To identify pages with meta description issues using Screaming Frog:

  • Crawl the website
  • Open the ‘Meta Description’ tab
  • Export the URLs of all missing and duplicate meta descriptions
Screaming Frog Meta Descriptions Export, source: Screaming Frog

Here are links to tutorials on doing the same thing, using: 

2. Scrape the text from the subset of pages, with a missing meta description, using the exported URL list.

I’ve already written an in-depth guide about how to scrape text from URLs using Screaming Frog, but here is a brief recap:

  • Copy the selector of the element you want to extract text from.
  • Set up a custom extraction in Screaming Frog.
  • Export your data.
Custom Extraction with Screaming Frog, source: author

Here are some alternative ways that you can scrape content with, using a list of URLs: 

⚠️ Important note regarding file formatting:

One thing to note is the input file must be in the following format: 

  • FILE FORMAT: .CSV
  • FILE CONTENT: Column A — URL, Column B — Scraped content on the respective URL, all else — empty
  • FILENAME: inputdata.csv
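Before running anything heavy, it can be worth checking that the file really follows this format. Here is a minimal sketch using Python's built-in csv module. The function name `validate_input` is hypothetical, and the sketch assumes the first row is a header row and that URLs start with http:// or https://.

```python
import csv


def validate_input(path="inputdata.csv"):
    """Return a list of problems found in the input CSV (an empty list means it looks fine)."""
    problems = []
    # utf-8-sig strips the byte-order mark that spreadsheet exports often add
    with open(path, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))
    if len(rows) < 2:
        return ["file needs a header row plus at least one data row"]
    for line_number, row in enumerate(rows[1:], start=2):
        url = row[0].strip() if len(row) > 0 else ""
        content = row[1].strip() if len(row) > 1 else ""
        if not url.startswith(("http://", "https://")):
            problems.append(f"row {line_number}: column A is not a URL")
        if not content:
            problems.append(f"row {line_number}: column B has no scraped content")
    return problems
```

Calling `validate_input()` in the folder that holds inputdata.csv prints nothing by itself; it returns the list of problems so you can decide what to fix.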

Now, let’s get to business. 

3. Save the code in the same folder as the file and run it using Jupyter notebook. 

Running BERT using Python

Here is a link to the Jupyter notebook .ipynb file that you can copy and use easily on your machine. 

Let’s run through the Python code. 

First, we will import the necessary modules.

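#import necessary modules
#Summarizer comes from the bert-extractive-summarizer package (pip install bert-extractive-summarizer)
import pandas as pd
from summarizer import Summarizer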

Then, we import the file using Pandas. 

#import your file
#format it with two columns - A with URLs, B with extracted page content
#on_bad_lines="skip" (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv("../meta-descriptions/inputdata.csv", header=0,
                 encoding="utf-8-sig", on_bad_lines="skip")

In order to run BERT, we must first create a list to store the generated meta descriptions. 

Then, for each URL in the CSV we’ve provided, we will store the respective generated meta description. 

We then run BERT, specifying the model type (Summarizer), the text for analysis, and max_length, which sets the maximum length (in characters) of the sentences the summarizer will consider for the summary.

Finally, we store the values into the list, save them in a column in the data frame, and output the file with the meta descriptions attached as an additional column as a new CSV.

# Load the BERT summarizer once, so the model isn't reloaded for every row
model = Summarizer()
# Create a list to store the meta descriptions
metadesc = []
# For each URL in the input CSV, run the analysis and store the result in the list
for i in range(len(df)):
    # The body text is in the second column (column B)
    body = str(df.iloc[i, 1])
    # BERT: max_length=150 skips candidate sentences longer than 150 characters
    result = model(body, max_length=150)
    full = ''.join(result)
    print(full)
    # Store the generated description in the list
    metadesc.append(full)
# Save the stored values in a new column
df['Meta Desc'] = metadesc
# Save the output
df.to_csv('output.csv')

…and Voila!

Output.csv will be automatically saved in the same folder as your input file and the Jupyter notebook, source: author
Generated meta descriptions are automatically filled in Column C, source: author
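Since the summarizer can return more than one sentence, a generated description may run past the ~155–160 characters that Google displays. Here is a minimal sketch for trimming a description at a word boundary. The function name `trim_description` and the 155-character default are my own assumptions, not part of the notebook.

```python
def trim_description(text, limit=155):
    """Shorten text to at most `limit` characters without cutting a word in half."""
    text = " ".join(text.split())  # collapse repeated whitespace and line breaks
    if len(text) <= limit:
        return text
    # Leave room for the ellipsis, then cut back to the last full word
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + "…"
```

Applied to the data frame from the previous step, this would look like `df['Meta Desc'] = df['Meta Desc'].apply(trim_description)`.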

⚠️ Important note about the model’s capabilities to process text

One important thing to note is that BERT will only be able to generate a meta description for texts longer than 400 words. This also corresponds with Google’s understanding of thin pages.

Pages with fewer than 400 words will typically be flagged in the audits of tools such as Screaming Frog or Sitebulb. Ideally, you should strive to provide sufficient content on all of your indexable pages for both users and search engines to navigate your site successfully.
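To avoid sending thin pages to the model in the first place, you could split your scraped content by word count beforehand. This is a minimal sketch; the function name `split_by_word_count` is hypothetical, and treating exactly 400 words as long enough is my own assumption.

```python
def split_by_word_count(pages, min_words=400):
    """pages: list of (url, text) pairs. Returns (long_enough, too_short)."""
    long_enough, too_short = [], []
    for url, text in pages:
        # Whitespace-separated tokens are a rough but serviceable word count
        target = long_enough if len(text.split()) >= min_words else too_short
        target.append((url, text))
    return long_enough, too_short
```

The `too_short` list doubles as a to-do list of pages that need more content before a description can be generated for them.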

We now have a file with automatically-generated meta descriptions, which we can upload to the site or start optimizing for better CTR. 


How to Optimize Your Meta Descriptions for Better CTR

1. Include the top query or the main keyword for the page in the meta description. 

You can export the top query related to each page using a Google Sheets add-on called Search Analytics for Sheets.

Provide the add-on with necessary permissions, then connect with your domain. Adjust the date and type of report. Select Group by: Page, Query.

Search Analytics for Sheets report set-up, source: author

Then, using a VLOOKUP function, match the queries from the Search Console Export with the URLs, Content Export, and newly-generated Meta Descriptions from the previous step of this tutorial, using the URLs as a matching dimension. 

Provided that you name the Search Console report ‘GSC report’, and the file with the meta descriptions is named ‘output’, your formula will look like this: 

=VLOOKUP(A2, 'GSC report'!A:G, 2, false)

Then check if the keyword is contained within the meta description and if not (yet it is relevant to the content on the page), try to add it in a sentence. 
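If you'd rather do this check in Python than in a spreadsheet, here is a minimal sketch of the same idea. The function name `missing_keyword` and the two input dictionaries are hypothetical; in practice you would build them from the Search Console export and the output file.

```python
def missing_keyword(descriptions, top_queries):
    """Return the URLs whose meta description doesn't contain the page's top query.

    descriptions: dict of URL -> meta description
    top_queries:  dict of URL -> top query (for example, from a Search Console export)
    """
    flagged = []
    for url, description in descriptions.items():
        query = top_queries.get(url)
        # Case-insensitive substring match; pages without a known query are skipped
        if query and query.lower() not in description.lower():
            flagged.append(url)
    return flagged
```

The match is a plain substring check, so it will miss plurals and reordered words. Treat the flagged URLs as candidates for a manual review rather than a final verdict.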

2. Inspire curiosity, especially for informational queries

Based on some advice from Search Engine Land, inspiring curiosity in a reader should be done especially for informational queries:

By the time a user finishes reading your description, they should be curious about what the page will say about the topic. You need to provide just enough information to explain what the page is about but not so much that it ruins the curiosity factor.

Curious which of your keywords or queries are informational? 

Check out this article about search intent keyword classification (which includes a free Data Studio Dashboard).


Other Methods You Can Use to Programmatically Generate Meta Descriptions

1. Utilizing other state-of-the-art (SOTA) research for text summarization

Hamlet Batista taught us to seek and utilize state-of-the-art (SOTA) research and code, so I can’t help but include his advice on finding other text summarization notebooks, which you might find more useful than this approach.

The benefit of this approach is that you can experiment with different scripts in order to find the best-performing one for this task. This can be quite challenging for beginner Pythonistas though.

2. Automatically generating meta descriptions using a Google Sheets add-on

Another method is to use a Google Sheets add-on called Meta Descriptions Writer.

This approach uses a methodology similar to the one in the tutorial above. A crucial difference is that, instead of scraping content from the website, you just need the URLs for the summarizer to run as a function in your Sheet. Here is a YouTube tutorial detailing the steps.

Limitations

There is a caveat for this option: you only get 25 free meta descriptions upon installing the add-on, and everything beyond that is paid.

Another thing I noticed with this approach is that the output quality is lower than that of the suggested Python-based approach. Here are a couple of examples:

For this page, BERT generated this meta description: 

Today, we’re one step closer to introducing a full ecosystem for publishing and exchanging presentations.

While the Meta Description Writer generated this one: 

We use cookies to analyze site performance and deliver personalized content. By clicking Agree you consent to the storing of cookies. You can revoke your consent at any time. Learn more about this in our privacy policy.

As you can see, because the add-on visits the URLs dynamically, it fails to access the page content wherever there are pop-ups, and summarizes the pop-up text instead.

Personally, I would recommend a more controlled approach for generating meta descriptions, using Python. 

3. Using a Streamlit app (only suitable for individual text snippet summarization).

Cheetan Ambi has shared the code for a really cool Streamlit app, which gives the user a choice between which model to use — BART or T5 transformer. 

You can find his full tutorial with Python code in this article.

This approach is amazing for when you want to summarize individual snippets of text, using the clean and tidy UI of Streamlit.

It would not be suitable for bulk text summarizing unless the code is modified. In addition, it requires the app to be deployed.


Takeaway

Generating meta descriptions is a really boring task, yet one that every site owner or SEO must do at some point. Python and models such as BERT enable a quick-and-easy solution to this typically repetitive and time-consuming task.

There are different approaches you can take to programmatically generate the meta descriptions on your website, but the most important takeaway from this article should be that you must generate them. Period.

Feel free to share in the comment section any other tools you’ve come across that can achieve this task effortlessly.