Getting started with Machine Learning in SEO as a beginner
This is my beginner’s guide to machine learning for SEOs, which I presented at WTSFest 2022. This guide is highly conceptual rather than tactical, but if you scroll to the bottom you will find all the resources you’ll need to get started.
I would also like to acknowledge the influence of the Women in Tech SEO community, and of Areej AbuAli, in the creation of this guide, which was first presented at WTSFest 2022 in London. The guide is a write-up of the talk’s main points; if you would like to check the slides as well, you can do so below.
This guide will not touch upon textbook definitions, for the most part, but will aim instead to put things into relevant context and give you only the information you truly need to get going. I will also not make a case that ‘Every SEO needs to learn Python’. That’s not necessarily the case, especially with the rise of no-code alternatives, as well as the plethora of tools that can enable you to start using pre-trained machine learning models with your data.
Let’s dive in.
Challenges of getting started with Machine Learning in SEO
A study done by the brilliant Shelley Walsh for Search Engine Journal shows that about a third of SEOs spend most of their time on keyword research, and another third has reported spending most of their time on on-page optimizations.
Despite all the tools that we have at our disposal, most processes are fairly manual. Even when a new process comes along, we struggle to find the time to fit it in. But I’d argue that there are other factors at play, too.
So let’s talk about the common challenges in getting started with machine learning and SEO.
Limiting Beliefs
Do you recognize any of these statements?
- “You have to be a Python / coding / math / data science expert to start using machine learning”
- “You have to know what each machine learning algorithm does in order to start.”
- “You need to have plenty of free time to learn from scratch in order to start.”
- “It is difficult / unattainable / scary / unnecessary to start with machine learning”
This kind of thinking is very common, not only in the SEO community but across the entire data science and machine learning community, and it is a big reason why people fail to get started with machine learning. These beliefs are not new by a long shot – they’ve been pestering data science enthusiasts for a long time!
Below, I’m listing solutions for three common situations, inspired by an article by Jason Brownlee, founder of Machine Learning Mastery:
- Waiting to Get Started? – If you are waiting to get started, search for “machine learning in ten minutes”. Select a tutorial and follow along. Start small, but do something today.
- Awaiting Perfect Conditions? – If you are waiting for the perfect conditions to start, like the perfect laptop or a clear schedule, maybe even the perfect data or project that aligns with your goals… I’m here to tell you – that timing might never be right, and the best time to start is today. Start small, build a habit, and track your progress.
- Struggling or tried and failed? – If you have tried and failed, that is hard – I get it. Cut the scope and change the direction – start small and remain consistent. But get back into it.
Noticing a pattern? Always start small.
This is the best way to overcome your fears of getting started with machine learning.
Lack of Context
The second thing that might be holding you back is a lack of context. Often you might hear things like:
“just think about things to automate.”
-everyone, that always knows how to automate things
But to automate things, you need to know what is possible via automation, especially when you are short on time. Otherwise, you’re going to get error after error trying to execute things, wonder why your code is not working, and get frustrated: you’re going to try, and you’re going to fail.
And it turns into this quote:
The right mindset to get started with machine learning in SEO
Often there is no need to reinvent the wheel, especially as a beginner.
You don’t need to know a ton of theory.
You don’t need end-to-end custom-built, autonomous, automated solutions that work with the click of a button.
You don’t need to spend a ton of time.
Aim to drive value and have fun with the process. These are the two most important things. If you aim to drive value, you will know exactly where in a particular deliverable your roadblocks are and where you are struggling. If you can pinpoint that place, dissect the task, and reframe it so that it suits a more advanced solution, you will save yourself a ton of time and drive value by being more efficient. And if you’re having fun while doing it, you’re going to do it more often, and the opportunities will start popping up like crazy: ways to incorporate machine learning, and ways to build automation into how you work.
So, knowing what model to use, how to find it and implement it quickly, and how to drive value via machine learning is the perfect way to start. Context becomes everything.
Let’s start with when to search for machine learning in SEO.
When to search for Machine Learning in SEO
Starting with the basics, in this section, we’re going to talk about task characteristics, solution characteristics, and data characteristics and how to use them when you’re encountering a task.
Task characteristics
In the most basic sense, machine learning tasks split into supervised and unsupervised, which in turn split into the four most common groups below. Here is what they are and what they mean.
- Supervised machine learning – means you have labelled data to validate results
  - Regression – making predictions
  - Classification – split into groups, based on existing classes
- Unsupervised machine learning – means you don’t have a way to validate the model’s output
  - Clustering – identify patterns and group data points based on similarity
  - Dimensionality reduction – simplify or transform your data
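To make the clustering idea a little more concrete, here is a minimal sketch in plain Python. It is not a trained model, just a toy that groups keywords by how many words they share, without being given any labels. The function names (`jaccard`, `cluster_keywords`), the similarity threshold, and the sample keywords are all assumptions made for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two keywords have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_keywords(keywords: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy grouping: put each keyword in the first cluster whose first
    member is similar enough, otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for keyword in keywords:
        tokens = set(keyword.lower().split())
        for cluster in clusters:
            seed_tokens = set(cluster[0].lower().split())
            if jaccard(tokens, seed_tokens) >= threshold:
                cluster.append(keyword)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([keyword])
    return clusters


# Hypothetical keyword list; note that no labels are provided.
keywords = [
    "running shoes",
    "best running shoes",
    "trail running shoes",
    "seo audit",
    "technical seo audit",
]
clusters = cluster_keywords(keywords)
for cluster in clusters:
    print(cluster)
```

Nothing here tells the code which groups exist in advance, which is what makes it an unsupervised task. A real keyword clustering tool would typically use a more robust similarity measure than word overlap.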
Beyond the basics, you might find other things like this amazing flowchart, created by the brilliant Karen Hao:
This flowchart walks through the field of AI in the following Q&A format:
- If it’s looking for patterns in massive amounts of data, then yes, it is machine learning. If it’s not, then it’s not machine learning.
- If it is being told what to look for, then it’s supervised machine learning. If it involves deep neural networks, then it’s deep learning, which can be either supervised or unsupervised.
- If it’s not being told what to look for, but is instead trying to reach an objective through trial and error, then it’s reinforcement learning. Reinforcement learning is usually treated as a category of its own: the model learns every time it makes an error, taking the outcome of the previous run and incorporating it into the next one.
- If it does not involve learning through trial and error, then it’s unsupervised machine learning.
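The questions above can be sketched as a small decision function. This is only one reading of the Q&A as summarised in this guide; the function name, parameter names, and returned labels are assumptions made for illustration and are not part of the original flowchart.

```python
def classify_ai_approach(
    looks_for_patterns_in_data: bool,
    told_what_to_look_for: bool = False,
    learns_by_trial_and_error: bool = False,
    uses_deep_neural_networks: bool = False,
) -> str:
    """Walk through the Q&A in order and return a label for the approach."""
    if not looks_for_patterns_in_data:
        return "not machine learning"

    if told_what_to_look_for:
        label = "supervised learning"
    elif learns_by_trial_and_error:
        label = "reinforcement learning"
    else:
        label = "unsupervised learning"

    # Deep learning is about the kind of model used, not the kind of task.
    if uses_deep_neural_networks:
        label += " (deep learning)"
    return label


# Example: grouping pages by similarity with no labels provided.
print(classify_ai_approach(looks_for_patterns_in_data=True))
```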
Data Characteristics
It’s also important to know what type of data you will be using for your machine learning.
Is your input data textual, numeric, or image-based? This is an important question to ask because it will very much define the type of model that you will search for.
For instance, you might want to use numeric data from organic clicks or sessions for a prediction. Or you might want to classify the pages in your blog into categories, in which case the input data will be textual – the text on the selected pages. You might also want to generate alt text for your images or classify them into groups of similarity, in which case the input data will be image-based.
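As an illustration of the numeric case, here is a minimal sketch that fits a straight-line trend to weekly organic clicks and projects it one week ahead. It uses ordinary least squares written out by hand so it runs without any libraries. The click numbers are made up, and the `fit_line` helper is an assumption for illustration; a real forecast would need far more data and some handling of seasonality.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly organic clicks (made-up numbers for illustration).
weeks = [1, 2, 3, 4, 5, 6]
clicks = [120, 135, 150, 160, 178, 190]

slope, intercept = fit_line(weeks, clicks)
forecast_week_7 = slope * 7 + intercept
print(f"Trend: {slope:.1f} extra clicks per week")
print(f"Forecast for week 7: {forecast_week_7:.0f} clicks")
```

This is a regression task in the sense used earlier: past inputs (weeks) and known outputs (clicks) are used to predict a new output.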
Solution Characteristics
Talking about the solution now. How do you know if a machine learning solution is good for your task? Let me take you through a simple Q&A.
- If your task is mission-critical, don’t rely on machine learning. – Often, the technology you will be using as a beginner is pre-trained, and often not built with the kind of data that we SEOs use, meaning that it will sometimes fail.
- If your results must remain consistent every time, don’t rely on machine learning. – Especially when using unsupervised machine learning, you might find that the output differs each time you run the model, even if you’re running it with the same type of data set.
- If the results must remain easy to understand and explain to stakeholders, don’t go into deep learning. – The workings behind it are very difficult to deconstruct and understand for the average Joe like you and me.
- If it’s enough for the model to outperform existing methods on average, then that is when you can take a look at machine learning options.
So going through this flow chart, you can very easily identify when and how you can use machine learning as part of your solution.
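The same Q&A can be written down as a small checklist in code, which can be handy when triaging a backlog of tasks. This is a minimal sketch: the `Task` fields, the `ml_recommendation` function, and the wording of the returned messages are assumptions for illustration, not a formal method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mission_critical: bool
    needs_identical_results: bool
    needs_explainable_results: bool
    better_on_average_is_enough: bool


def ml_recommendation(task: Task) -> str:
    """Apply the four questions in order and return a short recommendation."""
    if task.mission_critical:
        return "avoid ML: the task is mission-critical"
    if task.needs_identical_results:
        return "avoid ML: results must be identical on every run"
    if not task.better_on_average_is_enough:
        return "avoid ML: beating existing methods on average is not enough"
    if task.needs_explainable_results:
        return "consider ML, but avoid deep learning: results must be explainable"
    return "consider ML"


# Hypothetical task description for illustration.
alt_text = Task(
    name="generate missing image alt text",
    mission_critical=False,
    needs_identical_results=False,
    needs_explainable_results=False,
    better_on_average_is_enough=True,
)
print(alt_text.name, "->", ml_recommendation(alt_text))
```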
ML characteristics and understanding your bottom line
The takeaway from this section is that there are different models and before embarking on implementing machine learning, you must go through an assessment of it using multiple factors:
- The insights that the model will be providing
- The complexity of the implementation – not only the execution part but the interpretation of the results, too
- The scalability of the model – can you replicate this across a portfolio of clients? Can you replicate it with a smaller or bigger data set?
- The assets that you have, or otherwise the data characteristics, quantity, and quality – a model’s performance is as good as the data you give it.
- The accuracy of the model you’d be using – it’s very easy to find out what accuracy a specific model achieves. And you can find this data in databases like Google Scholar, or even in tutorials where people have applied the particular model to their own data sets and they will share the accuracy they’ve achieved. Depending on the task’s importance, you will know whether this is the correct model for you to use or not.
- What resources are available for this type of task already online? If you are limited on a time frame and you are a beginner, it might be very difficult to do something with machine learning if there aren’t that many resources available for this particular task.
So, combining all of these assessments, you can come up with the bottom line: is machine learning a good solution for this particular task or not? Going through this kind of checklist is something that you can do very quickly. Once you start doing it, you’ll find out that some tasks are suitable for machine learning and others are not. And that’s perfectly okay.
Let’s put things into context.
How to search for machine learning for SEO tasks
Keep queries specific to the data, task, and solution. Let’s look at a couple of examples from Jason Brownlee, again. He says a query can look like this:
“Find a model or procedure that makes the best use of historic data comprised of inputs and outputs in order to skillfully predict outputs given new and unforeseen inputs in the future.”
This is an example of a forecasting task. But the way that this query is structured provides a guiding direction, or otherwise – a mission statement of what you’re trying to achieve with your project.
Another example is a “model or procedure that automatically creates the most likely approximation of the unknown underlying relationships between inputs and associated outputs in historic data.”
A typical on-page optimization project might include several mini-projects like writing meta descriptions, optimizing titles, or maybe even generating image alt text. Let’s go through the processes mentioned.
- How to find ML solutions for Writing Meta Descriptions
  - What is your input data? – Textual, it’s the page content
  - Is the task supervised or unsupervised? – It’s unsupervised because we have no way to validate the results.
  - What type of model are you looking for? – Transformational, i.e. page content to page summary using sentences from the page in less than 160 characters, but it can also be generative, i.e. writing them from scratch.
  - Is it mission-critical? – No, several studies have shown meta descriptions are not critical.
  - Are different results okay? – Yes.
  - Is an explanation of the process needed? – Not really.
  - Does it outperform average methods? – Yes, much faster.
So, your query might be: python script meta descriptions, but it might also be python script text summarisation unsupervised.
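To show what the “transformational” option could look like at its simplest, here is a sketch of extractive summarisation in plain Python: it scores each sentence on the page by how frequent its words are across the page, then returns the best sentence that fits in under 160 characters. This is a word-frequency heuristic, not a trained model, and the scripts you find through a search will usually be more capable. The function names, the stopword list, and the sample page text are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real lists are longer).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "are",
    "on", "with", "that", "this", "it", "as", "by", "be", "we", "us", "our",
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def meta_description(page_text: str, max_chars: int = 160) -> str:
    """Return the highest-scoring sentence shorter than max_chars."""
    sentences = split_sentences(page_text)
    if not sentences:
        return ""
    freq = Counter(tokenize(page_text))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    fitting = [s for s in sentences if len(s) < max_chars]
    if fitting:
        return max(fitting, key=score)
    # No sentence fits: cut the best one at a word boundary.
    best = max(sentences, key=score)
    cut = best[: max_chars - 4].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + "..."


# Hypothetical page copy for illustration.
sample_page = (
    "Our guide explains how to choose trail running shoes. "
    "Trail running shoes need grip, cushioning and a secure fit. "
    "We also cover care tips. "
    "Contact us for more information."
)
description = meta_description(sample_page)
print(description)
print(len(description), "characters")
```

Because the output is built only from sentences already on the page, it is easy to review by eye, which fits the advice to scrutinize results before using them.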
- How to find ML solutions for Title and H1 Optimisations
  - What is your input data? – Textual, it’s the page content
  - Is the task supervised or unsupervised? – It’s unsupervised, again.
  - What type of model are you looking for? – Transformational, i.e. page content to a concise title or H1 using wording from the page, but it can also be generative, i.e. writing them from scratch.
  - Is it mission-critical? – This is why this question is crucial. It is very debatable. Some people have done this successfully, like Hamlet Batista with his script for title tag optimization. However, for authoritative websites like HMRC, or any other YMYL sites, we’ve seen how a title rewrite can suggest something that does not reflect the page well.
  - Are different results okay? – Again, debatable.
  - Is an explanation of the process needed? – Actually, it kind of is.
  - Does it outperform average methods? – Yes, faster, but not necessarily always better.
So, the query here might be: title tag optimization machine learning unsupervised.
- How to find ML solutions for Image alt tag generation
  - What is your input data? – Images with missing alt text
  - Is the task supervised or unsupervised? – Unsupervised
  - What type of model are you looking for? – Image recognition, generative
  - Is it mission-critical? – No.
  - Are different results okay? – Yes.
  - Is an explanation of the process needed? – Not really.
  - Does it outperform average methods? – Yes, much faster and scalable.
The query here might look like: image alt text generation unsupervised machine learning, but it can also be image recognition caption generation ML Python script.
An on-page optimization audit might also involve things like predicting traffic or revenue, based on the presence of a keyword in the title, or on historic performance. It might also involve internal link analysis, keyword research, or schema mark-up implementation.
And I know you might be thinking: “Surely a script cannot do all that?”
My answer is no. One script cannot.
But a few certainly can. Remember what we said in the beginning – you don’t have to have an automated end-to-end solution in order to implement machine learning. The SEO community already has creators and builders suggesting solutions for how machine learning can be implemented in all of the contexts mentioned.
Adding value doesn’t necessarily mean a fully automated autonomous solution. Incremental improvements can lead to a compounding effect.
How to start practicing machine learning in SEO
- Install Anaconda and pip, and set up Google Colab
- Install the main machine learning libraries, using pip via the Terminal or Anaconda’s interface
- Get your foot in the door with some daily ten-minute practice.
- Then get on with your role as you usually do.
- When you encounter a task that matches some of the characteristics that we discussed, an idea will pop up.
- Deconstruct each task that you encounter.
- Assess whether machine learning is the correct approach to solve the problem.
- Gather the data or adjust it as needed for the selected model you’ll be using
- Build or test existing scripts, libraries, and tools that you found
- Assess the results, scrutinize the output, and compare them with another output of a similar task.
- Document the journey in public, if possible, and build your deliverable.
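For the setup steps at the top of this list, a quick way to see where you stand is to check which libraries are already installed. The sketch below does that with Python’s standard library only. Which libraries count as “the main” ones is not specified in this guide, so the starter set here is an assumption; swap in whatever your chosen tutorial uses.

```python
from importlib.util import find_spec

# Assumed starter set. Keys are pip package names,
# values are the names used in an `import` statement.
STARTER_LIBRARIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def check_libraries(libraries: dict[str, str]) -> dict[str, bool]:
    """Map each pip package name to True if its module can be found."""
    return {
        pip_name: find_spec(import_name) is not None
        for pip_name, import_name in libraries.items()
    }


status = check_libraries(STARTER_LIBRARIES)
for pip_name, installed in status.items():
    print(f"{pip_name}: {'installed' if installed else 'missing'}")

missing = [name for name, installed in status.items() if not installed]
if missing:
    print("To install, run: pip install " + " ".join(missing))
```

In Google Colab, many common data libraries come pre-installed, so running a check like this first can save some time.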
Sometimes what you may find is that machine learning is not part of your deliverable, even though you have tested it as an option. But trust me, this is something that is going to make your argument stronger when you’re delivering an output, an audit, or any other type of solution.
Best practice tips for getting started with machine learning as an SEO
Find a Buddy
Finding someone that you can practice machine learning with, or joining an online community of like-minded tech-curious folk can be a game-changer for your mindset on learning. It will also help you with accountability for your projects and daily (or weekly) practice.
If you find it difficult to learn or find time to practice, try finding a coach, mentor, or buddy to practice with. Forming a one-to-one bond, based on mutual interests can be helpful for generating ideas, but also for troubleshooting and advice. There are some pretty cool people and online communities that are thriving, where SEO and ML are discussed daily.
Find the right tribe.
Jason Brownlee talked about finding your machine learning tribe, and not only that but finding the right tribe for you. I wholeheartedly agree with his sentiment.
As an SEO, you will likely fall into either the business tribe or data tribe, unless you purposefully want to transition to a more dev-focused role.
This means that branching outside of your tribe will likely be discomforting and de-motivating at best, and entirely uninspiring at worst.
If you ever feel like that, just try finding communities that can better support your knowledge of the field, your experience, understand your needs, and speak your language.
There are many different types of machine learning tribes, but where we, as SEOs, fit in the most is one of the following two:
- Business tribes
  - Business people with general interests
  - Managers trying to deliver a particular project
- Data tribes
  - Data scientists interested in getting better answers to business questions
  - Data analysts interested in better explaining data
Finding the correct tribe will actually enable you to connect with people that are very like-minded, face the same challenges, and are similar in the way that they go through tasks and in the way that they perceive their day-to-day activities. Doing that will also enable you to actually have a more relaxed relationship with failing in public, as you will not be scrutinized for lacking a particular knowledge or skill.
Set reasonable expectations
When embarking on your machine learning journey, set reasonable expectations not only for yourself but also for the output that you’re going to encounter.
Managing expectations about what ML can and can’t do will help you know when to apply it.
While it may be easy to get into the thinking that machine learning can help with all the projects you have, and can be easily scaled across your entire client portfolio, the reality is different. No algorithm is perfect, and most algorithms are not designed to solve problems, specific to SEO.
Hence, we must adapt not only our thinking in how we frame problems, but also how we interpret the solution. In my eyes, machine learning algorithms, tools, and libraries are always an ally, as opposed to a replacement.
You need to know what machine learning can and cannot do, when it can help you, and when to apply it. Of course, this comes with practice, which is why doing daily exercises is going to help you reach that goal a little bit faster.
Acknowledge and overcome your limiting beliefs
Don’t let limiting beliefs keep you from experimenting.
Collaborate. Build relationships.
Ask for help when needed. Developers as well as more experienced builders would be happy to give you a helping hand.
Test, test, test.
And finally, test scripts, tools, and software as you see them pop up. Go through your ‘backlog of cool things to try’.
Dissect the scripts you encounter and try to understand them. Find out what you like and dislike about them. Note best practices; after you do that, building your own scripts and tools is going to become much easier.
Fired up? Start with these ML tools and resources for SEOs for some quick wins.
You can check a selection of Colab notebooks from Britney Muller for SEOs, as well as the contributions of various SEO Pythonistas to the community.
Below I will link to a few beginner-friendly no-code web apps for some quick wins:
- Keyword Clustering Streamlit app by Lee Foot and Charly Wargnier
- LDA Topic Modelling using a Web App by Cornell University
- N-Gram Text Analyzer by Greg Bernhardt
- Entity Analyzer by Greg Bernhardt
- GPT-3 Content Generator app by Charly Wargnier
- GPT-3 App for FAQ Generation, Title optimisation & more by Andrea Volpini and Wordlift team
Check out the work of Ruth Everett, Michael Van Den Reym, Charly Wargnier, Daniel Heredia, Moshe Ma-Yafit, Greg Bernhardt, Hamlet Batista, and Andrea Volpini.