Sentient Search

Andreas Voniatis, 2013-01-23

Inside Machine Learning for SEO


Every day, hopeful search engine optimisation (SEO) practitioners attempt the seemingly impossible: reverse-engineering the Google algorithm to deliver those ever-important higher search rankings. Machine learning offers a structured, analytical tool for the job, one that allows a greater understanding of SEO data and a better awareness of the underlying search engine algorithms. As with many structures, however, there is always a weak point. Here, we look at the strengths and weaknesses of Machine Learning Systems (MLS) in SEO.

The Blueprint

Machine learning (ML) refers to software-based algorithms that automate the learning process behind something normally judged or decided by humans. ML allows learning and decision-making to be scaled up, making it possible to analyse vast data sets with hundreds of underlying variables. To do so, it is necessary to collect large data sets and analyse them statistically. The methods for doing this can be broadly separated into two types: regression and classification.
Regression relates to the prediction, or forecasting, of continuous numerical outcomes; a learning algorithm generates a hypothesis from a set of gathered training data, and that hypothesis is then used to predict outcomes for new inputs. A typical use might be predicting how a site’s search engine ranking is likely to be affected by including more or less of certain web site content.
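As a rough illustration of regression, the sketch below fits a straight line by ordinary least squares and uses it to forecast a ranking position. The data (landing page word count against ranking position) and the function names fit_line and predict are invented for the example; a real model would use many more variables and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line (the 'hypothesis') to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data, invented for illustration only:
# word count of a landing page and its observed ranking position
# (a lower position number means a better ranking).
word_counts = [300, 600, 900, 1200, 1500]
positions = [18.0, 15.0, 12.0, 9.0, 6.0]

model = fit_line(word_counts, positions)
print("Forecast position for a 1000-word page:", predict(model, 1000))
```

The fitted slope is the part of the hypothesis that says how much the forecast moves for each extra word of content; it describes this toy data set only, not any search engine.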
Classification, on the other hand, relates to the assignment of elements within a data set to one of two or more defined types. An example is classifying the demographic of a web site user as a young, up-and-coming hipster or a tech-savvy grandmother.
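Classification can be sketched in a similar spirit. The example below uses a nearest-centroid classifier, chosen only because it is short; the features (average session length in minutes, share of visits from a mobile device), the training values and the function names are all assumptions made for illustration.

```python
import math


def centroids(samples):
    """Average the feature vectors of each label: {label: centroid}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical labeled visitors: (session minutes, mobile share) -> type.
training_set = [
    ((2.0, 0.9), "hipster"),
    ((3.0, 0.8), "hipster"),
    ((12.0, 0.2), "grandmother"),
    ((10.0, 0.3), "grandmother"),
]

visitor_model = centroids(training_set)
print(classify(visitor_model, (4.0, 0.7)))
```

In practice the features would need to be scaled to comparable ranges first, otherwise the variable with the largest numbers dominates the distance.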

Many SEO firms subscribe to the philosophy of “Why construct a team of engineers to track queries and check each algorithm when we can just write a program for it?” While the time gain is desirable, the technical process is vastly complicated.

At a basic level, a Machine Learning System ideally works as follows:

•The MLS trains itself by comparing input variables to outputs on a finite training set comprising a large amount of data gathered from a web site. This “training” data may be labeled (each example has a known output) or unlabeled (no known outputs are attached).
•The system begins to compile data, while simultaneously adjusting itself.
•Feedback received then raises or lowers the importance of certain parts of the system.
•Based on the hypothesis provided by its newly trained learning algorithm, the MLS can make a future prediction, or classify gathered data elements into certain types.
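The steps above can be sketched as a small training loop. This is a minimal example assuming a linear model trained by stochastic gradient descent, which is only one of many ways such a system could adjust itself; the two page features (has_video, has_popup) and the engagement scores are hypothetical.

```python
def train(samples, learning_rate=0.1, epochs=500):
    """Fit a linear model by repeatedly correcting its own predictions."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            prediction = bias + sum(w * x for w, x in zip(weights, features))
            error = target - prediction  # the feedback signal
            # Raise or lower each weight in proportion to its input and the error.
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


def predict_score(model, features):
    """Use the trained weights to predict the output for new inputs."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical labeled training set: (has_video, has_popup) -> engagement score.
training_set = [
    ((0.0, 0.0), 0.5),
    ((1.0, 0.0), 2.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 1.0), 1.5),
]

trained = train(training_set)
print("Learned weights:", trained[0])
print("Predicted score with video and popup:", predict_score(trained, (1.0, 1.0)))
```

After training, a positive weight marks a variable the model has learned to treat as helpful and a negative weight one it treats as harmful, which is the "raising or lowering of importance" described in the third step.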

Ghosts in the Machine

Much like any other machine, from the automobile to the iPhone, SEO machine learning systems are not perfect.

Mechanistic flaws with a web-based machine learning algorithm include:
• Unawareness of constantly changing algorithm factors.
• The addition of new variables over time, such as social signals, visitor data, links, and bounce rate. This can, however, be partially addressed by structuring the problem as “unsupervised”, i.e. with no pre-defined labels for the underlying factors.
• Search engine changes.
• Rapid formula changes.
• Multiple algorithms operating at different times in different parts of the world.
• Inability to monitor external influences (for example user fads, trends and global events unrelated to the behaviour of search engine algorithms). These simply aren’t encapsulated in data gathered from a web page.


Simply put, machine learning is a good way to arrive at a more rigorously educated guess about future site user behaviour, or to classify the users themselves into categories. However, it will never reverse-engineer the Google algorithm, nor provide a “be-all and end-all” how-to guide to increased search rankings.

Depending on your point of view, this news is either a blessing or a curse. If you consider great search content to be the name of the game, then you win out.

Applying Machine Learning to SEO

For sites with rotating content, machine learning can be used effectively to monitor the actions of search engine algorithms.

Documenting which behaviours a search engine favours over others is useful for enterprise sites such as newspapers or e-zines with discounts. Useful variables to monitor include:
•Aesthetic web design changes.
•Landing page content.
•Content links.
•Technology in use.

Machine learning for SEO is also effective during static search engine updates.

During these updates, your site (and those in the same query space) may yield non-spurious data that can be used to evaluate which factors have positive and negative impacts.
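One simple way to start evaluating such factors is to correlate each one with the change in ranking observed across an update. The sketch below computes a Pearson correlation by hand; the factor names and all the numbers are invented for illustration, and a correlation on its own does not show that a factor caused the change.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data for five pages in the same query space.
# rank_change: positions gained (+) or lost (-) after an update.
rank_change = [-4, -1, 0, 2, 5]
factors = {
    "word_count": [400, 800, 1200, 1600, 2000],
    "ads_above_fold": [3, 2, 2, 1, 0],
}

for name, values in factors.items():
    r = pearson(values, rank_change)
    direction = "positive" if r > 0 else "negative"
    print(f"{name}: r = {r:.2f} ({direction} association)")
```

With only a handful of pages a strong correlation can easily arise by chance, so a result like this is a prompt for further testing rather than a conclusion.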



Google and the Russian search engine Yandex are among the many currently using machine learning systems. Machine learning may not be perfect, and continuously changing algorithm factors add new variables to the equation, but it gives a better idea of which content arrangement increases rankings.


  • Adam Samuel

    Very insightful post. I think this is going to become ever more important and will put the people doing it at a greater advantage in understanding and delivering measurable bottom line results.

    • alchemyv

      Thanks Adam, certainly it will help tease out statistical signals to aid good web design and avoid bad web design. No more speculation but testing of SEO ideas or theories being discussed!

  • Chris Marsh

    Nice article.

    Machine Learning can obviously work if the right tools are used, and in the right hands. But really, how hard is this approach? I’d imagine very hard/technical. Are there many solutions out there? I’d be interested to read some reviews of them.

    Ultimately, wouldn’t the output show us what we already know – i.e. the SEs currently like high quality, unique content that gets shared, citations etc etc. Maybe it could show you what has performed well and extrapolate out a strategy to follow onwards, but couldn’t this be done with a little effort studying a website’s metrics anyway?

    • alchemyv

      Thanks Chris – The only solutions I know of are Bloomreach and CoVario. SEO Moz also pursue machine learning techniques which are correlated to rankings but the product is not made available to the public, only the insights on a periodic basis.

      Machine learning insights can do a lot more than confirm the adage of creating compelling and shareable content. Machine learning can uncover web design practices that covary and correlate with search traffic. This could be further extended to conversions and certain structural elements of content, from the use of rare words, co-occurrence, bullet points, use of images, length of copy – the list is very long.

      Compelling content fits squarely in the domain of Online PR and has nothing to do with SEO. Our job is simply to make the content more searchable and machine learning assists heavily in that process – when we can get the data and interpret it correctly!
