Sentient Search

Andreas Voniatis | January 23, 2013

Inside Machine Learning for SEO


Every day, hopeful search engine optimisation (SEO) experts attempt the seemingly impossible: reverse-engineering the Google algorithm to deliver those ever-important higher search rankings. Machine learning offers an extra push in the form of a structured, analytical tool that deepens SEO insight and builds awareness of the underlying search engine algorithms. As with many structures, however, there is always a weak point. Here, we look at the strengths and weaknesses of Machine Learning Systems (MLS) in SEO.

The Blueprint

Machine learning (ML) refers to how a software-based algorithm is made to automate the learning process behind something normally judged or decided by humans. ML allows learning and decision-making processes to be scaled up, making it possible to analyse vast data sets with hundreds of underlying variables. To make this possible, large data sets must be collected and analysed statistically in one of two ways, which can be broadly separated into regression and classification.
Regression relates to the prediction, or forecasting, of real-valued outcomes; a hypothesis is generated from the output of a learning algorithm run on a set of gathered training data. A typical use might be predicting how a site's search engine ranking is likely to be affected by increasing or reducing certain website content.
Classification, on the other hand, relates to the assignment of elements within a data set to two or more defined types. For example, classifying the demographic of a website user as a young, up-and-coming hipster or a tech-savvy grandmother.
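The distinction can be sketched in a few lines of code. Everything below is illustrative: the training pairs, feature names and thresholds are made up for the example, not drawn from any real ranking data.

```python
# --- Regression: forecast a real-valued outcome (a ranking position) ---
# Hypothetical training pairs: (word count in hundreds, observed ranking).
training = [(2, 48), (5, 35), (8, 21), (12, 10)]

# Fit a simple least-squares line y = a*x + b by hand.
n = len(training)
sx = sum(x for x, _ in training)
sy = sum(y for _, y in training)
sxx = sum(x * x for x, _ in training)
sxy = sum(x * y for x, y in training)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def predict_rank(word_count_hundreds):
    """Regression: predict a continuous value from a feature."""
    return a * word_count_hundreds + b

# --- Classification: assign a data point to one of two defined types ---
def classify_visitor(age, pages_per_visit):
    """Classification: map a data point to a discrete label."""
    if age >= 60 and pages_per_visit >= 5:
        return "tech-savvy grandmother"
    return "up-and-coming hipster"

print(round(predict_rank(10), 1))
print(classify_visitor(age=67, pages_per_visit=12))
```

The regression output is a number (a forecast ranking), while the classification output is a label; that difference, not the sophistication of the model, is what separates the two approaches.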

Many SEO firms subscribe to the philosophy of “Why construct a team of engineers to track queries and check each algorithm when we can just write a program for it?” While the time gain is desirable, the technical process is vastly complicated.

At a basic level, a Machine Learning System ideally works as follows:

• The MLS trains itself by comparing input variables to outputs on a finite training set comprising a large amount of data gathered from a website. This “training” data may be labelled or unlabelled (i.e. unstructured).
• The search engine begins to compile data, while the system simultaneously adjusts itself.
• Received feedback then raises or lowers the importance of certain parts of the system.
• Based on the hypothesis provided by its newly-trained learning algorithm, the MLS can make future predictions, or classify gathered data elements into certain types.
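The feedback step above can be sketched as a minimal training loop, where a set of weights stands in for the “importance of certain parts of the system”. The feature names, data and learning rate are all hypothetical, chosen only to make the loop runnable.

```python
# Feedback loop sketch: a single linear model whose weights are raised
# or lowered by prediction error on a labelled training set.

FEATURES = ["inbound_links", "content_freshness", "page_speed"]

# Labelled training set: feature values per page plus an observed signal
# (1 = ranking improved after a change, 0 = it did not). Invented data.
training_set = [
    ([0.9, 0.2, 0.7], 1),
    ([0.1, 0.8, 0.3], 0),
    ([0.8, 0.6, 0.9], 1),
    ([0.2, 0.1, 0.4], 0),
]

weights = [0.0] * len(FEATURES)
LEARNING_RATE = 0.1

def predict(x):
    score = sum(w * v for w, v in zip(weights, x))
    return 1 if score >= 0.5 else 0

# Training: feedback (the prediction error) raises or lowers each weight.
for _ in range(50):
    for x, observed in training_set:
        error = observed - predict(x)
        for i in range(len(weights)):
            weights[i] += LEARNING_RATE * error * x[i]

# After training, the model can classify an unseen page.
print(predict([0.85, 0.4, 0.8]))
```

A real MLS would use far richer models, but the shape is the same: train on gathered data, adjust importance from feedback, then predict or classify new elements.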

Ghosts in the Machine

Much like any other machine, from the automobile to the iPhone, SEO machine learning systems are not perfect.

Mechanistic flaws with a web-based machine learning algorithm include:
• Unawareness of constantly changing algorithm factors.
• The addition of new variables over time, such as social signals, visitor data, links, and bounce rate. This can be partially addressed by structuring the problem as “unsupervised”, i.e. with no pre-defined labels for the underlying factors.
• Search engine changes.
• Rapid formula changes.
• Multiple algorithms operating at different times in different parts of the world.
• Inability to monitor external influences (for example user fads, trends and global events unrelated to the behaviour of search engine algorithms). These simply aren’t captured in data gathered from a web page.

Simply put, machine learning is a good way to make a more rigorously educated guess about future site-user behaviour, or to classify the users themselves into categories. However, it will never reverse-engineer the Google algorithm, nor provide a be-all and end-all how-to guide to higher search rankings.

Depending on your point of view, this news is either a blessing or a curse. If you consider great search content to be the name of the game, then you win out.

Applying Machine Learning to SEO

For sites with rotating content, machine learning can be used effectively to monitor the actions of search engine algorithms.

Documenting favouritism of certain behaviours over others is useful for enterprise sites such as newspapers or e-zines with discounts. Useful variables to monitor include:
• Aesthetic web design changes.
• Landing page content.
• Content links.
• Technology in use.
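Monitoring variables like these amounts to recording a snapshot of each page's features whenever rankings are re-checked, then flagging which features changed alongside a ranking movement. The snapshot below is a hypothetical sketch; the feature names and dates are invented.

```python
# Record feature snapshots over time and flag which features changed
# between two ranking checks. All values here are illustrative.

snapshots = [
    # (date, features, ranking position)
    ("2013-01-01",
     {"landing_page_words": 400, "content_links": 12, "uses_https": False},
     18),
    ("2013-01-15",
     {"landing_page_words": 900, "content_links": 12, "uses_https": False},
     9),
]

def changed_features(before, after):
    """Return the feature names whose values differ between snapshots."""
    return [k for k in before if before[k] != after[k]]

(_, f1, rank1), (_, f2, rank2) = snapshots
moved = changed_features(f1, f2)
direction = "improved" if rank2 < rank1 else "worsened or unchanged"
print(moved, direction)
```

One changed feature coinciding with one ranking movement proves nothing on its own; the value comes from accumulating many such snapshots and letting the learning algorithm weigh them.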

Machine learning for SEO is also effective during static search engine updates.

During these updates, your site (and those in the same query space) will yield non-spurious data that can be used to evaluate which factors have positive and which have negative impacts.
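One simple way to evaluate a factor after an update is to compare the average ranking change of sites that have the factor against those that do not. The sites, factor and rankings below are entirely hypothetical, just enough to show the comparison.

```python
# Compare pre/post-update ranking changes across a query space against
# one binary factor. Invented data for illustration only.

sites = [
    # (has_fresh_content, rank_before_update, rank_after_update)
    (True, 12, 5),
    (True, 20, 14),
    (False, 8, 15),
    (False, 25, 30),
]

def mean_rank_change(rows):
    # Negative change = moved up the rankings (improved).
    return sum(after - before for _, before, after in rows) / len(rows)

with_factor = [s for s in sites if s[0]]
without_factor = [s for s in sites if not s[0]]

# Negative impact value -> the factor coincided with ranking gains.
impact = mean_rank_change(with_factor) - mean_rank_change(without_factor)
print(impact)
```

This is correlation, not causation: a negative impact value suggests the factor is worth investigating, not that the update rewarded it.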


Google and the Russian search engine Yandex are among the many currently using machine learning systems. Machine learning may not be perfect, and continuously changing algorithm factors add new variables to the equation, but it does give a better idea of which content arrangement increases rankings.


Andreas qualified as a management accountant (ACMA) after graduating with honours in Economics from Leeds University. In 2003, he pursued a career in Search Engine Optimisation (SEO) and has since held various Head of Search roles for award-winning agencies, including Infectious Media, and prestigious startups. In 2010, Andreas became an independent consultant providing SEO and online PR services to international agencies and brands worldwide, including Exxon Mobil, Tesco, HSBC, Zurich and Quorn, as well as startups such as Discount Vouchers. His work has been featured in the Telegraph and Search Engine Watch, particularly for reverse-engineering the Google Penguin algorithm to a 98% statistical confidence level in 2013.