Google uses synonyms for query expansion in most, if not all, queries. Understanding when and how synonym-based query expansion works in Google is a fundamental yet poorly understood skill for any decent SEO. In this first instalment of my query forensics series, we look at a number of related queries and form some initial theories about the role synonyms play in formulating search results.
When Google indexes web pages, it stores both the URLs of the pages that contain a particular word and information about where that word appears. To better understand how exact-match terms and synonyms affect SERPs, we’ll map where each term appears in the target document, similar to how Google might index it, using the following notation:
- T+ = in title
- T- = stem in title
- Ts- = synonym in title
- B+ = prominent in the body
- B- = present in the body
- Bs = absent from the body, but a synonym is present
- url = in URL
- A = in backlink anchor text (when not otherwise present on the page)

PA and DA are SEOmoz’s Page Authority and Domain Authority metrics.
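To make the notation concrete, here is a minimal Python sketch that assigns these labels to a single query term. It assumes a simplified page model (title, body, URL and backlink anchor texts as plain strings) and caller-supplied, lower-case stem and synonym lists. The `Page` class, the `label_term` function and the “prominent” rule (at least three occurrences in the body) are hypothetical choices for illustration, not how Google, or this article, actually measures prominence.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical, simplified model of a page and its inbound links."""
    title: str
    body: str
    url: str
    anchors: list[str] = field(default_factory=list)  # backlink anchor texts


def words(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; URL punctuation acts as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())


def label_term(page: Page, term: str, stems: set[str], synonyms: set[str],
               prominent_min: int = 3) -> list[str]:
    """Return the notation labels (T+, T-, Ts-, B+, B-, Bs, url, A) for one term.

    `stems` and `synonyms` must be lower-case. `prominent_min` is an assumed
    cut-off: the term counts as prominent (B+) at that many body occurrences.
    """
    term = term.lower()
    title, body = words(page.title), words(page.body)
    labels: list[str] = []

    # Title: an exact match wins over a stem, and a stem over a synonym.
    if term in title:
        labels.append("T+")
    elif stems & set(title):
        labels.append("T-")
    elif synonyms & set(title):
        labels.append("Ts-")

    # Body: prominent, merely present, or absent with a synonym present.
    count = body.count(term)
    if count >= prominent_min:
        labels.append("B+")
    elif count > 0:
        labels.append("B-")
    elif synonyms & set(body):
        labels.append("Bs")

    if term in words(page.url):
        labels.append("url")

    # Anchor text is only recorded when the term is absent from title and body.
    if term not in title and count == 0 and any(term in words(a) for a in page.anchors):
        labels.append("A")
    return labels
```

For a made-up page (the example.com URL is a placeholder) titled “Cycling routes in the Netherlands”, whose body mentions “routes” but never “trails”, and with one backlink reading “best cycling trails”, `label_term(page, "trails", {"trail"}, {"route", "routes"})` yields `["Ts-", "Bs", "A"]`.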
Google could certainly store more information, for instance separating a keyword in the domain from a keyword in the URL path, or recording whether a keyword appears in heading (hx) or strong tags, but I don’t want this to get any more complex than it already is. Also, Google has only stated that its posting lists store information about where a term appears on a page; query expansion through synonyms could occur separately from the posting lists that store exact-match keyword data.
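A posting list that records where each term appears can be sketched as a positional inverted index: every exact-match term maps to a list of postings, each holding a URL, a field name and a word offset. This is a hypothetical toy structure for illustration only. Google has not published its index format, and the `Posting` and `build_index` names and the `title`/`body` fields are assumptions that mirror this article’s notation. Synonyms are deliberately absent, since any synonym expansion would have to happen at query time, outside these exact-match lists.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    url: str
    field: str     # "title" or "body" in this toy model
    position: int  # word offset within that field


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, dict[str, str]]) -> dict[str, list[Posting]]:
    """Map each exact-match term to every (url, field, position) where it occurs.

    `docs` maps a URL to its fields, e.g. {"title": "...", "body": "..."}.
    """
    index = defaultdict(list)
    for url, fields in docs.items():
        for field_name, text in fields.items():
            for position, term in enumerate(tokenize(text)):
                index[term].append(Posting(url, field_name, position))
    return dict(index)
```

With two placeholder pages, `build_index({"http://example.com/a": {"title": "Cycling trails", "body": "Trails and more trails"}})["trails"]` lists one title posting at offset 1 and two body postings at offsets 0 and 3, which is enough to derive labels such as T+ and B- for that page.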
The first query we look at is Cycling Trails Netherlands (guess what I’m planning on doing for my holidays).
Table: results for “Cycling Trails Netherlands” (columns: Page, Search Terms, External Data)
The terms “cycling” and “Netherlands” are obviously very important, while synonyms of “trails” seem to carry as much weight as “trails” itself.
The lone page that is hyper-optimised for this query (http://en.wikiloc.com/trails/cycling/netherlands) doesn’t do very well. I wouldn’t be surprised if this site had been hit by Panda, and I suggest you treat it as a guide to what not to do, even though it is not as bad as many similarly over-optimised resources.
Substituting “routes” for “trails” (in a different browser, to avoid personalised results from previous clicks) gives the query “cycling routes Netherlands”. It returns the same top five resources as the initial query, albeit in a different order, and six of the ten resources from the initial query appear again.
Table: results for “Cycling Routes Netherlands” (columns: Page, Search Terms, External Data)
“Cycling routes Netherlands” (1,300 global monthly searches via the AdWords Keyword Tool) is a more popular query than “cycling trails Netherlands” (not enough volume to display data). As a result, even in the first query, documents containing the term “routes” are a lot more prominent in the search results than those containing “trails”.
What Google may be doing is requesting the posting lists for each of “cycling”, “trails” and “Netherlands”; the query processor sees that “trails” returns too few documents and then looks for synonyms to bring in more documents. If this is the case, documents included via synonyms are clearly not penalised for lacking the overly restrictive keyword.
This doesn’t mean that synonyms are always treated as equal to the exact-match term, only that they are when one keyword is overly restrictive.
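The hypothesis in the two paragraphs above can be sketched in a few lines of Python: intersect the document sets for each query term, but when a term’s set falls below some threshold, union in the documents for its synonyms first. This is a hypothetical model of the theory, not Google’s query processor. The `expand_and_match` function, the `min_docs` threshold, the synonym table and the set-based posting lists are all assumptions; where that threshold really lies is exactly the open question.

```python
from typing import Optional


def expand_and_match(query: list[str],
                     postings: dict[str, set[str]],
                     synonyms: dict[str, list[str]],
                     min_docs: int) -> set[str]:
    """Return matching doc IDs, expanding only terms whose posting list is too small.

    `postings` maps a term to the set of doc IDs containing it (exact match).
    `min_docs` is a hypothetical threshold below which a term counts as
    overly restrictive.
    """
    result: Optional[set[str]] = None
    for term in query:
        docs = set(postings.get(term, set()))
        if len(docs) < min_docs:
            # Overly restrictive term: let its synonyms stand in for it.
            for syn in synonyms.get(term, []):
                docs |= postings.get(syn, set())
        # Synonym matches are treated exactly like exact matches (no penalty).
        result = docs if result is None else result & docs
    return result if result is not None else set()
```

With toy data where “trails” appears in one document and “routes” in two, the “trails” query is expanded and matches three documents, while the “routes” query passes the threshold, is not expanded and matches only two. The more popular query returning fewer documents is the same pattern the real result counts show.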
The more common “cycling routes Netherlands” query returns only 129,000 documents, while the less common “cycling trails Netherlands” query returns 61,700,000. This shows that the query processor adds synonyms only when necessary, rather than adding them automatically and simply ranking synonym matches lower.
As an SEO, when deciding whether to create a new page, you should check query volume against the number of documents Google returns, and compare both against those of synonymous key phrases.
If a query has low volume but returns a lot of documents, look in the results for a synonym that outperforms the exact-match term, and target that synonym instead.
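The check described in the two paragraphs above can be written as a small helper. It is a sketch under stated assumptions: search volumes and result counts are entered by hand (for example from the AdWords Keyword Tool and Google’s result count), a volume the tool does not display is recorded as `None`, and the `Phrase` type, the `expansion_suspects` function and its `low_volume` cut-off are hypothetical heuristics, not an established SEO formula. The helper only flags candidates; inspecting the SERP for an outperforming synonym is still a manual step.

```python
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    text: str
    monthly_searches: Optional[int]  # None when the keyword tool shows no data
    results: int                     # result count reported by Google


def expansion_suspects(phrases: list[Phrase], low_volume: int = 100) -> list[str]:
    """Flag low-volume phrases that return more results than the top-volume synonym.

    Under the hypothesis above, such phrases are probably padded out with
    synonym matches. `low_volume` is an assumed cut-off.
    """
    known = [p for p in phrases if p.monthly_searches is not None]
    if not known:
        return []
    # The most-searched synonymous phrase serves as the baseline.
    baseline = max(known, key=lambda p: p.monthly_searches)
    suspects = []
    for p in phrases:
        volume = p.monthly_searches or 0
        if p is not baseline and volume < low_volume and p.results > baseline.results:
            suspects.append(p.text)
    return suspects
```

Fed the figures from this article, `[Phrase("cycling routes netherlands", 1300, 129_000), Phrase("cycling trails netherlands", None, 61_700_000)]`, the helper flags “cycling trails netherlands”, pointing towards “routes” as the term to target.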
Based on the higher number of documents returned by the lower-volume query, synonym-based query expansion in Google likely works as shown in the diagram below.
The obvious next step is to figure out where the threshold for synonym-based query expansion lies and what rules govern it, so that we know when to recommend creating separate pages to target synonymous keywords, when to target multiple synonyms on the same page, and which synonym to choose.