A fatal flaw? Google’s inability to recognise stolen content

Is Stolen Content a big SEO issue?

We at Pi Datametrics have conducted pioneering research on the effects of stolen content, with some pretty eye-opening results showing that content thieves can succeed. The answer, therefore, is yes: stolen content is a big SEO issue, and there are some major flaws in how Google deals with it.

Part 1. Unexplained drops in SERPs

Recently we were flummoxed as to why one of our customers was losing their positions in the SERPs. Journeys by Design, a boutique African Travel Company, invests huge amounts of time and money into researching and creating unique content for their site. But as you can see in the figure below, Journeys by Design were subject to unexplained drops in the SERPs, despite their strong, quality content.

What was going on?

Pi Datametrics Position Explorer Chart: Journeys by Design: Mountain Gorillas Nest: Google UK

Journeys by Design SERP drop

We investigated further and identified two sites which had replaced JBD’s original positions on the day following the drops.

Chart: Daily results for Friday 7th Feb

Journeys by Design replaced in SERPs

Before their first drop, Journeys by Design were at position 20 in the SERPs for this quite random, long-tail search term. They were then usurped by gorillaexpeditions.com (as seen above), which moved up 57 positions to replace them.

Gorillaexpeditions.com had (perhaps inadvertently) created a near carbon copy of JBD’s content.
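The check described above, spotting which site jumped into a lost position on the day of a drop, can be sketched in a few lines. This is only an illustration under our own assumptions: the domain names and positions are hypothetical sample data (chosen so that one site climbs 57 places to position 20), and `biggest_climbers` is a made-up helper, not part of any Pi Datametrics API.

```python
def biggest_climbers(before, after, min_gain=20):
    """Return (domain, old_pos, new_pos, gain) for domains that rose by at least min_gain."""
    climbers = []
    for domain, new_pos in after.items():
        old_pos = before.get(domain)
        if old_pos is None:
            continue  # not ranked the day before, so no gain can be computed
        gain = old_pos - new_pos  # positive means the domain moved up the page
        if gain >= min_gain:
            climbers.append((domain, old_pos, new_pos, gain))
    # Largest climb first
    return sorted(climbers, key=lambda row: row[3], reverse=True)


# Hypothetical positions for one search term on two consecutive days.
day_before = {"original.example": 20, "copycat.example": 77, "other.example": 35}
day_of_drop = {"original.example": 64, "copycat.example": 20, "other.example": 33}

climbers = biggest_climbers(day_before, day_of_drop)
print(climbers)
```

Any domain that surfaces here on the same day the original falls is worth opening and comparing against your own copy.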

Chart: Daily Results for Thursday 19th Feb

Journeys by Design replaced in SERPs by selfdriverwanda.com

Journeys by Design were replaced in the SERPs a second time (Feb 19th) by another site: selfdriverwanda.com. This domain moved up 82 places just by using the same first few sentences of JBD’s content. This just goes to show that even the slightest bit of duplicated content on Google can negatively impact the original.

Chart: Showing performance of 5 separate sites for a single search term

Journeys by Design affected by stolen content

In the end we found multiple sites using the same or very similar content to JBD.
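One common way to put a number on "the same or very similar content" is to compare overlapping word sequences (shingles) and compute their Jaccard similarity. The sketch below is not how Pi Datametrics or Google measures duplication; it is a generic technique, and the three sample sentences are invented for illustration.

```python
import re


def shingles(text, size=3):
    """Split text into lowercase words and return the set of overlapping word triples."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 = nothing, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented sample copy, not taken from any of the sites discussed here.
original = "Mountain gorillas build a fresh nest every evening before the light fades."
near_copy = "Mountain gorillas build a fresh nest every evening before the sun sets."
unrelated = "Our guide to paid search covers bidding, budgets and landing pages."

copy_score = jaccard(shingles(original), shingles(near_copy))
unrelated_score = jaccard(shingles(original), shingles(unrelated))
print(round(copy_score, 2), unrelated_score)
```

A high score between your page and a page that has just appeared in the SERPs is a strong hint that your copy has been lifted, even if only the opening sentences match.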

While two sites managed to usurp JBD in the SERPs for a given time period, two others fluctuated beneath it.

 

Part 2. Testing the impact of stolen content in Google

This got us thinking: if these sites are performing better despite having fewer links, less rich content, and ultimately a “worse” site, does that mean that anyone can use content (innocently or otherwise) and win, even if their site is weaker? This is why we had to conduct our own tests.


Stolen content Test 1: Econsultancy v Our blog

We duplicated content from Econsultancy, copying an entire blog post verbatim onto our blog. We chose Econsultancy because they are a strong site, they write about very similar topics, and we thought we could never beat them with all their links, comments and social shares. We also knew that Econsultancy’s content was regularly taken and plagiarised, so this was another important factor and something the publication wanted to see for themselves.

The Methodology of the Test:

  1. Ask permission to take content from Econsultancy
  2. Host verbatim content on our sister site Intelligent Positioning (marked as “IP” in the charts)
  3. Track a collection of search terms (long tail and short tail) within the SEO tool, Pi Datametrics
  4. Compare the performance of the IP site, Econsultancy, ClickZ and any other sites that rank. We can only do this with Pi, as it has unlimited competitor tracking

We competed on three separate (short, medium, & long-tail) search terms: i) ‘PPC strategy’, ii) ‘How scalable is PPC?’, and iii) ‘Q&A Jared Field on PPC strategy’

 

Pi/IP replaced Econsultancy on the medium-tail search term ‘How scalable is PPC?’


 

Econsultancy flipped once with Pi/IP, but maintained its positions for the long-tail search term ‘Q&A Jared Field on PPC strategy’


Stolen content Test 1. Our findings:

  • Despite Econsultancy’s original article having 44 times more social shares than ours, and even though they’ve been established for 5 years longer, we found that pages with stolen content (such as ours), can not only prosper but can in fact blow the original content out of the water, and go on to achieve even stronger positions in the SERPs.
  • On the other hand we found that, when optimising our stolen content for long-tail search terms, our site failed to hijack any positions.

 

Stolen content Test 2: ClickZ

Next, using the same methodology, we took content from ClickZ, another strong site with reams of content and, again, pasted it onto our IP blog. The search term used in this instance was ‘Online web form optimisation’.

 

Pi/IP competed with the ClickZ author’s own blog page, while ClickZ maintained its positions


 

Stolen content Test 2: Our findings

We found, to our surprise, that our stolen content was competing with the ClickZ author’s own blog page – not ClickZ itself. Whilst we both battled it out, ClickZ’s page – with the original content – maintained its position with minimal flux (position 3). Despite this, both Pi and the author of the blog still managed to outperform ClickZ on a selection of dates – making it to position 2. So two duplicate sites actually performed better than the site with the most links, and possibly the most traffic.

 

Conclusions from both the stolen content tests

  1. Even the most established sites, with a high PageRank and strong social presence, are not safe from the effects of stolen content.
  2. Original content optimised with long-tail, specific search terms may be less affected by stolen content.
  3. Google may prioritise the most recent piece of (stolen) content in the SERPs – age will not save you.
  4. Or… Google may prioritise it arbitrarily, based on a number of varying, unseen factors.

 

What we do know for a fact is that Google is pretty hopeless at dealing with stolen content.

 

Part 3. How do you deal with stolen content?

What does Google do about stolen content?

You can report any stolen content to Google via a scraper report, but to complete this you first of all need to be aware of the issue, which frankly many of us aren’t.

Google’s Panda algorithm works to eliminate poor quality or duplicate copy, so you wouldn’t be a fool for thinking that it could penalise content thieves. But that’s not quite the case.

While Panda is brilliant for improving quality and overall site UX, it only tends to identify and penalise duplicate content or theming internally, within a single site. As such, stolen content often goes undetected by Google, which is why curating content and measuring performance with the right tracking tools are essential.

There are many cases in which duplicating content is unavoidable (e.g. lyrics and some recipe sites), and this is perhaps a justifiable reason for the lack of stolen content penalties. It doesn’t seem that Google is quite smart enough (yet) to differentiate between malicious stolen content and unavoidable dupe content. Were penalties to be implemented today, the aforementioned sites would undoubtedly suffer a distinct lack of visibility in the SERPs.

 

How can you protect your site from stolen content?

Prevention and curation are key to SERP stability. Using the Pi Datametrics enterprise SEO platform, you can successfully prevent content thieves from diluting the potency of your copy, through:

Daily Tracking

With weekly tracking any number of SERP fluxes and shifts could go unnoticed throughout the 7 days. That’s why daily tracking is imperative for getting an accurate overview of your SERP positioning.
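To see why the sampling interval matters, here is a small sketch using hypothetical positions for a single URL over 14 days. The figures are made up for illustration, and `drops` is our own helper name; the point is simply that a drop which recovers within the week never shows up in a weekly sample.

```python
# Hypothetical daily positions for one URL over 14 days (index 0 = day 1).
daily_positions = [20, 20, 21, 64, 66, 63, 20, 20, 19, 20, 58, 60, 21, 20]


def drops(positions, threshold=20):
    """Return 0-based indexes where the position worsened by at least
    `threshold` places compared with the previous sample."""
    return [
        i for i in range(1, len(positions))
        if positions[i] - positions[i - 1] >= threshold
    ]


# Weekly tracking only sees every 7th reading.
weekly_positions = daily_positions[::7]

daily_drops = drops(daily_positions)
weekly_drops = drops(weekly_positions)
print(daily_drops, weekly_drops)
```

With daily readings both drops are caught on the day they happen; the weekly series sees position 20 twice and reports nothing.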

Unlimited URL tracking

If you could only track domains rather than individual URLs, it would be very hard to identify exactly which pages are competing with your content.

Top 100 tracking

As we saw in the Journeys by Design example, there can potentially be a lot of content poachers in the lower pages of Google. Similarly, if you drop significantly in the SERPs you want to be able to see exactly where you’re at, to identify the issue.

Unlimited competitor tracking

You need to be able to detect any usurpers, and monitor them in future to prevent a recurrence of stolen content. Therefore, you have to be able to see anyone that enters the top 100 on any given day.
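As a rough sketch of what "seeing anyone that enters the top 100" means in practice, the snippet below compares two daily snapshots and lists the URLs that were outside the top 100 (or unranked) yesterday but inside it today. The URLs, positions and the `new_entrants` helper are all hypothetical; a real tracking platform would work from its own stored data.

```python
def new_entrants(yesterday, today, depth=100):
    """Return (position, url) pairs for URLs inside the top `depth` today
    that were not inside it yesterday, best position first."""
    seen = {url for url, pos in yesterday.items() if pos <= depth}
    return sorted(
        (pos, url) for url, pos in today.items()
        if pos <= depth and url not in seen
    )


# Hypothetical snapshots for one search term.
yesterday = {
    "original.example/gorillas": 20,
    "a.example/page": 45,
    "late.example/copy": 130,
}
today = {
    "original.example/gorillas": 61,
    "a.example/page": 44,
    "late.example/copy": 48,
    "fresh.example/post": 92,
}

entrants = new_entrants(yesterday, today)
print(entrants)
```

Each new entrant can then be added to your list of tracked competitors and checked against your own copy.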

Access to historical data

With this you can recognise any potential threats to your content, and analyse prior conflict patterns.

 

Watch the talk on stolen content by our CTO, Jon Earnshaw, at Brighton SEO 2015:

 

If you would like to find out more about our tests on stolen content, how Google deals with it, and how you can view it on the best SEO platform around, please give us a call.