Stolen content

We’ve conducted pioneering research on the effects of stolen content, with some pretty eye-opening results which show that content thieves can succeed.

Stolen content is a big SEO issue, and there are some major flaws with how Google deals with it.

Part 1. Unexplained drops in SERPs

Recently we were flummoxed as to why one of our customers was losing their positions in the SERPs. Journeys by Design (JBD), a boutique African travel company, invests huge amounts of time and money into researching and creating unique content for their site. But as you can see in the figure below, JBD were subject to unexplained drops in the SERPs, despite their strong, quality content.

Position Explorer Chart: Journeys by Design: Google UK

Journeys by Design SERP drop

We investigated further and identified two sites which had replaced JBD’s original positions on the day following the drops.

Daily top 100 results for Friday 7th Feb

Journeys by Design replaced in SERPs

Journeys by Design were originally at position 20 in the SERPs for this quite random, long-tail search term before their first drop, but were usurped by gorillaexpeditions.com (as seen above), which moved up 57 positions to replace them.

Journeys by Design copied by Gorillaexpeditions.com

Gorillaexpeditions.com had (perhaps inadvertently) created a near carbon copy of JBD’s content.

Daily top 100 results for Thursday 19th Feb

Journeys by Design replaced in SERPs by selfdriverwanda.com

Journeys by Design were replaced in the SERPs a second time (Feb 19th) by another site: selfdriverwanda.com. This domain moved up 82 places just by copying one paragraph of JBD’s content. This suggests that even a small amount of duplicated content can negatively affect the performance of the original.
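As a rough illustration of how a single copied paragraph could be spotted, here is a minimal Python sketch. It is not the method used in the investigation above: it assumes you already have the plain text of both pages, and the function names, the 5-word shingle size and the 0.8 threshold are all hypothetical choices.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_paragraphs(original: str, suspect: str,
                      threshold: float = 0.8, size: int = 5) -> list[str]:
    """List paragraphs of `original` whose shingles mostly reappear in `suspect`.

    Paragraphs are assumed to be separated by blank lines. The threshold is
    an illustrative value, not a tuned one.
    """
    suspect_shingles = shingles(suspect, size)
    flagged = []
    for paragraph in original.split("\n\n"):
        own = shingles(paragraph, size)
        # Paragraphs shorter than `size` words have no shingles and are skipped.
        if own and len(own & suspect_shingles) / len(own) >= threshold:
            flagged.append(paragraph.strip())
    return flagged


# Made-up example text: the second paragraph is lifted word for word.
original_page = (
    "We plan every safari around the seasons and the wildlife you most want to see.\n\n"
    "Our guides have tracked mountain gorillas in the forest for more than twenty years."
)
suspect_page = (
    "Cheap car hire in Kigali. Our guides have tracked mountain gorillas "
    "in the forest for more than twenty years. Book now!"
)
flagged = copied_paragraphs(original_page, suspect_page)
```

Checking each paragraph separately matters here: a page that lifts one paragraph out of twenty would score very low on a whole-page comparison, yet still be flagged by this approach.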

5 separate sites perform for a single search term

Journeys by Design affected by stolen content

In the end we found multiple sites using the same or very similar content to JBD.

While two sites managed to usurp JBD in the SERPs for a given time period, two others fluctuated beneath it.
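The kind of check described in Part 1 (looking at which sites climbed into your old position on the day you dropped) can be sketched in a few lines of Python. This is an illustration only, not how the Pi platform works: the snapshot format (a dictionary of domain to position), the function name, the `min_gain` cut-off and the `.example` domains are all assumptions.

```python
from __future__ import annotations


def find_usurpers(before: dict[str, int], after: dict[str, int],
                  our_domain: str, min_gain: int = 20) -> list[tuple[str, int]]:
    """Return (domain, places gained) for sites that jumped above our_domain.

    `before` and `after` map domains to positions in two daily top 100
    snapshots. A domain missing from a snapshot is treated as position 101.
    """
    our_after = after.get(our_domain, 101)
    usurpers = []
    for domain, position in after.items():
        if domain == our_domain:
            continue
        gain = before.get(domain, 101) - position
        # Keep only big climbers that now sit above our own page.
        if gain >= min_gain and position < our_after:
            usurpers.append((domain, gain))
    return sorted(usurpers, key=lambda item: item[1], reverse=True)


# Illustrative data: a copycat climbs 57 places and takes position 20.
day_before = {"original.example": 20, "copycat.example": 77, "other.example": 5}
day_after = {"original.example": 64, "copycat.example": 20, "other.example": 5}
suspects = find_usurpers(day_before, day_after, "original.example")
```

Any domain this returns is only a candidate. You would still need to compare its page with your own to confirm that content was actually copied.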

Part 2. Testing the impact of stolen content in Google

This got us thinking: if these sites were performing better despite having fewer links, a lack of rich content, and ultimately a ‘worse’ site, did that mean that anyone could dupe content (innocently or otherwise) and win rankings – even if their site was weaker?

We decided to take matters into our own hands.

Stolen content Test 1: Econsultancy v Pi/IP’s blog

We duplicated content from Econsultancy, copying an entire blog post verbatim onto our blog. We chose Econsultancy because they are well-known, they write about very similar topics, and we knew they’d be hard to beat, as they have a strong digital legacy, with thousands of backlinks, comments and social shares. We also knew that Econsultancy’s content was regularly copied or plagiarized, which was another important factor, and the effect of this was something the publication wanted to see for themselves.

Test methodology:

  1. Ask permission to take content from Econsultancy
  2. Host verbatim content on our sister site Intelligent Positioning (marked as “IP” in the charts)
  3. Track a collection of search terms (long tail and short tail) on the Pi Datametrics enterprise SEO platform
  4. Visualize the performance of IP, Econsultancy and ClickZ. We could only do this with Pi, as it has unlimited, daily competitor tracking

We competed on three separate (short, medium, and long-tail) search terms:

i) ‘PPC strategy’
ii) ‘How scalable is PPC?’
iii) ‘Q&A Jared Field on PPC strategy’

Pi/IP replaced Econsultancy for ‘How scalable is PPC?’

Econsultancy maintained positions for ‘Q&A Jared Field on PPC strategy’

Stolen content Test 1 findings:

  • Despite Econsultancy’s original article achieving 44x more social shares than IP/Pi, and even though they’ve been established for 5 years longer, we found that pages with stolen content (such as ours), can not only prosper in the SERPs but can, in fact, blow the original content out of the water and go on to achieve even stronger positions.
  • On the other hand, we found that our site failed to hijack any positions when optimizing the stolen content for long-tail search terms.

 

Stolen content Test 2: ClickZ v Pi/IP’s blog v ClickZ blog author

Next, using the same methodology, we took content from ClickZ (another strong site with reams of content) and pasted it onto our IP blog. The search term we used in this instance was ‘Online web form optimisation’.

Pi/IP competed with the ClickZ author, while ClickZ maintained positions

Stolen content Test 2 findings:

To our surprise, we found that our stolen content was competing with the ClickZ author’s own blog page – not ClickZ itself. Whilst we both battled it out, ClickZ’s page (the original) maintained its position with minimal flux (position 3). Despite this, both IP/Pi and the author’s blog still managed to outperform ClickZ on a selection of dates, reaching position 2. So it would seem that two duplicate pages actually outperformed the site with the most links and the most traffic.

Conclusions from both stolen content tests

  1. Even the most established sites, with a high PageRank and a strong social presence, are not safe from the effects of stolen content.
  2. Original content optimized with long-tail, specific search terms may be less affected by stolen content.
  3. Google may prioritize the most recent piece of (stolen) content in the SERPs – age will not save you!
  4. Or… Google may prioritize arbitrarily, based on a number of varying, unseen factors.

What our tests do show is that Google is pretty hopeless at dealing with stolen content.

 

Part 3. How do you deal with stolen content?

What does Google do about stolen content?

You can report any stolen content to Google via a scraper report, but to complete this you first of all need to be aware of the issue, which frankly many of us aren’t.

Google’s Panda algorithm works to eliminate poor quality or duplicate copy, so you wouldn’t be a fool for thinking that it could penalize content thieves. But that’s not quite the case.

While Panda is brilliant for improving quality and overall site UX, it only tends to identify and penalize duplicate content or theming internally. As such, stolen content often goes undetected by Google, which is why curating content and measuring performance with the right tools is essential.

There are many cases in which duplicating content is unavoidable (e.g. lyrics and recipe sites), and this is perhaps a justifiable reason for a lack of stolen content penalties. It doesn’t seem that Google is quite smart enough (yet) to differentiate between malicious stolen content and unavoidable dupe content. Were penalties to be implemented today, those sites would undoubtedly suffer a distinct lack of visibility in the SERPs.

 

How can you protect your site from stolen content?

Prevention and curation are key to SERP stability. Using the Pi Datametrics enterprise SEO platform, you can prevent content thieves from diluting the potency of your copy, through:

Daily Tracking

With weekly tracking, any number of SERP fluxes and shifts could go unnoticed during the 7 days between checks. That’s why daily tracking is imperative for getting an accurate overview of your SERP positioning.
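To make the point concrete, here is a small Python sketch with made-up positions (not data from the JBD case). It compares the biggest fall visible in a daily series with what a once-a-week check of the same series would have recorded. The function name and the numbers are hypothetical.

```python
from __future__ import annotations


def largest_drop(positions: list[int]) -> int:
    """Largest fall in rank between consecutive checks.

    A higher number is a worse position, so a fall is a positive difference.
    Returns 0 if the page never fell.
    """
    falls = [later - earlier for earlier, later in zip(positions, positions[1:])]
    return max([0] + falls)


# Hypothetical positions for one search term, one reading per day.
daily = [20, 20, 21, 85, 83, 40, 22, 21]

# What a weekly check would see: only day 1 and day 8.
weekly = daily[::7]

daily_drop = largest_drop(daily)    # the mid-week fall from 21 to 85
weekly_drop = largest_drop(weekly)  # the same week looks almost flat
```

In this example the daily series shows a 64-place fall, while the weekly series shows a change of a single place, so the usurper would have come and gone unnoticed.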

Unlimited URL tracking

If you could only track domains, it would be very complicated to identify your content enemies, as the copied content could be buried away in other parts of their sites.

Top 100 tracking

As we saw in the Journeys by Design example, there can potentially be a lot of content poachers in the lower pages of Google. Similarly, if you drop significantly in the SERPs, you want to be able to see exactly where you’re positioned to identify the issue.

Unlimited competitor tracking

You need to be able to detect any usurper and monitor them to prevent a recurrence of stolen content. Therefore, you have to be able to see anyone that enters the top 100 on any given day.

Access to historical data

Using historical data, you can recognize any potential threats to your content, and analyze prior conflict patterns.

If you would like to find out more about identifying and remedying stolen content with Pi Datametrics, don’t hesitate to get in touch.
