We’ve conducted pioneering research on the effects of stolen content, with some pretty eye-opening results which show that content thieves can succeed.
Stolen content is a big SEO issue, and there are some major flaws with how Google deals with it.
Recently we were flummoxed as to why one of our customers was losing their positions in the SERPs. Journeys by Design (JBD), a boutique African travel company, invests huge amounts of time and money into researching and creating unique content for their site. But as you can see in the figure below, JBD were subject to unexplained drops in the SERPs, despite their strong, quality content.
We investigated further and identified two sites which had replaced JBD’s original positions on the day following the drops.
Before their first drop, Journeys by Design were at position 20 in the SERPs for this quite random, long-tail search term, but were usurped by gorillaexpeditions.com (as seen above), which moved up 57 positions to replace them.
Gorilla Expeditions had (perhaps inadvertently) created a near carbon copy of JBD’s content.
Journeys by Design were replaced in the SERPs a second time (Feb 19th) by another site: selfdriverwanda.com. This domain moved up 82 places, just through copying one paragraph of JBD’s content. This suggests that even a small amount of duplicated content can negatively affect the performance of the original.
In the end we found multiple sites using the same or very similar content to JBD.
While two sites managed to usurp JBD in the SERPs for a given time period, two others fluctuated beneath it.
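For readers who want to check their own pages against suspected copies like these, one common way to quantify "the same or very similar content" is to compare overlapping word sequences (shingles). The sketch below is illustrative only and is not the method used in our research; the function names and the five-word shingle size are assumptions chosen for the example.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def containment(original, suspect, k=5):
    """Share of the original's shingles that also appear in the suspect text."""
    so, ss = shingles(original, k), shingles(suspect, k)
    if not so:
        return 0.0
    return len(so & ss) / len(so)
```

Jaccard similarity is useful for spotting near carbon copies of a whole page, while containment is better suited to the single-copied-paragraph case: pass the paragraph as `original` and the other site's page text as `suspect`, and a value close to 1 means most of the paragraph appears there word for word.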
This got us thinking: if these sites were performing better despite having fewer links, a lack of rich content, and an ultimately ‘worse’ site, did that mean that anyone could dupe content (innocently or otherwise) and win rankings, even if their site was weaker?
We decided to take matters into our own hands.
We duplicated content from Econsultancy, copying an entire blog post verbatim onto our blog. We chose Econsultancy because they are well-known, they write about very similar topics, and we knew they’d be hard to beat: they have a strong digital legacy, with thousands of backlinks, comments and social shares. We also knew that Econsultancy’s content was regularly copied or plagiarized. This was another important factor, and the effect was something the publication wanted to see for themselves.
We competed on three separate (short, medium, and long-tail) search terms:
i) ‘PPC strategy’
ii) ‘How scalable is PPC?’
iii) ‘Q&A Jared Field on PPC strategy’
Next, using the same methodology, we took content from ClickZ (another strong site with reams of content) and pasted it onto our IP blog. The search term we used in this instance was ‘Online web form optimisation’.
To our surprise, we found that our stolen content was competing with the ClickZ author’s own blog page – not ClickZ itself. Whilst we both battled it out, ClickZ’s page (the original) maintained its position with minimal flux (position 3). Despite this, both IP/Pi and the author of the blog still managed to outperform ClickZ on a selection of dates – making it to position 2. So it would seem that two duplicate sites actually outperformed the site with the most links, and most traffic.
What our tests suggest is that Google is pretty hopeless at dealing with stolen content.
You can report any stolen content to Google via a scraper report, but to complete this you first of all need to be aware of the issue, which frankly many of us aren’t.
Google’s Panda algorithm works to eliminate poor quality or duplicate copy, so you wouldn’t be a fool for thinking that it could penalize content thieves. But that’s not quite the case.
While Panda is brilliant for improving quality and overall site UX, it only tends to identify and penalize duplicate content or theming internally. As such, stolen content often goes undetected by Google, which is why curating content and measuring performance with the right tools is essential.
There are many cases in which duplicating content is unavoidable (e.g. lyrics and recipe sites), and this is perhaps a justifiable reason for the lack of stolen content penalties. It doesn’t seem that Google is quite smart enough (yet) to differentiate between maliciously stolen content and unavoidable dupe content. Were penalties to be implemented today, those sites would undoubtedly suffer a distinct lack of visibility in the SERPs.
Prevention and curation are key to SERP stability. Using the Pi Datametrics enterprise SEO platform, you can prevent content thieves from diluting the potency of your copy, through:
With weekly tracking, any number of SERP fluxes and shifts could go unnoticed throughout the 7 days. That’s why daily tracking is imperative for getting an accurate overview of your SERP positioning.
If you could only track domains, it would be very difficult to identify your content enemies, as the stolen copy could be buried away in other parts of their site.
As we saw in the Journeys by Design example, there can potentially be a lot of content poachers in the lower pages of Google. Similarly, if you drop significantly in the SERPs, you want to be able to see exactly where you’re positioned in order to identify the issue.
You need to be able to detect any usurper and monitor them to prevent a recurrence of stolen content. Therefore, you have to be able to see anyone who enters the top 100 on any given day.
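As a rough illustration of the idea, the sketch below compares two daily snapshots of the top 100 for one search term and lists the URLs that were below you (or absent) yesterday and are above you today. It is a minimal sketch, not how the Pi Datametrics platform works: the function name `find_usurpers` and the input format (a list of URLs ordered by rank) are assumptions made for the example.

```python
def find_usurpers(yesterday, today, my_url):
    """Return (url, old_position, new_position) for URLs that leapfrogged my_url.

    yesterday / today: lists of URLs ordered by rank (index 0 = position 1).
    old_position is None for URLs that were not in yesterday's list.
    """
    prev = {url: i + 1 for i, url in enumerate(yesterday)}
    curr = {url: i + 1 for i, url in enumerate(today)}
    my_prev = prev.get(my_url)
    my_curr = curr.get(my_url)

    usurpers = []
    for url, pos in curr.items():
        if url == my_url:
            continue
        old = prev.get(url)
        # Below us yesterday, or not in the tracked results at all.
        was_below = old is None or (my_prev is not None and old > my_prev)
        # Above us today, or we have dropped out of the tracked results.
        now_above = my_curr is None or pos < my_curr
        if was_below and now_above:
            usurpers.append((url, old, pos))
    return sorted(usurpers, key=lambda row: row[2])
```

Run once per day per search term, a check like this would flag a page that jumps dozens of places to sit above yours, at which point you can compare its copy with your own.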
Using historical data, you can recognize any potential threats to your content, and analyze prior conflict patterns.
If you would like to find out more about identifying and remedying stolen content with Pi Datametrics, don’t hesitate to get in touch.