Anyone who blogs great content, or even not-so-great content, knows that their posts are going to end up on numerous other blogs around the Internet within days, sometimes minutes, of when they’re published.
Anyone can easily install WordPress, skin it with a custom theme, and grab a couple of plugins that will go out and “scrape” content from other blogs to publish on their own blog.
You CAN greatly reduce the content scraping of your blog!
There are a number of ways to approach content scraping of your blog posts, from doing nothing to blocking scrapers IP by IP via your .htaccess file (not recommended).
Doing nothing about content scrapers — the easiest approach!
Yes, some, like Chris Coyier of CSS-Tricks, recommend doing nothing.
Chris says that instead of going to war with the scrapers, “you could spend that time doing something enjoyable, productive, and ultimately more valuable for the long-term success of your site.”
He goes on to list the reasons your content will always do better in searches than the hijacked content, i.e., because your blog:
- is on a domain with more trust;
- published that article first;
- is coded better for SEO than theirs;
- is better designed than theirs;
- isn’t at risk for serious penalization from search engines.
All of the above may be true, but I found that in some instances sites that had copied my articles were ranking higher than my original content! I think this may have been a temporary glitch caused by Google’s late-February 2011 “Panda” update (a primary goal of which was to improve detection of scraper sites, though it also hit a lot of quality sites!). Still, it motivated me to do whatever I could to eliminate the copying and duplicate publishing of my blog posts.
Proactive Approaches to Dealing with Blog Content Scraping
If, like me, you want to try to greatly reduce the incidents of your posts being copied wholesale on other blogs, there are ways that will definitely help you get this done.
- Ping Google & Other Search Engines and RSS Feed Sites when you Publish
This notifies the search engines and RSS site-updating services of your post, ensuring your content gets indexed first.
WordPress has a built-in way to list the sites you want to ping when you publish your post. In your WordPress admin, navigate to:
Settings > Writing > Update Services (near the bottom)
Just enter the URLs for the services to ping.
The WordPress Codex offers a list of site-update services to ping. This site has a much more comprehensive list. (If you use this list, be sure to remove the spaces before the URLs and leave just a single line break between each URL.)
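If you'd rather not clean up a long pasted list by hand, the tidying described above can be sketched in a few lines of PHP. This is only an illustration: the function name `normalize_ping_list` and the second URL in the sample input are made up for the example, and the script is meant to be run on its own, not inside WordPress.

```php
<?php
/**
 * Tidy a pasted list of update-service URLs: trim stray whitespace,
 * drop blank lines and duplicates, and return one URL per line.
 */
function normalize_ping_list(string $raw): string
{
    $urls = [];
    // Split on any run of line breaks (Windows, Unix, or old Mac style).
    foreach (preg_split('/[\r\n]+/', $raw) as $line) {
        $url = trim($line);
        // Keep only entries that look like http(s) URLs and are not repeats.
        if ($url !== '' && preg_match('#^https?://#i', $url) && !in_array($url, $urls, true)) {
            $urls[] = $url;
        }
    }
    return implode("\n", $urls);
}

// Hypothetical pasted input with leading spaces and double line breaks.
$pasted = "  http://rpc.pingomatic.com/\n\n   http://ping.example.com/RPC2\n\n";
echo normalize_ping_list($pasted), "\n";
```

Paste the printed result into the Update Services box.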
- Try to contact the offending blog owner
Sometimes the scraping is done by a human who honestly doesn’t think there’s anything wrong with copying your post. I’ve had two situations where I’ve contacted the owner and asked, civilly but forcefully, to remove my copied content, and they removed it!
Unfortunately, most of the content scraping is done by bots and there’s no one actually “manning” the blog. So this approach is not really the best, unless there’s a human repeat offender, and sometimes they’ll just ignore you.
- Include links to other posts on your blog
In a YouTube video, Matt Cutts of Google’s Search Quality group says that “If you make sure that the pages on your site link to you … then if someone scrapes you they might end up linking to you. To the extent that that’s a successful scraper or a successful spammer, those links will help you along.”
Of course, more sophisticated scrapers or scraping software will probably remove all your links. Looking on the bright side, this is probably the only benefit of having your content scraped.
- Publish only a “summary” feed of your post instead of the full post
Content scrapers very often grab your post content from your RSS feed. One way to curtail this is to publish only a summary of each post in your feed. The reader then clicks the hyperlinked article title to read the entire article on your blog.
If you use WordPress, go to: Settings > Reading…
And set “For each article in a feed, show” to “Summary”.
At least this way, your blog content won’t be delivered wholesale right to the scraper’s doorstep!
Opinions vary on RSS Feed Summary vs. Full Article
Blogger Kristi Hines points out that she prefers to publish the full RSS feed because some readers don’t like having to leave their RSS reader.
Kimberly Castleberry points out that there are ways in RSS Readers for users to force the entire article rather than a summary.
However, I would prefer that users click over to our blog to read our posts, primarily because it provides the ability to comment, to share using the Like, Tweet, Google +1 and LinkedIn buttons, and possibly to view other content on our website or blog.
Because we tend to post articles that provide valuable information, I’m confident that our readers will be willing to click outside the comfort of their RSS reader.
- Disable Pingbacks and Trackbacks on your Blog.
Pingbacks and Trackbacks notify you that other blogs have linked to your article.
Although it’s good to be informed when this happens, it certainly isn’t necessary for your readers to see these backlinks, and with many blog themes they are displayed after your post. In the dashboard there is an incoming links area where you can see backlinks to your posts. And you can also view backlinks via Google Webmaster Tools.
Unfortunately, verification of Trackbacks isn’t reliable and they have been exploited by the Parasite Class to create backlinks to their websites with no value to yours.
Worse, Google might crawl your article, see a backlink to the spammy site, visit the spammy site and see no link to your article, and determine that they’re the original author!
Note that when you disable Pingbacks and Trackbacks on your blog, it only affects subsequent posts, NOT existing posts! Syed Balkhi has written a tutorial on disabling Pingbacks and Trackbacks on existing posts.
Disabling Pingbacks and Trackbacks doesn’t prevent the Parasite Class from stealing your content, but at least it removes the ability to also create a backlink to their content on your blog!
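For existing posts, one common approach is a single database update that closes pings on everything already published. The sketch below is not necessarily the method in Syed's tutorial. It assumes the standard WordPress `posts` table with its `ping_status` column, and the helper name `close_pings_sql` is made up for this example. Back up your database before running anything like this.

```php
<?php
/**
 * Build the SQL that closes pings on posts that already exist.
 * $tablePrefix is the WordPress table prefix ("wp_" by default).
 */
function close_pings_sql(string $tablePrefix = 'wp_'): string
{
    // Guard against anything other than a plain identifier in the prefix.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tablePrefix)) {
        throw new InvalidArgumentException('Unexpected table prefix');
    }
    return "UPDATE {$tablePrefix}posts SET ping_status = 'closed' "
         . "WHERE post_status = 'publish' AND post_type = 'post'";
}

// Print the statement so it can be reviewed before it is run.
echo close_pings_sql(), ";\n";
```

You can paste the printed statement into a tool such as phpMyAdmin, or, inside WordPress, run it with `$wpdb->query(close_pings_sql($wpdb->prefix))`.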
- Add an RSS Footer to your RSS Feeds.
Syed Balkhi of WPBeginner.com recommends that users add a line of text at the end of each post in their RSS — called an “RSS Footer”. You can do this by adding the Yoast SEO WordPress plugin, which lets you add a specific text at the end of all your posts when viewed in RSS Feeds, or you can code it yourself using Syed’s tutorial.
Syed recommends adding a link back to your original article and blog, which lets Google know that you’re the original source of the article.
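If you want to code the footer yourself, here is a minimal sketch of the idea. It is not Syed's tutorial code or the plugin's implementation. The function name `build_rss_footer` and the footer wording are assumptions for illustration. The hooks `the_content_feed` and `the_excerpt_rss` are standard WordPress feed filters, and they are registered only when WordPress is actually loaded.

```php
<?php
/**
 * Append an attribution footer to a feed item's content.
 * All arguments are plain strings, so this also runs outside WordPress.
 */
function build_rss_footer(string $content, string $postTitle, string $postUrl, string $blogName, string $blogUrl): string
{
    // Escape each value; the final "false" avoids double-encoding entities.
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
    };
    $footer = sprintf(
        '<p>The post <a href="%s">%s</a> appeared first on <a href="%s">%s</a>.</p>',
        $esc($postUrl),
        $esc($postTitle),
        $esc($blogUrl),
        $esc($blogName)
    );
    return $content . "\n" . $footer;
}

// Register the footer only when WordPress is loaded (e.g. from functions.php).
if (function_exists('add_filter')) {
    $append_footer = function ($content) {
        return build_rss_footer(
            $content,
            get_the_title(),
            get_permalink(),
            get_bloginfo('name'),
            home_url('/')
        );
    };
    add_filter('the_content_feed', $append_footer); // full-text feeds
    add_filter('the_excerpt_rss', $append_footer);  // summary feeds
}
```

Because the link-building lives in a plain function, you can check the generated markup on its own before dropping the snippet into your theme's functions.php.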
- File a Digital Millennium Copyright Act (DMCA) complaint
Google’s Matt Cutts also recommends filing a DMCA complaint.
You can also request that Google remove content from its index.
And, finally, there’s….
Google’s Great New Method to Assess Content Authorship — the rel=”author” Attribute!
Google, of course, wants to do whatever it can to minimize or eliminate content scraping, as it pollutes their search index with duplicate content. This was probably the primary driver behind Google’s late-February 2011 “Panda” update.
In June 2011, Google announced a new way to ascertain original authorship: the HTML attribute rel=”author”. (NOTE: You need to have a Google profile for this method to work.)
Preferred Method: Adding the rel=”author” attribute with HTML or by Modifying your WordPress CMS
Step 1: Add rel=”author” to your author-page link. Every blog post has an author credit and the author’s name is usually hyperlinked to an author bio.
If your blog has multiple authors each with their own author bio page, or you’re the only author and have an author bio page, you can modify the file “single.php” (assuming your theme has this file). Look for this bit of code:
For our blog, which uses the Fusion theme, our modified file looked like this:
<?php printf(__('by %s on','fusion'),'<a href="'. esc_url(get_author_posts_url(get_the_author_meta('ID'))) .'" rel="author" title="'. esc_attr(sprintf(__('Posts by %s','fusion'), get_the_author())) .'">'. get_the_author() .'</a>'); ?>
The above adds the rel=”author” attribute to all links to author pages.
If this is too technical or you can’t modify the links to the author pages, then just have each author add a link to his/her author page at the bottom of each post, making sure to include the rel=”author” attribute:
<a href="http://www.myDomain.com/blog/author/MyName/" title="My Author Page" rel="author">My Author Bio</a>
Step 2: Add link to Google profile from author page. On each author page, add a link to the author’s Google profile (assuming they have one), and include the rel=”me” attribute.
The link to your Google profile will look something like one of these:
<a href="https://plus.google.com/xxxxxxxxxxxxxxxxx/posts" rel="me">My Google+ Profile</a>
<a href="https://profiles.google.com/[your Google ID]/about" rel="me">My Google Profile</a>
Alternate Method: If you are unable to add the rel=”author” attribute and/or don’t have an author bio page
Matt Cutts and Othar Hansson provide a solution for those who can’t add the rel=”author” attribute or modify their WordPress CMS (video).
On each post, just put a link to your Google profile and add the parameter ?rel=author to the URL:
<a href="https://plus.google.com/xxxxxxxxxxxxxxxxx/posts?rel=author">My Google Profile+</a>
IMPORTANT: Make sure you add the “+” to your anchor text, as above! Once you have done one of the above, you must complete the authorship loop by linking from your Google profile to your blog.
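If you generate your post footers in code, the alternate-method link can be built with a small helper like the one below. This is a sketch under assumptions: the function name `google_author_link` is invented for the example, and the x's stand in for a real profile ID just as in the sample link above.

```php
<?php
/**
 * Return an HTML link to a Google profile with rel=author added as a
 * URL query parameter (not as an HTML attribute).
 */
function google_author_link(string $profileUrl, string $anchorText = 'My Google Profile+'): string
{
    // Use "&" if the URL already has a query string, otherwise start one with "?".
    $separator = (strpos($profileUrl, '?') === false) ? '?' : '&';
    $url = $profileUrl . $separator . 'rel=author';
    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($anchorText, ENT_QUOTES, 'UTF-8')
    );
}

// Placeholder profile URL; replace the x's with your own profile ID.
echo google_author_link('https://plus.google.com/xxxxxxxxxxxxxxxxx/posts'), "\n";
```

The default anchor text keeps the trailing “+”, so the link matches the format shown in the example.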
Step 3: Completing the Authorship Loop – Link to your Blog from your Google or Google+ profile. First, log in to your Google account:
After logging in you’ll see your name on the right side of the black bar. Click it and then click the “Profile” link. This takes you to your profile.
If you have a Google+ account, click the blue “Edit Profile” button on your profile page:
The red bar will appear at the top, indicating you are in Edit mode. You will see this on the right side of your profile:
Click where the links are, next to the globe icon. If there are no links yet, you will see “Contributor to”.
Click the globe or “Contributor to” to edit.
Click “Add custom link.” For each link, there’s a field for the link name and one for the URL. When you’ve added your blog link, click “Save”.
If you don’t have a Google+ account
It’s pretty much the same drill. Edit “Contributor to” links to add a link to your blog.
This completes the authorship loop!
How to see who’s scraping your blog content
There are several ways to find out who’s scraping your blog content. Rajasekharan N. of MT Herald has an excellent article specifying four ways to do this:
- AdSense Allowed Sites
Google AdSense’s Allowed Sites feature allows you to specify the sites or URLs on which you wish to have your Google ads displayed. Google: “If a URL displaying your AdSense ad code is not on your Allowed Sites list, ads will still be displayed, but impressions and clicks won’t be recorded, advertisers won’t be charged, and you won’t receive any earnings for that URL.”
- FeedBurner Uncommon Uses
FeedBurner, Google’s free Web-feed management service, flags uncommon uses of your feed via the “Analyze” tab. After logging in to your FeedBurner account, you’ll see in the Analyze tab: Feed Stats > Uncommon Uses:
Read more about how to use the Uncommon Uses info on Google’s FeedBurner Help site.
- Google Webmaster Tools, Links To Your Site
Google Webmaster Tools reports the sites linking to your posts. You should check these every so often for suspicious linking patterns.
- Search Google for a Specific Phrase
Do a Google search for a specific and unique phrase from your blog article, surrounded by quotes so that Google returns only exact matches.
For example, from this article, I might search for this phrase:
“I’ve had two situations where I’ve contacted the owner and asked, civilly but forcefully, to remove my copied content, and they removed it!”
Any sites returned by this search have most likely copied my post, as the likelihood of someone else using that exact phrase is so low.
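If you check several posts regularly, you can generate the exact-match search links, not type them by hand. The helper below is a small sketch: the function name `exact_phrase_search_url` is made up, and it simply wraps the phrase in double quotes and URL-encodes it for Google's standard search URL.

```php
<?php
/**
 * Build a Google search URL that looks for an exact phrase.
 */
function exact_phrase_search_url(string $phrase): string
{
    // Remove quotes inside the phrase so they cannot break the exact-match quoting.
    $clean = trim(str_replace('"', '', $phrase));
    // Wrap in double quotes, then percent-encode for use in a query string.
    return 'https://www.google.com/search?q=' . rawurlencode('"' . $clean . '"');
}

echo exact_phrase_search_url('civilly but forcefully'), "\n";
// prints https://www.google.com/search?q=%22civilly%20but%20forcefully%22
```

Pick a phrase distinctive enough that an exact match elsewhere is unlikely to be a coincidence.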
What Do YOU Think of Content Scraping?
As I said, you’ll probably not be able to halt content scraping altogether but you can at least minimize it AND try to get some backlinks from the Parasite Class.
Let me know in the Comments how you’ve dealt with this issue.
- Jeff Starr’s Perishable Press Article
- Excellent MT Herald article on Scraper Sites
- Syed Balkhi of WPBeginner.com on Turning Off Pingbacks and Trackbacks
- Matt Cutts and Othar Hansson show how to use rel=”author” to establish authorship.
- Google Help Article on Authorship
- Mark Horrell’s Article on Pros and Cons of RSS Feed Summaries vs. Full Article
- Google’s Panda Algorithm Update Announcement