Scrapers are web sites that republish your content on their own blogs without permission (stealing), and in some cases they end up outranking your site in the search engines for your own articles. Dealing with them is a 2 step process: first, find out who is doing it, and second, block them in the future.
Readers should also know that scraping is usually done automatically, using a plugin or script that “imports” your content into the scraper’s site as soon as you post. Google’s changes have helped on some levels, but it’s still a big issue that has not been fully solved. If you have a WordPress blog, be aware that plugins already exist that do exactly this job for the offending blog site.
How do I know if I’m being Scraped?
Usually you will discover this when searching for your own published articles and seeing them appear elsewhere, with the same title, the same content and even your same links (which usually link back to your own site as posted). You might even discover this through Google Analytics. In short, eventually you’ll see “your complete articles” appearing in search results that do not come from your web site.
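If you want something more objective than eyeballing two pages, here is a minimal sketch that scores how much of a suspect page matches your article, word for word. It assumes you have already copied both texts into strings. The function names and the 0.9 threshold are my own placeholders, not tested values.

```python
from difflib import SequenceMatcher


def words(text: str) -> list:
    """Lowercase the text and split it into words so formatting is ignored."""
    return text.lower().split()


def similarity(original: str, candidate: str) -> float:
    """Return a ratio from 0.0 (no shared words in order) to 1.0 (identical)."""
    matcher = SequenceMatcher(None, words(original), words(candidate), autojunk=False)
    return matcher.ratio()


def looks_scraped(original: str, candidate: str, threshold: float = 0.9) -> bool:
    """Flag the candidate when it is nearly identical to the original.

    The default threshold is an arbitrary starting point (an assumption).
    """
    return similarity(original, candidate) >= threshold
```

A score close to 1.0 means the other page is carrying your article more or less in full. A low score only means the wording differs, not that nothing was copied.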
Why is Scraping an issue?
Offending sites have been known to outrank the original sites using the original sites’ own content.
If the links are coming back to my site isn’t that a good thing?
Maybe for some traffic, but sooner or later the scrapers will get caught, and you don’t want that bad element linking back to you, even through no fault of your own. The real harm is duplicate content out on the web that can replace your own pages in the search listings. Canonical tags don’t always prevent this (FYI).
OK What to do?
FIRST: Determine the IP address of the offending site
I’ve provided 3 ways to do this below. You just need the URL of the offending web site to get started.
2– Type in the domain name (Note: type it in without a trailing slash, like so: nameofwebsite.com)
3– Write down the IP Address
Do it yourself way:
1– In the “All Programs” list on your computer, go to Accessories –> Command Prompt (DOS window)
2– Type in:
ping nameofwebsite.com
(change nameofwebsite.com to the site you want to ping)
3– Hit Return and write down the IP address shown in the reply
Use WHOIS way:
1– Do a WHOIS search at http://www.networksolutions.com/whois/index.jsp
The results will come back with the site’s IP address
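If you would rather script the lookup, here is a rough Python sketch of the same idea as the ping method: clean up the address you were given, then ask DNS for the IP. The function names are hypothetical, and `lookup_ip` needs a working network connection. It returns whatever single IPv4 address DNS reports for that host name.

```python
import socket
from urllib.parse import urlparse


def hostname_from(address: str) -> str:
    """Reduce 'http://nameofwebsite.com/' or 'nameofwebsite.com/' to the bare host name."""
    address = address.strip()
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in address:
        address = "http://" + address
    return urlparse(address).hostname or ""


def lookup_ip(address: str) -> str:
    """Return one IPv4 address for the site (requires network access)."""
    return socket.gethostbyname(hostname_from(address))
```

Calling `lookup_ip("nameofwebsite.com")` with the real domain in place of the placeholder should give you the address to write down.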
SECOND: Block that IP address from your site
If your web site has a .htaccess file in its root (WordPress blogs usually do), you can block the offending site’s IP address from accessing your own site in the future. Just add this snippet of code to the top of your .htaccess file, then save and re-upload it. Make sure you change the IP address on the “deny from” line to the IP address of the web site you want to block. If you have more than one IP address to block, add one “deny from” line for each.
# used to block known scrapers
# "order allow,deny" makes a deny rule win over "allow from all"
order allow,deny
deny from 22.214.171.124
allow from all
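If your list of scraper IPs keeps growing, typing the lines by hand gets error-prone. Here is a small sketch that builds the block for you and rejects anything that is not a valid IP address. The function name is hypothetical and the addresses in the usage note are documentation placeholders, not real scrapers. It also writes an `order allow,deny` line, which this older Apache syntax needs so that the deny lines take priority over `allow from all`.

```python
import ipaddress


def build_block_rules(addresses):
    """Return .htaccess lines that deny each address and allow everyone else."""
    lines = ["# used to block known scrapers", "order allow,deny"]
    for raw in addresses:
        # ip_address() raises ValueError for anything that is not a valid IP.
        ip = ipaddress.ip_address(raw.strip())
        lines.append(f"deny from {ip}")
    lines.append("allow from all")
    return "\n".join(lines)
```

For example, `print(build_block_rules(["192.0.2.10", "198.51.100.7"]))` prints a block with two “deny from” lines that you can paste at the top of your .htaccess file.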
THAT’S IT! – here is some more info that you should address
REPORT THE SCRAPERS!!!
I suggest you file a complaint first here
LINK TO CONTENT VIOLATION ON GOOGLE