Matt Cutts and protecting scraped content

I’m not pretending to be any closer to Matt Cutts than anyone else in the industry, but it was nice when he answered my plea after our site got scraped (word for word), with only our brand name and contact details changed, by not one but three other sites. I was even more pleased when he asked what my ideal tools for dealing with such a situation would be. Here are the thoughts I volunteered. Matt certainly didn’t promise action on anything specific, but he did appreciate the feedback, and I think other ideas would be welcome in the Googleplex melting pot; they are certainly in “listening mode” on the issue.

My main problem with the scraped content was not so much with Google as with the resulting poor reflection on my brand. So I found myself with quite a bit of “mopping up” to do, and that’s not Google’s fault. But Google might be able to save webmasters some time and worry with the following ideas:

My ideal tools for dealing with copyright theft of web content

1. Being able to report it online, without necessarily needing to post or fax, although I can see why slow mail may sometimes be required in these instances.

2. A “case” list, so we can see whether a problem is pending, has been reviewed or is fixed. A decision for the (legal) record would also be great, noting the date Google first indexed the original content vs the date it found the illegal content. This may help as evidence in the rare cases that get to court.

3. General advice on what to do when people use your content online.

4. The ability to see whether the domain/URL has been purged from the index (as opposed to just not being listed), so I can see it is on a watch list.

5. If Google could supply the whois address, that would save time in sending a “cease and desist” letter (with the obvious caveat that whois details can be faked).

6. If Google has grabbed a postal or fax address from the site, that would be useful so we can send them a “cease and desist” letter (with the obvious knowledge that the address can again be faked).

7. A postal, fax or email address of the ISP hosting the site would be the most useful, so that we can also contact them with a more polite “cease and desist” letter. In the past we have found this to be the easiest route to eliminating duplicate content at source, and it is less likely that the ISP’s address is fake.
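Until Google offers such a tool, a webmaster can look these details up directly. As a minimal sketch (in Python, and assuming direct network access), the whois protocol itself is just a single line of text sent over TCP port 43 (RFC 3912) — the server name here is the IANA root whois server, which can refer you on to the registry for the domain:

```python
import socket

def build_query(domain):
    # The whois protocol (RFC 3912) expects the query followed by CRLF.
    return (domain + "\r\n").encode("ascii")

def whois_lookup(domain, server="whois.iana.org", timeout=10):
    """Query a whois server directly over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        # The server sends its reply and closes the connection.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")

# Example usage (requires network access):
# print(whois_lookup("example.com"))
```

The reply is free-form text, so any registrant or abuse contact still has to be read out by eye, and, as noted above, the details may be faked.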

8. A “similarsite:domain” command would be great, or some other tool that tries to find content that may be IP theft.
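As a rough illustration of what a “similarsite:”-style check might do under the hood, one common duplicate-detection technique (an assumption for illustration, not necessarily anything Google uses) is to break each page into overlapping word shingles and compare the sets with Jaccard similarity:

```python
def shingles(text, k=5):
    """Split text into the set of overlapping k-word shingles (sliding window)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy example: a "scraped" copy with only the ending changed.
original = "the quick brown fox jumps over the lazy dog near the river bank"
scraped = "the quick brown fox jumps over the lazy dog near the old mill"

sim = jaccard(shingles(original, 4), shingles(scraped, 4))
# sim is 8/12 here: 8 of the 12 distinct 4-word shingles are shared.
```

A high score flags near-duplicate pages even when the scraper has swapped out brand names and contact details, which is exactly the kind of word-for-word copying described above.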

The only perfect solution is to get the offending site content off the web, something Google can’t do of course, but Google can act as the IP owner’s friend and point them in the right direction. This helps to deflect the problem away from Google (once the URLs are flushed from the index) and helps the IP owner to stop the problem happening again at source.

I can see that giving out ISP and other addresses needs to be done with some caution, but it is all publicly available, and time is often of the essence. So I would suggest that, if Google can collate this information easily, it be made available straight away once a person has submitted a case, rather than only after the case has been reviewed. This allows the webmaster to take action more quickly.

Matt didn’t promise anything, of course; he just said the ideas were interesting. So those were my thoughts, which got pushed around the Plex. Are there any other good ideas to add to the mix?
