Cloaking
Cloaking is a practice in which a company gives search engine crawlers access to full-text articles, but when you try to read those articles yourself, you get a "doorway page" demanding a subscription or payment. So what you see in the search results is not what you get.
According to Wikipedia:
- Cloaking is a black hat search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the users' browser. This is done by delivering content based on the IP addresses or the User-Agent HTTP header of the user requesting the page. When a user is identified as a search engine spider, a server-side script delivers a different version of the web page, one that contains content not present on the visible page. The purpose of cloaking is to deceive search engines so they display the page when it would not otherwise be displayed.
- The only legitimate uses for cloaking used to be for delivering content to users that search engines couldn't parse, like Adobe Flash. As of 2006, better methods of accessibility, including progressive enhancement, are available, so cloaking is not necessary. Cloaking is often used as a spamdexing technique, to try to trick search engines into giving the relevant site a higher ranking; it can also be used to trick search engine users into visiting a site based on the search engine description which site turns out to have substantially different, or even pornographic content. For this reason, major search engines consider cloaking for deception to be a violation of their guidelines, and therefore, they delist sites when deceptive cloaking is reported.
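The mechanism described in the quoted passage can be illustrated with a minimal sketch. Everything named here (`CRAWLER_MARKERS`, `is_crawler`, `select_content`, and the two page strings) is hypothetical and chosen only for illustration; a real cloaking script may also key on IP addresses, which this sketch does not attempt.

```python
# Hypothetical substrings used to recognise crawlers in a User-Agent header.
CRAWLER_MARKERS = ("googlebot", "bingbot", "slurp")

# Placeholder page bodies (assumptions, not real publisher content).
FULL_TEXT = "Full text of the article."
DOORWAY_PAGE = "Subscribe or pay to read this article."


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like a search engine spider."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def select_content(user_agent: str) -> str:
    """Serve the full text to crawlers and the doorway page to everyone else."""
    return FULL_TEXT if is_crawler(user_agent) else DOORWAY_PAGE
```

A request whose User-Agent contains "Googlebot" would get the full text, while an ordinary browser would get the doorway page. This is the behaviour that, per the passage quoted above, major search engines treat as a violation of their guidelines.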
Nonetheless, as of August 2007, cloaking is practiced by a number of large academic publishers with the complicity of Google Scholar. Culprits include the publishers Springer, Elsevier, Ingenta and the Institute of Electrical and Electronics Engineers.
Examples can be seen on Pierre Far's blog entry Academic Publishers as Spammers and Carl Willis' Hall of Shame.
How to tell if you can read a file returned by a Google search
Of course, you can just follow the link and see what happens. But when clicking through multiple cloaked pages proves frustrating, here's how you can tell at a glance while you're skimming the search results.
Compare the top result (as of 2009-04-05) of each of these two searches:
See the difference? A "View as HTML" link shows up on line 2 of the top result of the first search, which is not cloaked, but not on line 2 of the second, which is. Of course, you don't want to view the file as HTML if you've got a PostScript reader, since the HTML version is often corrupted (in formatting, images, mathematics, etc.). Nevertheless, the phrase "View as HTML" is your clue that the content is not being restricted, hidden, and lied about (at least, not with the connivance of Google itself).
Sometimes normal WWW pages are also cloaked in this way; since these are already HTML files, there is no "View as HTML" option. Instead, the honest results should have a "Cached" link (on the last line of the result, rather than on line 2). See the "Cached" link in the results for this example search: "john baez" "how to teach stuff".
So, here's the rule:
- If Google offers you a "View as HTML" link or a "Cached" link, then you can probably read the page. (And if you can't, then you can at least follow the offered link to read Google's HTML cache, which is almost as good!)
- If there is neither a "View as HTML" link nor a "Cached" link, then the site must have told Google to keep the contents hidden. That doesn't prove that it will turn out to be cloaked, but something funny is certainly going on!
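As a rough sketch, the rule above could be applied mechanically to the text of a search result that you have already copied out of the results page. The function name `probably_readable` and the sample result strings are hypothetical, and the check is a naive substring match, so a snippet that happens to contain the word "cached" would fool it.

```python
def probably_readable(result_text: str) -> bool:
    """Apply the rule: a "View as HTML" or "Cached" link suggests the page is readable."""
    text = result_text.lower()
    return "view as html" in text or "cached" in text


# Hypothetical result texts, written only to exercise the function.
open_result = "[PS] Some Paper\nFile Format: Adobe PostScript - View as HTML\nexample.org/paper.ps"
suspect_result = "Some Paper\nAbstract of the paper\npublisher.example/article/123"

print(probably_readable(open_result))     # True
print(probably_readable(suspect_result))  # False
```

A `False` result means only that neither link is present; as the rule says, that does not prove the page is cloaked.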
Toby Bartels has been using this method for a couple of years, and reports very few exceptions.[1]
References
- [1] Toby Bartels, post to the n-Category Cafe (now lost!)
External Links
- Pierre Far, Academic Publishers as Spammers
- Pierre Far, Summary of Academic Publishers Cloaking Discussion
- Carl Willis, Web Spamming Hall of Shame

