Vitanuova for 2004 October 7 (entry 2)


To further protect your book content, printing and image copying functions are disabled on all Google Print content pages.

Similarly:

We've put a number of measures in place to prevent the downloading, copying, or printing of your content [...] Pages displaying your content have print, cut, copy, and save functionality disabled in order to protect your content.

I'm surprised at how much effort Google went to here. I would have expected my browser not to be vulnerable to having any of its "functionality disabled", yet, with a recent Firefox, I found that I couldn't

  1. print the page to a PostScript file,
  2. right-click on the page at all,
  3. save the page to disk (the image would somehow not be downloaded at all),
  4. view the precious image in Page Info/Media (although I could see which image it was),
  5. save the precious image in Page Info/Media,
  6. find the precious image in the DOM Inspector (which seemed like the really heavy artillery), although the DOM Inspector did let me see its URL as part of an uninterpreted style definition, and seemed to reveal the trick: defining a style called ".theimg", with the definition
    { background-image:url("http://print.google.com/long url with cryptographic signature"); background-repeat:no-repeat; background-position:center left; background-color:white; }
    and then invoking that style inside a <div> tag:
    <div class=theimg><img src=images/cleardot.gif width=575 height=928 class=border></div>

So I tried turning off JavaScript, and I found that I was essentially no better off: right-clicking caused a copy of cleardot.gif, not the .theimg background, to be saved to disk. For some reason, Save Page As.../Web Page (complete) still declined to download the background image at all, even in the absence of JavaScript, as if perhaps the CSS parser in Firefox's display logic were smarter than the CSS parser in the Save Page As... code.

The two ways I've found so far that work to capture images from Google Print are a screen capture (I used xwd, which of course worked perfectly) and looking in the on-disk cache (ls -lrt .mozilla/firefox/default.*/Cache/[0-9A-F]*). I'm still puzzled about why Page Info and the DOM Inspector won't actually reveal the image referenced in the .theimg style or allow it to be saved.
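For anyone who would rather script the cache lookup than read ls output, here is a minimal Python sketch of the same idea. The function name newest_cache_files and its limit parameter are made up for illustration; the glob pattern is the one from the ls command above, and it assumes the cache layout of the Firefox I was using. Unlike ls -lrt, it lists the newest files first.

```python
import glob
import os

def newest_cache_files(profile_glob, limit=5):
    """Return up to `limit` cache file paths, most recently modified first.

    `profile_glob` is a glob for the profile directory, for example
    "~/.mozilla/firefox/default.*" (an assumption about the cache layout).
    """
    pattern = os.path.join(os.path.expanduser(profile_glob), "Cache", "[0-9A-F]*")
    # Keep regular files only, then sort by modification time, newest first.
    files = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]
```

Calling newest_cache_files("~/.mozilla/firefox/default.*") right after loading a Google Print page should put the page image near the top of the list, assuming nothing else has been fetched in the meantime.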

If you wanted to write a proxy that would make Google Print pages capable of being saved to disk, you would presumably want to match

background-image:url("http://print.google.com/\([^"]+\)")

(although you'd need to be careful to match only the one in the definition of ".theimg", because it looks like there may be at least one other background-image:url) and then replace

<div class="theimg"

with

<div class="x"

and somewhere nearby (I'm not sure how many tags up you'd need to go) insert a plain old

<img src="http://print.google.com/$1">

I haven't tried this because it felt like too much work relative to the previous two methods.
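As an untested sketch of what the rewriting step of such a proxy might look like, here is the idea in Python. The function name rewrite_page is hypothetical, and putting the new <img> immediately before the <div> is a guess, since I don't know how far up the tree it really needs to go. The pattern is anchored to the ".theimg" rule so that other background-image declarations are left alone, and the <div> pattern accepts the class attribute with or without quotes.

```python
import re

# Match the background URL only inside the ".theimg" rule, so any other
# background-image:url(...) declarations on the page are ignored.
THEIMG_RULE = re.compile(
    r'\.theimg\s*\{[^}]*?background-image:url\("http://print\.google\.com/([^"]+)"\)'
)
# The page may write the class attribute with or without quotes.
THEIMG_DIV = re.compile(r'<div class="?theimg"?')

def rewrite_page(html):
    """Return html with the .theimg background exposed as a plain <img>."""
    match = THEIMG_RULE.search(html)
    if match is None:
        return html  # not a page we know how to rewrite; pass it through
    img = '<img src="http://print.google.com/{}">'.format(match.group(1))
    # Assumption: inserting the <img> right before the <div> is good enough.
    return THEIMG_DIV.sub(lambda _: img + '<div class="x"', html, count=1)
```

A real proxy would still need to fetch the page and serve the rewritten body; this only covers the match-and-replace part described above.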

Contrary to what I expected, Google Print does not seem to check the HTTP Referer header, so it seems to be possible merely to extract the URL from the definition of .theimg, and then to load it directly. Perhaps that will change in the future.
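A rough Python sketch of that direct approach follows. The names theimg_url and direct_request are invented for illustration, and the pattern assumes the style definition looks like the one quoted earlier. The point is only that the request is built with no Referer header at all; nothing here is fetched.

```python
import re
import urllib.request

def theimg_url(html):
    """Return the background-image URL from the .theimg rule, or None."""
    match = re.search(
        r'\.theimg\s*\{[^}]*?background-image:url\("([^"]+)"\)', html
    )
    return match.group(1) if match else None

def direct_request(url):
    """Build a request for the image URL without setting a Referer header."""
    # urllib sends no Referer header unless one is added explicitly.
    return urllib.request.Request(url)
```

Passing the result of direct_request to urllib.request.urlopen would perform the actual download, which at the time of writing seemed to work without any Referer.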

Google must have hired some experts on HTML image protection or HTML obfuscation. To be sure, there are lots of other tricks in Google Print that I had never seen before. It is hard to think that the author of that HTML obfuscation was not the subject of Richard Stallman's accidental haiku. It is amusing to think that Mr. Bad's "other" DeCSS might at last be used for some kind of circumvention (although I doubt it, because presumably Google Print simply won't work at all with the CSS removed).

Google Print's version of the first page of Alice in Wonderland

Someone I know said:

The problem with "don't be evil" is that people get mad at Google when it acts like a business, instead of like the Messiah.

After an exciting slashdotting that took vitanuova offline for a while, I learned that Gervase Markham, a Mozilla developer, did a similar analysis and even found some other useful approaches. He's kept on researching this and continues to post information about Google Print on his site.

(This article has been updated.)




Contact: Seth David Schoen