
Google recently changed the language in its consumer FAQ from "To further protect your book content" to "To protect the publisher's book content", which makes me wonder if the consumer FAQ language was originally cribbed from an earlier FAQ for publishers. (Obviously, the materials for publishers were written earlier, because publishers have been submitting materials to Google Print for some time.)

Another FAQ answer says:

In order to enforce content viewing limits, we must keep track of page views by users. We do not associate any of your searches, or the specific pages you view, with personally identifiable information about you, such as your name or address. We're only concerned with the number of pages you've viewed in the particular book you're looking at. As always, we strongly encourage you to read our Privacy Policy (and everyone else's) to be fully informed about how your confidentiality is protected.

Other people were wondering whether this is done with a cookie, with IP addresses, or perhaps with some other tracking mechanism Google's discovered (cf. Martin Pool's meantime). I haven't done any experiments on this yet; I'll probably try some browsing through Tor when I get a moment. (That reminds me of the oppositional geolocation issue; I know from Nitke work that commercial geolocation providers claim to be trying to identify the IP addresses of proxies so that they can be blacklisted. I've made a note to post later about different threat models for proxy systems, but I won't go into that any further here.)
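To make the question concrete: a viewing limit has to be enforced against *some* key that stands in for the reader, whether that's a cookie value or an IP address. Here's a minimal sketch of a per-book page counter. Google hasn't said how it actually does this, so the class name `PageViewLimiter`, the method `allow_view`, the limit, and the in-memory storage are all my own hypothetical choices. The point is that whatever key is used gets stored right next to the record of which pages of which book were read.

```python
from collections import defaultdict


class PageViewLimiter:
    """Hypothetical per-reader, per-book page-view limit (not Google's actual design)."""

    def __init__(self, max_pages_per_book: int):
        self.max_pages = max_pages_per_book
        # Maps (reader_key, book_id) -> set of page numbers already shown.
        # reader_key might be a cookie value or an IP address; either way
        # it is stored alongside the record of what was read.
        self.seen = defaultdict(set)

    def allow_view(self, reader_key: str, book_id: str, page: int) -> bool:
        pages = self.seen[(reader_key, book_id)]
        if page in pages:
            return True  # re-viewing a page that was already counted
        if len(pages) >= self.max_pages:
            return False  # limit reached for this reader and this book
        pages.add(page)
        return True


limiter = PageViewLimiter(max_pages_per_book=3)
for p in [1, 2, 3, 4]:
    print(p, limiter.allow_view("cookie:abc123", "book-42", p))  # True, True, True, False
# A different key (a fresh cookie, or a proxy's IP address) starts over.
print(limiter.allow_view("203.0.113.7", "book-42", 4))  # True
```

This also shows why browsing through Tor or another proxy is an interesting experiment: if the key is an IP address, a new exit point looks like a new reader.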

Another question is whether Google can avoid collecting PII if it uses particular enforcement methods. We know that PII collection is often in some sense inadvertent. I've heard lots of people (not just Google) talk about how they were not going to collect PII for various applications, but they often ended up collecting things from which PII could be deduced (in response to a court order, for instance). I expect to be doing a lot more research on this general problem, not in connection with Google Print, but for an EFF project on data retention. A lot of people have too-clever methods for "not collecting PII" that don't actually avoid collecting it, at least if the threat model is a court order demanding that the data be produced to a technically knowledgeable person.
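One example of the kind of too-clever method I mean (my own illustration; I'm not claiming Google or anyone in particular does this) is logging a hash of each IP address instead of the address itself. Without a secret key, anyone holding the log can just hash every candidate address and compare. The sketch below uses SHA-256 from Python's standard library and, to keep it quick, searches a single /24 block; but the entire IPv4 address space is only about 4.3 billion addresses, small enough to search exhaustively. The function names are hypothetical and the addresses come from the documentation ranges.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """'Anonymize' an IP address by hashing it with no secret salt (a flawed approach)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(digest: str, network: str) -> Optional[str]:
    """Undo the 'anonymization' by hashing every address in a candidate network."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == digest:
            return str(addr)
    return None


# What ends up in the log instead of the raw address:
logged = hash_ip("198.51.100.23")

# Whoever obtains the log (say, under a court order) can reverse it:
print(recover_ip(logged, "198.51.100.0/24"))  # 198.51.100.23
```

The hash looks opaque, but it still identifies the address just as well as the address did.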

The Google Privacy Policy (not yet specifically updated for Google Print) says:

Google collects limited non-personally identifying information your browser makes available whenever you visit a website. This log information includes your Internet Protocol address, browser type, browser language, the date and time of your query and one or more cookies that may uniquely identify your browser.

I realize it's conventional in much of the Internet industry to say that an IP address is non-PII, as Google does; still, "Internet Protocol address [...] date and time" have often been sufficient to identify an individual when combined after the fact with ISP records, as in the RIAA file-sharing cases (first by means of 512(h) prelitigation subpoenas and later by means of subpoenas in John Doe lawsuits). (For various reasons, this identification process sometimes produced inaccurate results; that's one reason I say "have often been sufficient" rather than "have always been sufficient". I'm assuming here that it's sufficient often enough to raise a privacy concern.) That means that this information isn't PII to Google, in the sense that it doesn't allow Google alone to identify someone personally, but it is PII in a more absolute sense (in that there is a foreseeable way that it might be used to identify someone personally). This can be contrasted with pure demographic information, which, under certain assumptions, couldn't be used by anybody to personally identify an individual. An IP address is far from being purely demographic!
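The difference between "PII to Google" and "PII in a more absolute sense" comes down to a join. Here's a toy sketch with made-up data and hypothetical record layouts (`LogEntry` and `Lease` are my own inventions): a web server log that holds only an IP address, a timestamp, and the page requested, and an ISP's address-assignment records mapping an IP address over a time interval to a subscriber. Neither table names the reader of the book on its own; together they do. Because the same address gets reassigned, the timestamp matters, and I'd guess that clock or time-zone mismatches between the two sets of records are one plausible source of the inaccurate identifications mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LogEntry:  # what the website keeps (hypothetical layout)
    ip: str
    time: datetime
    path: str


@dataclass
class Lease:  # what the ISP keeps (hypothetical layout)
    ip: str
    start: datetime
    end: datetime
    subscriber: str


def identify(entry: LogEntry, leases: List[Lease]) -> Optional[str]:
    """Return the subscriber who held entry.ip at entry.time, or None if no lease matches."""
    for lease in leases:
        if lease.ip == entry.ip and lease.start <= entry.time < lease.end:
            return lease.subscriber
    return None


# The same address assigned to two different subscribers on the same day.
leases = [
    Lease("192.0.2.10", datetime(2005, 1, 1, 8), datetime(2005, 1, 1, 20), "Subscriber A"),
    Lease("192.0.2.10", datetime(2005, 1, 1, 20), datetime(2005, 1, 2, 8), "Subscriber B"),
]
entry = LogEntry("192.0.2.10", datetime(2005, 1, 1, 21, 30), "/hypothetical/book/page17")
print(identify(entry, leases))  # Subscriber B
```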

Amazon.com's privacy policy is much worse; it doesn't even claim to try to avoid collecting PII, but rather plunges right into the question of with whom the collected PII may be shared. Interestingly, Amazon suggests using anonymizing technologies (although its list is way out of date -- it includes ZKS, which has totally abandoned the consumer privacy market). Using Amazon's Search Inside the Book anonymously is an interesting question; I understand you have to log in to do it, although I don't think Amazon verifies any of the information you provide when you create an account.

I wasn't able to use Google Print through Anonymizer.com, although it seems to me that this was Anonymizer.com's fault (for mangling Google's HTML badly) rather than Google's (since it appears Google did not refuse to serve Google Print pages to Anonymizer.com).

Of course, Julie Cohen wrote an entire law review article, a modern classic, on whether copyright would lead to the erosion of anonymity for readers: "A Right to Read Anonymously: A Closer Look at 'Copyright Management' in Cyberspace", 28 Conn. L. Rev. 981 (1996). This has been an ongoing theme in her work; see also her working paper "Normal Discipline in the Age of Crisis". It's a little funny to look at the footnotes in "A Right to Read Anonymously", because it was written before the DMCA and before any large-scale deployment of DRM, so a lot of the things that have since become concrete were then speculative (and not all of them developed quite as Prof. Cohen imagined). But it would also have been speculative to say in 1996 that "we must keep track of page views by users" for copyright reasons; now Google has said it.




Contact: Seth David Schoen