David Weekly on pseudonymity and privacy
In an earlier post, I talked about Zooko's views on pseudonymity and privacy, including the general idea that pseudonymity is quite a lot harder to maintain than cypherpunk privacy enthusiasts originally hoped.
David Weekly sent me a very thoughtful comment on this, with a similar skepticism. David recalls that a friend happened to comment that she kept a journal on-line.
"LiveJournal?" I guessed.
"Yeah," she said, "that's right. But I don't use my name, so people can't find it."
"I can find it in 15 minutes."
[... S]ure enough, I found her pseudonym on LiveJournal in about fifteen minutes. Basically, it's a compression issue. Unless you can assume a very large shared secret codebook (certain "replacement names" for people, a la "C" = Chris the ex, "E" = Elizabeth the roommate, etc.), a certain amount of the keying and introduction has to actually be on the site. For instance, if you went to visit a specific place or had specific new interests, you probably would have to write out those things fully for the blog to be at all useful to your friends.
I would therefore posit that any public blog whose point it is even in part to reveal the life of the underlying person and their experiences is findable in a trivial amount of time, given even only a small amount of knowledge about the person. Public pseudonyms used in anything but the most academic of discussions are quickly discoverable by mentions of facts alone.
Beyond that, of course, there is the issue of "fist". As I'm sure you know, there were code listeners in the UK who would listen to transmissions from German field operators. Without even being able to decode the texts, the listeners were able to uniquely identify specific operators by the patterns of their transmissions. If this applies to banging out dits and dashes, how much more would this apply to style used in writing? Indeed, this is how the Unabomber was found out...the style was Ted Kaczynski's and Ted's alone. So as long as one writes consistently, or even makes use of a consistent set of aphorisms and analogies, one can be uniquely identified.
It's possible that automated tools will be able to scan the Net, matching well-defined personal sites and emails with public pseudonyms. The only real way around this is to either never make one's public persona public or never make the pseudonym public. The former is arguably difficult, save living as a hermit (with an Internet drop) and the latter defeats much of the point of having a pseudonym.
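To make the "fist" point concrete, here is a minimal sketch of the kind of automated matching David anticipates. It is only an illustration under simple assumptions: it profiles writing by character trigram frequencies and ranks candidate authors by cosine similarity, one of the simplest stylometric techniques, not the method of any particular study. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-collapsed text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(pseudonymous_text: str, known_samples: dict) -> list:
    """Rank candidate authors against a pseudonymous text, most similar first.

    known_samples maps a (hypothetical) author name to writing known to be theirs.
    """
    target = trigram_profile(pseudonymous_text)
    scores = [
        (name, cosine_similarity(target, trigram_profile(sample)))
        for name, sample in known_samples.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up samples standing in for a personal site and an unrelated writer.
    samples = {
        "alice": "Basically, it's a compression issue, and I'd say it's obvious.",
        "bob": "The quarterly report indicates revenue growth across all regions.",
    }
    for name, score in rank_candidates("Basically, it's a compression issue, really.", samples):
        print(f"{name}: {score:.3f}")
```

A character-level profile picks up punctuation, contractions, and pet phrases without any notion of meaning, which is roughly the sense in which an operator's fist identifies them. Real stylometry uses far larger samples and richer features; with a sentence or two per author, as here, the ranking is only suggestive.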
The "fist" idea reminds me of some of David Molnar's research on RFID privacy, where RFID tags that are supposedly privacy-protective may actually divulge persistent tracking information as a result of lower-level protocols (collision-avoidance schemes) that had not been specially designed for privacy protection.
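The RFID point can be illustrated with a toy model of binary tree-walking singulation, one anti-collision approach some RFID protocols use. This is an assumption for illustration, not necessarily the scheme Molnar's research examined. The reader broadcasts an ID prefix, every tag whose fixed ID begins with that prefix answers with its next bit, and the reader descends into each branch it hears. However well the application data is protected, the walk ends with the reader knowing each tag's complete ID. The function and tag IDs below are hypothetical.

```python
def tree_walk(tag_ids: list, id_length: int) -> list:
    """Singulate tags with a toy binary tree-walking protocol.

    Every (hypothetical) tag in range holds a fixed bit-string ID of id_length bits.
    The reader broadcasts a prefix; each tag whose ID starts with that prefix replies
    with its next bit. The reader only observes which bit values were sent, and
    explores each branch it hears until a full-length prefix is reached.
    """
    singulated = []
    pending = [""]
    while pending:
        prefix = pending.pop()
        if len(prefix) == id_length:
            # Only a tag with exactly this ID could have led here, so the reader now knows it.
            singulated.append(prefix)
            continue
        replies = {tid[len(prefix)] for tid in tag_ids if tid.startswith(prefix)}
        # Push "1" before "0" so the "0" branch is explored first (ascending order).
        for bit in sorted(replies, reverse=True):
            pending.append(prefix + bit)
    return singulated


if __name__ == "__main__":
    tags = ["1010", "0011", "1001"]
    first_read = tree_walk(tags, 4)
    second_read = tree_walk(tags, 4)  # the same tags, read again somewhere else
    print(first_read, first_read == second_read)
```

Reading the same tags at two different places yields identical strings, which is exactly the persistent identifier a tracker needs, even though nothing above the collision-avoidance layer was ever disclosed.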
There really is a layer-crossover problem. People rarely go to great lengths to make themselves statistically indistinguishable from other people. A pseudonym that writes only about a single topic (without making reference to life events), as Unlimited Freedom does, is better off, especially if that pseudonym writes only infrequently and at seemingly random times. But that doesn't coincide with the communications habits or preferences of very many people who might want (or think they might want) anonymity or pseudonymity.
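One rough way to see why a few mentioned facts go so far: treat each fact as true of some fraction of the candidates and multiply. The following is a minimal sketch, assuming the facts are independent (real ones are usually correlated, so this is only an estimate) and using made-up frequencies and a made-up count of journals. Each fact with frequency p discloses log2(1/p) bits, and about 33 bits are enough to single out one person among roughly eight billion.

```python
import math


def expected_matches(population: int, fact_frequencies: list) -> float:
    """Expected number of candidates consistent with every revealed fact.

    Each frequency is the fraction of the population for whom that fact is true.
    Assumes the facts are independent, which real-life facts rarely are.
    """
    remaining = float(population)
    for fraction in fact_frequencies:
        remaining *= fraction
    return remaining


def bits_revealed(fact_frequencies: list) -> float:
    """Information (in bits) disclosed by the facts, under the same independence assumption."""
    return sum(-math.log2(fraction) for fraction in fact_frequencies)


def bits_to_identify(population: int) -> float:
    """Bits needed to single out one candidate in a population of this size."""
    return math.log2(population)


if __name__ == "__main__":
    journals = 10_000_000  # hypothetical number of public journals to search
    # Hypothetical frequencies: a home town, a recent trip, a new hobby.
    facts = [1 / 1000, 1 / 200, 1 / 50]
    print(expected_matches(journals, facts))
    print(bits_revealed(facts), "of", bits_to_identify(journals), "bits")
```

With these invented numbers, three ordinary facts already narrow ten million journals to about one, which is the intuition behind David's fifteen-minute search.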
There certainly are possibilities for mechanically rewriting texts. A machine can perform certain transformations to ensure consistency (or consistent randomness!) in certain stylistic distinctions, for example "it's" vs. "it is", "don't" vs. "do not", certain cases of passive voice vs. active voice, and so on. Pseudonymous writers should definitely use a spell checker if they're not confident about their spelling or typographic abilities. (I think a persistent typo was one of the stylometric tricks that linked up pseudonymous posts in the stylometry paper that the Tor bibliography includes.) But David's observation functions mostly at higher levels, which can't be mechanically rewritten. And I think his observations are dead on with regard to people blogging about their own lives, unless they already belong to a simply vast anonymity set or make very cautious military-censor-like decisions about what they're going to include. Loose lips sink pseudonyms, but most bloggers who discuss their personal lives have nothing if not loose lips...
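The mechanical contraction rewriting mentioned here can be sketched in a few lines. This is a minimal illustration, assuming a small hand-made contraction table (hypothetical, not taken from any existing tool) and straight apostrophes only. It supports both options: "expand" always uses the long form for consistency, and "random" chooses per occurrence so the choice carries no signal.

```python
import random
import re
from typing import Optional

# Hypothetical, deliberately tiny table. A usable tool would need many more
# entries and would have to disambiguate cases such as "it's been" (= "it has").
CONTRACTIONS = {
    "it's": "it is",
    "don't": "do not",
    "doesn't": "does not",
    "isn't": "is not",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
}

# Whole-word match on any listed contraction, ignoring case.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def _match_case(original: str, replacement: str) -> str:
    """Carry the capitalization of the original's first letter over to the replacement."""
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def normalize_contractions(text: str, mode: str = "expand", seed: Optional[int] = None) -> str:
    """Rewrite contractions so they stop carrying a stylistic signal.

    mode="expand" always uses the long form (consistency);
    mode="random" flips a coin for each occurrence (consistent randomness).
    """
    rng = random.Random(seed)

    def replace(match):
        word = match.group(0)
        if mode == "random" and rng.random() < 0.5:
            return word  # keep the contracted form this time
        return _match_case(word, CONTRACTIONS[word.lower()])

    return _PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(normalize_contractions("It's late and I don't care."))
    print(normalize_contractions("It's late and I don't care.", mode="random", seed=1))
```

Running one pass like this over everything published under a pseudonym removes an easy feature, but it leaves untouched the higher-level choices of topic and detail that David's observation is about.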