I was astonished the other day by the following experience:
I booted an LNX-BBC test image and
started a test download of something using
BitTorrent.
BitTorrent was downloading into a RAM disk (actually a tmpfs, which is
kind of like a dynamically-sized RAM disk) with a maximum size of
about 128 MB.
I expected the download to run out of space at some point, since the file
I was trying to download was about 1 GB, much larger than the RAM disk.
After a little while, I went to check on it by looking at how much
disk space had been used.
The system said that only about 50 MB had been downloaded. Then I
remembered that BitTorrent preallocates space for the files it
downloads. I couldn't understand how the download had even been
able to start. The
BitTorrent
FAQ says that
BitTorrent pre-allocates the entire file when your download
begins, then writes in pieces in random order as it gets them.
As a result the file jumps to its full size immediately.
BitTorrent will tell you when the download is complete.
If this was the case, how could a BitTorrent download of a 1 GB
file even start on a 128 MB RAM disk? Furthermore,
how could the system claim that only 50 MB had been used?
My confusion was compounded when I went to look at the size of
the preallocated file, and ls reported it as occupying 1 GB.
Nick, who was visiting, explained that Unix supports sparse files
and a file's size in the filesystem may be substantially larger
than the amount of space it's actually taking up. When BitTorrent
allocates a complete file's size on a Unix filesystem, it will
only use a trivial amount of actual storage, and the amount of
storage used will increase as the download progresses.
I found this totally astonishing. I'm familiar with sparse files,
but I always thought they were a VMS thing and never realized
that they've been a standard part of Unix for a long time. I
don't know how I missed that.
The basic consequence of this is that the file size reported by
ls -l can be totally different from the file size reported by du.
Blocks not yet written will just not be allocated on disk, and
reading them will return zeroes.
Here are four problems:
- you have a collection of strings G which were generated by
some unknown formal grammar; find a formal grammar which generates them
- you have a collection of strings G which were generated by some
unknown formal grammar and a collection F which could not be generated
by that grammar; find a formal grammar which generates them
- you have a formal grammar G and a string S; determine whether S
could have been generated by G
- you have a computer program P and a string S; determine whether,
for some input, S could have been generated by P
(Another twist is to replace "grammar" by "Unix regular expression" and
"generated by" with "matches".)
I would find these problems more straightforward to talk about if I had
taken a compilers course, or an automata course, but I never got that far
in CS. I did study formal languages only briefly, and I once read
Minsky's Computation: Finite and Infinite Machines, which
talks about the different classes of automata.
(1) and (2) are easy because you can just say that the grammar is
( string1 | string2 | string3 | string4 | ... ). This is a trivial
solution and not very useful -- it has no economy and no
predictive power. It's tantamount to saying that the laws
of physics are "All observed phenomena occur and all non-observed
phenomena are impossible". (What's amusing is that this isn't even
true, partly because some observed phenomena do not actually occur.)
It would be nice to have a more interesting solution.
(4) is as hard as the Halting Problem, but it is solvable for
some programs. You can see quickly that it isn't solvable
in general (even if you didn't know that the Halting Problem is
unsolvable and if you just had an intuition that there is no magical
way of solving all math problems with the same technique). You could
just write a program which verifies whether something is an
exception to some conjecture. If you can tell whether "yes" is ever
an output of that program, you can tell immediately whether the
conjecture is true.
(3) is pretty interesting, and it can actually be solved. The
regular expression version is a standard part of many programming
languages. The efficiency of a solution is also an interesting
question.
I regret that I didn't get to go on
this
hike, because it was beautiful.
The recordings on Out There Live are very good. Some of them
are better than the studio recordings on some other Dar Williams CDs.
I have to recommend this CD very highly.
I went to Stacey's with Zack, at the end of a fairly unsuccessful
series of errands.
I got the 2nd edition of Friedl's regular expression book from
O'Reilly, which might shed some light on my problem (3) above, in
that it discusses implementation issues related to RE matching.
Friedl's book is one of the most useful books ever published by
O'Reilly, but it's not particularly well-known. But it is a triumph.
I also got a Chomsky book on the Vietnam War because of the publisher's
successful attempt to suggest that it had a new relevance or resonance
today.
I also got a copy of Practical Cryptography, the latest in
the series of "Schneier books that criticize earlier Schneier books".
(So Bruce Schneier is like Ludwig Wittgenstein and Cliff Stoll --
interesting company.)
"What you say is very fine, Adso, and I thank you.
The order that our mind imagines is like a net, or
like a ladder, built to attain something. But afterward
you must throw the ladder away, because you discover
that, even if it was useful, it was meaningless. Er
muoz gelîchesame die leiter abewerfen,
sô er an ir ufgestigen . . . . Is that how you
say it?"
"That is how it is said in my language. Who told you
that?"
"A mystic from your land. He wrote it somewhere, I
forget where. And it is not necessary for somebody
one day to find that manuscript again. The only
truths that are useful are instruments to be thrown
away."
(Umberto Eco, The Name of the Rose)
(Er muss sozusagen die Leiter wegwerfen, nachdem
er auf ihr hinaufgestiegen ist.)
Er muss diese Sätze überwinden, dann sieht
er die Welt richtig.
(Ludwig Wittgenstein, Tractatus Logico-Philosophicus
6.54)
I repeat: it suffices that a book be possible for it to exist.
Only the impossible is excluded. For example: no book can be a ladder,
although no doubt there are books which discuss and negate and
demonstrate this possibility and others whose structure corresponds to
that of a ladder.
(Jorge Luis Borges, "The Library of Babel")