Nested emulators for posterity's sake

An oldie but a goodie: The Library of Congress has a monkey on its back. Every couple of years, it has to open and re-save every doc in its 35TB collection so that the docs can be read by modern software on modern hardware.

This is problematic, and not just because it's expensive. When you convert a document, it's hard (arguably impossible) to know whether you've preserved all the parts of that doc that will be of interest to posterity. For example, someone may want to dig through billions of Word docs to look at the embedded spyware GUIDs to see which modern writers were reviewing each other's works. Opening a Word 5 doc and saving it in Word XP may well eliminate that information.

The solution is emulation. Write, say, a 486 simulator that will run under a Pentium III running XP. Run Win 3.1 on the virtual machine and run Word 3 under the virtual Win 3.1. When PIIIs are in danger of obsolescence, write a PIII emulator to run on a G5 processor under OS X. Run XP on the virtual PIII, run the 486 emulator on the virtual XP, run Win 3.1 on the virtual 486 and so on — nested Turing machines, one inside the other.
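To make the nesting concrete, here is a minimal sketch in Python of how the chain of emulators might be tracked. The Emulator record, the build_stack function and the platform names are all hypothetical, invented for illustration; the sketch also assumes each host platform has exactly one emulator written for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emulator:
    guest: str  # platform being simulated
    host: str   # platform the emulator itself runs on


def build_stack(emulators, native, target):
    """Return the platforms from native hardware down to target, outermost first."""
    # Assumption: at most one emulator per host platform.
    by_host = {e.host: e for e in emulators}
    stack = [native]
    while stack[-1] != target:
        step = by_host.get(stack[-1])
        # Stop if the chain is broken or loops back on itself.
        if step is None or step.guest in stack:
            raise LookupError(f"no emulator chain from {native} to {target}")
        stack.append(step.guest)
    return stack


# One emulator per generation: each only knows about its predecessor.
EMULATORS = [
    Emulator(guest="486 + Win 3.1", host="PIII + XP"),
    Emulator(guest="PIII + XP", host="G5 + OS X"),
]

STACK = build_stack(EMULATORS, native="G5 + OS X", target="486 + Win 3.1")
print(" > ".join(STACK))
```

Each new generation adds one entry to the list and nothing else changes, which is the whole appeal of the scheme.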

Theoretically, this eliminates the explosion of complexity; at any time, you need only know how to emulate the last generation of technology on the current gen. While there is a possibility that the nested emulators will introduce difficult debugging problems, an emulator that runs on a gate-for-gate simulated processor should, in theory, run perfectly (what do you do about I/O? I dunno).

It's a powerful idea. Human posterity is terribly endangered by proprietary data-formats (and doubly so by DRM technology), but by funding emulator research, the LoC can preserve posterity — just so long as Moore's Law keeps on generating CPUs that are sufficiently advanced over their predecessors that they can handily simulate them.
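The Moore's Law condition can be put as back-of-the-envelope arithmetic. The sketch below is an assumption-laden toy, not a measurement: it supposes every hardware generation is faster than the last by one fixed factor, and every emulation layer costs one fixed slowdown. The function name and the numbers are made up for illustration.

```python
def effective_speed(generations, speedup_per_gen, slowdown_per_layer):
    """Speed of the oldest software relative to its original hardware.

    1.0 means the innermost guest runs as fast as it did natively.
    Assumes one emulation layer is added per hardware generation.
    """
    native = speedup_per_gen ** generations       # how much faster the newest CPU is
    overhead = slowdown_per_layer ** generations  # compounded cost of the nested layers
    return native / overhead


# Hypothetical figures: three generations on, each layer costs 10x.
print(effective_speed(3, speedup_per_gen=10.0, slowdown_per_layer=10.0))
print(effective_speed(2, speedup_per_gen=4.0, slowdown_per_layer=8.0))
```

Under these assumptions the old software keeps pace only while each generation's speedup is at least as large as the cost of one emulation layer; if it falls short, the shortfall compounds with every layer of nesting.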

Of course, it's at direct odds with DRM. If I simulate your "trusted computer" in a virtual machine, I can bend the laws of time and space as far as the simulated computer goes — like a brain in a jar with a wire running off its stem, it doesn't have any way of distinguishing those responses that are explicitly generated from those that are "real."

The MPAA's Broadcast Protection Discussion Group is establishing the principle that digital media technologies should be made tamper-resistant (read: no emulators, no open source) so that you can't interpret the "protection" as damage and route around it.

I predict a major collision between the Copyright Office and the copyright industry in the coming months — let's hope posterity wins.
