OLIVE: a system for emulating old OSes on old processors that saves old data from extinction

Olive ("Open Library of Images for Virtualized Execution") is an experimental service from Carnegie Mellon University that stores images of old processors, the old operating systems that ran on top of them, and software packages for those old OSes. This lets users access old data from obsolete systems inside simulations of the computers that originally processed that data, using the original operating systems and applications.

This is a very powerful model for maintaining access to old data formats; while modern apps are often capable of parsing old data formats, they have well-understood shortcomings. For example, buggy versions of old apps may have been able to understand the corrupt files they created, but newer programs may only parse the old data if it was written to "spec." Attempts to overcome this with "bug-compatibility" and "quirks modes" are imperfect substitutes for actually running the old code, bugs and all.

It's also a powerful rebuttal to the lazy idea that digitized data is inherently less stable than, say, print records. We often hear about how obsolete file formats, media and computers are causing "digital decay" of our old data, but the story is much more complicated than that.

Old storage media is definitely unstable. Magnetic and optical media literally rots, delaminating and decaying. I/O devices like tape drives and disk drives go out of production, break down, get scrapped, and can be next to impossible to find, creating races against the clock to find a device to read out old media before it decays beyond use.

But once that data is on a modern hard drive, the whole story changes. Mass storage gets vastly cheaper every year (the rate at which mass storage is improving puts the rate of progress in computer performance and network bandwidth in the shade). Data stored on your PC or in a data-center is relatively easy to preserve: the next system you buy will have much more storage than the system it replaces. We've really reached the end of the era of "offline storage," of data that can't be accessible at all times (the exception being some very large-scale scientific experiments that generate petabytes or even exabytes on every run).

Live storage is very robust. Not only do modern drives self-monitor, automatically moving data from unreliable sectors to reliable ones, but redundant, self-healing arrays have gotten faster and more reliable — and with storage being so cheap, backups have gotten more robust and commonplace than ever.

It's true that printed records don't usually require special equipment to read back, and high-quality paper is stable for hundreds or even thousands of years. But paper burns, it can't be (readily) encrypted, and it's hard to back up (it is especially hard to maintain concurrent, offsite paper backups that are geographically isolated from wars, natural disasters, etc). Live data can be cheaply and instantaneously mirrored on servers all over the world, in an encrypted state that allows you to maintain the privacy and integrity of the data, even when the entity hosting a backup copy can't be trusted. Cheap paper and cheap ink rot just as fast as (or faster than) cheap optical and magnetic media, but paper doesn't automatically sense when it is starting to fade or crumble and rewrite its contents onto pages that are in better shape.
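The "sense and rewrite" behavior described above can be illustrated with a minimal sketch. It is not how any particular drive, RAID array or filesystem works: it assumes a simple majority vote over SHA-256 digests, and the site names and the `heal` helper are hypothetical, invented for illustration.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used here as a fixity check."""
    return hashlib.sha256(data).hexdigest()


def heal(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Repair replicas that disagree with the majority (hypothetical helper).

    `replicas` maps an illustrative site name to that site's copy of the data.
    Returns the repaired replicas and the names of the sites that were rewritten.
    """
    # Count how many replicas share each digest.
    counts = Counter(digest(blob) for blob in replicas.values())
    good_digest, votes = counts.most_common(1)[0]

    # Assumption: a strict majority of copies is trusted as the intact version.
    if votes <= len(replicas) // 2:
        raise ValueError("no majority of replicas agrees; cannot repair safely")

    good_blob = next(b for b in replicas.values() if digest(b) == good_digest)
    repaired = [name for name, blob in replicas.items()
                if digest(blob) != good_digest]

    # Overwrite every copy with the majority version.
    healed = {name: good_blob for name in replicas}
    return healed, repaired
```

Given three mirrors where one copy has silently decayed, `heal` reports the damaged site and returns a set of copies that all match the majority. Production systems typically rely on per-block checksums and parity rather than whole-file voting, so treat this only as a picture of the idea.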

The thing paper has that data has historically lacked is an execution environment. With visible light and a flat surface, you can read documents from the age of the Enlightenment. But until recently, reading files generated for the Apollo missions or even the Apple ][+ was a tricky business.

That's why emulation is so important. Emulation does for file formats what mass online storage did for storage media, hitching them to the screaming price/performance curve of computing and doing away with the delicate and imperfect business of figuring out how to parse formats designed by dead people for computers that don't exist any more.

As exciting as Olive is, it's not perfect. The operating systems and applications needed to parse old data are tangled in copyright thickets. Though the code involved has no commercial value (most code exhausts its commercial life in years, or at most, decades), software attracts the same copyright that literary works enjoy: 95 years for works "created" by corporations, life plus 70 years for works created by "natural humans." These terms mean that Olive risks enormous copyright damages if it is widely offered, so it is only available to a small group of insiders.

There's no easy way to break through this thicket. There is no reasonable economic rationale for software copyright terms in the 100-year range. As a Microsoft VP for Software once candidly admitted, Microsoft would pay its programmers to write exactly the same amount of code if software's copyright term were 10 years as it would if the term were 100 years.

In the meantime, there are plenty of living, breathing copyright proprietors (and scientists, academics, and everyday users) whose data (and games, and art) is stuck inside proprietary file formats that can only be accessed if the copyright thicket can be cleared — if they can convince the absentee inheritors of the assets of long-defunct corporations (or the managers of thriving businesses who have more urgent issues than the licensing of 30-year-old OSes) to help them.

What else can Olive do? Maybe you're wondering what tools businesses were using shortly after Intel introduced the Pentium processor. Olive can help with that, too. Just fire up Microsoft Office 4.3 from 1994 (which thankfully predates the annoying automated office assistant "Clippy").

Perhaps you just want to spend a nostalgic evening playing Doom for DOS—or trying to understand what made such first-person shooter games so popular in the early 1990s. Or maybe you need to redo your 1997 taxes and can't find the disk for that year's version of TurboTax in your attic. Have no fear: Olive has you covered.

On the more serious side, Olive includes Chaste 3.1. The name of this software is short for Cancer, Heart and Soft Tissue Environment. It's a simulation package developed at the University of Oxford for computationally demanding problems in biology and physiology. Version 3.1 of Chaste was tied to a research paper published in March 2013. Within two years of publication, though, the source code for Chaste 3.1 no longer compiled on new Linux releases. That's emblematic of the challenge to scientific reproducibility Olive was designed to address.

Carnegie Mellon is Saving Old Software from Oblivion [Mahadev Satyanarayanan/IEEE Spectrum]

(via /.)