David Patterson, of RISC and RAID fame, writes in Scientific American about his latest project, Recovery-Oriented Computing (ROC). The ROC approach accepts "that computer failure and human operator error are facts of life. Rather than trying to eliminate computer crashes–probably an impossible task–our team concentrates on designing systems that recover rapidly when mishaps do occur."
The researchers propose four principles for the construction of "ROC-solid" systems:
"The first is speedy recovery: problems are going to happen, so engineers should design systems that recover quickly. Second, suppliers should give operators better tools with which to pinpoint the sources of faults in multicomponent systems. Third, programmers ought to build systems that support an "undo" function (similar to those in word-processing programs), so operators can correct their mistakes. Last, computer scientists should develop the ability to inject test errors; these would permit the evaluation of system behavior and assist in operator training."