For more than five years the LOCKSS program, part of Stanford
Libraries, has been developing, testing and deploying a peer-to-peer
system that libraries can use to collect, preserve and disseminate
material published on the Web. The system is now in production
use at nearly 100 libraries around the world, and more than 80
publishers, who together account for more than 2,000 titles, have
given permission for their content to be preserved. The system
is interesting from both the engineering and computer science
perspectives; the talk will cover both.

Digital preservation is a topic that has inspired a lot
of talk but very few production systems. That is because
the general digital preservation problem is important and urgent,
but inordinately hard. Even the small sub-problem that
the LOCKSS system addresses, preserving web-published academic
journals, is fraught with the kinds of difficulties that can
only be overcome with creative engineering. Libraries
have no money to pay for preservation, so the system
must be extremely cheap to acquire and operate. The
legal framework within which the system must operate, the DMCA,
imposes severe constraints. The hardware and software making
up the system will become obsolete on a timescale at least an order
of magnitude shorter than the system's design lifetime.

To be sufficiently reliable, the system must be highly replicated,
yet it cannot have a central locus of control vulnerable to
technical or legal attack. It must thus be a true peer-to-peer
system, and designing a fault-tolerant, attack-resistant
protocol by which the peers in such a system can communicate
is an interesting computer science problem. The LOCKSS
team has succeeded in designing a protocol that needs no long-term
secrets yet resists attacks by powerful adversaries that aim either
to modify content without detection or to degrade the
system through denial of service.
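The abstract does not spell out how the protocol works. As a rough illustration of the underlying idea, detecting a damaged replica by comparing it with other peers' copies without relying on any long-term secret, here is a toy majority poll. Everything in it is a hypothetical simplification: the `Peer` class, the `vote` and `run_poll` functions, SHA-256, a fresh 32-byte nonce per poll and a simple majority rule are assumptions chosen for the sketch. It is not the LOCKSS protocol, which must also withstand malicious voters and denial-of-service attacks.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical peer holding one replica of a preserved item."""
    name: str
    content: bytes

    def vote(self, nonce: bytes) -> bytes:
        # Hash a fresh per-poll nonce together with the local replica.
        # Because the nonce changes every poll, a peer cannot answer with a
        # digest it computed earlier; it must hold the content now.
        return hashlib.sha256(nonce + self.content).digest()


def run_poll(poller: Peer, voters: list[Peer]) -> bool:
    """Toy poll: True if a strict majority of voters agree with the poller.

    A False result means the poller's replica is suspect and should be
    repaired from a peer whose vote disagreed with it (i.e. the majority).
    """
    nonce = secrets.token_bytes(32)  # ephemeral value; no long-term secret
    mine = poller.vote(nonce)
    agree = sum(1 for v in voters if v.vote(nonce) == mine)
    disagree = len(voters) - agree
    return agree > disagree
```

For example, a peer whose copy has been altered loses the poll against four intact peers, while an intact peer wins:

```python
intact = [Peer(f"peer{i}", b"journal issue") for i in range(4)]
damaged = Peer("damaged", b"journal issue (altered)")
run_poll(intact[0], intact[1:] + [damaged])  # True
run_poll(damaged, intact)                    # False
```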