Despite decades of research in fault tolerance,
commodity operating systems, such as Windows and Linux, continue
to crash. In this talk, I will describe a new reliability subsystem
for operating systems that keeps the most common cause of crashes,
device driver failures, from bringing down the system, without
requiring changes to the drivers themselves.
To date, the subsystem has been used in Linux to prevent system
crashes in the presence of driver failures, recover failed drivers
transparently to the OS and applications, and update drivers "on
the fly" without requiring a system reboot after installation.
Measurements show that the system is extremely effective at protecting
the OS from driver failures, while imposing little runtime overhead.