Protecting Against Rare Event Failures in Archival Systems

Published as Storage Systems Research Center Technical Report UCSC-SSRC-09-03. Preliminary version of a paper that appeared in MASCOTS 2009.

Abstract

Digital archives are growing rapidly, necessitating stronger reliability measures than RAID to avoid data loss from device failure. Mirroring, a popular solution, is too expensive over time. We present a compromise solution that uses multi-level redundancy coding to reduce the probability of data loss from multiple simultaneous device failures. This approach handles small-scale failures of one or two devices efficiently while still allowing the system to survive rare-event, larger-scale failures of four or more devices. In our approach, each disk is split into a set of fixed-size disklets
which are used to construct reliability stripes. To protect against rare
event failures, reliability stripes are grouped into larger
"uber-groups," each with its own uber-parity. Our calculations of failure probabilities found that the addition of
uber-groups allowed the system to absorb many more disk failures without
data loss. Through discrete event simulation, we found that adding
uber-groups only negatively impacts performance when these groups need to
be used for a rebuild. Since rebuilds using uber-parity occur very rarely,
they minimally impact system performance over time. Finally, we showed
that robustness against rare events can be achieved for under 5% of total
system cost.
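
The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the code from the report: it assumes plain XOR parity at both levels, a hypothetical layout in which the uber-parity is computed position by position across the stripes of one uber-group, and parity disklets that always survive. The names xor_blocks, build_uber_group, and recover are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def build_uber_group(stripes):
    """stripes: list of reliability stripes, each a list of data disklets (bytes).

    Returns (stripe_parity, uber_parity). Assumption: uber-parity j is the XOR
    of disklet j taken from every stripe in the uber-group.
    """
    stripe_parity = [xor_blocks(s) for s in stripes]
    uber_parity = [xor_blocks(col) for col in zip(*stripes)]
    return stripe_parity, uber_parity


def recover(stripes, stripe_parity, uber_parity):
    """Rebuild disklets marked None, in place. Returns True if all were rebuilt."""
    progress = True
    while progress:
        progress = False
        # Common case: a stripe missing one disklet is repaired locally.
        for i, stripe in enumerate(stripes):
            missing = [j for j, d in enumerate(stripe) if d is None]
            if len(missing) == 1:
                others = [d for d in stripe if d is not None]
                stripe[missing[0]] = xor_blocks(others + [stripe_parity[i]])
                progress = True
        # Rare case: fall back to the uber-parity across stripes.
        for j in range(len(uber_parity)):
            missing = [i for i, s in enumerate(stripes) if s[j] is None]
            if len(missing) == 1:
                others = [s[j] for s in stripes if s[j] is not None]
                stripes[missing[0]][j] = xor_blocks(others + [uber_parity[j]])
                progress = True
    return all(d is not None for s in stripes for d in s)


# Usage: one stripe loses two disklets, more than its own parity can repair.
original = [[b"aa", b"bb", b"cc"], [b"dd", b"ee", b"ff"], [b"gg", b"hh", b"ii"]]
stripe_parity, uber_parity = build_uber_group(original)
damaged = [list(s) for s in original]
damaged[0][0] = None
damaged[0][1] = None
recovered = recover(damaged, stripe_parity, uber_parity)
```

In this sketch the uber-parity is consulted only when a stripe cannot repair itself, which mirrors the abstract's point that the extra level costs performance only during such rare rebuilds.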
BibTeX entry:
@techreport{
author = {Avani Wildani and Thomas Schwarz and Ethan L. Miller and Darrell D. E. Long},
title = {Protecting Against Rare Event Failures in Archival Systems},
institution = {University of California, Santa Cruz},
number = {UCSC-SSRC-09-03},
month = apr,
year = {2009},
}
Last modified 31 May 2009
© 2009 SSRC & UCSC