Seminar: Proposal practice talk (Kevin Greenan)

Data reliability is paramount in modern storage systems, and it is generally provided by applying erasure codes across storage devices. Until recently, most systems employed mirroring and single parity to tolerate device failures, but recent studies suggest that these techniques will not be sufficient going forward. Alternatives include advanced erasure codes, whose relatively complicated structure may be exploited to improve system reliability and performance and to reduce power consumption.

In the first part of this proposal, we describe our study of the reliability of erasure codes. We have developed a generalized framework for evaluating the reliability of an arbitrary erasure code when it is instantiated in a system. While studying erasure-code reliability, we found that many traditional modeling techniques do not extend well to systems that tolerate multiple disk failures, to irregular codes, or to latent sector faults. Our framework overcomes these obstacles and allows apples-to-apples comparison across any class of erasure code. We extended the framework to study the reliability of erasure-coded fragment placement in a system with heterogeneous devices, designed a metric that orders placements by reliability, and used that metric to find near-optimal placements.

While studying storage reliability, we found that the structure of erasure codes may be exploited to save power and to reduce the impact of errors in flash memory. The second part of this proposal addresses our initial work in these areas. First, we explore reliability and power savings in an erasure-coded archival storage system. In addition to exploring tradeoffs between reliability and power consumption, we have designed algorithms that avoid disk activation by exploiting the structure of the underlying erasure code. Finally, solid-state flash memory is assumed to have very low whole-device failure rates but relatively high raw bit-error rates. We exploit this disparity by developing mechanisms, along with an analytical evaluation of them, for ensuring reliability in flash devices.
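To illustrate the general idea of avoiding disk activation through code structure, here is a minimal sketch assuming the simplest possible code: a single-parity (XOR) stripe. It is not the algorithm from this proposal, which targets more complex codes. If the disk holding a requested block is spun down but every other disk in its stripe is active, the block can be rebuilt by XORing the active blocks instead of spinning up the target. The function names `xor_blocks`, `encode_stripe`, and `read_block` are hypothetical and used only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block (single parity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_block(stripe, target, powered_on):
    """Read stripe[target] without activating a powered-down disk, if possible.

    Hypothetical helper: stripe[i] is assumed to live on disk i, and
    powered_on is the set of disk indices that are currently spinning.
    Returns (block, disks_touched).
    """
    if target in powered_on:
        # Normal read: the target disk is already active.
        return stripe[target], [target]
    others = [i for i in range(len(stripe)) if i != target]
    if all(i in powered_on for i in others):
        # Degraded-style read: rebuild the block from the remaining disks,
        # trading extra reads for not spinning up the target disk.
        return xor_blocks([stripe[i] for i in others]), others
    # With single parity, any second inactive disk forces an activation.
    raise RuntimeError("target disk must be spun up")


if __name__ == "__main__":
    stripe = encode_stripe([b"\x01\x02", b"\x10\x20", b"\x0f\x0f"])
    block, touched = read_block(stripe, 0, powered_on={1, 2, 3})
    print(block, touched)
```

The tradeoff is visible even in this toy case: reconstruction reads every other disk in the stripe, so it only saves power when those disks are already active. Codes with richer structure can offer more alternative reconstruction sets.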

When:
Wednesday, June 4, 2008 at 12:00 PM

Where:
E2-599

CRSS Contact:
Greenan, Kevin

Last modified 24 May 2019