Reliable Storage

The emergence of low-power archival, petabyte-scale storage systems and novel flash-based systems motivates tradeoffs between erasure encoding schemes. While new erasure codes are being developed within both the coding theory and storage systems community, no clear winner has emerged from the performance, reliability and space efficiency tradeoff space. Obviously, both the underlying storage system and application have a huge effect on how to encode data for fault tolerance. Thus, each scheme must be thoroughly analyzed for use in a particular system. We divide erasure codes into three classes: linear MDS codes, XOR-based array codes and XOR-based flat codes. Linear MDS codes, such as Reed-Solomon codes, exhibit optimal space-efficiency and flexible fault tolerance, but turn out to be computationally expensive in practice. Most array codes are space-optimal and less computationally expensive than Reed-Solomon, but are only able to sustain a fixed-maximum number of faults. Finally, XOR-based flat codes, such as Low-Density Parity Check (LDPC) codes, are not generally space-optimal, but tend to be computationally inexpensive, facilitate irregular fault tolerance and interesting localization properties.

First and foremost, we aim to provide the proper application of erasure codes to storage systems. For instance, in the Pergamum project, we found that the use of a Product code can effectively eliminate latent sector faults. Going forward we plan to explore how the structure and layout of certain erasure codes can be exploited to save power (through fragment recovery) in an archival system and how these choices affect reliability. Much of our research is also focused on the use of non-volatile memories, such as Flash, for reliable and power-efficient mass storage. Flash memory exhibits bit error rates that are much higher than disk, while device failure rates are much lower. Our aim is to exploit this disparity and provide robust reliability mechanisms in Flash without compromising performance. We are currently exploring reliable Flash-based storage systems with Network Appliance.

Second, given the inherent structural differences between the classes of codes, we developed a robust and general framework for analyzing the reliability of erasure codes codes in an apples-to-apples fashion. We found that specific assumptions made with respect to current modeling techniques (e.g. Markov models) lack flexibility, become very complicated as fault tolerance increases and may produce inaccurate reliability estimates beyond single disk fault tolerance. To this end, we have developed a generalized simulation framework. Given a storage system configuration and device failure characteristics (whole device and block-level), we are able to compare the reliability of array codes, flat codes and MDS codes for any given system configuration. In addition, previous analytic models assumed homogeneity among the devices; our framework can handle heterogeneous devices. We have recently performed a study that shows how code fragment layout affects reliability when a system contains devices with heterogeneous failure rates.

Status

In addition to our primary research projects, we have also developed large-stripe erasure codes called Disaster Recovery Codes and techniques for efficient Galois field multiplication. We are also currently working on ramp-like schemes for secure information dispersal. We have developed software packages for Galois field arithmetic, Reed-Solomon erasure codes and solving Markov models. We are currently in the process of releasing all three packages along with Python modules used to efficiently examine the recoverability of fragments generated by XOR-based codes in a power-managed system.

We have also recently begun a project in fault isolation. On a mixed-user, mixed-application system, long rebuild times after a data loss event are easier for some projects to absorb than others. With group-based data layout, local faults are constrained to a smaller number of working sets, potentially increasing effective system availability.

On the flash front, we have worked on designing a reliable flash aware file system that uses Reed-Solomon coding for error correction and an algebraic signature for error detection. Unlike other flash file systems that use a spare area to store a error correcting code, this file system uses data pages to store redundancies for better correction. Only the signature of a page is stored in the spare area so that it can quickly check the validity of a page.

We are currently interested in a design of system that uses byte-addressable non-volatile memory (NVRAM) as a replacement of DRAM. In this system, guaranteeing the reliability of the data in NVRAM is especially important, because corrupted data in memory will not disappear after a power cycling. We are thinking about designing a transparent layer for NVRAM that is in charge of error correction and data persistency management.

Publications

Last modified 26 Oct 2011