Reliable Storage
Description
The emergence of low-power, petabyte-scale archival storage systems
and novel flash-based systems motivates a careful look at the
tradeoffs between erasure coding schemes. While new erasure codes are
being developed within both the coding theory and storage systems
communities, no clear winner has emerged in the tradeoff space of
performance, reliability and space efficiency. Both the underlying
storage system and the application strongly affect how data should be
encoded for fault tolerance, so each scheme must be thoroughly
analyzed for use in a particular system. We divide erasure codes into
three classes: linear MDS codes, XOR-based array codes and XOR-based
flat codes. Linear MDS codes, such as Reed-Solomon codes, exhibit
optimal space efficiency and flexible fault tolerance, but turn out to
be computationally expensive in practice. Most array codes are
space-optimal and less computationally expensive than Reed-Solomon
codes, but can only sustain a fixed maximum number of faults.
Finally, XOR-based flat codes, such as Low-Density Parity Check (LDPC)
codes, are not generally space-optimal, but tend to be computationally
inexpensive, allow irregular fault tolerance and offer interesting
localization properties.
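To make "irregular fault tolerance" concrete, here is a minimal sketch of a toy XOR-based flat code, assuming a hypothetical layout of four data fragments and three parity fragments, each parity being the XOR of a chosen subset of data fragments. A failure pattern is recoverable exactly when the surviving fragments have full rank over GF(2), which the sketch checks by Gaussian elimination on bitmasks. The layout and all names (`FRAGMENTS`, `recoverable`, `tolerance_profile`) are illustrative only and are not taken from the project's software.

```python
from itertools import combinations

# Hypothetical toy XOR-based flat code: 4 data fragments and 3 parity fragments.
# Each fragment is a bitmask over the data symbols it depends on (bit i -> d_i).
K = 4
DATA = [1 << i for i in range(K)]   # d0..d3 (fragment indices 0..3)
PARITY = [
    0b0011,  # p0 = d0 ^ d1        (fragment index 4)
    0b0110,  # p1 = d1 ^ d2        (fragment index 5)
    0b1101,  # p2 = d0 ^ d2 ^ d3   (fragment index 6)
]
FRAGMENTS = DATA + PARITY


def gf2_rank(masks):
    """Rank over GF(2) of a list of bitmasks, via XOR elimination on leading bits."""
    pivots = {}  # leading bit position -> basis vector with that leading bit
    for m in masks:
        while m:
            top = m.bit_length() - 1
            if top not in pivots:
                pivots[top] = m
                break
            m ^= pivots[top]  # clears bit `top`, so the loop always makes progress
    return len(pivots)


def recoverable(failed):
    """True if the surviving fragments still determine all K data symbols."""
    survivors = [f for i, f in enumerate(FRAGMENTS) if i not in failed]
    return gf2_rank(survivors) == K


def tolerance_profile(max_failures=3):
    """Map failure count -> (recoverable patterns, total patterns)."""
    n = len(FRAGMENTS)
    profile = {}
    for r in range(1, max_failures + 1):
        patterns = list(combinations(range(n), r))
        ok = sum(recoverable(set(p)) for p in patterns)
        profile[r] = (ok, len(patterns))
    return profile


if __name__ == "__main__":
    for failures, (ok, total) in tolerance_profile().items():
        print(f"{failures} failed fragment(s): {ok}/{total} patterns recoverable")
```

In this toy layout every single failure is recoverable, and every double failure is too except losing d3 together with p2, the only parity covering d3; only some triple failures survive. A Reed-Solomon code with the same four data and three parity fragments would tolerate every pattern of up to three failures.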
First and foremost, we aim to apply erasure codes properly to storage
systems. For instance, in the Pergamum project, we found that the use
of a Product code can effectively eliminate latent sector faults.
Going forward, we plan to explore how the structure and layout of
certain erasure codes can be exploited to save power (through fragment
recovery) in an archival system, and how these choices affect
reliability. Much of our research also focuses on the use of
non-volatile memories, such as Flash, for reliable and power-efficient
mass storage. Flash memory exhibits bit error rates much higher than
those of disk, while its device failure rates are much lower. Our aim
is to exploit this disparity and provide robust reliability mechanisms
in Flash without compromising performance. We are currently exploring
reliable Flash-based storage systems with Network Appliance.
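Why a Product code helps against latent sector faults can be shown with a minimal sketch of a two-dimensional XOR product code. It assumes a hypothetical layout in which each column is one device, each row is a stripe protected by an XOR parity device, and each device reserves one sector for XOR parity over its own sectors; this is an illustration, not the Pergamum implementation. Decoding repeatedly repairs any row or column with exactly one missing sector. With only the parity device, a whole-device failure plus a latent sector error in the same stripe would lose that stripe; the per-device parity repairs the latent error first, after which row parity rebuilds the failed device.

```python
from functools import reduce
from operator import xor

# Hypothetical layout: column c = device c, row r = stripe r (one sector per device).
# The extra column is a parity device (XOR across devices); the extra row is
# per-device parity (XOR of that device's sectors), i.e. a 2D product code.


def encode(data):
    """Append a row-parity column and a column-parity row to a grid of sector values."""
    rows = [row + [reduce(xor, row)] for row in data]
    parity_row = [reduce(xor, col) for col in zip(*rows)]
    return rows + [parity_row]


def decode(grid):
    """Fill erased sectors (None) by peeling: any row or column with exactly one
    erasure is repaired as the XOR of its other cells (every full line XORs to 0)."""
    grid = [row[:] for row in grid]
    n_rows, n_cols = len(grid), len(grid[0])
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    progress = True
    while progress:
        progress = False
        for line in lines:
            missing = [(r, c) for r, c in line if grid[r][c] is None]
            if len(missing) == 1:
                r0, c0 = missing[0]
                grid[r0][c0] = reduce(
                    xor, (grid[r][c] for r, c in line if (r, c) != (r0, c0)), 0)
                progress = True
    return grid  # cells that could not be recovered remain None


# Example: 3 stripes across 3 data devices (illustrative values).
stored = encode([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
damaged = [row[:] for row in stored]
for row in damaged:
    row[0] = None          # device 0 fails entirely
damaged[1][2] = None       # latent sector error: stripe 1 now has two erasures
assert decode(damaged) == stored
```

The peeling decoder is the simplest choice for a sketch; it cannot recover patterns such as a 2x2 square of erasures, where every affected row and column has two missing sectors.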
Second, given the inherent structural differences between the classes
of codes, we developed a robust and general framework for analyzing
the reliability of erasure codes in an apples-to-apples fashion. We
found that current modeling techniques (e.g., Markov models) rely on
specific assumptions that lack flexibility: the models become very
complicated as fault tolerance increases and may produce inaccurate
reliability estimates beyond single-disk fault tolerance. To this
end, we have developed a generalized simulation framework. Given a
storage system configuration and device failure characteristics (both
whole-device and block-level), we can compare the reliability of
array codes, flat codes and MDS codes for any given system
configuration. In addition, whereas previous analytic models assumed
homogeneous devices, our framework can handle heterogeneous devices.
We have recently performed a study showing how code fragment layout
affects reliability when a system contains devices with heterogeneous
failure rates.
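The simulation approach can be illustrated with a minimal Monte Carlo sketch. This is not the framework described above: it assumes exponentially distributed device lifetimes with a per-device failure rate (so devices may be heterogeneous), a fixed repair time, and an MDS-style code that loses data only when more than `tolerance` devices are down at once. It ignores block-level faults and fragment layout, and every function name and parameter is hypothetical.

```python
import random

# Assumptions (hypothetical model): exponential time to failure with a
# per-device rate (failures per hour), a fixed repair time, and a code that
# loses data only when more than `tolerance` devices are down at once.


def device_outages(rate, repair_hours, mission_hours, rng):
    """Down intervals (start, end) for one device over the mission time."""
    t, outages = 0.0, []
    while True:
        t += rng.expovariate(rate)       # time until the next failure
        if t >= mission_hours:
            return outages
        outages.append((t, t + repair_hours))
        t += repair_hours                # device is replaced after the repair


def data_loss(rates, tolerance, repair_hours, mission_hours, rng):
    """True if more than `tolerance` devices are ever down simultaneously."""
    events = []
    for rate in rates:
        for start, end in device_outages(rate, repair_hours, mission_hours, rng):
            events.append((start, +1))
            events.append((end, -1))
    events.sort()  # at equal times, -1 (repair) sorts before +1 (failure)
    down = 0
    for _, delta in events:
        down += delta
        if down > tolerance:
            return True
    return False


def estimate_loss_probability(rates, tolerance, repair_hours, mission_hours,
                              trials=10_000, seed=1):
    """Fraction of simulated missions that end in data loss."""
    rng = random.Random(seed)
    losses = sum(data_loss(rates, tolerance, repair_hours, mission_hours, rng)
                 for _ in range(trials))
    return losses / trials


if __name__ == "__main__":
    # Six devices with a 50,000-hour mean lifetime plus two weaker devices (illustrative rates).
    rates = [1 / 50_000] * 6 + [1 / 5_000] * 2
    p = estimate_loss_probability(rates, tolerance=1, repair_hours=72,
                                  mission_hours=5 * 8760)
    print(f"estimated probability of data loss over 5 years: {p:.4f}")
```

Because the loss test only counts failed devices, this sketch cannot show layout effects; that would require tracking which fragments each device stores and checking whether the surviving fragments can still be decoded.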
Status
Past research focused on reliability mechanisms and their analysis in
very large storage systems. Most of this research was carried out
within the object-based storage project and influenced some of the
reliability mechanisms in the Ceph distributed file system.
In addition to our primary research projects, we have also developed
large-stripe erasure codes called Disaster Recovery Codes and
techniques for efficient Galois field multiplication. We are also
currently working on ramp-like schemes for secure information
dispersal. We have developed software packages for Galois field
arithmetic, Reed-Solomon erasure codes and solving Markov models. We
are currently in the process of releasing all three packages along
with Python modules used to efficiently examine the recoverability of
fragments generated by XOR-based codes in a power-managed system.
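Reed-Solomon encoding and decoding spend a large share of their time multiplying bytes in a Galois field, which is one reason they cost more than XOR-only codes. A common technique for making that multiplication cheap is log/antilog table lookup, sketched below for GF(2^8). The field polynomial 0x11D and generator 2 are assumed, commonly used parameters, and the function names are illustrative; this is not the project's released package.

```python
# Log/antilog-table multiplication in GF(2^8). The field polynomial 0x11D
# (x^8 + x^4 + x^3 + x^2 + 1) and generator 2 are assumed, commonly used choices.
PRIM_POLY = 0x11D


def build_tables():
    """EXP[i] = 2**i in the field; LOG is its inverse on nonzero elements."""
    exp = [0] * 512   # doubled so LOG[a] + LOG[b] (at most 508) needs no mod 255
    log = [0] * 256   # LOG[0] is unused: zero has no logarithm
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1                      # multiply by the generator 2
        if x & 0x100:
            x ^= PRIM_POLY           # reduce modulo the field polynomial
    for i in range(255, 512):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Table-driven multiply: two log lookups, an integer add, one antilog lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_mul_slow(a, b):
    """Bit-serial reference: carry-less multiply with reduction at each shift."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


if __name__ == "__main__":
    # Exhaustively check the table method against the reference multiply.
    assert all(gf_mul(a, b) == gf_mul_slow(a, b) for a in range(256) for b in range(256))
```

The tables cost well under a kilobyte and replace the bit-serial loop with a few lookups; a full 256x256 product table avoids the addition at the cost of 64 KB, so the right choice depends on cache size and processor architecture.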
Publications
2009
-
Jehan-François Pâris,
Ahmed Amer,
Using Shared Parity Disks to Improve the Reliability of RAID Arrays,
Proceedings of the IEEE International Performance, Computing and Communications Conference (IPCCC),
December 2009.
-
Jehan-François Pâris,
Ahmed Amer,
Darrell D. E. Long,
Thomas Schwarz,
Evaluating the Impact of Irrecoverable Read Errors on Disk Array Reliability,
Proceedings of the IEEE 15th Pacific Rim International Symposium on Dependable Computing (PRDC09),
November 2009.
-
Yangwook Kang,
Ethan L. Miller,
Adding Aggressive Error Correction to a High-Performance Flash File System,
Proceedings of the 9th ACM/IEEE Conference on Embedded Software (EMSOFT '09),
October 2009.
-
Avani Wildani,
Thomas Schwarz,
Ethan L. Miller,
Darrell D. E. Long,
Protecting Against Rare Event Failures in Archival Systems,
Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2009),
September 2009.
-
Jehan-François Pâris,
Ahmed Amer,
Darrell D. E. Long,
Using Storage Class Memories to Increase the Reliability of Two-Dimensional RAID Arrays,
Proceedings of the 17th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2009),
September 2009.
-
Kevin Greenan,
Darrell D. E. Long,
Ethan L. Miller,
Thomas Schwarz,
Avani Wildani,
Building Flexible, Fault-Tolerant Flash-based Storage Systems,
Proceedings of the Fifth Workshop on Hot Topics in System Dependability (HotDep 2009),
June 2009.
-
Kanchi Gopinath,
Jon Elerath,
Darrell D. E. Long,
Reliability Modelling of Disk Subsystems with Probabilistic Model Checking,
Technical Report UCSC-SSRC-09-05,
May 2009.
-
Jehan-François Pâris,
Ahmed Amer,
Darrell D. E. Long,
Using storage class memories to increase the reliability of two-dimensional RAID arrays,
Technical Report UCSC-SSRC-09-04,
April 2009.
-
Avani Wildani,
Thomas Schwarz,
Ethan L. Miller,
Darrell D. E. Long,
Protecting Against Rare Event Failures in Archival Systems,
Technical Report UCSC-SSRC-09-03,
April 2009.
Preliminary version of a paper that appeared in MASCOTS 2009.
-
Rosie Wacha,
Data Reliability Techniques for Specialized Storage Environments,
Technical Report UCSC-SSRC-09-02,
March 2009.
2008
-
Ahmed Amer,
Jehan-François Pâris,
Darrell D. E. Long,
Thomas Schwarz,
Progressive Parity-Based Hardening of Data Stores,
Proceedings of the 27th International Performance of Computers and Communication Conference (IPCCC '08),
December 2008, pages 34-42.
-
Kevin Greenan,
Darrell D. E. Long,
Ethan L. Miller,
Thomas Schwarz,
Jay Wylie,
A Spin-Up Saved is Energy Earned: Achieving Power-Efficient, Erasure-Coded Storage,
Proceedings of the Fourth Workshop on Hot Topics in System Dependability (HotDep '08),
December 2008.
-
Ahmed Amer,
Darrell D. E. Long,
Jehan-François Pâris,
Thomas Schwarz,
Increased Reliability with SSPiRAL Data Layouts,
Proceedings of the 16th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008),
September 2008, pages 189-198.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Optimizing Galois Field Arithmetic for Diverse Processor Architectures,
Proceedings of the 16th Annual IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008),
September 2008.
-
Kevin Greenan,
Ethan L. Miller,
Jay Wylie,
Reliability of XOR-based erasure codes on heterogeneous devices,
Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2008),
June 2008, pages 147-156.
-
Casey Marshall,
Efficient and safe data backup with Arrow,
Technical Report UCSC-SSRC-08-02,
June 2008.
Masters project report.
-
Neerja Bhatnagar,
Kevin Greenan,
Rosie Wacha,
Ethan L. Miller,
Darrell D. E. Long,
Energy-Reliability Trade-offs in Sensor Networks,
Proceedings of the Fifth Workshop on Embedded Networked Sensors (HotEmNets 2008),
June 2008.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage,
Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08),
February 2008, pages 1-16.
2007
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Darrell D. E. Long,
Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes,
Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007), held in conjunction with the 14th ACM Conference on Computer and Communications Security (CCS 2007),
October 2007.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Analysis and Construction of Galois Fields for Efficient Storage Reliability,
Technical Report UCSC-SSRC-07-09,
August 2007.
Revised version published in MASCOTS 2008.
-
Kevin Greenan,
Ethan L. Miller,
PRIMS: Making NVRAM Suitable for Extremely Reliable Storage,
short paper in Proceedings of the 3rd Workshop on Hot Topics in System Dependability (HotDep '07),
June 2007.
-
Jehan-François Pâris,
Thomas Schwarz,
Darrell D. E. Long,
Self-Adaptive Two-Dimensional RAID Arrays,
Proceedings of the International Performance Conference on Computers and Communication (IPCCC '07),
April 2007.
2006
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Long-Term Threats to Secure Archives,
Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006),
October 2006.
-
Jehan-François Pâris,
Darrell D. E. Long,
Using Device Diversity to Protect Data against Batch-Correlated Disk Failures,
Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006),
October 2006.
-
Kevin Greenan,
Ethan L. Miller,
Reliability Mechanisms for File Systems Using Non-Volatile Memory as a Metadata Store,
Proceedings of the 6th ACM & IEEE Conference on Embedded Software (EMSOFT '06),
October 2006, pages 178-187.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS: Secure Long-Term Archival Storage Without Encryption,
Technical Report UCSC-SSRC-06-03, Storage Systems Research Center, University of California, Santa Cruz,
September 2006.
Later version published in USENIX 2007.
-
Thomas Schwarz,
Ethan L. Miller,
Store, forget, and check: Using algebraic signatures to check remotely administered storage,
Proceedings of the IEEE Int'l Conference on Distributed Computing Systems (ICDCS '06),
July 2006.
2004
-
Thomas Schwarz,
Qin Xin,
Ethan L. Miller,
Darrell D. E. Long,
Andy Hospodor,
Spencer Ng,
Disk Scrubbing in Large Archival Storage Systems,
Proceedings of the 12th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '04),
October 2004, pages 409-418.
Won Best Paper award.
-
Qin Xin,
Ethan L. Miller,
Thomas Schwarz,
Evaluation of Distributed Recovery in Large-Scale Storage Systems,
Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC 2004),
June 2004, pages 172-181.
Last modified 27 Oct 2009