Impact of Failure on Interconnection Networks in Large Storage Systems

Appeared in Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies.

Abstract

Recent advances in large-capacity, low-cost storage devices have led to active research in the design of large-scale storage systems built from commodity devices for supercomputing applications. Such storage systems, composed of thousands of storage devices, are required to provide high system bandwidth and petabyte-scale data storage. A robust network interconnection is essential to achieve high bandwidth, low latency, and reliable delivery during data transfers. However, failures, such as temporary link outages and node crashes, are inevitable. We discuss the impact of potential failures on network interconnections in very large-scale storage systems and analyze the trade-offs among several storage network topologies through simulation. Our results suggest that a good interconnect topology is essential to the fault tolerance of a petabyte-scale storage system.

Publication date:
April 2005

Authors:
Qin Xin
Ethan L. Miller
Thomas Schwarz
Darrell D. E. Long

Projects:
Ultra-Large Scale Storage

Available media

Full paper text: PDF

Bibtex entry

@inproceedings{xin-msst05,
  author       = {Qin Xin and Ethan L. Miller and Thomas Schwarz and Darrell D. E. Long},
  title        = {Impact of Failure on Interconnection Networks in Large Storage Systems},
  booktitle    = {Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies},
  month        = apr,
  year         = {2005},
}