ASCAR: Automating Contention Management for High-Performance Storage Systems

Appeared in MSST'15. Publication: June 4th 2015

Abstract

High-performance parallel storage systems, such as those used by supercomputers and data centers, can suffer from performance degradation when a large number of clients contend for limited resources such as bandwidth. This contention lowers the efficiency of the system and causes unwanted variance in speed. We present the Automatic Storage Contention Alleviation and Reduction system (Ascar), a storage traffic management system that improves bandwidth utilization and the fairness of resource allocation. Ascar regulates I/O traffic from the clients using a rule-based algorithm that controls the congestion window and request rates; it requires no runtime coordination among clients or with a central coordinator. A distributed rule-based system is fast-responding and scalable, but optimal rules are hard to design. We designed a SHAred-nothing Rule Producer (SHARP) that produces rules in an unsupervised manner by systematically exploring the space of possible rule designs and evaluating the target workload under each candidate rule set. Evaluation shows that our Ascar prototype improves the throughput of all tested workloads, some by as much as 35%. Ascar improves the throughput of a NASA NPB BTIO checkpoint workload by 33.5% while reducing its speed variance by 55.4%. By abandoning the time-consuming communication between clients that most existing traffic control solutions require, Ascar achieves high responsiveness and scalability; it can efficiently handle highly dynamic workloads such as bursty I/O. Because the optimization time and controller overhead are independent of the system's scale, Ascar has the potential to support millions of clients. As a pure client-side solution, Ascar requires no changes to either the hardware or the server software.

Publication date:
May 2015

Authors:
Yan Li
Xiaoyuan Lu
Ethan L. Miller
Darrell D. E. Long

Projects:
Scalable High-Performance QoS


BibTeX entry

@inproceedings{msst-f15,
  author       = {Yan Li and Xiaoyuan Lu and Ethan L. Miller and Darrell D. E. Long},
  title        = {{ASCAR}: Automating Contention Management for High-Performance Storage Systems},
  booktitle    = {MSST'15},
  month        = may,
  year         = {2015},
}
Last modified 28 May 2019