ASCAR: Automating Contention Management for High-Performance Storage Systems

Appeared in 31st International Conference on Massive Storage Systems and Technologies (MSST2015).

Abstract

High-performance parallel storage systems, such as those used by supercomputers and data centers, can suffer from performance degradation when a large number of clients contend for limited resources such as bandwidth. This contention lowers the efficiency of the system and causes unwanted speed variance. We present the Automatic Storage Contention Alleviation and Reduction system (ASCAR), a storage traffic management system for improving the bandwidth utilization and fairness of resource allocation. ASCAR regulates I/O traffic from the clients using a rule-based algorithm that controls the congestion window and rate limit. The rule-based client controllers respond quickly to burst I/O because no runtime coordination between clients or with a central coordinator is needed; they are also autonomous, so the system has no scale-out bottleneck. Finding optimal rules can be a challenging task that requires expertise and numerous experiments. ASCAR includes a SHAred-nothing Rule Producer (SHARP) that produces rules in an unsupervised manner by systematically exploring the solution space of possible rule designs and evaluating the target workload under the candidate rule sets. Evaluation shows that our ASCAR prototype can improve the throughput of all tested workloads – some by as much as 35%. ASCAR improves the throughput of a NASA NPB BTIO checkpoint workload by 33.5% and reduces its speed variance by 55.4% at the same time. The optimization time and controller overhead are unrelated to the scale of the system; thus, ASCAR has the potential to support future large-scale systems that can have millions of clients and thousands of servers. As a pure client-side solution, ASCAR needs no change to either the hardware or server software.

Publication date:
June 2015

Authors:
Yan Li
Xiaoyuan Lu
Ethan L. Miller
Darrell D. E. Long

Projects:
Scalable High-Performance QoS
Ultra-Large Scale Storage
Storage QoS

Available media

Full paper text: PDF

Bibtex entry

@inproceedings{li-msst15,
  author       = {Yan Li and Xiaoyuan Lu and Ethan L. Miller and Darrell D. E. Long},
  title        = {{ASCAR}: Automating Contention Management for High-Performance Storage Systems},
  booktitle    = {31st International Conference on Massive Storage Systems and Technologies (MSST2015)},
  month        = jun,
  year         = {2015},
}

Last modified 28 May 2019