Introduction

High-performance parallel storage systems, such as those used by supercomputers and data centers, can suffer from performance degradation when a large number of clients contend for limited resources such as bandwidth. This contention lowers the efficiency of the system and causes unwanted speed variance. We present the Automatic Storage Contention Alleviation and Reduction system (ASCAR), a storage traffic management system that improves bandwidth utilization and the fairness of resource allocation. ASCAR regulates I/O traffic from the clients using a rule-based algorithm that controls the congestion window and request rates; it requires no runtime coordination between clients or with a central coordinator. A distributed rule-based system is fast-responding and scalable, but optimal rules are hard to design. We therefore designed the SHAred-nothing Rule Producer (SHARP), which produces rules in an unsupervised manner by systematically exploring the solution space of possible rule designs and evaluating the target workload under the candidate rule sets. Our evaluation shows that the ASCAR prototype can improve the throughput of all tested workloads, some by as much as 35%. ASCAR improves the throughput of a NASA NPB BTIO checkpoint workload by 33.5% and reduces its speed variance by 55.4% at the same time. By abandoning the time-consuming communication between control clients that most existing traffic control solutions need, ASCAR achieves high responsiveness and scalability; it can efficiently handle highly dynamic workloads, such as burst I/O. The optimization time and controller overhead are unrelated to the scale of the system; thus, ASCAR has the potential to support millions of clients. As a pure client-side solution, ASCAR needs no change to either the hardware or the server software.
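To make the idea of client-side, rule-based traffic control more concrete, here is a minimal Python sketch. It is not ASCAR's actual rule format or code: the `Rule` fields, the use of observed latency as the only input, and all thresholds and limits are assumptions chosen for illustration. What it is meant to show is that each client picks an action from a local rule table using only its own measurements, so no coordination with other clients or a central coordinator is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: applies when latency_ms falls in [lat_lo, lat_hi)."""
    lat_lo: float
    lat_hi: float
    cwnd_scale: float  # multiplicative change to the congestion window
    cwnd_step: int     # additive change to the congestion window
    rate_limit: float  # new cap on requests per second


@dataclass
class ClientState:
    """Per-client control state; the defaults are arbitrary."""
    cwnd: int = 8
    rate_limit: float = 1000.0


def apply_rules(state, latency_ms, rules, min_cwnd=1, max_cwnd=256):
    """Apply the first matching rule to the client's local state."""
    for rule in rules:
        if rule.lat_lo <= latency_ms < rule.lat_hi:
            new_cwnd = int(state.cwnd * rule.cwnd_scale) + rule.cwnd_step
            # Keep the window inside the assumed bounds.
            state.cwnd = max(min_cwnd, min(max_cwnd, new_cwnd))
            state.rate_limit = rule.rate_limit
            break
    return state


# Illustrative rule table (made-up numbers, not generated by SHARP).
RULES = [
    Rule(0.0, 5.0, 1.0, 1, 2000.0),           # low latency: grow the window
    Rule(5.0, 20.0, 1.0, 0, 1000.0),          # moderate latency: hold
    Rule(20.0, float("inf"), 0.5, 0, 500.0),  # high latency: back off
]

state = ClientState()
for observed_latency_ms in (2.0, 50.0, 10.0):
    apply_rules(state, observed_latency_ms, RULES)
    print(observed_latency_ms, state.cwnd, state.rate_limit)
```

Because the decision depends only on local state, the per-request cost of such a controller does not grow with the number of clients, which matches the scalability argument made above.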

[image goes here]

A random write workload running with and without ASCAR. ASCAR increases average throughput and reduces speed variation.
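The rule search described in the introduction (exploring candidate rule sets and evaluating the target workload under each) can be pictured with the following Python sketch. It is a simplified stand-in, not SHARP's algorithm: it uses an exhaustive grid search, the parameter names `cwnd` and `rate_limit` are hypothetical, `toy_evaluate` replaces a real benchmark run with a made-up formula, and ranking candidates by mean throughput with lower standard deviation as the tie-breaker is an assumed scoring choice.

```python
import itertools
import statistics


def search_rules(param_grid, evaluate):
    """Evaluate every parameter combination; return (params, mean, stdev).

    evaluate(params) must return a non-empty list of throughput samples.
    """
    names = sorted(param_grid)
    best = None
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        samples = evaluate(params)
        mean = statistics.fmean(samples)
        spread = statistics.pstdev(samples)
        # Assumed score: higher mean first, then lower variation.
        key = (mean, -spread)
        if best is None or key > best[0]:
            best = (key, params, mean, spread)
    if best is None:
        raise ValueError("param_grid produced no candidates")
    _, params, mean, spread = best
    return params, mean, spread


def toy_evaluate(params):
    """Made-up workload model; a real search would run the target workload."""
    cwnd = params["cwnd"]
    rate = params["rate_limit"]
    base = 100.0 - (cwnd - 16) ** 2 / 4.0 - abs(rate - 1000.0) / 50.0
    jitter = cwnd / 4.0
    return [base - jitter, base, base + jitter]


GRID = {"cwnd": [4, 16, 64], "rate_limit": [500.0, 1000.0, 2000.0]}
best_params, best_mean, best_stdev = search_rules(GRID, toy_evaluate)
print(best_params, best_mean, best_stdev)
```

Each candidate is scored independently of the others, so the evaluations could in principle be run in parallel on separate test setups; the sketch runs them sequentially for simplicity.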

People

Faculty

  • Darrell D. E. Long
  • Ethan L. Miller

Students

  • Yan Li
  • Xiaoyuan Lu

Associates

  • Ahmed Amer
  • Thomas M. Kroeger

Publications

  • Yan Li, Xiaoyuan Lu, Ethan L. Miller, Darrell D. E. Long, "ASCAR: Automating Contention Management for High-Performance Storage Systems," 31st International Conference on Massive Storage Systems and Technologies (MSST2015), June 2015.

Talks

  • Yan Li, ASCAR: Increasing Performance Through Automated Contention Management (slides) at Lustre Developer Day and User Group 2016, April 2016, Portland, OR, USA

Releases

ASCAR is an ongoing project that experiments with many ideas and algorithms, focusing on using machine learning methods to improve storage performance. We are working on several prototypes that handle different storage systems. We release the source code of our prototypes here for evaluation and research purposes, in the spirit of facilitating scientific research and collaboration.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OF THE UNIVERSITY OF CALIFORNIA BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

ASCAR for Lustre 2.4.0

This ASCAR prototype works with Lustre version 2.4.0. We have tested it on Red Hat Enterprise Linux / CentOS x86-64 Release 6.4 (and it should work on later 6.* releases too). It consists of the following components:

  • The client ASCAR controller works within the Lustre file system client. The source code contains the kernel module and related test cases. The full source code tree can be found here: https://github.com/mlogic/ascar-lustre-2.4-client. It can be directly compiled as a working Lustre client kernel module. You can also read the diff against the original Lustre 2.4.0 if you only want to read the code that we have added or changed for ASCAR. Because Lustre 2.4.0 was released under the GPLv2 license, this part of ASCAR (and only this part) is also licensed under GPLv2.
  • The source code of SHARP, the rule generator, can be found here: https://github.com/mlogic/ascar-lustre-sharp. This part of ASCAR is licensed under the 3-clause BSD license.

Please note that we have only tested this prototype with Lustre 2.4.0. In principle, it should work with later 2.4.* versions, possibly with minor tweaks. If you need to port it to other systems, or are already working on doing so, please feel free to contact us about collaborating.

ASCAR for Lustre 2.7.0

Under development.

Last modified 24 May 2019