Zoroaster Seminar talk
Evaluating design alternatives for storage systems, tuning the parameter values of existing systems, and assessing capacity and performance requirements when setting up systems for production use all require the ability to capture the essence of how a system is typically used. Collecting and disseminating real-time enterprise workloads is difficult from a logistical, security, and privacy standpoint. Most companies test their systems with small workloads that focus on a subset of system-specific characteristics, which gives them an incomplete picture of the system’s performance on different or larger workloads. Even when a trace is successfully collected and publicly shared, translating it from one architecture to another is typically done in an ad hoc manner that leaves room for misinterpretation, leading to costly over-provisioning and system projections. Obtaining synthetic workloads that adequately represent real-time workloads for tuning tasks is also difficult. Past frameworks for generating synthetic workloads have not efficiently captured the temporal distributions found in real-time enterprise traces, which directly affects placement decisions in systems. Additionally, existing tools do not provide a well-rounded performance evaluation of the system. To address these challenges, we propose Zoroaster, a self-improving synthetic systems workload generator that can produce workloads of customizable scale and hybrid types, given a set of system characteristics. Zoroaster will use Generative Adversarial Networks (GANs) to create complex synthetic workloads that dynamically and accurately map to the characteristics of the workload class that each model is built to emulate. A GAN is an adversarial framework consisting of a pair of deep neural networks: a generative model and a discriminative model. Our system will use real-time enterprise workloads to train the discriminative model.
A randomly initialized generative model is then pitted against the pre-trained discriminative model, which learns to determine whether a sample comes from the model distribution or the data distribution. The generative model will then output synthetic workloads with characteristics similar to those of the real-time workloads. Zoroaster will be considered successful if the generated workloads are statistically indistinguishable from real trace data at scale.
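One way to make "statistically indistinguishable" concrete is a two-sample test on a single trace feature. The abstract does not name a test or a feature, so the sketch below is only an illustration under assumptions: it uses a two-sample Kolmogorov-Smirnov test on request inter-arrival times, and the function names and the exponential toy data are hypothetical stand-ins, not part of Zoroaster.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value of the two-sample KS test."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def looks_indistinguishable(real, synthetic, alpha=0.05):
    """True if the KS test does not reject 'same distribution'."""
    d = ks_statistic(real, synthetic)
    return d <= ks_critical_value(len(real), len(synthetic), alpha)


# Toy data (assumption): exponential inter-arrival times stand in
# for a real trace and for a poorly matched synthetic workload.
rng = random.Random(0)
real_gaps = [rng.expovariate(1.0) for _ in range(2000)]
bad_synthetic_gaps = [rng.expovariate(3.0) for _ in range(2000)]

print("KS statistic:", ks_statistic(real_gaps, bad_synthetic_gaps))
print("Indistinguishable:", looks_indistinguishable(real_gaps, bad_synthetic_gaps))
```

Passing such a test on one feature is a necessary check rather than proof of fidelity: a full evaluation would presumably compare several features jointly (sizes, offsets, temporal correlations), and a failure to reject does not by itself establish that two distributions match.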
Monday, February 25, 2019 at 12:15 PM
Last modified 29 May 2019