Computational Storage Device Simulation for Real-World Workloads

Abstract: Data movement between storage and compute represents a bottleneck in data-driven
applications. By executing compute kernels on the storage device instead of moving the data
through the memory hierarchy to the CPU cache, throughput can be increased and energy
consumption can be reduced. This application architecture can reduce the total cost of
ownership for data-driven workloads such as genomics and data analytics.

In this work, we present a computational storage device simulator which allows the user
application to offload many concurrent compute tasks to the device. This simulator provides a
platform which can be used to further develop the interface between user applications, device
drivers, and computational storage devices. Given the current lack of readily available hardware
designs, this simulator platform allows research in these areas to progress in parallel with
hardware development. This allows us to explore different models for using this type of hardware, including
different possible constraints in the NVMe specifications as well as multiple approaches to
offloading compute tasks to the device.

Our simulator is built on the QEMU device emulator under Linux. Running the Intel SPDK
userspace NVMe device driver atop the emulated QEMU device provides high-throughput
access to the PCIe bus, and the NVMe I/O queueing system allows thousands of
requests to be in flight simultaneously. This approach allows the user to take advantage of high
levels of parallelism inherent in many data-driven workloads. Our simulator design allows us to
further develop an application framework for compute kernel offload. This provides us an
opportunity to explore different interfaces and synchronization mechanisms available to connect
the user application to the device. Exploring these approaches to device interface design
will let us give application developers a straightforward way to port existing data-driven
applications to this computational storage interface, reducing the engineering effort
required to realize these performance and efficiency benefits. This system enables us to
evaluate the scalability of kernel
offloading techniques and the computational cost of synchronization between the host and the
storage accelerator.
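
As a rough illustration of the queueing model described above, the following Python sketch models a fixed-depth submission/completion queue pair and keeps it filled with compute-offload commands. It is a toy model under stated assumptions, not the simulator's code and not the SPDK API: `Command`, `SimulatedQueuePair`, `offload_all`, `count_gc`, and the `depth` and `device_budget` parameters are all hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Command:
    """Hypothetical offload command: run `kernel` over one data block."""
    cid: int
    kernel: Callable[[bytes], int]
    block: bytes


class SimulatedQueuePair:
    """Toy submission/completion queue pair (an assumption, not SPDK's API)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.submission = deque()
        self.completion = deque()

    def submit(self, cmd: Command) -> bool:
        # Refuse new work when the fixed-depth submission queue is full.
        if len(self.submission) >= self.depth:
            return False
        self.submission.append(cmd)
        return True

    def device_step(self, budget: int) -> None:
        # The "device" runs up to `budget` queued kernels next to the data.
        for _ in range(min(budget, len(self.submission))):
            cmd = self.submission.popleft()
            self.completion.append((cmd.cid, cmd.kernel(cmd.block)))

    def poll(self) -> list:
        # The host drains finished commands without blocking.
        done = list(self.completion)
        self.completion.clear()
        return done


def offload_all(blocks, kernel, depth=1024, device_budget=256):
    """Offload `kernel` over every block, keeping the queue as full as possible."""
    qp = SimulatedQueuePair(depth)
    results = {}
    next_cid = 0
    while len(results) < len(blocks):
        # Submit until the queue is full or no blocks remain.
        while next_cid < len(blocks) and qp.submit(
            Command(next_cid, kernel, blocks[next_cid])
        ):
            next_cid += 1
        qp.device_step(device_budget)
        # Completions are matched back to their blocks by command id.
        for cid, value in qp.poll():
            results[cid] = value
    return [results[i] for i in range(len(blocks))]


def count_gc(block: bytes) -> int:
    # Example kernel: count G and C bases in a block of sequence data.
    return block.count(b"G") + block.count(b"C")
```

For example, `offload_all([b"GATTACA", b"CCGG"], count_gc, depth=8)` returns `[2, 4]`. Only the per-block results cross back to the host, which is the data-movement saving the abstract describes; a real device would also execute queued commands concurrently rather than in a single loop.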

**Access to the recording is reserved for CRSS members and Deep Dive guests.**

Please contact Cynthia McCarley if you are a CRSS member or Deep Dive guest who did not receive an email with the password to the recording.

Wednesday, February 24, 2021 at 3:00 PM

Zoom (Link available by invitation)

Material from the event

SSRC Contact:
McCarley, Cynthia

Last modified 2 Jul 2021