File System Workload Analysis For Large Scientific Computing Applications

Appeared in NASA/IEEE Conference on Mass Storage Systems and Technologies (MSST 2004).


Parallel scientific applications require high-performance I/O support from underlying file systems. A comprehensive understanding of the expected workload is therefore essential for the design of high-performance parallel file systems. We re-examine the workload characteristics in parallel computing environments in the light of recent technology advances and new applications.

We analyze application traces from a cluster with hundreds of nodes. On average, each application has only one or two typical request sizes. Large requests from several hundred kilobytes to several megabytes are very common. Although small requests account for more than 90% of all requests in some applications, almost all of the I/O data are transferred by large requests. All of these applications show bursty access patterns. More than 65% of write requests have inter-arrival times within one millisecond in most applications. By running the same benchmark on different file models, we also find that the write throughput of using an individual output file for each node exceeds that of using a shared file for all nodes by a factor of 5. This indicates that current file systems are not well optimized for file sharing.
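The two headline statistics above (share of requests that are small versus share of bytes moved by large requests, and the fraction of requests arriving within one millisecond of the previous one) can be computed from a trace roughly as follows. This is only an illustrative sketch, not the authors' tooling: the function names, the 100 KiB cutoff between "small" and "large", and the toy trace values are all assumptions made for the example.

```python
# Illustrative sketch only; names, threshold, and data are hypothetical.

def summarize_requests(sizes, large_threshold=100 * 1024):
    """Return (fraction of requests that are small,
               fraction of bytes carried by large requests).

    `sizes` is a list of request sizes in bytes. The 100 KiB
    threshold is an assumed cutoff, not a value from the paper.
    """
    if not sizes:
        raise ValueError("need at least one request")
    total_bytes = sum(sizes)
    small_count = sum(1 for s in sizes if s < large_threshold)
    large_bytes = sum(s for s in sizes if s >= large_threshold)
    return small_count / len(sizes), large_bytes / total_bytes


def fraction_within(timestamps_us, window_us=1000):
    """Fraction of inter-arrival gaps that are at most `window_us`.

    `timestamps_us` holds request arrival times in integer
    microseconds; 1000 us corresponds to the one-millisecond window.
    """
    if len(timestamps_us) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps_us)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(1 for g in gaps if g <= window_us) / len(gaps)


# Toy trace: 95 small (512 B) requests and 5 large (1 MiB) requests.
toy_sizes = [512] * 95 + [1024 * 1024] * 5
small_share, large_byte_share = summarize_requests(toy_sizes)

# Toy arrival times in microseconds: one burst, a pause, then another burst.
toy_arrivals = [0, 200, 400, 5000, 5300]
bursty_share = fraction_within(toy_arrivals)
```

In the toy trace, 95% of the requests are small, yet the five large requests carry over 99% of the bytes, which mirrors the shape of the finding reported in the abstract without reproducing its data.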

Publication date:
April 2004

Feng Wang
Qin Xin
Bo Hong
Scott A. Brandt
Ethan L. Miller
Darrell D. E. Long
Tyce T. McLarty

Tracing and Benchmarking
Ultra-Large Scale Storage

Available media

Full paper text: PDF

Bibtex entry

  author       = {Feng Wang and Qin Xin and Bo Hong and Scott A. Brandt and Ethan L. Miller and Darrell D. E. Long and Tyce T. McLarty},
  title        = {File System Workload Analysis For Large Scientific Computing Applications},
  booktitle    = {NASA/IEEE Conference on Mass Storage Systems and Technologies (MSST 2004)},
  pages        = {139--152},
  month        = apr,
  year         = {2004},
Last modified 5 Aug 2020