Usage Behavior of a Large-Scale Scientific Archive

Appeared in Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC12).

Abstract

Archival storage systems for scientific data have been growing in both size and relevance over the past two decades, yet researchers and system designers alike must rely on limited and obsolete knowledge to guide archival management and design. To address this issue, we analyzed three years of file-level activities from the NCAR mass storage system, providing valuable insight into a large-scale scientific archive with over 1600 users, tens of millions of files, and petabytes of data. Our examination of system usage showed that, while a subset of users was responsible for most of the activity, this activity was widely distributed at the file level. We also show that the physical grouping of files and directories on media can improve archival storage system performance. Based on our observations, we provide suggestions and guidance for both future scientific archival system designs and improved tracing of archival activity.

Publication date:
November 2012

Authors:
Ian Adams
Brian Madden
Joel Frank
Mark W. Storer
Ethan L. Miller
Gene Harano

Projects:
Archival Storage
Tracing and Benchmarking

BibTeX entry

@inproceedings{adams12-sc,
  author       = {Ian Adams and Brian Madden and Joel Frank and Mark W. Storer and Ethan L. Miller and Gene Harano},
  title        = {Usage Behavior of a Large-Scale Scientific Archive},
  booktitle    = {Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis (SC12)},
  month        = nov,
  year         = {2012},
}
Last modified 19 Sep 2013