Archival Storage


We have several active and past projects in archival storage, all of which contribute to building more efficient, reliable, and secure long-term storage systems. In addition, we maintain a wiki page with links to resources on archival storage systems.

  • Archival Workload Studies: We have produced several detailed studies of archival storage user behavior and system evolution. Our studies provide relevant, up-to-date observations on archival system usage patterns to guide and validate future archival storage designs. Some of the key results we've found include weakening the oft-quoted "Write-Once, Read-Maybe" assumption and identifying that the vast majority of archival traffic comes from purely automated sources.
  • Improving Trace Analysis: Our experiences with analyzing long-term traces have highlighted shortcomings in current tracing and analysis techniques. We are using our experience to design new techniques and "best practices" to improve future traces and analyses, such as using traces and metadata snapshots together to better understand system state over time, and techniques for distinguishing between logger failures and full system crashes when activity rates appear unusually low.
  • Economic Modeling of Long-Term Storage: One of the most pressing current issues in archival storage is understanding what will influence the long-term total cost of operation (TCO) of storing data for decades or longer. Factors that influence the TCO include electricity, labor, and shifting media costs, as well as disasters. An insufficiently funded archive may safely store data for years, only to run out of funds at a critical juncture, such as a media cost spike like the one caused by the 2011 Thailand floods, or a slowdown in the growth of hard drive densities. Using a series of models and simulations, we aim to explore the factors that influence the long-term costs and survival of archives.
  • Secure and Searchable Long-Term Storage: As humanity generates ever-increasing amounts of data that must be stored for decades, we must both protect the data from disclosure and allow users to find information. Since long-term storage can potentially be compromised by a single site or person, we distribute data across multiple archive sites, using techniques derived from POTSHARDS. We are investigating techniques that then allow this data to be searched without revealing search terms, or even significant correlations between documents, to archive managers, providing the level of privacy necessary for long-term storage of medical records, sensitive corporate and government data, and personal information such as video and photos.
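The interplay between endowment size, declining media costs, and rare cost spikes described above can be explored with a simple Monte Carlo sketch. All parameter names and values below are illustrative assumptions for exposition, not our actual simulator or its inputs:

```python
import random

def simulate_archive(endowment, years=50, trials=1000,
                     media_cost=100.0, kryder_rate=0.15,
                     opex=20.0, spike_prob=0.02, spike_factor=2.0):
    """Fraction of trials in which the endowment outlives the archive horizon.

    Each year the archive pays a fixed operating cost (power, labor) plus a
    media replacement cost that declines with density growth; with a small
    probability, a disaster doubles that year's media cost.
    """
    survived = 0
    for _ in range(trials):
        funds, cost = endowment, media_cost
        ok = True
        for _ in range(years):
            spike = spike_factor if random.random() < spike_prob else 1.0
            funds -= opex + cost * spike     # pay this year's bills
            cost *= (1.0 - kryder_rate)      # media gets cheaper over time
            if funds < 0:
                ok = False
                break
        survived += ok
    return survived / trials

print(simulate_archive(2000.0))  # survival probability between 0.0 and 1.0
```

Even this toy model shows the effect the project studies: a slower density growth rate (smaller `kryder_rate`) or a higher spike probability can push an otherwise adequate endowment into failure decades in.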


  • Archival Workload Studies: We have recently completed and published several studies of both private and public historical and scientific archives, and are looking towards analysis of a newer dataset obtained from the US Library of Congress.
  • Improving Trace Analysis: In this project we are in the midst of initial proof-of-concept simulations and analysis, creating artificial snapshots and workloads to better understand the strengths and limitations of our proposed techniques.
  • Economic Modeling of Long-Term Storage: We have completed a working discrete event simulator, and are exploring a variety of questions. For example, what is the impact of increased device lifetime in scenarios with low overall device density growth?
  • Secure and Searchable Long-Term Storage: We are working towards publishing our initial work on Percival: a framework that leverages pre-indexing, keyed hashing, and Bloom filters to enable blinded searching, preventing the archive from learning which terms are being queried.
  • Past Projects: The following are projects we have worked on in the past:
    Logan: A management system to scalably grow, maintain, and evolve a heterogeneous archival storage system
    Computation-Storage Trade-off: Using provenance to reduce storage overhead by storing intermediate and initial inputs and recomputing a dataset on demand
    Pergamum: long-term evolvable storage built from intelligent network-attached bricks with both disk and NVRAM such as flash.
    Deep Store: building more efficient archival storage using deduplication to take advantage of intra-file and inter-file redundancy.
    POTSHARDS: long-term secure storage, which allows the secure preservation of data for decades without relying upon traditional encryption to prevent information leakage.
  • Publications
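As a rough illustration of the blinded-search idea behind Percival, the sketch below combines keyed hashing (HMAC) with a per-document Bloom filter: the client tags each term with a secret key at ingest and query time, so the archive only ever matches opaque tags. The function names, filter size, and probe count are hypothetical, not Percival's actual design:

```python
import hashlib
import hmac

M, K = 1024, 3  # Bloom filter size in bits, and number of hash probes per term

def _probes(tag: bytes):
    # Derive K bit positions in the filter from the keyed tag of a term.
    for i in range(K):
        h = hashlib.sha256(bytes([i]) + tag).digest()
        yield int.from_bytes(h[:4], "big") % M

def keyed_tag(key: bytes, term: str) -> bytes:
    # The client HMACs each term; without the key, tags reveal nothing.
    return hmac.new(key, term.encode(), hashlib.sha256).digest()

def build_index(key: bytes, terms):
    # One Bloom filter per document, built from keyed tags before upload.
    bits = [False] * M
    for term in terms:
        for p in _probes(keyed_tag(key, term)):
            bits[p] = True
    return bits

def blind_query(index, tag: bytes) -> bool:
    # The archive tests a tag against the filter without learning the term.
    # Bloom filters may yield rare false positives, but never false negatives.
    return all(index[p] for p in _probes(tag))

key = b"client-secret"
idx = build_index(key, ["medical", "record", "2013"])
print(blind_query(idx, keyed_tag(key, "medical")))  # True
print(blind_query(idx, keyed_tag(key, "finance")))  # almost certainly False
```

The Bloom filter's false-positive rate is a deliberate design knob here: a small amount of over-matching further obscures which documents truly contain a queried term.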
    Last modified 7 Oct 2013