Provenance Pruning for Local Storage Systems Based on the PASS Model

Provenance is metadata that records the ancestry, or lineage, of a data item. It is useful in many real-world applications, such as experimental documentation, debugging, security, and search. As storage capacity grows, storage systems hold more and more files, and the volume of provenance grows correspondingly. Evidence from physics and astronomy data shows that provenance information can become many times larger than the raw data, in some cases more than 10 times larger. We propose pruning redundant or unnecessary provenance to reduce storage overhead. Our provenance pruning method is based on the PASS model. We will use the storage relationships between files to factor out large common subtrees from the provenance chains of different files, and we will draw on experience from social networks to compress the large provenance graph.

When:
Monday, March 7, 2011 at 1:00 PM

Where:
E2 599

CRSS Contact:
Xie, Yulai

Last modified 24 May 2019