The Effectiveness of Deduplication on Virtual Machine Disk Images

Appeared in Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference.


Virtualization is becoming widely deployed in servers to efficiently provide many logically separate execution environments while reducing the need for physical servers. While this approach saves physical CPU resources, it still consumes large amounts of storage because each virtual machine (VM) instance requires its own multi-gigabyte disk image. Moreover, existing systems do not support ad hoc block sharing between disk images, relying on techniques such as overlays to build multiple VMs from a single “base” image. We propose instead the use of deduplication, both to reduce the total storage required for VM disk images and to increase the ability of VMs to share disk blocks. To test the effectiveness of deduplication, we conducted extensive evaluations on different sets of virtual machine disk images with different chunking strategies. Our experiments found that the amount of stored data grows very slowly after the first few virtual disk images if only the locale or software configuration is changed, but that the compression rate suffers when different versions of an operating system or different operating systems are included. We also show that fixed-length chunks work well, achieving nearly the same compression rate as variable-length chunks. Finally, we show that simply identifying zero-filled blocks, even in ready-to-use virtual machine disk images available online, can provide significant savings in storage.
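
The two ideas the abstract highlights, fixed-length chunking and zero-filled block detection, can be illustrated with a minimal Python sketch. This is not the authors' tool: the function name `dedup_stats`, the 4096-byte chunk size, and the use of SHA-1 as the chunk fingerprint are all assumptions chosen for illustration.

```python
import hashlib


def dedup_stats(images, chunk_size=4096):
    """Count chunks across disk images using fixed-length chunking.

    `images` is an iterable of bytes objects, one per disk image.
    The chunk size and the SHA-1 fingerprint are illustrative assumptions.
    Returns (total_chunks, zero_chunks, unique_nonzero_chunks).
    """
    zero_chunk = bytes(chunk_size)   # reference all-zero chunk
    seen = set()                     # fingerprints of non-zero chunks seen so far
    total = zeros = 0
    for data in images:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            total += 1
            # A zero-filled chunk needs no storage beyond a marker,
            # so it is counted separately and never fingerprinted.
            if chunk == zero_chunk[:len(chunk)]:
                zeros += 1
                continue
            seen.add(hashlib.sha1(chunk).digest())
    return total, zeros, len(seen)


# Two toy "images" that share one data chunk and one zero-filled chunk.
image_a = b"A" * 4096 + bytes(4096) + b"B" * 4096
image_b = b"A" * 4096 + bytes(4096) + b"C" * 4096
total, zeros, unique = dedup_stats([image_a, image_b])
```

Under these assumptions, the fraction of data that still has to be stored is roughly `unique / total`; chunks counted in `zeros` and repeated chunks contribute nothing further.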

Publication date:
May 2009

Keren Jin
Ethan L. Miller


Available media

Full paper text: PDF

Bibtex entry

  author       = {Keren Jin and Ethan L. Miller},
  title        = {The Effectiveness of Deduplication on Virtual Machine Disk Images},
  booktitle    = {Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference},
  month        = may,
  year         = {2009},
Last modified 28 May 2019