High-volume digital storage hardware
Internet Archive PetaBox

PetaBox is a storage unit from Capricorn Technologies.[1] It was designed by the staff of the Internet Archive and C. R. Saikley to store and process one petabyte (a million gigabytes) of information.[2]


  • Density: 1.4 petabytes per rack
  • Power consumption: 3 kW per petabyte
  • Cooling: no air conditioning; excess heat is used to help heat the building.
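The density and power figures in this list can be combined into an approximate power draw for a full rack. The following is a minimal Python sketch that assumes power scales linearly with stored capacity. That is an illustrative assumption, not a published figure, and the function name is hypothetical.

```python
# Figures taken from the specification list above.
DENSITY_PB_PER_RACK = 1.4   # petabytes per rack
POWER_KW_PER_PB = 3.0       # kilowatts per petabyte

def power_per_rack_kw(density_pb: float, power_kw_per_pb: float) -> float:
    """Estimate rack power, assuming (hypothetically) that power scales linearly with capacity."""
    return density_pb * power_kw_per_pb

print(f"{power_per_rack_kw(DENSITY_PB_PER_RACK, POWER_KW_PER_PB):.1f} kW per rack")
# prints: 4.2 kW per rack
```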

Design history

The PetaBox was custom-designed by Internet Archive staff, originally to store and process one petabyte of information safely. Its goals and design points were:[3]

  • Low power: 6 kW per rack, 60 kW for the entire storage cluster
  • High density: 100+ TB/rack
  • Local computing to process the data (800 low-end PCs)
  • Multiple operating systems possible, with Linux as standard
  • Colocation-friendly
  • Shipping-container-friendly: able to run in a 20 ft × 8 ft × 8 ft shipping container
  • Easy maintenance: one system administrator per petabyte
  • Software to automate full mirroring
  • Easy to scale
  • Inexpensive design
  • Inexpensive storage
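The power and density targets in this list are consistent with the one-petabyte goal. Below is a back-of-the-envelope check in Python. It rests on two assumptions that are not documented design facts: every rack draws its full 6 kW budget, and the 800 PCs are spread evenly across the racks.

```python
# Design targets taken from the list above.
POWER_PER_RACK_KW = 6        # "6 kW per rack"
CLUSTER_POWER_KW = 60        # "60 kW for the entire storage cluster"
CAPACITY_PER_RACK_TB = 100   # "100+ TB/rack", taken at its lower bound
TOTAL_PCS = 800              # "800 low-end PCs"

# Assumption: every rack draws the full per-rack power budget.
racks = CLUSTER_POWER_KW // POWER_PER_RACK_KW
# Decimal units, matching "one petabyte (a million gigabytes)": 1 PB = 1,000 TB.
capacity_pb = racks * CAPACITY_PER_RACK_TB / 1000
# Assumption: the PCs are distributed evenly across the racks.
pcs_per_rack = TOTAL_PCS / racks

print(f"{racks} racks, {capacity_pb:g} PB, {pcs_per_rack:g} PCs per rack")
# prints: 10 racks, 1 PB, 80 PCs per rack
```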


The first 100-terabyte rack became operational at the European Archive in June 2004, and a second, 80-terabyte rack went into operation in San Francisco that same year. The Internet Archive then spun off PetaBox production into a newly formed company, Capricorn Technologies.[4]

Between 2004 and 2007, Capricorn replicated the Internet Archive's PetaBox deployment for major academic institutions, digital preservationists, government agencies, high-performance computing (HPC) and major research sites, medical imaging providers, digital image repositories, storage outsourcing sites, and other enterprises. Its largest product used 750-gigabyte disks. In 2007, the Internet Archive's data center housed approximately three petabytes of PetaBox storage.

As of 2010, the fourth version of the PetaBox was in operation. Its general specifications were:[5]

  • 24 disks per 4U-high rack unit
  • 10 units per rack
  • Linux operating system[6]
  • 240 disks of 2 TB each per rack
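The per-rack figures multiply out as shown in the minimal Python sketch below. The result is raw disk capacity only. It does not account for mirroring, file-system overhead, or the difference between decimal and binary terabytes, and the specification does not address any of these.

```python
# Fourth-generation PetaBox figures from the list above.
DISKS_PER_UNIT = 24    # disks in each 4U unit
UNITS_PER_RACK = 10
DISK_CAPACITY_TB = 2
UNIT_HEIGHT_U = 4      # each unit is 4U high

disks_per_rack = DISKS_PER_UNIT * UNITS_PER_RACK     # matches the 240 disks listed
raw_tb_per_rack = disks_per_rack * DISK_CAPACITY_TB  # raw capacity, before any redundancy
rack_space_u = UNITS_PER_RACK * UNIT_HEIGHT_U        # rack space occupied by the units

print(f"{disks_per_rack} disks, {raw_tb_per_rack} TB raw, {rack_space_u}U")
# prints: 240 disks, 480 TB raw, 40U
```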

As of December 2021,[7] the PetaBox infrastructure comprised:

  • 4 data centers
  • 745 nodes
  • 28,000 spinning disks

The Wayback Machine held 57 petabytes of data, the book, music and video collections a further 42 petabytes, and unique data a further 99 petabytes. Total used storage was 212 petabytes.
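Summing the three listed categories shows that they do not reach the stated total, and the source does not break down the remainder. The following is a minimal Python check using only the figures reported above.

```python
# Holdings reported as of December 2021, in petabytes.
holdings_pb = {
    "Wayback Machine": 57,
    "Books, music and video collections": 42,
    "Unique data": 99,
}
STATED_TOTAL_PB = 212  # total used storage as stated in the source

itemized_pb = sum(holdings_pb.values())
unitemized_pb = STATED_TOTAL_PB - itemized_pb

print(f"itemized: {itemized_pb} PB, not itemized: {unitemized_pb} PB")
# prints: itemized: 198 PB, not itemized: 14 PB
```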


  1. ^ "Big storage on the cheap". CNET.
  2. ^ "Fourth generation Petabox storage system".
  3. ^ "Overview".
  4. ^ "Internet Archive: Petabox".
  5. ^ Jeff Kaplan (27 July 2010). "The Fourth Generation Petabox". Internet Archive.
  6. ^ "eWEEK Labs Walk-Through: the Internet Archive". PCMag UK. Retrieved 2021-11-09.
  7. ^ "Internet Archive: Petabox". Retrieved 2022-08-16.

External link

  • Official website