eDiscovery Daily Blog

eDiscovery Best Practices: Perspective on the Amount of Data Contained in 1 Gigabyte


Often, the picture used to introduce the blog post is a whimsical (but public domain!) representation of the topic at hand.  However, today’s picture is intended to be a bit instructional.

As we work with more data daily and we keep buying larger hard drives to store that data, one gigabyte (GB) of data seems smaller and smaller.  Today, you can buy a portable 1 terabyte (TB) drive for less than $100 in some places.  Is the GB smaller than it used to be?  Last I checked, it’s still about a billion bytes (1024 x 1024 x 1024 or 1,073,741,824 bytes, to be exact).

In terms of pages, most estimates I’ve heard put 1 GB at 50,000 to 75,000 pages.  Of course, that can vary widely, depending on the file types that make up that GB.  If the collection consists of one-page, high-resolution image files of 1 megabyte (MB) each, it takes only about 1,000 pages to equal a GB, whereas a collection of 5 kilobyte (KB) text files and small emails (with minimal attachments) could take as many as 200,000 pages to equal a GB.  So, 50,000 to 75,000 is probably a reasonable working average.
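For anyone who wants to check the page-count extremes, here is a minimal sketch of the arithmetic.  It assumes every page in the collection is the same size, which real collections never are, and the function name pages_per_gb is made up for illustration.

```python
# Rough pages-per-GB estimate, assuming a uniform average page size.
BYTES_PER_GB = 1024 ** 3  # 1,073,741,824 bytes


def pages_per_gb(avg_bytes_per_page: int) -> int:
    """Return how many pages of the given average size fit in 1 GB."""
    return BYTES_PER_GB // avg_bytes_per_page


# One-page, 1 MB high-resolution images: roughly 1,000 pages per GB.
print(pages_per_gb(1024 * 1024))  # 1024

# 5 KB text files and small emails: roughly 200,000 pages per GB.
print(pages_per_gb(5 * 1024))  # 209715
```

Working backwards, the 50,000 to 75,000 page range corresponds to an average of roughly 14 KB to 21 KB per page.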

A ream of copy paper is 500 pages and a case holds 10 reams (5,000 pages).  So, a GB is the equivalent of 100 to 150 reams of paper (10 to 15 cases), which is enough paper to fill a small truck.  Hence, today’s picture shows a truck full of paper.
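The paper conversion can be sketched the same way.  The ream and case sizes come from the paragraph above; the function name paper_equivalent is hypothetical.

```python
# Convert a page count into reams and cases of copy paper.
PAGES_PER_REAM = 500
REAMS_PER_CASE = 10


def paper_equivalent(pages: int) -> tuple[float, float]:
    """Return (reams, cases) of paper needed to print the given pages."""
    reams = pages / PAGES_PER_REAM
    cases = reams / REAMS_PER_CASE
    return reams, cases


print(paper_equivalent(50_000))  # (100.0, 10.0)
print(paper_equivalent(75_000))  # (150.0, 15.0)
```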

A Gartner report republished Anne Kershaw’s analysis of the cost to manually review 1 TB of data.  The report states:

“Considering that one terabyte is generally estimated to contain 75 million pages, a one-terabyte case could amount to 18,750,000 documents, assuming an average of four pages per document. Further assuming that a lawyer or paralegal can review 50 documents per hour (a very fast review rate), it would take 375,000 hours to complete the review. In other words, it would take more than 185 reviewers working 2,000 hours each per year to complete the review within a year. Assuming each reviewer is paid $50 per hour (a bargain), the cost could be more than $18,750,000.”

If it costs $18.75 million to review 1 TB, one could extrapolate that to approximately $18,750 to review each GB.  Dividing the terabyte figures by 1,000 (and ignoring the extra 24, since 1 TB is actually 1,024 GB), the math works out like this: 75,000 pages / 4 pages per document = 18,750 documents; 18,750 documents / 50 documents reviewed per hour = 375 review hours; 375 review hours x $50 per hour = $18,750.  I’ve mentioned that figure to clients and prospects, and they almost always seem surprised that it is so high.  Then, I ask them how many hours it would take them to review a truckload of paper to determine relevancy to the case.  😉
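Here is a small sketch of that review-cost math, so you can plug in your own numbers.  The default values are the assumptions from the quoted analysis (four pages per document, 50 documents per hour, $50 per hour); the function name review_cost is made up for illustration, and your actual rates will differ.

```python
# Estimate manual review effort and cost from a page count.
def review_cost(
    pages: int,
    pages_per_doc: int = 4,      # assumed average document length
    docs_per_hour: int = 50,     # assumed review rate
    rate_per_hour: float = 50,   # assumed reviewer cost in dollars
) -> tuple[float, float, float]:
    """Return (documents, review_hours, cost_in_dollars)."""
    documents = pages / pages_per_doc
    hours = documents / docs_per_hour
    cost = hours * rate_per_hour
    return documents, hours, cost


# 1 GB at 75,000 pages
print(review_cost(75_000))  # (18750.0, 375.0, 18750.0)

# 1 TB at 75 million pages
docs, hours, cost = review_cost(75_000_000)
print(docs, hours, cost)  # 18750000.0 375000.0 18750000.0

# Reviewers needed to finish 1 TB in a year at 2,000 hours each
print(hours / 2_000)  # 187.5
```

Changing any one assumption (say, a slower review rate of 25 documents per hour) doubles or halves the result, which is why the per-GB figure should be treated as an order-of-magnitude estimate.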

Bottom line: each GB effectively culled out through technology (such as early case assessment or first-pass review tools like FirstPass™, powered by Venio) can save approximately $18,750 in review costs.  That’s why technology-assisted review approaches have become so popular and why it’s important to remember how expensive each additional GB can be.

So, what do you think?  Did you realize that each GB was so large or so expensive?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.