eDiscovery Daily Blog

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think


Here’s a sample scenario: You identify custodians relevant to the case and collect files from each.  Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians.  You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel.  After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!!  What happened?!?

Did the vendor accidentally “double-bill” you?  That would be great – but no.  There’s a much more logical explanation and, unfortunately, you may wind up paying a lot more to process these files that you expected.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.  For example, as noted above, Outlook emails are typically saved for each custodian in a personal storage (.PST) file format, which is an expanding container file. For most custodians, all of their email (and the corresponding attachments, if present) resides in a few PST files.  The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space?  The PST file is like one of those vacuum bags – it typically stores the emails and attachments in a compressed format to save space.  When the emails and attachments are processed into a review tool, they are expanded into their normal size.  This expanded size can be 1.5 to 2 times larger than the scanned size (or more).  And, that’s what many vendors will bill on – the expanded size.

There are other types of archive container files that compress the contents – .zip and .rar files are two examples of compressed container files.  These files are often used to not only to compress files for storage on hard drives, but they are also used to compact or group a set of files when transmitting them, usually in – you guessed it – email.  With email comprising a majority of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.  It’s important to be prepared for that and know your options when processing that data, so you can effectively anticipate those processing costs.

So, what do you think?  Have you ever been surprised by processing costs of your ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.