Quality Control, Making Sure the Numbers Add Up: eDiscovery Throwback Thursdays

August 15, 2019

Here’s our latest blog post in our Throwback Thursdays series, where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discussing whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on September 18, 2012, when eDiscovery Daily was not quite two years old. While it doesn’t directly link to our just-concluded two-part series, it does tie in nicely and has been referenced in numerous webcasts since. Our seven-custodian example might be a bit light for the amount of email we get today, but it’s still a relevant exercise. Enjoy!

Let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We know there are typical categories of files excluded after processing: filtered files, NIST files (files on the National Institute of Standards and Technology list of files known to have no evidentiary value) and system files, exception files and duplicate files. Together, these can make up a significant portion of the collection. Whatever is not excluded is available for searching and review. Even if your approach includes a technology assisted review (TAR) methodology such as predictive coding, you will still likely want to cull out files that are clearly non-responsive.
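To make the idea of mutually exclusive exclusion categories concrete, here is a minimal Python sketch. It assumes each expanded file carries a date, a content hash and a flag for whether it could be processed; the `FileRecord` fields and the `bucket_after_processing` helper are hypothetical, not any product’s API. It tests the categories in a fixed order so that each file lands in exactly one bucket. That order, and deduplicating on a single hash across the whole collection, are illustrative assumptions; real platforms may, for example, deduplicate per custodian instead.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FileRecord:
    # Hypothetical metadata fields; real processing tools expose their own.
    name: str
    sent_date: date
    md5: str
    processable: bool  # False for password-protected, corrupted or empty files

def bucket_after_processing(files, start, end, nist_hashes):
    """Place every file in exactly one bucket, testing categories in a fixed order."""
    buckets = {"filtered": [], "nist_system": [], "exception": [],
               "duplicate": [], "remaining": []}
    seen = set()  # hashes of files already kept for search and review
    for f in files:
        if not (start <= f.sent_date <= end):
            buckets["filtered"].append(f)      # outside the relevant date range
        elif f.md5 in nist_hashes:
            buckets["nist_system"].append(f)   # known file with no evidentiary value
        elif not f.processable:
            buckets["exception"].append(f)     # could not be processed
        elif f.md5 in seen:
            buckets["duplicate"].append(f)     # same content already kept
        else:
            seen.add(f.md5)
            buckets["remaining"].append(f)     # available for searching and review
    return buckets

sample = [
    FileRecord("a.msg", date(2009, 6, 1), "h1", True),
    FileRecord("b.msg", date(2010, 3, 1), "h2", True),
    FileRecord("c.msg", date(2010, 4, 1), "h2", True),
    FileRecord("d.msg", date(2011, 5, 1), "h3", False),
]
result = bucket_after_processing(sample, date(2010, 1, 1), date(2011, 12, 31), set())
print({k: len(v) for k, v in result.items()})
# {'filtered': 1, 'nist_system': 0, 'exception': 1, 'duplicate': 1, 'remaining': 1}
```

Because the checks are an if/elif chain, no file can be counted in two buckets, which is what makes the later reconciliation possible.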

Documents may be classified during review in a number of ways, but the most common is to classify them as responsive, non-responsive, or privileged. Privileged documents are also typically classified as responsive or non-responsive, so that only privileged documents that are responsive need to be identified on a privilege log. Responsive documents that are not privileged are then produced to opposing counsel.
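The way the responsive and privileged tags combine can be sketched as a small decision function. This is a minimal Python illustration; the `Disposition` names and the `disposition` helper are hypothetical labels for the three outcomes described above, not a review platform’s tagging scheme.

```python
from collections import Counter
from enum import Enum

class Disposition(Enum):
    NON_RESPONSIVE = "non-responsive"
    PRIVILEGE_LOG = "responsive and privileged: withhold and log"
    PRODUCE = "responsive and not privileged: produce"

def disposition(responsive: bool, privileged: bool) -> Disposition:
    """Map a reviewed document's two tags to what happens to it."""
    if not responsive:
        # Privileged but non-responsive documents need not be logged.
        return Disposition.NON_RESPONSIVE
    if privileged:
        return Disposition.PRIVILEGE_LOG
    return Disposition.PRODUCE

# Hypothetical review tags: (responsive, privileged) per document.
tags = [(True, False), (True, True), (False, True), (False, False), (True, False)]
tally = Counter(disposition(r, p) for r, p in tags)
print(tally[Disposition.PRODUCE], tally[Disposition.PRIVILEGE_LOG],
      tally[Disposition.NON_RESPONSIVE])
# 2 1 2
```

Checking responsiveness first is what keeps privileged but non-responsive documents off the privilege log.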

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a fairly simple eMail-based example. We conduct a targeted collection of a PST file from each of seven custodians in a given case. The relevant time period for the case is January 1, 2010 through December 31, 2011. Other than by date range, we plan to do no filtering of files during processing. Duplicates will not be reviewed or produced. We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged. Here’s what this collection might look like:

Collected Files: 101,852 – After expansion, 7 PST files expand to 101,852 eMails and attachments.
Filtered Files: 23,564 – Filtering eMails outside of the relevant date range eliminates 23,564 files.
Remaining Files after Filtering: 78,288 – After filtering, there are 78,288 files to be processed.
NIST/System Files: 0 – eMail collections typically don’t have NIST or system files, so we’ll assume zero files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
Exception Files: 912 – Let’s assume that a little over 1% of the collection (912) is exception files, such as password-protected, corrupted or empty files.
Duplicate Files: 24,215 – It’s fairly common for at least 30% of the collection to consist of duplicates, so we’ll assume 24,215 files here.
Remaining Files after Processing: 53,161 – We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
Files Culled During Searching: 35,618 – If we assume that we are able to cull out 67% (approximately two-thirds) of the remaining files as clearly non-responsive, we cull out 35,618 files.
Remaining Files for Review: 17,543 – After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
Files Tagged as Non-Responsive: 7,017 – If approximately 40% of the files reviewed are tagged as non-responsive, that would be 7,017 files. Results can vary widely depending on the document collection, culling accuracy and review process, of course.
Remaining Files Tagged as Responsive: 10,526 – After QC to ensure that all documents are tagged as either responsive or non-responsive, this leaves 10,526 documents as responsive.
Responsive Files Tagged as Privileged: 842 – If roughly 8% of the responsive documents wind up being privileged, that would be 842 privileged documents. Again, the number of privileged files can vary widely, depending on the file collection and privilege considerations.
Produced Files: 9,684 – After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.
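The arithmetic in the list above can be re-derived with a short Python sketch. The `file_count_cascade` name is hypothetical. The filtering, NIST/system, exception and duplicate counts are taken as given from the example, while the culling (67%), non-responsive (40%) and privileged (8%) rates are applied to the count entering each stage and rounded to whole files, which reproduces the figures above.

```python
def file_count_cascade(collected, filtered, nist_system, exceptions, duplicates,
                       cull_rate, non_responsive_rate, privileged_rate):
    """Derive each stage's count; each rate applies to the count entering that stage."""
    after_filtering = collected - filtered
    after_processing = after_filtering - nist_system - exceptions - duplicates
    culled = round(after_processing * cull_rate)
    for_review = after_processing - culled
    non_responsive = round(for_review * non_responsive_rate)
    responsive = for_review - non_responsive
    privileged = round(responsive * privileged_rate)
    produced = responsive - privileged
    return {
        "after_filtering": after_filtering,
        "after_processing": after_processing,
        "culled": culled,
        "for_review": for_review,
        "non_responsive": non_responsive,
        "responsive": responsive,
        "privileged": privileged,
        "produced": produced,
    }

counts = file_count_cascade(101_852, 23_564, 0, 912, 24_215,
                            cull_rate=0.67, non_responsive_rate=0.40,
                            privileged_rate=0.08)
print(counts["produced"])  # 9684
```

Deriving each "remaining" count by subtraction rather than by its own percentage keeps the stages consistent, so rounding at one stage cannot make the totals drift apart.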

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them. The key is to note the category counts highlighted in blue* above: Filtered, NIST/System, Exception, Duplicate, Culled During Searching, Non-Responsive, Privileged and Produced files. Excluding the interim “Remaining” counts, these eight counts represent the different categories for the file collection, and each file should wind up in exactly one of them. What happens if you add them together? You should get 101,852 (23,564 + 0 + 912 + 24,215 + 35,618 + 7,017 + 842 + 9,684), the number of collected files after expanding the PST files. As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery. That’s the way it should be. If the total doesn’t match, investigation is required to determine where files were missed.
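A reconciliation check like this can be automated. This minimal Python sketch, with a hypothetical `unaccounted` helper, sums the final category counts from the example and compares them with the collected total. A nonzero result is the signal to investigate: a positive number means files are missing from every category, and a negative number means some were counted more than once.

```python
# Final category counts from the example; interim "Remaining" counts are excluded.
FINAL_COUNTS = {
    "filtered": 23_564,
    "nist_system": 0,
    "exceptions": 912,
    "duplicates": 24_215,
    "culled": 35_618,
    "non_responsive": 7_017,
    "privileged": 842,
    "produced": 9_684,
}

def unaccounted(collected, final_counts):
    """Return collected files not in any final category (0 means fully reconciled)."""
    return collected - sum(final_counts.values())

print(unaccounted(101_852, FINAL_COUNTS))  # 0

# If 50 produced files were dropped somewhere, the check flags them.
print(unaccounted(101_852, {**FINAL_COUNTS, "produced": 9_634}))  # 50
```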

So, what do you think? Do you have a plan for accounting for all collected files during discovery? Please share any comments you might have or if you’d like to know more about a particular topic.

*Why blue? Because we did red last time! :o)

P.S. — Happy Anniversary, honey! I’m the luckiest man around! If you don’t believe me, check out this post!

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Daily Blog