eDiscovery Best Practices: Quality Control, Making Sure the Numbers Add Up

September 18, 2012

Yesterday, we wrote about tracking file counts from collection to production, the concept of expanded file counts, and the categorization of files during processing. Today, let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We discussed the typical categories of excluded files after processing – obviously, what’s not excluded is available for searching and review. Even if your approach includes a technology assisted review (TAR) methodology such as predictive coding, it’s still likely that you will want to do some culling out of files that are clearly non-responsive.

Documents may be classified in a number of ways during review, but the most common is to classify them as responsive, non-responsive, or privileged. Privileged documents are also typically classified as responsive or non-responsive, so that only the responsive documents that are privileged need be identified on a privilege log. Responsive documents that are not privileged are then produced to opposing counsel.
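To make that classification concrete, here is a minimal sketch of how the two review tags map to an outcome. The names `Disposition` and `disposition` are hypothetical, chosen only for illustration; they don’t come from any particular review platform.

```python
from enum import Enum


class Disposition(Enum):
    """Hypothetical labels for where a reviewed document ends up."""
    PRODUCE = "produce to opposing counsel"
    PRIVILEGE_LOG = "withhold and list on privilege log"
    NOT_PRODUCED = "non-responsive, not produced"


def disposition(responsive: bool, privileged: bool) -> Disposition:
    """Map the two review tags to a single outcome.

    Non-responsive documents are not produced and need no privilege
    log entry, whether or not they are privileged.
    """
    if not responsive:
        return Disposition.NOT_PRODUCED
    if privileged:
        return Disposition.PRIVILEGE_LOG
    return Disposition.PRODUCE


print(disposition(responsive=True, privileged=True).value)
```

Because every combination of the two tags maps to exactly one outcome, each reviewed document lands in exactly one bucket, which is what makes the counts reconcilable later.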

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a simple eMail-based example. We conduct a targeted collection of one PST file from each of seven custodians in a given case. The relevant time period for the case is January 1, 2010 through December 31, 2011. Other than date range, we plan to do no other filtering of files during processing. Duplicates will not be reviewed or produced. We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged. Here’s what this collection might look like:

Collected Files: 101,852 – After expansion, 7 PST files expand to 101,852 eMails and attachments.
Filtered Files: 23,564 – Filtering eMails outside of the relevant date range eliminates 23,564 files.
Remaining Files after Filtering: 78,288 – After filtering, there are 78,288 files to be processed.
NIST/System Files: 0 – eMail collections typically don’t have NIST or system files, so we’ll assume zero files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
Exception Files: 912 – Let’s assume that a little over 1% of the 78,288 remaining files (912) are exception files, such as password-protected, corrupted or empty files.
Duplicate Files: 24,215 – It’s fairly common for approximately 30% of a collection to be duplicates, so we’ll assume 24,215 of the 78,288 remaining files here.
Remaining Files after Processing: 53,161 – We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
Files Culled During Searching: 35,618 – If we assume that searching lets us cull 67% (approximately 2/3) of the 53,161 remaining files as clearly non-responsive, that eliminates 35,618 files.
Remaining Files for Review: 17,543 – After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
Files Tagged as Non-Responsive: 7,017 – If approximately 40% of the 17,543 reviewed files are tagged as non-responsive, that would be 7,017 files tagged as such.
Remaining Files Tagged as Responsive: 10,526 – After QC to ensure that all documents are either tagged as responsive or non-responsive, this leaves 10,526 documents as responsive.
Responsive Files Tagged as Privileged: 842 – If roughly 8% of the responsive documents are privileged, that would be 842 privileged documents.
Produced Files: 9,684 – After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.
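
The interim “Remaining” counts in the list above are just running subtractions. As a rough sketch (the variable names are mine, not from any processing tool), they can be derived like this:

```python
# Category counts taken directly from the example above.
collected = 101_852
filtered = 23_564
nist_system = 0
exceptions = 912
duplicates = 24_215
culled = 35_618
non_responsive = 7_017
privileged = 842

# Each interim count is the previous interim count minus the
# categories removed at that stage.
after_filtering = collected - filtered
after_processing = after_filtering - nist_system - exceptions - duplicates
for_review = after_processing - culled
responsive = for_review - non_responsive
produced = responsive - privileged

print(f"Remaining after filtering:  {after_filtering:,}")
print(f"Remaining after processing: {after_processing:,}")
print(f"Remaining for review:       {for_review:,}")
print(f"Tagged as responsive:       {responsive:,}")
print(f"Produced:                   {produced:,}")
```

If any interim count reported by your processing or review platform differs from the subtraction, that is the stage to investigate.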

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them. The key is to note the category counts above (the numbers in red): Filtered, NIST/System, Exception, Duplicate, Culled During Searching, Tagged as Non-Responsive, Tagged as Privileged, and Produced. Excluding the interim “Remaining” counts (the numbers in black), these counts represent the different categories for the file collection, and each file should wind up in exactly one of these totals. What happens if you add the category counts together? You should get 101,852, the number of collected files after expanding the PST files. As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery. That’s the way it should be. If not, investigation is required to determine where files were missed.
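
That final check is easy to automate. Here is a minimal sketch, assuming you can export one count per category; the function name `unaccounted` and the dictionary keys are hypothetical.

```python
from typing import Mapping


def unaccounted(collected: int, categories: Mapping[str, int]) -> int:
    """Return how many collected files are not in any category.

    Zero means every file is accounted for. A positive number means
    files are missing; a negative number means some were counted in
    more than one category.
    """
    return collected - sum(categories.values())


# Category counts from the example (interim "Remaining" counts excluded).
final_categories = {
    "filtered": 23_564,
    "nist_system": 0,
    "exception": 912,
    "duplicate": 24_215,
    "culled": 35_618,
    "non_responsive": 7_017,
    "privileged": 842,
    "produced": 9_684,
}

gap = unaccounted(101_852, final_categories)
print("All files accounted for" if gap == 0 else f"Investigate: {gap} files")
```

Running a check like this each time counts are updated flags a discrepancy as soon as it appears, rather than at production time.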

So, what do you think? Do you have a plan for accounting for all collected files during discovery? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Daily Blog