eDiscovery Daily Blog

Quality Control By The Numbers: eDiscovery Best Practices

I touched on this topic a few years ago, and a recent client experience spurred me to revisit it.

A while back, we wrote about Quality Assurance (QA) and Quality Control (QC) in the eDiscovery process.  Both are important for improving the quality of work product and making the eDiscovery process more defensible overall.  With regard to QC, one overall mechanism is to track document counts through the discovery process, especially from collection to production, so that you can identify how every collected file was handled and why each non-produced document was not produced.

Expanded File Counts

Raw counts of the files collected are not the same as expanded file counts.  Certain container file types, like Outlook PST files and ZIP archives, exist essentially to store a collection of other files.  So, the count that is important to track is the “expanded” file count after processing, which includes all of the files contained within the container files.  In a simple scenario where you collect Outlook PST files from seven custodians, the collected file count may be as low as seven, but the actual number of documents (emails and attachments) within those PST files could be in the tens of thousands.  That expanded count is the starting count that matters if your goal is to account for every document or file in the discovery process.
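
To make the difference between collected and expanded counts concrete, here is a minimal Python sketch that counts the files inside ZIP archives, including nested ones.  It is only an illustration under some assumptions: the function names are hypothetical, a container is recognized purely by its ".zip" extension, the container itself is not counted as a document, and PST files are not handled at all (expanding those would require a processing tool or third-party library).

```python
import io
import zipfile


def expanded_file_count(name: str, data: bytes) -> int:
    """Return how many files a single collected file expands to.

    Assumption: only names ending in ".zip" are treated as containers,
    and the container itself is not counted.
    """
    if not name.lower().endswith(".zip"):
        return 1  # a loose file counts as one document

    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # folder entries are not documents
            # Recurse so that archives inside archives are expanded too.
            total += expanded_file_count(member.filename, archive.read(member))
    return total


def expanded_collection_count(collected: dict) -> int:
    """Sum the expanded counts for a mapping of file name -> file bytes."""
    return sum(expanded_file_count(n, d) for n, d in collected.items())
```

In this sketch, a collection of two files (one loose PDF and one ZIP holding five documents) has a collected count of two but an expanded count of six, and six is the number to track from that point forward.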

Categorization of Files During Processing

Of course, not every document gets reviewed or even included in the search process.  During processing, files are usually categorized, and some categories are set aside and excluded from review.  Here are some typical categories of excluded files in most collections:

  • Filtered Files: Some files may be collected, and then filtered during processing. A common filter for the file collection is the relevant date range of the case.  If you’re collecting custodians’ source PST files, those may include messages outside the relevant date range; if so, those messages may need to be filtered out of the review set.  Files may also be filtered based on type of file or other reasons for exclusion.
  • NIST and System Files: Many file collections also contain system files, like executable files (EXEs) or Dynamic Link Libraries (DLLs), that are part of the software on a computer and do not contain client data, so those are typically excluded from the review set. NIST files are those that appear on the National Institute of Standards and Technology list of files that are known to have no evidentiary value, so any files in the collection matching those on the list are removed, or “De-NISTed”.
  • Exception Files: These are files that cannot be processed or indexed, for whatever reason. For example, they may be password-protected or corrupted.  Just because these files cannot be processed doesn’t mean they can be ignored.  Depending on your agreement with opposing counsel, you may need to at least provide a list of them on an exception log to prove they were addressed, if not attempt to repair them or make them accessible (BTW, it’s good to establish that agreement for disposition of exception files up front).
  • Duplicate Files: During processing, files that are exact duplicates may be put aside to avoid redundant review (and potential inconsistencies). Exact duplicates of standalone files are typically identified based on the hash value, which is a digital fingerprint generated based on the content and format of the file; if two files have the same hash value, they have the same exact content and format.  Emails (and their attachments) may instead be identified as duplicates based on key metadata fields, with the email and its attachments typically treated as a unit, so an attachment cannot be “de-duped” out of the collection by a standalone copy of the same file.
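
As a rough illustration of hash-based de-duplication for standalone files, here is a short Python sketch.  The function name is hypothetical, and the choice of SHA-256 is an assumption (the hash algorithm varies by processing tool).  It does not cover emails, which, as noted above, may be compared on metadata fields rather than a file hash.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe_by_hash(files: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Split file names into (unique, duplicates) by content hash.

    The first file seen with a given hash is kept as the unique copy;
    any later file with the same hash is set aside as a duplicate.
    """
    first_seen: Dict[str, str] = {}  # hash value -> name of the kept file
    unique: List[str] = []
    duplicates: List[str] = []
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()  # assumed algorithm
        if digest in first_seen:
            duplicates.append(name)
        else:
            first_seen[digest] = name
            unique.append(name)
    return unique, duplicates
```

Both lists matter for QC purposes: the unique files move on toward review, while the count of duplicates is what you record to explain why those files were not reviewed or produced separately.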

All of these categories of excluded files can reduce the set of files that actually need to be searched and reviewed.  On Monday, we’ll walk through an example of a file set from collection to production to illustrate how each file is accounted for during the discovery process.
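
The underlying arithmetic is simple: the expanded count should equal the sum of the excluded categories plus the files that remain for search and review.  Here is a minimal Python sketch of that check; the function name and all of the numbers are made up purely for illustration and do not come from any real matter.

```python
from typing import Dict


def unaccounted_files(expanded_total: int,
                      excluded: Dict[str, int],
                      remaining_for_review: int) -> int:
    """Return how many files are not explained by any category.

    A result of 0 means every file in the expanded count is accounted for.
    """
    return expanded_total - sum(excluded.values()) - remaining_for_review


# Hypothetical counts, for illustration only.
excluded_counts = {
    "filtered": 25000,
    "nist_and_system": 5000,
    "exception": 500,
    "duplicate": 14500,
}
gap = unaccounted_files(100000, excluded_counts, 55000)
```

With these made-up numbers the excluded categories total 45,000, which together with the 55,000 remaining files matches the expanded count of 100,000, so the gap is zero.  Any non-zero result would be a signal to find out which files went missing or were counted twice.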

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.