eDiscovery Daily Blog

The Pitfalls of Self-Culling and Image Files – eDiscovery Best Practices

This topic came up with a recent client, so I thought I would revisit it here on this blog.

Organizations often make the same mistake when collecting their own files to produce in discovery.  Many attorneys hand off the collection of potentially responsive files to the individual custodians of those files, or to someone in the organization responsible for collecting them (typically, an IT person), and that self-collection involves “self-culling” through the use of search terms.  When this happens, important files can be missed.

Self-culling by custodians, unless managed closely, can be a wildly inconsistent process.  You’re relying on each custodian to apply the same search terms in the same way and, even if IT performs the culling instead, the process may have to be repeated if additional search terms are identified later on.  Even worse, potentially responsive image-only files will be missed with self-culling.

It’s common to have a number of image-only files within any collection, especially if the custodians frequently scan executed documents or use fax software to receive documents from other parties.  In those cases, image-only PDF or TIFF files can often make up as much as 20% of the collection.  When custodians are asked to perform “self-culling” by running their own searches of their data, these files, which could contain information responsive to the case, will be missed: a keyword search can only match text, and an image-only file has no searchable text until OCR has been performed on it.

With the possibility of inconsistent self-culling, the possibility of additional search terms being identified later and the (almost certain) presence of image-only files, I usually advise against self-culling by custodians.  I also don’t recommend that IT perform culling on behalf of the custodians, unless they have the ability to process that data to identify image-only files and perform Optical Character Recognition (OCR) to capture text from them.  If your IT department has the capabilities and experience to do so (and the process and chain of custody are well documented), then that’s great.  However, most internal IT departments lack the capabilities, the expertise, or both, in which case it’s best to collect all potentially responsive files from the custodians and turn them over to a qualified eDiscovery provider to perform the culling (performing OCR as needed to include responsive image-only files in the resulting responsive document set).  With the full data set already in hand, there is also no need to go back to the custodians to collect additional data, unless the case requires supplemental productions.
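To make the image-only problem concrete, here is a minimal Python sketch of the culling logic described above.  It is an illustration under stated assumptions, not how any particular processing platform works: the `CollectedFile` type and the `cull` function are hypothetical names, and the sketch assumes text has already been extracted from each file during processing, with image-only files coming back with no text at all.  Rather than treating those files as non-hits, it sets them aside for OCR.

```python
from dataclasses import dataclass


@dataclass
class CollectedFile:
    """Hypothetical record for one collected file."""
    name: str
    extracted_text: str  # assumed to be empty for image-only files


def cull(files, search_terms):
    """Split a collection into keyword hits, non-hits, and files needing OCR.

    A file with no extracted text cannot match any search term, so it is
    routed to an OCR queue instead of being silently dropped as a non-hit.
    """
    terms = [term.lower() for term in search_terms]
    hits, non_hits, needs_ocr = [], [], []
    for f in files:
        text = f.extracted_text.strip().lower()
        if not text:
            needs_ocr.append(f.name)   # image-only: search it after OCR
        elif any(term in text for term in terms):
            hits.append(f.name)
        else:
            non_hits.append(f.name)
    return hits, non_hits, needs_ocr


if __name__ == "__main__":
    collection = [
        CollectedFile("memo.docx", "Draft merger agreement attached"),
        CollectedFile("signed_contract.pdf", ""),  # scanned, no text layer
        CollectedFile("lunch.msg", "Pizza on Friday"),
    ]
    hits, non_hits, needs_ocr = cull(collection, ["merger"])
    print("Hits:", hits)
    print("Non-hits:", non_hits)
    print("Needs OCR before searching:", needs_ocr)
```

A custodian running the same search in a desktop search tool would effectively see only the first list.  The scanned contract would never surface, which is exactly the gap that processing and OCR are meant to close.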

So, what do you think?  Do you self-collect data for discovery purposes?  If so, how do you account for image-only files?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
