eDiscovery Daily Blog

Organize Your Collection by Message Thread to Save Costs During Review: eDiscovery Best Practices

This topic came up recently with a client, so I thought it was timely to revisit…

Insanity, as the saying goes, is doing the same thing over and over again and expecting a different result.  In eDiscovery review, it can be even worse when you actually do get a different result.

One of the biggest challenges when reviewing electronically stored information (ESI) is identifying duplicates so that your reviewers aren’t reviewing the same files again and again.  Not only does that drive up costs unnecessarily, but it could lead to problems if different reviewers categorize the same file differently (for example, a duplicate of a privileged file could be inadvertently produced if one reviewer fails to mark it as privileged).

There are a few ways to identify duplicates.  Exact duplicates (files that contain the exact same content in the same file format) can be identified through hash values, which serve as a digital fingerprint of the content of the file.  MD5 and SHA-1 are the most popular hashing algorithms for this purpose: files that produce the same hash value are treated as exact duplicates, so all but one copy can be removed from the review population.  Since the same emails are often sent to multiple parties and the same files are often stored on different drives, deduplication through hashing can save considerable review costs.
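To make the hashing idea concrete, here is a minimal sketch in Python using the standard library's hashlib module.  The file names and contents are made up for illustration, and the function names are my own, not taken from any particular review platform.  It keeps the first file seen for each SHA-1 hash value and treats the rest as exact duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(content: bytes) -> str:
    """Return the SHA-1 hex digest of a file's raw bytes."""
    return hashlib.sha1(content).hexdigest()


def group_exact_duplicates(files: dict) -> dict:
    """Map each hash value to the names of all files sharing that content."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[fingerprint(content)].append(name)
    return dict(groups)


def review_population(files: dict) -> list:
    """Keep the first file seen for each hash value; the others are duplicates."""
    return [names[0] for names in group_exact_duplicates(files).values()]


# Hypothetical collection: two custodians hold the same memo.
collection = {
    "custodian_a/memo.docx": b"Q3 forecast attached.",
    "custodian_b/memo_copy.docx": b"Q3 forecast attached.",
    "custodian_a/notes.txt": b"Call counsel on Monday.",
}

unique_files = review_population(collection)
print(unique_files)
```

In this sketch, only one copy of the memo makes it into the review population.  A real workflow would read the bytes from disk and would typically log which duplicates were suppressed, so that the custodian information is not lost.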

Sometimes, files are exact (or nearly exact) duplicates in content but not in format.  One example is a Word document published to an Adobe PDF file – the content is the same, but the file format is different, so the hash value will be different.  Near-deduplication can be used to identify files where most or all of the content matches so they can be verified as duplicates and eliminated from review.
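Review tools differ in how they measure near-duplicates, so treat the following as one possible approach and not as how any specific product works.  This sketch assumes the text has already been extracted from each file (for example, from the Word document and from the PDF), breaks it into overlapping three-word "shingles", and compares the two sets using Jaccard similarity.  The sample text and the 0.9 threshold are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-word sequences in the text."""
    # Lowercase and keep only letters and digits, so formatting
    # differences (spacing, line breaks, punctuation) are ignored.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical extracted text from a Word file and its PDF version.
word_text = "The parties agree to the terms set out in Schedule A."
pdf_text = "The parties agree to the  terms set out\nin Schedule A."
other_text = "Lunch is at noon on Friday in the main conference room."

THRESHOLD = 0.9  # hypothetical cutoff for flagging a near-duplicate

print(similarity(word_text, pdf_text) >= THRESHOLD)
print(similarity(word_text, other_text) >= THRESHOLD)
```

Files that score above the threshold are only candidates.  As noted above, they still need to be verified as duplicates before they are eliminated from review.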

Another way to identify duplicative content is through message thread analysis.  Many email messages are part of a larger discussion, which could be just between two parties or could include a number of parties.  Reviewing each email in the discussion thread individually would mean reviewing much of the same information over and over again.  Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.  That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about lunch plans or whether you saw The Walking Dead last night).
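As a rough illustration of how messages can be pulled together into a discussion, the sketch below groups emails by following reply links back to the first message in the thread.  It assumes each message carries an identifier and, for replies, the identifier of the message it answers (standard email carries these in the Message-ID and In-Reply-To headers).  The message records are made up, and real tools may rely on other signals as well, so this is an assumption-laden sketch and not a description of any product.

```python
from collections import defaultdict


def thread_root(message_id: str, parent_of: dict) -> str:
    """Follow reply links upward until a message has no parent in the collection."""
    seen = {message_id}
    parent = parent_of.get(message_id)
    # Stop if the parent is missing from the collection or a loop is detected.
    while parent is not None and parent in parent_of and parent not in seen:
        seen.add(parent)
        message_id = parent
        parent = parent_of.get(message_id)
    return message_id


def group_into_threads(messages: list) -> dict:
    """Map each thread's root message ID to the IDs of all messages in it."""
    parent_of = {m["id"]: m.get("in_reply_to") for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m["id"], parent_of)].append(m["id"])
    return dict(threads)


# Hypothetical collection: one discussion with a side conversation (m4),
# plus one unrelated email (m5).
messages = [
    {"id": "m1", "in_reply_to": None, "subject": "Contract draft"},
    {"id": "m2", "in_reply_to": "m1", "subject": "RE: Contract draft"},
    {"id": "m3", "in_reply_to": "m2", "subject": "RE: Contract draft"},
    {"id": "m4", "in_reply_to": "m1", "subject": "RE: Contract draft (lunch?)"},
    {"id": "m5", "in_reply_to": None, "subject": "Parking notice"},
]

threads = group_into_threads(messages)
print(threads)
```

Here the lunch side conversation (m4) lands in the same thread as the contract discussion because it replies to the original message, which matches the idea of reviewing the entire discussion together.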

CloudNine’s review platform (shameless plug warning!) is one example of an application that provides message thread analysis for Outlook emails, pulling the entire thread into one conversation for review in a popup window.  That way, you can focus your review on the last emails in each conversation to see what was said without having to review each email individually.
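For readers curious about what "the last emails in each conversation" could mean in practice, here is a small sketch that picks out the messages nobody in the collection replied to.  This is my own simplified illustration and not a description of how CloudNine's platform works; the message records are hypothetical, and it assumes each reply records the ID of the message it answers.

```python
def last_messages(messages: list) -> list:
    """Return IDs of messages that no other message in the collection replies to."""
    replied_to = {m["in_reply_to"] for m in messages if m.get("in_reply_to")}
    return [m["id"] for m in messages if m["id"] not in replied_to]


# Hypothetical conversation: m3 ends the main discussion, m4 ends a
# side conversation, and m5 is a standalone email.
conversation = [
    {"id": "m1", "in_reply_to": None},
    {"id": "m2", "in_reply_to": "m1"},
    {"id": "m3", "in_reply_to": "m2"},
    {"id": "m4", "in_reply_to": "m1"},
    {"id": "m5", "in_reply_to": None},
]

endpoints = last_messages(conversation)
print(endpoints)
```

Note that a conversation with a side discussion has more than one "last" email, one for each branch.  Focusing on those emails only covers the earlier ones if the replies actually quote the earlier text, so that is worth confirming before skipping anything.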

With message thread analysis, you can minimize review of duplicative information within emails, saving time and cost and ensuring consistency in the review.

So, what do you think?  Does your review tool support message thread analysis?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.