Skip the HASH When Deduping Outlook MSG Files – eDiscovery Best Practices

May 7, 2013

As we discussed recently in this blog, Microsoft® Outlook emails can take many forms. One of those forms is the MSG file extension, which is used to represent a self-contained unit for an individual message “family” (email and its attachments). MSG files can exist on your computer in the same folders as Word, Excel and other data files. But, when it comes to deduping those MSG files, the approach is typically different.

A few years ago, I was assisting a client and collecting emails from their email archiving system for discovery, outputting the selected emails to individual MSG files (per their request). Because this was an enterprise-wide search of email archives, the searches that I performed found the same emails again and again in different custodian folders. There were literally hundreds of thousands of duplicate emails in this collection. Of course, this is typical: anytime you send an email to three co-workers, all four of you have a copy of the email (assuming none of you deleted it). If the email is responsive and your goal is to dedupe across custodians, you only want to review and produce one copy, not four.

However, had I performed a HASH value identification of duplicates on those output MSG files, I would have found no duplicates. Why is that?

That’s because each MSG file contains a field which stores the Creation Date and Time. Because this value will be set at the date and time the MSG is saved, two emails with otherwise identical content will not be considered duplicates based on the HASH value. Remember how “drag and drop” sets the Creation Date and Time of the copy to the current date and time? The same thing happens when an MSG file is created.
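To see why a single differing field is enough to defeat HASH-based deduping, here is a minimal Python sketch. It does not parse real MSG files; the byte strings below are made-up stand-ins (an assumption for illustration only) for two saved copies of the same email whose only difference is the embedded creation timestamp.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for the message content shared by both copies.
# A real MSG file is a binary container, not a text string like this.
message = b"From: alice@example.com|Subject: Q3 forecast|Body: See attached."

# The same email saved twice, two seconds apart. Only the embedded
# creation timestamp differs between the two files.
copy_a = message + b"|Created: 2013-05-07T09:15:00"
copy_b = message + b"|Created: 2013-05-07T09:15:02"

hash_a = file_hash(copy_a)
hash_b = file_hash(copy_b)

# Any change to the bytes, however small, produces a different digest,
# so a HASH comparison treats these two copies as unrelated files.
print(hash_a == hash_b)
```

The same reasoning applies to any file-level hash algorithm: the hash covers every byte in the file, including the ones that have nothing to do with the content of the email.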

Hmmm, what to do? Typically, the approach for MSG files is to use key metadata fields to identify duplicates. Many processing vendors use a typical combination of fields that consists of: From, To, CC, BCC, Subject, Attachment Name, Sent Date/Time and Body of the email. Some use those fields only on MSG files; others use them on all emails (to dedupe individual emails within MSG files against those same emails within an OST or a PST file).
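The metadata approach can be sketched in a few lines of Python. This is only an illustration, not any particular vendor's method: the Email class, the dedupe_key and dedupe functions, and the normalization choices (lowercasing addresses, sorting recipients, collapsing whitespace in the body) are all assumptions made for the example. It also assumes the fields have already been extracted from each MSG file by some other tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical record of fields already extracted from an MSG file."""
    sender: str
    to: tuple
    cc: tuple
    bcc: tuple
    subject: str
    attachment_names: tuple
    sent: str            # sent date/time as text, e.g. "2013-05-07T09:15:00"
    body: str
    custodian: str = ""  # where the copy was found; NOT part of the key


def _join(values) -> str:
    # Sort and lowercase so recipient order and letter case do not matter.
    return ";".join(sorted(v.strip().lower() for v in values))


def dedupe_key(email: Email) -> str:
    """Hash only the key metadata fields, never the file's creation time."""
    parts = [
        email.sender.strip().lower(),
        _join(email.to),
        _join(email.cc),
        _join(email.bcc),
        email.subject.strip(),
        _join(email.attachment_names),
        email.sent.strip(),
        " ".join(email.body.split()),  # collapse whitespace differences
    ]
    # Join with a separator unlikely to appear inside any field.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dedupe(emails):
    """Keep the first copy seen for each key (dedupe across custodians)."""
    unique = {}
    for email in emails:
        unique.setdefault(dedupe_key(email), email)
    return list(unique.values())
```

Because the custodian and the file's creation date are left out of the key, the four copies of an email sent to three co-workers would collapse to one. How aggressively to normalize each field is a judgment call, and different tools may make different choices.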

So, if you’re hungry to eliminate duplicates from your collection of MSG files, skip the HASH and use the metadata fields. It’s much more (ful)filling.

So, what do you think? Have you encountered any challenges when it comes to deduping emails? Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
