eDiscovery Daily Blog
Data Needs to Be Converted More Often than You Think – eDiscovery Best Practices
We’ve discussed previously that electronic files aren’t necessarily ready to review just because they’re electronic. They often need processing, and good processing requires a sound process. Sometimes that process includes data conversion if the data isn’t in the most useful format.
Case in point: I recently worked with a client that received a multi-part production from the other side (via another party involved in the litigation, per agreement between the parties) that included image files, OCR text files and metadata. The files that my client received were produced over several months to several other parties in the litigation. The production contained numerous emails, each of which (of course) included an email sent date. Can you guess which format the email sent date was provided in? Here are some choices (using today’s date and 1:00 PM as an example):
- 09/03/2013 13:00:00
- 9/03/2013 1:00 PM
- September 3, 2013 1:00 PM
- Sep-03-2013 1:00 PM
- 2013/09/03 13:00:00
The answer: all of them.
Because there were several productions to different parties with (apparently) different format agreements, my client didn’t have the option to request the data to be reproduced in a standard format. Not only that, the name of the produced metadata field wasn’t consistent between productions: in about 15 percent of the documents the producing party named the field email_date_sent; in the rest it was named date_sent.
Ever try to sort emails chronologically when the dates are not only in different formats, but also in two different fields? It’s impossible. Fortunately, at CloudNine Discovery, there is no shortage of computer “geeks” to address problems like this (I’m admittedly one of them).
As a result, we had to standardize the dates into a single format in a single field. We used SQL queries to consolidate the data into one field, along with string commands and regular expressions to re-parse dates that didn’t fit a standard SQL date format. For example, the date 2013/09/03 was reparsed into 09/03/2013.
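To make the idea concrete, here is a minimal sketch of the same kind of normalization in Python rather than the SQL and regular expressions we actually used. The function names (parse_sent_date, sent_date) are hypothetical, and the sketch assumes that the slash-delimited dates are month-first (US style), that records are simple dictionaries keyed by the two field names, and that month names and AM/PM markers are in English.

```python
from datetime import datetime

# The five layouts listed above, expressed as strptime patterns.
# Assumption: slash-delimited dates are month/day/year.
KNOWN_FORMATS = [
    "%m/%d/%Y %H:%M:%S",   # 09/03/2013 13:00:00
    "%m/%d/%Y %I:%M %p",   # 9/03/2013 1:00 PM
    "%B %d, %Y %I:%M %p",  # September 3, 2013 1:00 PM
    "%b-%d-%Y %I:%M %p",   # Sep-03-2013 1:00 PM
    "%Y/%m/%d %H:%M:%S",   # 2013/09/03 13:00:00
]

# One output layout for every record.
STANDARD_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_sent_date(raw):
    """Try each known layout in turn; return a datetime, or None if none fit."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # leave unrecognized values for manual review


def sent_date(record):
    """Read the sent date from whichever of the two field names is populated."""
    raw = record.get("date_sent") or record.get("email_date_sent")
    return parse_sent_date(raw) if raw else None
```

With every value parsed into a real date, sorting chronologically is a matter of sorting on the result of sent_date, and formatting each result with STANDARD_FORMAT gives one consistent text representation. Values that match none of the known layouts come back as None, so they can be set aside and inspected rather than silently guessed at.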
Getting the dates into a standard format in a single field not only enabled us to sort the emails chronologically by date sent; it also enabled us to identify duplicates in the collection by comparing the sent date in combination with other standard email metadata fields. (Since the data was in image and OCR formats, hash algorithms weren’t a viable option for de-duplication.)
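As an illustration of metadata-based duplicate identification, here is a rough Python sketch. It is not the method we used on this matter. The field names (doc_id, from, to, subject, sent) and the choice of which fields make up the comparison key are assumptions for the example, and sent is assumed to already hold the standardized sent date.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(str(value or "").lower().split())


def dedupe_key(record):
    """Build a comparison key from metadata fields (field choice is an assumption)."""
    return (
        normalize(record.get("from")),
        normalize(record.get("to")),
        normalize(record.get("subject")),
        record.get("sent"),  # assumed to be the already-standardized sent date
    )


def group_duplicates(records):
    """Return lists of doc_ids that share the same metadata key."""
    groups = defaultdict(list)
    for record in records:
        groups[dedupe_key(record)].append(record["doc_id"])
    # Only keys shared by two or more documents are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Because the match rests on metadata rather than file content, groups found this way are best treated as duplicate candidates to be confirmed, not as proven duplicates.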
Over the years, I’ve seen many examples where data (either from our side or the other side) needs to be converted. It happens more than you think. When that happens, it’s good to have a computer “geek” on your side to address the problem.
So, what do you think? Have you encountered data conversion issues in your cases? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.