Processing Archives

Are You Scared Yet? – eDiscovery Horrors!

October 31, 2013

Today is Halloween. Every year at this time, because (after all) we’re an eDiscovery blog, we try to “scare” you with tales of eDiscovery horrors. So, I have one question: Are you scared yet?

Did you know that there have been over 3.4 sextillion bytes created in the Digital Universe since the beginning of the year, and data in the world will grow nearly three times as much from 2012 to 2017? How do you handle your own growing universe of data?

What about this?

The proposed blended hourly rate was $402 for firm associates and $632 for firm partners. However, the firm asked for contract attorney hourly rates as high as $550 with a blended rate of $466.

How about this?

You’ve got an employee suing her ex-employer for discrimination, hostile work environment and being forced to resign. During discovery, it was determined that a key email was deleted due to the employer’s routine auto-delete policy, so the plaintiff filed a motion for sanctions. Sound familiar? Yep. Was her motion granted? Nope.

Or maybe this?

After identifying custodians relevant to the case and collecting files from each, you’ve collected roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose electronic files from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!!

Scary, huh? If the possibility of exponential data growth, vendors holding data hostage and billable review rates of $466 per hour keeps you awake at night, then the folks at eDiscovery Daily will do our best to provide useful information and best practices to enable you to relax and sleep soundly, even on Halloween!

Then again, if the expense, difficulty and risk of processing and loading up to 100 GB of data into an eDiscovery review application that you’ve never used before terrifies you, maybe you should check this out.

Of course, if you seriously want to get into the spirit of Halloween, click here. This will really terrify you!

What do you think? Is there a particular eDiscovery issue that scares you? Please share your comments and let us know if you’d like more information on a particular topic.

Happy Halloween!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Plaintiffs’ Supreme Effort to Recuse Judge Peck in Da Silva Moore Denied – eDiscovery Case Law

October 30, 2013

As we discussed back in July, attorneys representing lead plaintiff Monique Da Silva Moore and five other employees filed a petition for a writ of certiorari with the US Supreme Court arguing that New York Magistrate Judge Andrew Peck, who approved an eDiscovery protocol agreed to by the parties that included predictive coding technology, should have recused himself given his previous public statements expressing strong support of predictive coding. Earlier this month, on October 7, that petition was denied by the Supreme Court.

Da Silva Moore and her co-plaintiffs had argued in the petition that the Second Circuit Court of Appeals was too deferential to Peck when denying the plaintiffs’ petition to recuse him, asking the Supreme Court to order the Second Circuit to use the less deferential “de novo” standard.

The plaintiffs have now been denied in their recusal efforts in four courts. Here is the link to the Supreme Court docket item, referencing denial of the petition.

This battle over predictive coding and Judge Peck’s participation has continued for over 18 months. For those who may have not been following the case or may be new to the blog, here’s a recap.

Last year, back in February, Judge Peck issued an opinion that made this likely the first case to accept the use of computer-assisted review of electronically stored information (“ESI”). However, on March 13, District Court Judge Andrew L. Carter, Jr. granted the plaintiffs’ request to submit additional briefing on their February 22 objections to the ruling. In that briefing (filed on March 26), the plaintiffs claimed that the protocol approved for predictive coding “risks failing to capture a staggering 65% of the relevant documents in this case” and questioned Judge Peck’s relationship with defense counsel and with the selected vendor for the case, Recommind.

Then, on April 5, 2012, Judge Peck issued an order in response to Plaintiffs’ letter requesting his recusal, directing plaintiffs to indicate whether they would file a formal motion for recusal or ask the Court to consider the letter as the motion. On April 13, (Friday the 13th, that is), the plaintiffs did just that, by formally requesting the recusal of Judge Peck (the defendants issued a response in opposition on April 30). But, on April 25, Judge Carter issued an opinion and order in the case, upholding Judge Peck’s opinion approving computer-assisted review.

Not done, the plaintiffs filed an objection on May 9 to Judge Peck’s rejection of their request to stay discovery pending the resolution of outstanding motions and objections (including the recusal motion, which had yet to be ruled on). Then, on May 14, Judge Peck issued a stay, stopping defendant MSLGroup’s production of electronically stored information. On June 15, in a 56-page opinion and order, Judge Peck denied the plaintiffs’ motion for recusal. Judge Carter ruled on the plaintiffs’ recusal request on November 7 of last year, denying the request and stating that “Judge Peck’s decision accepting computer-assisted review … was not influenced by bias, nor did it create any appearance of bias”.

The plaintiffs then filed a petition for a writ of mandamus with the US Court of Appeals for the Second Circuit, which was denied this past April, leading to their petition for a writ of certiorari with the US Supreme Court, which has now also been denied.

So, what do you think? Will we finally move on to the merits of the case? Please share any comments you might have or if you’d like to know more about a particular topic.

For Successful Discovery, Think Backwards – eDiscovery Best Practices

October 8, 2013

The Electronic Discovery Reference Model (EDRM) has become the standard model for the workflow of handling electronically stored information (ESI) in discovery. But, to succeed in discovery, regardless of whether you’re the producing party or the receiving party, it might be helpful to think about the EDRM model backwards.

Why think backwards?

You can’t have a successful outcome without envisioning the successful outcome that you want to achieve. The end of the discovery process includes the production and presentation stages, so it’s important to determine what you want to get out of those stages. Let’s look at them.

Presentation

As a receiving party, it’s important to think about what types of evidence you need to support your case when presenting at depositions and at trial – this is the type of information that needs to be included in your production requests at the beginning of the case.

Production

The format of the ESI produced is important to both sides in the case. For the receiving party, it’s important to get as much useful information included in the production as possible. This includes metadata and searchable text for the produced documents, typically with an index or load file to facilitate loading into a review application. The most useful form of production is native format files with all metadata preserved as used in the normal course of business.

For the producing party, it’s important to save costs, so it’s important to agree to a production format that minimizes production costs. Converting files to an image based format (such as TIFF) adds costs, so producing in native format can be cost effective for the producing party as well. It’s also important to determine how to handle issues such as privilege logs and redaction of privileged or confidential information.

Addressing production format issues up front will maximize cost savings and enable each party to get what they want out of the production of ESI.

Processing-Review-Analysis

It also pays to make the decisions that affect processing, review and analysis early in the process. How should exception files be handled? What do you do about files that are infected with malware? These are examples of issues that need to be decided up front to determine how processing will be handled.

As for review, the review tool being used may impact production specs in terms of how files are viewed and production of load files that are compatible with the review tool, among other considerations. As for analysis, surely you test search terms to determine their effectiveness before you agree on those terms with opposing counsel, right?

Preservation-Collection-Identification

Long before you have to conduct preservation and collection for a case, you need to establish procedures for implementing and monitoring litigation holds, as well as prepare a data map to identify where corporate information is stored for identification, preservation and collection purposes.

As you can see, at the beginning of a case (and even before), it’s important to think backwards within the EDRM model to ensure a successful discovery process. Decisions made at the beginning of the case affect the success of those later stages, so don’t forget to think backwards!

So, what do you think? What do you do at the beginning of a case to ensure success at the end? Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. — Notice anything different about the EDRM graphic?

eDiscovery Daily is Three Years Old!

September 20, 2013

We’ve always been free, now we are three!

It’s hard to believe that it was three years ago today that we launched the eDiscoveryDaily blog. We’re past the “terrible twos” and heading towards pre-school. Before you know it, we’ll be ready to take our driver’s test!

We have seen traffic on our site (from our first three months of existence to our most recent three months) grow an amazing 575%! Our subscriber base has grown over 50% in the last year alone! Back in June, we hit over 200,000 visits on the site and now we have over 236,000!

We continue to appreciate the interest you’ve shown in the topics and will do our best to continue to provide interesting and useful posts about eDiscovery trends, best practices and case law. That’s what this blog is all about. And, in each post, we like to ask for you to “please share any comments you might have or if you’d like to know more about a particular topic”, so we encourage you to do so to make this blog even more useful.

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan College, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, Litigation Support Technology & News, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, Learn About E-Discovery, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!). We really appreciate it!

As many of you know by now, we like to take a look back every six months at some of the important stories and topics during that time. So, here are some posts over the last six months you may have missed. Enjoy!

Rodney Dangerfield might put it this way – “I Tell Ya, Information Governance Gets No Respect”

Is it Time to Ditch the Per Hour Model for Document Review? Here’s some food for thought.

Is it Possible for a File to be Modified Before it is Created? Maybe, but here are some mechanisms for avoiding that scenario (here, here, here, here, here and here). Best of all, they’re free.

Did you know changes to the Federal eDiscovery Rules are coming? Here’s some more information.

Count Minnesota and Kansas among the states that are also making changes to support eDiscovery.

By the way, since the Electronic Discovery Reference Model (EDRM) annual meeting back in May, several EDRM projects (Metrics, Jobs, Data Set and the new Native Files project) have already announced new deliverables and/or requested feedback.

When it comes to electronically stored information (ESI), ensuring proper chain of custody tracking is an important part of handling that ESI through the eDiscovery process.

Do you self-collect? Don’t Forget to Check for Image Only Files!

The Files are Already Electronic, How Hard Can They Be to Load? A sound process makes it easier.

When you remove a virus from your collection, does it violate your discovery agreement?

Do you think that you’ve read everything there is to read on Technology Assisted Review? If you missed anything, it’s probably here.

Consider using a “SWOT” analysis or Decision Tree for better eDiscovery planning.

If you’re an eDiscovery professional, here is what you need to know about litigation.

BTW, eDiscovery Daily has had 242 posts related to eDiscovery Case Law since the blog began! Forty-four of them have been in the last six months.

Our battle cry for next September? “Four more years!” 🙂

Data Needs to Be Converted More Often than You Think – eDiscovery Best Practices

September 3, 2013

We’ve discussed previously that electronic files aren’t necessarily ready to review just because they’re electronic. They often need processing and good processing requires a sound process. Sometimes that process includes data conversion if the data isn’t in the most useful format.

Case in point: I recently worked with a client that received a multi-part production from the other side (via another party involved in the litigation, per agreement between the parties) that included image files, OCR text files and metadata. The files that my client received were produced over several months to several other parties in the litigation. The production contained numerous emails, each of which (of course) included an email sent date. Can you guess which format the email sent date was provided in? Here are some choices (using today’s date and 1:00 PM as an example):

09/03/2013 13:00:00
9/03/2013 1:00 PM
September 3, 2013 1:00 PM
Sep-03-2013 1:00 PM
2013/09/03 13:00:00

The answer: all of them.

Because there were several productions to different parties with (apparently) different format agreements, my client didn’t have the option to request the data to be reproduced in a standard format. Not only that, the name of the produced metadata field wasn’t consistent between productions – in about 15 percent of the documents the producing party named the field email_date_sent, in the rest it was named date_sent.

Ever try to sort emails chronologically when they’re not only in different formats, but also in two different fields? It’s impossible. Fortunately, at CloudNine Discovery, there is no shortage of computer “geeks” to address problems like this (I’m admittedly one of them).

As a result, we had to convert the dates into one standard format in a single field. We used a combination of SQL queries to get the data into one field and string commands and regular expressions to re-parse dates that didn’t fit a standard SQL date format into a correct date format. For example, the date 2013/09/03 was reparsed into 09/03/2013.
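To illustrate the idea, here is a minimal sketch in Python rather than the SQL and regular expressions we actually used. It assumes the five formats listed above are the only ones present and that the month names and AM/PM markers are in English (Python's `%B`, `%b` and `%p` depend on the locale). The function names are hypothetical; the two field names are the ones from the productions described above.

```python
from datetime import datetime

# The five formats from the example list above.
KNOWN_FORMATS = [
    "%m/%d/%Y %H:%M:%S",   # 09/03/2013 13:00:00
    "%m/%d/%Y %I:%M %p",   # 9/03/2013 1:00 PM
    "%B %d, %Y %I:%M %p",  # September 3, 2013 1:00 PM
    "%b-%d-%Y %I:%M %p",   # Sep-03-2013 1:00 PM
    "%Y/%m/%d %H:%M:%S",   # 2013/09/03 13:00:00
]

def normalize_sent_date(raw):
    """Try each known format in turn; return a datetime, or None if none fit."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # log these as exceptions instead of guessing

def sent_date(record):
    """Read the sent date from whichever of the two field names is populated."""
    raw = record.get("date_sent") or record.get("email_date_sent")
    return normalize_sent_date(raw) if raw else None

record = {"email_date_sent": "Sep-03-2013 1:00 PM"}
print(sent_date(record).strftime("%m/%d/%Y %H:%M:%S"))  # 09/03/2013 13:00:00
```

Returning None for anything unrecognized is a deliberate choice in this sketch: a date that can't be parsed is better reviewed by a person than silently assigned a wrong value.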

Getting the dates into a standard format in a single field not only enabled us to sort the emails chronologically by date sent, it also enabled us to identify (in combination with other standard email metadata fields) duplicates in the collection based on metadata fields (since the data was in image and OCR formats, hash algorithms weren’t a viable option for de-duplication).
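A rough sketch of metadata-based duplicate identification might look like the following. The field names (`doc_id`, `sent`, `sender`, `subject`) are assumptions for illustration only, not the fields from the actual production, and the `sent` value is assumed to be already standardized.

```python
from collections import defaultdict

def duplicate_groups(records, key_fields=("sent", "sender", "subject")):
    """Group document IDs whose chosen metadata fields all match."""
    groups = defaultdict(list)
    for rec in records:
        # Trim and lowercase so trivial differences don't hide a match.
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(rec["doc_id"])
    # Only groups with more than one document are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]

emails = [
    {"doc_id": "A1", "sent": "09/03/2013 13:00:00", "sender": "jdoe@example.com", "subject": "Q3 forecast"},
    {"doc_id": "B7", "sent": "09/03/2013 13:00:00", "sender": "JDoe@example.com", "subject": "Q3 forecast "},
    {"doc_id": "C2", "sent": "09/04/2013 08:15:00", "sender": "jdoe@example.com", "subject": "Q3 forecast"},
]
print(duplicate_groups(emails))  # [['A1', 'B7']]
```

Matches found this way are candidates, not certainties, so it's worth spot-checking a sample of the groups before relying on them.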

Over the years, I’ve seen many examples where data (either from our side or the other side) needs to be converted. It happens more than you think. When that happens, it’s good to have a computer “geek” on your side to address the problem.

So, what do you think? Have you encountered data conversion issues in your cases? Please share any comments you might have or if you’d like to know more about a particular topic.

How Big is Your ESI Collection, Really? – eDiscovery Best Practices

August 26, 2013

When I was at ILTA last week, this topic came up in a discussion with a colleague during the show, so I thought it would be good to revisit here.

After identifying custodians relevant to the case and collecting files from each, you’ve collected roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose electronic files from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!! Are they trying to overbill you?

Yes and no.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files. For example, while Outlook emails can be stored in different file formats, they are typically collected from each custodian and saved in a personal storage (.PST) file format, which is an expanding container file. The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space? The PST file is like one of those vacuum bags – it often stores the emails and attachments in a compressed format to save space. There are other types of archive container files that compress the contents – .ZIP and .RAR files are two examples of compressed container files. These files are often used not only to compress files for storage on hard drives, but also to compact or group a set of files when transmitting them, often in email. With email comprising a major portion of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.

When PST, ZIP, RAR or other compressed file formats are processed for loading into a review tool, they are expanded into their normal size. This expanded size can be 1.5 to 2 times larger than the scanned size (or more). And, that’s what some vendors will bill processing on – the expanded size. In those cases, you won’t know what the processing costs will be until the data is expanded, since the expanded size is difficult to determine before processing is complete.
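For ZIP files, at least, you can get a rough idea of the difference yourself. The sketch below uses Python's standard `zipfile` module to compare the size on disk with the total uncompressed size recorded in the archive. It is only an illustration: it doesn't handle PST or RAR files (those need other tools), and it doesn't look inside containers nested within the ZIP. The function name and the file name are hypothetical.

```python
import os
import zipfile

def scanned_and_expanded_size(zip_path):
    """Return (size on disk, total uncompressed size) in bytes for one ZIP file."""
    scanned = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size recorded for each entry.
        expanded = sum(info.file_size for info in archive.infolist())
    return scanned, expanded

if __name__ == "__main__":
    path = "custodian_files.zip"  # hypothetical file name
    if os.path.exists(path):
        scanned, expanded = scanned_and_expanded_size(path)
        print(f"scanned: {scanned:,} bytes, expanded: {expanded:,} bytes")
```

Running something like this over the ZIP files in a collection before you send it out won't give you the vendor's exact number, but it can tell you whether to expect a surprise.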

It’s important to be prepared for that and know your options when processing that data. Make sure your vendor selection criteria include questions about how processing is billed: on the scanned or the expanded size. Some vendors (like the company I work for, CloudNine Discovery) do bill based on the scanned size of the collection for processing, so shop around to make sure you’re getting the best deal from your vendor.

So, what do you think? Have you ever been surprised by processing costs of your ESI? Please share any comments you might have or if you’d like to know more about a particular topic.

A Technical Explanation of Near-Dupes – eDiscovery Tutorial

August 9, 2013

Bill Dimm provides a comprehensive and interesting description of near-dupes and the algorithms used to identify them in his Clustify blog (What is a near-dupe, really?). If you want to understand the “three reasonable, but different, ways of defining the near-dupe similarity between two documents”, bring your brain and check it out.

As we discussed last month, just because information volume in most organizations doubles every 18-24 months doesn’t mean that it’s all original. When reviewers are reviewing the same data again and again, it’s unnecessarily expensive and prone to mistakes.

As Bill notes in his post, “Near-duplicates are documents that are nearly, but not exactly, the same. They could be different revisions of a memo where a few typos were fixed or a few sentences were added. They could be an original email and a reply that quotes the original and adds a few sentences. They could be a Microsoft Word document and a printout of the same document that was scanned and OCRed with a few words not matching due to OCR errors.” I also classify examples such as a Word document published to an Adobe PDF file (where the content is the same, but the file format is different, so the hash value will be different) as near-duplicates because they won’t be de-duped with an MD5 or SHA-1 hash algorithm at the file level. You need an algorithm that looks for similarity in the document content.
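Here is a small sketch of why file-level hashing misses near-duplicates, using Python's standard `hashlib` module and two made-up sentences standing in for two revisions of a memo. A single changed word produces a completely different SHA-1 value, while an exact copy produces the same one.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash of the given bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()

# Made-up content: two revisions of the same memo and one exact copy.
memo_v1 = b"Please review the attached memo before Friday."
memo_v2 = b"Please review the attached memo before Monday."
exact_copy = bytes(memo_v1)

print(sha1_hex(memo_v1) == sha1_hex(exact_copy))  # True
print(sha1_hex(memo_v1) == sha1_hex(memo_v2))     # False
```

The hash only answers "are these files identical?", which is why it works for exact de-duplication and not for near-dupes.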

Identifying near-duplicates that contain almost the same information reduces redundant review and saves costs. A recent client of mine had over 800,000 emails belonging to near-duplicate groupings that would have been impossible to identify without an effective algorithm to group them together.

Bill’s blog post goes on to discuss different methods for measuring similarity using mechanisms like a Jaccard index and a MinHash algorithm which counts shingles (don’t worry, they’re neither painful nor scaly). Understanding how your near-dupe software works is important. As Bill notes, “If misunderstandings about how the algorithm works cause the similarity values generated by the software to be higher than you expected when you chose the similarity threshold, you risk tagging near-dupes of non-responsive documents incorrectly (grouped documents are not as similar as you expected). If the similarity values are lower than you expected when you chose the threshold, you risk failing to group some highly similar documents together, which leads to less efficient review (extra groups to review).” His post is an excellent primer to developing that understanding.
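To make "shingles" and the Jaccard index a little more concrete, here is a minimal sketch. It computes the exact Jaccard index over three-word shingles, which is the value a MinHash algorithm estimates more cheaply on large collections. The shingle size, the function names and the sample sentences are my assumptions for illustration; this is not necessarily how Clustify or any other product defines similarity.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(text_a, text_b, k=3):
    """Jaccard index: shingles in common divided by all distinct shingles."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0  # treat two empty documents as identical
    return len(a & b) / len(a | b)

original = "please send the signed contract to legal by friday"
revision = "please send the signed contract to legal by monday"
print(round(jaccard(original, revision), 2))  # 0.75
```

Notice that changing one word out of nine drops the similarity to 0.75, because that word only touches one of the seven shingles in each document. That is exactly the kind of behavior you want to understand before you pick a similarity threshold.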

So, what do you think? Do you have a plan for handling near-duplicates in your collection? Please share any comments you might have or if you’d like to know more about a particular topic.

I Removed a Virus, Did I Just Violate My Discovery Agreement? – eDiscovery Best Practices

August 8, 2013

As we discussed last month, working with electronic files in a review tool is NOT just simply a matter of loading the files and getting started. Electronic files are diverse and can represent a whole collection of issues to address in order to process them for loading, and processing them effectively requires a sound process. But, what if the evidentiary files you collect from your custodians contain viruses or other malware?

It’s common to refer to all types of malware as “viruses”, but a computer virus is only one type of malware. Malware includes computer viruses, worms, trojan horses, spyware, dishonest adware, scareware, crimeware, most rootkits, and other malicious and unwanted software or programs. A report from 2008 stated that more malicious code and other unwanted programs were being created than legitimate software applications. If you’ve ever had to attempt to remove files from an infected computer, you’ve seen just how prolific different types of malware can be.

Having worked with a lot of clients who don’t understand why it can take time to get ESI processed and loaded into their review platform, I’ve had to spend some time educating those clients as to the various processes required (including those we discussed last month). Before any of those processes can happen, you must first scan the files for viruses and other malware that may be infecting those files. If malware is found in any files, one of two things must happen:

Attempt to remove the malware with virus protection software, or
Isolate and log the infected files as exceptions (which you will also have to do if the virus protection software fails to remove the malware).

So, let’s get started, right? Not so fast.

While it may seem logical that the malware should always be removed, doing so is technically altering the file. It’s important to address how malware should be handled as part of the Rule 26(f) “meet and confer” conference, so neither party can be accused of spoliating data when removing malware from potentially discoverable files. If both sides agree that malware removal is acceptable, there still needs to be a provision to handle files for which malware removal attempts fail (i.e., exception logs). Regardless, the malware needs to be addressed so that it doesn’t affect the entire collection.
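The decision flow described above can be sketched roughly as follows. The `is_infected` and `try_clean` functions are hypothetical stand-ins for whatever virus protection software you use (this is not a real scanner API), and `removal_agreed` reflects what the parties decided at the meet and confer.

```python
def triage(paths, is_infected, try_clean, removal_agreed):
    """Split files into those ready for processing and those logged as exceptions."""
    ready, exception_log = [], []
    for path in paths:
        if not is_infected(path):
            ready.append(path)
        elif removal_agreed and try_clean(path):
            # Removal was agreed to and succeeded, so the file can be processed.
            ready.append(path)
        else:
            # Removal wasn't agreed to, or the attempt failed: isolate and log.
            exception_log.append(path)
    return ready, exception_log

# Made-up example: two infected files, only one of which can be cleaned.
infected = {"budget.xls", "mail.pst"}
cleanable = {"budget.xls"}
files = ["memo.doc", "budget.xls", "mail.pst"]

print(triage(files, infected.__contains__, cleanable.__contains__, removal_agreed=True))
# (['memo.doc', 'budget.xls'], ['mail.pst'])
```

With `removal_agreed=False`, both infected files would land in the exception log, which is the point: the agreement, not the software, decides whether the file gets altered.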

By the way, malware can hit anybody, as I learned (the hard way) a couple of years ago.

So, what do you think? How do you handle malware in your negotiations with opposing counsel and in your ESI collections? Please share any comments you might have or if you’d like to know more about a particular topic.

Good Processing Requires a Sound Process – eDiscovery Best Practices

July 26, 2013

As we discussed yesterday, working with electronic files in a review tool is NOT just simply a matter of loading the files and getting started. Electronic files are diverse and can represent a whole collection of issues to address in order to process them for loading. To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production. As a result, we are aware of some of the information that must be provided by the client to ensure that the resulting processed data meets their needs and have created an EDD processing spec sheet to gather that information before processing. Examples of information we collect from our clients:

Do you need de-duplication? If so, should it be performed at the case or the custodian level?
Should Outlook emails be extracted in MSG or HTM format?
What time zone should we use for email extraction? Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT). If you don’t think that matters, consider this example.
Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text? If we don’t OCR those files, these could be responsive files that are missed during searching.
If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
Should the collection be culled based on a responsive date range?
Should the collection be culled based on key terms?

Those are some general examples for native processing. If the client requests creation of image files (many still do, despite the well documented advantages of native files), there are a number of additional questions we ask regarding the image processing. Some examples:

Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non text-searchable PDF?
Should color images be created when appropriate?
Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?
Should we create images of Excel files? If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
Should we endorse the images? If so, how?

Those are just some examples. Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use. Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data. Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.
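As one concrete illustration, the de-duplication question above (case level or custodian level) can change how many documents end up in review. The following is a rough Python sketch under stated assumptions: the Doc structure and sample data are hypothetical, and it hashes raw file content, whereas real processing tools typically hash emails on a combination of metadata fields.

```python
import hashlib
from typing import NamedTuple


class Doc(NamedTuple):
    """Hypothetical minimal record for a collected file."""
    custodian: str
    name: str
    content: bytes


def dedupe(docs, level="case"):
    """Keep the first copy of each document; level is 'case' or 'custodian'."""
    if level not in ("case", "custodian"):
        raise ValueError("level must be 'case' or 'custodian'")
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        # Case level: one hash space for the whole matter.
        # Custodian level: a separate hash space per custodian.
        key = digest if level == "case" else (doc.custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    Doc("alice", "budget.xlsx", b"Q3 budget"),
    Doc("alice", "budget copy.xlsx", b"Q3 budget"),
    Doc("bob", "budget.xlsx", b"Q3 budget"),
    Doc("bob", "memo.docx", b"Memo text"),
]

case_level = dedupe(docs, level="case")
custodian_level = dedupe(docs, level="custodian")
print(len(case_level), len(custodian_level))
```

With this sample data, case-level de-duplication keeps two documents, while custodian-level de-duplication keeps three because each custodian retains a copy of the shared file.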

So, what do you think? How do you handle processing of electronic files? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Files are Already Electronic, How Hard Can They Be to Load? – eDiscovery Best Practices

July 25, 2013

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started. Unfortunately, it’s not that simple!

Back when most discovery was paper-based, the usefulness of the documents was understandably limited. Documents were paper and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately) and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.). It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is. They don’t just represent one problem; they can represent a whole collection of problems. For example:

Image-only electronic files such as TIFF or image-only PDF files may be electronic, but they still have no searchable text. They still require OCR to generate text before they can be searched effectively. It’s important to account for image-only files when self-collecting, because keyword searches will miss these files.
Outlook emails are typically stored in a “container” file like an EDB (Exchange Database), OST (Outlook Offline Storage Table) or PST (Outlook Personal Storage Table). To work with the emails individually, they typically require processing to break them out into individual MSG files. That processing is also necessary to break out the attachments from the emails so that they can be reviewed or categorized individually, if required. And, if the emails are stored in Lotus Notes, there is no equivalent single-message format, so those emails generally require conversion to HTML format during processing.
Databases are large, structured collections of data, but they don’t relate easily to a document format, so they require some analysis to determine if, and in what form, they should be produced.
In almost every collection, there are some files that cannot be processed or searched. Corrupt files, password-protected files and other types of exception files are frequent components of your ESI collection and it can become very expensive to make these files searchable or reviewable.
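The categories above can be thought of as a routing decision made for each file. Here is a simplified Python sketch of that idea. The extension lists, route names and the has_text, is_encrypted and is_corrupt flags are all assumptions made for illustration (a real tool would inspect the files themselves rather than trust extensions), so treat it as a rough outline rather than an actual processing workflow.

```python
from pathlib import Path

# Assumed extension groupings, for illustration only.
CONTAINERS = {".pst", ".ost", ".edb", ".nsf"}
DATABASES = {".mdb", ".accdb", ".sqlite"}
IMAGES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def route(filename, has_text=True, is_encrypted=False, is_corrupt=False):
    """Pick a hypothetical processing route for one collected file."""
    ext = Path(filename).suffix.lower()
    if is_corrupt or is_encrypted:
        # Exception files: log them so someone can decide what to do next.
        return "exception_log"
    if ext in CONTAINERS:
        # Email containers must be broken out into messages and attachments.
        return "extract_messages"
    if ext in DATABASES:
        # Databases need analysis before any production format is chosen.
        return "manual_analysis"
    if ext in IMAGES or (ext == ".pdf" and not has_text):
        # Image-only files need OCR before keyword searches can find them.
        return "ocr"
    return "standard_processing"


print(route("mailbox.PST"))
print(route("scan.pdf", has_text=False))
print(route("locked.xlsx", is_encrypted=True))
```

Even in this toy version, a single collection fans out into several different workflows, which is the point of the list above.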

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward. Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom. They’re way more useful than paper documents. So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available. But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think? Have you encountered difficulties or challenges when processing electronic files? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

