Review Archives

What is “Reduping?” – eDiscovery Explained

October 28, 2013

We’ve talked about “reduping” before, but since this question came up with a client recently, I thought it was worth revisiting.

As emails are sent out to multiple custodians, deduplication (or “deduping”) has become a common practice to eliminate multiple copies of the same email or file from the review collection, saving considerable review costs and ensuring consistency by not having different reviewers apply different responsiveness or privilege determinations to the same file (e.g., one copy of a file designated as privileged while the other is not may cause a privileged file to slip into the production set). Deduping can be performed either across custodians in a case or within each custodian.

Everyone who works in electronic discovery knows what “deduping” is. But how many of you know what “reduping” is? Here’s the answer:

“Reduping” is the process of re-introducing duplicates back into the population for production after completing review. There are a couple of reasons why a producing party may want to “redupe” the collection after review:

Deduping Not Requested by Receiving Party: As opposing parties in many cases still don’t conduct a meet and confer or discuss specifications for production, they may not have discussed whether or not to include duplicates in the production set. In those cases, the producing party may choose to produce the duplicates, giving the receiving party more files to review and driving up their costs (yes, it still happens). The attitude of the producing party can be “hey, they didn’t specify, so we’ll give them more than they asked for.”
Receiving Party May Want to See Who Has Copies of Specific Files: Sometimes, the receiving party does request that “dupes” are identified, but only within custodians, not across them. In those cases, it’s because they want to see who had a copy of a specific email or file. However, the producing party still doesn’t want to review the duplicates (because of increasing costs and the possibility of inconsistent designations), so they review a deduped collection and then redupe after review is complete.

As a receiving party, you’ll want to specifically address how dupes should be handled during production to ensure that you don’t receive duplicate files that provide no value.

Many review applications support the capability for reduping. For example, CloudNine Discovery’s review tool (shameless plug warning!), OnDemand®, enables duplicates to be suppressed from review, and then enables the same tags to be applied to the duplicates of any files tagged during review. When it’s time to export documents for production, the user can decide at that time whether or not to export the dupes as part of that production.
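To make the dedupe-review-redupe flow concrete, here is a minimal sketch in Python. It is not how OnDemand® or any particular review tool works; the document IDs, custodians, tags and function names are all made up for illustration, and a SHA-1 hash of the raw content stands in for whatever duplicate detection your tool uses.

```python
import hashlib
from collections import defaultdict

# Hypothetical collection: (doc_id, custodian, raw content). Illustrative data only.
collection = [
    ("DOC-001", "alice", b"Q3 forecast attached."),
    ("DOC-002", "bob", b"Q3 forecast attached."),    # same email, second custodian
    ("DOC-003", "bob", b"Lunch on Friday?"),
    ("DOC-004", "alice", b"Q3 forecast attached."),  # second copy held by alice
]

def dedupe(docs, across_custodians=True):
    """Group documents by content hash, either case-wide or per custodian."""
    groups = defaultdict(list)
    for doc_id, custodian, content in docs:
        digest = hashlib.sha1(content).hexdigest()
        key = digest if across_custodians else (custodian, digest)
        groups[key].append(doc_id)
    return dict(groups)

def review_set(groups):
    """Only the first document in each group is reviewed; the rest are suppressed."""
    return [ids[0] for ids in groups.values()]

def redupe(groups, tags):
    """Copy the reviewed document's tag to every suppressed duplicate in its group."""
    return {doc_id: tags[ids[0]] for ids in groups.values() for doc_id in ids}

groups = dedupe(collection, across_custodians=True)
to_review = review_set(groups)
tags = {"DOC-001": "privileged", "DOC-003": "non-responsive"}  # reviewer decisions
production_tags = redupe(groups, tags)

print(to_review)        # two documents reviewed instead of four
print(production_tags)  # all four documents carry a consistent tag
```

Because every copy inherits the tag of the one reviewed document, the privileged email cannot end up tagged differently for different custodians. Passing `across_custodians=False` keeps one copy per custodian in the review set instead.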

So, what do you think? Do any of your cases include “reduping” as part of production? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

For Successful Discovery, Think Backwards – eDiscovery Best Practices

October 8, 2013

The Electronic Discovery Reference Model (EDRM) has become the standard model for the workflow of handling electronically stored information (ESI) in discovery. But, to succeed in discovery, regardless of whether you’re the producing party or the receiving party, it might be helpful to think about the EDRM model backwards.

Why think backwards?

You can’t have a successful outcome without envisioning the successful outcome that you want to achieve. The end of the discovery process includes the production and presentation stages, so it’s important to determine what you want to get out of those stages. Let’s look at them.

Presentation

As a receiving party, it’s important to think about what types of evidence you need to support your case when presenting at depositions and at trial – this is the type of information that needs to be included in your production requests at the beginning of the case.

Production

The format of the ESI produced is important to both sides in the case. For the receiving party, it’s important to get as much useful information included in the production as possible. This includes metadata and searchable text for the produced documents, typically with an index or load file to facilitate loading into a review application. The most useful form of production is native format files with all metadata preserved as used in the normal course of business.

For the producing party, saving costs matters, so it’s important to agree to a production format that minimizes production costs. Converting files to an image-based format (such as TIFF) adds costs, so producing in native format can be cost-effective for the producing party as well. It’s also important to determine how to handle issues such as privilege logs and redaction of privileged or confidential information.

Addressing production format issues up front will maximize cost savings and enable each party to get what they want out of the production of ESI.

Processing-Review-Analysis

It also pays to make the decisions that affect processing, review and analysis early in the process. How should exception files be handled? What do you do about files that are infected with malware? These are examples of issues that need to be decided up front to determine how processing will be handled.

As for review, the review tool being used may impact production specs in terms of how files are viewed and production of load files that are compatible with the review tool, among other considerations. As for analysis, surely you test search terms to determine their effectiveness before you agree on those terms with opposing counsel, right?

Preservation-Collection-Identification

Long before you have to conduct preservation and collection for a case, you need to establish procedures for implementing and monitoring litigation holds, as well as prepare a data map to identify where corporate information is stored for identification, preservation and collection purposes.

As you can see, at the beginning of a case (and even before), it’s important to think backwards within the EDRM model to ensure a successful discovery process. Decisions made at the beginning of the case affect the success of those latter stages, so don’t forget to think backwards!

So, what do you think? What do you do at the beginning of a case to ensure success at the end? Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. — Notice anything different about the EDRM graphic?


Is a Blended Document Review Rate of $466 Per Hour Excessive? – eDiscovery Case Law

September 23, 2013

Remember when we raised the question as to whether it is time to ditch the per hour model for document review? One of the cases we highlighted for perceived overbilling was ruled upon last month.

In the case In re Citigroup Inc. Securities Litigation, No. 09 MD 2070 (SHS), 07 Civ. 9901 (SHS) (S.D.N.Y. Aug. 1, 2013), New York District Judge Sidney H. Stein rejected as unreasonable the plaintiffs’ lead counsel’s proffered blended rate of more than $400 for contract attorneys—more than the blended rate charged for associate attorneys—most of whom were tasked with routine document review work.

In this securities fraud matter, a class of plaintiffs claimed Citigroup understated the risks of assets backed by subprime mortgages. After the parties settled the matter for $590 million, Judge Stein had to evaluate whether the settlement was “fair, reasonable, and adequate and what a reasonable fee for plaintiffs’ attorneys should be.” The court issued a preliminary approval of the settlement and certified the class. In his opinion, Judge Stein considered the plaintiffs’ motion for final approval of the settlement and allocation and the plaintiffs’ lead counsel’s motion for attorneys’ fees and costs of $97.5 million. After approving the settlement and allocation, Judge Stein decided that the plaintiffs’ counsel was entitled to a fee award and reimbursement of expenses but in an amount less than the lead counsel proposed.

One shareholder objected to the lead counsel’s billing practices, claiming the contract attorneys’ rates were exorbitant.

Judge Stein carefully scrutinized the contract attorneys’ proposed hourly rates “not only because those rates are overstated, but also because the total proposed lodestar for contract attorneys dwarfs that of the firm associates, counsel, and partners: $28.6 million for contract attorneys compared to a combined $17 million for all other attorneys.” The proposed blended hourly rate was $402 for firm associates and $632 for firm partners. However, the firm asked for contract attorney hourly rates as high as $550 with a blended rate of $466. The plaintiff explained that these “contract attorneys performed the work of, and have the qualifications of, law firm associates and so should be billed at rates commensurate with the rates of associates of similar experience levels.” In response, the complaining shareholder suggested that a more appropriate rate for contract attorneys would be significantly lower: “no reasonable paying client would accept a rate above $100 per hour.” (emphasis added)

Judge Stein rejected the plaintiffs’ argument that the contract attorneys should be billed at rates comparable to firm attorneys, citing authority that “clients generally pay less for the work of contract attorneys than for that of firm associates”:

“There is little excuse in this day and age for delegating document review (particularly primary review or first pass review) to anyone other than extremely low-cost, low-overhead temporary employees (read, contract attorneys)—and there is absolutely no excuse for paying those temporary, low-overhead employees $40 or $50 an hour and then marking up their pay ten times for billing purposes.”

Furthermore, “[o]nly a very few of the scores of contract attorneys here participated in depositions or supervised others’ work, while the vast majority spent their time reviewing documents.” Accordingly, the court decided the appropriate rate would be $200, taking into account the attorneys’ qualifications, work performed, and market rates.

For this and other reasons, the court found the lead counsel’s proposed lodestar “significantly overstated” and made a number of reductions. The reductions included the following amounts:

$7.5 million for document review by contract attorneys that happened after the parties agreed to settle; 20 of the contract attorneys were hired on or about the day of the settlement.
$12 million for reducing the blended hourly rate of contract attorneys from $466 to $200 for 45,300 hours, particularly where the bills reflected that these attorneys performed document review—not higher-level work—all day.
10% off the “remaining balance to account for waste and inefficiency which, the Court concludes, a reasonable hypothetical client would not accept.”

As a result, the court awarded a reduced amount of $70.8 million in attorneys’ fees, or 12% of the $590 million common fund.
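The arithmetic behind two of those figures is easy to check. The short sketch below uses only the numbers quoted above; the variable names are mine.

```python
# Figures quoted in the case summary above
requested_rate = 466          # blended contract attorney rate requested, $/hour
awarded_rate = 200            # rate the court found appropriate, $/hour
hours = 45_300                # contract attorney hours affected by the rate cut

rate_reduction = (requested_rate - awarded_rate) * hours
print(f"Reduction from the rate cut: ${rate_reduction:,}")   # roughly $12 million

common_fund = 590_000_000     # settlement amount
fee_award = 70_800_000        # fees awarded by the court
fee_share = fee_award / common_fund
print(f"Fee award as a share of the fund: {fee_share:.0%}")
```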

So, what do you think? Was the requested amount excessive? Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required). For eDiscovery news and best practices, check out the Applied Discovery Blog here.


eDiscovery Daily is Three Years Old!

September 20, 2013

We’ve always been free, now we are three!

It’s hard to believe that it has been three years today since we launched the eDiscoveryDaily blog. We’re past the “terrible twos” and heading towards pre-school. Before you know it, we’ll be ready to take our driver’s test!

We have seen traffic on our site (from our first three months of existence to our most recent three months) grow an amazing 575%! Our subscriber base has grown over 50% in the last year alone! Back in June, we hit over 200,000 visits on the site and now we have over 236,000!

We continue to appreciate the interest you’ve shown in the topics and will do our best to continue to provide interesting and useful posts about eDiscovery trends, best practices and case law. That’s what this blog is all about. And, in each post, we like to ask for you to “please share any comments you might have or if you’d like to know more about a particular topic”, so we encourage you to do so to make this blog even more useful.

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan College, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, Litigation Support Technology & News, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, Learn About E-Discovery, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!). We really appreciate it!

As many of you know by now, we like to take a look back every six months at some of the important stories and topics during that time. So, here are some posts over the last six months you may have missed. Enjoy!

Rodney Dangerfield might put it this way – “I Tell Ya, Information Governance Gets No Respect”

Is it Time to Ditch the Per Hour Model for Document Review? Here’s some food for thought.

Is it Possible for a File to be Modified Before it is Created? Maybe, but here are some mechanisms for avoiding that scenario (here, here, here, here, here and here). Best of all, they’re free.

Did you know changes to the Federal eDiscovery Rules are coming? Here’s some more information.

Count Minnesota and Kansas among the states that are also making changes to support eDiscovery.

By the way, since the Electronic Discovery Reference Model (EDRM) annual meeting back in May, several EDRM projects (Metrics, Jobs, Data Set and the new Native Files project) have already announced new deliverables and/or requested feedback.

When it comes to electronically stored information (ESI), ensuring proper chain of custody tracking is an important part of handling that ESI through the eDiscovery process.

Do you self-collect? Don’t Forget to Check for Image Only Files!

The Files are Already Electronic, How Hard Can They Be to Load? A sound process makes it easier.

When you remove a virus from your collection, does it violate your discovery agreement?

Do you think that you’ve read everything there is to read on Technology Assisted Review? If you missed anything, it’s probably here.

Consider using a “SWOT” analysis or Decision Tree for better eDiscovery planning.

If you’re an eDiscovery professional, here is what you need to know about litigation.

BTW, eDiscovery Daily has had 242 posts related to eDiscovery Case Law since the blog began! Forty-four of them have been in the last six months.

Our battle cry for next September? “Four more years!” 🙂


Everything You Wanted to Know about Technology Assisted Review – eDiscovery Trends

August 29, 2013

Whether you were “afraid to ask” or not…

Rob Robinson has put together another terrific compilation, this time a compilation of articles about Technology Assisted Review and Predictive Coding over the past 1 1/2 years (from February 2012, last updated on August 12). If you simply can’t get enough of the topic, you’ll want to check it out.

His compilation can be found at his Complex Discovery web site here (the title of the page is Technology-Assisted Review: From Expert Explanations to Mainstream Mentions). According to my count, there are 632(!) articles regarding the topic. Happy reading!

Of course, eDiscovery Daily made its fair share of contributions to the list. Here are our posts regarding the topic on the site, in case you missed them and want to catch up:

Here are a few others that aren’t listed – just sayin’ Rob! 😉:

Thanks to Rob, once again, for providing a very useful compilation on a very important eDiscovery topic. And, Rob, if you want to add links for the additional posts above, we won’t complain. 🙂

So, what do you think? Do you keep up with articles about technology assisted review? Please share any comments you might have or if you’d like to know more about a particular topic.


How Big is Your ESI Collection, Really? – eDiscovery Best Practices

August 26, 2013

When I was at ILTA last week, this topic came up in a discussion with a colleague during the show, so I thought it would be good to revisit here.

After identifying custodians relevant to the case and collecting files from each, you’ve collected roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose electronic files from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!! Are they trying to overbill you?

Yes and no.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files. For example, while Outlook emails can be stored in different file formats, they are typically collected from each custodian and saved in a personal storage (.PST) file format, which is an expanding container file. The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space? The PST file is like one of those vacuum bags – it often stores the emails and attachments in a compressed format to save space. There are other types of archive container files that compress the contents – .ZIP and .RAR files are two examples of compressed container files. These files are often used not only to compress files for storage on hard drives, but also to compact or group a set of files when transmitting them, often by email. With email comprising a major portion of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.

When PST, ZIP, RAR or other compressed file formats are processed for loading into a review tool, they are expanded into their normal size. This expanded size can be 1.5 to 2 times larger than the scanned size (or more). And, that’s what some vendors will bill processing on – the expanded size. In those cases, you won’t know what the processing costs will be until the data is expanded since it’s difficult to determine until processing is complete.
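If you want a rough feel for the difference before the bill arrives, ZIP files at least report their own expanded size. The sketch below uses Python’s standard `zipfile` module on a small archive built in memory; the file names and contents are made up, and the sample text is so repetitive that it compresses far better than real-world data would. PST and RAR files need other tools and are not covered here.

```python
import io
import zipfile

# Build a small ZIP in memory; names and contents are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", "Please see the attached forecast. " * 2000)
    archive.writestr("notes.txt", "Follow up with the client on Monday. " * 2000)

# "Scanned" size: the size of the container as it sits on disk
scanned_size = len(buffer.getvalue())

# "Expanded" size: the sum of the uncompressed sizes recorded for each entry
with zipfile.ZipFile(buffer) as archive:
    expanded_size = sum(info.file_size for info in archive.infolist())

ratio = expanded_size / scanned_size
print(f"Scanned:  {scanned_size:,} bytes")
print(f"Expanded: {expanded_size:,} bytes ({ratio:.1f}x)")
```

Running something like this over the ZIP files in a collection gives a lower bound on the expanded size, since nested containers and emails inside PST files will expand further during processing.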

It’s important to be prepared for that and know your options when processing that data. Make sure your vendor selection criteria include questions about how processing is billed: on the scanned or the expanded size. Some vendors (like the company I work for, CloudNine Discovery) do bill based on the scanned size of the collection for processing, so shop around to make sure you’re getting the best deal from your vendor.

So, what do you think? Have you ever been surprised by processing costs of your ESI? Please share any comments you might have or if you’d like to know more about a particular topic.


A Technical Explanation of Near-Dupes – eDiscovery Tutorial

August 9, 2013

Bill Dimm provides a comprehensive and interesting description of near-dupes and the algorithms used to identify them in his Clustify blog (What is a near-dupe, really?). If you want to understand the “three reasonable, but different, ways of defining the near-dupe similarity between two documents”, bring your brain and check it out.

As we discussed last month, just because information volume in most organizations doubles every 18-24 months doesn’t mean that it’s all original. When reviewers are reviewing the same data again and again, it’s unnecessarily expensive and prone to mistakes.

As Bill notes in his post, “Near-duplicates are documents that are nearly, but not exactly, the same. They could be different revisions of a memo where a few typos were fixed or a few sentences were added. They could be an original email and a reply that quotes the original and adds a few sentences. They could be a Microsoft Word document and a printout of the same document that was scanned and OCRed with a few words not matching due to OCR errors.” I also classify examples such as a Word document published to an Adobe PDF file (where the content is the same, but the file format is different, so the hash value will be different) as near-duplicates because they won’t be de-duped with an MD5 or SHA-1 hash algorithm at the file level. You need an algorithm that looks for similarity in the document content.

Identifying near-duplicates that contain almost the same information reduces redundant review and saves costs. A recent client of mine had over 800,000 emails belonging to near-duplicate groupings that would have been impossible to identify without an effective algorithm to group them together.

Bill’s blog post goes on to discuss different methods for measuring similarity using mechanisms like a Jaccard index and a MinHash algorithm which counts shingles (don’t worry, they’re neither painful nor scaly). Understanding how your near-dupe software works is important. As Bill notes, “If misunderstandings about how the algorithm works cause the similarity values generated by the software to be higher than you expected when you chose the similarity threshold, you risk tagging near-dupes of non-responsive documents incorrectly (grouped documents are not as similar as you expected). If the similarity values are lower than you expected when you chose the threshold, you risk failing to group some highly similar documents together, which leads to less efficient review (extra groups to review).” His post is an excellent primer to developing that understanding.
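For a feel of what a Jaccard index over shingles looks like, here is a minimal sketch in Python. It is not Bill’s algorithm or any vendor’s; it assumes shingles are runs of three consecutive words and computes the exact Jaccard index (MinHash, which estimates that value efficiently for large collections, is not shown). The sample sentences are made up.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (runs of k consecutive words) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "please review the attached memo before the meeting on friday"
revised = "please review the attached memo before the meeting on monday"
unrelated = "lunch order for the team is due by noon today"

sim_revised = jaccard(shingles(original), shingles(revised))
sim_unrelated = jaccard(shingles(original), shingles(unrelated))

print(f"original vs revised:   {sim_revised:.2f}")    # one changed word, still high
print(f"original vs unrelated: {sim_unrelated:.2f}")  # no shared shingles
```

Changing a single word out of ten drops the similarity to about 0.78 here, because that word appears in a shingle that no longer matches. That is the kind of behavior worth understanding before you pick a similarity threshold.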

So, what do you think? Do you have a plan for handling near-duplicates in your collection? Please share any comments you might have or if you’d like to know more about a particular topic.


Data May Be Doubling Every Couple of Years, But How Much of it is Original? – eDiscovery Best Practices

July 31, 2013

According to the Compliance, Governance and Oversight Council (CGOC), information volume in most organizations doubles every 18-24 months. However, just because it doubles doesn’t mean that it’s all original. Like a bad cover band singing Free Bird, the rendition may be unique, but the content is the same. The key is limiting review to unique content.

When reviewers are reviewing the same files again and again, it not only drives up costs unnecessarily, but it could also lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, we all know the importance of identifying exact duplicates (that contain the exact same content in the same file format) which can be identified through MD5 and SHA-1 hash values, so that they can be removed from the review population and save considerable review costs.

Identifying near duplicates that contain the same (or almost the same) information (such as a Word document published to an Adobe PDF file where the content is the same, but the file format is different, so the hash value will be different) also reduces redundant review and saves costs.
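Here is a minimal sketch of why hash values catch exact duplicates but miss the same content in a different format. It uses Python’s standard `hashlib` module; the sample text is made up, and re-encoding the text as UTF-16 is only a stand-in for a real format conversion such as Word to PDF.

```python
import hashlib

def file_hash(content: bytes, algorithm: str = "sha1") -> str:
    """Hash the raw bytes of a file, as file-level de-duplication does."""
    return hashlib.new(algorithm, content).hexdigest()

text = "Quarterly results were in line with expectations."

copy_a = text.encode("utf-8")      # the original file
copy_b = text.encode("utf-8")      # an exact copy: identical bytes
converted = text.encode("utf-16")  # same words, different bytes on disk

print(file_hash(copy_a) == file_hash(copy_b))     # True: exact duplicate
print(file_hash(copy_a) == file_hash(converted))  # False: hash no longer matches
```

The reader sees the same sentence in all three cases, but the hash only sees bytes, which is why near-duplicate identification has to compare content rather than hash values.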

Then, there is message thread analysis. Many email messages are part of a larger discussion, sometimes just between two parties, and, other times, between a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Pulling those messages together and enabling them to be reviewed as an entire discussion can eliminate that redundant review. That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about the latest misstep by Anthony Weiner).

Clustering is a process which pulls similar documents together based on content so that the duplicative information can be identified more quickly and eliminated to reduce redundancy. With clustering, you can minimize review of duplicative information within documents and emails, saving time and cost and ensuring consistency in the review. As a result, even if the data in your organization doubles every couple of years, the cost of your review shouldn’t.

So, what do you think? Does your review tool support clustering technology to pull similar content together for review? Please share any comments you might have or if you’d like to know more about a particular topic.


Good Processing Requires a Sound Process – eDiscovery Best Practices

July 26, 2013

As we discussed yesterday, working with electronic files in a review tool is NOT just simply a matter of loading the files and getting started. Electronic files are diverse and can represent a whole collection of issues to address in order to process them for loading. To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production. As a result, we are aware of some of the information that must be provided by the client to ensure that the resulting processed data meets their needs and have created an EDD processing spec sheet to gather that information before processing. Examples of information we collect from our clients:

Do you need de-duplication? If so, should it be performed at the case or the custodian level?
Should Outlook emails be extracted in MSG or HTM format?
What time zone should we use for email extraction? Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT). If you don’t think that matters, consider this example.
Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text? If we don’t OCR those files, these could be responsive files that are missed during searching.
If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
Should the collection be culled based on a responsive date range?
Should the collection be culled based on key terms?
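To see why the time zone question matters, and how it interacts with date range culling, here is a minimal sketch in Python. The email timestamp, the fixed UTC-05:00 offset standing in for the client’s local time, and the cutoff date are all hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical email sent at 3:30 AM GMT on July 26, 2013
sent_gmt = datetime(2013, 7, 26, 3, 30, tzinfo=timezone.utc)

# Hypothetical client local time, modeled as a fixed offset five hours behind GMT
client_local = timezone(timedelta(hours=-5), "UTC-05:00")
sent_local = sent_gmt.astimezone(client_local)

print(sent_gmt.isoformat())    # 2013-07-26T03:30:00+00:00
print(sent_local.isoformat())  # 2013-07-25T22:30:00-05:00

# Hypothetical responsive date range ending July 25, 2013
range_end = date(2013, 7, 25)
in_range_gmt = sent_gmt.date() <= range_end
in_range_local = sent_local.date() <= range_end
print(in_range_gmt, in_range_local)  # False True
```

The same email falls outside the date range when extracted in GMT and inside it when extracted in the client’s local time, so the time zone should be settled before any date culling is applied.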

Those are some general examples for native processing. If the client requests creation of image files (many still do, despite the well documented advantages of native files), there are a number of additional questions we ask regarding the image processing. Some examples:

Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non text-searchable PDF?
Should color images be created when appropriate?
Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?
Should we create images of Excel files? If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
Should we endorse the images? If so, how?

Those are just some examples. Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use. Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data. Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.

So, what do you think? How do you handle processing of electronic files? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Files are Already Electronic, How Hard Can They Be to Load? – eDiscovery Best Practices

July 25, 2013

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started. Unfortunately, it’s not that simple!

Back when most discovery was paper-based, the usefulness of the documents was understandably limited. Documents were paper, and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately) and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.). It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is. They don’t just represent one problem; they can represent a whole collection of problems. For example:

Image-only electronic files such as TIFF or image-only PDF files may be electronic, but they have no searchable text, so they still require OCR before they can be effectively searched. It’s important to account for image-only files when self-collecting, as keyword searches will miss these files.
Outlook emails are typically stored in a “container” file like an EDB (Exchange Database), OST (Outlook Offline Storage Table) or PST (Outlook Personal Storage Table). To work with the emails individually, you typically need to process the container to break them out into individual MSG (Outlook message) files. That processing is also necessary to break out the attachments from the emails so that they can be reviewed or categorized individually, if required. And, if the emails are stored in Lotus Notes, there is no equivalent single message format, so those emails generally require conversion to HTML format during processing.
Databases are large, structured collections of data, but they don’t relate easily to a document format, so they require some analysis to determine if, and in what form, they should be produced.
In almost every collection, there are some files that cannot be processed or searched. Corrupt files, password-protected files and other types of exception files are frequent components of your ESI collection, and it can become very expensive to make these files searchable or reviewable.
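
One way to picture that collection of problems is as a triage step that routes each file to the treatment it needs. The Python sketch below is only an illustration under stated assumptions: the triage function, its parameters and its route names are hypothetical, files are recognized purely by extension (real processing tools inspect file contents), and the extension lists are deliberately short.

```python
from pathlib import Path

# Assumed, incomplete extension lists for illustration only.
EMAIL_CONTAINERS = {".edb", ".ost", ".pst", ".nsf"}  # .nsf is a Lotus Notes database
DATABASES = {".mdb", ".accdb", ".sqlite"}
MAY_BE_IMAGE_ONLY = {".tif", ".tiff", ".pdf"}


def triage(filename, extracted_text="", password_protected=False, corrupt=False):
    """Return a hypothetical route name for one collected file."""
    if password_protected or corrupt:
        return "exception"          # log it, or attempt repair/password cracking
    ext = Path(filename).suffix.lower()
    if ext in EMAIL_CONTAINERS:
        return "extract_messages"   # break out individual emails and attachments
    if ext in DATABASES:
        return "analyze_database"   # decide if, and in what form, to produce
    if ext in MAY_BE_IMAGE_ONLY and not extracted_text.strip():
        return "ocr"                # no text layer, so keyword searches would miss it
    return "ready"


routes = {
    name: triage(name, **kwargs)
    for name, kwargs in [
        ("mail.PST", {}),
        ("scan.tif", {}),
        ("report.pdf", {"extracted_text": "Quarterly results"}),
        ("budget.xlsx", {"password_protected": True}),
        ("sales.mdb", {}),
    ]
}
```

The order of the checks is a design choice: exception files are set aside first because none of the other treatments can be applied to a file that cannot be opened.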

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward. Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom. They’re way more useful than paper documents. So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available. But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think? Have you encountered difficulties or challenges when processing electronic files? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.


Review