Quality Control, Making Sure the Numbers Add Up: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

Friday, we wrote about tracking file counts from collection to production, the concept of expanded file counts, and the categorization of files during processing.  Today, let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We discussed the typical categories of excluded files after processing – obviously, what’s not excluded is available for searching and review.  Even if your approach includes technology assisted review (TAR) as part of your methodology, it’s still likely that you will want to cull out files that are clearly non-responsive.

Documents may be classified in a number of ways during review, but the most common classification is whether they are responsive, non-responsive, or privileged.  Privileged documents are also often classified as responsive or non-responsive, so that only the responsive documents that are privileged need be identified on a privilege log.  Responsive documents that are not privileged are then produced to opposing counsel.

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a fairly simple eMail based example.  We conduct a fairly targeted collection of a PST file from each of seven custodians in a given case.  The relevant time period for the case is January 1, 2013 through December 31, 2014.  Other than date range, we plan to do no other filtering of files during processing.  Identified duplicates will not be reviewed or produced.  We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged.  Here’s what this collection might look like:

  • Collected Files: After expansion and processing, 7 PST files expand to 101,852 eMails and attachments.
  • Filtered Files: Filtering out eMails outside of the relevant date range eliminates 23,564 files.
  • Remaining Files after Filtering: After filtering, there are 78,288 files to be processed.
  • NIST/System Files: eMail collections typically don’t have NIST or system files, so we’ll assume zero (0) files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
  • Exception Files: Let’s assume that a little less than 1% of the collection (912) is exception files like password protected, corrupted or empty files.
  • Duplicate Files: It’s fairly common for approximately 30% or more of the collection to be duplicates, so we’ll assume 24,215 files here.
  • Remaining Files after Processing: We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
  • Files Culled During Searching: If we assume that we are able to cull out 67% (approximately two-thirds) of the remaining files as clearly non-responsive, that removes 35,618 files.
  • Remaining Files for Review: After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
  • Files Tagged as Non-Responsive: If approximately 40% of the reviewed documents are tagged as non-responsive, that would be 7,017 files tagged as such.
  • Remaining Files Tagged as Responsive: After QC to ensure that all documents are either tagged as responsive or non-responsive, this leaves 10,526 documents as responsive.
  • Responsive Files Tagged as Privileged: If roughly 8% of the responsive documents are determined to be privileged during review, that would be 842 privileged documents.
  • Produced Files: After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them.  The key is the final category counts above (Filtered, NIST/System, Exception, Duplicate, Culled, Tagged as Non-Responsive, Privileged and Produced).  Excluding the interim subtotals, these categories represent the different dispositions for the file collection – each file should wind up in exactly one of them.  What happens if you add the category counts together?  You should get 101,852 – the number of collected files after expanding the PST files.  As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery.  That’s the way it should be.  If not, investigation is required to determine where files were missed.
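
Here’s a minimal sketch of that reconciliation check, using the example figures above (the category names and output messages are just illustrative):

```python
# Minimal sketch of a file count reconciliation check, using the example
# figures above.  The category names and messages are illustrative only.

collected_total = 101_852  # expanded count of eMails and attachments

# Final disposition categories -- every collected file should land in exactly one
dispositions = {
    "filtered (outside date range)": 23_564,
    "NIST/system": 0,
    "exceptions": 912,
    "duplicates": 24_215,
    "culled during searching": 35_618,
    "tagged non-responsive": 7_017,
    "responsive but privileged": 842,
    "produced": 9_684,
}

accounted_for = sum(dispositions.values())
if accounted_for == collected_total:
    print(f"QC passed: all {collected_total:,} collected files accounted for")
else:
    # A mismatch means files "slipped through the cracks" somewhere between
    # collection and production, and the workflow needs to be investigated
    print(f"QC FAILURE: {collected_total - accounted_for:,} files unaccounted for")
```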

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.


Quality Control By The Numbers: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

A while back, we wrote about Quality Assurance (QA) and Quality Control (QC) in the eDiscovery process.  Both are important in improving the quality of work product and making the eDiscovery process more defensible overall.  One overall QC mechanism is tracking document counts through the discovery process, especially from collection to production, to identify how every collected file was handled and why each non-produced document was not produced.

Expanded File Counts

The count of files as collected is not the same as the expanded file count.  Certain container file types, like Outlook PST files and ZIP archives, exist essentially to store a collection of other files.  So, the count that is important to track is the “expanded” file count after processing, which includes all of the files contained within those container files.  In a simple scenario where you collect Outlook PST files from seven custodians, the actual number of documents (emails and attachments) within those PST files could be in the tens of thousands.  That’s the starting count that matters if your goal is to account for every document or file in the discovery process.
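
As a quick illustration, here’s a minimal sketch of the difference between the collected count and the expanded count, using a ZIP archive as the container (the path is hypothetical, and expanding a PST would require an email processing tool rather than the standard library):

```python
# Minimal sketch: one collected container file vs. its expanded file count.
# The path "collection/sample.zip" is hypothetical.
import zipfile

collected_count = 1  # one container file collected

with zipfile.ZipFile("collection/sample.zip") as archive:
    # The expanded count includes every file stored inside the container
    expanded_count = sum(1 for info in archive.infolist() if not info.is_dir())

print(f"Collected: {collected_count} container file")
print(f"Expanded:  {expanded_count} individual files to account for")
```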

Categorization of Files During Processing

Of course, not every document gets reviewed or even included in the search process.  During processing, files are usually categorized, with some categories of files usually being set aside and excluded from review.  Here are some typical categories of excluded files in most collections:

  • Filtered Files: Some files may be collected, and then filtered during processing. A common filter for the file collection is the relevant date range of the case.  If you’re collecting custodians’ source PST files, those may include messages outside the relevant date range; if so, those messages may need to be filtered out of the review set.  Files may also be filtered based on type of file or other reasons for exclusion.
  • NIST and System Files: Many file collections also contain system files, like executable files (EXEs) or Dynamic Link Library files (DLLs), that are part of the software on a computer and do not contain client data, so those are typically excluded from the review set. NIST files are included on the National Institute of Standards and Technology list of files that are known to have no evidentiary value, so any files in the collection matching those on the list are “De-NISTed”.
  • Exception Files: These are files that cannot be processed or indexed, for whatever reason. For example, they may be password-protected or corrupted.  Just because these files cannot be processed doesn’t mean they can be ignored; depending on your agreement with opposing counsel, you may need to at least provide a list of them on an exception log to prove they were addressed, if not attempt to repair them or make them accessible (BTW, it’s good to establish that agreement for disposition of exception files up front).
  • Duplicate Files: During processing, files that are exact duplicates may be put aside to avoid redundant review (and potential inconsistencies). Exact duplicates are typically identified based on the HASH value, which is a digital fingerprint generated based on the content and format of the file – if two files have the same HASH value, they have the same exact content and format.  Emails (and their attachments) may instead be identified as duplicates based on key metadata fields, so that an attachment cannot be “de-duped” out of the collection by a standalone copy of the same file (see the sketch after this list for a simple illustration of HASH-based duplicate identification).
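
Here’s a minimal sketch of HASH-based duplicate identification for loose files (the folder path is hypothetical, and real processing tools also handle email families by metadata, as described above):

```python
# Minimal sketch of HASH-based duplicate identification for loose files.
# The folder path is hypothetical; real processing tools also de-dupe emails
# by key metadata fields and keep attachments with their parent email.
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path: Path) -> str:
    """Digital fingerprint based on the file's content."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root: str) -> dict:
    """Group files under 'root' by hash; any group with 2+ files is a set of exact duplicates."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_hash(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

for digest, paths in find_duplicates("collection/loose_files").items():
    print(f"{digest}: {len(paths)} copies - review one, set the rest aside")
```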

All of these categories of excluded files can reduce the set of files to actually be searched and reviewed.  On Monday, we’ll walk through an example of a file set from collection to production to illustrate how each file is accounted for during the discovery process.

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.


This Study Discusses the Benefits of Including Metadata in Machine Learning for TAR: eDiscovery Trends

A month ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop that was held earlier this month, and we covered one of those papers a couple of weeks later.  Today, let’s cover another paper from the study.

The Role of Metadata in Machine Learning for Technology Assisted Review (by Amanda Jones, Marzieh Bazrafshan, Fernando Delgado, Tania Lihatsh and Tamara Schuyler) attempts to study the role of metadata in machine learning for technology assisted review (TAR), particularly with respect to the algorithm development process.

Let’s face it, we all generally agree that metadata is a critical component of ESI for eDiscovery.  But, opinions are mixed as to its value in the TAR process.  For example, the Grossman-Cormack Glossary of Technology Assisted Review (which we covered here in 2012) includes metadata as one of the “typical” identified features of a document that are used as input to a machine learning algorithm.  However, a couple of eDiscovery software vendors have both produced documentation stating that “machine learning systems typically rely upon extracted text only and that experts engaged in providing document assessments for training should, therefore, avoid considering metadata values in making responsiveness calls”.

So, the authors decided to conduct a study that established the potential benefit of incorporating metadata into TAR algorithm development processes, as well as evaluate the benefits of using extended metadata and also using the field origins of that metadata.  Extended metadata fields included Primary Custodian, Record Type, Attachment Name, Bates Start, Company/Organization, Native File Size, Parent Date and Family Count, to name a few.  They evaluated three distinct data sets (one drawn from Topic 301 of the TREC 2010 Interactive Task and two proprietary business data sets) and generated a random sample of 4,500 individual documents for each (split into a 3,000 document Control Set and a 1,500 document Training Set).

The metric they used throughout to compare model performance is Area Under the Receiver Operating Characteristic Curve (AUROC). Say what?  According to the report, the metric indicates the probability that a given model will assign a higher ranking to a randomly selected responsive document than a randomly selected non-responsive document.
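
If it helps to make the metric concrete, here’s a minimal sketch of AUROC computed directly from that pairwise definition (the model scores are made up for illustration):

```python
# Minimal sketch of AUROC computed directly from its pairwise definition:
# the probability that a randomly chosen responsive document gets a higher
# model score than a randomly chosen non-responsive one (ties count as 0.5).
# The scores below are made up for illustration.

def auroc(responsive_scores, non_responsive_scores):
    wins = ties = 0
    for r in responsive_scores:
        for n in non_responsive_scores:
            if r > n:
                wins += 1
            elif r == n:
                ties += 1
    total_pairs = len(responsive_scores) * len(non_responsive_scores)
    return (wins + 0.5 * ties) / total_pairs

responsive = [0.91, 0.78, 0.66, 0.59]            # model scores for responsive docs
non_responsive = [0.72, 0.40, 0.35, 0.18, 0.05]  # scores for non-responsive docs
print(f"AUROC: {auroc(responsive, non_responsive):.3f}")  # 1.0 = perfect ranking
```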

As indicated by the graphic above, their findings were that incorporating metadata as an integral component of machine learning processes for TAR improved results (based on the AUROC metric).  In particular, models incorporating Extended metadata significantly outperformed models based on body text alone in each condition for every data set.  While there’s still a lot to learn about the use of metadata in modeling for TAR, it’s an interesting study and start to the discussion.

A copy of the twelve page study (including Bibliography and Appendix) is available here.  There is also a link to the PowerPoint presentation file from the workshop, which is a condensed way to look at the study, if desired.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.


Here’s One Study That Shows Potential Savings from Technology Assisted Review: eDiscovery Trends

A couple of weeks ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop that was held earlier this month.  Today, let’s cover one of those papers.

The Case for Technology Assisted Review and Statistical Sampling in Discovery (by Christopher H Paskach, F. Eli Nelson and Matthew Schwab) aims to show how Technology Assisted Review (TAR) and Statistical Sampling can significantly reduce risk and improve productivity in eDiscovery processes.  The easy-to-read six-page report concludes with the observation that, with measures like statistical sampling, “attorney stakeholders can make informed decisions about the reliability and accuracy of the review process, thus quantifying actual risk of error and using that measurement to maximize the value of expensive manual review. Law firms that adopt these techniques are demonstrably faster, more informed and productive than firms who rely solely on attorney reviewers who eschew TAR or statistical sampling.”

The report begins with an introduction that includes a history of eDiscovery, starting with printing documents, “Bates” stamping them, scanning and using Optical Character Recognition (OCR) programs to capture text for searching.  As the report notes, “Today we would laugh at such processes, but in a profession based on ‘stare decisis,’ changing processes takes time.”  Of course, as we know now, “studies have concluded that machine learning techniques can outperform manual document review by lawyers”.  The report also references key cases such as DaSilva Moore, Kleen Products and Global Aerospace, which were among the first of many cases to approve the use of technology assisted review for eDiscovery.

Probably the most interesting portion of the report is the section titled Cost Impact of TAR, which illustrates a case scenario that compares the cost of TAR to the cost of manual review.  On a strictly relevance-based review of 90,000 documents (after keyword filtering, which implies a multimodal approach to TAR), the TAR approach was over $57,000 less expensive ($136,225 vs. $193,500 for manual review).  The report illustrates the comparison with both a numbers spreadsheet and a pie chart comparison of costs, based on the assumptions provided.  Sounds like the basis for a budgeting tool!
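
Here’s a minimal sketch of that kind of budgeting comparison; the per-document analytics fee, reviewer rate, review speed and the fraction of documents reviewed after ranking are all illustrative assumptions, not the report’s actual inputs:

```python
# Minimal sketch of a TAR vs. manual review budgeting comparison.
# Every rate and assumption below is illustrative, not the report's inputs.

DOCS = 90_000                  # documents remaining after keyword filtering
DOCS_PER_REVIEWER_HOUR = 50    # assumed manual review speed
REVIEWER_RATE = 100            # assumed $ per reviewer hour

def manual_review_cost(docs: int) -> float:
    hours = docs / DOCS_PER_REVIEWER_HOUR
    return hours * REVIEWER_RATE

def tar_review_cost(docs: int, analytics_fee_per_doc: float = 0.75,
                    fraction_reviewed: float = 0.30) -> float:
    # TAR: pay a per-document analytics fee, then manually review only the
    # fraction of documents the model ranks as likely responsive
    analytics = docs * analytics_fee_per_doc
    manual = manual_review_cost(int(docs * fraction_reviewed))
    return analytics + manual

manual = manual_review_cost(DOCS)
tar = tar_review_cost(DOCS)
print(f"Manual review: ${manual:,.0f}")
print(f"TAR approach:  ${tar:,.0f}")
print(f"Savings:       ${manual - tar:,.0f}")
```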

Anyway, the report goes on to discuss the benefits of statistical sampling to validate the results, demonstrating that the only way to attempt to do so in a manual review scenario is to review the documents multiple times, which is prone to human error and inconsistent assessments of responsiveness.  The report then covers necessary process changes to realize the benefits of TAR and statistical sampling and concludes with the declaration that:

“Companies and law firms that take advantage of the rapid advances in TAR will be able to keep eDiscovery review costs down and reduce the investment in discovery by getting to the relevant facts faster. Those firms who stick with unassisted manual review processes will likely be left behind.”
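
As for the statistical sampling point, here’s a minimal sketch of one common validation approach, elusion sampling of the documents set aside as non-responsive; this is a generic illustration under assumed numbers, not the paper’s actual procedure:

```python
# Minimal sketch of using a random sample to validate review results: estimate
# the rate of responsive documents missed in the "non-responsive" pile (elusion),
# with a simple normal-approximation confidence interval.  The sample review
# calls are simulated here; in practice the sampled documents are reviewed by hand.
import math
import random

random.seed(42)

discard_pile_size = 50_000
sample_size = 400

# Simulated reviewer calls on the sampled documents (True = actually responsive)
sample_calls = [random.random() < 0.02 for _ in range(sample_size)]

p_hat = sum(sample_calls) / sample_size            # estimated elusion rate
se = math.sqrt(p_hat * (1 - p_hat) / sample_size)  # standard error
low, high = max(0.0, p_hat - 1.96 * se), p_hat + 1.96 * se

print(f"Estimated elusion: {p_hat:.1%} (95% CI roughly {low:.1%} to {high:.1%})")
print(f"Estimated responsive docs left behind: about {p_hat * discard_pile_size:,.0f}")
```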

The report is a quick, easy read and can be viewed here.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.


DESI Got Your Input, and Here It Is: eDiscovery Trends

Back in January, we discussed the Discovery of Electronically Stored Information (DESI, not to be confused with Desi Arnaz, pictured above) workshop and its call for papers describing research or practice for the DESI VI workshop that was held last week at the University of San Diego as part of the 15th International Conference on Artificial Intelligence & Law (ICAIL 2015). Now, links to those papers are available on their web site.

The DESI VI workshop aims to bring together researchers and practitioners to explore innovation and the development of best practices for application of search, classification, language processing, data management, visualization, and related techniques to institutional and organizational records in eDiscovery, information governance, public records access, and other legal settings. Ideally, the aim of the DESI workshop series has been to foster a continuing dialogue leading to the adoption of further best practice guidelines or standards in using machine learning, most notably in the eDiscovery space. Organizing committee members include Jason R. Baron of Drinker Biddle & Reath LLP and Douglas W. Oard of the University of Maryland.

The workshop included keynote addresses by Bennett Borden and Jeremy Pickens, a session regarding Topics in Information Governance moderated by Jason R. Baron, presentations of some of the “refereed” papers and other moderated discussions. Sounds like a very informative day!

As for the papers themselves, here is a list from the site with links to each paper:

Refereed Papers

Position Papers

If you’re interested in discovery of ESI, Information Governance and artificial intelligence, these papers are for you! Kudos to all of the authors who submitted them. Over the next few weeks, we plan to dive deeper into at least a few of them.

So, what do you think? Did you attend DESI VI? Please share any comments you might have or if you’d like to know more about a particular topic.


Want to Save Review Costs? Be the Master of Your Domain(s): eDiscovery Best Practices

Yesterday, we discussed how some BigLaw firms mark up reviewer billing rates two to three times (or more) when billing their clients. But, even if that’s not the case, review is still by far the most expensive phase of eDiscovery. One way to minimize those costs is to identify documents that need little or no review, and domain categorization can help identify those documents.

Even though the types of electronically stored information (ESI) continue to become more diverse, with social media and other sources of ESI becoming more prominent, email is still generally the biggest component of most ESI collections. Each participant in an email communication belongs to a domain associated with the email server that manages their email.

Several review platforms, including (shameless plug warning!) our CloudNine™ platform (see example above using the ever so ubiquitous Enron data set), support domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection. Domain categorization provides several benefits when reviewing your collection by identifying groups of documents, such as:

  • Non-Responsive ESI: Let’s face it, even if we cull the collection based on search terms, certain non-responsive documents will get through. For example, if custodians have received fantasy football emails from ESPN.com or weekly business newsletters from Fortune.com and those slip through the search criteria, reviewing that clearly non-responsive ESI adds cost. Instead, with domain categorization, domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains (and their attachments) can be “group-tagged” as non-responsive.
  • Potentially Privileged ESI: If there are any emails associated with outside counsel’s domain, they could obviously represent attorney work product or attorney-client privileged communications (or both). Domain categorization is a quick way to “group-tag” them as potentially privileged, so that they can be reviewed for privilege and dealt with quickly and effectively.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly and get them prioritized for review.

In essence, domain categorization enables you to put groups of documents into “buckets” to either eliminate them from review entirely or to classify them for a specific workflow for review, saving time and cost during the review process. Time is money!
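
Here’s a minimal sketch of the underlying idea: tally the domains that appear in a set of email addresses, then bucket whole domains at once. The sample messages and domain lists are hypothetical, and this isn’t any particular platform’s implementation:

```python
# Minimal sketch of domain categorization: tally the domains that appear in
# a set of email addresses so whole groups of messages can be bucketed at once.
# The sample messages and domain lists are hypothetical.
from collections import Counter

def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].lower()

emails = [
    {"from": "jane.doe@enron.com", "to": ["fantasy@espn.com"]},
    {"from": "newsletter@fortune.com", "to": ["jane.doe@enron.com"]},
    {"from": "counsel@outsidefirm.com", "to": ["jane.doe@enron.com"]},
]

domain_counts = Counter()
for msg in emails:
    domain_counts[domain_of(msg["from"])] += 1
    domain_counts.update(domain_of(addr) for addr in msg["to"])

NON_RESPONSIVE = {"espn.com", "fortune.com"}      # group-tag as non-responsive
POTENTIALLY_PRIVILEGED = {"outsidefirm.com"}      # group-tag for privilege review

for domain, count in domain_counts.most_common():
    if domain in NON_RESPONSIVE:
        bucket = "non-responsive"
    elif domain in POTENTIALLY_PRIVILEGED:
        bucket = "potentially privileged"
    else:
        bucket = "standard review"
    print(f"{domain:20s} {count:3d} -> {bucket}")
```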

So, what do you think? Does your review platform provide a mechanism for domain categorization? If so, do you use it to help manage the review process and control costs? Please share any comments you might have or if you’d like to know more about a particular topic.


This Firm Marked Up Reviewer Billings Over 500 Percent and that’s Not the Worst Part: eDiscovery Trends

Remember when we asked the question whether a blended document review rate of $466 per hour is excessive? Many of you weighed in on that one and that post is still our most viewed of all time. Marking up the billing rate for reviewers over 500 percent may or may not also be unacceptable, depending on who you talk to. But, everyone agrees that billing more hours than you actually worked is a bad thing.

According to a new article by Gina Passarella in The Legal Intelligencer (Are Contract Attorney Markups Of Any Concern to Clients?), a former Drinker Biddle & Reath contract attorney received a two-year suspension last week for overbilling a client on document review. The attorney worked for the firm from 2011 through 2012, during which time he was paid $40 an hour and charged out to a client at $245 an hour.

If you’re whipping out your calculator, I’ll save you the trouble – that’s a 513 percent markup (rounded up).
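
For the calculator-averse, here’s the same arithmetic as a quick sketch:

```python
# The markup arithmetic, for anyone who doesn't want to whip out a calculator
cost_per_hour = 40       # what the contract attorney was paid
billed_per_hour = 245    # what the client was charged
markup_pct = (billed_per_hour - cost_per_hour) / cost_per_hour * 100
print(f"Markup: {markup_pct:.1f}%")  # 512.5%, i.e., 513 percent rounded up
```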

But, that’s not why he was suspended. It turns out that the attorney logged more time in the firm’s time accounting system than he actually spent logged into the firm’s eDiscovery system, and had overbilled for the 12 months he was at the firm. Drinker Biddle terminated the attorney within days of discovering the discrepancy.

But, according to Passarella’s article, “the legal community’s reaction focused not so much on the behavior as on the lawyer’s billing rate… Some have described a 513 percent markup as ‘stratospheric’ while others have said a firm’s internal profitability is none of the client’s business as long as the client feels it is getting the perceived value from the business transaction.”

Drinker Biddle chairman Andrew C. Kassner defended the markup, citing overhead costs and said that the firm works hard to ensure value for the client and provided a lower-cost option to the client by using a contract lawyer rather than an associate.

Unlike Mark Antony (the Roman general, not the singer), I don’t come to bury Drinker Biddle in this article; many law firms mark review up considerably. And, as Passarella notes, “Drinker Biddle was certainly an early adopter of the value proposition espoused by the Association of Corporate Counsel and others, becoming one of the first law firms to create a chief value officer position in 2010 and forming an associate training program post-recession that didn’t charge clients for the first four months of a first-year’s time.”

However, Passarella’s article does quote three individuals who questioned the current billing model: 1) a former general counsel who, while he was in-house, “decoupled” the use of contract attorneys from outside counsel, 2) a former BigLaw attorney who became disenchanted with the large-firm business model and created his own firm which focuses on providing better value to clients, and 3) an Altman Weil consultant who questioned the $245 value for document review, noting that “if it were really important they wouldn’t be using a $40-an-hour lawyer”. Perhaps we should revisit the discussion as to whether it’s time to ditch the per hour model for document review?

As for the overbilling, Kassner said the firm paid back the client all that it was charged for the overbilled time as well as for any time the attorney charged on the matter.

So, what do you think? Is it time to ditch the per hour model for document review? Or, is marking up reviewer billing two to three times (or more) an acceptable practice? Please share any comments you might have or if you’d like to know more about a particular topic.


For a Successful Outcome to Your Discovery Project, Work Backwards: eDiscovery Best Practices

Based on a recent experience with a client, it seemed appropriate to revisit this topic. Plus, it’s always fun to play with the EDRM model. Notice anything different? 🙂

While the Electronic Discovery Reference Model from EDRM has become the standard workflow model for handling electronically stored information (ESI) in discovery, it might be helpful to think about the EDRM model and work backwards, whether you’re the producing party or the receiving party.

Why work backwards?

You can’t have a successful outcome without envisioning the outcome that you want to achieve. The end of the discovery process includes the production and presentation stages, so it’s important to determine what you want to get out of those stages. Let’s look at them.

Presentation

Whether you’re a receiving party or a producing party, it’s important to think about what types of evidence you need to support your case when presenting at depositions and at trial – this is the type of information that needs to be included in your production requests at the beginning of the case as well as the type of information that you’ll need to preserve as a producing party.

Production

The format of the ESI produced is important to both sides in the case. For the receiving party, it’s important to get as much useful information included in the production as possible. This includes metadata and searchable text for the produced documents, typically with an index or load file to facilitate loading into a review application. The most useful form of production is native format files with all metadata preserved as used in the normal course of business.

For the producing party, it’s important to be efficient and minimize costs, so it pays to agree to a production format that keeps production costs down. Converting files to an image-based format (such as TIFF) adds costs, so producing in native format can be cost effective for the producing party as well. It’s also important to determine how to handle issues such as privilege logs and redaction of privileged or confidential information.

Addressing production format issues up front will maximize cost savings and enable each party to get what they want out of the production of ESI. If you don’t, you could be arguing in court like our case participants from yesterday’s post.

Processing-Review-Analysis

It also pays to make decisions early in the process that affect processing, review and analysis. How should exception files be handled? What do you do about files that are infected with malware? These are examples of issues that need to be decided up front to determine how processing will be handled.

As for review, the review tool being used may impact how quick and easy it is to get started, to load data and to use the tool, among other considerations. If it’s Friday at 5 and you have to review data over the weekend, is it easy to get started? As for analysis, surely you test search terms to determine their effectiveness before you agree on those terms with opposing counsel, right?
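
On that last point, here’s a minimal sketch of the kind of quick search-term test you might run before agreeing to terms; the documents and terms are hypothetical:

```python
# Minimal sketch of testing proposed search terms against a document set before
# agreeing to them: count how many documents each term hits.
# The documents and search terms below are hypothetical.
documents = {
    "doc-001": "quarterly forecast discussed with the design team",
    "doc-002": "fantasy football picks for the week",
    "doc-003": "design flaw reported in the subcontractor's valve assembly",
}
search_terms = ["design", "forecast", "valve"]

for term in search_terms:
    hits = [doc_id for doc_id, text in documents.items() if term.lower() in text.lower()]
    print(f"{term!r}: {len(hits)} hits -> {hits}")
```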

Preservation-Collection-Identification

Long before you have to conduct preservation and collection for a case, you need to establish procedures for implementing and monitoring litigation holds, as well as prepare a data map to identify where corporate information is stored for identification, preservation and collection purposes.

And, before a case even begins, you need an effective Information Governance program to minimize the amount of data that you might have to consider for responsiveness in the first place.

As you can see, at the beginning of a case (and even before), it’s important to think backwards within the EDRM model to ensure a successful discovery process. Decisions made at the beginning of the case affect the success of those later stages, so working backwards can help ensure a successful outcome!

So, what do you think? What do you do at the beginning of a case to ensure success at the end?   Please share any comments you might have or if you’d like to know more about a particular topic.


For Better Document Review, You Need to Approach a ZEN State: eDiscovery Best Practices

Among the many definitions of the word “zen”, the Urban Dictionary provides perhaps the most appropriate (non-religious) one, as follows: a total state of focus that incorporates a total togetherness of body and mind. However, when it comes to document review, a new web site by eDiscovery thought leader Ralph Losey may change your way of thinking about the word “ZEN”.

Ralph’s new site, ZEN Document Review, introduces ‘ZEN’ as an acronym: Zero Error Numerics. As stated on the site, “ZEN document review is designed to attain the highest possible level of efficiency and quality in computer assisted review. The goal is zero error. The methods to attain that goal include active machine learning, random sampling, objective measurements, and comparative analysis using simple, repeatable systems.”

The ZEN methods were developed by Ralph Losey’s e-Discovery Team (many of which are documented on his excellent e-Discovery Team® blog). They rely on focused attention and full clear communication between review team members.

In the intro video on his site, Ralph acknowledges that it’s impossible to have zero error in any large, complex project, but “with the help of the latest tools and using the right mindset, we can come pretty damn close”. One of the graphics on the site represents an “upside down champagne glass” that illustrates 99.9% probable relevant identified correctly during the review process at the top of the graph and 0.1% probable relevant identified incorrectly at the bottom of the graph.

The ZEN approach includes everything from “predictive coding analytics, a type of artificial intelligence, actively managed by skilled human analysts in a hybrid approach” to “quiet, uninterrupted, single-minded focus” where “dual tasking during review is prohibited” to “judgmental and random sampling and analysis such as i-Recall” and even high ethics, with the goal being to “find and disclose the truth in compliance with local laws, not win a particular case”. And thirteen other factors, as well. Hey, nobody said that attaining ZEN is easy!

Attaining zero error in document review is a lofty goal – I admire Ralph for setting the bar high. Using the right tools, methods and attitude, can we come “pretty damn close”?  What do you think? Please share any comments you might have or if you’d like to know more about a particular topic.


Managing Email Signature Logos During Review: eDiscovery Best Practices

Yesterday, we discussed how corporate logo graphic files in email signatures can add complexity when managing those emails in eDiscovery, as these logos, repeated over and over again, can add up to a significant percentage of your collection on a file count basis. Today, we are going to discuss a couple of ways that I have worked with clients to manage those files during the review process.

These corporate logos cause several eDiscovery complications, such as slowing page refreshes in review tools, wasting reviewer time and making review even more tedious. I’ll focus on those particular issues below.

It should be noted that, as VP of Professional Services at CloudNine, my (recent) experience in assisting clients has primarily been using CloudNine’s review platform, so, with all due respect to those “technically astute vendor colleagues” that Craig Ball referred to in his excellent post last week, I’ll be discussing how I have handled the situation with logos in Outlook emails at CloudNine (shameless plug warning!).

Processing Embedded Graphics within Emails

I think it’s safe to say that, as a general rule, when it comes to processing Outlook format emails (whether those originated from EDB, OST, PST or MSG files), most eDiscovery processing applications (including LAW and CloudNine’s processing application, Discovery Client) treat embedded graphic files as attachments to the email, and those are loaded into most review platforms as attachments linked to the parent email. So, a “family” that consists of an email with two attached PDF files and a corporate logo graphic file would actually have four “family” members, with the corporate logo graphic file (assuming that there is just one) counted as one of the four.

This basically adds an extra “document” to each email with a logo that is included in the review population (more than one per email if there are additional logo graphics for links to the organization’s social media sites). These files don’t require any thought during review, but they still have to be clicked through and marked as reviewed during a manual review process. This adds time and tedium to an already tedious process. If those files could be excluded from the review population, reviewers could focus on more substantive files in the collection.

In Discovery Client, an MD5 hash value is computed for each individual file, including each email attachment (including embedded graphics). So, if the same GIF file is used over and over again for a corporate logo, it would have the same MD5 hash value in each case. CloudNine provides a Quick Search function that enables you to retrieve all documents in the collection with the same value as the current document. So, if you’re currently viewing a corporate logo file, it’s easy to retrieve all documents with the same MD5 hash value, apply a tag to those documents and then use the tag to exclude them from review. I’ve worked with clients to do this before to enable them to shorten the review process while establishing more reliable metrics for the remaining documents being reviewed.
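
As a simple illustration of the idea (not CloudNine’s actual implementation; the field names and hash value are hypothetical), here’s a sketch of tagging every attachment whose MD5 hash matches a known logo fingerprint so those files can be filtered out of the review queue:

```python
# Minimal sketch of excluding repeated logo attachments from review: hash each
# attachment and tag any whose hash matches a known corporate logo fingerprint.
# Not CloudNine's implementation; the dict fields and hash value are hypothetical.
import hashlib

LOGO_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}  # placeholder MD5 value(s)

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def tag_logo_attachments(attachments):
    """attachments: list of dicts with 'name', 'content' (bytes) and 'tags' (set)."""
    for att in attachments:
        if md5_of(att["content"]) in LOGO_HASHES:
            # Tagged documents can be excluded from the manual review queue and
            # can still inherit the family's responsiveness call later if needed
            att["tags"].add("logo - exclude from review")
    return attachments
```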

It should be noted that doing so doesn’t preclude you from assigning responsiveness settings from the rest of the “family” to the corporate logo later if you plan to produce those corporate logos as separate attachments to opposing counsel.

Viewing Emails with Embedded Logos

Embedded logos and other graphics files can slow down the retrieval of emails for viewing in some document viewers, depending on how they render those graphics. By default, Outlook emails are already formatted in HTML and CloudNine provides an HTML view option that enables the user to view the email without the embedded graphics. As a result, the email retrieves more quickly, so, in many cases, where the graphics don’t add value, the HTML view option will speed up the review process (users can still view the full native file with embedded graphics as needed). In working with clients, I’ve recommended the HTML view tab as the default view in CloudNine as a way of speeding retrieval of files for review, which helps speed up the overall review process.

So, what do you think? Do you find that corporate logo graphics files are adding complexity to your own eDiscovery processes? If so, how do you address the issue? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.