
Keyword Searching Isn’t Dead, If It’s Done Correctly: eDiscovery Best Practices

In the latest post on the Advanced Discovery blog, Tom O’Connor (an industry thought leader who has been a thought leader interviewee on this blog several times) posed an interesting question: Is Keyword Searching Dead?

In his post, Tom recapped a session of the same name at the recent Today’s General Counsel Institute in New York City, which he co-moderated along with Maura Grossman, a recognized Technology Assisted Review (TAR) expert who was recently appointed as Special Master in the Rio Tinto case.  Tom then went on to cover some of the arguments for and against keyword searching discussed by the panelists and participants in the session, while also noting that numerous polls and client surveys show that the majority of people are NOT using TAR today.  So, they must be using keyword searching, right?

Should they be?  Is there still room for keyword searching in today’s eDiscovery landscape, given the advances that have been made in recent years in TAR technology?

There is, if it’s done correctly.  Tom quotes Maura in the article as stating that “TAR is a process, not a product.”  The same could be said for keyword searching.  If the process within which your keyword searches are performed is flawed, you could either retrieve far more documents for review than necessary, driving up eDiscovery costs, or leave yourself open to challenges in the courtroom regarding your approach.  Many lawyers at corporations and law firms identify search terms to be used (and, in many cases, agree on those terms with opposing counsel) without any testing to confirm their validity.

Way back in the first few months of this blog (over four years ago), I advocated an approach to searching that I called “STARR”: Search, Test, Analyze, Revise (if necessary) and Repeat (also, if necessary).  With an effective platform (using advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching), knowledge and experience of that platform, and knowledge of search best practices, you can start with a well-planned search and confirm or adjust it using the “STARR” approach.
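For those who like to see a process spelled out, here is a minimal sketch of what a “STARR”-style loop might look like if you scripted it.  Everything here is hypothetical: the document set, the candidate searches and the is_responsive call (which stands in for a human reviewer’s judgment on the sampled hits) are placeholders, and in practice you would run the searches in your review platform rather than in code.

```python
import random

def run_search(documents, predicate):
    """Search: return the documents that match the candidate search."""
    return [doc for doc in documents if predicate(doc["text"])]

def estimate_precision(hits, sample_size, is_responsive):
    """Test: review a random sample of the hits and estimate precision."""
    if not hits:
        return 0.0
    sample = random.sample(hits, min(sample_size, len(hits)))
    responsive = sum(1 for doc in sample if is_responsive(doc))
    return responsive / len(sample)

def starr(documents, candidate_searches, is_responsive, target_precision=0.5):
    """Search, Test, Analyze, Revise (try the next candidate) and Repeat."""
    for label, predicate in candidate_searches:
        hits = run_search(documents, predicate)                   # Search
        precision = estimate_precision(hits, 50, is_responsive)   # Test
        print(f"{label}: {len(hits)} hits, ~{precision:.0%} precision in sample")  # Analyze
        if precision >= target_precision:
            return label, hits                                    # good enough to stop
    return None, []                                               # Revise and Repeat exhausted
```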

And, even when you’ve been searching databases for as long as I have (decades now), an effective process is key because you never know what you will find until you test the results.  The favorite example I have used over recent years (and walked through in this earlier post) is from work I did for a petroleum (oil) company looking for documents related to “oil rights”: a search of “oil AND rights” retrieved almost every published and copyrighted document in the company.  Why?  Because almost every published and copyrighted document in the company contained the phrase “All Rights Reserved”.  Testing and an iterative process eventually enabled me to find the search that offered the best balance of recall and precision.
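If you are wondering how a single phrase of boilerplate can wreck a search, here is a hedged sketch showing the problem and the proximity-based fix.  The sample document and the two helper functions are hypothetical (your review platform’s proximity operator will look different), but the behavior is the same: “oil AND rights” matches the copyright notice, while a proximity search does not.

```python
import re

def and_search(text, *terms):
    # Boolean AND: every term appears somewhere in the document.
    return all(re.search(rf"\b{term}\b", text, re.IGNORECASE) for term in terms)

def proximity_search(text, term_a, term_b, within=5):
    # Proximity: the two terms appear within `within` words of each other.
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= within for a in positions_a for b in positions_b)

doc = "Annual drilling report. Copyright 2015, All Rights Reserved. This document discusses crude oil production volumes."

print(and_search(doc, "oil", "rights"))           # True  -- false hit caused by the copyright notice
print(proximity_search(doc, "oil", "rights", 5))  # False -- "oil" and "rights" are not near each other
```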

Like TAR, keyword searching is a process, not a product.  And, you can quote me on that.  (-:

So, what do you think?  Is keyword searching dead?  And, please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

“Da Silva Moore Revisited” Will Be Visited by a Newly Appointed Special Master: eDiscovery Case Law

In Rio Tinto Plc v. Vale S.A., 14 Civ. 3042 (RMB)(AJP) (S.D.N.Y. Jul. 15, 2015), New York Magistrate Judge Andrew J. Peck, at the request of the defendant, entered an Order appointing Maura Grossman as a special master in this case to assist with issues concerning Technology-Assisted Review (TAR).

Back in March (as covered here on this blog), Judge Peck approved the proposed protocol for technology assisted review (TAR) presented by the parties, titling his opinion “Predictive Coding a.k.a. Computer Assisted Review a.k.a. Technology Assisted Review (TAR) — Da Silva Moore Revisited”.  Alas, as some unresolved issues remained regarding the parties’ TAR-based productions, Judge Peck decided to prepare the order appointing Grossman as special master for the case.  Grossman, of course, is a recognized TAR expert, who (along with Gordon Cormack) wrote Technology-Assisted Review in E-Discovery can be More Effective and More Efficient than Exhaustive Manual Review and also the Grossman-Cormack Glossary of Technology Assisted Review (covered on our blog here).

While noting that it has “no objection to Ms. Grossman’s qualifications”, the plaintiff raised several objections to the appointment, including:

  • The defendant should have agreed much earlier to appointment of a special master: Judge Peck’s response was that “The Court certainly agrees, but as the saying goes, better late than never. There still are issues regarding the parties’ TAR-based productions (including an unresolved issue raised at the most recent conference) about which Ms. Grossman’s expertise will be helpful to the parties and to the Court.”
  • The plaintiff stated a “fear that [Ms. Grossman’s] appointment today will only cause the parties to revisit, rehash, and reargue settled issues”: Judge Peck stated that “the Court will not allow that to happen. As I have stated before, the standard for TAR is not perfection (nor of using the best practices that Ms. Grossman might use in her own firm’s work), but rather what is reasonable and proportional under the circumstances. The same standard will be applied by the special master.”
  • One of the defendant’s lawyers had three conversations with Ms. Grossman about TAR issues: Judge Peck did not see how one contact in connection with The Sedona Conference “should or does prevent Ms. Grossman from serving as special master”, and noted that, in the other two, the plaintiff “does not suggest that Ms. Grossman did anything improper in responding to counsel’s question, and Ms. Grossman has made clear that she sees no reason why she cannot serve as a neutral special master”, agreeing with that statement.

Judge Peck did agree with the plaintiff on allocation of the special master’s fees, stating that the defendant’s “propsal [sic] is inconsistent with this Court’s stated requirement in this case that whoever agreed to appointment of a special master would have to agree to pay, subject to the Court reallocating costs if warranted”.

So, what do you think?  Was the appointment of a special master (albeit an eminently qualified one) appropriate at this stage of the case?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

This Study Discusses the Benefits of Including Metadata in Machine Learning for TAR: eDiscovery Trends

A month ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop, and a couple of weeks later we covered one of those papers.  Today, let’s cover another paper from the workshop.

The Role of Metadata in Machine Learning for Technology Assisted Review (by Amanda Jones, Marzieh Bazrafshan, Fernando Delgado, Tania Lihatsh and Tamara Schuyler) studies, as the title suggests, the role of metadata in machine learning for technology assisted review (TAR), particularly with respect to the algorithm development process.

Let’s face it, we all generally agree that metadata is a critical component of ESI for eDiscovery.  But, opinions are mixed as to its value in the TAR process.  For example, the Grossman-Cormack Glossary of Technology Assisted Review (which we covered here in 2012) includes metadata as one of the “typical” identified features of a document that are used as input to a machine learning algorithm.  However, a couple of eDiscovery software vendors have both produced documentation stating that “machine learning systems typically rely upon extracted text only and that experts engaged in providing document assessments for training should, therefore, avoid considering metadata values in making responsiveness calls”.

So, the authors conducted a study to establish the potential benefit of incorporating metadata into TAR algorithm development processes, as well as to evaluate the benefits of using extended metadata and of using the field origins of that metadata.  Extended metadata fields included Primary Custodian, Record Type, Attachment Name, Bates Start, Company/Organization, Native File Size, Parent Date and Family Count, to name a few.  They evaluated three distinct data sets (one drawn from Topic 301 of the TREC 2010 Interactive Task, the other two proprietary business data sets) and generated a random sample of 4,500 documents for each (split into a 3,000 document Control Set and a 1,500 document Training Set).
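To make “incorporating metadata into the algorithm development process” a little more concrete, here is a hedged sketch of the general idea: metadata fields become additional features alongside the extracted text.  This is not the authors’ method or code; the column names, sample values and model choice below are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training set: body text plus a couple of "extended metadata" fields.
train = pd.DataFrame({
    "body_text": ["re: royalty payment schedule", "lunch on friday?", "draft lease agreement attached"],
    "record_type": ["email", "email", "attachment"],
    "primary_custodian": ["jsmith", "jsmith", "mjones"],
    "responsive": [1, 0, 1],
})

# Text features from the body, one-hot features from the metadata fields.
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "body_text"),
    ("meta", OneHotEncoder(handle_unknown="ignore"), ["record_type", "primary_custodian"]),
])

model = Pipeline([("features", features), ("classifier", LogisticRegression())])
model.fit(train, train["responsive"])

# Rank documents by predicted probability of responsiveness (higher = review first).
scores = model.predict_proba(train)[:, 1]
print(sorted(zip(scores, train["body_text"]), reverse=True))
```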

The metric they used throughout to compare model performance is Area Under the Receiver Operating Characteristic Curve (AUROC). Say what?  According to the report, the metric indicates the probability that a given model will assign a higher ranking to a randomly selected responsive document than a randomly selected non-responsive document.
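Say what, indeed.  If it helps, here is a small sketch with made-up scores showing that the pairwise interpretation quoted above and the standard AUROC calculation produce the same number:

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# Made-up model scores; label 1 = responsive, 0 = non-responsive.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# Pairwise interpretation: the fraction of (responsive, non-responsive) pairs in
# which the responsive document gets the higher score (ties count as half).
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
pairs = list(product(pos, neg))
pairwise = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(pairwise)                       # 0.9166...
print(roc_auc_score(labels, scores))  # same value, computed as area under the ROC curve
```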

Their findings were that incorporating metadata as an integral component of machine learning processes for TAR improved results (based on the AUROC metric).  In particular, models incorporating extended metadata significantly outperformed models based on body text alone in each condition for every data set.  While there’s still a lot to learn about the use of metadata in modeling for TAR, it’s an interesting study and a good start to the discussion.

A copy of the twelve page study (including Bibliography and Appendix) is available here.  There is also a link to the PowerPoint presentation file from the workshop, which is a condensed way to look at the study, if desired.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Craig Ball Explains HASH Deduplication As Only He Can: eDiscovery Best Practices

Ever wonder why some documents are identified as duplicates and others are not, even though they appear to be identical?  Leave it to Craig Ball to explain it in plain terms.

In the latest post (Deduplication: Why Computers See Differences in Files that Look Alike) in his excellent Ball in your Court blog, Craig states that “Most people regard a Word document file, a PDF or TIFF image made from the document file, a printout of the file and a scan of the printout as being essentially “the same thing.”  Understandably, they focus on content and pay little heed to form.  But when it comes to electronically stored information, the form of the data—the structure, encoding and medium employed to store and deliver content–matters a great deal.”  The end result is that two documents may look the same, but may not be considered duplicates because of their format.

Craig also references a post from “exactly” three years ago (it’s four days off, Craig, just sayin’) that provides a “quick primer on deduplication” and shows three approaches to deduplication, including the most common approach of using hash values (MD5 or SHA-1).

My favorite example of how two seemingly duplicate documents can be different is the publication of documents to Adobe Portable Document Format (PDF).  As I noted in our post from (nowhere near exactly) three years ago, I “publish” marketing slicks created in Microsoft® Publisher, “publish” finalized client proposals created in Microsoft Word and “publish” presentations created in Microsoft PowerPoint to PDF format regularly (still do).  With a free PDF print driver, you can conceivably create a PDF file for just about anything that you can print.  Of course, scans of printed documents that were originally electronic are another way in which two seemingly duplicate documents can differ.

The best part of Craig’s post is the exercise that he describes at the end of it – creating a Word document of the text of the Gettysburg Address (saved as both .DOC and .DOCX), generating a PDF file using the Save As and Print As PDF file methods and scanning the printed document to both TIFF and PDF at different resolutions.  He shows the MD5 hash value and the file size of each file.  Because the format of the file is different each time, the MD5 hash value is different each time.  When that happens for the same content, you have what some of us call “near dupes”, which have to be analyzed based on the text content of the file.
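If you want to replicate the heart of Craig’s exercise yourself, here is a minimal sketch (the file names are hypothetical) that hashes each file’s raw bytes the way most deduplication tools do.  Any difference in format or encoding produces a different digest, even for identical-looking content, while true byte-for-byte duplicates collapse to a single entry.

```python
import hashlib
from pathlib import Path

def md5_of_file(path):
    """Hash the raw bytes of a file, which is what hash-based deduplication compares."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical set of files containing the "same" Gettysburg Address content.
files = ["gettysburg.docx", "gettysburg.doc", "gettysburg_saveas.pdf",
         "gettysburg_print.pdf", "gettysburg_scan_300dpi.tif"]

seen = {}
for name in files:
    path = Path(name)
    if not path.exists():
        continue  # skip files not present in this sketch
    h = md5_of_file(path)
    if h in seen:
        print(f"{name} is an exact (byte-for-byte) duplicate of {seen[h]}")
    else:
        seen[h] = name
        print(f"{name}: MD5 {h}, {path.stat().st_size} bytes")
```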

The file size is different in almost every case too.  We performed a similar test (still not exactly) three years ago (but much closer).  In our test, we took one of our one-page blog posts about the memorable Apple v. Samsung litigation and saved it to several different formats, including TXT, HTML, XLSX, DOCX, PDF and MSG – the sizes ranged from 10 KB all the way up to 221 KB.  So, as you can see, the same content can vary widely in both hash value and file size, depending on the file format and how it was created.

As usual, I’ve tried not to steal all of Craig’s thunder from his post, so please check it out here.

So, what do you think?  What has been your most unique deduplication challenge?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Here’s One Study That Shows Potential Savings from Technology Assisted Review: eDiscovery Trends

A couple of weeks ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop that was held earlier this month.  Today, let’s cover one of those papers.

The Case for Technology Assisted Review and Statistical Sampling in Discovery (by Christopher H. Paskach, F. Eli Nelson and Matthew Schwab) aims to show how Technology Assisted Review (TAR) and statistical sampling can significantly reduce risk and improve productivity in eDiscovery processes.  The easy-to-read, six-page report concludes with the observation that, with measures like statistical sampling, “attorney stakeholders can make informed decisions about the reliability and accuracy of the review process, thus quantifying actual risk of error and using that measurement to maximize the value of expensive manual review. Law firms that adopt these techniques are demonstrably faster, more informed and productive than firms who rely solely on attorney reviewers who eschew TAR or statistical sampling.”

The report begins with an introduction that includes a brief history of eDiscovery, starting with printing documents, “Bates” stamping them, and scanning and using Optical Character Recognition (OCR) programs to capture text for searching.  As the report notes, “Today we would laugh at such processes, but in a profession based on ‘stare decisis,’ changing processes takes time.”  Of course, as we know now, “studies have concluded that machine learning techniques can outperform manual document review by lawyers”.  The report also references key cases such as Da Silva Moore, Kleen Products and Global Aerospace, which were among the first of many cases to approve the use of technology assisted review for eDiscovery.

Probably the most interesting portion of the report is the section titled Cost Impact of TAR, which illustrates a case scenario comparing the cost of TAR to the cost of manual review.  On a strictly relevance-based review of 90,000 documents (after keyword filtering, which implies a multimodal approach to TAR), the TAR approach was over $57,000 less expensive ($136,225 vs. $193,500 for manual review).  The report illustrates the comparison with both a spreadsheet of the numbers and a pie chart breakdown of costs, based on the assumptions provided.  Sounds like the basis for a budgeting tool!
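Speaking of a budgeting tool, the underlying arithmetic is simple enough to sketch.  The review speeds, hourly rate, TAR fees and review fraction below are hypothetical placeholders rather than the assumptions from the report, so plug in your own numbers for an apples-to-apples estimate.

```python
def manual_review_cost(num_docs, docs_per_hour=50, rate_per_hour=75.0):
    """Every document gets human eyes, so cost scales linearly with volume."""
    return (num_docs / docs_per_hour) * rate_per_hour

def tar_review_cost(num_docs, training_docs=3000, review_fraction=0.35,
                    docs_per_hour=50, rate_per_hour=75.0, tar_fees=25000.0):
    """Humans review a training set plus the machine-ranked slice; the rest is never reviewed."""
    reviewed = training_docs + review_fraction * num_docs
    return (reviewed / docs_per_hour) * rate_per_hour + tar_fees

docs = 90000
manual = manual_review_cost(docs)
tar = tar_review_cost(docs)
print(f"Manual review: ${manual:,.0f}")
print(f"TAR-assisted:  ${tar:,.0f}")
print(f"Savings:       ${manual - tar:,.0f}")
```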

Anyway, the report goes on to discuss the benefits of statistical sampling to validate the results, demonstrating that the only way to attempt to do so in a manual review scenario is to review the documents multiple times, which is prone to human error and inconsistent assessments of responsiveness.  The report then covers necessary process changes to realize the benefits of TAR and statistical sampling and concludes with the declaration that:

“Companies and law firms that take advantage of the rapid advances in TAR will be able to keep eDiscovery review costs down and reduce the investment in discovery by getting to the relevant facts faster. Those firms who stick with unassisted manual review processes will likely be left behind.”

The report is a quick, easy read and can be viewed here.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

DESI Got Your Input, and Here It Is: eDiscovery Trends

Back in January, we discussed the Discovery of Electronically Stored Information (DESI, not to be confused with Desi Arnaz) workshop and its call for papers describing research or practice for the DESI VI workshop that was held last week at the University of San Diego as part of the 15th International Conference on Artificial Intelligence & Law (ICAIL 2015). Now, links to those papers are available on their web site.

The DESI VI workshop aims to bring together researchers and practitioners to explore innovation and the development of best practices for application of search, classification, language processing, data management, visualization, and related techniques to institutional and organizational records in eDiscovery, information governance, public records access, and other legal settings. Ideally, the aim of the DESI workshop series has been to foster a continuing dialogue leading to the adoption of further best practice guidelines or standards in using machine learning, most notably in the eDiscovery space. Organizing committee members include Jason R. Baron of Drinker Biddle & Reath LLP and Douglas W. Oard of the University of Maryland.

The workshop included keynote addresses by Bennett Borden and Jeremy Pickens, a session regarding Topics in Information Governance moderated by Jason R. Baron, presentations of some of the “refereed” papers and other moderated discussions. Sounds like a very informative day!

As for the papers themselves, the DESI VI site provides the full list, grouped into Refereed Papers and Position Papers, with links to each paper.

If you’re interested in discovery of ESI, Information Governance and artificial intelligence, these papers are for you! Kudos to all of the authors who submitted them. Over the next few weeks, we plan to dive deeper into at least a few of them.

So, what do you think? Did you attend DESI VI? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Orders Deposition of Expert to Evaluate Issues Resulting from Plaintiff’s Deletion of ESI: eDiscovery Case Law

In Procaps S.A. v. Patheon Inc., 12-24356-CIV-GOODMAN, 2014 U.S. Dist. (S.D. Fla. Apr. 24, 2015), Florida District Judge Jonathan Goodman ordered the deposition of a third-party computer forensic expert, who had previously examined the plaintiff’s computers, to be conducted in part by a Special Master that had been appointed to examine the eDiscovery and forensic issues in the case. The purpose of the ordered deposition was to help the Court decide the issues related to files deleted by the plaintiff and assist the defendant to decide whether or not to file a sanctions motion.

Case Background

Although the plaintiff filed suit in this antitrust case in December 2012, it did not implement a formal litigation hold until after February 27, 2014, when this Court ordered one to be implemented in response to the defendant’s motion. Beyond not implementing a formal hold, the plaintiff’s counsel acknowledged that its document and electronically stored information (“ESI”) search efforts were inadequate. Its US lawyers never traveled to Colombia (where the plaintiff is based) to meet with its information technology team (or other executives) to discuss how relevant or responsive ESI would be located, and it did not retain an ESI retrieval consultant to help implement a litigation hold or to search for relevant ESI and documents. In addition, some critical executives and employees conducted their own searches for ESI and documents without ever seeing the defendant’s document request or without receiving a list of search terms from its counsel.

The plaintiff ultimately agreed to a forensic analysis by an outside vendor specializing in ESI retrieval, and the Court appointed a neutral computer forensic expert to analyze the plaintiff’s ESI and later appointed a Special Master to assist the Court with ESI issues. Completed in May 2014, the report from the forensic expert, which was “thousands of pages long”, showed that “nearly 200,000 emails, PDFs, and Microsoft Word, Excel, and PowerPoint files were apparently deleted” and “[i]t appears that approximately 5,700 of these files contain an ESI search term in their title, which indicates that they could have been subject to production in the forensic analysis if they had not been deleted.”

The defendant filed a motion to conduct the deposition of the neutral third-party expert to explain the report and the plaintiff filed an opposing response.

Judge’s Ruling

You’ve got to love an opinion that begins by quoting both eighteenth century English writer Samuel Johnson and the recently departed B.B. King. Judge Goodman began his analysis by referencing Federal Rule of Evidence 706, noting that it “governs court-appointed expert witnesses” and that “Subsection 706(b)(2) provides that such witnesses ‘may be deposed by any party.’” With regard to the plaintiff’s objection that such depositions are not very common, he stated that “regardless of whether depositions of court-appointed neutral experts on computer forensic issues are very common, used occasionally or are actually rare and atypical, they are certainly permissible. As noted, Federal Rule Evidence 706(b)(2) expressly provides for them. Moreover, there are published opinions discussing these types of depositions without critical comment. Perhaps more importantly, district courts have ‘broad discretion over the management of pre-trial activities, including discovery and scheduling.’”

Judge Goodman also rejected the plaintiff’s objection about the purported tardiness of the motion, noting that the forensic analysis took more than a year and was not completed until the first week of April 2015. He stated that “the deposition would undoubtedly be of great help to the Court. If I were to deny the motion, as Procaps urges, then I would be undermining my own ability to grapple with the myriad, thorny issues which will surely arise in the next several weeks or months.

“Therefore, the Undersigned hopes to be able to ‘get by with a little help from my [ESI neutral expert] friends’ and is ‘gonna try [to comprehensively and correctly assess the to-be-submitted ESI issues] with a little help from my friends.’ Granting Patheon’s motion will enable the Undersigned to accomplish that goal; denying it would render that specific goal unattainable (and make the ESI spoliation/sanctions/trial evidence/bad faith/significance of missing evidence/prejudice evaluation more difficult).”

As a result, Judge Goodman ordered the deposition of the third-party computer forensic expert to be conducted in part by the Special Master and laid out the procedures for the deposition in his order.

So, what do you think? Was the judge right in ordering the deposition? Please share any comments you might have or if you’d like to know more about a particular topic.

This isn’t the first time we’ve covered this case; click here for a previous ruling we covered back in May 2014.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Here’s a New Job Title that May Catch On – Chief Data Scientist: eDiscovery Trends

With big data becoming bigger than ever, the ability for organizations to apply effective data analytics within information governance and electronic discovery disciplines has become more important than ever. With that in mind, one law firm has created a new role that might catch on with other firms and corporations – the role of Chief Data Scientist.

The article from Legaltech News (Drinker Biddle Names Borden Chief Data Scientist, by Chris DiMarco) notes that Drinker Biddle & Reath has named Bennett Borden the firm’s first chief data scientist (CDS). As the author notes, in this role Borden will oversee the implementation of technologies and services that apply data analytics and other cutting-edge tools to the practice of law, and he will be tasked with developing the firm’s data analytics strategy. The move positions Drinker Biddle as one of the first firms – possibly in the world – to carve out a leadership position overseeing data analytics, with the impetus for the new role coming from the firm’s longstanding views on the importance of governing information.

Borden, who is also co-founder of the Information Governance Initiative (IGI), was quoted in the article, stating, “Our perspective is that information governance is a coordinating discipline around all the different facets of the creation, use and disposition of information. And so data analytics is one more part of a large IG framework.”

Borden’s selection as the firm’s chief data scientist comes on the heels of him receiving a Master of Science degree in business analytics from New York University.

“Because of where analytics is going, especially in the business arena, I was interested in getting additional training,” Borden said. “My entire career has focused on using advanced analytics on large volumes of information to find something of value. Much of my work has focused on using advanced data analytics across many of our practices, not only for discovery, but also for compliance and investigations.”

According to Borden, he is among the first to hold the title of CDS at a major firm. Will this start a trend? Maybe so. Congrats, Bennett!

So, what do you think? Do you think other firms and organizations will create a Chief Data Scientist position? Please share any comments you might have or if you’d like to know more about a particular topic.


Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

For a Successful Outcome to Your Discovery Project, Work Backwards: eDiscovery Best Practices

Based on a recent experience with a client, it seemed appropriate to revisit this topic. Plus, it’s always fun to play with the EDRM model. Notice anything different? 🙂

While the Electronic Discovery Reference Model from EDRM has become the standard model for the workflow of handling electronically stored information (ESI) in discovery, it might be helpful to think about the EDRM model and work backwards, whether you’re the producing party or the receiving party.

Why work backwards?

You can’t have a successful outcome without first envisioning the outcome you want to achieve. The end of the discovery process includes the production and presentation stages, so it’s important to determine what you want to get out of those stages. Let’s look at them.

Presentation

Whether you’re a receiving party or a producing party, it’s important to think about what types of evidence you need to support your case when presenting at depositions and at trial – this is the type of information that needs to be included in your production requests at the beginning of the case as well as the type of information that you’ll need to preserve as a producing party.

Production

The format of the ESI produced is important to both sides in the case. For the receiving party, it’s important to get as much useful information included in the production as possible. This includes metadata and searchable text for the produced documents, typically with an index or load file to facilitate loading into a review application. The most useful form of production is native format files with all metadata preserved as used in the normal course of business.

For the producing party, it’s important to be efficient and minimize costs, so it’s important to agree to a production format that minimizes production costs. Converting files to an image based format (such as TIFF) adds costs, so producing in native format can be cost effective for the producing party as well. It’s also important to determine how to handle issues such as privilege logs and redaction of privileged or confidential information.

Addressing production format issues up front will maximize cost savings and enable each party to get what they want out of the production of ESI. If you don’t, you could be arguing in court like our case participants from yesterday’s post.
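For the receiving party, the “index or load file” mentioned above is what ties the produced files to their metadata and searchable text. Here is a purely hypothetical sketch that writes a simple CSV-style load file; real productions typically follow a format negotiated by the parties (for example, a Concordance- or Relativity-style load file), so treat the field names and values below as placeholders.

```python
import csv

# Hypothetical production records: one row per produced document.
documents = [
    {"BegBates": "ABC000001", "Custodian": "jsmith", "FileName": "lease_agreement.docx",
     "DateSent": "2015-03-02", "MD5Hash": "0123456789abcdef0123456789abcdef",
     "NativeLink": r"NATIVES\ABC000001.docx", "TextLink": r"TEXT\ABC000001.txt"},
    {"BegBates": "ABC000002", "Custodian": "mjones", "FileName": "royalty_schedule.xlsx",
     "DateSent": "2015-03-05", "MD5Hash": "fedcba9876543210fedcba9876543210",
     "NativeLink": r"NATIVES\ABC000002.xlsx", "TextLink": r"TEXT\ABC000002.txt"},
]

fields = ["BegBates", "Custodian", "FileName", "DateSent", "MD5Hash", "NativeLink", "TextLink"]

with open("production_loadfile.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()  # the receiving party's review tool maps these fields on import
    writer.writerows(documents)
```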

Processing-Review-Analysis

It also pays to make decisions early in the process that affect processing, review and analysis. How should exception files be handled? What do you do about files that are infected with malware? These are examples of issues that need to be decided up front to determine how processing will be handled.

As for review, the review tool being used may impact how quick and easy it is to get started, to load data and to use the tool, among other considerations. If it’s Friday at 5 and you have to review data over the weekend, is it easy to get started? As for analysis, surely you test search terms to determine their effectiveness before you agree on those terms with opposing counsel, right?

Preservation-Collection-Identification

Long before you have to conduct preservation and collection for a case, you need to establish procedures for implementing and monitoring litigation holds, as well as prepare a data map to identify where corporate information is stored for identification, preservation and collection purposes.

And, before a case even begins, you need an effective Information Governance program to minimize the amount of data that you might have to consider for responsiveness in the first place.

As you can see, at the beginning of a case (and even before), it’s important to think backwards within the EDRM model to ensure a successful discovery process. Decisions made at the beginning of the case affect the success of those later stages, so working backwards can help ensure a successful outcome!

So, what do you think? What do you do at the beginning of a case to ensure success at the end?   Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

For Better Document Review, You Need to Approach a ZEN State: eDiscovery Best Practices

Among the many definitions of the word “zen”, the Urban Dictionary provides perhaps the most appropriate (non-religious) one: a total state of focus that incorporates a total togetherness of body and mind. However, when it comes to document review, a new web site by eDiscovery thought leader Ralph Losey may change your way of thinking about the word “ZEN”.

Ralph’s new site, ZEN Document Review, introduces ‘ZEN’ as an acronym: Zero Error Numerics. As stated on the site, “ZEN document review is designed to attain the highest possible level of efficiency and quality in computer assisted review. The goal is zero error. The methods to attain that goal include active machine learning, random sampling, objective measurements, and comparative analysis using simple, repeatable systems.”

The ZEN methods were developed by Ralph Losey’s e-Discovery Team (and many of them are documented on his excellent e-Discovery Team® blog). They rely on focused attention and full, clear communication between review team members.

In the intro video on his site, Ralph acknowledges that it’s impossible to have zero error in any large, complex project, but “with the help of the latest tools and using the right mindset, we can come pretty damn close”. One of the graphics on the site depicts an “upside down champagne glass” illustrating, at the top of the graph, the 99.9% of probable relevant documents identified correctly during the review process and, at the bottom, the 0.1% identified incorrectly.

The ZEN approach includes everything from “predictive coding analytics, a type of artificial intelligence, actively managed by skilled human analysts in a hybrid approach” to “quiet, uninterrupted, single-minded focus” where “dual tasking during review is prohibited” to “judgmental and random sampling and analysis such as i-Recall” and even high ethics, with the goal being to “find and disclose the truth in compliance with local laws, not win a particular case”. And thirteen other factors, as well. Hey, nobody said that attaining ZEN is easy!
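Of those factors, random sampling and objective measurement are the ones that translate most directly into code. Here is a minimal, hypothetical sketch (it is not Ralph’s i-Recall method; the review set, sample size and second-pass judgment are made up) of estimating a review’s error rate from a random sample, with a simple normal-approximation confidence interval.

```python
import math
import random

def sample_error_rate(coded_docs, sample_size, rechecked_is_correct, z=1.96):
    """Re-review a random sample of already-coded documents and estimate the error rate."""
    sample = random.sample(coded_docs, sample_size)
    errors = sum(1 for doc in sample if not rechecked_is_correct(doc))
    p = errors / sample_size
    # Normal-approximation 95% confidence interval for the error proportion.
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical use: 100,000 coded documents, re-check a random 400 of them.
coded_docs = list(range(100000))
recheck = lambda doc: random.random() > 0.001  # stand-in for a second reviewer's judgment
rate, low, high = sample_error_rate(coded_docs, 400, recheck)
print(f"Estimated error rate: {rate:.2%} (95% CI roughly {low:.2%} to {high:.2%})")
```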

Attaining zero error in document review is a lofty goal – I admire Ralph for setting the bar high. Using the right tools, methods and attitude, can we come “pretty damn close”?  What do you think? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.