Special Master Declines to Order Defendant to Use TAR, Rules on Other Search Protocol Disputes: eDiscovery Case Law

In the case In re Mercedes-Benz Emissions Litig., No. 2:16-cv-881 (KM) (ESK) (D.N.J. Jan. 9, 2020), Special Master Dennis Cavanaugh (U.S.D.J., Ret.) issued an order and opinion stating that he would not compel defendants to use technology assisted review (TAR), and instead adopted the search term protocol negotiated by the parties, with three areas of dispute resolved by his ruling.

Case Background

In this emissions test class action involving an automobile manufacturer, the plaintiffs proposed that the defendants use predictive coding/TAR, asserting that TAR yields significantly better results than either traditional human “eyes on” review of the full data set or the use of search terms.  The plaintiffs also argued that if the Court were to decline to compel the defendants to adopt TAR, the Court should enter its proposed Search Term Protocol.

The defendants argued that there is no authority for imposing TAR on an objecting party and that this case presented a number of unique issues that would make developing an appropriate and effective seed set challenging, such as language and translation issues, unique acronyms and identifiers, redacted documents, and technical documents. As a result, they contended that they should be permitted to utilize their preferred custodian-and-search term approach.

Judge’s Ruling

Citing Rio Tinto Plc v. Vale S.A., Special Master Cavanaugh quoted from that case in stating: “While ‘the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it’…, no court has ordered a party to engage in TAR over the objection of that party. The few courts that have considered this issue have all declined to compel predictive coding.”  Citing Hyles v. New York City (another ruling by now-retired New York Magistrate Judge Andrew J. Peck), Special Master Cavanaugh stated: “Despite the fact that it is widely recognized that ‘TAR is cheaper, more efficient and superior to keyword searching’…, courts also recognize that responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for producing their own electronically stored information.”

As a result, Special Master Cavanaugh ruled: “While the Special Master believes TAR would likely be a more cost effective and efficient methodology for identifying responsive documents, Defendants may evaluate and decide for themselves the appropriate technology for producing their ESI. Therefore, the Special Master will not order Defendants to utilize TAR at this time. However, Defendants are cautioned that the Special Master will not look favorably on any future arguments related to burden of discovery requests, specifically cost and proportionality, when Defendants have chosen to utilize the custodian-and-search term approach despite wide acceptance that TAR is cheaper, more efficient and superior to keyword searching. Additionally, the denial of Plaintiffs’ request to compel Defendants to utilize TAR is without prejudice to revisiting this issue if Plaintiffs contend that Defendants’ actual production is deficient.”

Special Master Cavanaugh also ruled on areas of dispute regarding the proposed Search Term Protocol, as follows:

  • Validation: Special Master Cavanaugh noted that “the parties have been able to reach agreement on the terms of Defendants’ validation process, [but] the parties are at an impasse regarding the level of validation of Plaintiffs’ search term results”, observing that “Plaintiffs’ proposal does not articulate how it will perform appropriate sampling and quality control measures to achieve the appropriate level of validation.” As a result, Special Master Cavanaugh, while encouraging the parties to work together to develop a reasonable procedure for the validation of Plaintiffs’ search terms, ruled: “As no articulable alternative process has been proposed by Plaintiffs, the Special Master will adopt Defendants’ protocol to the extent that it will require the parties, at Defendants’ request, to meet and confer concerning the application of validation procedures described in paragraph 12(a) to Plaintiffs, if the parties are unable to agree to a procedure.” (For a sense of the sampling arithmetic that typically underlies this kind of validation, see the sketch after this list.)
  • Known Responsive Documents & Discrete Collections: The defendants objected to the plaintiffs’ protocol to require the production of all documents and ESI “known” to be responsive as “vague, exceedingly burdensome, and provides no clear standard for the court to administer or the parties to apply”. The defendants also objected to the plaintiffs’ request for “folders or collections of information that are known to contain documents likely to be responsive to a discovery request” as “overly broad and flouts the requirement that discovery be proportional to the needs of the case.”  Noting that “Defendants already agreed to produce materials that are known to be responsive at the November status conference”, Special Master Cavanaugh decided to “modify the Search Term Protocol to require production of materials that are ‘reasonably known’ to be responsive.”  He also decided to require the parties to collect folders or collections of information “to the extent it is reasonably known to the producing party”, also requiring “the parties to meet and confer if a party believes a discrete document folder or collection of information that is relevant to a claim or defense is too voluminous to make review of each document proportional to the needs of the case.”
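As for the sampling arithmetic mentioned in the validation bullet above: search-term validation typically comes down to reviewing a random sample large enough to estimate a rate (such as the rate of responsive documents the terms missed) within an acceptable margin of error.  Here’s a minimal sketch of that standard calculation; the confidence level, margin of error, and worst-case proportion are illustrative assumptions, not figures from the protocol or the opinion:

```python
import math

# Standard sample-size calculation for estimating a proportion (e.g., the
# rate of responsive documents among those NOT hitting the search terms).
# All three inputs are illustrative assumptions, not case-specific figures.
z = 1.96        # z-score for a 95% confidence level
margin = 0.02   # +/- 2% margin of error
p = 0.5         # worst-case proportion (maximizes the required sample)

n = math.ceil((z**2 * p * (1 - p)) / margin**2)
print(f"Random sample size needed: {n}")  # 2,401 documents

# Reviewing a random sample of this size from the "null set" (documents the
# search terms did not hit) and finding few responsive documents is one
# common way to validate that a search-term protocol isn't missing much.
```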

So, what do you think?  Should a decision not to use TAR negatively impact a party’s ability to make burden of discovery arguments?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Related to this topic, Rob Robinson’s Complex Discovery site published its Predictive Coding Technologies and Protocols Spring 2020 Survey results last week, which (as always) provides results on the most-often-used primary predictive coding platforms and technologies, the most-often-used TAR protocols, and the areas where TAR is most used (among other results).  You can check it out at the link directly above.

Case opinion link courtesy of eDiscovery Assistant.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Fall 2019 Predictive Coding Technologies and Protocols Survey Results: eDiscovery Trends

So many topics, so little time!  Rob Robinson published the latest Predictive Coding Technologies and Protocols Survey on his excellent ComplexDiscovery site last week, but this is the first chance I’ve had to cover it.  The results are in, and here are some of the findings from the largest response group for this survey yet.

As Rob notes in the results post here, the third Predictive Coding Technologies and Protocols Survey was initiated on August 23 and concluded on September 5, with individuals invited to participate directly by ComplexDiscovery and indirectly by industry website, blog, and newsletter mentions – including a big assist from the Association of Certified E-Discovery Specialists (ACEDS).  It’s a non-scientific survey designed to help provide a general understanding of the use of predictive coding technologies and protocols by data discovery and legal discovery professionals within the eDiscovery ecosystem, and it had two primary educational objectives:

  • To provide a consolidated listing of potential predictive coding technology and protocol definitions. While not all-inclusive or comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it should be useful in educational efforts.
  • To ask eDiscovery ecosystem professionals about their usage and preferences of predictive coding platforms, technologies, and protocols.

There were 100 total respondents in the survey (a nice, round number!).  Here are some of the more notable results:

  • 39 percent of responders were from law firms, 37 percent of responders were from software or services provider organizations, and the remaining 24 percent of responders were either part of a consultancy (12 percent), a corporation (6 percent), the government (3 percent), or another type of entity (3 percent).
  • 86 percent of responders shared that they did have a specific primary platform for predictive coding versus 14 percent who indicated they did not.
  • There were 31 different platforms noted as primary predictive coding platforms by responders; nine of those received more than one vote, and those nine accounted for more than three-quarters of responses (76 percent).
  • Active Learning was the most used predictive coding technology, with 86 percent reporting that they use it in their predictive coding efforts.
  • Just over half (51 percent) of responders reported using only one predictive coding technology in their predictive coding efforts.
  • Continuous Active Learning (CAL) was (by far) the most used predictive coding protocol, with 82 percent reporting that they use it in their predictive coding efforts.
  • Maybe the most interesting stat: 91 percent of responders reported using technology-assisted review in more than one area of data and legal discovery. So, the uses of TAR are certainly expanding!

Rob has reported several other results and provided graphs for additional details.  To check out all of the results, click here.  Want to compare to the previous two surveys?  They’re here and here. :o)

So, what do you think?  Do any of the results surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © FremantleMedia North America, Inc.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Number of Pages (Documents) in Each Gigabyte Can Vary Widely: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series, where we revisit some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on July 31, 2012 – when eDiscovery Daily wasn’t even two years old yet.  It’s “so old (how old is it?)”, it references a blog post from the now defunct Applied Discovery blog.  We’ve even done an updated look at this topic with more file types about four years later.  Oh, and since documents (not pages) are the metric by which we evaluate most of the EDRM life cycle from processing through review, it’s documents per GB that tends to be the more relevant measure these days.

So, why is this important?  Not only for estimation purposes for review, but also for considering processing throughput.  If you have two 40 GB (or so) PST container files and one file has twice the number of documents as the other, the one with more documents will take considerably longer to process.  It’s getting to the point where documents-per-hour throughput is becoming more important than GB per hour, as the latter can vary widely depending on the number of documents per GB.  Today, we’re seeing processing throughput speeds as high as 1 million documents per hour with solutions like (shameless plug warning!) our CloudNine Explore platform.  This is why Early Data Assessment tools have become more important: they can quickly provide the document counts that lead to more accurate estimates.  Regardless, the exercise below illustrates just how widely the number of pages (or documents) can vary within a single GB.  Enjoy!
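To make that arithmetic concrete, here’s a minimal sketch comparing size-based and document-based processing estimates for the two hypothetical PST files above (the throughput rates are assumptions for illustration, not benchmarks for any particular product):

```python
# Two hypothetical 40 GB PST files with different document densities.
# Throughput rates below are assumptions for illustration only.
GB_PER_HOUR = 25              # assumed size-based processing rate
DOCS_PER_HOUR = 1_000_000     # assumed document-based processing rate

pst_a = {"size_gb": 40, "doc_count": 400_000}   # ~10,000 docs/GB
pst_b = {"size_gb": 40, "doc_count": 800_000}   # ~20,000 docs/GB

for name, pst in (("PST A", pst_a), ("PST B", pst_b)):
    hours_by_size = pst["size_gb"] / GB_PER_HOUR
    hours_by_docs = pst["doc_count"] / DOCS_PER_HOUR
    print(f"{name}: {hours_by_size:.1f} hours by size, "
          f"{hours_by_docs:.1f} hours by document count")

# Both files look identical by size (1.6 hours each), but PST B takes twice
# as long as PST A once you estimate by document count -- which is why
# document counts produce more accurate estimates.
```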

A long time ago, we talked about how the average number of pages in each gigabyte is approximately 50,000 to 75,000 pages and that each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages (or documents) per gigabyte can vary?  The “how many pages” question came up a lot back then and I’ve seen a variety of answers.  The aforementioned Applied Discovery blog post provided some perspective in 2012 based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page (or document) counts with any degree of precision is somewhat difficult.
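One way to cope with mixed collections is to weight the per-type averages by the collection’s composition.  Here’s a minimal sketch using the Applied Discovery averages quoted above; the file-type mix is a hypothetical assumption:

```python
# Pages-per-GB averages from the Applied Discovery figures quoted above.
PAGES_PER_GB = {
    "email": 100_099,
    "word": 64_782,
    "text": 677_963,
    "image": 15_477,
    "powerpoint": 17_552,
}

# Hypothetical collection mix: fraction of each GB by file type.
mix = {"email": 0.50, "word": 0.25, "image": 0.15, "powerpoint": 0.10}

estimate = sum(PAGES_PER_GB[ftype] * frac for ftype, frac in mix.items())
print(f"Estimated pages per GB for this mix: {estimate:,.0f}")
# ~70,322 pages/GB here -- but shift the mix toward text files or images
# and the estimate swings by an order of magnitude.
```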

In fact, the same exact content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I decided to conduct a little (admittedly unscientific) study using our one-page blog post (also from July 2012) about the Apple/Samsung litigation (the first of many as it turned out, as that litigation dragged on for years).  I decided to put the content from that page into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.

The Outlook example back then was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than that blog post (which also included the side text on the page, as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the same exact content, will be different sizes in different applications.  Data size will enable you to provide a “ballpark” estimate for processing and review at best; to provide a more definitive estimate, you need a document count.  That’s why early data assessment has become key to better estimates of scope and delivery time frame.

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The March Toward Technology Competence (and Possibly Predictive Coding Adoption) Continues: eDiscovery Best Practices

I know, because it’s “March”, right?  :o)  Anyway, it’s about time is all I can say.  My home state of Texas has finally added its name to the list of states that have adopted the ethical duty of technology competence for lawyers, becoming the 36th state to do so.  And, we have a new predictive coding survey to check out.

As discussed on Bob Ambrogi’s LawSites blog, just last week (February 26), the Supreme Court of Texas entered an order amending Paragraph 8 of Rule 1.01 of the Texas Disciplinary Rules of Professional Conduct. The amended comment now reads:

Maintaining Competence

  1. Because of the vital role of lawyers in the legal process, each lawyer should strive to become and remain proficient and competent in the practice of law, including the benefits and risks associated with relevant technology. To maintain the requisite knowledge and skill of a competent practitioner, a lawyer should engage in continuing study and education. If a system of peer review has been established, the lawyer should consider making use of it in appropriate circumstances. Isolated instances of faulty conduct or decision should be identified for purposes of additional study or instruction.

The new phrase – “including the benefits and risks associated with relevant technology” – mirrors the one adopted in 2012 by the American Bar Association in amending the Model Rules of Professional Conduct to make clear that lawyers have a duty to be competent not only in the law and its practice, but also in technology.  Hard to believe it’s been seven years already!  Now, we’re up to 36 states that have formally adopted this duty of technology competence.  Just 14 to go!

Also, this weekend, Rob Robinson published the results of the Predictive Coding Technologies and Protocols Spring 2019 Survey on his excellent Complex Discovery blog.  Like the first version of the survey he conducted back in September of last year, this “non-scientific” survey was designed to help provide a general understanding of the use of predictive coding technologies, protocols, and workflows by data discovery and legal discovery professionals within the eDiscovery ecosystem.  This survey had 40 respondents, up from 31 the last time.

I won’t steal Rob’s thunder, but here are a couple of notable stats:

  • Approximately 62% of responders (62.5%) use more than one predictive coding technology in their predictive coding efforts: That’s considerably higher than I would have guessed;
  • Continuous Active Learning (CAL) was the most used predictive coding protocol with 80% of responders reporting that they use it in their predictive coding efforts: I would have expected that CAL was the leader, but not as dominant as these stats show; and
  • 95% of responders use technology-assisted review in more than one area of data and legal discovery: Which seems a good sign to me that practitioners aren’t just limiting it to identification of relevant documents in review anymore.

Rob’s findings, including several charts, can be found here.

So, what do you think?  Which state will be next to adopt an ethical duty of technology competence for lawyers?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

EDRM Releases the Final Version of its TAR Guidelines: eDiscovery Best Practices

During last year’s EDRM Spring Workshop, I discussed on this blog that EDRM had released the preliminary draft of its Technology Assisted Review (TAR) Guidelines for public comment.  They gave a mid-July deadline for comments and I even challenged the people who didn’t understand TAR very well to review it and provide feedback – after all, those are the people who would hopefully stand to benefit the most from these guidelines.  Now, over half a year later, EDRM has released the final version of its TAR Guidelines.

The TAR Guidelines (available here) have certainly gone through a lot of review.  In addition to the public comment period last year, it was discussed in the last two EDRM Spring meetings (2017 and 2018), presented at the Duke Distinguished Lawyers’ conference on Technology Assisted Review in 2017 for feedback, and worked on extensively during that time.

As indicated in the press release, more than 50 volunteer judges, practitioners, and eDiscovery experts contributed to the drafting process over a two-year period. Three drafting teams worked on various iterations of the document, led by Matt Poplawski of Winston & Strawn, Mike Quartararo of eDPM Advisory Services, and Adam Strayer of Paul, Weiss, Rifkind, Wharton & Garrison. Tim Opsitnick of TCDI and U.S. Magistrate Judge James Francis IV (Southern District of New York, Ret.) assisted in editing the document and incorporating comments from the public comment period.

“We wanted to address the growing confusion about TAR, particularly marketing claims and counterclaims that undercut the benefits of various versions of TAR software,” said John Rabiej, deputy director of the Bolch Judicial Institute of Duke Law School, which oversees EDRM. “These guidelines provide guidance to all users of TAR and apply across the different variations of TAR. We avoided taking a position on which variation of TAR is more effective, because that very much depends on facts specific to each case. Instead, our goal was to create a definitive document that could explain what TAR is and how it is used, to help demystify it and to help encourage more widespread adoption.”  EDRM/Duke Law also provide a TAR Q&A with Rabiej here.

The 50-page document contains four chapters: The first chapter defines technology assisted review and the TAR process. The second chapter lays out a standard workflow for the TAR process. The third chapter examines alternative tasks for applying TAR, including prioritization, categorization, privilege review, and quality and quantity control. Chapter four discusses factors to consider when deciding whether to use TAR, such as document set, cost, timing, and jurisdiction.

“Judges generally lack the technical expertise to feel comfortable adjudicating disputes involving sophisticated search methodologies. I know I did,” said Magistrate Judge Francis, who assisted in editing the document. “These guidelines are intended, in part, to provide judges with sufficient information to ask the right questions about TAR. When judges are equipped with at least this fundamental knowledge, counsel and their clients will be more willing to use newer, more efficient technologies, recognizing that they run less risk of being caught up in a discovery quagmire because a judge just doesn’t understand TAR. This, in turn, will further the goals of Rule 1 of the Federal Rules of Civil Procedure: to secure the just, speedy, and inexpensive determination of litigation.”

EDRM just announced the release of the final version of the TAR guidelines yesterday, so I haven’t had a chance to read it completely through yet, but a quick comparison to the public comment version from last May seems to indicate the same topics and sub-topics that were covered back then, so there certainly appears to be no major rewrite as a result of the public comment feedback.  I look forward to reading it in detail and determining what specific changes were made.

So, what do you think?  Will these guidelines help the average attorney or judge better understand TAR?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Orders Defendants to Sample Disputed Documents to Help Settle Dispute: eDiscovery Case Law

In Updateme Inc. v. Axel Springer SE, No. 17-cv-05054-SI (LB) (N.D. Cal. Oct. 11, 2018), California Magistrate Judge Laurel Beeler ordered the defendants to review a random sample of unreviewed documents in dispute and produce any responsive documents reviewed (along with a privilege log, if applicable) and report on the number of documents and families reviewed and the rate of responsiveness within one week.

Case Background

In this case, where the plaintiff, creator of a news-aggregator cell-phone app, claimed that the defendants “stole” its platform and released a copycat app, the plaintiff learned that the defendants used the code name “Ajax” to refer to their product.  The defendants determined that there were 5,126 unique documents (including associated family members) within the previously collected ESI that hit on the term “Ajax”, but they had not reviewed those documents for responsiveness.  The plaintiff asked the court to order the defendants to review those documents and produce responsive documents within two weeks.

The defendants claimed that the term “Ajax” is a project name that they created to refer to the plaintiff’s threatened litigation, not the product itself, and claimed that “a sampling of the ‘Ajax’ documents confirms that, in every responsive document, the term ‘Ajax’ was used to refer to the dispute itself.”  But the plaintiff cited 93 produced documents generally and two documents in particular (which the defendants were attempting to claw back as privileged) that referred to their product.  The defendants also claimed that it would be unduly burdensome and expensive to review the “Ajax” documents at this stage of the litigation, and argued that the term “Ajax” was not included in the ESI Protocol that the parties agreed upon months ago and should not be added at this late stage.

Judge’s Ruling

Judge Beeler observed this: “Whether ‘Ajax’ refers to Updateme or only the defendants’ dispute with Updateme is in some sense a distinction without a difference. Either way, the search term ‘Ajax’ is likely to return documents that are responsive to Updateme’s request for “[a]ll communications . . . concerning Updateme or the updaemi® application[.]” Documents concerning the defendants’ dispute with Updateme are likely documents concerning Updateme.” 

Judge Beeler also noted that “even if ‘Ajax’ refers to the dispute, that does not mean that documents that contain ‘Ajax’ are necessarily more likely to be privileged or protected from disclosure”, using a hypothetical scenario where two non-lawyers might discuss the impact of the “Ajax” dispute on profits.  She concluded her analysis with this statement: “To the extent the defendants are suggesting that if ‘Ajax’ purportedly refers to their dispute with Updateme, ESI containing ‘Ajax’ should remain outside the scope of discovery, the court is not convinced.”

As a result, Judge Beeler ordered the defendants to “randomly select 10% of the unreviewed documents [in dispute], review them (and their associated family members) for responsiveness, produce responsive documents (and a privilege log for any responsive documents that are withheld), and provide a chart listing the number of documents and families reviewed and the rate of responsiveness” within one week.  Judge Beeler stated that the parties should then meet and confer if they continued to have disputes regarding these documents.
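For illustration, drawing the kind of 10 percent random sample Judge Beeler ordered is straightforward; here’s a minimal sketch with hypothetical document IDs (the actual review of sampled documents and their family members is, of course, human work):

```python
import random

# Hypothetical IDs for the 5,126 unreviewed "Ajax" documents.
unreviewed_ids = list(range(1, 5127))

sample_size = round(len(unreviewed_ids) * 0.10)       # 10% sample -> 513
sample = random.sample(unreviewed_ids, sample_size)   # random, no repeats

# Reviewers would then review each sampled document (and, per the order,
# its family members) and record responsiveness decisions; this set is a
# placeholder for those human judgments.
responsive = set()
rate = len(responsive) / sample_size
print(f"Reviewed {sample_size} documents; responsiveness rate: {rate:.1%}")
```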

So, what do you think?  Should random sampling be used more to settle proportionality disputes or should it be a last resort?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Mike Q Says the Weakest Link in TAR is Humans: eDiscovery Best Practices

We started the week with a post from Tom O’Connor (his final post in his eDiscovery Project Management from Both Sides series).  And, we’re ending the week covering an article from Mike Quartararo on Technology Assisted Review (TAR).  You would think we were inadvertently promoting our webcast next week or something.  :o)

Remember The Weakest Link? That was the early 2000’s game show with the sharp-tongued British hostess (Anne Robinson) telling contestants that were eliminated “You are the weakest link.  Goodbye!”  Anyway, in Above the Law (Are Humans The Weak Link In Technology-Assisted Review?), Mike takes a look at the debate as to which tool is the superior tool for conducting TAR and notes the lack of scientific studies that point to any particular TAR software or algorithm being dramatically better or, more importantly, significantly more accurate, than any other.  So, if it’s not the tool that determines the success or failure of a TAR project, what is it?  Mike says when TAR has problems, it’s because of the people.

Of course, Mike knows quite a bit about TAR.  He’s managed his “share of” projects, has used “various flavors of TAR” and notes that “none of them are perfect and not all of them exceed all expectations in all circumstances”.  Mike has also been associated with the EDRM TAR project (which we covered earlier this year here) for two years as a team leader, working with others to draft proposed standards.

When it comes to observations about TAR that everyone should be able to agree on, Mike identifies three: 1) that TAR is not Artificial Intelligence, just “machine learning – nothing more, nothing less”, 2) that TAR technology works and “TAR applications effectively analyze, categorize, and rank text-based documents”, and 3) “using a TAR application — any TAR application — saves time and money and results in a reasonable and proportional outcome.”  Seems logical to me.
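To illustrate Mike’s first point – that TAR is machine learning that ranks text-based documents – here’s a minimal sketch of a continuous-active-learning-style loop, assuming scikit-learn.  It’s a sketch of the general technique with toy data, not any particular TAR product:

```python
# Minimal sketch of a continuous active learning (CAL) style loop using
# scikit-learn. Toy data; real TAR platforms add richer features,
# validation sampling, stopping criteria, and family handling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "emissions defeat device test results attached",
    "lunch menu for the cafeteria this week",
    "updated emissions calibration for the diesel engine",
    "holiday party planning committee notes",
    "regulatory inquiry about emissions testing",
    "parking garage maintenance schedule",
]

def human_review(text):               # stand-in for a human reviewer
    return 1 if "emissions" in text else 0

X = TfidfVectorizer().fit_transform(docs)
labels = {0: 1, 1: 0}                 # seed judgments: one responsive, one not

while len(labels) < len(docs):
    idx = sorted(labels)
    model = LogisticRegression().fit(X[idx], [labels[i] for i in idx])
    unlabeled = [i for i in range(len(docs)) if i not in labels]
    scores = model.predict_proba(X[unlabeled])[:, 1]   # P(responsive)
    # Route the highest-ranked unreviewed document to a reviewer; that
    # decision feeds the next training round -- the feedback loop is CAL.
    best = max(zip(unlabeled, scores), key=lambda pair: pair[1])[0]
    labels[best] = human_review(docs[best])

print("Documents judged responsive:",
      [i for i, lab in labels.items() if lab == 1])
```

As the sketch suggests, the model only ever ranks; every responsiveness decision still comes from the human in the loop – which is exactly where Mike says TAR projects go wrong.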

So, when TAR doesn’t work, “the blame may fairly be placed at the feet (and in the minds) of humans.”  We train the software by categorizing the training documents, we operate the software, we analyze the outcome.  So, it’s our fault.

Last month, we covered this case where the plaintiffs successfully requested additional time for discovery when defendant United Airlines, using TAR to manage its review process, produced 3.5 million documents.  However, sampling by the plaintiffs (and later confirmed by United) found that the production contained only 600,000 documents that were responsive to their requests (about 17% of the total production).  That seems like a far less than ideal TAR result to me.  Was that because of human failure?  Perhaps, when it comes down to it, the success of TAR being dependent on humans points us back to the long-used phrase regarding humans and computers: Garbage In, Garbage Out.

So, what do you think?  Should TAR be considered Artificial Intelligence?  As always, please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © British Broadcasting Corporation

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Here’s a Terrific Scorecard for Mobile Evidence Discovery: eDiscovery Best Practices

As we’ve noted before, eDiscovery isn’t just about discovery of emails and office documents anymore.  There are so many sources of data these days that legal professionals have to account for, with millions of items transmitted over the internet every minute – much of it transmitted and managed via mobile devices.  Now, here’s a terrific new Mobile Evidence Burden and Relevance Scorecard, courtesy of Craig Ball!

Craig has had a lot to say in the past about mobile device preservation and collection, even going as far as to say that failure to advise clients to preserve relevant and unique mobile data when under a preservation duty is committing malpractice.  To help lawyers avoid that fate, Craig has described a simple, scalable approach for custodian-directed preservation of iPhone data.

Craig’s latest post (Mobile to the Mainstream, PDF article here) “looks at simple, low-cost approaches to getting relevant and responsive mobile data into a standard e-discovery review workflow” as only Craig can.  But, Craig also “offers a Mobile Evidence Scorecard designed to start a dialogue leading to a consensus about what forms of mobile content should be routinely collected and reviewed in e-discovery, without the need for digital forensic examination.”

It’s that scorecard – and Craig’s discussion of it – that is really cool.  Craig breaks down various types of mobile data (e.g., Files, Photos, Messages, Phone Call History, Browser History, etc.) in terms of Ease of Collection and Ease of Review (Easy, Moderate or Difficult), Potential Relevance (Frequent, Case Specific or Rare) and whether or not you would Routinely Collect (Yes, No or Maybe).  Believe it or not, Craig states that you would routinely collect almost half (7 out of 16 marked as “Yes”, 2 more marked as “Maybe”) of the file types.  While the examples are specific to the iPhone (which I think is used most by legal professionals), the concepts apply to Android and other mobile devices as well.

I won’t steal Craig’s thunder here; instead, I’ll direct you to his post here so that you can check it out yourself.  This scorecard can serve as a handy guide for what lawyers should expect for mobile device collection in their cases.  Obviously, it depends on the lawyer and the type of case in which they’re involved, but it’s still a good general reference guide.

So, what do you think?  Do you routinely collect data from mobile devices for your cases?  And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Plaintiffs Granted Discovery Extension Due to Defendant’s TAR Review Glitch: eDiscovery Case Law

In the case In Re Domestic Airline Travel Antitrust Litigation, MDL Docket No. 2656, Misc. No. 15-1404 (CKK), (D.D.C. Sept. 13, 2018), District of Columbia District Judge Colleen Kollar-Kotelly granted the Plaintiffs’ Motion for an Extension of Fact Discovery Deadlines (over the defendants’ objections) for six months, based on defendant “United’s production of core documents that varied greatly from the control set in terms of the applicable standards for recall and precision and included a much larger number of non-responsive documents than was anticipated” (United’s core production of 3.5 million documents contained only 600,000 documents that were responsive).

Case Background

This case involves multidistrict class action litigation brought by the plaintiffs (purchasers of air passenger transportation for domestic travel) alleging that the defendant airlines willingly conspired to engage in unlawful restraint of trade.  The plaintiffs filed the instant Motion for Extension of Time to Complete Discovery, requesting an extension of six months, predicated on an “issue with United’s ‘core’ document production.”  They asserted that defendant United produced more than 3.5 million [core] documents to the Plaintiffs, but “due to United’s technology assisted review process (‘TAR’), only approximately 17%, or 600,000, of the documents produced are responsive to Plaintiffs’ requests,” and the plaintiffs (despite having staffed their discovery review with 70 attorneys) required additional time to sort through them.
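The 17% figure is simple precision arithmetic; here’s a minimal sketch using the numbers from the opinion:

```python
# Precision of United's core production, per the figures in the opinion.
produced = 3_500_000      # total documents produced
responsive = 600_000      # documents responsive to plaintiffs' requests

precision = responsive / produced
print(f"Precision of the production: {precision:.0%}")   # ~17%

# Put differently, roughly 5 of every 6 produced documents were
# non-responsive. Recall (responsive documents produced divided by all
# responsive documents in the collection) can't be computed from these
# figures alone; it requires an estimate of the total responsive
# population, typically derived from a control set.
```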

Both defendants (Delta and United) opposed the plaintiffs’ request for an extension, questioning whether the plaintiffs had staffed the document review with 70 attorneys and suggesting the Court review the plaintiffs’ counsel’s monthly time sheets to verify that statement.  Delta also questioned why it would take the plaintiffs so long to review the documents and tried to extrapolate how long it would take to review the entire set of documents based on a review rate of 3 documents per minute (an analysis that the plaintiffs called “preposterous”).  United indicated that it engaged “over 180 temporary contract attorneys to accomplish its document production and privilege log process within the deadlines” set by the Court, so the plaintiffs should be expected to engage in the same expenditure of resources.  But the plaintiffs contended that they “could not have foreseen United’s voluminous document production made up [of] predominantly non-responsive documents resulting from its deficient TAR process when they jointly proposed an extension of the fact discovery deadline in February 2018.”

Judge’s Ruling

Judge Kollar-Kotelly noted that “Plaintiffs contend that a showing of diligence involves three factors — (1) whether the moving party diligently assisted the Court in developing a workable scheduling order; (2) that despite the diligence, the moving party cannot comply with the order due to unforeseen or unanticipated matters; and (3) that the party diligently sought an amendment of the schedule once it became apparent that it could not comply without some modification of the schedule.”  She noted that “there is no dispute that the parties diligently assisted the Court in developing workable scheduling orders through their preparation of Joint Status Reports prior to the status conferences in which discovery issues and scheduling were discussed, and in their meetings with the Special Master, who is handling discovery matters in this case.”

Judge Kollar-Kotelly also observed that “United’s core production of 3.5 million documents — containing numerous nonresponsive documents — was unanticipated by Plaintiffs, considering the circumstances leading up to that production” and that “Plaintiffs devoted considerable resources to the review of the United documents prior to filing this motion seeking an extension”.  Finding also that “Plaintiffs’ claim of prejudice in not having the deadlines extended far outweighs any inconvenience that Defendants will experience if the deadlines are extended”, Judge Kollar-Kotelly found “that Plaintiffs have demonstrated good cause to warrant an extension of deadlines in this case based upon Plaintiffs’ demonstration of diligence and a showing of nominal prejudice to the Defendants, if an extension is granted, while Plaintiffs will be greatly prejudiced if the extension is not granted.”  As a result, she granted the motion to request the extension.

So, what do you think?  Was the court right to have granted the extension?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Also, if you’re going to be in Houston on Thursday, September 27, just a reminder that I will be speaking at the second annual Legal Technology Showcase & Conference, hosted by the Women in eDiscovery (WiE), Houston Chapter, South Texas College of Law and the Association of Certified E-Discovery Specialists (ACEDS).  I’ll be part of the panel discussion AI and TAR for Legal: Use Cases for Discovery and Beyond at 3:00pm and CloudNine is also a Premier Platinum Sponsor for the event (as well as an Exhibitor, so you can come learn about us too).  Click here to register!

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Survey Says! Predictive Coding Technologies and Protocols Survey Results: eDiscovery Trends

Last week, I discussed the predictive coding survey that Rob Robinson was conducting on his Complex Discovery site (along with an overview of key predictive coding-related terms).  The results are in and here are some of the findings.

As Rob notes in the results post here, the Predictive Coding Technologies and Protocols Survey was initiated on August 31 and concluded on September 15.  It’s a non-scientific survey designed to help provide a general understanding of the use of predictive coding technologies and protocols by data discovery and legal discovery professionals within the eDiscovery ecosystem, and it had two primary educational objectives:

  • To provide a consolidated listing of potential predictive coding technology and protocol definitions. While not all-inclusive or comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it should be useful in educational efforts.
  • To ask eDiscovery ecosystem professionals about their usage and preferences of predictive coding platforms, technologies, and protocols.

There were 31 total respondents in the survey.  Here are some of the more notable results:

  • More than 80% of responders (80.64%) shared that they did have a specific primary platform for predictive coding versus just under 20% (19.35%), who indicated they did not.
  • There were 12 different platforms noted as primary predictive coding platforms by responders, but only three platforms received more than one vote; those three accounted for 61% of responses.
  • Active Learning was the most used predictive coding technology, with more than 70% of responders (70.96%) reporting that they use it in their predictive coding efforts.
  • Just over two-thirds of responders (67.74%) use more than one predictive coding technology in their predictive coding efforts, while just under one-third (32.25%) use only one.
  • Continuous Active Learning (CAL) was (by far) the most used predictive coding protocol, with more than 87% of responders (87.09%) reporting that they use it in their predictive coding efforts.

Rob has reported several other results and provided graphs for additional details.  To check out all of the results, click here.

So, what do you think?  Do any of the results surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Also, if you’re going to be in Houston on Thursday, September 27, just a reminder that I will be speaking at the second annual Legal Technology Showcase & Conference, hosted by the Women in eDiscovery (WiE), Houston Chapter, South Texas College of Law and the Association of Certified E-Discovery Specialists (ACEDS).  I’ll be part of the panel discussion AI and TAR for Legal: Use Cases for Discovery and Beyond at 3:00pm and CloudNine is also a Premier Platinum Sponsor for the event (as well as an Exhibitor, so you can come learn about us too).  Click here to register!

Image Copyright © FremantleMedia North America, Inc.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.