Predictive Analytics: It’s Not Just for Review Anymore – eDiscovery Trends

One of the most frequently discussed trends in this year’s annual thought leader interviews was the application of analytics (including predictive analytics) to Information Governance.  A recent report published in the Richmond Journal of Law & Technology addresses how analytics can be used to optimize Information Governance.

Written by Bennett B. Borden & Jason R. Baron (who was one of our thought leaders discussing that very topic), Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice, 20 RICH. J.L. & TECH. 7 (2014) is aimed at readers who are not necessarily experts in the field.  It provides a synopsis of why and how predictive coding first emerged in eDiscovery and defines important terms related to the topic, then discusses aspects of an information governance program where application of predictive coding and related analytical techniques is most useful. Most notably, the authors provide a few “early” examples of the use of advanced analytics, like predictive coding, in non-litigation contexts to illustrate the possibilities for applying the technology.  Here is a high-level breakdown of the report:

Introduction (pages 1-3): Provides a high-level introduction of the topics to be discussed.

A. The Path to Da Silva Moore (pages 3-14): Provides important background to the evolution of managing electronically stored information (ESI) and predictive coding (fittingly, it begins with the words “In the beginning”).  Starting on page 9, the authors discuss “The Da Silva Moore Precedent”, providing a detailed account of the Da Silva Moore case (our post here summarizes our coverage of the case) and referencing other cases as well: In re Actos (Pioglitazone) Products Liability Litigation, Global Aerospace Inc., et al. v. Landow Aviation, L.P., Kleen Products v. Packaging Corp. of America, EORHB, Inc. v. HOA Holdings and In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation.  Clearly, the past couple of years have provided several precedents for the use of predictive coding in litigation.

B. Information Governance and Analytics in the Era of Big Data (pages 15-20): This section provides definitions and important context for terms such as “big data”, “analytics” and “Information Governance”.  It’s important to have the background on these concepts before launching into how analytics can be applied to optimize Information Governance.

C. Applying the Lessons of E-Discovery In Using Analytics for Optimal Information Governance: Some Examples (pages 21-31): With the background of sections A and B under your belt, the heart of the report then gets into the actual application of analytics in different scenarios, using “True Life Examples” that are “’ripped from’ the pages of the author’s legal experience, without embellishment”.  These examples where analytics are used include:

  • A corporate client is being sued by a former employee in a whistleblower qui tam action.
  • A highly regulated manufacturing client decided to outsource the function of safety testing some of its products. A director of the department whose function was being outsourced, despite being offered a generous severance package, demanded four times the severance amount and threatened, if he did not receive it, to go to the company’s regulator with a list of ten supposed major violations that he described in an email.
  • A major company received a whistleblower letter from a reputable third party alleging that several senior personnel were involved with an elaborate kickback scheme that also involved FCPA violations.
  • An acquisition agreement between parties contained a provision such that if the disclosures made by the target were found to be off by a certain margin within thirty days of the acquisition, the purchase price would be adjusted.

In each case, the use of analytics either resulted in a quick settlement, proved the alleged violations to be unfounded, or resulted in an appropriate adjustment in the purchase price of the acquired company.  These real world examples truly illustrate how analytics can be applied beyond the document review stage of eDiscovery.

Conclusion (pages 31-32): While noting that the authors’ intent was to “merely scratch the surface” of the topic, they offer some predictions for the end of the decade and note “expected demand on the part of corporate clients for lawyers to be familiar with state of the art practices in the information governance space”.  In other words, your clients are going to expect you to understand this.

The report is an easy read, even for novices to the technology, and is a must-read for anyone looking to understand more about applying analytics to Information Governance.  Bennett and Jason are both with Drinker Biddle & Reath LLP and are also co-chairs of the Information Governance Initiative (here is our recent blog post about IGI).

So, what do you think? Has your organization applied analytics to big data to reduce or eliminate litigation costs? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Review Attorneys, Are You Smarter than a High Schooler?


Review attorneys are taking a beating these days.  There’s so much attention being focused on technology assisted review, and the latest study noting its cost-effectiveness (when compared to manual review) was released just this month.  There is also the very detailed and well known white paper study written by Maura Grossman and Gordon Cormack (Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review) which notes not only the cost-effectiveness of technology assisted review but also that it was actually more accurate.

The latest study, from information scientist William Webber (and discussed in this Law Technology News article by Ralph Losey) seems to indicate that trained reviewers don’t provide any better review accuracy than a pair of high schoolers that he selected with “no legal training, and no prior e-discovery experience, aside from assessing a few dozen documents for a different TREC topic as part of a trial experiment”.  In fact, the two high schoolers did better!  He also notes that “[t]hey worked independently and without supervision or correction, though one would be correct to describe them as careful and motivated.”  His conclusion?

“The conclusion that can be reached, though, is that our assessors were able to achieve reliability (with or without detailed assessment guidelines) that is competitive with that of the professional reviewers — and also competitive with that of a commercial e-discovery vendor.”

Webber also cites two other studies with similar results and notes “All of this raises the question that is posed in the subject of this post: if (some) high school students are as reliable as (some) legally-trained, professional e-discovery reviewers, then is legal training a practical (as opposed to legal) requirement for reliable first-pass review for responsiveness? Or are care and general reading skills the more important factors?”

I have a couple of observations about the study.  Keep in mind, I’m not an attorney (and don’t play one on TV), but I have worked with review teams on several projects and have observed the review process and how it has been conducted in a real world setting, so I do have some real-world basis for my thoughts:

  • Two high schoolers is not a significant sample size: I’ve worked on several projects where some reviewers are really productive and others are highly unproductive to the point of being useless.  It’s difficult to determine a valid conclusion on the basis of two non-legal reviewers in his study and four non-legal reviewers in one of the studies that Webber cites.
  • Review is typically an iterative process: In my experience, most legal reviews that I’ve seen start with detailed instructions and training provided to the reviewers, followed up with regular (daily, if not more frequent) changes to instructions to reflect information gathered during the review process.  Instructions are refined as the review progresses and more information is learned about the document collection.  Since Webber noted that “[t]hey worked independently and without supervision or correction”, it doesn’t appear that his review test was conducted in this manner.  This makes it less of a real world scenario, in my opinion.

I also think some reviews especially benefit from a first pass review with legally trained reviewers (for example, a reviewer who understands intellectual property laws is going to understand potential IP issues better than someone who hasn’t had the training in IP law).  Nonetheless, these studies are bound to “fan the flames” of debate regarding the effectiveness of manual attorney review (even more than they already are).

So, what do you think?  Do you think his study is valid?  Or do you have other concerns about the conclusions he has drawn?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective


As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays).  The overview of the 2011 TREC Legal Track can be found here.

The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.” 

However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”

Other notable tidbits from the study and article:

  • Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
  • Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
  • The document collection used was derived from the EDRM Enron Data Set;
  • The learning task had three distinct topics, each representing a distinct request for production.  A total of 16,999 documents was selected – about 5,600 per topic – to form the “gold standard” for comparing the document collection;
  • OpenText had the top number of documents reviewed compared to recall percentage in the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
  • One of the participants has been barred from future participation in TREC: “It is inappropriate, and forbidden by the TREC participation agreement, to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”.  According to the LTN article, the barred participant was Recommind.
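To make the ranking task described in the bullets above a little more concrete, here is a minimal Python sketch of how recall might be measured when only the top-ranked portion of a collection is reviewed. It is an illustration under stated assumptions, not the TREC evaluation methodology: the function name recall_at_depth, the document IDs, the probability scores and the tiny "gold standard" set are all hypothetical.

```python
def recall_at_depth(scores, gold_responsive, depth):
    """Fraction of gold-standard responsive documents found in the top `depth` ranks.

    scores: dict mapping a document ID to an estimated probability of responsiveness
    gold_responsive: set of document IDs judged responsive (hypothetical gold standard)
    depth: how many of the highest-ranked documents a human would actually review
    """
    # Rank the whole collection from most to least likely responsive.
    ranked = sorted(scores, key=scores.get, reverse=True)
    reviewed = ranked[:depth]
    found = sum(1 for doc_id in reviewed if doc_id in gold_responsive)
    return found / len(gold_responsive) if gold_responsive else 0.0


# Hypothetical ten-document collection with made-up probability estimates.
scores = {
    "d1": 0.95, "d2": 0.10, "d3": 0.80, "d4": 0.05, "d5": 0.60,
    "d6": 0.20, "d7": 0.02, "d8": 0.70, "d9": 0.01, "d10": 0.15,
}
gold = {"d1", "d3", "d6", "d8"}

for depth in (3, 5, 10):
    print(depth, recall_at_depth(scores, gold, depth))
```

In this toy data, reviewing the top half of the ranking already finds every responsive document, which is the general idea behind reviewing "only a fraction of the entire collection". Real collections will not behave this neatly.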

For more information, check out the links to the article and the study above.  TREC previously announced that there would be no 2012 study and is targeting obtaining a new data set for 2013.

So, what do you think?  Are you surprised by the results or are they expected?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: Bennett Borden


This is the second of our Holiday Thought Leader Interview series.  I interviewed several thought leaders to get their perspectives on various eDiscovery topics.

Today's thought leader is Bennett B. Borden. Bennett is the co-chair of Williams Mullen’s eDiscovery and Information Governance Section. Based in Richmond, Va., his practice is focused on Electronic Discovery and Information Law. He has published several papers on the use of predictive coding in litigation. Bennett is not only an advocate for predictive coding in review, but has reorganized his own litigation team to more effectively use advanced computer technology to improve eDiscovery.

You have written extensively about the ways that the traditional, or linear review process is broken. Most of our readers understand the issue, but how well has the profession at large grappled with this? Are the problems well understood?

The problem with the expense of document review is well understood, but how to solve it is less well known. Fortunately, there is some great research being done by both academics and practitioners that is helping shed light on both the problem and the solution. In addition to the research we’ve written about in The Demise of Linear Review and Why Document Review is Broken, some very informative research has come out of the TREC Legal Track and subsequent papers by Maura R. Grossman and Gordon V. Cormack, as well as by Jason R. Baron, the eDiscovery Institute, Douglas W. Oard and Herbert L. Roitblat, among others.  Because of this important research, the eDiscovery bar is becoming increasingly aware of how document review and, more importantly, fact development can be more effective and less costly through the use of advanced technology and artful strategy. 

You are a proponent of computer-assisted review. Is computer search technology truly mature? Is it a defensible strategy for review?

Absolutely. In fact, I would argue that computer-assisted review is actually more defensible than traditional linear review.  By computer-assisted review, I mean the utilization of advanced search technologies beyond mere search terms (e.g., topic modeling, clustering, meaning-based search, predictive coding, latent semantic analysis, probabilistic latent semantic analysis, Bayesian probability) to more intelligently address a data set. These technologies, to a greater or lesser extent, group documents based upon similarities, which allows a reviewer to address the same kinds of documents in the same way.

Computers are vastly superior to humans in quickly finding similarities (and dissimilarities) within data. And the similarities that computers are able to find have advanced beyond mere content (through search terms) to include many other aspects of data, such as correspondents, domains, dates, times, location, communication patterns, etc. Because the technology can now recognize and address all of these aspects of data, the resulting groupings of documents are more granular and internally cohesive.  This means that the reviewer makes fewer and more consistent choices across similar documents, leading to a faster, cheaper, better and more defensible review.
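(Editor's illustration: the idea of grouping similar documents so that a reviewer can treat them the same way can be sketched in a few lines of Python. This is a deliberately simple stand-in that uses word overlap (Jaccard similarity) and a greedy grouping rule. It is not any of the techniques named in the interview, and the function names, the 0.5 threshold and the sample documents are all hypothetical.)

```python
def tokens(text):
    """Lowercased set of whitespace-separated words in a document."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar(docs, threshold=0.5):
    """Greedily place each document in the first group whose first member is similar enough."""
    groups = []  # each group is a list of document IDs; the first ID acts as its representative
    for doc_id, text in docs.items():
        doc_tokens = tokens(text)
        for group in groups:
            if jaccard(doc_tokens, tokens(docs[group[0]])) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([doc_id])
    return groups


# Hypothetical documents: two near-duplicate pairs.
docs = {
    "a": "quarterly safety test results attached",
    "b": "quarterly safety test results attached again",
    "c": "lunch on friday",
    "d": "lunch on friday anyone",
}
print(group_similar(docs))
```

With these four sample documents, the two safety-test messages end up in one group and the two lunch messages in another, so a reviewer could make one decision per group rather than one per document.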

How has the use of [computer-assisted review] predictive coding changed the way you tackle a case? Does it let you deploy your resources in new ways?

I have significantly changed how I address a case as both technology and the law have advanced. Although there is a vast amount of data that might be discoverable in a particular case, less than 1 percent of that data is ever used in the case or truly advances its resolution. The resources I deploy focus on identifying that 1 percent, and avoiding the burden and expense largely wasted on the 99 percent. Part of this is done through developing, negotiating and obtaining reasonable and iterative eDiscovery protocols that focus on the critical data first. eDiscovery law has developed at a rapid pace and provides the tools to develop and defend these kinds of protocols. An important part of these protocols is the effective use of computer-assisted review.

Lately there has been a lot of attention given to the idea that computer-assisted review will replace attorneys in litigation. How much truth is there to that idea? How will computer-assisted review affect the role of attorneys?

Technology improves productivity, reducing the time required to accomplish a task. This is no less true of computer-assisted review. The 2006 amendments to the Federal Rules of Civil Procedure caused a massive increase in the number of attorneys devoted to the review of documents. As search technology and the review tools that employ it continue to improve, the demand for attorneys devoted to review will obviously decline.

But this is not a bad thing. Traditional linear document review is horrifically tedious and boring, and it is not the best use of legal education and experience. Fundamentally, litigators develop facts and apply the law to those facts to determine a client’s position and advise them to act accordingly. Computer-assisted review allows us to get at the most relevant facts more quickly, reducing both the scope and duration of litigation. This is what lawyers should be focused on accomplishing, and computer-assisted review can help them do so.

With the rise of computer-assisted review, do lawyers need to learn new skills? Do lawyers need to be computer scientists or statisticians to play a role?

Lawyers do not need to be computer scientists or statisticians, but they certainly need to have a good understanding of how information is created, how it is stored, and how to get at it. In fact, lawyers who do not have this understanding, whether alone or in conjunction with advisory staff, are simply not serving their clients competently.

You’ve suggested that lawyers involved in computer-assisted review enjoy the work more than in the traditional manual review process. Why do you think that is?

I think it is because the lawyers are using their legal expertise to pursue lines of investigation and develop the facts surrounding them, as opposed to simply playing a monotonous game of memory match. Our strategy of review is to use very talented lawyers to address a data set using technological and strategic means to get to the facts that matter. While doing so our lawyers uncover meaning within a huge volume of information and weave it into a story that resolves the matter. This is exciting and meaningful work that has had significant impact on our clients’ litigation budgets.

How is computer assisted review changing the competitive landscape? Does it provide an opportunity for small firms to compete that maybe didn’t exist a few years ago?

We live in the information age, and lawyers, especially litigators, fundamentally deal in information. In this age it is easier than ever to get to the facts that matter, because more facts (and more granular facts) exist within electronic information. The lawyer who knows how to get at the facts that matter is simply a more effective lawyer. The information age has fundamentally changed the competitive landscape. Small companies are able to achieve immense success through the skillful application of technology. The same is true of law firms. Smaller firms that consciously develop and nimbly utilize the technological advantages available to them have every opportunity to excel, perhaps even more so than larger, highly-leveraged firms. It is no longer about size and head-count, it’s about knowing how to get at the facts that matter, and winning cases by doing so.

Thanks, Bennett, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Jason R. Baron


This is the first of the Holiday Thought Leader Interview series.  I interviewed several thought leaders to get their perspectives on various eDiscovery topics.

Today’s thought leader is Jason R. Baron. Jason has served as the National Archives' Director of Litigation since May 2000 and has been involved in high-profile cases for the federal government. His background in eDiscovery dates to the Reagan Administration, when he helped retain backup tapes containing Iran-Contra records from the National Security Council as the Justice Department’s lead counsel. Later, as director of litigation for the U.S. National Archives and Records Administration, Jason was assigned a request to review documents pertaining to tobacco litigation in U.S. v. Philip Morris.

He currently serves as Co-Chair of The Sedona Conference Working Group on Electronic Document Retention and Production. Baron is also one of the founding coordinators of the TREC Legal Track, a search project organized through the National Institute of Standards and Technology to evaluate search protocols used in eDiscovery. This year, Jason was awarded the Emmett Leahy Award for Outstanding Contributions and Accomplishments in the Records and Information Management Profession.

You were recently awarded the prestigious Emmett Leahy Award for excellence in records management. Is it unusual that a lawyer wins such an award? Or is the job of the litigator and records manager becoming inextricably linked?

Yes, it was unusual: I am the first federal lawyer to win the Emmett Leahy award, and only the second lawyer to have done so in the 40-odd years that the award has been given out. But my career path in the federal government has been a bit unusual as well: I spent seven years working as lead counsel on the original White House PROFS email case (Armstrong v. EOP), followed by more than a decade worrying about records-related matters for the government as Director of Litigation at NARA. So with respect to records and information management, I long ago passed at least the Malcolm Gladwell test in "Outliers" where he says one needs to spend 10,000 hours working on anything to develop a level of "expertise."  As to the second part of your question, I absolutely believe that to be a good litigation attorney these days one needs to know something about information management and eDiscovery — since all evidence is "born digital" and lots of it needs to be searched for electronically. As you know, I also have been a longtime advocate of a greater linking between the fields of information retrieval and eDiscovery.

In your acceptance speech you spoke about the dangers of information overload and the possibility that it will make it difficult for people to find important information. How optimistic are you that we can avoid this dystopian future? How can the legal profession help the world avoid this fate?

What I said was that in a world of greater and greater retention of electronically stored information, we need to leverage artificial intelligence and specifically better search algorithms to keep up in this particular information arms race. Although Ralph Losey teased me in a recent blog post that I was being unduly negative about future information dystopias, I actually am very optimistic about the future of search technology assisting in triaging the important from the ephemeral in vast collections of archives. We can achieve this through greater use of auto-categorization and search filtering methods, as well as having a better ability in the future to conduct meaningful searches across the enterprise (whether in the cloud or not). Lawyers can certainly advise their clients how to practice good information governance to accomplish these aims.

You were one of the founders of the TREC Legal Track research project. What do you consider that project’s achievement at this point?

The initial idea for the TREC Legal Track was to get a better handle on evaluating various types of alternative search methods and technologies, to compare them against a "baseline" of how effective lawyers were in relying on more basic forms of keyword searching. The initial results were a wake-up call, in showing lawyers that sole reliance on simple keywords and Boolean strings sometimes results in a large quantity of relevant evidence going missing. But during the half-decade of research that now has gone into the track, something else of perhaps even greater importance has emerged from the results, namely: we have a much better understanding now of what a good search process looks like, which includes a human in the loop (known in the Legal Track as a topic authority) evaluating on an ongoing, iterative basis what automated search software kicks out by way of initial results. The biggest achievement however may simply be the continued existence of the TREC Legal Track itself, still going in its 6th year in 2011, and still producing important research results, on an open, non-proprietary platform, that are fully reproducible and that benefit both the legal profession as well as the information retrieval academic world. While I stepped away after 4 years from further active involvement in the Legal Track as a coordinator, I continue to be highly impressed with the work of the current track coordinators, led by Professor Doug Oard at the University of Maryland, who has remained at the helm since the very beginning.

To what extent has TREC’s research proven the reliability of computer-assisted review in litigation? Is there a danger that the profession assumes the reliability of computer-assisted review is a settled matter?

The TREC Legal Track results I am most familiar with through calendar year 2010 have shown computer-assisted review methods finding in some cases on the order of 85% of relevant documents (a .85 recall rate) per topic while only producing 10% false positives (a .90 precision rate). Not all search methods have had these results, and there has been in fact a wide variance in success achieved, but these returns are very promising when compared with historically lower rates of recall and precision across many information retrieval studies. So the success demonstrated to date is highly encouraging. Coupled with these results has been additional research reported by Maura Grossman & Gordon Cormack, in their much-cited paper Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, which makes the case for the greater accuracy and efficiency of computer-assisted review methods.

Other research conducted outside of TREC, most notably by Herbert Roitblat, Patrick Oot and Anne Kershaw, also points in a similar direction (as reported in their article Mandating Reasonableness in a Reasonable Inquiry). All of these research efforts buttress the defensibility of technology-assisted review methods in actual litigation, in the event of future challenges. Having said this, I do agree that we are still in the early days of using many of the newer predictive types of automated search methods, and I would be concerned about courts simply taking on faith the results of past research as being applicable in all legal settings. There is no question however that the use of predictive analytics, clustering algorithms, and seed sets as part of technology-assisted review methods is saving law firms money and time in performing early case assessment and for multiple other purposes, as reported in a range of eDiscovery conferences and venues, and I of course support all of these good efforts.
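(Editor's illustration: for readers less familiar with the recall and precision figures cited in the answer above, here is a minimal Python sketch of how the two measures are computed from review counts. The function name and the document counts are hypothetical and were chosen only so that the arithmetic reproduces a .85 recall rate and a .90 precision rate. They are not TREC data.)

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of retrieved documents.

    true_positives: relevant documents the search retrieved
    false_positives: non-relevant documents the search retrieved
    false_negatives: relevant documents the search missed
    """
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    # Precision: what share of the retrieved documents were actually relevant.
    precision = true_positives / retrieved if retrieved else 0.0
    # Recall: what share of all relevant documents were actually retrieved.
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical topic: 1,800 relevant documents exist; the search retrieves
# 1,700 documents, of which 1,530 are relevant and 170 are not.
precision, recall = precision_recall(
    true_positives=1530, false_positives=170, false_negatives=270
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example, 170 of the 1,700 retrieved documents (10%) are false positives, which is what a .90 precision rate means, while the 270 relevant documents that were missed account for the gap between .85 recall and 1.0.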

You have discussed the need for industry standards in eDiscovery. What benefit would standards provide?

Ever since I served as Co-Editor in Chief on The Sedona Conference Commentary on Achieving Quality in eDiscovery (2009), I have been thinking about the process for conducting good eDiscovery. That paper focused on project management, sampling, and imposing various forms of quality controls on collection, review, and production. The question is, is a good eDiscovery process capable of being fit into a maturity model of sorts, and might it be useful to consider whether vendors and law firms would benefit from having their in-house eDiscovery processes audited and certified as meeting some common baseline of quality? To this end, the DESI IV workshop ("Discovery of ESI") held in Pittsburgh last June, as part of the Thirteenth International AI and Law Conference (ICAIL 2011), had as its theme exploring what types of model standards could be imposed on the eDiscovery discipline, so that we all would be able to work from some common set of benchmarks. Some 75 people attended and 20-odd papers were presented. I believe the consensus in the room was that we should be pursuing further discussions as to what an ISO 9001-type quality standard would look like as applied to the specific eDiscovery sector, much as other industry verticals have their own ISO standards for quality. Since June, I have been in touch with some eDiscovery vendors that have actually undergone an audit process to achieve ISO 9001 certification. This is an area where no consensus has yet emerged as to the path forward, but I will be pursuing further discussions with DESI workshop attendees in the coming months and promise to report back in this space as to what comes of these efforts.

What sort of standards would benefit the industry? Do we need standards for pieces of the eDiscovery process, like a defensible search standard, or are you talking about a broad quality assurance process?

DESI IV started by concentrating on what would constitute a defensible search standard; however, it became clear at the workshop and over the course of the past few months that we need to think bigger, in looking across the eDiscovery life cycle as to what constitutes best practices through automation and other means. We need to remember however that eDiscovery is a very young discipline, as we're only five years out from the 2006 Rules Amendments. I don't have all the answers, by any means, on what would constitute an acceptable set of standards, but I like to ask questions and believe in a process of continuous, lifelong learning. As I said, I promise I'll let you know about what success has been achieved in this space.

Thanks, Jason, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Why Predictive Coding is a Hot Topic


Yesterday, we considered a recent article about the use of predictive coding in litigation by Judge Andrew Peck, United States magistrate judge for the Southern District of New York. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

We all know the reasons why predictive coding is considered such a panacea, but it is easy to forget why it is needed and why the legal industry is still grappling with eDiscovery issues after so many years. Jason Baron, Director of Litigation at the U.S. National Archives and Records Administration, recently won the 2011 Emmett Leahy Award for excellence in records and information management. He took the opportunity to step back and consider why exactly the problem won’t go away. He believes that technology can help solve our problems, if applied intelligently. “We lawyers types remain stuck in a paradigm that too often relies on people and not automated technologies,” he said.

But he also warns that electronically stored data may soon overwhelm the profession. By now, readers of this blog are familiar with the dire and mind-boggling predictions about the volume of discoverable electronic data being created every day. Litigators are obviously concerned that new types of information and growing volumes of data will swamp the courts, but the problem could affect all aspects of modern life. “At the start of the second decade of the 21st century, we need to recognize that the time is now to prevent what I have termed the coming digital dark ages,” Baron said. “The ongoing and exponentially increasing explosion of information means that over the next several decades the world will be seeing records and information growth orders of magnitude greater than anything seen by humankind to date. We all need better ways to search through this information.”

As one of the leaders of the TREC Legal Track, a research experiment into searching large volumes of data more effectively, Baron has an intimate understanding of the challenges ahead, and he has serious concerns. “The paradox of our age is information overload followed by future inability to access anything of importance. We cannot let that future happen,” he said, talking to a roomful of records management experts and litigators. “We all need to be smarter in preventing this future dystopia.”

eDiscovery blogger Ralph Losey linked to both Judge Peck’s article and Jason’s speech, and expanded on those thoughts. Losey prefers to believe, as he wrote in a post called The Dawn of a Golden Age of Justice, that lawyers will not only survive, but thrive despite the explosion in information. “We must fight fire with fire by harnessing the new (Artificial Intelligence) capacities of computers,” he says. “If we boost our own intelligence and abilities with algorithmic agents we will be able to find the evidence we need in the trillions of documents implicated by even average disputes.”

So, what do you think? Will Artificial Intelligence in the hands of truth-seeking lawyers save us from information overload, or has the glut of electronic information already swamped the world? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: A Green Light for Predictive Coding?


There are a handful of judges whose pronouncements on anything eDiscovery-related are bound to get legal technologists talking. Judge Andrew Peck, United States magistrate judge for the Southern District of New York, is one of them. His recent article, Search, Forward, published in Law Technology News, is one of the few judicial pronouncements on the use of predictive coding and has sparked a lively debate.

To date, there is no reported case tackling the use of advanced computer-assisted search technology (“predictive coding” in the current vernacular) despite growing hype. Many litigators are hoping that judges will soon weigh in and give the profession some real guidance on the use of predictive coding in litigation. Peck says it will likely be a long time before a definitive statement comes from the bench, but in the meantime his article provides perhaps the best insight into at least one judge’s thinking.

Judge Peck is probably best known in eDiscovery circles for the March 19, 2009 decision, William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009) (Peck, M.J.). In it, he called for "careful thought, quality control, testing and cooperation with opposing counsel in designing search terms or 'keywords' to be used to produce emails or other electronically stored information".

Peck notes that lawyers are not eager to take the results of computer review before a judge and face possible rejection. However, he says those fears are misplaced because admissibility is defined by the content of a document, not by how it was found. Peck also relies heavily on research we have discussed on this blog, including the TREC Legal Track, to argue that advanced search technology can provide defensible search methods.

While he stops short of green lighting the use of such technology, he does encourage lawyers in this direction. “Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval,” he writes. “In my opinion, computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”

Silicon Valley consultant Mark Michels agrees with Peck’s article, writing in Law Technology News that “the key to (predictive coding’s) defensibility is upfront preparation to ensure that the applied tools and techniques are subject to thoughtful quality control during the review process.”

But other commenters are quick to point out the limitations of predictive coding. Ralph Losey expands on Peck’s argument, describing specific and defensible deployment of predictive coding (or Artificial Intelligence in Losey’s piece). He says predictive coding can speed up the process, but that the failure rate is still too high. “The state of technology and law today still requires eyeballs on all ESI before it goes out the door and into the hands of the enemy,” he writes. “The negative consequences of disclosure of secrets, especially attorney-client privilege and work product privilege secrets, is simply too high.”

Judge Peck’s article is just one sign that thoughtful, technology-assisted review can be deployed in litigation. Tomorrow, we will review some darker musings on the likelihood that predictive coding will save eDiscovery from the exploding universe of discoverable data.

So, what do you think? Is predictive coding ready for prime time?  Can lawyers confidently take results from new search technology before a judge without fear of rejection? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: Lawyers Versus Machines – Who’s “Winning”?


As discussed on this blog, mainstream publications including The New York Times and Forbes have noticed the rise of search technology in discovery, particularly predictive coding. The New York Times article, Armies of Expensive Lawyers, Replaced by Cheaper Software, inspired a lot of consternation in the legal community by proposing that technology was replacing human lawyers. Among the first to reply, Ralph Losey wrote a blog post, New York Times Discovers eDiscovery, But Gets the Jobs Report Wrong, arguing that “the supposed job-chilling impact of these new technologies on the legal profession was off the mark. In fact, the contrary is true.”

However, the Times article does point to a real trend – clients demanding that their outside counsel and litigation support teams use technology to work more efficiently. “Just because the “paper of record” says something doesn’t make it so, of course. But it does mean that every GC and Litigation DGC/AGC in America (and likely Canada) now has this trend on their radar,” litigation project management guru Steven Levy wrote on the blog Lexican.

The obvious problem with the New York Times article is that search and review is an iterative process and demands human intervention to make the machines involved function properly.  However, the missing piece of the discussion today is exactly what the relation between human reviewers and computers should be. There is a nascent movement to investigate this topic, finding the line where machine-led review ends and where human intervention is necessary.

Recent research by some of the leaders of the TREC Legal Track research project has begun to explore the interaction between human and machine review. Maura Grossman, a litigator with Wachtell, Lipton, Rosen & Katz and one of the TREC coordinators, and Gordon Cormack, a computer scientist and fellow TREC-er, wrote the research paper Technology Assisted Review in eDiscovery Can be More Effective and Efficient Than Manual Review. As the title indicates, the authors found that technology-assisted review can outperform manual review in both effectiveness and efficiency. However, the paper also points out the need for a roadmap detailing the ideal interaction between human lawyers and machine review in litigation. In the authors’ words, “A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege.”

What may be endangered is the existing review process, as it has traditionally been practiced, not human attorneys. Bennett Borden, an attorney with Williams Mullen, argues that linear review processes cannot produce the same results as the skillful use of technology. He has some interesting asides about the ways lawyers can do things computer searches cannot. For example, human reviewers are able to intuitively “come upon a scent” of relevant documents that machines missed. He says that reviewers are not only able to effectively pursue information by following leads initiated by a computer, but that they actually enjoy the process more than straight-ahead manual review.

Clearly, more research is needed in this area, but if lawyers are going to defend their role in litigation, defining the role of lawyers in discovery is an important step. What do you think?  Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: Same Old Story, Lawyers Struggling to “Get” eDiscovery


A couple of days ago, Law Technology News (LTN) published an article entitled Lawyers Struggle to Get a Grasp on E-Discovery, by Gina Passarella, via The Legal Intelligencer.  Noting that “[a]ttorneys have said e-discovery can eat up between 50 to 80 percent of a litigation budget”, the article had several good observations and quotes from various eDiscovery thought leaders, including:

  • Cozen O'Connor member David J. Walton, co-chairman of the firm's eDiscovery task force, who observed that “I'm afraid not to know [eDiscovery] because it dominates every part of a case”;
  • LDiscovery General Counsel Leonard Deutchman noted that the younger generation comfortable with the technology will soon be the judges and attorneys handling these matters, and asked the question “what happens to those people that never change?”  His answer: “They die.”
  • K&L Gates eDiscovery analysis and technology group Co-Chairman Thomas J. Smith noted that “A lot of the costs in e-discovery are driven by paranoia because counsel or the party themselves don't really know the rules and don't know what the case law says”.
  • Morgan Lewis & Bockius partner Stephanie A. "Tess" Blair heads up the firm's e-data practice and hopes that in five years eDiscovery will become more routine, noting “I think we're at the end of the beginning”.
  • Dechert's e-discovery practice guru Ben Barnett said, “Technology created the problem, so technology needs to solve it.”  But David Cohen, the head of Reed Smith's eDiscovery practice, said that the increasing number of data sources is keeping ahead of that process, saying “You have to make improvements in how you handle it just to tread water in terms of cost”.

There are several other good quotes and observations in the article, linked above.

On the heels of Jason Krause’s two-part series on this blog regarding the various eDiscovery standards organizations, and the controversy regarding eDiscovery certification programs (referenced by this post regarding the certification program at The Organization of Legal Professionals), where do attorneys turn for information?  How do attorneys meet the competency requirements that the American Bar Association (ABA) Model Rules set forth, when an understanding of eDiscovery has become an increasing part of those requirements?

One common denominator of the firms quoted above is that they all have one or more individuals focused on managing the eDiscovery aspect of the cases in which they’re involved.  Having an eDiscovery specialist (or a team) can be a key component of effectively managing the discovery process.  If you’re a smaller firm and cannot devote a resource to managing eDiscovery, then find a competent provider that can assist when needed.

In addition to identifying an “expert” within or outside the firm, there are many resources available for self-education that any attorney can investigate to boost their own eDiscovery “savvy”.  Join one of the standards organizations referenced in the two-part series above.  Or, participate in a certification program.

One method for self-education that attorneys already know is case law research. While there is always variety in how some of the issues are handled by different courts, case decisions related to eDiscovery can certainly identify risks and issues that may need to be addressed or mitigated.  Subscribing to one or more resources that publish eDiscovery case law is a great way to keep abreast of developments.  And, I would be remiss if I didn’t note that eDiscovery Daily is one of those resources: in the nearly 11-month history of this blog, we have published 43 case law posts to date.  More to come, I’m sure… 😉

So, what do you think? Do you have a game plan for “getting” eDiscovery?  Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Standards: How Does an Industry Get Them?


As discussed yesterday, there is a nascent, but growing, movement pushing for industry standards in eDiscovery. That’s something many litigators may chafe at, thinking that standards and industry benchmarks impose checklists or management processes that tell them how to do their job. But industry standards, when implemented well, provide not only a common standard of care, but can help provide a point of comparison to help drive buying decisions.

It’s probably understandable that many of the calls for standards today focus on the search process. Judge Shira Scheindlin wrote in Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities, LLC that a party’s “failure to assess the accuracy and validity of selected search terms” was tantamount to negligence.  As mentioned yesterday, the Text Retrieval Conference (TREC) Legal Track has been benchmarking different search strategies, even finding ways to optimize the search process. The ultimate goal is to provide baseline standards and guidelines to allow parties to determine if they are being successful in searching electronically stored information in litigation.

Within these technical discussions, a newly emerging thread is a call for ethical standards and codes of conduct. Jason Baron, National Archives' Director of Litigation and one of the coordinators of the TREC Legal Track, organized the SIRE workshop that concluded last week, focused on information retrieval issues in large data sets. However, even he, who has been working on optimizing search technology, recognizes the need for standards of care and ethics in eDiscovery to manage the human element. In a paper released earlier this year, he noted, “While there are no reported cases discussing the matter of ‘keyword search ethics,’ it is only a matter of time before courts are faced with deciding difficult issues regarding the duty of responding parties and their counsel to make adequate disclosures.”

The leading provider of industry standards is the Electronic Discovery Reference Model (EDRM), which has a number of projects and efforts underway to create common frameworks and standards for managing eDiscovery. Many of the EDRM’s ongoing projects are aimed at creating a framework, and not standards. In addition to the EDRM Framework familiar to many eDiscovery professionals, the group has produced an EDRM Model Code of Conduct Project to issue aspirational eDiscovery ethics guidelines and is working on a model Search Project.

But the biggest piece of the discussion is how to create benchmarks and standards for repeatable, defensible, and consistent business processes through the entire eDiscovery process. There are no current quality standards for eDiscovery, but there are several models that could be adopted. For example, the ISO 9000 quality management system defines industry-specific quality standards and could be tailored to eDiscovery. The Capability Maturity Model Integration (CMMI) in software engineering follows a similar model, but unlike ISO, does not require annual updates for certification.

This is still a nascent movement, characterized more by workshops and panel discussions than by actual standards efforts. Recent events include the EDRM 2011-2012 Kickoff Meeting, St Paul, MN, May 11-12; the ICAIL 2011 DESI IV Workshop, Pittsburgh, PA, June 6; the TREC Legal Track, Gaithersburg, MD, November; and the SIRE workshop at the Special Interest Group on Information Retrieval (SIGIR) 2011 conference on July 28.

There seems to be a growing consensus that industry standards are not just useful, but likely necessary in eDiscovery. The Sedona Commentary on Achieving Quality in eDiscovery Principle 3 says, “Implementing a well thought out e-discovery process should seek to enhance the overall quality of the production in the form of: (a) reducing the time from request to response; (b) reducing cost; and (c) improving the accuracy and completeness of responses to requests.”

The question now seems to be what types of standards need to be in place and who is going to craft them. So, what do you think?  Please share any comments you might have or if you'd like to know more about a particular topic.

Editor's Note: Welcome Jason Krause as a guest author to eDiscovery Daily blog!  Jason is a freelance writer in Madison, Wisconsin. He has written about technology and the law for more than a dozen years, and has been writing about EDD issues since the first Zubulake decisions. Jason began his career in Silicon Valley, writing about technology for The Industry Standard, and later served as the technology reporter for the ABA Journal. He can be reached at
