Review

eDiscovery Best Practices: For Successful Predictive Coding, Start Randomly

Predictive coding is the hot eDiscovery topic of 2012, with three significant cases (Da Silva Moore v. Publicis Groupe, Global Aerospace v. Landow Aviation and Kleen Products v. Packaging Corp. of America) either approving or considering the use of predictive coding for eDiscovery.  So, how should your organization begin when preparing a collection for predictive coding discovery?  For best results, start randomly.

If that statement seems odd, let me explain. 

Predictive coding is the use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection.  That subset of the collection is often referred to as the “seed” set of documents.  How the seed set of documents is derived is important to the success of the predictive coding effort.

Random Sampling, It’s Not Just for Searching

Our earlier series of posts discussed best practices for random sampling to test search results, but searching is not the only eDiscovery activity where sampling a set of documents is a good practice.  It’s also a vitally important step for deriving the seed set of documents upon which the predictive coding software’s learning decisions will be based.  As is the case with any random sampling methodology, you have to begin by determining the appropriate sample size to represent the collection, based on your desired confidence level and an acceptable margin of error.  To ensure that the sample is truly representative, you must draw it from the entire collection to be predictively coded.
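To make the sample-size step concrete, here is a minimal sketch in Python.  It assumes the standard formula for estimating a proportion (with a finite population correction) and a worst-case prevalence of 50%; the function names `sample_size` and `draw_seed_set`, the made-up document IDs and the 95% / 2% settings are illustrative assumptions, not something prescribed by any particular predictive coding product.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.02, p=0.5):
    """Documents to sample so a proportion is estimated within +/- margin.

    Uses the normal-approximation formula with a finite population
    correction. p=0.5 is the most conservative (largest sample) assumption.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    n0 = (z ** 2) * p * (1 - p) / margin ** 2            # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return math.ceil(n)


def draw_seed_set(doc_ids, n, seed=None):
    """Draw n documents uniformly at random, without replacement,
    from the ENTIRE collection that will be predictively coded."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(doc_ids, n)


# Hypothetical collection of 250,000 document IDs (assumption for illustration).
collection = [f"DOC-{i:07d}" for i in range(1, 250_001)]
n = sample_size(len(collection), confidence=0.95, margin=0.02)
seed_set = draw_seed_set(collection, n, seed=2012)
print(f"Review {len(seed_set)} of {len(collection)} documents as the seed set")
```

Recording the seed value alongside the sample is one simple way to be able to show later exactly how the seed set was drawn.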

Given the debate in the above cases regarding the acceptability of the proposed predictive coding approaches (especially Da Silva Moore), it’s important to be prepared to defend your predictive coding approach, and conducting a random sample to generate the seed documents is a key step toward the defensibility of that approach.

Then, once the sample is generated, the next key to success is the use of a subject matter expert (SME) to make responsiveness determinations.  It’s also important to draw a sample (there’s that word again!) of the result set after the predictive coding process to determine whether the process achieved sufficient quality in automatically coding the remainder of the collection.
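As a rough illustration of that quality check, the sketch below estimates an error rate from a reviewed sample of the machine-coded documents, along with its margin of error.  It assumes a simple normal approximation (which is weak when the rate is very close to 0% or 100%); the function name `estimate_rate` and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def estimate_rate(miscoded, sampled, confidence=0.95):
    """Return (estimated rate, margin of error) for the share of sampled
    documents where the SME disagreed with the automatic coding."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = miscoded / sampled
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / sampled)  # normal approximation
    return p, margin


# Hypothetical result: the SME disagrees with 24 of 2,400 sampled documents.
rate, margin = estimate_rate(24, 2400)
print(f"Estimated error rate: {rate:.2%} +/- {margin:.2%}")
```

Whether a given rate counts as "sufficient quality" is a judgment for the case team; the calculation only tells you how much uncertainty surrounds the estimate.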

So, what do you think?  Do you start your predictive coding efforts “randomly”?  You should.  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Use of Internet-Based Tools, Predictive Coding, Up in 2012, Says ABA

According to a recently released report from the American Bar Association (ABA), use of Internet-based electronic discovery tools and predictive coding has risen in 2012.  The 2012 ABA Legal Technology Survey Report: Litigation and Courtroom Technology (Volume III) discusses the use of technology related to litigation, ranging from hardware used in the courtroom to technology related to eDiscovery and e-filing. It includes a trend report summarizing this year’s notable results and highlighting changes from previous years.

Statistical Highlights

Here are some of the notable stats from the ABA study:

Use of Internet-based eDiscovery and Litigation Support

  • 44% of attorneys whose firm had handled an eDiscovery case said they had used Internet-based eDiscovery tools (up from 31% in 2011 – a 42% rise in usage);
  • In sole practitioner firms, 33% of attorneys said they had used Internet-based eDiscovery tools whereas nearly 67% of attorneys in large firms (500 or more attorneys) indicated they had used those tools;
  • 35% of attorneys said they had used Internet-based litigation support software (up from 24% in 2011 – a 46% rise in usage).

Use of Desktop-based eDiscovery and Litigation Support

  • Use of Desktop-based eDiscovery rose from 46% to 48% (just a 4% rise in usage) and use of Desktop-based Litigation Support remained the same at 46%.

Use of Predictive Coding Technology

  • 23% of those attorneys said they had used predictive coding technology to process or review ESI (up from 15% in 2011 – a 53% rise in usage);
  • Of the firms that have handled an eDiscovery case, only 5% of sole practitioners and only 6% of firms with fewer than 10 attorneys indicated they had used predictive coding technology whereas nearly 44% of attorneys in large firms said they used predictive coding.

Outsourcing

  • 44% of attorneys surveyed indicated that they outsourced work to eDiscovery consultants and companies (slightly down from 45% in 2011 – a 2% drop);
  • Outsourcing to computer forensics specialists remained unchanged at 42%, according to the survey;
  • On the other hand, 25% of respondents indicated that they outsource to attorneys in other firms (up from 16% in 2011 – a 56% rise!).  Hmmm…

All percentages rounded.
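For anyone who wants to check the "rise in usage" figures above, they are relative changes rather than percentage-point differences (31% to 44% is 13 points, but a 42% relative rise).  Here is a small sketch of that arithmetic; the function name `relative_change_pct` is just an illustrative choice, and the inputs are the survey percentages quoted above.

```python
def relative_change_pct(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


# (2011 value, 2012 value) pairs taken from the stats quoted above.
figures = {
    "Internet-based eDiscovery tools": (31, 44),
    "Internet-based litigation support": (24, 35),
    "Desktop-based eDiscovery": (46, 48),
    "Predictive coding": (15, 23),
    "Outsourcing to eDiscovery consultants": (45, 44),
    "Outsourcing to attorneys in other firms": (16, 25),
}

for label, (old, new) in figures.items():
    change = relative_change_pct(old, new)
    print(f"{label}: {old}% -> {new}% ({new - old:+d} points, {change:+.0f}% relative)")
```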

The 2012 ABA Legal Technology Survey Report comprises six volumes, with eDiscovery results discussed in Volume III (link above), which can be purchased from the ABA for $350 (or $300 if you’re an ABA member).  If you’re just interested in the trend report, the cost for that is $55 ($45 for ABA members).

So, what do you think?  Any surprises?  Do those numbers reflect your own usage of the technologies and outsourcing patterns?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: The Growth of eDiscovery is Transparent

With data in the world doubling every two years or so and the variety of issues that organizations need to address to manage that data from an eDiscovery standpoint, it would probably surprise none of you that the eDiscovery market is growing.  But, do you know how quickly the market is growing?

According to a new market report published by Transparency Market Research (and reported by BetaNews), the global eDiscovery market is expected to grow to roughly 275% of its 2010 size by 2017 (a 175% increase).  Their report, eDiscovery (Software and Service) Market – Global Scenario, Trends, Industry Analysis, Size, Share and Forecast, 2010 – 2017, indicates that the global eDiscovery market was worth $3.6 billion in 2010 and is expected to reach $9.9 billion by 2017, growing at a Compound Annual Growth Rate (CAGR) of 15.4% during that time.  Here are some other noteworthy stats that they report and forecast:

  • The U.S. portion of the eDiscovery market was valued at $3.0 billion in 2010, and is estimated to grow at a CAGR of 13.3% from 2010 to 2017 to reach $7.2 billion by 2017 (240% of its 2010 value, or 140% total growth);
  • The eDiscovery market in the rest of the world was valued at $600 million in 2010, and is estimated to grow at a CAGR of 23.2% from 2010 to 2017 to reach $2.7 billion by 2017 (450% of its 2010 value, or 350% total growth – wow!);
  • Not surprisingly, the U.S. is expected to continue to be the leader in terms of revenue with 73% of global eDiscovery market share in 2017;
  • The report also breaks the market into software based eDiscovery and services based eDiscovery, with the global software based eDiscovery market valued at $1.1 billion in 2010 and expected to grow at a CAGR of 11.5% to reach $2.5 billion by 2017 (227% of its 2010 value, or 127% total growth) and the global services based eDiscovery market valued at $2.5 billion in 2010 and expected to grow at a CAGR of 17.0% to reach $7.4 billion by 2017 (296% of its 2010 value, or 196% total growth).
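If you’d like to sanity-check the growth numbers yourself, the sketch below applies the usual CAGR formula to the rounded dollar figures quoted above.  Note that a market ending at 275% of its starting size has grown by 175%, not 275%.  The function names are illustrative, and because the endpoints are rounded, the computed rates can differ somewhat from the CAGRs published in the report.

```python
def cagr(start, end, years):
    """Compound annual growth rate: the constant yearly rate that
    turns start into end over the given number of years."""
    return (end / start) ** (1 / years) - 1


def total_growth(start, end):
    """Total growth over the whole period, as a fraction of start."""
    return (end - start) / start


# (2010 value, 2017 forecast) in billions of dollars, as quoted above.
segments = {
    "Global": (3.6, 9.9),
    "U.S.": (3.0, 7.2),
    "Rest of world": (0.6, 2.7),
    "Software": (1.1, 2.5),
    "Services": (2.5, 7.4),
}

for name, (start, end) in segments.items():
    print(
        f"{name}: ends at {end / start:.0%} of the 2010 value, "
        f"total growth {total_growth(start, end):.0%}, "
        f"CAGR {cagr(start, end, 7):.1%}"
    )
```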

According to the report, key factors driving the global eDiscovery market include “increasing adoption of predictive coding, growing risk mitigation activities in organizations, increase in criminal prosecutions and civil litigation and growth of record management across various industries”.  They predict that “[i]n the next five years, the e-discovery industry growth will get further support from increasing automatic enterprise information archiving applications, growth in multi-media search for sound and visual data, next generation technology growth for cloud computing i.e. virtualization and increasing involvement of organizations in the social media space.”

The report also discusses topics such as pricing trends, competitor analysis, growth drivers, opportunities and inhibitors and provides company profiles of several big players in the industry.  The 96-page report is priced from $4,395 for a single user license up to $10,395 for a corporate license.

So, what do you think?  Do those growth numbers surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Interview with Laura Zubulake of Zubulake’s e-Discovery, Part 2

Last week, we discussed the new book by Laura A. Zubulake, the plaintiff in probably the most famous eDiscovery case ever (Zubulake vs. UBS Warburg), entitled Zubulake's e-Discovery: The Untold Story of my Quest for Justice.  I also conducted an interview with Laura last week to get her perspective on the book, including her reasons for writing the book seven years after the case ended and what she expects readers to learn from her story.

The book is the story of the Zubulake case – which resulted in one of the largest jury awards in the US for a single plaintiff in an employment discrimination case – as told by the author, in her words.  As Zubulake notes in the Preface, the book “is written from the plaintiff’s perspective – my perspective. I am a businessperson, not an attorney. The version of events and opinions expressed are portrayed by me from facts and circumstances as I perceived them.”  It’s a “classic David versus Goliath story” describing her multi-year struggle against her former employer – a multi-national financial giant.  The book is available at Amazon and also at CreateSpace.

Our interview with Laura had so much good information in it, we couldn’t fit it all into a single post.  Yesterday was part 1.  Here is the second and final part!

What advice would you have for plaintiffs who face a similar situation to the one you faced?

I don’t give advice, and I’ll tell you why.  It’s because every case is different.  And, it’s not just the facts of the case but it’s also the personal lives of the plaintiffs.  So, it’s very difficult for me to do that.  Unless you’re in someone else’s shoes, you really can’t appreciate what they’re going through, so I don’t give advice.

What do you think about the state of eDiscovery today and where do you think that more attention could be paid to the discovery process?

While I don’t work in the industry day-to-day, I read a lot and keep up with the trends and it’s pretty incredible to me how it has changed over the past eight to nine years.  The first opinions in my case were in 2003 and 2004.  Back then, we had so little available with regard to technology and legal guidance.  When I attend a conference like LegalTech, I’m always amazed at the number of vendors and all the technology that’s now offered.  From that standpoint, how it has matured as an industry is a good thing.  However, I do believe that there are still important issues with regard to eDiscovery to be addressed.  When you read surveys and you see how many corporations still have yet to adopt certain aspects of the eDiscovery process, the fact that’s the case raises concern.  Some firms have not implemented litigation holds or document retention policies or an information governance structure to manage their information and you would think by now that a majority of corporations would have adopted something along those lines. 

I guess organizations still think discovery issues and sanctions won’t happen to them.  And, while I recognize the difficulty in a large organization with lots of employees to control everything and everybody, I’m surprised at the number of cases where sanctions occur.  I do read some of the case law and I do “scratch my head” from time to time.  So, I think there are still issues.

Obviously, the hot topic now is predictive coding.  My concern is that people perceive that as the “end all” and the ultimate answer to questions.  I think that processes like predictive coding will certainly help, but I think there’s still something to be said for the “human touch” when it comes to reviewing documents. I think that we’re making progress, but I think there is still more yet to go.

I read in an article that you were considering opening up an eDiscovery consulting practice.  Is that the case and, if so, what will be unique about your practice?

It’s something that I’m considering.  I’ve been working on the book, but I’d like to get back into more of a routine and perhaps focus on education for employees.  When people address eDiscovery issues, they look to implement technology and look to establish retention policies and procedures to implement holds, and that’s all good.  But, at the same time, I think there should be more efforts to educate the employees because they’re the ones who create the electronic documents.  Educate them as to the risks involved and procedures to follow to minimize those risks, such as litigation holds.  I think if you have an educated workforce and they understand that “less is more” when writing electronic documents, that they don’t always have to copy someone or forward something, that they can be more selective in their writing to reduce costs.

I think because of my background and my personal experiences and because I’m not an attorney, I can relate more to the typical worker.  I was on the trading desk and I know the day-to-day stresses of trying to manage email, trying to do the right thing, but also trying to be productive.  I think I can also relate to senior management and advise them that, although they may not recognize the risk, the risk is there.  And, that’s because I’ve been a worker, I’ve been on the trading desk, I’ve been through litigation, I’ve actually reviewed documents and I’ve gone to trial.  So, if you think that not implementing information governance or other eDiscovery policies is a good idea, that’s not the case.  Corporations should see this as an opportunity to manage information and use those management structures for the benefit of their company.

Thanks, Laura, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery History: Zubulake’s e-Discovery

In the 22 months since this blog began, we have published 133 posts related to eDiscovery case law.  When discussing the various case opinions that involve decisions regarding eDiscovery, it’s easy to forget that there are real people impacted by these cases and that the story of each case goes beyond just whether they preserved, collected, reviewed and produced electronically stored information (ESI) correctly.  A new book, by the plaintiff in the most famous eDiscovery case ever, provides the “backstory” that goes beyond the precedent-setting opinions of the case, detailing her experiences through the events leading up to the case, as well as over three years of litigation.

Laura A. Zubulake, the plaintiff in the Zubulake vs. UBS Warburg case, has written a new book: Zubulake's e-Discovery: The Untold Story of my Quest for Justice.  It is the story of the Zubulake case – which resulted in one of the largest jury awards in the US for a single plaintiff in an employment discrimination case – as told by the author, in her words.  As Zubulake notes in the Preface, the book “is written from the plaintiff’s perspective – my perspective. I am a businessperson, not an attorney. The version of events and opinions expressed are portrayed by me from facts and circumstances as I perceived them.”  It’s a “classic David versus Goliath story” describing her multi-year struggle against her former employer – a multi-national financial giant.

Zubulake begins the story by developing an understanding of the Wall Street setting of her employer within which she worked for over twenty years and the growing importance of email in communications within that work environment.  It continues through a timeline of the allegations and the evidence that supported those allegations leading up to her filing of a discrimination claim with the Equal Employment Opportunity Commission (EEOC) and her subsequent dismissal from the firm.  This Allegations & Evidence chapter is particularly enlightening to those who may be familiar with the landmark opinions but not the underlying evidence and how the evidence to prove her case came together through the various productions (including the court-ordered productions from backup tapes).  The story continues through the filing of the case and the beginning of the discovery process and proceeds through the events leading up to each of the landmark opinions (with a separate chapter devoted to each of Zubulake I, III, IV and V), then subsequently through trial, the jury verdict and the final resolution of the case.

Throughout the book, Zubulake relays her experiences, successes, mistakes, thought processes and feelings during the events and the difficulties and isolation of being an individual plaintiff in a three-year litigation process.  She also weighs in on the significance of each of the opinions, including one ruling by Judge Shira Scheindlin that may not have had as much impact on the outcome as you might think.  For those familiar with the opinions, the book provides the “backstory” that puts the opinions into perspective; for those not familiar with them, it’s a comprehensive account of an individual who fought for her rights against a large corporation and won.  Everybody loves a good “David versus Goliath story”, right?

The book is available at Amazon and also at CreateSpace.  Look for my interview with Laura regarding the book in this blog next week.

So, what do you think?  Are you familiar with the Zubulake opinions?  Have you read the book?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: The Number of Pages in Each Gigabyte Can Vary Widely

A while back, we talked about how the average gigabyte contains approximately 50,000 to 75,000 pages and how each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages per gigabyte can vary?

The “how many pages” question comes up a lot and I’ve seen a variety of answers.  Michael Recker of Applied Discovery posted an article to their blog last week titled Just How Big Is a Gigabyte?, which provides some perspective based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page counts with any degree of precision is somewhat difficult.

In fact, the same exact content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I decided to conduct a little (admittedly unscientific) study using yesterday’s one page blog post about the Apple/Samsung litigation.  I decided to put the content from that page into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.

The Outlook example was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than yesterday’s blog post (which also included the side text on the page as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the same exact content, will be different sizes in different applications.  As a result, to estimate the number of pages in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.
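To show how much the makeup matters, here is a rough sketch that estimates page counts from the per-gigabyte averages in the Applied Discovery quote above.  The function name `estimate_pages` and the 10 GB breakdown are hypothetical, and since real collections mix formats (and attachments) in ways these averages don’t capture, treat the output as a ballpark only.

```python
# Average pages per gigabyte by file type, from the figures quoted above.
PAGES_PER_GB = {
    "email": 100_099,
    "word": 64_782,
    "text": 677_963,
    "image": 15_477,
    "powerpoint": 17_552,
}


def estimate_pages(gb_by_type):
    """Estimate total pages given gigabytes of each file type."""
    unknown = set(gb_by_type) - set(PAGES_PER_GB)
    if unknown:
        raise KeyError(f"No page average for: {sorted(unknown)}")
    return sum(PAGES_PER_GB[kind] * gb for kind, gb in gb_by_type.items())


# Two hypothetical 10 GB collections with very different makeups.
mostly_email = {"email": 6, "word": 3, "image": 1}
all_images = {"image": 10}

print(f"Mostly email: {estimate_pages(mostly_email):,} pages")
print(f"All images:   {estimate_pages(all_images):,} pages")
```

Same size on disk, but the first collection works out to more than five times as many pages as the second.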

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Review Attorneys, Are You Smarter than a High Schooler?

Review attorneys are taking a beating these days.  There’s so much attention being focused on technology assisted review, and the latest study noting its cost-effectiveness (when compared to manual review) was released just this month.  There is also the very detailed and well-known white paper by Maura Grossman and Gordon Cormack (Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review), which notes not only the cost-effectiveness of technology assisted review but also that it was actually more accurate.

The latest study, from information scientist William Webber (and discussed in this Law Technology News article by Ralph Losey), seems to indicate that trained reviewers don’t provide any better review accuracy than a pair of high schoolers he selected, who had “no legal training, and no prior e-discovery experience, aside from assessing a few dozen documents for a different TREC topic as part of a trial experiment”.  In fact, the two high schoolers did better!  He also notes that “[t]hey worked independently and without supervision or correction, though one would be correct to describe them as careful and motivated.”  His conclusion?

“The conclusion that can be reached, though, is that our assessors were able to achieve reliability (with or without detailed assessment guidelines) that is competitive with that of the professional reviewers — and also competitive with that of a commercial e-discovery vendor.”

Webber also cites two other studies with similar results and notes “All of this raises the question that is posed in the subject of this post: if (some) high school students are as reliable as (some) legally-trained, professional e-discovery reviewers, then is legal training a practical (as opposed to legal) requirement for reliable first-pass review for responsiveness? Or are care and general reading skills the more important factors?”

I have a couple of observations about the study.  Keep in mind, I’m not an attorney (and don’t play one on TV), but I have worked with review teams on several projects and have observed the review process and how it has been conducted in a real world setting, so I do have some real-world basis for my thoughts:

  • Two high schoolers is not a significant sample size: I’ve worked on several projects where some reviewers are really productive and others are highly unproductive to the point of being useless.  It’s difficult to determine a valid conclusion on the basis of two non-legal reviewers in his study and four non-legal reviewers in one of the studies that Webber cites.
  • Review is typically an iterative process: In my experience, most legal reviews start with detailed instructions and training provided to the reviewers, followed by regular (daily, if not more frequent) changes to the instructions to reflect information gathered during the review process.  Instructions are refined as the review proceeds and more is learned about the document collection.  Since Webber noted that “[t]hey worked independently and without supervision or correction”, it doesn’t appear that his review test was conducted in this manner.  This makes it less of a real world scenario, in my opinion.

I also think some reviews especially benefit from a first pass review with legally trained reviewers (for example, a reviewer who understands intellectual property laws is going to understand potential IP issues better than someone who hasn’t had the training in IP law).  Nonetheless, these studies are bound to “fan the flames” of debate regarding the effectiveness of manual attorney review (even more than they already are).

So, what do you think?  Do you think his study is valid?  Or do you have other concerns about the conclusions he has drawn?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Need to Catch Up on Trends Over the Last Six Weeks? Take a Time Capsule.

I try to set aside some time over the weekend to catch up on my reading and keep abreast of developments in the industry and, although that’s sometimes easier said than done, I stumbled across an interesting compilation of legal technology information from my friend Christy Burke and her team at Burke & Company.  On Friday, Burke & Company released The Legal Technology Observer (LTO) Time Capsule on Legal IT Professionals. LTO was a six-week concentrated collection of essays, articles, surveys and blog posts providing expert practical knowledge about legal technology, eDiscovery, and social media for legal professionals.

The content has been formatted into a PDF version and is available for free download here.  As noted in their press release, Burke & Company's bloggers, including Christy, Melissa DiMercurio, Ada Spahija and Taylor Gould, as well as many distinguished guest contributors, set out to examine the trends, topics and perspectives that are driving today's legal technology world for 6 weeks from June 6 to July 12. They did so with the help of many of the industry's most respected experts and LTO acquired more than 21,000 readers in just 6 weeks.  Nice job!

The LTO Time Capsule covers a wide range of topics related to legal technology.  Several of those topics have an impact on eDiscovery, and some feature thought leaders previously interviewed on this blog (links to our previous interviews with them below), including:

  • The EDRM Speaks My Language: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Learning to Speak EDRM: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Predictive Coding: Dozens of Names, No Definition, Lots of Controversy: Written by – Sharon D. Nelson, Esq. and John W. Simek.
  • Social Media 101 for Law Firms – Don’t Get Left Behind: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Kerry Scott Boll of JustEngage.
  • Results of Social Media 101 Snap-Poll: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC.
  • Getting up to Speed with eDiscovery: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Browning Marean, Senior Counsel at DLA Piper, San Diego.
  • LTO Interviews Craig Ball to Examine the Power of Computer Forensics: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Expert Craig Ball, Trial Lawyer and Certified Computer Forensic Examiner.
  • LTO Asks Bob Ambrogi How a Lawyer Can Become a Legal Technology Expert: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Bob Ambrogi, Practicing Lawyer, Writer and Media Consultant.
  • LTO Interviews Jeff Brandt about the Mysterious Cloud Computing Craze: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Jeff Brandt, Editor of PinHawk Law Technology Daily Digest.
  • Legal Technology Observer eDiscovery in America – A Legend in the Making: Written by – Christy Burke, President of Burke and Company LLC; Featuring – Barry Murphy, Analyst with the eDJ Group and Contributor to eDiscoveryJournal.com.
  • IT-Lex and the Sedona Conference® Provide Real Help to Learn eDiscovery and Technology Law: Written by – Christy Burke, President of Burke and Company LLC.

These are just some of the topics, particularly those that have an impact on eDiscovery.  To check out the entire list of articles, click here to download the report.

So, what do you think?  Do you need a quick resource to catch up on your reading?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Case Law: Judge Scheindlin Says “No” to Self-Collection, “Yes” to Predictive Coding

 

When most people think of the horrors of Friday the 13th, they think of Jason Voorhees.  When US Immigration and Customs thinks of Friday the 13th horrors, do they think of Judge Shira Scheindlin?

As noted in Law Technology News (Judge Scheindlin Issues Strong Opinion on Custodian Self-Collection, written by Ralph Losey, a previous thought leader interviewee on this blog), New York District Judge Scheindlin issued a decision last Friday (July 13) addressing the adequacy of searching and self-collection by government entity custodians in response to Freedom of Information Act (FOIA) requests.  As Losey notes, this is her fifth decision in National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., including one that was later withdrawn.

Regarding the defendants’ question as to “why custodians could not be trusted to run effective searches of their own files, a skill that most office workers employ on a daily basis” (i.e., self-collect), Judge Scheindlin responded as follows:

“There are two answers to defendants' question. First, custodians cannot 'be trusted to run effective searches,' without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that 'contain reasonable specificity of detail rather than merely conclusory statements.' Defendants' counsel recognize that, for over twenty years, courts have required that these affidavits 'set [ ] forth the search terms and the type of search performed.' But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants' question has emerged from scholarship and caselaw only in recent years: most custodians cannot be 'trusted' to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: 'Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.' There is increasingly strong evidence that '[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.' As Judge Andrew Peck — one of this Court's experts in e-discovery — recently put it: 'In too many cases, however, the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish' … keyword searches usually are not very effective.'”

Regarding search best practices and predictive coding, Judge Scheindlin noted:

“There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere. There is a 'need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or keywords to be used to produce emails or other electronically stored information.' And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.”

“Through iterative learning, these methods (known as 'computer-assisted' or 'predictive' coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies' unsupported assertions that their lay custodians have designed and conducted a reasonable search.”

Losey notes that “A classic analogy is that self-collection is equivalent to the fox guarding the hen house. With her latest opinion, Schiendlin [sic] includes the FBI and other agencies as foxes not to be trusted when it comes to searching their own email.”

So, what do you think?  Will this become another landmark decision by Judge Scheindlin?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.


eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective

 

As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government-sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays).  The overview of the 2011 TREC Legal Track can be found here.

The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.” 

However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”
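To make the idea of recall estimation a bit more concrete, here is a minimal Python sketch of one common way to estimate recall from a random sample of documents. It is only an illustration, not the method used in the TREC Legal Track: the function name `estimate_recall`, the input format, and the use of a simple normal-approximation margin of error (z = 1.96 for roughly 95% confidence) are all assumptions made for this example.

```python
import math


def estimate_recall(sample, z=1.96):
    """Estimate recall from a random sample of the collection.

    `sample` is a list of (is_responsive, was_retrieved) pairs, one per
    sampled document, where both values come from human review of the
    sample (a hypothetical input format chosen for this sketch).
    Returns (recall_estimate, margin_of_error), using a normal
    approximation for the margin of error.
    """
    # Keep only the sampled documents that reviewers judged responsive.
    retrieved_flags = [was_retrieved for is_responsive, was_retrieved in sample
                       if is_responsive]
    n = len(retrieved_flags)
    if n == 0:
        raise ValueError("sample contains no responsive documents")

    # Recall: share of responsive documents that the review process found.
    recall = sum(retrieved_flags) / n
    margin = z * math.sqrt(recall * (1 - recall) / n)
    return recall, margin


# Hypothetical sample: 8 responsive documents, 6 of them retrieved,
# plus 4 non-responsive documents that do not affect recall.
sample = [(True, True)] * 6 + [(True, False)] * 2 + [(False, False)] * 4
recall, margin = estimate_recall(sample)
print(f"Estimated recall: {recall:.2f} +/- {margin:.2f}")
```

With a sample this small the margin of error is very wide, which hints at why deciding that "enough is enough" is hard: a tight estimate requires many responsive documents in the sample.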

Other notable tidbits from the study and article:

  • Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
  • Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
  • The document collection used was derived from the EDRM Enron Data Set;
  • The learning task had three distinct topics, each representing a distinct request for production.  A total of 16,999 documents was selected – about 5,600 per topic – to form the “gold standard” for comparing the document collection;
  • OpenText had the top number of documents reviewed compared to recall percentage in the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
  • One of the participants has been barred from future participation in TREC – “It is inappropriate -- and forbidden by the TREC participation agreement -- to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”.  According to the LTN article, the barred participant was Recommind.
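For readers who want to see how a ranking of documents relates to recall, here is a small Python sketch. It assumes a ranked list of document IDs (most likely responsive first) and a "gold standard" set of responsive IDs, and computes the recall achieved if only the top-ranked documents are reviewed. The function name `recall_at_cutoff` and the sample data are hypothetical and are not taken from the TREC study.

```python
def recall_at_cutoff(ranked_ids, gold_responsive, cutoff):
    """Recall achieved by reviewing only the top `cutoff` ranked documents.

    ranked_ids      -- document IDs ordered from most to least likely responsive
    gold_responsive -- set of IDs judged responsive in the gold standard
    cutoff          -- number of top-ranked documents that are reviewed
    """
    if not gold_responsive:
        raise ValueError("gold standard contains no responsive documents")
    reviewed = set(ranked_ids[:cutoff])
    # Fraction of all responsive documents that fall within the reviewed set.
    return len(reviewed & gold_responsive) / len(gold_responsive)


# Hypothetical six-document collection with three responsive documents.
ranked = ["d3", "d1", "d7", "d2", "d5", "d4"]
gold = {"d1", "d3", "d5"}

for cutoff in (2, 5):
    print(cutoff, round(recall_at_cutoff(ranked, gold, cutoff), 2))
```

The better the ranking, the fewer documents need to be reviewed to reach a given recall, which is the intuition behind the cost-effectiveness finding quoted above.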

For more information, check out the links to the article and the study above.  TREC previously announced that there would be no 2012 study and aims to obtain a new data set for 2013.

So, what do you think?  Are you surprised by the results or are they expected?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.
