Analysis

eDiscovery Case Law: No Kleen Sweep for Technology Assisted Review

 

For much of the year, proponents of predictive coding and other technology-assisted review (TAR) concepts have been pointing to three significant cases where technology-based approaches have either been approved or are seriously being considered: Da Silva Moore v. Publicis Groupe and Global Aerospace v. Landow Aviation are two of those cases; the third is Kleen Products v. Packaging Corp. of America.  However, in the Kleen case, the parties have now reached an agreement to drop the TAR-based approach, at least for the first request for production.

Background and Debate Regarding Search Approach

On February 21, the plaintiffs asked Magistrate Judge Nan Nolan to require the producing parties to employ a technology-assisted review approach (referred to as "content-based advanced analytics," or CBAA) in their production of documents for discovery purposes.

In their filing, the plaintiffs claimed that “[t]he large disparity between the effectiveness of [the computer-assisted coding] methodology and Boolean keyword search methodology demonstrates that Defendants cannot establish that their proposed [keyword] search methodology is reasonable and adequate as they are required.”  Citing studies conducted between 1994 and 2011 claimed to demonstrate the superiority of computer-assisted review over keyword approaches, the plaintiffs claimed that computer-assisted coding retrieved for production “70 percent (worst case) of responsive documents rather than no more than 24 percent (best case) for Defendants’ Boolean, keyword search.”

In their response, the defendants contended that the plaintiffs "provided no legitimate reason that this Court should deviate here from reliable, recognized, and established discovery practices" in favor of their "unproven" CBAA methods. The defendants also emphasized that they have "tested, independently validated, and implemented a search term methodology that is wholly consistent with the case law around the nation and that more than satisfies the ESI production guidelines endorsed by the Seventh Circuit and the Sedona Conference." Having (according to their briefing) already produced more than one million pages of documents using their search methods, the defendants conveyed outrage that the plaintiffs would ask the court to "establish a new and radically different ESI standard for cases in this District."

Stipulation and Order

After “a substantial number of written submissions and oral presentations to the Court” regarding the search technology issue, “in order to narrow the issues, the parties have reached an agreement that will obviate the need for additional evidentiary hearings on the issue of the technology to be used to search for documents responsive to the First Requests.”  That agreement was memorialized this week in the Stipulation and Order Relating to ESI Search (link to stipulation courtesy of Law.com).  As part of that agreement, the plaintiffs have withdrawn their demand that the defendants apply CBAA to the first production request (referred to in the stipulation as the “First Request Corpus”). 

As for productions beyond the First Request Corpus, the plaintiffs also agreed not to “argue or contend” that the defendants should be required to use CBAA or “predictive coding” with respect to any requests for production served on any defendant prior to October 1, 2013.  As for requests for production served after October 1, 2013, it was agreed that the parties would “meet and confer regarding the appropriate search methodology to be used for such newly collected documents”, with either party able to file a motion if they can’t agree.  So, there will be no TAR-based approach in the Kleen case, at least until next October.

So, what do you think?  Does this signal a difficulty in obtaining approval for TAR-based approaches?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Best Practices: For Successful Predictive Coding, Start Randomly

 

Predictive coding is the hot eDiscovery topic of 2012, with courts in three significant cases (Da Silva Moore v. Publicis Groupe, Global Aerospace v. Landow Aviation and Kleen Products v. Packaging Corp. of America) either approving or considering the use of predictive coding for eDiscovery.  So, how should your organization begin when preparing a collection for predictive coding discovery?  For best results, start randomly.

If that statement seems odd, let me explain. 

Predictive coding is the use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection.  That subset of the collection is often referred to as the “seed” set of documents.  How the seed set of documents is derived is important to the success of the predictive coding effort.
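
To make that concrete, here is a minimal sketch of the underlying idea using scikit-learn: a classifier is trained on the reviewed seed set, then scores every unreviewed document for responsiveness.  This is a generic illustration rather than any particular vendor’s product, and all variable names are hypothetical.

```python
# Minimal predictive coding sketch: learn from a human-reviewed seed set,
# then score the rest of the collection.  Names are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def predict_responsiveness(seed_texts, seed_labels, remaining_texts):
    vectorizer = TfidfVectorizer(stop_words="english", max_features=50_000)
    X_seed = vectorizer.fit_transform(seed_texts)   # vocabulary from the seed set
    model = LogisticRegression(max_iter=1000)
    model.fit(X_seed, seed_labels)                  # 1 = responsive, 0 = not
    X_rest = vectorizer.transform(remaining_texts)
    return model.predict_proba(X_rest)[:, 1]        # probability of responsiveness
```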

Random Sampling, It’s Not Just for Searching

In our series of posts (available here, here and here) discussing best practices for random sampling to test search results, we noted that searching is not the only eDiscovery activity where sampling a set of documents is a good practice.  It’s also a vitally important step for deriving the seed set of documents upon which the predictive coding software’s learning decisions will be made.  As with any random sampling methodology, you begin by determining the appropriate sample size to represent the collection, based on your desired confidence level and an acceptable margin of error (as noted here).  To ensure a proper representative sample, the sample must be drawn from the entire collection to be predictively coded.
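
For reference, here is the standard sample-size calculation (the formula behind most online sample-size calculators), sketched in Python.  The confidence level and margin of error shown are illustrative choices, not recommendations.

```python
import math

# Normal-approximation sample size for a proportion, with a finite
# population correction.  95% confidence -> z = 1.96; p = 0.5 is the
# conservative (worst-case) assumption about the responsiveness rate.
def sample_size(population, z=1.96, margin_of_error=0.05, p=0.5):
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

print(sample_size(1_000_000))                        # ~385 documents
print(sample_size(1_000_000, margin_of_error=0.02))  # ~2,396 documents
```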

Given the debate in the above cases regarding the acceptability of the proposed predictive coding approaches (especially Da Silva Moore), it’s important to be prepared to defend your predictive coding approach, and conducting a random sample to generate the seed documents is a key step toward the defensibility of that approach.

Then, once the sample is generated, the next key to success is the use of a subject matter expert (SME) to make responsiveness determinations.  And, it’s important to conduct a sample (there’s that word again!) of the result set after the predictive coding process to determine whether the process achieved sufficient quality in automatically coding the remainder of the collection.
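
As one illustration of that final quality check, many practitioners sample the documents the software coded as non-responsive (sometimes called an “elusion” test) to see how many responsive documents slipped through.  A minimal sketch, with hypothetical numbers:

```python
# Post-coding quality check ("elusion" test): review a random sample of
# the documents coded non-responsive and measure how many are actually
# responsive.  sample_results holds the reviewer's coding of that sample.
def elusion_rate(sample_results):
    """sample_results: list of booleans, True = actually responsive."""
    return sum(sample_results) / len(sample_results)

# Example: 3 responsive documents found in a 385-document sample of the
# "non-responsive" pile suggests roughly a 0.8% elusion rate.
print(f"{elusion_rate([True] * 3 + [False] * 382):.1%}")
```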

So, what do you think?  Do you start your predictive coding efforts “randomly”?  You should.  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Use of Internet-Based Tools, Predictive Coding, Up in 2012, Says ABA

According to a recently released report from the American Bar Association (ABA), use of Internet-based electronic discovery tools and predictive coding has risen in 2012.  The 2012 ABA Legal Technology Survey Report: Litigation and Courtroom Technology (Volume III) discusses the use of technology related to litigation, ranging from hardware used in the courtroom to technology related to eDiscovery and e-filing. It includes a trend report summarizing this year’s notable results and highlighting changes from previous years.

Statistical Highlights

Here are some of the notable stats from the ABA study:

Use of Internet-based eDiscovery and Litigation Support

  • 44% of attorneys whose firm had handled an eDiscovery case said they had used Internet-based eDiscovery tools (up from 31% in 2011 – a 42% rise in usage);
  • In sole practitioner firms, 33% of attorneys said they had used Internet-based eDiscovery tools whereas nearly 67% of attorneys in large firms (500 or more attorneys) indicated they had used those tools;
  • 35% of attorneys said they had used Internet-based litigation support software (up from 24% in 2011 – a 46% rise in usage).

Use of Desktop-based eDiscovery and Litigation Support

  • Use of Desktop-based eDiscovery rose from 46% to 48% (just a 4% rise in usage) and use of Desktop-based Litigation Support remained the same at 46%.

Use of Predictive Coding Technology

  • 23% of those attorneys said they had used predictive coding technology to process or review ESI (up from 15% in 2011 – a 53% rise in usage);
  • Of the firms that have handled an eDiscovery case, only 5% of sole practitioners and only 6% of firms with fewer than 10 attorneys indicated they had used predictive coding technology, whereas nearly 44% of attorneys in large firms said they used predictive coding.

Outsourcing

  • 44% of attorneys surveyed indicated that they outsourced work to eDiscovery consultants and companies (slightly down from 45% in 2011 – a 2% drop);
  • Outsourcing to computer forensics specialists remained unchanged at 42%, according to the survey;
  • On the other hand, 25% of respondents indicated that they outsource to attorneys in other firms (up from 16% in 2011 – a 56% rise!).  Hmmm…

All percentages rounded.

The 2012 ABA Legal Technology Survey Report comprises six volumes, with eDiscovery results discussed in Volume III (link above), which can be purchased from the ABA for $350 (or $300 if you’re an ABA member).  If you’re just interested in the trend report, the cost for that is $55 ($45 for ABA members).

So, what do you think?  Any surprises?  Do those numbers reflect your own usage of the technologies and outsourcing patterns?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Best Practices: Assessing Your Data Before Meet and Confer Shouldn’t Be Expensive

 

So, you’re facing litigation and you need help from an outside provider to “get your ducks in a row”: to understand how much data you have, how many documents have hits on key terms, and what it will cost to process, review and produce the data, so that you’re in the best position to negotiate appropriate terms at the Rule 26(f) conference (aka, meet and confer).  But, how much does it cost to do all that?  It shouldn’t be expensive.  In fact, it could even be free.

Metadata Inventory

Once you’ve collected data from your custodians, it’s important to understand how much data you have for each custodian and how much data is stored on each item of media collected.  You should also be able to break the collection down by file type and by date range.  A provider should be able to process the data and provide a metadata inventory of the collected electronically stored information (ESI) that can be queried by the fields below (a simple illustration follows the list):

  • Data source (hard drive, folder, or custodian)
  • Folder names and sizes
  • File names and sizes
  • Volume by file type
  • Date created and last date modified
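
As a simple illustration (standard-library Python only, with a hypothetical collection path), such an inventory can be built by walking the collected files and tallying volume by file type and last-modified year:

```python
import os
from collections import defaultdict
from datetime import datetime

# Sketch of a metadata inventory: walk a (hypothetical) collection folder
# and tally volume by file extension and by last-modified year.
def inventory(collection_root):
    by_type = defaultdict(int)   # bytes per file extension
    by_year = defaultdict(int)   # bytes per last-modified year
    for dirpath, _dirs, files in os.walk(collection_root):
        for name in files:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            year = datetime.fromtimestamp(os.path.getmtime(path)).year
            by_type[ext] += size
            by_year[year] += size
    return by_type, by_year

by_type, by_year = inventory("/collections/custodian_smith")  # hypothetical path
for ext, size in sorted(by_type.items(), key=lambda kv: -kv[1]):
    print(f"{ext:10} {size / 2**30:8.2f} GB")
```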

When this is done prior to the Rule 26(f) conference, it enables your legal team to negotiate intelligently at the conference by understanding the potential volume (and therefore potential cost) of including or excluding certain custodians, document types, or date ranges in the discovery order.

Word Index of the Collection

Want to get a sense of how many documents mention each of the key players in the case?  Or how many mention the key issues?  After a simple index of the data, a provider should be able to provide at least a consolidated report of all the words (not including stop words, of course) from all sources, with the number of occurrences of each word in the collected ESI (at least for files that contain embedded text).  This initial index won’t catch everything – image-only files and exception (e.g., corrupted or password protected) files won’t be included – but it will enable your legal team to intelligently negotiate at the meet and confer by understanding the potential volume (and therefore potential cost) of including or excluding certain key words in the discovery order.
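
Here is a toy version of that consolidated word report.  The stop-word list is a tiny placeholder (real indexing tools use much larger ones), and the input documents are hypothetical:

```python
import re
from collections import Counter

# Toy word index: count occurrences of each word across extracted text,
# skipping stop words.  STOP is a tiny placeholder list.
STOP = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

def word_index(texts):
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP)
    return counts

docs = ["The merger closed in March.", "Smith approved the merger terms."]
for word, n in word_index(docs).most_common(5):
    print(word, n)
```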

eDiscovery Budget Worksheet

Loading the metadata inventory into an eDiscovery budget worksheet that includes standard performance data (such as document review production statistics) and projected billing rates and costs can provide a working eDiscovery project budget projection for the case.  This projection can enable your legal team to advise their client of projected costs of the case, negotiate cost sharing or cost burden arguments in the meet and confer, and create a better discovery production strategy.
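
The arithmetic behind such a worksheet is straightforward.  Here is a sketch with placeholder figures – every rate and throughput number below is illustrative only, not a market rate:

```python
# Back-of-the-envelope review budget.  All rates and throughput figures
# are placeholders for illustration only.
def review_budget(gigabytes, docs_per_gb=3000, cull_rate=0.60,
                  docs_per_hour=50, reviewer_rate=75.0):
    total_docs = gigabytes * docs_per_gb
    docs_to_review = total_docs * (1 - cull_rate)   # documents left after culling
    hours = docs_to_review / docs_per_hour
    return hours * reviewer_rate

print(f"${review_budget(100):,.0f}")  # 100 GB -> $180,000 under these assumptions
```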

It shouldn’t be expensive to prepare these items to develop an initial assessment of the case in preparation for the Rule 26(f) conference.  In fact, the company that I work for, CloudNine Discovery, provides these services for free.  But, regardless of who you use, it’s important to assess your data before the meet and confer to enable your legal team to understand the potential costs and risks associated with the case and negotiate the best possible approach for your client.

So, what do you think?  What analysis and data assessment do you perform prior to the meet and confer?  Please share any comments you might have or if you’d like to know more about a particular topic.

P.S.: No ducks were harmed in the making of this blog post.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: The Growth of eDiscovery is Transparent

 

With data in the world doubling every two years or so and the variety of issues that organizations need to address to manage that data from an eDiscovery standpoint, it would probably surprise none of you that the eDiscovery market is growing.  But, do you know how quickly the market is growing?

According to a new market report published by Transparency Market Research (and reported by BetaNews), the global eDiscovery market is expected to grow to 2.75 times its 2010 size by 2017.  Their report eDiscovery (Software and Service) Market – Global Scenario, Trends, Industry Analysis, Size, Share and Forecast, 2010 – 2017 indicates that the global eDiscovery market was worth $3.6 billion in 2010 and is expected to reach $9.9 billion by 2017, growing at a Compound Annual Growth Rate (CAGR) of 15.4% during that time (a quick arithmetic check follows the list below).  Here are some other noteworthy stats that they report and forecast:

  • The U.S. portion of the eDiscovery market was valued at $3.0 billion in 2010, and is estimated to grow at a CAGR of 13.3% from 2010 to 2017 to reach $7.2 billion by 2017 (2.4 times its 2010 size);
  • The eDiscovery market in the rest of the world was valued at $600 million in 2010, and is estimated to grow at a CAGR of 23.2% from 2010 to 2017 to reach $2.7 billion by 2017 (4.5 times its 2010 size – wow!);
  • Not surprisingly, the U.S. is expected to continue to be the leader in terms of revenue, with 73% of global eDiscovery market share in 2017;
  • The report also breaks the market into software-based and services-based eDiscovery, with the global software-based eDiscovery market valued at $1.1 billion in 2010 and expected to grow at a CAGR of 11.5% to reach $2.5 billion by 2017 (about 2.3 times its 2010 size), and the global services-based eDiscovery market valued at $2.5 billion in 2010 and expected to grow at a CAGR of 17.0% to reach $7.4 billion by 2017 (about 3 times its 2010 size).
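
As promised, a quick arithmetic check: CAGR relates the start and end values by end = start × (1 + r)^years.  The snippet below reproduces the report’s figures; small differences from the published CAGRs likely reflect rounding in the reported values.

```python
# Verify the report's CAGR figures: end = start * (1 + r) ** years.
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

print(f"Global: {cagr(3.6, 9.9, 7):.1%}")  # ~15.5% (report: 15.4%)
print(f"U.S.:   {cagr(3.0, 7.2, 7):.1%}")  # ~13.3% (report: 13.3%)
print(f"Rest:   {cagr(0.6, 2.7, 7):.1%}")  # ~24.0% (report: 23.2%)
```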

According to the report, key factors driving the global eDiscovery market include “increasing adoption of predictive coding, growing risk mitigation activities in organizations, increase in criminal prosecutions and civil litigation and growth of record management across various industries”.  They predict that “[i]n the next five years, the e-discovery industry growth will get further support from increasing automatic enterprise information archiving applications, growth in multi-media search for sound and visual data, next generation technology growth for cloud computing i.e. virtualization and increasing involvement of organizations in the social media space.”

The report also discusses topics such as pricing trends, competitor analysis, growth drivers, opportunities and inhibitors, and provides company profiles of several big players in the industry.  The 96-page report is available in licenses ranging from a single-user license for $4,395 up to a corporate license for $10,395.

So, what do you think?  Do those growth numbers surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Interview with Laura Zubulake of Zubulake’s e-Discovery, Part 2

 

Last week, we discussed the new book by Laura A. Zubulake, the plaintiff in probably the most famous eDiscovery case ever (Zubulake v. UBS Warburg), entitled Zubulake's e-Discovery: The Untold Story of My Quest for Justice.  I also conducted an interview with Laura last week to get her perspective on the book, including her reasons for writing it seven years after the case ended and what she expects readers to learn from her story.

The book is the story of the Zubulake case – which resulted in one of the largest jury awards in the US for a single plaintiff in an employment discrimination case – as told by the author, in her words.  As Zubulake notes in the Preface, the book “is written from the plaintiff’s perspective – my perspective. I am a businessperson, not an attorney. The version of events and opinions expressed are portrayed by me from facts and circumstances as I perceived them.”  It’s a “classic David versus Goliath story” describing her multi-year struggle against her former employer – a multi-national financial giant.  The book is available at Amazon and also at CreateSpace.

Our interview with Laura had so much good information in it that we couldn’t fit it all into a single post.  Yesterday, we published part 1.  Here is the second and final part!

What advice would you have for plaintiffs who face a situation similar to the one you faced?

I don’t give advice, and I’ll tell you why.  It’s because every case is different.  And, it’s not just the facts of the case but it’s also the personal lives of the plaintiffs.  So, it’s very difficult for me to do that.  Unless you’re in someone else’s shoes, you really can’t appreciate what they’re going through, so I don’t give advice.

What do you think about the state of eDiscovery today and where do you think that more attention could be paid to the discovery process?

While I don’t work in the industry day-to-day, I read a lot and keep up with the trends, and it’s pretty incredible to me how it has changed over the past eight to nine years.  The first opinions in my case were in 2003 and 2004.  Back then, we had so little available with regard to technology and legal guidance.  When I attend a conference like LegalTech, I’m always amazed at the number of vendors and all the technology that’s now offered.  From that standpoint, how it has matured as an industry is a good thing.  However, I do believe that there are still important issues with regard to eDiscovery to be addressed.  When you read surveys and see how many corporations have yet to adopt certain aspects of the eDiscovery process, it raises concern.  Some firms have not implemented litigation holds or document retention policies or an information governance structure to manage their information, and you would think by now that a majority of corporations would have adopted something along those lines.

I guess organizations still think discovery issues and sanctions won’t happen to them.  And, while I recognize the difficulty in a large organization with lots of employees to control everything and everybody, I’m surprised at the number of cases where sanctions occur.  I do read some of the case law and I do “scratch my head” from time to time.  So, I think there are still issues.

Obviously, the hot topic now is predictive coding.  My concern is that people perceive that as the “end all” and the ultimate answer to questions.  I think that processes like predictive coding will certainly help, but I think there’s still something to be said for the “human touch” when it comes to reviewing documents. I think that we’re making progress, but I think there is still more yet to go.

I read in an article that you were considering opening up an eDiscovery consulting practice.  Is that the case and, if so, what will be unique about your practice?

It’s something that I’m considering.  I’ve been working on the book, but I’d like to get back into more of a routine and perhaps focus on education for employees.  When people address eDiscovery issues, they look to implement technology and look to establish retention policies and procedures to implement holds, and that’s all good.  But, at the same time, I think there should be more efforts to educate the employees because they’re the ones who create the electronic documents.  Educate them as to the risks involved and procedures to follow to minimize those risks, such as litigation holds.  I think if you have an educated workforce that understands “less is more” when writing electronic documents – that they don’t always have to copy someone or forward something – they can be more selective in their writing and reduce costs.

I think because of my background and my personal experiences and because I’m not an attorney, I can relate more to the typical worker.  I was on the trading desk and I know the day-to-day stresses of trying to manage email, trying to do the right thing, but also trying to be productive.  I think I can also relate to senior management and advise them that, although they may not recognize the risk, the risk is there.  And, that’s because I’ve been a worker, I’ve been on the trading desk, I’ve been through litigation, I’ve actually reviewed documents and I’ve gone to trial.  So, if you think that not implementing information governance or other eDiscovery policies is a good idea, that’s not the case.  Corporations should see this as an opportunity to manage information and use those management structures for the benefit of their company.

Thanks, Laura, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Best Practices: The Number of Pages in Each Gigabyte Can Vary Widely

 

A while back, we talked about how the average number of pages in each gigabyte is approximately 50,000 to 75,000 pages and that each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages per gigabyte can vary?

The “how many pages” question comes up a lot and I’ve seen a variety of answers.  Michael Recker of Applied Discovery posted an article to their blog last week titled Just How Big Is a Gigabyte?, which provides some perspective based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page counts with any degree of precision is somewhat difficult.

In fact, the exact same content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I decided to conduct a little (admittedly unscientific) study using yesterday’s one-page blog post about the Apple/Samsung litigation.  I put the content from that page into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.

The Outlook example was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than yesterday’s blog post (which also included the side text on the page, as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the exact same content, will be a different size in different applications.  As a result, to estimate the number of pages in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.
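
Putting the two pieces together, a rough page-count estimate weights the per-type averages quoted above by each file type’s share of the collection.  The collection makeup below is hypothetical:

```python
# Estimate page count from collection makeup, using the per-type
# pages-per-GB averages quoted above.  The makeup below is hypothetical.
PAGES_PER_GB = {
    "email": 100_099, "word": 64_782, "text": 677_963,
    "image": 15_477, "powerpoint": 17_552,
}

def estimate_pages(gb_by_type):
    return sum(gb * PAGES_PER_GB[file_type] for file_type, gb in gb_by_type.items())

# A hypothetical 10 GB collection: mostly email, some Word files and images
collection = {"email": 6.0, "word": 2.5, "image": 1.5}
print(f"{estimate_pages(collection):,.0f} pages")  # roughly 786,000 pages
```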

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Need to Catch Up on Trends Over the Last Six Weeks? Take a Time Capsule.

 

I try to set aside some time over the weekend to catch up on my reading and keep abreast of developments in the industry.  Although that’s sometimes easier said than done, I stumbled across an interesting compilation of legal technology information from my friend Christy Burke and her team at Burke & Company.  On Friday, Burke & Company released The Legal Technology Observer (LTO) Time Capsule on Legal IT Professionals.  LTO was a six-week concentrated collection of essays, articles, surveys and blog posts providing expert practical knowledge about legal technology, eDiscovery, and social media for legal professionals.

The content has been formatted into a PDF version and is available for free download here.  As noted in their press release, Burke & Company's bloggers, including Christy, Melissa DiMercurio, Ada Spahija and Taylor Gould, as well as many distinguished guest contributors, set out to examine the trends, topics and perspectives that are driving today's legal technology world for six weeks, from June 6 to July 12.  They did so with the help of many of the industry's most respected experts, and LTO acquired more than 21,000 readers in just six weeks.  Nice job!

The LTO Time Capsule covers a wide range of topics related to legal technology.  Several topics have an impact on eDiscovery, some featuring thought leaders previously interviewed on this blog (links to our previous interviews with them below), including:

  • The EDRM Speaks My Language: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Learning to Speak EDRM: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Predictive Coding: Dozens of Names, No Definition, Lots of Controversy: Written by – Sharon D. Nelson, Esq. and John W. Simek.
  • Social Media 101 for Law Firms – Don’t Get Left Behind: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Kerry Scott Boll of JustEngage.
  • Results of Social Media 101 Snap-Poll: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC.
  • Getting up to Speed with eDiscovery: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Browning Marean, Senior Counsel at DLA Piper, San Diego.
  • LTO Interviews Craig Ball to Examine the Power of Computer Forensics: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Expert Craig Ball, Trial Lawyer and Certified Computer Forensic Examiner.
  • LTO Asks Bob Ambrogi How a Lawyer Can Become a Legal Technology Expert: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Bob Ambrogi, Practicing Lawyer, Writer and Media Consultant.
  • LTO Interviews Jeff Brandt about the Mysterious Cloud Computing Craze: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Jeff Brandt, Editor of PinHawk Law Technology Daily Digest.
  • Legal Technology Observer eDiscovery in America – A Legend in the Making: Written by – Christy Burke, President of Burke and Company LLC; Featuring – Barry Murphy, Analyst with the eDJ Group and Contributor to eDiscoveryJournal.com.
  • IT-Lex and the Sedona Conference® Provide Real Help to Learn eDiscovery and Technology Law: Written by – Christy Burke, President of Burke and Company LLC.

These are just some of the topics, particularly those that have an impact on eDiscovery.  To check out the entire list of articles, click here to download the report.

So, what do you think?  Do you need a quick resource to catch up on your reading?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Case Law: Judge Scheindlin Says “No” to Self-Collection, “Yes” to Predictive Coding

 

When most people think of the horrors of Friday the 13th, they think of Jason Voorhees.  When US Immigration and Customs Enforcement thinks of Friday the 13th horrors, does it think of Judge Shira Scheindlin?

As noted in Law Technology News (Judge Scheindlin Issues Strong Opinion on Custodian Self-Collection, written by Ralph Losey, a previous thought leader interviewee on this blog), New York District Judge Scheindlin issued a decision last Friday (July 13) addressing the adequacy of searching and self-collection by government entity custodians in response to Freedom of Information Act (FOIA) requests.  As Losey notes, this is her fifth decision in National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., including one that was later withdrawn.

Regarding the defendants’ question as to “why custodians could not be trusted to run effective searches of their own files, a skill that most office workers employ on a daily basis” (i.e., self-collect), Judge Scheindlin responded as follows:

“There are two answers to defendants' question. First, custodians cannot 'be trusted to run effective searches,' without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that 'contain reasonable specificity of detail rather than merely conclusory statements.' Defendants' counsel recognize that, for over twenty years, courts have required that these affidavits 'set [ ] forth the search terms and the type of search performed.' But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants' question has emerged from scholarship and caselaw only in recent years: most custodians cannot be 'trusted' to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: 'Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.' There is increasingly strong evidence that '[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.' As Judge Andrew Peck — one of this Court's experts in e-discovery — recently put it: 'In too many cases, however, the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish' … keyword searches usually are not very effective.'”

Regarding search best practices and predictive coding, Judge Scheindlin noted:

“There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere. There is a 'need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or keywords to be used to produce emails or other electronically stored information.' And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.”

“Through iterative learning, these methods (known as 'computer-assisted' or 'predictive' coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies' unsupported assertions that their lay custodians have designed and conducted a reasonable search.”
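
In machine-learning terms, the “iterative learning” the opinion describes is an active learning loop: train on the documents coded so far, have the model surface the documents it is least certain about, have a human code those, and retrain.  A minimal sketch (scikit-learn; all names hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One round of the iterative ("active") learning loop behind predictive
# coding: train on what has been coded, then pick the documents the model
# is least sure about for the next round of human review.
# X: document-feature matrix; labels: 1 = responsive, 0 = not;
# labeled_idx: indices of documents a human has already coded.
def active_learning_round(X, labels, labeled_idx, batch=20):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[labeled_idx], labels[labeled_idx])
    probs = model.predict_proba(X)[:, 1]
    uncertainty = np.abs(probs - 0.5)  # 0 = model is maximally uncertain
    unlabeled = np.setdiff1d(np.arange(X.shape[0]), labeled_idx)
    to_review = unlabeled[np.argsort(uncertainty[unlabeled])[:batch]]
    return model, to_review  # reviewer codes to_review, then we retrain
```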

Losey notes that “A classic analogy is that self-collection is equivalent to the fox guarding the hen house. With her latest opinion, Schiendlin [sic] includes the FBI and other agencies as foxes not to be trusted when it comes to searching their own email.”

So, what do you think?  Will this become another landmark decision by Judge Scheindlin?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective

 

As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government-sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays).  The overview of the 2011 TREC Legal Track can be found here.

The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.” 

However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”
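
For readers new to the metric: recall is simply the fraction of all truly responsive documents that a review effort actually finds.  A one-line illustration with hypothetical numbers:

```python
# Recall = responsive documents found / responsive documents that exist.
def recall(found_responsive, total_responsive):
    return found_responsive / total_responsive

# Hypothetical: finding 4,480 of a topic's 5,600 gold-standard responsive
# documents yields 80% recall.
print(f"{recall(4480, 5600):.0%}")  # 80%
```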

Other notable tidbits from the study and article:

  • Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
  • Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
  • The document collection used was derived from the EDRM Enron Data Set;
  • The learning task had three distinct topics, each representing a distinct request for production.  A total of 16,999 documents was selected – about 5,600 per topic – to form the “gold standard” against which the results were compared;
  • OpenText had the top number of documents reviewed compared to recall percentage in the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
  • One of the participants has been barred from future participation in TREC – “It is inappropriate – and forbidden by the TREC participation agreement – to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”.  According to the LTN article, the barred participant was Recommind.

For more information, check out the links to the article and the study above.  TREC previously announced that there would be no 2012 study and is targeting a new data set for 2013.

So, what do you think?  Are you surprised by the results or are they expected?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.