eDiscovery Best Practices: Judges’ Guide to Cost-Effective eDiscovery


Last week at LegalTech, I met Joe Howie at the blogger’s breakfast on Tuesday morning.  Joe is the founder of Howie Consulting and is the Director of Metrics Development and Communications for the eDiscovery Institute, which is a 501(c)(3) nonprofit research organization for eDiscovery.

eDiscovery Institute has just released a new publication: a vendor-neutral guide to approaches that can considerably reduce discovery costs for ESI.  The Judges’ Guide to Cost-Effective E-Discovery, co-written by Anne Kershaw (co-Founder and President of the eDiscovery Institute) and Joe Howie, also contains a foreword by the Hon. James C. Francis IV, Magistrate Judge for the Southern District of New York.  Joe gave me a copy of the guide, which I read during my flight back to Houston.  I found it to be a terrific publication that details various mechanisms that can reduce the volume of ESI to review by 90 percent or more.  You can download the publication here (for personal review, not re-publication), and also read a summary article about it from Joe in InsideCounsel here.

Mechanisms for reducing costs covered in the Guide include:

  • DeNISTing: Excluding files known to be associated with commercial software, such as help files, templates, etc., as compiled by the National Institute of Standards and Technology, can eliminate a high number of files that will clearly not be responsive;
  • Duplicate Consolidation (aka “deduping”): Deduping across custodians, rather than only within each custodian, reduces costs by 38%, compared with 21% for within-custodian deduping;
  • Email Threading: The ability to review the entire email thread at once reduces costs by 36% compared with reviewing each email in the thread individually;
  • Domain Name Analysis (aka Domain Categorization): As noted previously in eDiscoveryDaily, the ability to classify items based on the domain of the sender of the email can significantly reduce the collection to be reviewed by identifying emails from parties that are clearly not responsive to the case.  It can also be a great way to quickly identify some of the privileged emails;
  • Predictive Coding: As noted previously in eDiscoveryDaily, predictive coding is the use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. According to this report, “A recent survey showed that, on average, predictive coding reduced review costs by 45 percent, with several respondents reporting much higher savings in individual cases”.
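
To make two of the mechanisms above more concrete, here is a minimal Python sketch of within-custodian versus across-custodian deduping, plus a simple tally by sender domain.  It is only an illustration, not how the Guide or any particular review platform does it: the sample emails, the function names, and the choice to fingerprint a message by sender and body are all assumptions made for this example.  Real tools typically hash a larger, normalized set of fields.

```python
import hashlib
from collections import Counter

# Hypothetical sample collection: (custodian, sender, message body).
EMAILS = [
    ("alice", "bob@example.com", "Q3 forecast attached."),
    ("alice", "bob@example.com", "Q3 forecast attached."),   # duplicate within alice
    ("carol", "bob@example.com", "Q3 forecast attached."),   # duplicate across custodians
    ("carol", "news@newsletter.example", "Weekly deals inside!"),
    ("alice", "counsel@lawfirm.example", "Privileged: draft response."),
]


def content_hash(sender, body):
    """Fingerprint a message by sender and body (an illustrative choice of fields)."""
    return hashlib.sha256(f"{sender}\n{body}".encode("utf-8")).hexdigest()


def dedupe(emails, across_custodians):
    """Keep the first copy of each message, per custodian or across the whole collection."""
    seen = set()
    kept = []
    for custodian, sender, body in emails:
        digest = content_hash(sender, body)
        # Within-custodian deduping only treats copies held by the same person as duplicates.
        key = digest if across_custodians else (custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append((custodian, sender, body))
    return kept


def domain_counts(emails):
    """Tally messages by sender domain so whole domains can be triaged at once."""
    return Counter(sender.rsplit("@", 1)[-1].lower() for _, sender, _ in emails)


within = dedupe(EMAILS, across_custodians=False)
across = dedupe(EMAILS, across_custodians=True)
print(len(EMAILS), len(within), len(across))  # prints: 5 4 3
print(domain_counts(across).most_common())
```

In this toy collection, within-custodian deduping removes one of the five messages while across-custodian deduping removes two, which is the same effect (on a much smaller scale) behind the cost difference cited above.  The domain tally shows how a reviewer might spot a newsletter domain to set aside or a law firm domain to route for privilege review.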

The publication also addresses concepts such as focused sampling, foreign language translation costs and searching audio records and tape backups.  It even addresses some of the most inefficient (and therefore, costly) practices of ESI processing and review, such as wholesale printing of ESI to paper for review (either in paper form or ultimately converted to TIFF or PDF), which is still more common than you might think.  Finally, it references some key rules of the ABA Model Rules of Professional Conduct to address the ethical duty of attorneys in effective management of ESI.  It’s a comprehensive publication that does a terrific job of explaining best practices for efficient discovery of ESI.

So, what do you think?  How many of these practices have been implemented by your organization?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: When is Attorney-Client Communication NOT Privileged?

One answer: When it’s from your work email account, and your employer has a written policy that company email is not private and is subject to audit.  Oh, and you’re suing your employer.

In Holmes v. Petrovich Dev. Co., LLC, 2011 WL 117230 (Cal. Ct. App. Jan. 13, 2011), a California court of appeal upheld a trial court ruling that emails from a plaintiff to her attorney via her company’s computer “did not constitute ‘confidential communication between client and lawyer’ within the meaning of Evidence Code section 952” and thus were not privileged.

The plaintiff, Gina Holmes, worked as an executive assistant at Petrovich Development of Sacramento, California.  When hired, she read and signed the company’s policies regarding use of computers, which informed employees that they had no right of privacy to any personal information created or maintained on company computers, and that such information was subject to monitoring.

Holmes claimed that Petrovich Development became hostile when it found out, shortly after she was hired in 2004, that she was pregnant.  She used her company’s computer to communicate with an attorney, and eventually quit her job and sued her employer.  During the case, emails between her and her attorney were introduced at trial “to show Holmes did not suffer severe emotional distress, was only frustrated and annoyed, and filed the action at the urging of her attorney”.  Despite plaintiff’s protests that the emails were privileged, they were not excluded from evidence at trial.  Rather, the trial court ruled that the emails “were not protected … because they were not private.”  Because the plaintiff did not prevail on any of her claims, she appealed, claiming the court erred in failing to exclude the emails.

In a 3-0 decision, the Sacramento-based Third Appellate District affirmed the findings of the trial court, stating that the plaintiff’s use of the company computer after being expressly advised that her messages were not private was “akin to consulting her attorney in one of defendants’ conference rooms, in a loud voice, with the door open, yet unreasonably expecting that the conversation overheard … would be privileged.”  The court also noted that “communication under these circumstances is not a “‘confidential communication between client and lawyer’” within the meaning of section 952 because it is not transmitted “by a means which, so far as the client is aware, discloses the information to no third persons other than those who are present to further the interest of the client in the consultation….”

So, what do you think?  Was justice served?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: EDRM Data Set for Great Test Data


In its almost six years of existence, the Electronic Discovery Reference Model (EDRM) Project has implemented a number of mechanisms to standardize the practice of eDiscovery.  Having worked on the EDRM Metrics project for the past four years, I have seen some of those mechanisms implemented firsthand.

One of the most significant recent accomplishments by EDRM is the EDRM Data Set.  Anyone who works with eDiscovery applications and processes understands the importance of being able to test those applications in as many ways as possible using realistic data that will illustrate expected results.  Test data is extremely useful in crafting a defensible discovery approach, because it enables you to determine the expected results within those applications and processes before using them with your organization’s live data.  It can also help you identify potential anomalies (those never occur, right?) up front so that you can proactively develop an approach to address those anomalies before encountering them in your own data.

Using public domain data from Enron Corporation (originating from the Federal Energy Regulatory Commission Enron Investigation), the EDRM Data Set Project provides industry-standard reference data sets of electronically stored information (ESI) to test those eDiscovery applications and processes.  In 2009, the EDRM Data Set project released its first version of the Enron Data Set, composed of Enron e-mail messages and attachments within Outlook PST files, organized in 32 zipped files.

This past November, the EDRM Data Set project launched Version 2 of the EDRM Enron Email Data Set.  Straight from the press release announcing the launch, here are some of the improvements in the newest version:

  • Larger Data Set: Contains 1,227,255 emails with 493,384 attachments (included in the emails) covering 151 custodians;
  • Rich Metadata: Includes threading information, tracking IDs, and general Internet headers;
  • Multiple Email Formats: Provision of both full and de-duplicated email in PST, MIME and EDRM XML, which allows organizations to test and compare results across formats.

The Text REtrieval Conference (TREC) Legal Track project, which, as noted previously on this blog, has used the EDRM data set for its research, provided input for this version of the data set.  Kudos to John Wang, Project Lead for the EDRM Data Set Project and Product Manager at ZL Technologies, Inc., and the rest of the Data Set team for such an extensive test set collection!

So, what do you think?  Do you use the EDRM Data Set for testing your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.