eDiscovery Daily Blog

EDRM Updates Statistical Sampling Applied to Electronic Discovery Guide – eDiscovery Trends

Over two years ago, we covered EDRM’s initial announcement of a new guide called Statistical Sampling Applied to Electronic Discovery.  Now, they have announced an updated version of the guide.

Release 2 of EDRM’s Statistical Sampling Applied to Electronic Discovery, announced last week and published on the EDRM website, is open for public comment until January 9, 2015. After that date, any input received will be reviewed and considered for incorporation before the updated materials are finalized.

As EDRM notes in their announcement, “The updated materials provide guidance regarding the use of statistical sampling in e-discovery. Much of the information is definitional and conceptual and intended for a broad audience. Other materials (including an accompanying spreadsheet) provide additional information, particularly technical information, for e-discovery practitioners who are responsible for developing further expertise in this area.”

The expanded guide consists of ten sections (most of which have several sub-sections):

  1. Introduction
  2. Estimating Proportions within a Binary Population
  3. Acceptance Sampling
  4. Sampling in the Context of the Information Retrieval Grid – Recall, Precision and Elusion
  5. Seed Set Selection in Machine Learning
  6. Guidelines and Considerations
  7. Additional Guidance on Statistical Theory
  8. Calculating Confidence Levels, Confidence Intervals and Sample Sizes
  9. Acceptance Sampling
  10. Examples in the Accompanying Excel Spreadsheet
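Section 4’s information retrieval grid is the familiar 2x2 table that crosses whether a document is relevant with whether it was retrieved. As an illustration only, with hypothetical counts and a hypothetical `RetrievalGrid` class, the three measures named in that section’s title can be computed from the grid like this:

```python
from dataclasses import dataclass


@dataclass
class RetrievalGrid:
    """Counts from a 2x2 information retrieval grid (hypothetical review results)."""
    true_pos: int   # relevant and retrieved (e.g., produced)
    false_pos: int  # not relevant but retrieved
    false_neg: int  # relevant but not retrieved (left in the discard pile)
    true_neg: int   # not relevant and not retrieved

    def recall(self) -> float:
        # Share of all relevant documents that were retrieved.
        return self.true_pos / (self.true_pos + self.false_neg)

    def precision(self) -> float:
        # Share of retrieved documents that are actually relevant.
        return self.true_pos / (self.true_pos + self.false_pos)

    def elusion(self) -> float:
        # Share of the non-retrieved (null) set that is actually relevant.
        return self.false_neg / (self.false_neg + self.true_neg)


grid = RetrievalGrid(true_pos=800, false_pos=200, false_neg=100, true_neg=8900)
print(f"recall={grid.recall():.3f} precision={grid.precision():.3f} elusion={grid.elusion():.4f}")
# prints: recall=0.889 precision=0.800 elusion=0.0111
```

In a real review the cells of the grid are not known in advance and are themselves estimated by sampling, which is the subject of that section of the guide.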

The guide ranges from introductory explanations of basic statistical terms (such as sample size, margin of error and confidence level) to more advanced concepts such as the binomial and hypergeometric distributions.  Bring your brain.
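To make the basic terms concrete, here is a minimal Python sketch of the textbook sample-size calculation for estimating a proportion. It assumes the normal (Wald) approximation, the conservative worst case p = 0.5 and an optional finite population correction; the function names are hypothetical, and the guide’s own worked examples may use exact binomial or hypergeometric methods instead.

```python
import math
from statistics import NormalDist
from typing import Optional


def _z_score(confidence: float) -> float:
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(confidence: float, margin: float,
                population: Optional[int] = None, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin (normal approximation)."""
    z = _z_score(confidence)
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        # Finite population correction: smaller collections need fewer samples.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def margin_of_error(n: int, confidence: float, p: float = 0.5) -> float:
    """Margin of error for a sample of size n (normal approximation, worst case p = 0.5)."""
    return _z_score(confidence) * math.sqrt(p * (1 - p) / n)


print(sample_size(0.95, 0.05))                     # 385
print(sample_size(0.95, 0.05, population=10_000))  # 370
print(round(margin_of_error(385, 0.95), 4))
```

The worst-case choice p = 0.5 maximizes p(1 - p), so the resulting sample is large enough whatever the true proportion turns out to be.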

As Section 10 indicates, there is also an accompanying Excel spreadsheet, EDRM Statistics Examples 20141023.xlsm, which can be downloaded from the page and implements the calculations supporting Sections 7, 8 and 9. The spreadsheet was developed using Microsoft Excel 2013 and is an .xlsm file, meaning that it contains VBA code (macros), so you may have to adjust your security settings in order to view and use it.  You’ll also want to read the guide first (especially Sections 7 through 10), as the Excel workbook is a bit cryptic.
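Macros aside, the core of a simple acceptance-sampling plan fits in a few lines of Python. The sketch below is not a port of the EDRM workbook: it assumes an "accept on zero" plan (reject the batch if the sample contains any defect), simple random sampling without replacement and hypothetical function names, and it shows where the hypergeometric and binomial distributions enter the calculation.

```python
import math


def prob_zero_defects_hypergeom(population: int, defects: int, sample: int) -> float:
    # Exact probability that a sample drawn without replacement contains no defects:
    # C(N - D, n) / C(N, n).
    return math.comb(population - defects, sample) / math.comb(population, sample)


def prob_zero_defects_binom(population: int, defects: int, sample: int) -> float:
    # Binomial (with-replacement) approximation: (1 - D/N) ** n.
    return (1 - defects / population) ** sample


def zero_defect_sample_size(population: int, max_defect_rate: float, confidence: float) -> int:
    """Smallest n such that, if the defect rate were at the threshold, a defect-free
    sample would occur with probability at most 1 - confidence (assumes rate > 0)."""
    defects = math.ceil(max_defect_rate * population)
    for n in range(1, population + 1):
        if prob_zero_defects_hypergeom(population, defects, n) <= 1 - confidence:
            return n
    return population


# Hypothetical batch: 10,000 documents, tolerate at most a 1% defect rate, 95% confidence.
print(zero_defect_sample_size(10_000, 0.01, 0.95))
```

Because sampling without replacement can only make a defect-free sample less likely, the hypergeometric plan never needs more documents than the binomial one, which here calls for 299 (since 0.99^299 < 0.05 < 0.99^298).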

Comments can be posted at the bottom of the EDRM Statistical Sampling page, emailed to the group at mail@edrm.net, or submitted through their online comment form.

One thing that I noticed is that the old guide, from April of 2012, is still on the EDRM site.  It might be a good idea to archive that page to avoid confusion with the new guide.

So, what do you think?  Do you perform statistical sampling to verify results within your eDiscovery process?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
