eDiscovery Daily Blog

eDiscovery Trends: EDRM and Statistical Sampling


I’ve been proud to be a member of The Electronic Discovery Reference Model (EDRM) for the past six years (all but the first year), and I’m always keen to report on the activities and accomplishments of the various working groups within EDRM.  Since this blog was founded, we’ve reported on 1) the unveiling of the EDRM Data Set, which has become a standard for useful eDiscovery test and demo data, 2) the EDRM Metrics Privilege Survey (which I helped draft), which collects typical volumes and percentages of privileged documents throughout the industry, 3) the Model Code of Conduct, which focuses on the ethical duties of eDiscovery service providers, and 4) the collaboration between EDRM and ARMA and the resulting joint Information Governance white paper.  EDRM’s latest announcement, made yesterday, is a new guide, Statistical Sampling Applied to Electronic Discovery, which is now available for review and comment.

As EDRM notes in its announcement, “The purpose of the guide is to provide guidance regarding the use of statistical sampling in e-discovery contexts. Most of the material is definitional and conceptual, and is intended for a broad audience. The later material and the accompanying spreadsheet provide additional information, particularly technical information, to people in e-discovery roles who become responsible for developing further expertise in this area.”

The Guide consists of six sections, as follows:

  1. Introduction: Includes basic concepts and definitions, alludes to mathematical techniques to be discussed in more detail in subsequent sections, identifies potential eDiscovery situations where sampling techniques may be useful and identifies areas not covered in this initial guide.
  2. Estimating Proportions within a Binary Population: Provides some common-sense observations as to why sampling is useful, along with a straightforward explanation of statistical terminology and the interdependence of sample size, margin of error/confidence range and confidence level.
  3. Guidelines and Considerations: Provides guidelines for effective statistical sampling, such as culling prior to sampling, accounting for family relationships, choosing between simple and stratified random sampling and using sampling in machine learning, among others.
  4. Additional Guidance on Statistical Theory: Covers mathematical concepts such as binomial distribution, hypergeometric distribution, and normal distribution.  Bring your mental “slide-rule”!
  5. Examples Using the Accompanying Excel Spreadsheet: Describes an attached workbook (EDRM Statistics Examples 20120427.xlsm) containing six sheets: a notes section; basic, observed and population normal approximation models; and basic and observed binomial methods.  The sheets are intended to assist in learning these different sampling methods.
  6. Validation Study: References a Daegis article that provides an empirical study of sampling in the eDiscovery context.  In addition to that article, consider reading our previous posts on determining an appropriate sample size to test your search, how to generate a random selection and a practical example to test your search using sampling.
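
To make the interdependence described in section 2 concrete, here is a minimal Python sketch of the textbook normal approximation for estimating a proportion. It is not taken from the guide or its spreadsheet; the function names `sample_size` and `margin_of_error` are hypothetical, and the worst-case proportion of 0.5 and the finite population correction are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size(confidence_level, margin_of_error, population=None, p=0.5):
    """Documents to sample to estimate a proportion (normal approximation).

    Assumptions for illustration: p=0.5 is the worst case (largest sample),
    and the finite population correction is applied only when a population
    size is given.
    """
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller collections need fewer documents.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def margin_of_error(confidence_level, n, observed_p):
    """Margin of error around a proportion observed in a sample of n documents."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return z * math.sqrt(observed_p * (1 - observed_p) / n)


if __name__ == "__main__":
    print(sample_size(0.95, 0.05))                      # 385
    print(sample_size(0.95, 0.05, population=100_000))  # 383
    print(round(margin_of_error(0.95, 385, 0.5), 4))
```

Tightening the margin of error or raising the confidence level increases the required sample, while the size of the collection has comparatively little effect once it is large. For small samples or proportions near 0 or 1, the binomial and hypergeometric methods mentioned in section 4 are generally the safer choice.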

Comments can be posted at any of the EDRM Statistical Sampling pages, or emailed to the group at mail@edrm.net.  As a big proponent of statistical sampling as an effective and economical method for verifying results, I’m very interested to see where this guide goes and how people will use it.  BTW, EDRM’s Annual Kickoff Meeting is next week (May 16 and 17) in St. Paul, MN.  It’s not too late to become a member and help shape the future of eDiscovery with other industry leaders!

So, what do you think?  Do you perform statistical sampling to verify results within your eDiscovery process?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.