eDiscovery Daily Blog

Mo’ Data, Mo’ Data, Mo’ Data from EDRM: eDiscovery Trends

It didn’t take long for EDRM to deliver on its promise of an advanced data set.  Back in August, EDRM announced the release of the first of its “Micro Datasets”, designed for eDiscovery data testing and process validation.  The first one was small; this new data set is MUCH bigger.

The initial August offering was a 136.9 MB zip file containing the latest versions of everything from Microsoft Office and Adobe Acrobat files to image files, along with EDRM-specific work product files, data from public websites and uncommon formats including .mbox email storage files and .gz archive files.  On Monday, EDRM announced the release of a new 5.7 GB Micro Dataset.  As before, this new EDRM dataset was assembled to meet the eDiscovery data testing and process validation needs of software and tool providers, litigation support organizations, law firms and educational organizations.  It is sourced from publicly available data and is free from copyright restrictions.

Designed to support exception handling exercises and advanced testing, the files in the new dataset have various levels of corruption, and the dataset contains a duplicate set of files that are encrypted.  The file types in the set include:

  • A variety of .csv files
  • Websites and web pages
  • Adobe Acrobat files
  • Graphic files and photographs
  • Public census data
  • Microsoft Office files
  • Audio files
  • 4 email boxes with shared correspondence, threads and attachments
  • Multiple EnCase .e01 files containing data from a phone and another data source
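
For anyone wondering what an "exception handling exercise" against a set like this might look like, here is a minimal sketch in Python. It is not an EDRM tool and does not reflect how the dataset is actually organized; the function name `triage` and the folder you point it at are hypothetical. It assumes only that modern Office files (.docx, .xlsx, .pptx) are ZIP containers, so a file that fails a ZIP check is a reasonable candidate for the exception pile, whether because it is corrupted or because it has been password-encrypted.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Modern Office formats are ZIP containers, so the zipfile module can sanity-check them.
OOXML_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def triage(root):
    """Count files by extension and list Office files that fail a ZIP check.

    `root` is a hypothetical folder holding an extracted copy of a test dataset.
    """
    counts = Counter()
    exceptions = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        counts[ext] += 1
        if ext in OOXML_EXTENSIONS:
            try:
                with zipfile.ZipFile(path) as archive:
                    # testzip() returns the name of the first damaged member, or None.
                    bad_member = archive.testzip()
                if bad_member is not None:
                    exceptions.append((path, f"bad member: {bad_member}"))
            except (zipfile.BadZipFile, OSError) as err:
                # Corrupted or encrypted files typically land here.
                exceptions.append((path, f"not a readable ZIP container: {err}"))
    return counts, exceptions
```

Calling `triage()` on a folder returns a tally of file types plus a list of (path, reason) pairs to route to whatever exception workflow you are testing. A real processing tool would check far more than this, but the pattern of "try to open it, log what fails, keep going" is the heart of the exercise.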

This new EDRM Micro Dataset is available exclusively to EDRM members. Current EDRM members have been notified by email with instructions for file downloading (I just downloaded my copy yesterday and look forward to delving into it this week).  So, if you’re interested in joining EDRM, there has never been a better time!  Organizations and individuals interested in EDRM membership will find information at https://www.edrm.net/join/.

“The EDRM Dataset team has done outstanding work in advancing the industry with the development of advanced datasets that better reflect the types of data anomalies and challenges faced by e-discovery professionals today,” said George Socha, co-founder of EDRM. “EDRM members will benefit greatly from their work, in addition to the education, guidelines and latest in industry best practices provided to members.”

Five years after the Enron data set was converted to Outlook by the EDRM Data Set team (in November of 2010), we’re beginning to have some new dataset options.  We may actually someday see an eDiscovery product demo without Enron data!

So, what do you think?  Are you looking forward to checking out the new data set?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.