EDRM

eDiscovery Daily is Three Years Old!

We’ve always been free, now we are three!

It’s hard to believe that it has been three years since we launched the eDiscoveryDaily blog.  We’re past the “terrible twos” and heading towards pre-school.  Before you know it, we’ll be ready to take our driver’s test!

We have seen traffic on our site (from our first three months of existence to our most recent three months) grow an amazing 575%!  Our subscriber base has grown over 50% in the last year alone!  Back in June, we hit over 200,000 visits on the site and now we have over 236,000!

We continue to appreciate the interest you’ve shown in the topics and will do our best to continue to provide interesting and useful posts about eDiscovery trends, best practices and case law.  That’s what this blog is all about.  And, in each post, we like to ask you to “please share any comments you might have or if you’d like to know more about a particular topic”, so we encourage you to do so to make this blog even more useful.

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan College, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, Litigation Support Technology & News, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, Learn About E-Discovery, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!).  We really appreciate it!

As many of you know by now, we like to take a look back every six months at some of the important stories and topics during that time.  So, here are some posts over the last six months you may have missed.  Enjoy!

Rodney Dangerfield might put it this way – “I Tell Ya, Information Governance Gets No Respect”

Is it Time to Ditch the Per Hour Model for Document Review?  Here’s some food for thought.

Is it Possible for a File to be Modified Before it is Created?  Maybe, but here are some mechanisms for avoiding that scenario (here, here, here, here, here and here).  Best of all, they’re free.

Did you know changes to the Federal eDiscovery Rules are coming?  Here’s some more information.

Count Minnesota and Kansas among the states that are also making changes to support eDiscovery.

By the way, since the Electronic Discovery Reference Model (EDRM) annual meeting back in May, several EDRM projects (Metrics, Jobs, Data Set and the new Native Files project) have already announced new deliverables and/or requested feedback.

When it comes to electronically stored information (ESI), ensuring proper chain of custody tracking is an important part of handling that ESI through the eDiscovery process.

Do you self-collect?  Don’t Forget to Check for Image Only Files!

The Files are Already Electronic, How Hard Can They Be to Load?  A sound process makes it easier.

When you remove a virus from your collection, does it violate your discovery agreement?

Do you think that you’ve read everything there is to read on Technology Assisted Review?  If you missed anything, it’s probably here.

Consider using a “SWOT” analysis or Decision Tree for better eDiscovery planning.

If you’re an eDiscovery professional, here is what you need to know about litigation.

BTW, eDiscovery Daily has had 242 posts related to eDiscovery Case Law since the blog began!  Forty-four of them have been in the last six months.

Our battle cry for next September?  “Four more years!”  🙂

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

EDRM Wants You! – eDiscovery Trends

A lot is happening in the Electronic Discovery Reference Model (EDRM) group lately and this blog has reported several accomplishments in just the last few months.  With so much going on, you would think they don’t need any help to get things done, but, in fact, EDRM wants your help.

In their latest press release, EDRM has announced its fall campaign for new members. As the press release states, EDRM is offering memberships to individuals and organizations that wish to contribute to the overall improvement of the electronic discovery process by participating in the development and delivery of guidelines, standards, and new resources to the electronic discovery industry.

Since its inception in 2005, EDRM has comprised more than 260 member organizations representing every aspect of eDiscovery and information governance. Attorneys, IT professionals, litigation and eDiscovery directors, and others from corporations, law firms, government, consulting firms, software companies, and service providers are welcome to join EDRM. Members select projects in which to participate based on their individual areas of interest.

The objective of the EDRM Membership Drive is to expand the array of talent and expertise to continue development of practical resources from EDRM by broadening membership from all areas of the electronic discovery industry: providers of software and services, corporations, law firms, educational institutions, and individuals.

Having been a member for most of the 8+ years since EDRM was founded, I can personally say that participating in EDRM is rewarding, not only from a standpoint of helping to shape the direction of the industry, but also in terms of the ability to network with other industry professionals.  It appears that despite the fact that more than half the attendees at this year’s annual meeting were first time attendees, EDRM is still looking for more new members.

Information about EDRM memberships is available here. EDRM will also be hosting a series of webinars in the coming weeks to provide information about the organization and current opportunities for participation to individuals and organizations interested in learning more or considering a new membership.

Since the annual meeting back in May, several EDRM projects (Metrics, Jobs, Data Set and the new Native Files project) have already announced new deliverables and/or requested feedback.  With so much going on and the Mid-Year meeting coming in October (9th through 11th), now is a great time to get involved.

So, what do you think?  Are you a member of EDRM or another organization focused on eDiscovery best practices?   Please share any comments you might have or if you’d like to know more about a particular topic.

Data May Be Doubling Every Couple of Years, But How Much of it is Original? – eDiscovery Best Practices

According to the Compliance, Governance and Oversight Council (CGOC), information volume in most organizations doubles every 18-24 months. However, just because it doubles doesn’t mean that it’s all original. Like a bad cover band singing Free Bird, the rendition may be unique, but the content is the same. The key is limiting review to unique content.

When reviewers are reviewing the same files again and again, it not only drives up costs unnecessarily, but it could also lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, we all know the importance of identifying exact duplicates (files that contain the exact same content in the same file format), which can be identified through MD5 or SHA-1 hash values, so that they can be removed from the review population, saving considerable review costs.
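To make the hash-based approach concrete, here is a minimal Python sketch of exact-duplicate identification. It is only an illustration under stated assumptions, not how any particular review platform does it: the function names and the in-memory `documents` dictionary are hypothetical, and a real workflow would hash files from disk and usually decide which copy to keep based on custodian or date.

```python
import hashlib
from collections import defaultdict


def file_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of the raw file bytes (MD5 by default, or "sha1")."""
    return hashlib.new(algorithm, data).hexdigest()


def group_exact_duplicates(documents: dict, algorithm: str = "md5") -> dict:
    """Map each hash value to the names of the documents that share it."""
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[file_digest(content, algorithm)].append(name)
    return dict(groups)


def review_population(documents: dict, algorithm: str = "md5") -> list:
    """Keep one document per hash value (the first one seen); the rest are duplicates."""
    return [names[0] for names in group_exact_duplicates(documents, algorithm).values()]


# Hypothetical collection: two byte-identical files and one different file.
documents = {
    "memo_copy1.txt": b"Please review the attached contract.",
    "memo_copy2.txt": b"Please review the attached contract.",
    "agenda.txt": b"Agenda for Monday's meeting.",
}
unique_docs = review_population(documents)
```

Because the hash is computed over the raw bytes, even a one-character difference (or a different file format) produces a different hash value, which is why this approach only catches exact duplicates.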

Identifying near duplicates that contain the same (or almost the same) information (such as a Word document published to an Adobe PDF file where the content is the same, but the file format is different, so the hash value will be different) also reduces redundant review and saves costs.
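One common way to compare content rather than bytes is to break the extracted text into overlapping word sequences ("shingles") and measure how much two documents overlap. The Python sketch below uses Jaccard similarity on three-word shingles. This is just one possible technique, chosen for illustration: the function names, the shingle size and the 0.8 threshold are all assumptions, and it presumes the text has already been extracted from the Word and PDF files.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Split text into lowercase words and return the set of overlapping word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: shared shingles / all shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.8, size: int = 3) -> bool:
    """Treat two texts as near duplicates when their similarity meets the threshold."""
    return jaccard(a, b, size) >= threshold


# Hypothetical extracted text: same words, different whitespace and punctuation.
word_text = "The quarterly report is attached for your review."
pdf_text = "The  quarterly report is\nattached for your review"
other_text = "Lunch is at noon on Friday"
```

Here `word_text` and `pdf_text` would have different hash values as files, but their extracted words are identical, so the similarity comes out at the maximum, while `other_text` shares no three-word sequence with either.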

Then, there is message thread analysis. Many email messages are part of a larger discussion, sometimes just between two parties, and, other times, between a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Pulling those messages together and enabling them to be reviewed as an entire discussion can eliminate that redundant review. That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about the latest misstep by Anthony Weiner).
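As a rough illustration of pulling messages together into discussions, the Python sketch below groups messages by following each reply back to the message that started the conversation. It assumes each message record carries an `id` and an `in_reply_to` value (similar in spirit to the Message-ID and In-Reply-To email headers); the record layout and function names are hypothetical, and real review tools typically use additional signals such as subject lines and quoted text.

```python
from collections import defaultdict


def thread_root(message_id, parent_of):
    """Follow in_reply_to links upward until reaching a message with no known parent."""
    seen = {message_id}
    current = message_id
    while True:
        parent = parent_of.get(current)
        if parent is None or parent in seen:  # root reached, or a loop in bad data
            return current
        seen.add(parent)
        current = parent


def group_threads(messages):
    """Return a mapping of root message id -> ids of all messages in that discussion."""
    parent_of = {m["id"]: m.get("in_reply_to") for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m["id"], parent_of)].append(m["id"])
    return dict(threads)


# Hypothetical collection: one three-message discussion and one two-message discussion.
messages = [
    {"id": "m1", "in_reply_to": None},
    {"id": "m2", "in_reply_to": "m1"},
    {"id": "m3", "in_reply_to": "m2"},
    {"id": "m4", "in_reply_to": None},
    {"id": "m5", "in_reply_to": "m4"},
]
threads = group_threads(messages)
```

With the messages grouped this way, a reviewer can be handed each discussion as a unit instead of five separate emails.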

Clustering is a process which pulls similar documents together based on content so that the duplicative information can be identified more quickly and eliminated to reduce redundancy. With clustering, you can minimize review of duplicative information within documents and emails, saving time and cost and ensuring consistency in the review. As a result, even if the data in your organization doubles every couple of years, the cost of your review shouldn’t.

So, what do you think? Does your review tool support clustering technology to pull similar content together for review? Please share any comments you might have or if you’d like to know more about a particular topic.

Wish There Were Better Standards for Production of Native Files? Enough is ENF! – eDiscovery Trends

At the Electronic Discovery Reference Model (EDRM) annual meeting back in May, I provided updates for several of the EDRM projects, two of which (Metrics and Jobs) have already made significant announcements since the meeting.  Another project, the new Native Files project, has recently released two white papers authored by EDRM member Wade Peterson (of Bowman and Brooke LLP) proposing the creation and adoption of a new ENF (encapsulated native file) standard for the production of native files.

In Can Native File Productions be ENF (Enough)?, Peterson presents the conceptual framework for defining the new standard.  This white paper includes several sections, such as:

  • Background: Describes the historical background regarding traditional document productions as either paper, TIFF or PDF;
  • Executive Overview: Describes the problem (outdated standards defined almost two decades ago) and the purpose of the paper (to present a conceptual framework for defining a new, up-to-date standard that reflects “3-dimensional” native documents);
  • Challenges: A list of several challenges facing litigation support professionals today when producing documents, including these: “Courts and opposing counsel are increasingly demanding ‘native file productions’”, “Native files can be altered (either intentionally or not)” and “Native files cannot be redacted”;
  • Solution: The stated goal to develop a new standard for document productions, which addresses today’s concerns, has an open architecture to meet future requirements and is eventually adopted by courts as the legal standard;
  • Architecture: A detailed description of the architecture “framework for ‘encapsulating’ native files in sort of an envelope metaphor”, with a diagram to illustrate the framework;
  • Enhancements to the Standard: A discussion of possible enhancements that could be incorporated into the open-architecture standard;
  • Overcoming Obstacles: A discussion of potential obstacles as well as processes and tools needed to support this standard;
  • Conclusion: A summary call to construct a new document production standard to replace the standards “defined well over 20 years ago to produce documents which didn’t even exist 20 years ago”;
  • Author: A bio of the author, Wade Peterson.

In This is Just About ENF, Wade illustrates a sample ENF, describes some of its elements, and describes the operation of a basic utility to view ENF files.  It shows a sample XML representation of a sample ENF, describes Attributes, potential Vendor enhancements to ENF files, includes a detailed description of the Native Files element of the ENF, discusses Areas of Concern when dealing with native files and illustrates a very basic viewing tool, which he refers to as “viewENF”.

The two white papers reflect quite a bit of thought and effort to begin the process to create and adopt a new standard for addressing a growing problem – the production of a diverse collection of native files.  It will be interesting to see how the effort progresses to gain support for this proposed new standard.

So, what do you think?  Does this proposed standard appear to be a promising solution to the native file production issue?  Please share any comments you might have or if you’d like to know more about a particular topic.

Free Your Mind, the Matrix Has You – eDiscovery Trends

OK, maybe it’s not The Matrix with Neo and Morpheus, but if you perform a role in eDiscovery, the Electronic Discovery Reference Model (EDRM) Talent Task Matrix probably describes the responsibilities associated with your role in the process.

Back in February, we introduced the Talent Task Matrix as a tool collaboratively developed by EDRM’s Jobs Project Team to help hiring managers better understand the responsibilities associated with common eDiscovery roles. The Matrix maps responsibilities to the EDRM framework, so the associated eDiscovery duties can be assigned to the appropriate parties.

The EDRM Talent Task Matrix Spreadsheet is available in XLSX or PDF format.  It shows the EDRM Stage and Stage Area, the Responsibility within each stage, followed by the various positions that have responsibilities within the eDiscovery life cycle.  It shows a “Yes” for each responsibility in which a given position participates.  There are 130 responsibilities listed in the Matrix, covering the entire EDRM life cycle.

Since the release of the Matrix in January 2013, it has been downloaded more than 1,000 times!  Chances are, at least some of you reading this have downloaded it.

Now, as indicated in this press release, the EDRM Jobs Team is interested in learning how the Matrix is used by people responsible for hiring and professional development in their organizations. They specifically want to know how the Matrix was used and what results were achieved.  They plan to use success stories regarding use of the Matrix to develop case studies to be posted on EDRM.net.

If you have downloaded the Matrix or know of someone who has downloaded the Matrix, EDRM would like to hear from you!  Contact Tom Gelbmann or George Socha (at mail@edrm.net) to share your experiences and results (all responses will be held in confidence).

If your organization has not yet used the Matrix, but intends to do so, you can still contact them and provide a brief summary of your plans to use the Matrix and any comments or recommendations you may have to improve on the Matrix to meet your needs.  It may not have those gun racks that appear out of nowhere, but it’s still pretty cool.

So, what do you think?  Have you used the Talent Task Matrix?  Please share any comments you might have or if you’d like to know more about a particular topic.

EDRM Publishes New Metrics Model – eDiscovery Trends

When I attended the Annual Meeting for the Electronic Discovery Reference Model (EDRM) last month, one of the projects that was close to a major deliverable was the Metrics project – a project that I worked on during my first two years as a participant in EDRM.  Now, EDRM has announced and published that deliverable: a brand new Metrics model.

As their press release notes, the “EDRM Metrics Model provides a framework for planning, preparation, execution and follow-up of e-discovery matters and projects by showing the relationship between the e-discovery process and how information, activities and outcomes may be measured.”  It consists of two inter-dependent elements: (a) The Center, which includes the key metrics variables of Volume, Time and Cost, and (b) The outside nodes, which identify work components that affect the outcome associated with the elements at the Center.  There is no indicated starting node on the Metrics Wheel, because any of the seven nodes could be a starting point or factor in an eDiscovery project.

Information at the Center

The model depicts Volume, Time, and Cost at its center, and all of the outside nodes impact each of these three major variables. Time, Cost, & Volume are inter-related variables that fluctuate for each project.

Outside Nodes

Here is a brief description of each of the seven nodes:

Activities: Things that are happening or being done by either people or technology; examples can include: collecting documents, designing a search, interviewing a custodian, etc.

Custodians: Person having administrative control of a document or electronic file or system; for example, the custodian of an email is the owner of the mailbox which contains the message.

Systems: The places, technologies, tools and locations in which electronic information is created, stored or managed; examples of systems include shared drives, email, computer applications, databases, cloud sources and archival sources such as back-up tapes.

Media: The storage devices for electronic information; examples include: CDs, DVDs, floppy disks, hard drives, tapes and paper.

Status: A unique point in time in a project or process that relates to the performance or completion of the project or process; measured qualitatively in reference to a desired outcome.

Formats: The way information is arranged or set out; for example, the format of a file which affects which applications are required to view, process, and store it.

Quality Assurance (“QA”): Ongoing methods to ensure that reasonable results are being achieved; an example of QA would be to ensure that no privileged documents are released in a production by performing an operation, such as checking for privilege tags within the production set.
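The QA example above (checking for privilege tags within the production set) can be sketched in a few lines of Python. This is only an illustration, not part of the EDRM Metrics model itself: the record layout, the tag names and the function name are all hypothetical, and an actual check would run against whatever tagging scheme your review platform uses.

```python
def find_privileged_in_production(production_set, privilege_tags=("Privileged",)):
    """Return the ids of documents in the production set that carry a privilege tag."""
    blocked = set(privilege_tags)
    flagged = []
    for doc in production_set:
        # A document is flagged if any of its tags is one of the privilege tags.
        if blocked & set(doc.get("tags", [])):
            flagged.append(doc["id"])
    return flagged


# Hypothetical production set with one document that should not be released.
production_set = [
    {"id": "DOC-001", "tags": ["Responsive"]},
    {"id": "DOC-002", "tags": ["Responsive", "Privileged"]},
    {"id": "DOC-003", "tags": []},
]
flagged = find_privileged_in_production(production_set)
```

A non-empty result is a signal to stop and pull those documents before the production goes out the door.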

A complete explanation of the model, including graphics, descriptions, glossary and downloadable content is available here.  Kudos to the team, led by Kevin Clark and Dera Nevin (TD Bank Group)!

So, what do you think?  Do you think the model will be useful to help your team better understand the activities and how they impact volume, time and cost for the project?  Please share any comments you might have or if you’d like to know more about a particular topic.

200,000 Visits on eDiscovery Daily! – eDiscovery Milestones

While we may be “just a bit behind” Google in popularity (900 million visits per month), we’re proud to announce that yesterday eDiscoveryDaily reached the 200,000 visit milestone!  It took us a little over 21 months to reach 100,000 visits and just over 11 months to get to 200,000 (don’t tell my boss, he’ll expect 300,000 in 5 1/2 months).  When we reach key milestones, we like to take a look back at some of the recent stories we’ve covered, so here are some recent eDiscovery items of interest.

EDRM Data Set “Controversy”: Including last Friday, we have covered the discussion related to the presence of personally-identifiable information (PII) data (including social security numbers, credit card numbers, dates of birth, home addresses and phone numbers) within the Electronic Discovery Reference Model (EDRM) Enron Data Set and the “controversy” regarding the effort to clean it up (additional posts here and here).

Minnesota Implements Changes to eDiscovery Rules: States continue to be busy with changes to eDiscovery rules. One such state is Minnesota, which has amended its rules to emphasize proportionality, collaboration, and informality in the discovery process.

Changes to Federal eDiscovery Rules Could Be Coming Within a Year: Another major set of amendments to the discovery provisions of the Federal Rules of Civil Procedure is getting closer and could be adopted within the year.  The United States Courts’ Advisory Committee on Civil Rules voted in April to send a slate of proposed amendments up the rulemaking chain, to its Standing Committee on Rules of Practice and Procedure, with a recommendation that the proposals be approved for publication and public comment later this year.

I Tell Ya, Information Governance Gets No Respect: A new report from 451 Research has indicated that “although lawyers are bullish about the prospects of information governance to reduce litigation risks, executives, and staff of small and midsize businesses, are bearish and ‘may not be placing a high priority’ on the legal and regulatory needs for litigation or government investigation.”

Is it Time to Ditch the Per Hour Model for Document Review?: Some of the recent stories involving alleged overbilling by law firms for legal work – much of it for document review – raise the question of whether it’s time to ditch the per hour model for document review in favor of a per document rate.

Fulbright’s Litigation Trends Survey Shows Increased Litigation, Mobile Device Collection: According to Fulbright’s 9th Annual Litigation Trends Survey released last month, companies in the United States and United Kingdom continue to deal with, and spend more on, litigation.  From an eDiscovery standpoint, the survey showed an increase in requirements to preserve and collect data from employee mobile devices, a high reliance on self-preservation to fulfill preservation obligations and a decent percentage of organizations using technology assisted review.

We also covered Craig Ball’s Eight Tips to Quash the Cost of E-Discovery (here and here) and interviewed Adam Losey, the editor of IT-Lex.org (here and here).

Jane Gennarelli has continued her terrific series on Litigation 101 for eDiscovery Tech Professionals – 32 posts so far; here is the latest.

We’ve also had 15 posts about case law, just in the last 2 months (and 214 overall!).  Here is a link to our case law posts.

On behalf of everyone at CloudNine Discovery who has worked on the blog over the last 32+ months, thanks to all of you who read the blog every day!  In addition, thanks to the other publications that have picked up and either linked to or republished our posts!  We really appreciate the support!  Now, on to 300,000!

And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Some Additional Perspective on the EDRM Enron Data Set “Controversy” – eDiscovery Trends

Sharon Nelson wrote a terrific post about the “controversy” regarding the Electronic Discovery Reference Model (EDRM) Enron Data Set in her Ride the Lightning blog (Is the Enron E-Mail Data Set Worth All the Mudslinging?).  I wanted to repeat some of her key points here and offer some of my own perspective directly from sitting in on the Data Set team during the EDRM Annual Meeting earlier this month.

But, First a Recap

To recap, the EDRM Enron Data Set, sourced from the FERC Enron Investigation release made available by Lockheed Martin Corporation, has been a valuable resource for eDiscovery software demonstration and testing (we covered it here back in January 2011).  Initially, the data was made available for download on the EDRM site, then subsequently moved to Amazon Web Services (AWS).  However, after much recent discussion about personally-identifiable information (PII) data (including social security numbers, credit card numbers, dates of birth, home addresses and phone numbers) available within FERC (and consequently the EDRM Data Set), the EDRM Data Set was taken down from the AWS site.

Then, a couple of weeks ago, EDRM, along with Nuix, announced that they have republished version 1 of the EDRM Enron PST Data Set (which contains over 1.3 million items) after cleansing it of private, health and personal financial information. Nuix and EDRM have also published the methodology Nuix’s staff used to identify and remove more than 10,000 high-risk items, including credit card numbers (60 items), Social Security or other national identity numbers (572), individuals’ dates of birth (292) and other personal data.  All personal data gone, right?

Not so fast.

As noted in this Law Technology News article by Sean Doherty (Enron Sandbox Stirs Up Private Data, Again), “Index Engines (IE) obtained a copy of the Nuix-cleansed Enron data for review and claims to have found many ‘social security numbers, legal documents, and other information that should not be made public.’ IE evidenced its ‘find’ by republishing a redacted version of a document with PII” (actually, a handful of them).  IE and others were quite critical of the effort by Nuix/EDRM and the extent of the PII data still remaining.

As he does so well, Rob Robinson has compiled a list of articles, comments and posts related to the PII issue; here is the link.

Collaboration, not criticism

Sharon’s post had several observations regarding the data set “controversy”, some of which are repeated here:

  • “Is the legal status of the data pretty clear? Yes, when a court refused to block it from being made public apparently accepting the greater good of its release, the status is pretty clear.”
  • “Should Nuix be taken to task for failure to wholly cleanse the data? I don’t think so. I am not inclined to let perfect be the enemy of the good. A lot was cleansed and it may be fair to say that Nuix was surprised by how much PII remained.”
  • “The terms governing the download of the data set made clear that there was no guarantee that all the PII was removed.” (more on that below in my observations)
  • “While one can argue that EDRM should have done something about the PII earlier, at least it is doing something now. It may be actively helpful to Nuix to point out PII that was not cleansed so it can figure out why.”
  • “Our expectations here should be that we are in the midst of a cleansing process, not looking at the data set in a black or white manner of cleansed or uncleansed.”
  • “My suggestion? Collaboration, not criticism. I believe Nuix is anxious to provide the cleanest version of the data possible – to the extent that others can help, it would be a public service.”

My Perspective from the Data Set Meeting

I sat in on part of the Data Set meeting earlier this month and there were a couple of points discussed during the meeting that I thought were worth relaying:

1.     We understood that there was no guarantee that all of the PII data was removed.

As with any process, we understood that there was no effective way to ensure that all PII data was removed after the process was complete and discussed needing a mechanism for people to continue to report PII data that they find.  On the download page for the data set, there was a link to the legal disclaimer page, which states in section 1.8:

“While the Company endeavours to ensure that the information in the Data Set is correct and all PII is removed, the Company does not warrant the accuracy and/or completeness of the Data Set, nor that all PII has been removed from the Data Set. The Company may make changes to the Data Set at any time without notice.”

With regard to a mechanism for reporting persistent PII data, there is this statement on the Data Set page on the EDRM site:

“PII: These files may contain personally identifiable information, in spite of efforts to remove that information. If you find PII that you think should be removed, please notify us at mail@edrm.net.”

2.     We agreed that any documents with PII data should be removed, not redacted.

Because the original data set, with all of the original PII data, is available via FERC, we agreed that any documents containing sensitive personal information should be removed from the data set – NOT redacted.  In essence, redacting those documents is putting a beacon on them to make it easier to find them in the FERC set or downloaded copies of the original EDRM set, so the published redacted examples of missed PII only serve to facilitate finding those documents in the original sets.

Conclusion

Regardless of how effective the “cleansing” of the data set was perceived to be by some, it did result in removing over 10,000 items with personal data.  Yet, some PII data evidently remains.  While some people think (and they may have a point) that the data set should not have been published until after an independent audit for remaining PII data, it seems impractical (to me, at least) to wait until it is “perfect” before publishing the set.  So, when is it good enough to publish?  That appears to be open to interpretation.

Like Sharon, my hope is that we can move forward to continue to improve the Data Set through collaboration and that those who continue to find PII data in the set will notify EDRM, so that they can remove those items and continue to make the set better.  I’d love to see the Data Set page on the EDRM site reflect a history of each data set update, with the revision date, the number of additional PII items found and removed and who identified them (to give credit to those finding the data).  As Canned Heat would say, “Let’s Work Together”.

And, we haven’t even gotten to version 2 of the Data Set yet – more fun ahead!  🙂

So, what do you think?  Have you used the EDRM Enron Data Set?  If so, do you plan to download the new version?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Version 1 of the EDRM Enron Data Set NOW AVAILABLE – eDiscovery Trends

Last week, we reported from the Annual Meeting for the Electronic Discovery Reference Model (EDRM) group and discussed some significant efforts and accomplishments by each of the project teams within EDRM.  That included an update from the EDRM Data Set project, where an effort was underway to identify and remove personally-identifiable information (“PII”) data from the EDRM Data Set.  Now, version 1 of the Data Set is completed and available for download.

To recap, the EDRM Enron Data Set, sourced from the FERC Enron Investigation release made available by Lockheed Martin Corporation, has been a valuable resource for eDiscovery software demonstration and testing (we covered it here back in January 2011).  Initially, the data was made available for download on the EDRM site, then subsequently moved to Amazon Web Services (AWS).  However, after much recent discussion about PII data (including social security numbers, credit card numbers, dates of birth, home addresses and phone numbers) available within FERC (and consequently the EDRM Data Set), the EDRM Data Set was taken down from the AWS site.

Yesterday, EDRM, along with Nuix, announced that they have republished version 1 of the EDRM Enron PST Data Set (which contains over 1.3 million items) after cleansing it of private, health and personal financial information. Nuix and EDRM have also published the methodology Nuix’s staff used to identify and remove more than 10,000 high-risk items.

As noted in the announcement, Nuix consultants Matthew Westwood-Hill and Ady Cassidy used a series of investigative workflows to identify the items, which included:

  • 60 items containing credit card numbers, including departmental contact lists that each contained hundreds of individual credit cards;
  • 572 items containing Social Security or other national identity numbers—thousands of individuals’ identity numbers in total;
  • 292 items containing individuals’ dates of birth;
  • 532 items containing information of a highly personal nature such as medical or legal matters.
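To give a feel for what screening for the first two categories above can involve, here is a minimal Python sketch.  It is not the Nuix methodology (that is described in their white paper); the patterns and the function names `luhn_valid`, `find_pii` and `should_remove` are assumptions made up for illustration.  It only looks for US-style Social Security numbers and for card-like digit runs that pass the Luhn checksum, and it would miss dates of birth, medical or legal content and many other number formats.

```python
import re

# Hypothetical patterns for illustration only; real PII screening needs far
# broader coverage (other national ID formats, dates of birth, and so on).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Collect SSN-like strings and Luhn-valid card-like numbers from text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"ssn": SSN_RE.findall(text), "card": cards}


def should_remove(text: str) -> bool:
    """Flag the whole item for removal (not redaction) if anything is found."""
    hits = find_pii(text)
    return bool(hits["ssn"] or hits["card"])
```

A sketch like this flags the entire item rather than masking the matched text, which lines up with the "remove, don't redact" approach discussed for this data set.  Anything it flags would still need human review, since pattern matching alone produces both false positives and misses.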

While the personal data was (and still is) available via FERC long before the EDRM version was created, completion of this process will mean that many in the eDiscovery industry who rely on this highly useful data set for testing and software demonstration can now use a version which should be free from sensitive personal information!

For more information regarding the announcement, click here. The republished version 1 of the Data Set, as well as the white paper discussing the methodology, is available at nuix.com/enron.  Nuix is currently applying the same methodology to the EDRM Enron Data Set v2 (which contains nearly 2.3 million items) and will publish to the same site when complete.

So, what do you think?  Have you used the EDRM Enron Data Set?  If so, do you plan to download the new version?  Please share any comments you might have or if you’d like to know more about a particular topic.

Plaintiff Granted Access to Defendant’s Database – eDiscovery Case Law

Last week at the EDRM Annual Meeting, one of our group discussion sessions was centered on production and presentation of native files – a topic which has led to the creation of a new EDRM project to address standards for working with native files in these areas.  This case provides an example of a unique form of native production.

In Advanced Tactical Ordnance Systems, LLC v. Real Action Paintball, Inc., No. 1:12-CV-296 (N.D. Ind. Feb. 25, 2013), Indiana Magistrate Judge Roger B. Cosbey took the unusual step of allowing the plaintiff direct access to a defendant company’s database under Federal Rule of Civil Procedure 34 because the plaintiff made a specific showing that the information in the database was highly relevant to the plaintiff’s claims, the benefit of producing it substantially outweighed the burden of producing it, and there was no prejudice to the defendant.

In this case involving numerous claims, including trademark infringement and fraud, Advanced Tactical Ordnance Systems LLC (“ATO”) sought expedited discovery after it obtained a temporary restraining order against the defendants. One of its document requests sought the production of defendant Real Action Paintball’s OS Commerce database to search for responsive evidence. Real Action objected, claiming that the request asked for confidential and sensitive information from its “most important asset” that would give the plaintiff a competitive advantage and that the request amounted to “‘an obvious fishing expedition.’”

To decide the issue, Judge Cosbey looked to Federal Rule of Civil Procedure 34(a)(1)(A), which allows parties to ask to “inspect, copy, test, or sample . . . any designated documents or electronically stored information . . . stored in any medium from which information can be obtained either directly or, if necessary, after translation by the responding party into a reasonably usable form.” The advisory committee notes to this rule explain that the testing and sampling does not “create a routine right of direct access to a party’s electronic information system, although such access might be justified in some circumstances.” Judge Cosbey also considered whether the discovery request was proportionate under Federal Rule of Civil Procedure 26(b)(2)(C)(iii), comparing the “burden or expense” of the request against its “likely benefit, considering the needs of the case, the amount in controversy, the parties’ resources, the importance of the issues at stake in the action, and the importance of the discovery in resolving the issues.”

Based on his analysis, Judge Cosbey permitted ATO’s request. The benefits of allowing the plaintiff to access the defendant’s OS Commerce database outweighed the burden of producing data from it, especially because the parties had entered a protective order. The information was particularly important to the plaintiff’s argument that the defendant was using hidden metatags referencing ATO’s product to improve its results in search engines, thereby stealing the plaintiff’s customers.

Despite the defendant company’s claims that the information the database contained was proprietary and potentially harmful to the business’s competitive advantage, the court found the company failed to establish how the information in the database constituted a trade secret or how its disclosure could harm the company, especially where much of the information had already been produced or was readily available on the company’s website. Moreover, the company could limit the accessibility of the database to “‘Attorneys’ Eyes Only.’”

So, what do you think?  Was it appropriate to grant the plaintiff direct access to the defendant’s database?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required).  For eDiscovery news and best practices, check out the Applied Discovery Blog here.
