Analysis

eDiscovery Trends: George Socha of Socha Consulting

This is the first of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:
  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support. George has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey; last year he and Tom Gelbmann converted the Survey into Apersee, an online system for selecting eDiscovery providers and their offerings.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are nine active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?

I may have said this last year too, but it holds true even more this year – if there’s an emerging trend, it’s the trend of people talking about the emerging trend.  It started last year and this year every person in the industry seems to be delivering the emerging trend.  Not to be too crass about it, but often the message is, “Buy our stuff”, a message that is not especially helpful.

Regarding actual emerging trends, each year we all try to sum up legal tech in two or three words.  The two words for this year can be “predictive coding.”  Use whatever name you want, but that’s what everyone seems to be hawking and talking about at LegalTech this year.  This does not necessarily mean they really can deliver.  It doesn’t mean they know what “predictive coding” is.  And it doesn’t mean they’ve figured out what to do with “predictive coding.”  Having said that, expanding the use of machine assisted review capabilities as part of the e-discovery process is an important step forward.  It also has been a while coming.  The earliest I can remember working with a client, doing what’s now being called predictive coding, was in 2003.  A key difference is that at that time they had to create their own tools.  There wasn’t really anything they could buy to help them with the process.

Which trend(s), if any, haven’t emerged to this point like you thought they would?

One thing I don’t yet hear is discussion about using predictive coding capabilities as a tool to assist with determining what data to preserve in the first place.  Right now the focus is almost exclusively on what do you do once you’ve “teed up” data for review, and then how to use predictive coding to try to help with the review process.

Think about taking the predictive coding capabilities and using them early on to make defensible decisions about what to and what not to preserve and collect.  Then consider continuing to use those capabilities throughout the e-discovery process.  Finally, look into using those capabilities to more effectively analyze the data you’re seeing, not just to determine relevance or privilege, but also to help you figure out how to handle the matter and what to do on a substantive level.

What are your general observations about LTNY this year and how it fits into emerging trends?

Well, Legal Tech continues to have been taken over by electronic discovery.  As a result, we tend to overlook whole worlds of technologies that can be used to support and enhance the practice of law. It is unfortunate that in our hyper-focus on e-discovery, we risk losing track of those other capabilities.

What are you working on that you’d like our readers to know about?

With regard to EDRM, we recently announced that we have hit key milestones in five projects.  Our EDRM Enron Email Data Set has now officially become an Amazon public dataset, which I think will mean wider use of the materials.

We announced the publication of our Model Code of Conduct, which was five years in the making.  We have four signatories so far, and are looking forward to seeing more organizations sign on.

We announced the publication of version 2.0 of our EDRM XML schema.  It’s a tightened-up schema, reorganized so that it should be a bit easier to use and more efficient in the operation.

With the Metrics project, we are beginning to add information to a database that we’ve developed to gather metrics, the objective being to be able to make available metrics with an empirical basis, rather than the types of numbers bandied about today, where no one seems to know how they were arrived at. Also, last year the Uniform Task-Based Management System (UTBMS) code set for litigation was updated.  The codes to use for tracking e-discovery activities were expanded from a single code that covered not just e-discovery but other activities, to a number of codes based on the EDRM Metrics code set.

On the Information Governance Reference Model (IGRM) side, we recently published a joint white paper with ARMA.  The paper cross-maps the EDRM’s Information Governance Reference Model (IGRM) with ARMA’s Generally Accepted Recordkeeping Principles (GARP).  We look forward to more collaborative materials coming out of the two organizations.

As for Apersee, we continue to allow consumers to search the data on the site for free, but we also are no longer charging providers a fee for their information to be available.  Instead, we now have two sponsors and some advertising on the site.  This means that any provider can put information in, and everyone can search that information.  The more data that goes in, the more useful the searching process becomes.  All this fits our goal of creating a better way to match consumers with the providers who have the services, software, skills and expertise that the consumers actually need.

And on the consulting and testifying side, I continue to work with a broad array of law firms; corporate and governmental consumers of e-discovery services and software; and providers offering those capabilities.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: “Assisted” is the Key Word for Technology Assisted Review

As noted in our blog post entitled 2012 Predictions – By The Numbers, almost all of the sets of eDiscovery predictions we reviewed (9 out of 10) predicted a greater emphasis on Technology Assisted Review (TAR) in the coming year.  It was one of our predictions, as well.  And, during all three days at LegalTech New York (LTNY) a couple of weeks ago, sessions were conducted that addressed technology assisted review concepts and best practices.

While some equate technology assisted review with predictive coding, other technology approaches such as conceptual clustering are also increasing in popularity.  They qualify as TAR approaches, as well.  However, for purposes of this blog post, we will focus on predictive coding.

Over a year ago, I attended a Virtual LegalTech session entitled Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and wrote a blog post from that entitled What the Heck is “Predictive Coding”?  The speakers for the session were Jason R. Baron, Maura Grossman and Bennett Borden (Jason and Bennett are previous thought leader interviewees on this blog).  The panel gave the best descriptive definition that I’ve seen yet for predictive coding, as follows:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”

It’s very cool technology and capable of efficient and accurate review of the document collection, saving costs without sacrificing quality of review (in some cases, it yields even better results than traditional manual review).  However, there is one key phrase in the definition above that can make or break the success of the predictive coding process: “based on human review of only a subset of the document collection”.

Key to the success of any review effort, whether linear or technology assisted, is knowledge of the subject matter.  For linear review, knowledge of the subject matter usually results in preparation of high quality review instructions that (assuming the reviewers competently follow those instructions) result in a high quality review.  In the case of predictive coding, use of subject matter experts (SMEs) to review a core subset of documents (typically known as a “seed set”) and make determinations regarding that subset is what enables the technology in predictive coding to “predict” the responsiveness and importance of the remaining documents in the collection.  The more knowledgeable the SMEs are in creating the “seed set”, the more accurate the “predictions” will be.
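To make the “seed set” idea concrete, here is a minimal sketch in Python of the rank-and-cut process described in the definition above.  It is an illustration only, not how any particular predictive coding product works: it assumes a tiny naive Bayes word-count model, and the function names (train, score, rank_and_cut), the sample documents and the cutoff of zero are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into words."""
    return text.lower().split()

def train(seed_set):
    """Count words per label in the SME-coded seed set.

    seed_set is a list of (text, is_responsive) pairs (hypothetical format).
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def score(text, model):
    """Return log-odds that a document is responsive (higher = more likely)."""
    word_counts, doc_counts, vocab = model
    # Prior log-odds from the seed set, with add-one smoothing.
    log_odds = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        p_resp = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds

def rank_and_cut(documents, model, cutoff=0.0):
    """Rank documents from most to least likely responsive, then partition."""
    ranked = sorted(documents, key=lambda d: score(d, model), reverse=True)
    likely = [d for d in ranked if score(d, model) > cutoff]
    unlikely = [d for d in ranked if score(d, model) <= cutoff]
    return ranked, likely, unlikely

# Made-up seed set coded by a subject matter expert.
seed_set = [
    ("pricing agreement with competitor", True),
    ("discuss pricing terms before the meeting", True),
    ("lunch menu for friday", False),
    ("office holiday party schedule", False),
]
# Made-up remainder of the collection, not yet reviewed by anyone.
collection = [
    "friday party menu",
    "revised pricing agreement attached",
    "meeting schedule",
]

model = train(seed_set)
ranked, likely, unlikely = rank_and_cut(collection, model)
print(ranked)
```

Even in this toy version, the quality of the ranking depends entirely on how the seed set was coded, which is the point made above about knowledgeable SMEs.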

And, as is the case with other processes such as document searching, sampling the results (by determining the appropriate sample size of responsive and non-responsive items, randomly selecting those samples and reviewing both groups – responsive and non-responsive – to test the results) will enable you to determine how effective the process was in predictively coding the document set.  If sampling shows that the process yielded inadequate results, take what you’ve learned from the sample set review and apply it to create a more accurate “seed set” for re-categorizing the document collection.  Sampling will enable you to defend the accuracy of the predictive coding process, while saving considerable review costs.
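As a rough illustration of what “testing the results” can look like, the Python sketch below estimates the proportion of responsive documents found in a reviewed sample, along with a margin of error.  It assumes a simple random sample and the standard normal approximation for a proportion; the function name estimate_rate and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist

def estimate_rate(hits, sample_size, confidence=0.95):
    """Return (point estimate, margin of error) for a sampled proportion.

    hits is the number of sampled documents found to be responsive.
    Uses the normal approximation, which is reasonable for larger samples.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical check of the documents coded non-responsive:
# 600 were sampled at random and reviewers found 12 that were responsive.
rate, margin = estimate_rate(hits=12, sample_size=600)
print(f"Estimated missed rate: {rate:.1%} plus or minus {margin:.1%}")
```

If the estimated rate of missed documents is higher than you can defend, that is the signal to refine the seed set and re-categorize.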

So, what do you think?  Have you utilized predictive coding in any of your reviews?  How did it work for you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Needing “Technology Assisted Review” to Write a Blog Post

Late on a Thursday night, with a variety of tasks and projects on my plate at the moment, it seems more difficult this night to find a unique and suitable topic for today’s blog post.

One thing I often do when looking for ideas is to hit the web and turn to the many resources that I read regularly to stay abreast of developments in the industry.  Usually when I do that, I find one article or blog post that “speaks to me” as a topic to talk about on this blog.  However, when doing so last night, I found several topics worth discussing and had difficulty selecting just one.  So, here are some of the notable articles and posts that I’ve been reviewing:

There are plenty more articles out there.  I’ve barely scratched the surface.  When we launched eDiscovery Daily about 16 months ago, some wondered whether there would be enough eDiscovery news and information to talk about on a daily basis.  The problem we have found instead is that there is SO much to talk about, it’s difficult to choose.  Today, I was unable to choose just one topic, so, as the picture notes, “I have nothing to say”.  Therefore, I’ve had to use “technology assisted review” to provide a post to you, thanks to the many excellent articles and blogs out there.  Enjoy!

So, what do you think?  Are there any specific topics that you find are being discussed a lot on the web?  Are there any topics that you’d like to see discussed more?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Sampling within eDiscovery Software

Those of you who have been following this blog since early last year may remember that we published a three part series regarding testing your eDiscovery searches using sampling (as part of the “STARR” approach discussed on this blog about a year ago).  We discussed how to determine the appropriate sample size to test your search, using a sample size calculator (freely available on the web).  We also discussed how to make sure the sample size is randomly selected (again referencing a site freely available on the web for generating the random set).  We even walked through an example of how you can test and refine a search using sampling, saving tens of thousands in review costs with defensible results.

Instead of having to go to all of these external sites to manually size and generate your random sample set, it’s even better when the eDiscovery ECA or review software you’re using handles that process for you.  The latest version of FirstPass®, powered by Venio FPR™, does exactly that.  Version 3.5.1.2 of FirstPass has introduced a sampling module that provides a wizard that walks you through the process of creating a sample set to review to test your searches.  What could be easier?

The wizard begins by providing a dialog to enable the user to select the sampling population.  You can choose from tagged documents from one or more tags, documents in saved search results, documents from one or more selected custodians or all documents in the database.  When choosing tags, you can choose ANY of the selected tags, ALL of the selected tags, or even choose documents NOT in the selected tags (for example, enabling you to test the documents not tagged as responsive to confirm that responsive documents weren’t missed in your search).
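The ANY, ALL and NOT choices map naturally onto set operations.  The Python sketch below is only an illustration of that logic, not FirstPass code: the function name select_population and the data layout are hypothetical, and it assumes that NOT means “documents carrying none of the selected tags”.

```python
def select_population(all_docs, tags, selected, mode):
    """Pick the sampling population from tagged document IDs.

    all_docs: set of every document ID in the database
    tags:     dict mapping a tag name to the set of IDs carrying that tag
    selected: list of tag names chosen by the user
    mode:     "ANY", "ALL" or "NOT"
    """
    tagged_sets = [tags[name] for name in selected]
    if mode == "ANY":
        # Documents carrying at least one of the selected tags.
        return set().union(*tagged_sets)
    if mode == "ALL":
        # Documents carrying every selected tag.
        return set.intersection(*tagged_sets) if tagged_sets else set()
    if mode == "NOT":
        # Documents carrying none of the selected tags (assumed meaning).
        return set(all_docs) - set().union(*tagged_sets)
    raise ValueError(f"unknown mode: {mode}")

# Made-up database of six documents and two tags.
all_docs = {1, 2, 3, 4, 5, 6}
tags = {"responsive": {1, 2, 3}, "privileged": {3, 4}}

print(select_population(all_docs, tags, ["responsive", "privileged"], "ANY"))
print(select_population(all_docs, tags, ["responsive", "privileged"], "ALL"))
print(select_population(all_docs, tags, ["responsive"], "NOT"))
```

The last call is the scenario mentioned above: sampling the documents not tagged as responsive to check whether responsive documents were missed.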

You can then specify your confidence level (e.g., 95% confidence level) and confidence interval (a.k.a., margin of error – e.g., 4%) using slider bars.  As you slide the bars to the desired level, the application shows you how that will affect the size of the sample to be retrieved.  You can then name the sample and describe its purpose, then identify whether you want to view the sample set immediately, tag it or place it into a folder.  Once you’ve identified the preferred option for handling your sample set, the wizard gives you a summary form for displaying your choices.  Once you click the Finish button, it creates the sample and gives you a form to show you what it did.  Then, if you chose to view the sample set immediately, it will display the sample set (if not, you can then retrieve the tag or folder containing your sample set).
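For readers curious about what happens behind slider bars like these, here is a minimal Python sketch of a typical sample size calculation followed by a random draw.  It is not the wizard’s actual implementation: it assumes the standard formula for estimating a proportion with a finite population correction, and the function names, the worst-case proportion of 50% and the population of 100,000 documents are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

def sample_size(population, confidence=0.95, margin=0.04, proportion=0.5):
    """Return the number of documents to sample from a population.

    proportion=0.5 is the most conservative assumption (largest sample).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size for an effectively unlimited population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Finite population correction shrinks the sample for smaller sets.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

def random_sample(doc_ids, size, seed=None):
    """Randomly select `size` document IDs without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(doc_ids), size))

# Hypothetical population: 100,000 documents, 95% confidence, 4% margin.
needed = sample_size(100_000, confidence=0.95, margin=0.04)
sample_ids = random_sample(range(1, 100_001), needed, seed=42)
print(needed, sample_ids[:5])
```

Tightening the margin of error or raising the confidence level increases the sample size, which is the effect you see when moving the sliders.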

Managing this process within the software saves considerable time: there is no need to leave the application to identify the sample size and create a randomly selected set of IDs, then go back into the application to retrieve and tag those items as belonging to the sample set (which is how I used to do it).  The end result is simplified and streamlined.

So, what do you think?  Is sample set generation within the ECA or review tool a useful feature?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: I work for CloudNine Discovery, which provides SaaS-based eDiscovery review applications FirstPass® (for first pass review) and OnDemand® (for linear review and production).

eDiscovery Trends: Our 2012 Predictions

Yesterday, we evaluated what others are saying and noted popular eDiscovery prediction trends for the coming year.  It’s interesting to identify common trends among the prognosticators and also the unique predictions as well.

But we promised our own predictions for today, so here they are.  One of the nice things about writing and editing a daily eDiscovery blog is that it forces you to stay abreast of what’s going on in the industry.  Based on the numerous stories we’ve read (many of which we’ve also written about), and in David Letterman “Top 10” fashion, here are our eDiscovery predictions for 2012:

  • Still More ESI in the Cloud: Frankly, this is like predicting “the Sun will be hot in 2012”.  Given the predictions in cloud growth by Forrester and Gartner, it seems inevitable that organizations will continue to migrate more data and applications to “the cloud”.  Even if some organizations continue to resist the cloud movement, those organizations still have to address the continued growth in usage of social media sites in business (which, last I checked, are based in the cloud).  It’s inevitable.
  • More eDiscovery Technology in the Cloud As Well: We will continue to see more cloud offerings for eDiscovery technology, ranging from information governance to preservation and collection to review and production.  With the need for corporations to share potentially responsive ESI with one or more outside counsel firms, experts and even opposing counsel, cloud based Software-as-a-Service (SaaS) applications are a logical choice for sharing that information effortlessly without having to buy software and hardware and provide infrastructure to do so.  Every year at LegalTech, there seem to be a few more eDiscovery cloud providers and this year should be no different.
  • Self-Service in the Cloud: So, organizations are seeing the benefits of the cloud not only for storing ESI, but also managing it during Discovery.  It’s the cost effective alternative.  But, organizations are demanding the control of a desktop application within their eDiscovery applications.  The ability to load your own data, add your own users and maintain their rights, and create your own data fields are just a few of the capabilities that organizations expect to be able to do themselves.  And, more providers are responding to those needs.  That trend will continue this year.
  • Technology Assisted Review: This was the most popular prediction among the pundits we reviewed.  The amount of data in the world continues to explode, as there were 988 exabytes in the whole world as of 2010 and Cisco predicts that IP traffic over data networks will reach 4.8 zettabytes (each zettabyte is 1,000 exabytes) by 2015.  Nearly five times the data in five years.  Even in the smaller cases, there’s simply too much data to not use technology to get through it all.  Whether it’s predictive coding, conceptual clustering or some other technology, it’s required to enable attorneys to manage the review more effectively and efficiently.
  • Greater Adoption of eDiscovery Technology for Smaller Cases: As each gigabyte of data is between 50,000 and 100,000 pages, a “small” case of 4 GB (or two max size PST files in Outlook® 2003) can still be 300,000 pages or more.  As “small” cases are no longer that small, attorneys are forced to embrace eDiscovery technology for the smaller cases as well.  And, eDiscovery providers are taking note.
  • Continued Focus on International eDiscovery:  So, cases are larger and there’s more data in the cloud, which leads to more cases where Discovery of ESI internationally becomes an issue.  The Sedona Conference® just issued in December the Public Comment Version of The Sedona Conference® International Principles on Discovery, Disclosure & Data Protection: Best Practices, Recommendations & Principles for Addressing the Preservation & Discovery of Protected Data in U.S. Litigation, illustrating how important an issue this is becoming for eDiscovery.
  • Prevailing Parties Awarded eDiscovery Costs: Shifting to the courtroom, we have started to see more cases where the prevailing party is awarded their eDiscovery costs as part of their award.  As organizations have pushed for more proportionality in the Discovery process, courts have taken it upon themselves to impose that proportionality through taxing the “losers” for reimbursement of costs, causing prevailing defendants to say: “Sue me and lose?  Pay my costs!”.
  • Continued Efforts and Progress on Rules Changes: Speaking of proportionality, there will be continued efforts and progress on changes to the Federal Rules of Civil Procedure as organizations push for clarity on preservation and other obligations to attempt to bring spiraling eDiscovery costs under control.  It will take time, but progress will be made toward that goal this year.
  • Greater Price/Cost Control Pressure on eDiscovery Services: In the meantime, while waiting for legislative relief, organizations will expect the cost for eDiscovery services to be more affordable and predictable.  In order to accommodate larger amounts of data, eDiscovery providers will need to offer simplified and attractive pricing alternatives.
  • Big Player Consolidation Continues, But Plenty of Smaller Players Available: In 2011, we saw HP acquire Autonomy and Symantec acquire Clearwell, continuing a trend of acquisitions of the “big players” in the industry.  This trend will continue, but there is still plenty of room for the “little guy” as smaller providers have been pooling resources to compete, creating an interesting dichotomy in the industry of few big and many small providers in eDiscovery.
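For anyone who wants to check the arithmetic behind the data volume and page count figures in the predictions above, here is a quick Python sketch.  The numbers come straight from the text; the variable and function names are hypothetical, and note that the two volume figures measure different things (data in existence versus network traffic), so the ratio is only a rough indicator.

```python
EXABYTES_PER_ZETTABYTE = 1_000

# Figures quoted above: 988 exabytes as of 2010, 4.8 zettabytes by 2015.
data_2010_eb = 988
traffic_2015_zb = 4.8

growth = traffic_2015_zb * EXABYTES_PER_ZETTABYTE / data_2010_eb
print(f"2015 figure is about {growth:.2f} times the 2010 figure")

def page_range(gigabytes, low=50_000, high=100_000):
    """Return the (low, high) page estimate for a data set of a given size.

    Uses the 50,000 to 100,000 pages per gigabyte rule of thumb quoted above.
    """
    return gigabytes * low, gigabytes * high

low_pages, high_pages = page_range(4)
print(f"A 4 GB case is roughly {low_pages:,} to {high_pages:,} pages")
```

The 300,000 pages mentioned for a 4 GB case sits in the middle of that estimated range.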

So, what do you think?  Care to offer your own predictions?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: 2012 Predictions – By The Numbers

With a nod to Nick Bakay, “It’s all so simple when you break things down scientifically.”

The late December/early January time frame is always when various people in eDiscovery make their annual predictions as to what trends to expect in the coming year.  I know what you’re thinking – “oh no, not another set of eDiscovery predictions!”  However, at eDiscovery Daily, we do things a little bit differently.  We like to take a look at other predictions and see if we can spot some common trends among those before offering some of our own (consider it the ultimate “cheat sheet”).  So, as I did last year, I went “googling” for 2012 eDiscovery predictions, and organized the predictions into common themes.  I found eDiscovery predictions here, here, here, here, here, here and Applied Discovery.  Oh, and also here, here and here.  Ten sets of predictions in all!  Whew!

A couple of quick comments: 1) Not all of these are from the original sources, but the links above attribute the original sources when they are re-prints.  If I have failed to accurately attribute the original source for a set of predictions, please feel free to comment.  2) This is probably not an exhaustive list of predictions (I have other duties in my “day job”, so I couldn’t search forever), so I apologize if I’ve left anybody’s published predictions out.  Again, feel free to comment if you’re aware of other predictions.

Here are some of the common themes:

  • Technology Assisted Review: Nine out of ten “prognosticators” (up from 2 out of 7 last year) predicted a greater emphasis/adoption of technological approaches.  While some equate technology assisted review with predictive coding, other technology approaches such as conceptual clustering are also increasing in popularity.  Clearly, as the amount of data associated with the typical litigation rises dramatically, technology is playing a greater role to enable attorneys to manage the review more effectively and efficiently.
  • eDiscovery Best Practices Combining People and Technology: Seven out of ten “augurs” also had predictions related to various themes associated with eDiscovery best practices, especially processes that combine people and technology.  Some have categorized it as a “maturation” of the eDiscovery process, with corporations becoming smarter about eDiscovery and integrating it into core business practices.  We’ve had numerous posts regarding eDiscovery best practices in the past year; click here for a selection of them.
  • Social Media Discovery: Six “pundits” forecasted a continued growth in sources and issues related to social media discovery.  Bet you didn’t see that one coming!  For a look back at cases from 2011 dealing with social media issues, click here.
  • Information Governance: Five “soothsayers” presaged various themes related to the promotion of information governance practices and programs, ranging from a simple “no more data hoarding” to an “emergence of Information Management platforms”.  For our posts related to Information Governance and management issues, click here.
  • Cloud Computing: Five “mediums” (but are they happy mediums?) predict that ESI and eDiscovery will continue to move to the cloud.  Frankly, given the predictions in cloud growth by Forrester and Gartner, I’m surprised that there were only five predictions.  Perhaps predicting growth of the cloud has become “old hat”.
  • Focus on eDiscovery Rules / Court Guidance: Four “prophets” (yes, I still have my thesaurus!) expect courts to provide greater guidance on eDiscovery best practices in the coming year via a combination of case law and pilot programs/model orders to establish expectations up front.
  • Complex Data Collection: Four “psychics” also predicted that data collection will continue to become more complex as data sources abound, the custodian-based collection model comes under stress and self-collection gives way to more automated techniques.

The “others receiving votes” category (three predicting each of these) included cost shifting and increased awards of eDiscovery costs to the prevailing party in litigation, flexible eDiscovery pricing and predictable or reduced costs, continued focus on international discovery and continued debate on potential new eDiscovery rules.  Two each predicted continued consolidation of eDiscovery providers, de-emphasis on use of backup tapes, de-emphasis on use of eMail, multi-matter eDiscovery management (to leverage knowledge gained in previous cases), risk assessment /statistical analysis and more single platform solutions.  And, one predicted more action on eDiscovery certifications.

Some interesting predictions.  Tune in tomorrow for ours!

So, what do you think?  Care to offer your own “hunches” from your crystal ball?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Cloud Covered by Ball

What is the cloud, why is it becoming so popular and why is it important to eDiscovery? These are the questions being addressed—and very ably answered—in the recent article Cloud Cover (via Law Technology News) by computer forensics and eDiscovery expert Craig Ball, a previous thought leader interviewee on this blog.

Ball believes that the fears about cloud data security are easily dismissed when considering that “neither local storage nor on-premises data centers have proved immune to failure and breach”. And as far as the cloud’s importance to the law and to eDiscovery, he says, “the cloud is re-inventing electronic data discovery in marvelous new ways while most lawyers are still grappling with the old.”

What kinds of marvelous new ways, and what do they mean for the future of eDiscovery?

What is the Cloud?

First we have to understand just what the cloud is.  The cloud is more than just the Internet, although it’s that, too. In fact, what we call “the cloud” is made up of three on-demand services:

  • Software as a Service (SaaS) covers web-based software that performs tasks you once carried out on your computer’s own hard drive, without requiring you to perform your own backups or updates. If you check your email virtually on Hotmail or Gmail or run a Google calendar, you’re using SaaS.
  • Platform as a Service (PaaS) happens when companies or individuals rent virtual machines (VMs) to test software applications or to run processes that take up too much hard drive space to run on real machines.
  • Infrastructure as a Service (IaaS) encompasses the use and configuration of virtual machines or hard drive space in whatever manner you need to store, sort, or operate your electronic information.

These three models combine to make up the cloud, a virtual space where electronic storage and processing is faster, easier and more affordable.

How the Cloud Will Change eDiscovery

One reason that processing is faster is distributed processing, which Ball calls “going wide”.  Here’s his analogy:

“Remember that scene in The Matrix where Neo and Trinity arm themselves from gun racks that appear out of nowhere? That’s what it’s like to go wide in the cloud. Cloud computing makes it possible to conjure up hundreds of virtual machines and make short work of complex computing tasks. Need a supercomputer-like array of VMs for a day? No problem. When the grunt work’s done, those VMs pop like soap bubbles, and usage fees cease. There’s no capital expenditure, no amortization, no idle capacity. Want to try the latest concept search tool? There’s nothing to buy! Just throw the tool up on a VM and point it at the data.”
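The “going wide” idea is essentially a fan-out and fan-in pattern: split the data into pieces, process the pieces at the same time, then combine the results.  The Python sketch below illustrates the pattern on a single machine, with worker threads standing in for the virtual machines Ball describes.  The function names, the shard layout and the sample documents are hypothetical, and a real cloud deployment would involve far more than this.

```python
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, term):
    """Return IDs of documents in one shard whose text contains the term."""
    term = term.lower()
    return [doc_id for doc_id, text in shard if term in text.lower()]

def search_wide(shards, term, workers=4):
    """Search every shard concurrently and merge the hits in shard order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fan out: one task per shard. map() yields results in input order.
        per_shard = pool.map(lambda shard: search_shard(shard, term), shards)
        # Fan in: flatten the per-shard hit lists into one list.
        return [doc_id for hits in per_shard for doc_id in hits]

# Made-up collection split into two shards of (document ID, text) pairs.
shards = [
    [(1, "Pricing memo"), (2, "Lunch menu")],
    [(3, "Re: pricing call"), (4, "Holiday party")],
]

hits = search_wide(shards, "pricing")
print(hits)
```

The appeal in the cloud is that the number of workers can be raised for a day and dropped back to zero afterward, rather than being fixed by the hardware you own.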

Because the cloud is entirely virtual, operating on servers whose locations are unknown and mostly irrelevant, it throws the rules for eDiscovery right out the metaphorical window.

Ball also believes that everything changes once discoverable information goes into the cloud. “Bringing ESI beneath one big tent narrows the gap between retention policy and practice and fosters compatible forms of ESI across web-enabled applications”.

“Moving ESI to the cloud,” Ball adds, “also spells an end to computer forensics.” Where there are no hard drives, there can be no artifacts of deleted information—so, deleted really means deleted.

What’s more, “[c]loud computing makes collection unnecessary”. Where discovery requires that information be collected to guarantee its preservation, putting a hold on ESI located in the cloud will safely keep any users from destroying it. And because cloud computing allows for faster processing than can be accomplished on a regular hard drive, the search for discovery documents will move to where they’re located, in the cloud. Not only will this approach be easier, it will also save money.

Ball concludes his analysis with the statement, “That e-discovery will live primarily in the cloud isn’t a question of whether but when.”

So, what do you think? Is cloud computing the future of eDiscovery? Is that future already here? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: More On the Recommind Patent Controversy

Perhaps the most controversial story in the eDiscovery community in quite some time concerns the patent for Predictive Coding that Recommind recently announced in a June 8 press release entitled Recommind Patents Predictive Coding.  I haven’t seen this much backlash against a company or individual since last summer, when LeBron James decided to leave the Cleveland Cavaliers for the Miami Heat (and he and his teammates then held a championship-like celebration before the season).  How did that turn out?  😉

Since that announcement, there have been several articles and blog posts about it, including:

  • This one, from Monica Bay of Law Technology News, asking the question: “Is Recommind Blowing Smoke?”, where she discussed the buzz over Recommind’s announcement;
  • This one, from Evan Koblentz (also of Law Technology News), entitled “Recommind Intends to Flex Predictive Coding Muscles”, which includes responses from Catalyst and Valora Technologies;
  • This one, also from Evan Koblentz, a blog post from EDD Update, where Recommind General Counsel and Vice President Craig Carpenter acknowledges that Recommind failed to obtain a trademark for the term Predictive Coding (though Recommind is still using the ™ symbol on the term Predictive Coding on this page);
  • Three blog posts in four days from Sharon D. Nelson of Ride the Lightning blog, which debate the enforceability of the patent and include a response from OrcaTec, noting that Recommind’s implied threat of litigation is “nothing more than an attempt to bully the market place”.

There are several other articles and blog posts regarding the topic, but if I listed them all, I’d have no room left for anything new!  Sorry that I couldn’t include them all.

I reached out to Bill Dimm, founder of Hot Neuron LLC, makers of Clustify, which clusters documents in groups for effective, expedited review, and asked him for his thoughts about the Recommind press release and patent.  Here are his comments:

“Recommind’s press release would have been accurately titled ‘Recommind Patents a Method for Predictive Coding,’ but it went with the much more provocative title ‘Recommind Patents Predictive Coding,’ implying  that its patent covers every conceivable way of doing predictive coding.  The only way I can see that being accurate is if you DEFINE predictive coding to be exactly the procedure outlined in claim 1 of Recommind’s patent.  Of course, ‘predictive coding’ is a relatively new term, so the definition is up for debate.  The patent itself says:

‘Predictive coding refers to the capability to use a small set of coded documents (or partially coded documents) to predict document coding of a corpus.’ That sure sounds like it allows for a lot of possibilities beyond the procedure in claim 1 of the patent.  The press release goes on to say: ‘ONLY [emphasis is mine] Recommind’s patented, iterative, computer-assisted approach can ‘bend the cost curve’ of document review.’  Really?  So, Recommind has the ONLY product in the industry that works?  A few of us disagree.  Even clustering, which Recommind claims does not qualify as predictive coding, will bend the cost curve because the efficiency boost it provides increases with the size of the document set.

Moving on from the press release to the patent itself, I would recommend reading claim 1 if you are interested in such things.  It is the most general method that the USPTO allowed Recommind to claim —  the other claims are all dependent claims that describe more specific embodiments of claim 1, presumably so that Recommind would have a leg left to stand on if prior art was found to invalidate claim 1.  Claim 1 describes a procedure for predictive coding that involves quite a few steps.  It is my understanding (I am NOT a lawyer) that the patent is irrelevant for any predictive coding procedure that does not include every single one of the steps listed in claim 1.  Since claim 1 includes things like identification cycles, rolling loads, and random sampling, it seems unlikely that existing products would accidentally infringe on the patent.

As far as Clustify is concerned, Recommind’s patent is irrelevant since our procedure for predictive coding is different.  In fact, I explained in a presentation at a recent conference why random sampling is a very inefficient approach (something that has been known for decades in other fields), so I wouldn’t even be tempted to follow Recommind’s procedure.”
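To illustrate how broad the patent’s general definition quoted above is (using a small set of coded documents to predict the coding of a corpus), here is a deliberately simple Python sketch. It is not Recommind’s patented procedure, nor Clustify’s approach. It is a hypothetical nearest-neighbor example: each uncoded document takes the label of the coded document whose words it overlaps with most. The names tokens, jaccard and predict_coding are made up for this illustration.

```python
def tokens(text):
    """Lower-case a document and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_coding(coded, uncoded):
    """Label each uncoded text with the label of its most similar coded text.

    coded   -- non-empty list of (text, label) pairs coded by a reviewer
    uncoded -- list of texts that have not been reviewed
    Assumption: simple word overlap stands in for a real similarity model.
    """
    coded_tokens = [(tokens(text), label) for text, label in coded]
    predictions = []
    for text in uncoded:
        doc = tokens(text)
        best = max(coded_tokens, key=lambda pair: jaccard(doc, pair[0]))
        predictions.append(best[1])
    return predictions
```

Given two coded examples, one “responsive” document about a pricing agreement and one “not responsive” document about an office lunch, a new document mentioning a pricing agreement would be predicted responsive. A real product would use far more sophisticated models and validation, but even this toy fits the general definition, which is the point Dimm makes about its breadth.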

So, what do you think?  Will the Recommind predictive coding patent allow them to rule predictive coding?  Or only their specific approach?  Will LeBron James ever win a championship?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: Hot Neuron is a partner of Trial Solutions, which has used their product, Clustify, in various client projects.

eDiscovery Case Law: Downloading Confidential Information Leads to Motion to Compel Production

The North Dakota District Court has recently decided in favor of a motion to compel production of electronic evidence, requiring imaging of computer hard drives, in a case involving the possible electronic theft of trade secrets.

In Weatherford U.S., L.P. v. Chance Innis and Noble Casing Inc., No. 4:09-cv-061, 2011 WL 2174045 (D.N.D. June 2, 2011), the court ruled to allow the plaintiff to select and hire a forensic expert at its own expense to conduct imaging of the defendants’ hard drives. The purpose of this investigation was to discern whether or not confidential data that was downloaded from the plaintiff’s computers was, in fact, used in the building of the defendants’ own oil services firm.

Although the judge noted that courts are generally “cautious” in authorizing such hard drive imaging, the motion was supported by defendant Innis’s “acknowledgment that he downloaded [plaintiff’s] files to a thumb drive without permission.” The court believed that the circumstances of the case warranted further investigation into the defendant’s computer history:

  • The plaintiff, Weatherford US LP, had previously alleged that Chance Innis, a former employee, had downloaded confidential and proprietary information and used it to his advantage in starting his own competing company, Noble Casing Inc.
  • Innis had admitted to returning to Weatherford US offices late in the evening of the day he was terminated and downloading files onto a thumb drive without permission. Two weeks later, he launched his own competing oil services company, the co-defendant in this case, Noble Casing Inc. However, Innis maintains that he did not later access the files stored on his thumb drive and never used them in the process of starting his own company.
  • Contrary to these assertions, forensic examination of the thumb drive showed that the files were later accessed; whether or not they were instrumental in the startup of Noble Casing Inc. remains in question.
  • The plaintiff requested access to the defendants’ computers in pursuit of previously subpoenaed documents, proposing that it select, hire, and pay for the services of a forensic investigator to image the defendants’ hard drives.
  • The defendants objected, proposing instead that an expert be chosen in agreement by all parties.
  • The court ruled in favor of the plaintiff’s motion in this instance, agreeing that all materials imaged will be shown to the defendant to screen for privilege before being shared with the plaintiff.
  • The court maintained that it is not unusual for imaging of hard drives to be allowed by the court in cases such as this, “particularly in cases where trade secrets and electronic evidence are both involved.”

So, what do you think?  Do you agree that Weatherford should have been allowed to examine images of the defendants’ hard drives, or should Innis’ privacy and that of his company have been protected?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Avoiding eDiscovery Nightmares: 10 Ways CEOs Can Sleep Easier

I found this article by Robert D. Brownstone in the CIO Central blog on Forbes.com – it’s a good summary of issues for organizations to consider so that they can avoid major eDiscovery nightmares.  The author counts down his top ten list David Letterman style (clever!) to provide a nice, easy-to-follow summary of the issues.  Here’s a recap, with my ‘two cents’ on each item:

10. Less is more: The U.S. Supreme Court ruled unanimously in 2005 in the Arthur Andersen case that a “retention” policy is actually a destruction policy.  It’s important to routinely dispose of old data that is no longer needed to have less data subject to discovery and just as important to know where that data resides.  My two cents: A data map is a great way to keep track of where the data resides.

9. Sing Kumbaya: They may speak different languages, but you need to find a way to bridge the communication gap between Legal and IT to develop an effective litigation-preparedness program.  My two cents: Require cross-training so that each department can understand the terms and concepts important to the other.  And, don’t forget the records management folks!

8. Preserve or Perish: Assign the litigation hold protocol to one key person, either a lawyer or a C-level executive, to decide when a litigation hold must be issued.  Ensure an adequate process and memorialize steps taken – and not taken.  My two cents: I emphasize “memorialize” because an organization that has a defined process and the documentation to back it up is much more likely to be given leeway in the courts than a company that doesn’t document its decisions.

7. Build the Three-Legged Stool: A successful eDiscovery approach involves knowledgeable people, great technology, and up-to-date written protocols.  My two cents: Up-to-date written protocols are the first thing to slide when people get busy – don’t let it happen.

6. Preserve, Protect, Defend: Your techs need the knowledge to avoid altering metadata, maintain chain-of-custody information and limit access to a working copy for processing and review.  My two cents: A good review platform will assist greatly in all three areas.

5. Natives Need Not Make You Restless: Consider exchanging files to be produced in their original/“native” formats to avoid huge out-of-pocket costs of converting thousands of files to image format.  My two cents: Be sure to address how redactions will be handled, as some parties prefer to image those while others prefer to agree to alter the natives to obscure that information.

4. Get M.A.D.?  Then Get Even: Apply the Mutually Assured Destruction (M.A.D.) principle to agree with the other side to take off the table costly volumes of data, such as digital voicemails and back-up data created down the road.  My two cents: That’s assuming, of course, you have the same levels of data.  If one party has a lot more data than the other party, there may be no incentive for that party to agree to concessions.

3. Cooperate to Cull Aggressively and to Preserve Clawback Rights: Setting expectations regarding culling efforts and reaching a clawback agreement with opposing counsel enables each side to cull more aggressively to reduce eDiscovery costs.  My two cents: Some parties will agree on search terms up front while others will feel that gives away case strategy, so the level of cooperation may vary from case to case.

2. QA/QC: Employ Quality Assurance (QA) tests throughout review to ensure a high accuracy rate, then perform Quality Control (QC) testing before the data goes out the door, building time in the schedule for that QC testing.  Also, consider involving a search-methodology expert.  My two cents: I cannot stress that last point enough – the ability to illustrate how you got from the large collection set to the smaller production set will be imperative to responding to any objections you may encounter to the produced set.

1. Never Drop Your Laptop Bag and Run: Dig in, learn as much as you can and start building repeatable, efficient approaches.  My two cents: It’s the duty of your attorneys and providers to demonstrate competency in eDiscovery best practices.  How will you know whether they have or not unless you develop that competency yourself?

So, what do you think?  Are there other ways for CEOs to avoid eDiscovery nightmares?   Please share any comments you might have or if you’d like to know more about a particular topic.