Industry Trends

eDiscovery Trends: Predictive Coding Strategy and Survey Results

Yesterday, we introduced the Virtual LegalTech online educational session Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and defined predictive coding while also noting the two “learning” methods that most predictive coding mechanisms use to predict document classifications.  To get background information regarding the session, including information about the speakers (Jason Baron, Maura Grossman and Bennett Borden), click here.

The session also focused on strategies for using predictive coding and results of the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.  Strategies discussed by Bennett Borden include:

  • Understand the technology used by a particular provider:  Not only will supervised and active learning mechanisms often yield different results, but there are also differing technologies within each of these learning mechanisms.
  • Understand the state of the law regarding predictive coding technology: So far, there is no case law regarding use of this technology and, while it may eventually be the future of document review, that has yet to be established.
  • Obtain buy-in from the requesting party to use predictive coding technology: It’s much easier when the requesting party has agreed to your proposed approach and that agreement is included in a court order that covers the approach and also includes a FRE 502 “clawback” agreement and order.  To have a chance of obtaining that buy-in, you’ll need a diligent approach that includes “tiering” the collection by probable responsiveness and appropriately sampling each tier.
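
Borden did not describe a specific tiering procedure, so here is a minimal Python sketch of the idea. It assumes each document already has a predicted-responsiveness score between 0 and 1 from whatever tool is in use. The function names, score cutoffs and per-tier sample size are hypothetical placeholders that a real protocol would need to justify.

```python
import random


def tier_documents(scores, cutoffs):
    """Group documents into tiers by a predicted-responsiveness score.

    scores:  dict of doc_id -> score in [0, 1] (hypothetical model output)
    cutoffs: lower bounds in descending order, e.g. [0.8, 0.5, 0.2]; documents
             scoring below the last bound land in a final, lowest tier.
    """
    tiers = [[] for _ in range(len(cutoffs) + 1)]
    for doc_id, score in scores.items():
        for i, bound in enumerate(cutoffs):
            if score >= bound:
                tiers[i].append(doc_id)
                break
        else:
            # No cutoff matched, so the document goes in the lowest tier.
            tiers[-1].append(doc_id)
    return tiers


def sample_tiers(tiers, per_tier, seed=0):
    """Draw a reproducible random sample from every tier for human review."""
    rng = random.Random(seed)
    return [rng.sample(sorted(tier), min(per_tier, len(tier))) for tier in tiers]


scores = {"DOC-001": 0.95, "DOC-002": 0.62, "DOC-003": 0.30, "DOC-004": 0.05, "DOC-005": 0.88}
tiers = tier_documents(scores, [0.8, 0.5, 0.2])
print(tiers)  # [['DOC-001', 'DOC-005'], ['DOC-002'], ['DOC-003'], ['DOC-004']]
print(sample_tiers(tiers, per_tier=1))
```

Sampling every tier, including the lowest, is what lets you show the other side how many responsive documents are likely being left behind.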

Maura Grossman then described the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.  The team took the EDRM Enron Version 2 Dataset of 1.3 million public domain files and deduplicated it down to more than 685,000 unique files (5.5 GB of uncompressed data).  The team also identified eight different hypothetical eDiscovery requests for the test.

Participating predictive coding technologies were then given a “seed set” of roughly 1,000 documents that TREC had previously identified as responsive or non-responsive to each of the requests.  Using this information, participants were required to rank the documents in the larger collection from most likely to least likely to be responsive, and to estimate the probability of responsiveness for each document.  The study ranked the participants on actual recall when 30% of the collection (roughly 200,000 files) was retrieved, and also on how closely each participant’s predicted recall matched its actual recall (prediction accuracy).
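
To make the two measures concrete, here is a minimal Python sketch. It assumes a ranked list of document IDs, a set of documents that assessors judged responsive, and each system’s per-document probability estimates. The function names are hypothetical, and the `prediction_accuracy` ratio is just one simple way to compare predicted with actual recall; TREC’s exact scoring may differ.

```python
def recall_at_cutoff(ranking, responsive, fraction):
    """Actual recall when the top `fraction` of a ranked collection is retrieved.

    ranking:    doc_ids ordered from most to least likely responsive
    responsive: set of doc_ids that assessors judged responsive (ground truth)
    """
    k = round(len(ranking) * fraction)
    found = sum(1 for doc_id in ranking[:k] if doc_id in responsive)
    return found / len(responsive)


def predicted_recall_at_cutoff(ranking, probability, fraction):
    """Recall the system predicts for itself from its own probability estimates."""
    k = round(len(ranking) * fraction)
    expected_total = sum(probability[d] for d in ranking)
    expected_found = sum(probability[d] for d in ranking[:k])
    return expected_found / expected_total


def prediction_accuracy(predicted, actual):
    """Hypothetical closeness score: smaller / larger, so 1.0 means a perfect estimate."""
    return min(predicted, actual) / max(predicted, actual)


ranking = [f"d{i}" for i in range(1, 11)]
responsive = {"d1", "d2", "d5", "d9"}
probability = {d: 0.1 for d in ranking}
probability.update(d1=0.9, d2=0.8, d3=0.7)

actual = recall_at_cutoff(ranking, responsive, 0.3)                # 2 of 4 found -> 0.5
predicted = predicted_recall_at_cutoff(ranking, probability, 0.3)  # 2.4 / 3.1, about 0.77
print(actual, round(predicted, 2), round(prediction_accuracy(predicted, actual), 2))
```

In this toy example the system believes it has found about 77% of the responsive documents when it has really found 50%, which is exactly the kind of gap the prediction accuracy measure is meant to expose.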

The results?  Actual recall rates across all eight discovery requests ranged widely among the tools, from 85.1% down to 38.2% (on individual requests, the range was even wider, with as much as 82 percentage points between the highest and lowest).  The prediction accuracy rates for the tools also ranged widely, from a high of 95% to a low of 42%.

Based on this study, it is clear that these technologies can differ significantly in how effectively and efficiently they rank and categorize the remaining documents in the collection based on the exemplar “seed set” of documents.  So, it’s always important to sample both machine-coded and human-coded documents for quality control in any project, with or without predictive coding (we sometimes forget that human-coded documents can just as often be coded incorrectly!).
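
A basic quality-control sample can be sketched in Python as follows. It assumes simple random sampling from the coded documents and a normal-approximation sample size. The helper names and the 12-error result in the usage lines are made up for illustration. The Wilson score interval is a standard way to put a range around an observed error rate, not necessarily what any particular provider uses.

```python
import math
import random


def sample_size(margin, z=1.96, p=0.5):
    """Documents to review to estimate an error rate within +/- margin.

    Normal approximation n = z^2 * p * (1 - p) / margin^2, with no finite-population
    correction; z=1.96 is roughly 95% confidence and p=0.5 is the most conservative guess.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def draw_qc_sample(coded_doc_ids, n, seed=0):
    """Simple random sample of already-coded documents (machine- or human-coded)."""
    return random.Random(seed).sample(sorted(coded_doc_ids), min(n, len(coded_doc_ids)))


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for the true error rate, given `errors` miscoded docs out of `n`."""
    phat = errors / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


n = sample_size(0.05)  # 385 documents for +/- 5 points at roughly 95% confidence
sample = draw_qc_sample([f"DOC-{i:06d}" for i in range(10_000)], n)
low, high = wilson_interval(errors=12, n=len(sample))  # 12 is a made-up QC result
print(n, f"{low:.3f}-{high:.3f}")
```

The same functions apply whether the sample is drawn from machine-coded or human-coded documents, which makes it easy to compare the two on equal terms.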

For more about the TREC 2010 Legal Track study, click here.  As noted yesterday, you can also check out a replay of the session or download the slides for the presentation at the Virtual LegalTech site.

Full Disclosure: Trial Solutions provides predictive coding services using Hot Neuron LLC’s Clustify™, which categorizes documents by looking for similar documents in the exemplar set that satisfy a user-specified criterion, such as a minimum conceptual similarity or near-duplicate percentage.
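
Clustify’s internals aren’t described here, so the following Python sketch only illustrates the general idea of “find the most similar exemplar, and categorize only if it clears a user-specified threshold.” It uses a crude bag-of-words cosine similarity as a stand-in for conceptual similarity. All names are hypothetical, and this is not how Clustify (or any specific product) actually works.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts: a crude stand-in for a conceptual-similarity model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(doc_text, exemplars, min_similarity):
    """Label a document with its most similar exemplar's label, if it is similar enough.

    exemplars:      list of (text, label) pairs already coded by a human reviewer
    min_similarity: user-specified threshold; below it the document stays uncategorized (None)
    """
    doc_vec = vectorize(doc_text)
    best_label, best_score = None, -1.0
    for text, label in exemplars:
        score = cosine(doc_vec, vectorize(text))
        if score >= min_similarity and score > best_score:
            best_label, best_score = label, score
    return best_label


exemplars = [("quarterly energy trading report", "responsive"),
             ("lunch menu for friday", "non-responsive")]
print(categorize("energy trading report for the quarter", exemplars, 0.3))  # responsive
print(categorize("weekend weather forecast", exemplars, 0.3))              # None
```

Leaving low-similarity documents uncategorized, rather than forcing a label, is what routes them back to human reviewers.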

So, what do you think?  Have you used predictive coding on a case?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: What the Heck is “Predictive Coding”?

Yesterday, ALM hosted another Virtual LegalTech “live” day online.  Every quarter, the Virtual LegalTech site has a “live” day with educational sessions from 9 AM to 5 PM ET, most of which provide CLE credit in certain states (New York, California, Florida, and Illinois).

One of yesterday’s sessions was Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding”.  The speakers for this session were:

Jason Baron: Director of Litigation for the National Archives and Records Administration, a founding co-coordinator of the National Institute of Standards and Technology’s Text Retrieval Conference (“TREC”) legal track and co-chair and editor-in-chief for various working groups for The Sedona Conference®;

Maura Grossman: Counsel at Wachtell, Lipton, Rosen & Katz, co-chair of the eDiscovery Working Group advising the New York State Unified Court System and coordinator of the 2010 TREC legal track; and

Bennett Borden: co-chair of the e-Discovery and Information Governance Section at Williams Mullen and member of Working Group I of The Sedona Conference on Electronic Document Retention and Production, as well as the Cloud Computing Drafting Group.

This highly qualified panel discussed a number of topics related to predictive coding, including practical applications of predictive coding technologies and results of the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.

Before discussing the strategies for using predictive coding technologies and the results of the TREC study, it’s important to understand what predictive coding is.  The panel gave the best descriptive definition that I’ve seen yet for predictive coding, as follows:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to ‘cut’ or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”

The panel compared predictive coding to spam filters, which review and classify email and learn from previous classifications which emails should be considered “spam”.  Just as no spam filter perfectly classifies all emails as spam or legitimate, predictive coding does not perfectly identify all relevant documents.  However, these technologies can “learn” to identify most of the relevant documents based on one of two “learning” methods:

  • Supervised Learning: a human chooses a set of “exemplar” documents that feed the system and enable it to rank the remaining documents in the collection based on their similarity to the exemplars (e.g., “more like this”);
  • Active Learning: the system chooses the exemplars on which human reviewers make relevancy determinations, then the system learns from those classifications to apply to the remaining documents in the collection.
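
The key difference between the two methods is who picks the next documents for human review. Here is a toy Python sketch of active learning using “uncertainty sampling,” where the system asks about the documents whose scores sit closest to 50/50. It assumes documents are represented as sets of terms. The scoring function is a hypothetical similarity-weighted vote, not a production classifier. In supervised learning, a human would choose the exemplars instead of the loop.

```python
def jaccard(a, b):
    """Overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(doc_terms, training):
    """Hypothetical responsiveness score in [0, 1]: similarity-weighted vote of coded docs."""
    votes = {True: 0.0, False: 0.0}
    for terms, is_responsive in training:
        votes[is_responsive] += jaccard(doc_terms, terms)
    total = votes[True] + votes[False]
    return votes[True] / total if total else 0.5  # no evidence either way


def active_learning(collection, ask_reviewer, seed_labels, rounds, batch_size):
    """Uncertainty sampling: each round the system picks the documents a human reviews.

    collection:   dict of doc_id -> set of terms
    ask_reviewer: callable doc_id -> bool (the human relevance call)
    seed_labels:  dict of doc_id -> bool coded before the loop starts
    """
    labeled = dict(seed_labels)
    for _ in range(rounds):
        training = [(collection[d], label) for d, label in labeled.items()]
        unlabeled = [d for d in collection if d not in labeled]
        if not unlabeled:
            break
        # Scores nearest 0.5 are the least certain, so a human call there teaches the most.
        unlabeled.sort(key=lambda d: abs(score(collection[d], training) - 0.5))
        for doc_id in unlabeled[:batch_size]:
            labeled[doc_id] = ask_reviewer(doc_id)
    training = [(collection[d], label) for d, label in labeled.items()]
    # Apply what was learned to everything the reviewers never saw.
    remaining = {d: score(terms, training) for d, terms in collection.items() if d not in labeled}
    return remaining, labeled


collection = {"d1": {"energy", "trading"}, "d2": {"lunch", "menu"}, "d3": {"energy", "lunch"},
              "d4": {"trading", "report"}, "d5": {"menu", "friday"}, "d6": {"weather"}}
truth = {"d3": True, "d4": True, "d5": False, "d6": False}  # the reviewer's (made-up) calls
scores, labeled = active_learning(collection, truth.get, {"d1": True, "d2": False},
                                  rounds=1, batch_size=2)
print(sorted(labeled))  # ['d1', 'd2', 'd3', 'd6']
print(scores)           # {'d4': 1.0, 'd5': 0.0}
```

Note that the loop sends d3 and d6 to the reviewer because they are the ambiguous cases, while d4 and d5, which already lean clearly one way, are scored without human review.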

Tomorrow, I “predict” we will get into the strategies and the results of the TREC study.  You can check out a replay of the session at the Virtual LegalTech site.  You’ll need to register (it’s free), then log in and go to the CLE Center Auditorium upon entering the site (which is up all year, not just on “live” days).  Scroll down until you see this session and then click on “Attend Now” to view the replay presentation.  You can also go to the Resource Center at the site and download the slides for the presentation.

So, what do you think?  Do you have experience with predictive coding?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Tips: SaaS and eDiscovery – More Top Considerations

Friday, we began talking about the article regarding Software as a Service (SaaS) and eDiscovery entitled Top 7 Legal Things to Know about Cloud, SaaS and eDiscovery on CIO Update.com, written by David Morris and James Shook from EMC.  The article, which relates to storage of ESI within cloud and SaaS providers, can be found here.

The article looks at key eDiscovery issues that must be addressed for organizations using public cloud and SaaS offerings for ESI, and Friday’s post looked at the first three issues.  Here are the remaining four issues from the article (requirements in bold are quoted directly from the article):

4. What if there are technical issues with e-discovery in the cloud?  The article notes that identifying and collecting large volumes of data can have significant bandwidth, CPU, and storage requirements, and that the cloud provider may have to do all of this work for the organization.  It pays to be proactive: determine potential eDiscovery needs for the data up front and, to the extent possible, negotiate eDiscovery requirements into the agreement with the cloud provider.

5. If the cloud/SaaS provider loses or inadvertently deletes our information, aren’t they responsible? As noted above, if the agreement with the cloud provider includes eDiscovery requirements for the cloud provider to meet, then it’s easier to enforce those requirements.  Currently, however, these agreements rarely include these types of requirements.  “Possession, custody or control” over the data points to the cloud provider, but courts usually focus on the named parties in the case when deciding spoliation claims.  That sounds like the potential for third-party lawsuits.

6. If the cloud/SaaS provider loses or inadvertently deletes our information, what are the potential legal ramifications?  If data was lost because of the cloud provider, the organization will probably want to establish that it is not at fault.  But it may take more than establishing who deleted the data: the organization may need to demonstrate that it acted diligently in selecting the provider, negotiating terms with established controls and notifying the provider of hold requirements in a timely manner.  Even then, there is no case law guidance as to whether such a showing would shift that responsibility, and most agreements with cloud providers will limit potential damages for loss of data or data access.

7. How do I protect our corporation from fines and sanction for ESI in the cloud?  The article stresses the need to understand what ESI is potentially relevant and where it’s located.  This can be accomplished, in part, by creating a data map for the organization that covers data in the cloud as well as data stored within the organization.  Again, covering eDiscovery and other compliance requirements with the provider when negotiating the initial agreement can make a big difference.  As always, be proactive to minimize issues when litigation strikes.
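
To get a feel for the bandwidth issue in item 4, a back-of-the-envelope Python estimate of transfer time can help before negotiating with a provider.  The 70% efficiency factor and the data volumes below are assumptions for illustration, not measurements of any provider.

```python
def transfer_hours(data_gb, bandwidth_mbps, efficiency=0.7):
    """Rough hours needed to pull `data_gb` gigabytes over a `bandwidth_mbps` link.

    efficiency is an assumed fraction of the nominal rate actually achieved
    (protocol overhead, provider throttling, shared links); 0.7 is a placeholder.
    """
    bits = data_gb * 8 * 1000**3  # decimal gigabytes -> bits
    bits_per_second = bandwidth_mbps * 1000**2 * efficiency
    return bits / bits_per_second / 3600


for gb in (50, 500, 5000):
    print(f"{gb:>5} GB over 100 Mbps: ~{transfer_hours(gb, 100):.0f} hours")
```

At the assumed 70% efficiency, 500 GB over a 100 Mbps link works out to roughly 16 hours, before any processing or review time.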

Let’s face it, cloud and SaaS solutions are here to stay and they are becoming increasingly popular for organizations of all sizes to avoid the software and infrastructure costs of internal solutions.  Being proactive and including corporate counsel up front in decisions related to SaaS selections will enable your organization to avoid many potential problems down the line.

So, what do you think?  Does your company have mechanisms in place for discovery of your cloud data?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Tips: SaaS and eDiscovery – Top Considerations

There was an interesting article this week regarding Software as a Service (SaaS) and eDiscovery entitled Top 7 Legal Things to Know about Cloud, SaaS and eDiscovery on CIO Update.com, written by David Morris and James Shook from EMC.  The article, which relates to storage of ESI within cloud and SaaS providers, can be found here.

The authors note that “[p]roponents of the cloud compare it to the shift in electrical power generation at the turn of the century [1900s], where companies had to generate their own electric power to run factories.  Leveraging expertise and economies of scale, electric companies soon emerged and began delivering on-demand electricity at an unmatched cost point and service level.”  Cloud proponents argue that the SaaS model is doing the same for IT services.

However, the decision to move to SaaS solutions for IT services doesn’t just affect IT; there are compliance and legal considerations to address as well.  Because the parties to a case have a duty to identify, preserve and produce relevant electronically stored information (ESI), those parties’ information stored in a cloud infrastructure or SaaS application is subject to the same requirements, even though it isn’t necessarily in their total control.  With that in mind, the article looks at key eDiscovery issues that must be addressed for organizations using public cloud and SaaS offerings for ESI, as follows (requirements in bold are quoted directly from the article):

  1. Where is ESI actually located when it is in the ethereal cloud or SaaS application?  It’s important to know where your data is actually stored.  Because SaaS providers are expected to deliver data on demand at any time, they may store your data in more than one data center for redundancy purposes.  Data centers could be located outside of the US, so different compliance and privacy requirements may come into play if there is a need to produce data from these locations.
  2. What are the legal implications of e-discovery in the cloud? Little case law exists on the subject, but it is expected that the responsibility for timely preservation, collection and production of the data remains with the organization that is a party to the lawsuit, even though that data may be in the direct control of the cloud provider.
  3. What happens if a lawsuit is in the US but one company’s headquarters is in another country? Or what if the data is in a country where the privacy rules are different?  The article references one case, AccessData Corp. v. ALSTE Technologies GmbH, 2010 WL 318477 (D. Utah Jan. 21, 2010), where the German company ALSTE cited German privacy laws as preventing it from collecting relevant company emails that were located in Germany (the US court compelled production anyway).  So, jurisdictional factors can come into play when cloud data is housed in a foreign jurisdiction.
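
Items 1 and 3 both come down to knowing where each copy of your data lives.  A data map can start as simply as the Python sketch below, which assumes each repository records its provider and the countries of the data centers holding copies.  The class, field and provider names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSource:
    """One entry in a (hypothetical) organizational data map."""
    name: str
    custodian: str
    provider: str         # "internal" or the name of the cloud/SaaS provider
    locations: List[str]  # country codes of every data center holding a copy


def cross_border_sources(data_map, home_country="US"):
    """Names of sources with at least one copy stored outside the home jurisdiction."""
    return [s.name for s in data_map if any(loc != home_country for loc in s.locations)]


def cloud_sources(data_map):
    """Names of sources held by an outside provider rather than internally."""
    return [s.name for s in data_map if s.provider != "internal"]


data_map = [
    DataSource("Email archive", "IT", "internal", ["US"]),
    DataSource("CRM records", "Sales", "ExampleCRM", ["US", "DE"]),  # hypothetical provider
    DataSource("HR files", "HR", "ExampleHR", ["US"]),              # hypothetical provider
]
print(cross_border_sources(data_map))  # ['CRM records']
print(cloud_sources(data_map))         # ['CRM records', 'HR files']
```

Recording every data center location, not just the primary one, matters because redundant copies may be the ones that end up in a foreign jurisdiction.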

This is too big a topic to cover in one post, so we’ll cover the other four eDiscovery issues to address in Monday’s post.  Let the anticipation build!

So, what do you think?  Does your company have ESI hosted in the cloud?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Facemail Unlikely to Replace Traditional Email

In a November post on eDiscoveryDaily, we reported that Facebook announced on November 15 that it was rolling out a new messaging system, including chat, text messaging, status updates and email (informally dubbed “Facemail”), that would bring messaging systems together in one place, so you don’t have to remember how each of your friends prefers to be contacted.  Many have wondered whether Facemail would be a serious threat to Google’s Gmail, Yahoo Mail and Microsoft Live Hotmail, given that Facebook has more than 500 million users from which to draw.  And eDiscovery analysts raised considerable concern that Facebook plans to preserve these messages, regardless of the form in which they are generated, forever.

However, Facemail isn’t likely to replace users’ current email accounts, according to an online poll currently being conducted by the Wall Street Journal.  More than 61 percent of the 4,001 participants who have taken the poll so far said they wouldn’t use Facebook Messages as their primary email service, while 18.4 percent said that they would and 20.5 percent said they were not sure.  You can cast your vote here.  I just voted, so these numbers reflect “up-to-the-minute” poll results (as of 5:52 AM CST, Wednesday, December 8, that is).

Facebook CEO Mark Zuckerberg envisions the Facemail model of email, instant messaging and SMS text messages as a simpler, faster messaging model than email’s traditional subject lines and carbon copies, which Zuckerberg considers to be “antiquated”.

Whether Facemail develops as a serious threat to Gmail, Hotmail or Yahoo Mail (or even Microsoft Outlook or Lotus Notes) remains to be seen.  However, at least a couple of industry analysts think that it could become a significant development.

“A powerful, unified presence manager would also enable the user to express how he’d like to communicate, and to manipulate that ‘how’ and ‘when’ availability to different types of contacts,” industry analyst David Card stated in a post on GigaOm.com.  “If Facebook establishes Messages as a user’s primary tool to manage presence across multiple communications vehicles, it would be an incredibly sticky app, with huge customer lock-in potential.”

Gartner analyst Matt Cain told eWEEK.com, “It will have little impact at first on the public portal email vendors because it is a barebones email service. But if Facebook makes it the equivalent of these other services, it will have a significant deleterious impact on competing email services”.

As stated in the earlier post, it’s important to have a social governance policy in place that addresses not only new mechanisms such as Facemail, but all social media mechanisms that might be in use by your employees.

So, what do you think?  Do you plan to consider using Facemail as your primary email service?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Some SaaS Benefits for eDiscovery

I found an interesting article on Ezine Articles by Sharon Gonzalez, a freelance technical writer with 15 years of experience writing on various technical subjects, especially cloud computing, Software as a Service (SaaS), and Internet technologies.  The article, entitled EDiscovery on SaaS, discusses some of the benefits of SaaS solutions for eDiscovery.

Gonzalez notes that the eDiscovery SaaS model “has brought down the costs of many organizations” because the “model is a vendor hosted infrastructure that is highly secured and the customers can run the applications from their own machines”.  Advantages noted by Gonzalez include:

  • Easy Manageable Services: Legal teams are able to process, analyze and review data files with the SaaS provider’s eDiscovery tools through their own browsers, and to control and secure information within those tools.  There is no software to install.
  • No Problem for Storage Space: The SaaS model “eliminates all requirements of added infrastructure for…increasing storage space”.  While many eDiscovery SaaS models charge a monthly fee based on data stored, that fee is eliminated once the data is no longer needed.
  • Cost-Effective Solutions Provided: Gonzalez notes “Since…the SaaS architecture is maintained by vendors, IT departments are free from the burden of maintaining it. It is also a cost-effective method as it cuts down expenditure on hiring additional IT professionals and other physical components. The companies have to pay a charge to the vendors which work out far cheaper than investing large sums themselves”.
  • Built-In Disaster Recovery: Redundant storage, backup systems, backup power supplies, etc. are expensive to implement, but those mechanisms are a must for SaaS providers to provide their clients with the peace of mind that their data will be secure and accessible.  Because the SaaS provider is able to allocate the cost for those mechanisms across all of its clients, costs for each client are considerably less to provide that secure environment.

There are SaaS applications for eDiscovery throughout the EDRM life cycle, from Information Management through Presentation.

Full disclosure: Trial Solutions is the leader in self service, on demand SaaS litigation document review solutions, offering FirstPass™, powered by Venio FPR™, for early case assessment and first pass review as well as OnDemand™ for linear review and production.

So, what do you think?  Have you used any SaaS hosted solutions for eDiscovery?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Sanctions at an All-Time High

eDiscovery sanctions are at an all-time high, according to a Duke Law Journal article.  The article summarizes a study of 401 cases involving motions for sanctions related to discovery of electronically stored information (ESI) in federal courts through 2009, with a total of 230 sanction awards in those cases.  A link to the article can be found here.

In an increasing number of cases, more attention is focused on eDiscovery than on the merits, and motions for sanctions have become very common.  The sanctions imposed against parties in many of these cases have been severe, including adverse jury instructions, significant monetary awards and even dismissals.  These sanctions have occurred despite the safe harbor provisions of Rule 37(e) of the Federal Rules of Civil Procedure, which have provided little protection to parties or counsel.

The study also found that defendants are sanctioned more than three times as often as plaintiffs (175 to 53).  The most common type of misconduct to receive a sanction was failure to preserve relevant information (sanctions were granted in 90 cases).  Often, multiple types of misconduct led to the sanctions.  Other types of misconduct included failure to produce information and delays in producing it.

Other notable statistics:

  • 354 of the 401 cases where sanctions were requested and 198 of the 230 sanction awards have occurred since 2004;
  • The most common types of cases with sanctions are employment (17 percent), contract (16 percent), intellectual property (15.5 percent) and tort cases (11 percent);
  • 183 district court judges and 111 magistrate judges from 75 federal districts in 44 states, the Virgin Islands, the District of Columbia, and Puerto Rico, have issued written opinions regarding e-discovery sanctions;
  • Cases involving e-discovery sanctions and sanction awards more than tripled between 2003 and 2004, from 9 to 29 sanction cases, and from 6 to 21 sanction awards;
  • There were more e-discovery sanction cases (97) and more e-discovery sanction awards (46) in 2009 than in any prior year, and more than in all years prior to 2005 combined!
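
For readers who want to check the growth figures, the arithmetic behind the “more than tripled” claim and the post-2004 share is shown in Python below, using only the numbers reported above.

```python
def growth_ratio(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


# Figures as reported in the study summary above.
print(f"Sanction cases, 2003 -> 2004:  {growth_ratio(9, 29):.2f}x")  # 3.22x
print(f"Sanction awards, 2003 -> 2004: {growth_ratio(6, 21):.2f}x")  # 3.50x
print(f"Cases since 2004:  {share(354, 401):.1%} of 401")           # 88.3% of 401
print(f"Awards since 2004: {share(198, 230):.1%} of 230")           # 86.1% of 230
```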

The study also has a year-by-year breakdown of sanctions from 1981 through 2009, with a bar chart that illustrates the tremendous growth in sanction cases and awards in the last six years.  A partner and senior attorneys at King & Spalding’s Discovery Center assisted the students in analyzing the cases and identifying the trends in sanctions.

So, what do you think?  Have you been involved in any cases where sanctions have been requested or awarded?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Facemail and eDiscovery

Email is dead.

So says Facebook founder Mark Zuckerberg.  “It’s too formal,” he declared, announcing his company’s new messaging service last week in San Francisco.

Facebook announced last week that it’s rolling out a new messaging system, including chat, text messaging, status updates and email (surprise!).  Zuckerberg touts it as a way of bringing messaging systems together in one place, so you don’t have to remember how each of your friends prefers to be contacted.  Will the integrated product (informally dubbed “Facemail”), which some have called a “Gmail killer”, be a serious threat to Gmail, MSN and Yahoo Mail?  Maybe.  With more than 500 million users, Facebook certainly has a head start towards a potentially large user base.

However, there are some caveats to consider from a business standpoint:

  1. Facemail messages will be clustered by sender instead of by subject line, which Facebook considers “antiquated”.  That may be great from a social standpoint, but not so good when you need to follow the thread of a conversation with multiple people.
  2. Unified messaging is not an entirely new concept.  Just last year, Google introduced Google Wave, designed to “merge key features of media like e-mail, instant messaging, wikis, and social networking”.  Earlier this year, Google announced plans to scrap Google Wave after it failed to gain a significant following.  It will be interesting to see whether Facebook can succeed where Google failed.
  3. From an eDiscovery perspective, the potential concern is that Facebook plans to preserve these messages, regardless of the form in which they are generated, forever.  So, if your company has a retention policy in place, these communications will fall outside of that policy.

Is it time to panic?  It might be tempting to overreact and ban the use of Facemail and other outside email and social media sites, but that seems impractical in today’s social media climate.

A better approach is to have a policy in place to govern use of outside email, chat and social media that covers what employees should do (e.g., act responsibly and ethically when participating in online communities), what employees should not do (e.g., disclose confidential information, plagiarize copyrighted information, etc.) and the consequences for violating the policy (e.g., lost customers, firings, lawsuits, etc.).  We will talk more about a social governance policy in an upcoming post.  In the meantime, here is a reference to our September post for information on requesting information from Facebook via civil subpoena.

So, what do you think?  Does your company have a social governance policy?  Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. So, what happened to the architect behind Google Wave, Lars Rasmussen?  He just joined Facebook.  Interesting, huh?  🙂