Posts By: Doug Austin

eDiscovery Trends: Brian Schrader of Business Intelligence Associates (BIA)

 

This is the fifth of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is Brian Schrader. Brian is Co-Founder and President of Business Intelligence Associates, Inc. (BIA).  Brian is an expert and frequent writer and speaker on eDiscovery and computer forensics topics, particularly those addressing the collection, preservation and processing functions of the eDiscovery process.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?

Well, I think you don't have to walk around the floor very much to see that this year everybody is talking about predictive coding.  I think you're going to see that shake out a lot over the next year.  We've been doing predictive coding for about a year and a half now, and we have our own algorithms for that.  We have our review teams, and they've been using our algorithms to do predictive coding.  We like to call it “suggestive coding”.

What I expect you’ll find this year is a standard shakeout among providers because everybody talks about predictive coding.  The question is how does everybody approach it?  It's very much a black-box solution.  Most people don't know what goes on inside that process and how the process works.  So, I think that's going to be a hot topic for a while.  We're doing a lot of predictive coding and BIA is going to be announcing some cool things later this year on our predictive coding offerings.

Every provider that you talk to seems to have a predictive coding solution.  I'm really looking forward to seeing how things develop, because we have a lot of input on it and a lot of experience.  We have our review team that is reviewing millions and millions of documents per year, so we can compare various predictive coding engines to real results.  It gives us the ability to review the technology.  We look forward to being part of that conversation and I hope to see a little bit more clarity from the players and some real standards set around that process.

The courts have now also started to look at these algorithmic methods, Judge Peck in particular.  Everybody agrees that keyword searching is inadequate.  But, people are still tentative about it – they say, “it sounds good, but how does it work?  How are we going to approach it?”

Which trend(s), if any, haven’t emerged to this point like you thought they would?

Frankly, I thought we'd see a lot more competition for us in data collection.  A huge pain point for companies is how to gather all their data from all over the world.  It's something we've always focused on.  I started to see some providers focus on that, but now it looks like everybody, even some of the classic data collection providers, are focusing more on review tools.  That surprises me a bit, though I'm happy to be left with a wide-open field to have more exposure there.

When we first came out with TotalDiscovery.com last year, we thought we'd see all sorts of similar solutions pop up out there, but we just haven't.  Even the traditional collection companies haven't really offered a similar solution.  Perhaps it’s because everybody has a “laser focus” on predictive coding, since document review is so much more expensive.  I think that has really overpowered the focus of a lot of providers as they've focused only on that.  We have tried to focus on both collection and review.

I think data processing has become a commodity.  In talking to customers, they don't really ask about it anymore.  They all expect that everybody has the same base level capabilities.  Everybody knows that McDonald's secret sauce is basically Thousand Island dressing, so it’s no longer unique, the “jig is up”.  So, it's all about the ends, the collection, and the review.

What are your general observations about LTNY this year and how it fits into emerging trends?

Well, predictive coding again.  I think there's an awful lot of talk but not enough detail.  What you're seeing is a lot of providers who are saying “we’ll have predictive coding in six months”.  You're going to see a huge number of players in that field this year.  Everybody's going to throw a hat in the ring, and it's going to be interesting to see how that all works out.  Because how do you set the standards?  Who gets up there and really cooperates? 

I think it's really up to the individual companies to get together and cooperate on this. This particular field is so critical to the legal process that I don't think you can have everybody having individual standards and processes.  The most successful companies are going to be the ones that step up and work together to set those standards.  And, I don't know for sure, but I wouldn't be surprised if The Sedona Conference already has a subcommittee on this topic.

What are you working on that you’d like our readers to know about?

Our biggest announcement is around data collection – we've vastly expanded it.  Our motto is to collect “any data, anytime, anywhere”.  We've been providing data collection services for over a decade, and our collection guys like to say they've never met a piece of data they didn't like.

Now, we've brought that data collection capability directly to TotalDiscovery.com.  The latest upgrade, which we’re previewing at the show for release in March, will offer the ability to collect data from social media sites like Facebook and Twitter, as well as collections from Webmail and Apple systems.  So, you can collect pretty much anything through TotalDiscovery.com that we have historically offered in our services division.  It gives you a single place to manage data collection, bring it all together, and then deliver it out to the review platform you want.

We’re on a three-week development cycle, which doesn’t always mean new features every three weeks, but it does mean we’re regularly adding new features.  Mid-year in 2011, we added legal hold capabilities and we’ve also recently added other components to simplify search and data delivery.  Now, we’ve added expanded collection for social media sites, Webmail and Apple.  Later this year, we expect to release our predictive coding capabilities to enable clients to perform predictive coding right after collection instead of waiting until the data is in the review tool.

Thanks, Brian, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: DOJ Criminal Attorneys Now Have Their Own eDiscovery Protocols

 

Criminal attorneys, are you discouraged that there is a lack of eDiscovery rules and guidelines for criminal cases?  If you work for the Department of Justice or other related law enforcement agencies, cheer up!

As noted in the Law Technology News article, DOJ Lays Down the Law on Criminal E-Discovery Protocols, written by Evan Koblentz, the government's Joint Electronic Technology Working Group (JETWG), led by the DOJ, unveiled its best practices guide for eDiscovery at a federal software summit in Washington on February 10.  The 21-page document, “intended for cases where the volume and/or nature of the ESI produced as discovery significantly increases the complexity of the case”, primarily consists of three sections:

  • Recommendations for ESI Discovery in Federal Criminal Cases: Provides a general framework for managing ESI, including planning, production, transmission, dispute resolution, and security;
  • Strategies and Commentary on ESI Discovery in Federal Criminal Cases: Provides more detailed guidance for implementing the recommendations – this section will evolve to reflect experiences in actual cases; and
  • ESI Discovery Checklist: A one-page checklist for addressing ESI production issues.

While the one-page checklist has several items that would apply to any case, there are some items specific to criminal cases that make it a handy reference for conducting eDiscovery in those cases.  The three sections are based on ten basic principles, which should be familiar to those who have been dealing with eDiscovery in civil cases.  They are as follows:

  1. Lawyers have a responsibility to have an adequate understanding of electronic discovery.
  2. In the process of planning, producing, and resolving disputes about ESI discovery, the parties should include individuals with sufficient technical knowledge and experience regarding ESI.
  3. At the outset of a case, the parties should meet and confer about the nature, volume, and mechanics of producing ESI discovery. Where the ESI discovery is particularly complex or produced on a rolling basis, an on-going dialogue may be helpful.
  4. The parties should discuss what formats of production are possible and appropriate, and what formats can be generated. Any format selected for producing discovery should maintain the ESI’s integrity, allow for reasonable usability, reasonably limit costs, and, if possible, conform to industry standards for the format.
  5. When producing ESI discovery, a party should not be required to take on substantial additional processing or format conversion costs and burdens beyond what the party has already done or would do for its own case preparation or discovery production.
  6. Following the meet and confer, the parties should notify the court of ESI discovery production issues or problems that they reasonably anticipate will significantly affect the handling of the case.
  7. The parties should discuss ESI discovery transmission methods and media that promote efficiency, security, and reduced costs. The producing party should provide a general description and maintain a record of what was transmitted.
  8. In multi-defendant cases, the defendants should authorize one or more counsel to act as the discovery coordinator(s) or seek appointment of a Coordinating Discovery Attorney.
  9. The parties should make good faith efforts to discuss and resolve disputes over ESI discovery, involving those with the requisite technical knowledge when necessary, and they should consult with a supervisor, or obtain supervisory authorization, before seeking judicial resolution of an ESI discovery dispute or alleging misconduct, abuse, or neglect concerning the production of ESI.
  10. All parties should limit dissemination of ESI discovery to members of their litigation team who need and are approved for access, and they should also take reasonable and appropriate measures to secure ESI discovery against unauthorized access or disclosure.

Evan’s article provides comments from Andrew Goldsmith, the national criminal eDiscovery coordinator, regarding the efforts and intent of the document and training program for DOJ attorneys and other law enforcement personnel, as well as the department’s efforts to determine how to apply commercial, civil-litigation-oriented eDiscovery software to criminal cases.  It’s a good read, and the guidelines look promising as a resource for criminal attorneys managing eDiscovery in those cases.

So, what do you think?  Do these guidelines show promise for eDiscovery in criminal cases?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Tom Gelbmann of Gelbmann & Associates, LLC

 

This is the fourth of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is Tom Gelbmann. Tom is Principal of Gelbmann & Associates, LLC.  Since 1993, Gelbmann & Associates, LLC has advised law firms and corporate law departments on realizing the full benefit of their investments in Information Technology.  Tom has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey; last year he and George Socha converted the Survey into Apersee, an online system for selecting eDiscovery providers and their offerings.  In 2005, he and George Socha launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are nine active projects with over 300 members from 81 participating organizations.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?  And which trend(s), if any, haven’t emerged to this point like you thought they would?

I’m seeing an interesting trend regarding offerings from traditional top tier eDiscovery providers.  Organizations that have invested in eDiscovery related technologies are beginning to realize these same technologies can be applied to information governance and compliance, enabling an organization to get a much greater grasp on its total content.  A greater understanding of the location and profile of content not only helps with eDiscovery and compliance, but also with business intelligence and, finally, destruction – something few organizations are willing to address.

We have often heard that storage is cheap.  The full sentence should be: storage is cheap, but management is expensive.  I think that a lot of the tools that have been applied to collection, culling, search and analysis enable organizations to look at large quantities of information that is needlessly retained.  They also allow organizations to look at that information and gain insights into how it is either helping their processes or, more importantly, hindering those processes, and I think that's something that will help sell these tools upstream rather than downstream.

As far as items that haven't quite taken off, I think that technology assisted coding – I prefer that term over “predictive coding” – is coming, but it's not there yet.  It’s going to take a little bit more, not necessarily waiting for the judiciary to help, but just for organizations to have good experiences that they could talk about that demonstrate the value.  You're not going to remove the human from the process.  But, it's giving the human a better tool.  It’s like John Henry, with the ax versus the steam engine.  You can cut a lot more wood with the steam engine, but you still need the human.

What are your general observations about LTNY this year and how it fits into emerging trends?

Based on the sessions that I've attended, I think there's much more education.  There's just really more practical information for people to take away on how to manage eDiscovery and deal with eDiscovery related products or problems, whether it's cross-border issues, how to deal with the volumes, how to bring processes in house or work effectively with vendors.  There's a lot more practical “how-tos” than I've seen in the past.

What are you working on that you’d like our readers to know about?

Well, I think one of the things I'm very proud of with EDRM is that just before LegalTech, we put out a press release of what's happening with the projects, and I'm very pleased that five of the nine EDRM projects had significant announcements.  You can go to EDRM.net for that press release that details those accomplishments, but it shows that EDRM is very vibrant, and the teams are actually making good progress. 

Secondly, George Socha and I are very proud about the progress of Apersee, which was announced last year at LegalTech.  We've learned a lot, and we've listened to our clientele in the market – consumers and providers.  We listened, and then our customers changed their mind.  But, as a result, it's on a stronger track and we're very proud to announce that we have two gold sponsors, AccessData and Nuix.  We’re also talking to additional potential sponsors, and I think we'll have those announcements very shortly.

Thanks, Tom, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Best Practices: Google’s Blunder Keeps Them Under the (Smoking) Gun

As we noted back in November, a mistake made by Google during discovery in its lawsuit with Oracle could cost the company dearly, perhaps billions.  Here’s a brief recap of the case:

Google is currently involved in a lawsuit with Oracle over license fees associated with Java, which forms a critical part of Google’s Android operating system.  Google has leveraged free Android to drive mobile phone users to their ecosystem and extremely profitable searches and advertising.

Despite the use of search technology to cull down a typically large ESI population, a key email, written by Google engineer Tim Lindholm a few weeks before Oracle filed suit against Google, was produced that could prove damaging to their case.  With the threat of litigation from Oracle looming, Lindholm was instructed by Google executives to identify alternatives to Java for use in Android, presumably to strengthen their negotiating position.

“What we’ve actually been asked to do (by Larry and Sergey) is to investigate what technical alternatives exist to Java for Android and Chrome,” the email reads in part, referring to Google co-founders Larry Page and Sergey Brin. “We’ve been over a bunch of these, and think they all suck. We conclude that we need to negotiate a license for Java under the terms we need.”

Lindholm added the words “Attorney Work Product” and sent the email to Andy Rubin (Google’s top Android executive) and Google in-house attorney Ben Lee; however, Lindholm’s computer saved nine drafts of the email while he was writing it – before he added the words and addressed the email to Lee.  Because Lee’s name and the words “attorney work product” weren’t on the earlier drafts, they weren’t picked up by the eDiscovery software as privileged documents, and they were produced to Oracle.

Judge William Alsup of the U.S. District Court in Oakland, California, indicated to Google’s lawyers that it might suggest willful infringement of Oracle’s patents and despite Google’s motion to “clawback” the email on the grounds it was “unintentionally produced privileged material”, Alsup refused to exclude the document at trial.  Google next filed a petition for a writ of mandamus with the U.S. Court of Appeals for the Federal Circuit in Washington, D.C., seeking to have the appeals court overrule Alsup’s decision permitting Oracle to use the email as evidence in the trial.

On February 6, the Federal Circuit upheld Alsup’s ruling that the email is not privileged, denying Google’s mandamus petition. Observing that the email was written at the request of Google’s co-founders, Larry Page and Sergey Brin (who are not lawyers) and did not refer specifically to legal advice or the senior counsel’s investigation, the appeals court rejected Google’s petition.

As we noted before, organizing the documents into clusters based on similar content might have grouped the unsent drafts with the identified “attorney work product” final version and helped ensure that the drafts were classified as intended and not produced.
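The post doesn't describe a specific clustering implementation, but the idea can be sketched with a simple greedy near-duplicate grouping, where a privilege marking on any version holds back the whole group (the function names and similarity threshold below are illustrative, not from any actual eDiscovery product):

```python
# Illustrative sketch: group near-duplicate documents so that a privilege
# marking on any one version propagates to every version in its group.
from difflib import SequenceMatcher

def near_duplicates(docs, threshold=0.8):
    """Greedily cluster documents whose text similarity to a cluster's
    first member meets or exceeds the threshold."""
    clusters = []
    for doc_id, text in docs.items():
        for cluster in clusters:
            rep_text = docs[cluster[0]]  # compare against cluster representative
            if SequenceMatcher(None, rep_text, text).ratio() >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])  # no match: start a new cluster
    return clusters

def flag_privileged_groups(docs, privileged_ids, threshold=0.8):
    """Return the set of documents held back from production because some
    version in their cluster is marked privileged."""
    held = set()
    for cluster in near_duplicates(docs, threshold):
        if any(d in privileged_ids for d in cluster):
            held.update(cluster)
    return held
```

In this sketch, the unsent drafts would score as near-duplicates of the final “Attorney Work Product” email and be held back along with it, even though the drafts themselves carry no privilege markings.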

So, what do you think?  Could this mistake cost Google billions?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Jim McGann of Index Engines

 

This is the third of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is Jim McGann.  Jim is Vice President of Information Discovery at Index Engines.  Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector. He has worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes.  In recent years he has worked for technology-based start-ups that provide financial services and information management solutions.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?  And which trend(s), if any, haven’t emerged to this point like you thought they would?

I think what we're seeing is a lot of people becoming a bit more proactive.  I may combine your questions together because I'm surprised that people haven’t become proactive sooner.  LegalTech has included a focus on litigation readiness for how long?  Ten years or so?  And we're still dealing with how to react to litigation, and you're still seeing fire drills occur.  There’s still not enough setting up of environments in the corporate world and in the legal world that would enable customers to respond more quickly.  It surprises me how little has been developed in this regard.

I think the reason for the slow start is that there are a lot of regulations that have been evolving and people haven't really understood what they need to prepare and how to react.  There’s been ten years of LegalTech and we're still struggling with how to respond to basic litigation requests because the volume has grown, accessibility arguments have changed, Federal rules have been solidified, and so forth.

What we're seeing when we go and talk to customers (and we talk to a lot of end-user customers that are facing litigation) is IT on one end of the table saying, ‘we need to solve this for the long term’, and litigation support teams on the other end of the table saying, ‘I need this today, I’ve been requesting data since July, and I still haven't received it and it's now January’.  That's not good.

The evolution is from what we call “litigation support”, which is more on the reactive side, to proactive litigation readiness: being able to push a button and put a hold on John Doe's mailbox, or specifically find content that’s required at a moment's notice.

So, I think the trend is litigation readiness.  Are people really starting to prepare for it?  In every meeting that we go into, we see IT organizations, including the compliance and security groups, rolling up their sleeves and saying, “I need to solve this for my company long term, but we have this litigation.”  It's a mixed environment.  In the past, we would go meet with litigation support teams, and IT wasn't involved.  You're seeing buzz words like Information Governance.  You're seeing big players like IBM, EMC and Symantec jumping deep into it.

What's strange is that IT organizations are getting involved in formalizing a process that hasn't been formalized in the past.  It's been very much, maybe not “ad hoc”, but IT organizations did what they could to meet project needs.  Now IT is looking at solving the problem long term, and there’s a struggle.  Attorneys are not the best long term planners – they're doing what they need to do.  They've got 60 days to do discovery, and IT is thinking five years.  We need to balance this out.

What are your general observations about LTNY this year and how it fits into emerging trends?

We're talking to a lot of people that are looking at next generation solutions.  The problems have changed, so solutions are evolving to address how you solve those problems.

There's also been a lot of consolidation in the eDiscovery space as well, so people are saying that their relationship has changed with their other vendors.  There have been a lot of those conversations.

I'm not sure what the attendance is at this year’s show, but attendees seem to be serious about looking for new solutions.  Maybe because the economy was so bad over the past year or maybe because it's a new budget year and budgets are freeing up, but people are looking at making changes, looking at new solutions.  We see that a lot with service providers, as well as law firms and other end users.

What are you working on that you’d like our readers to know about?

We’ve announced the release of Octane Version 4.3, which preserves files and emails at a bit level from MS Exchange and IBM Lotus Notes, and indexes forensic images and evidence files at speeds reaching 1TB per hour using a single node.  Bit-for-bit email processing and forensic image indexing at those speeds are unprecedented breakthroughs in the industry.  Bit-level indexing is not only faster but also more reliable, because email is stored in its original format with no need for conversion.  Index Engines can also now index terabytes of network data, including forensic images, in hours rather than the weeks traditional tools require.  So, we’re excited about the new version of Octane.

We’ve also just announced a partnership with Merrill Corporation, to provide our technology to collect and process ESI from networks, desktops, forensic images and legacy backup tapes, for both reactive litigation and proactive litigation readiness.  Merrill has recognized the shift in reactive to proactive litigation readiness that I mentioned earlier and we are excited to be aligned with Merrill in meeting the demands of their customers in this regard.

Thanks, Jim, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Christine Musil of Informative Graphics Corporation (IGC)

 

This is the second of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends? (Note: Christine was interviewed the night before the show, so there were obviously no observations at that point)
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is Christine Musil.  Christine has a diverse career in engineering and marketing spanning 18 years. Christine has been with IGC since March 1996, when she started as a technical writer and a quality assurance engineer. After moving to marketing in 2001, she has applied her in-depth knowledge of IGC's products and benefits to marketing initiatives, including branding, overall messaging, and public relations. She has also been a contributing author to a number of publications on archiving formats, redaction, and viewing technology in the enterprise.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?  And which trend(s), if any, haven’t emerged to this point like you thought they would?

That's a hard question.  Especially for us because we're somewhat tangential to the market, and not as deeply enmeshed in the market as a lot of the other vendors are.  I think the number of acquisitions in the industry was what we expected, though maybe the M&A players themselves were surprising.  For example, I didn't personally see the recent ADI acquisition (Applied Discovery acquired by Siris Capital) coming.  And while we weren’t surprised that Clearwell was acquired, we thought that their being acquired by Symantec was an interesting move.

So, we expect the consolidation to continue.  We watched the major content management players like EMC and OpenText to see if they would acquire additional, targeted eDiscovery providers to round out some of their solutions, but through 2011 they didn’t seem to have decided whether they're “all in”, despite some previous acquisitions in the space.  We had wondered if some of them had decided maybe they're out again, though EMC is here in force with Kazeon this year.  So, that’s some of what surprised me about the market.

Other trends that I see are potentially more changes in the FRCP (Federal Rules of Civil Procedure) and probably a continued push towards project-based pricing.  We have certainly felt the pressure to do more project-based pricing, so we're watching that.  Escalating data volumes have caused cost increases and, obviously, something's going to have to give there.  That's where I think we’re going to see more regulation come out through new FRCP rules to provide more proportionality to the discovery process, or clients will simply dictate more pricing alternatives.

What are you working on that you’d like our readers to know about?

We just announced a new release of our Brava!® product, version 7.1, at the show.  The biggest additions to Brava are in the Enterprise version, and we’re debuting the new Brava Changemark® Viewer for smartphones as well as an upcoming Brava HTML client for tablets.  iPads have been a bigger game changer than I think a lot of people even anticipated.  So, we’re excited about that.  Also new with Brava 7.1 is video collaboration, along with improved enterprise readiness and performance for very large deployments.

We also just announced the results of our Redaction Survey, which we conducted to gauge user adoption of electronic redaction software. Nearly 65% of the survey respondents were from law firms, so that was a key indicator of the importance of redaction within the legal community.  Of the respondents, 25% indicated that they are still doing redaction manually, with markers or redaction tape, 32% are redacting electronically, and nearly 38% are using a combined approach with paper-based and software-driven redaction.  Of those that redact electronically, the reasons they prefer electronic redaction included the professional look of the redactions, time savings, efficiency and the “environmental friendliness” of doing it electronically.

For us, it's exciting moving into those areas and our partnerships continue to be exciting, as well.  We have partnerships with LexisNexis and Clearwell, both of which are unaffected by the recent acquisitions.  So, that's what's new at IGC.

Thanks, Christine, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Case Law: Predictive Coding Considered by Judge in New York Case

In Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (S.D.N.Y. Feb. 8, 2012), Magistrate Judge Andrew J. Peck of the U.S. District Court for the Southern District of New York instructed the parties to submit proposals to adopt a protocol for e-discovery that includes the use of predictive coding, perhaps the first known case where a technology assisted review approach was considered by the court.

In this case, the plaintiff, Monique Da Silva Moore, filed a Title VII gender discrimination action against advertising conglomerate Publicis Groupe, on her behalf and the behalf of other women alleged to have suffered discriminatory job reassignments, demotions and terminations.  Discovery proceeded to address whether Publicis Groupe:

  • Compensated female employees less than comparably situated males through salary, bonuses, or perks;
  • Precluded or delayed selection and promotion of females into higher level jobs held by male employees; and
  • Disproportionately terminated or reassigned female employees when the company was reorganized in 2008.

Consultants provided guidance to the plaintiffs and the court to develop a protocol that uses iterative sample sets of 2,399 documents from a collection of 3 million documents to yield a 95 percent confidence level and a 2 percent margin of error (see our previous posts here, here and here on how to determine an appropriate sample size, randomly select files and conduct an iterative approach). In all, the parties expect to review between 15,000 and 20,000 files to create the “seed set” to be used to predictively code the remainder of the collection.
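As a side note, that 2,399-document figure is what the standard sample-size calculation (Cochran's formula with a finite population correction) produces for a 3-million-document population at a 95 percent confidence level and a 2 percent margin of error.  A minimal sketch, with an illustrative function name:

```python
import math

def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's sample-size formula with a finite population correction.
    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative assumption about the underlying proportion."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2          # infinite-population size
    return round(n0 / (1 + (n0 - 1) / population))     # finite population correction

print(sample_size(3_000_000))  # prints 2399, matching the protocol in the case
```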

The parties were instructed to submit their draft protocols by February 16th, which is today(!).  The February 8th hearing was attended by counsel and their respective ESI experts.  It will be interesting to see the draft protocols submitted and the opinion from Judge Peck that results.

So, what do you think?  Should courts order the use of technology such as predictive coding in litigation?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: George Socha of Socha Consulting

 

This is the first of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support. George has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey; last year he and Tom Gelbmann converted the Survey into Apersee, an online system for selecting eDiscovery providers and their offerings.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are nine active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?

I may have said this last year too, but it holds true even more this year – if there's an emerging trend, it's the trend of people talking about the emerging trend.  It started last year, and this year everyone in the industry seems to have an emerging-trends message to deliver.  Not to be too crass about it, but often the message is, "Buy our stuff", a message that is not especially helpful.

Regarding actual emerging trends, each year we all try to sum up LegalTech in two or three words.  The two words for this year would be “predictive coding.”  Use whatever name you want, but that's what everyone seems to be hawking and talking about at LegalTech this year.  This does not necessarily mean they really can deliver.  It doesn't mean they know what “predictive coding” is.  And it doesn't mean they've figured out what to do with “predictive coding.”  Having said that, expanding the use of machine assisted review capabilities as part of the e-discovery process is an important step forward.  It has also been a while in coming.  The earliest I can remember working with a client doing what's now being called predictive coding was in 2003.  A key difference is that at that time they had to create their own tools; there wasn't really anything they could buy to help them with the process.

Which trend(s), if any, haven’t emerged to this point like you thought they would?

One thing I don't yet hear is discussion about using predictive coding capabilities as a tool to assist with determining what data to preserve in the first place.  Right now the focus is almost exclusively on what you do once you've “teed up” data for review, and then how to use predictive coding to try to help with the review process.

Think about taking the predictive coding capabilities and using them early on to make defensible decisions about what to and what not to preserve and collect.  Then consider continuing to use those capabilities throughout the e-discovery process.  Finally, look into using those capabilities to more effectively analyze the data you're seeing, not just to determine relevance or privilege, but also to help you figure out how to handle the matter and what to do on a substantive level.

What are your general observations about LTNY this year and how it fits into emerging trends?

Well, LegalTech continues to be dominated by electronic discovery.  As a result, we tend to overlook whole worlds of technologies that can be used to support and enhance the practice of law.  It is unfortunate that, in our hyper-focus on e-discovery, we risk losing track of those other capabilities.

What are you working on that you’d like our readers to know about?

With regard to EDRM, we recently announced that we have hit key milestones in five projects.  Our EDRM Enron Email Data Set has now officially become an Amazon public dataset, which I think will mean wider use of the materials.

We announced the publication of our Model Code of Conduct, which was five years in the making.  We have four signatories so far, and are looking forward to seeing more organizations sign on.

We announced the publication of version 2.0 of our EDRM XML schema.  It's a tightened-up schema, reorganized so that it should be a bit easier to use and more efficient in operation.

With the Metrics project, we are beginning to add information to a database that we've developed to gather metrics, the objective being to make available metrics with an empirical basis, rather than the types of numbers bandied about today, where no one seems to know how they were arrived at. Also, last year the Uniform Task-Based Management System (UTBMS) code set for litigation was updated.  The codes used for tracking e-discovery activities were expanded from a single code, which covered not just e-discovery but other activities as well, to a number of codes based on the EDRM Metrics code set.

On the Information Governance Reference Model (IGRM) side, we recently published a joint white paper with ARMA.  The paper cross-maps the EDRM's Information Governance Reference Model with ARMA's Generally Accepted Recordkeeping Principles (GARP).  We look forward to more collaborative materials coming out of the two organizations.

As for Apersee, we continue to allow consumers to search the data on the site for free, but we are no longer charging providers a fee for their information to be available.  Instead, we now have two sponsors and some advertising on the site.  This means that any provider can put information in, and everyone can search that information.  The more data that goes in, the more useful the search becomes.  All this fits our goal of creating a better way to match consumers with the providers who have the services, software, skills and expertise that the consumers actually need.

And on the consulting and testifying side, I continue to work with a broad array of law firms; corporate and governmental consumers of e-discovery services and software; and providers offering those capabilities.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: “Assisted” is the Key Word for Technology Assisted Review

 

As noted in our blog post entitled 2012 Predictions – By The Numbers, almost all of the sets of eDiscovery predictions we reviewed (9 out of 10) predicted a greater emphasis on Technology Assisted Review (TAR) in the coming year.  It was one of our predictions, as well.  And, during all three days at LegalTech New York (LTNY) a couple of weeks ago, sessions were conducted that addressed technology assisted review concepts and best practices.

While some equate technology assisted review with predictive coding, other technology approaches, such as conceptual clustering, are also increasing in popularity and qualify as TAR approaches as well.  However, for purposes of this blog post, we will focus on predictive coding.

Over a year ago, I attended a Virtual LegalTech session entitled Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and wrote a blog post from that entitled What the Heck is “Predictive Coding”?  The speakers for the session were Jason R. Baron, Maura Grossman and Bennett Borden (Jason and Bennett are previous thought leader interviewees on this blog).  The panel gave the best descriptive definition that I’ve seen yet for predictive coding, as follows:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”
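To make that definition concrete, here is a deliberately simplified sketch of the ranking idea, not any vendor's actual algorithm: a toy Naive Bayes-style scorer is trained on a human-reviewed “seed set” and then used to order the unreviewed documents from most to least likely responsive.  The function names and the tiny two-document seed set are purely illustrative.

```python
from collections import Counter
import math

def train(seed_docs):
    """seed_docs: list of (text, is_responsive) pairs coded by a human reviewer.
    Returns per-word log-odds weights (a toy Naive Bayes-style model with
    add-one smoothing)."""
    counts = {True: Counter(), False: Counter()}
    for text, label in seed_docs:
        counts[label].update(text.lower().split())
    vocab = set(counts[True]) | set(counts[False])
    total = {label: sum(c.values()) for label, c in counts.items()}
    return {w: math.log((counts[True][w] + 1) / (total[True] + len(vocab)))
             - math.log((counts[False][w] + 1) / (total[False] + len(vocab)))
            for w in vocab}

def rank(weights, docs):
    """Rank the unreviewed collection from most to least likely responsive."""
    score = lambda text: sum(weights.get(w, 0.0) for w in text.lower().split())
    return sorted(docs, key=score, reverse=True)

# Illustrative seed set and unreviewed documents
seed = [("merger pricing memo", True), ("holiday party invite", False)]
weights = train(seed)
ranked = rank(weights, ["pricing discussion", "party planning"])
```

A real predictive coding engine uses far richer features and models, but the workflow is the same: human judgments on a subset drive a ranking over the whole collection, which can then be “cut” into categories.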

It’s very cool technology and capable of efficient and accurate review of the document collection, saving costs without sacrificing quality of review (in some cases, it yields even better results than traditional manual review).  However, there is one key phrase in the definition above that can make or break the success of the predictive coding process: “based on human review of only a subset of the document collection”. 

Key to the success of any review effort, whether linear or technology assisted, is knowledge of the subject matter.  For linear review, knowledge of the subject matter usually results in preparation of high quality review instructions that (assuming the reviewers competently follow those instructions) result in a high quality review.  In the case of predictive coding, use of subject matter experts (SMEs) to review a core subset of documents (typically known as a “seed set”) and make determinations regarding that subset is what enables the technology in predictive coding to “predict” the responsiveness and importance of the remaining documents in the collection.  The more knowledgeable the SMEs are in creating the “seed set”, the more accurate the “predictions” will be.

And, as is the case with other processes such as document searching, sampling the results (by determining the appropriate sample size of responsive and non-responsive items, randomly selecting those samples and reviewing both groups – responsive and non-responsive – to test the results) will enable you to determine how effective the process was in predictively coding the document set.  If sampling shows that the process yielded inadequate results, take what you’ve learned from the sample set review and apply it to create a more accurate “seed set” for re-categorizing the document collection.  Sampling will enable you to defend the accuracy of the predictive coding process, while saving considerable review costs.
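As a rough illustration of that validation step (all counts below are hypothetical), two common sample-based checks are the precision of the predicted-responsive set and the elusion rate, i.e., the fraction of sampled predicted-non-responsive documents that human review shows were actually responsive:

```python
def validation_metrics(pos_confirmed, pos_sampled, neg_missed, neg_sampled):
    """Estimate review quality from two random samples:
    precision of the predicted-responsive set, and the elusion rate
    (responsive documents hiding in the predicted-non-responsive set)."""
    precision = pos_confirmed / pos_sampled
    elusion = neg_missed / neg_sampled
    return precision, elusion

# Hypothetical sample results: 570 of 600 sampled "responsive" documents
# confirmed by human review; 12 of 600 sampled "non-responsive" documents
# turned out to be responsive on human review.
precision, elusion = validation_metrics(570, 600, 12, 600)
```

A high elusion rate is the signal to refine the seed set and re-categorize; a low one supports the defensibility of the process.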

So, what do you think?  Have you utilized predictive coding in any of your reviews?  How did it work for you?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: International Trade Commission Considers Proportionality Proposal

 

As eDiscovery costs continue to escalate, proposals to bring proportionality to the eDiscovery process have become increasingly popular, such as this model order to limit eDiscovery in patent cases proposed by Federal Circuit Chief Judge Randall Rader last year (which was adopted for use in this case).  In January, Chief Judge Rader and three members of the Council (Council Chairman Ed Reines of Weil, Tina Chappell of Intel Corporation, and John Whealan, Associate Dean of Intellectual Property Studies at the George Washington University School of Law) presented a proposal to the U.S. International Trade Commission (USITC) to streamline eDiscovery in section 337 investigations.

Under Section 337 of the Tariff Act of 1930 (19 U.S.C. § 1337), the USITC conducts investigations into allegations of certain unfair practices in import trade. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

The proposal tracks the approach of the district court eDiscovery model order that is being adopted in several district courts and under consideration in others. Chairman Reines described the proposal as flexible, reasonably simple, and easy to administer. Under the proposal, litigants would:

  • Indicate whether ESI such as email is being sought or not;
  • Presumptively limit the number of custodians whose files will be searched, the locations of those documents, and the search terms that will be used (if litigants exceed the specified limits, they would assume the additional costs);
  • Use focused search terms limited to specific contested issues; and
  • Allow privileged documents to be exchanged without losing privilege.

For more regarding the USITC proposal to streamline eDiscovery in section 337 investigations, including reactions from USITC members, see the USITC press release here.

So, what do you think?  Please share any comments you might have or if you’d like to know more about a particular topic.
