Review

eDiscovery Case Law: Award for Database Costs Reversed Due to Cost Sharing Agreement

 

An award of costs of $938,957.72, which included $234,702.43 for the winning party's agreed half share of the cost of a database, was reversed in part in Synopsys, Inc. v. Ricoh Co. (In re Ricoh Co. Patent Litigation), No. 2011-1199 (Fed. Cir. Nov. 23, 2011). While the cost of the database could have been taxed to the losing party, the parties' agreement on cost sharing controlled the ultimate taxation of costs.

After almost seven years of litigation, Synopsys obtained summary judgment and a declaration, in Ricoh's action against seven Synopsys customers, that a Ricoh software patent on integrated circuits had not been infringed. During the litigation, Ricoh and Synopsys were unable to agree on a form of production for Synopsys' email with its customers, and Ricoh suggested using an electronic discovery company to compile and maintain a database of the email. Synopsys agreed to the use of the company's services and to pay half the cost of the database. After Synopsys obtained summary judgment, the district court approved items in the Synopsys bill of costs totaling $938,957.72, including $234,702.43 for Synopsys' half share of the cost of the database and another $234,702.43 for document production costs.

On appeal of the taxation of costs, the court agreed that 28 U.S.C. § 1920 provided for recovery of the cost of the database, which was used to produce email in its native format. According to the court, "electronic production of documents can constitute 'exemplification' or 'making copies' under section 1920(4)." However, the parties had entered into an agreement on splitting the cost of the database, and nothing in the 14-page agreement or the communications regarding it indicated that the agreement was anything other than a final allocation of the database costs. Faced with "scant authority from other circuits as to whether a cost-sharing agreement between parties to litigation is controlling as to the ultimate taxation of costs," the court concluded that the parties' cost-sharing agreement was controlling and reversed the district court's award of $234,702.43 for Synopsys' half share of the cost of the database.

The court also reversed and remanded the award of an additional $234,702.43 for document production costs because those costs were not adequately documented. For example, many of the invoices simply stated “document production” and did not indicate shipment to opposing counsel. The court stated that the “document production” phrase “does not automatically signify that the copies were produced to opposing counsel.”

So, what do you think?  Should the agreement between parties have superseded the award?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required).

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Bennett Borden

 

This is the second of our Holiday Thought Leader Interview series.  I interviewed several thought leaders to get their perspectives on various eDiscovery topics.

Today's thought leader is Bennett B. Borden. Bennett is the co-chair of Williams Mullen’s eDiscovery and Information Governance Section. Based in Richmond, Va., his practice is focused on Electronic Discovery and Information Law. He has published several papers on the use of predictive coding in litigation. Bennett is not only an advocate for predictive coding in review, but has reorganized his own litigation team to more effectively use advanced computer technology to improve eDiscovery.

You have written extensively about the ways that the traditional, or linear, review process is broken. Most of our readers understand the issue, but how well has the profession at large grappled with this? Are the problems well understood?

The problem with the expense of document review is well understood, but how to solve it is less well known. Fortunately, there is some great research being done by both academics and practitioners that is helping shed light on both the problem and the solution. In addition to the research we’ve written about in The Demise of Linear Review and Why Document Review is Broken, some very informative research has come out of the TREC Legal Track and subsequent papers by Maura R. Grossman and Gordon V. Cormack, as well as by Jason R. Baron, the eDiscovery Institute, Douglas W. Oard and Herbert L. Roitblat, among others.  Because of this important research, the eDiscovery bar is becoming increasingly aware of how document review and, more importantly, fact development can be more effective and less costly through the use of advanced technology and artful strategy. 

You are a proponent of computer-assisted review. Is computer search technology truly mature? Is it a defensible strategy for review?

Absolutely. In fact, I would argue that computer-assisted review is actually more defensible than traditional linear review.  By computer-assisted review, I mean the utilization of advanced search technologies beyond mere search terms (e.g., topic modeling, clustering, meaning-based search, predictive coding, latent semantic analysis, probabilistic latent semantic analysis, Bayesian probability) to more intelligently address a data set. These technologies, to a greater or lesser extent, group documents based upon similarities, which allows a reviewer to address the same kinds of documents in the same way.

Computers are vastly superior to humans in quickly finding similarities (and dissimilarities) within data. And the similarities that computers are able to find have advanced beyond mere content (through search terms) to include many other aspects of data, such as correspondents, domains, dates, times, location, communication patterns, etc. Because the technology can now recognize and address all of these aspects of data, the resulting groupings of documents are more granular and internally cohesive.  This means that the reviewer makes fewer and more consistent choices across similar documents, leading to a faster, cheaper, better and more defensible review.

How has the use of [computer-assisted review] predictive coding changed the way you tackle a case? Does it let you deploy your resources in new ways?

I have significantly changed how I address a case as both technology and the law have advanced. Although there is a vast amount of data that might be discoverable in a particular case, less than 1 percent of that data is ever used in the case or truly advances its resolution. The resources I deploy focus on identifying that 1 percent, and avoiding the burden and expense largely wasted on the 99 percent. Part of this is done through developing, negotiating and obtaining reasonable and iterative eDiscovery protocols that focus on the critical data first. EDiscovery law has developed at a rapid pace and provides the tools to develop and defend these kinds of protocols. An important part of these protocols is the effective use of computer-assisted review.

Lately there has been a lot of attention given to the idea that computer-assisted review will replace attorneys in litigation. How much truth is there to that idea? How will computer-assisted review affect the role of attorneys?

Technology improves productivity, reducing the time required to accomplish a task. This is no less true of computer-assisted review. The 2006 amendments to the Federal Rules of Civil Procedure caused a massive increase in the number of attorneys devoted to the review of documents. As search technologies and the review tools that employ them continue to improve, the demand for attorneys devoted to review will obviously decline.

But this is not a bad thing. Traditional linear document review is horrifically tedious and boring, and it is not the best use of legal education and experience. Fundamentally, litigators develop facts and apply the law to those facts to determine a client's position and advise the client to act accordingly. Computer-assisted review allows us to get at the most relevant facts more quickly, reducing both the scope and duration of litigation. This is what lawyers should be focused on accomplishing, and computer-assisted review can help them do so.

With the rise of computer-assisted review, do lawyers need to learn new skills? Do lawyers need to be computer scientists or statisticians to play a role?

Lawyers do not need to be computer scientists or statisticians, but they certainly need to have a good understanding of how information is created, how it is stored, and how to get at it. In fact, lawyers who do not have this understanding, whether alone or in conjunction with advisory staff, are simply not serving their clients competently.

You’ve suggested that lawyers involved in computer-assisted review enjoy the work more than in the traditional manual review process. Why do you think that is?

I think it is because the lawyers are using their legal expertise to pursue lines of investigation and develop the facts surrounding them, as opposed to simply playing a monotonous game of memory match. Our strategy of review is to use very talented lawyers to address a data set using technological and strategic means to get to the facts that matter. While doing so, our lawyers uncover meaning within a huge volume of information and weave it into a story that resolves the matter. This is exciting and meaningful work that has had significant impact on our clients' litigation budgets.

How is computer-assisted review changing the competitive landscape? Does it provide an opportunity for small firms to compete that maybe didn't exist a few years ago?

We live in the information age, and lawyers, especially litigators, fundamentally deal in information. In this age it is easier than ever to get to the facts that matter, because more facts (and more granular facts) exist within electronic information. The lawyer who knows how to get at the facts that matter is simply a more effective lawyer. The information age has fundamentally changed the competitive landscape. Small companies are able to achieve immense success through the skillful application of technology. The same is true of law firms. Smaller firms that consciously develop and nimbly utilize the technological advantages available to them have every opportunity to excel, perhaps even more so than larger, highly-leveraged firms. It is no longer about size and head-count, it’s about knowing how to get at the facts that matter, and winning cases by doing so.

Thanks, Bennett, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Jason R. Baron

 

This is the first of the Holiday Thought Leader Interview series.  I interviewed several thought leaders to get their perspectives on various eDiscovery topics.

Today’s thought leader is Jason R. Baron. Jason has served as the National Archives' Director of Litigation since May 2000 and has been involved in high-profile cases for the federal government. His background in eDiscovery dates to the Reagan Administration, when he helped retain backup tapes containing Iran-Contra records from the National Security Council as the Justice Department’s lead counsel. Later, as director of litigation for the U.S. National Archives and Records Administration, Jason was assigned a request to review documents pertaining to tobacco litigation in U.S. v. Philip Morris.

He currently serves as Co-Chair of The Sedona Conference Working Group on Electronic Document Retention and Production. Baron is also one of the founding coordinators of the TREC Legal Track, a search project organized through the National Institute of Standards and Technology to evaluate search protocols used in eDiscovery. This year, Jason was awarded the Emmett Leahy Award for Outstanding Contributions and Accomplishments in the Records and Information Management Profession.

You were recently awarded the prestigious Emmett Leahy Award for excellence in records management. Is it unusual that a lawyer wins such an award? Or is the job of the litigator and records manager becoming inextricably linked?

Yes, it was unusual: I am the first federal lawyer to win the Emmett Leahy award, and only the second lawyer to have done so in the 40-odd years that the award has been given out. But my career path in the federal government has been a bit unusual as well: I spent seven years working as lead counsel on the original White House PROFS email case (Armstrong v. EOP), followed by more than a decade worrying about records-related matters for the government as Director of Litigation at NARA. So with respect to records and information management, I long ago passed at least the Malcolm Gladwell test in "Outliers" where he says one needs to spend 10,000 hours working on anything to develop a level of "expertise."  As to the second part of your question, I absolutely believe that to be a good litigation attorney these days one needs to know something about information management and eDiscovery — since all evidence is "born digital" and lots of it needs to be searched for electronically. As you know, I also have been a longtime advocate of a greater linking between the fields of information retrieval and eDiscovery.

In your acceptance speech you spoke about the dangers of information overload and the possibility that it will make it difficult for people to find important information. How optimistic are you that we can avoid this dystopian future? How can the legal profession help the world avoid this fate?

What I said was that in a world of greater and greater retention of electronically stored information, we need to leverage artificial intelligence and specifically better search algorithms to keep up in this particular information arms race. Although Ralph Losey teased me in a recent blog post that I was being unduly negative about future information dystopias, I actually am very optimistic about the future of search technology assisting in triaging the important from the ephemeral in vast collections of archives. We can achieve this through greater use of auto-categorization and search filtering methods, as well as having a better ability in the future to conduct meaningful searches across the enterprise (whether in the cloud or not). Lawyers can certainly advise their clients on how to practice good information governance to accomplish these aims.

You were one of the founders of the TREC Legal Track research project. What do you consider that project’s achievement at this point?

The initial idea for the TREC Legal Track was to get a better handle on evaluating various types of alternative search methods and technologies, to compare them against a "baseline" of how effective lawyers were in relying on more basic forms of keyword searching. The initial results were a wake-up call, showing lawyers that sole reliance on simple keywords and Boolean strings sometimes results in a large quantity of relevant evidence going missing. But during the half-decade of research that now has gone into the track, something else of perhaps even greater importance has emerged from the results, namely: we have a much better understanding now of what a good search process looks like, which includes a human in the loop (known in the Legal Track as a topic authority) evaluating on an ongoing, iterative basis what automated search software kicks out by way of initial results. The biggest achievement, however, may simply be the continued existence of the TREC Legal Track itself, still going in its sixth year in 2011, and still producing important research results, on an open, non-proprietary platform, that are fully reproducible and that benefit both the legal profession and the information retrieval academic world. While I stepped away after four years from further active involvement in the Legal Track as a coordinator, I continue to be highly impressed with the work of the current track coordinators, led by Professor Doug Oard at the University of Maryland, who has remained at the helm since the very beginning.

To what extent has TREC’s research proven the reliability of computer-assisted review in litigation? Is there a danger that the profession assumes the reliability of computer-assisted review is a settled matter?

The TREC Legal Track results I am most familiar with through calendar year 2010 have shown computer-assisted review methods finding in some cases on the order of 85% of relevant documents (a .85 recall rate) per topic while only producing 10% false positives (a .90 precision rate). Not all search methods have had these results, and there has been in fact a wide variance in success achieved, but these returns are very promising when compared with historically lower rates of recall and precision across many information retrieval studies. So the success demonstrated to date is highly encouraging. Coupled with these results has been additional research reported by Maura Grossman & Gordon Cormack, in their much-cited paper Technology-Assisted Review in EDiscovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, which makes the case for the greater accuracy and efficiency of computer-assisted review methods.
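For readers less familiar with these metrics, here is a minimal sketch (in Python, with purely hypothetical document counts) of how recall and precision are computed from a review result; the 85% recall and roughly 90% precision figures Jason cites correspond to ratios like the ones below.

```python
# Hypothetical counts for a single TREC-style topic (illustrative only).
relevant_found = 850     # relevant documents the automated method retrieved
relevant_missed = 150    # relevant documents it failed to retrieve
false_positives = 94     # non-relevant documents it retrieved anyway

recall = relevant_found / (relevant_found + relevant_missed)
precision = relevant_found / (relevant_found + false_positives)

print(f"recall = {recall:.2f}")       # 0.85 -> found 85% of the relevant documents
print(f"precision = {precision:.2f}") # ~0.90 -> ~10% of retrieved documents were false positives
```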

Other research conducted outside of TREC, most notably by Herbert Roitblat, Patrick Oot and Anne Kershaw, also point in a similar direction (as reported in their article Mandating Reasonableness in a Reasonable Inquiry). All of these research efforts buttress the defensibility of technology-assisted review methods in actual litigation, in the event of future challenges. Having said this, I do agree that we are still in the early days of using many of the newer predictive types of automated search methods, and I would be concerned about courts simply taking on faith the results of past research as being applicable in all legal settings. There is no question however that the use of predictive analytics, clustering algorithms, and seed sets as part of technology-assisted review methods is saving law firms money and time in performing early case assessment and for multiple other purposes, as reported in a range of eDiscovery conferences and venues — and I of course support all of these good efforts.

You have discussed the need for industry standards in eDiscovery. What benefit would standards provide?

Ever since I served as Co-Editor in Chief on The Sedona Conference Commentary on Achieving Quality in eDiscovery (2009), I have been thinking about what a good process for conducting eDiscovery looks like. That paper focused on project management, sampling, and imposing various forms of quality controls on collection, review, and production. The question is: is a good eDiscovery process capable of being fit into a maturity model of sorts, and might it be useful to consider whether vendors and law firms would benefit from having their in-house eDiscovery processes audited and certified as meeting some common baseline of quality? To this end, the DESI IV workshop ("Discovery of ESI") held in Pittsburgh last June, as part of the Thirteenth International AI and Law Conference (ICAIL 2011), had as its theme exploring what types of model standards could be imposed on the eDiscovery discipline, so that we all would be able to work from some common set of benchmarks. Some 75 people attended and 20-odd papers were presented. I believe the consensus in the room was that we should be pursuing further discussions as to what an ISO 9001-type quality standard would look like as applied to the specific eDiscovery sector, much as other industry verticals have their own ISO standards for quality. Since June, I have been in touch with some eDiscovery vendors that have actually undergone an audit process to achieve ISO 9001 certification. This is an area where no consensus has yet emerged as to the path forward — but I will be pursuing further discussions with DESI workshop attendees in the coming months and promise to report back in this space as to what comes of these efforts.

What sort of standards would benefit the industry? Do we need standards for pieces of the eDiscovery process, like a defensible search standard, or are you talking about a broad quality assurance process?

DESI IV started by concentrating on what would constitute a defensible search standard; however, it became clear at the workshop and over the course of the past few months that we need to think bigger, in looking across the eDiscovery life cycle as to what constitutes best practices through automation and other means. We need to remember however that eDiscovery is a very young discipline, as we're only five years out from the 2006 Rules Amendments. I don't have all the answers, by any means, on what would constitute an acceptable set of standards, but I like to ask questions and believe in a process of continuous, lifelong learning. As I said, I promise I'll let you know about what success has been achieved in this space.

Thanks, Jason, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Best Practices: When is it OK to Produce without Linear Review?

 

At eDiscoveryDaily, the title of our daily post usually reflects some eDiscovery news and/or analysis that we are providing our readers.  However, based on a comment I received from a colleague last week, I thought I would ask a thought provoking question for this post.

There was an interesting post in the EDD Update blog a few days ago entitled Ediscovery Production Without Review, written by Albert Barsocchini, Esq.  The post noted that due to “[a]dvanced analytics, judicial acceptance of computer aided coding, claw back/quick-peek agreements, and aggressive use of Rule 16 hearings”, many attorneys are choosing to produce responsive ESI without spending time and money on a final linear review.

A colleague of mine sent me an email with a link to the post and stated, “I would not hire a firm if I knew they were producing without a doc by doc review.”

Really?  What if:

  • You collected the equivalent of 10 million pages* and still had 1.2 million potentially responsive pages after early data assessment/first pass review? (reducing 88% of the population, which is a very high culling percentage in most cases)
  • And your review team could review 60 pages per hour, requiring 20,000 hours to complete the responsiveness review?
  • And their average rate was a very reasonable $75 per hour to review, resulting in a total cost of $1.5 million to perform a doc by doc review?
  • And you had a clawback agreement in place so that you could claw back any inadvertently produced privileged files?

“Would you insist on a doc by doc review then?”, I asked.

Let's face it, $1.5 million is a lot of money. That may seem like an inordinate amount to spend on linear review, and the data in some large cases may be so voluminous that an effective argument can be made for relying on technology to identify the files to produce.
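For those who want to check the math, here is a back-of-the-envelope sketch (in Python, using only the assumptions from the bullets above) showing how the $1.5 million figure is derived:

```python
# Assumptions from the scenario above (illustrative only).
collected_pages = 10_000_000   # pages collected
responsive_pages = 1_200_000   # pages left after early data assessment / first pass review
pages_per_hour = 60            # linear review rate per reviewer
rate_per_hour = 75             # hourly review rate in dollars

culling_pct = 1 - responsive_pages / collected_pages
review_hours = responsive_pages / pages_per_hour
review_cost = review_hours * rate_per_hour

print(f"culled: {culling_pct:.0%}")          # 88%
print(f"review hours: {review_hours:,.0f}")  # 20,000
print(f"review cost: ${review_cost:,.0f}")   # $1,500,000
```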

On the other hand, if you're a company like Google and you inadvertently produce a document in a case potentially worth billions of dollars, $1.5 million doesn't seem nearly as big an amount to spend given the risk associated with potential mistakes.  Also, as the Google case and this case illustrate, there are no guarantees with regard to the ability to claw back inadvertently produced files.  The cost of linear review, especially in larger cases, needs to be weighed against the potential risk of not conducting that review so that the organization can determine the best approach for its situation.

So, what do you think?  Do you produce in cases where not all of the responsive documents are reviewed before production? Are there criteria that you use to determine when to conduct or forego linear review?  Please share any comments you might have or if you’d like to know more about a particular topic.

*I used pages in the example to provide a frame of reference to which most attorneys can relate.  While 10 million pages may seem like a large collection, at an average of 50,000 pages per GB, that is only 200 total GB.  Many laptops and desktops these days have a drive that big, if not larger.  Depending on your review approach, most, if not all, original native files would probably never be converted to a standard paginated document format (i.e., TIFF or PDF).  So, it is unlikely that the total page count of the collection would ever be truly known.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Best Practices: Production is the “Ringo” of the eDiscovery Phases

 

Since eDiscovery Daily debuted over 14 months ago, we've covered a lot of case law decisions related to eDiscovery: 65 posts related to case law to date, in fact.  We've covered cases associated with sanctions related to failure to preserve data, issues associated with incomplete collections, inadequate searching methodologies, and inadvertent disclosures of privileged documents, among other things.  We've noted that 80% of the costs associated with eDiscovery are in the Review phase and that the volume of data, and the sources from which to retrieve it (including social media and "cloud" repositories), are growing exponentially.  Most of the "press" associated with eDiscovery ranges from the "left side of the EDRM model" (i.e., Information Management, Identification, Preservation, Collection) through the stages to prepare materials for production (i.e., Processing, Review and Analysis).

All of those phases lead to one inevitable stage in eDiscovery: Production.  Yet, few people talk about the actual production step.  If Preservation, Collection and Review are the “John”, “Paul” and “George” of the eDiscovery process, Production is “Ringo”.

It's the final crucial step in the process, and if it's not handled correctly, all of the due diligence spent in the earlier phases could mean nothing.  So, it's important to plan for production up front and to apply a number of quality control (QC) checks to the actual production set to ensure that the production process goes as smoothly as possible.

Planning for Production Up Front

When discussing the production requirements with opposing counsel, it's important to ensure that those requirements make sense, not only from a legal standpoint, but from a technical standpoint as well.  Involve support and IT personnel in the process of deciding those parameters, as they will be the people who have to meet them.  Issues to be addressed include, but are not limited to:

  • Format of production (e.g., paper, images or native files);
  • Organization of files (e.g., organized by custodian, legal issue, etc.);
  • Numbering scheme (e.g., Bates labels for images, sequential file names for native files);
  • Handling of confidential and privileged documents, including log requirements and stamps to be applied;
  • Handling of redactions;
  • Format and content of production log;
  • Production media (e.g., CD, DVD, portable hard drive, FTP, etc.).

I was involved in a case recently where opposing counsel requested an unusual production format in which the names of the files would be the subject lines of the emails being produced (for example, "Re: Completed Contract, dated 12/01/2011").  There were two issues with that approach: 1) the proposed format only addressed emails, and 2) Windows file names don't support certain characters, such as colons (:) or slashes (/).  I provided that feedback to the attorneys so that they could address it with opposing counsel and hopefully agree on a revised format that made more sense.  So, let the tech folks confirm the feasibility of the production parameters.
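As an illustration of the second issue, the sketch below (Python, with a made-up subject line and file extension) shows one way a production tool might sanitize an email subject into a valid Windows file name by replacing reserved characters; it is only a hypothetical example, not the format ultimately agreed upon in that case.

```python
import re

# Characters Windows does not allow in file names: \ / : * ? " < > |
INVALID_CHARS = r'[\\/:*?"<>|]'

def subject_to_filename(subject: str, extension: str = ".msg") -> str:
    """Turn an email subject line into a safe Windows file name."""
    safe = re.sub(INVALID_CHARS, "_", subject)  # replace reserved characters
    safe = safe.strip().rstrip(".")             # trailing dots/spaces are also disallowed
    return safe[:200] + extension               # keep the name well under path-length limits

print(subject_to_filename("Re: Completed Contract, dated 12/01/2011"))
# -> Re_ Completed Contract, dated 12_01_2011.msg
```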

The workflow throughout the eDiscovery process should also keep in mind the end goal of meeting the agreed upon production requirements.  For example, if you’re producing native files with metadata, you may need to take appropriate steps to keep the metadata intact during the collection and review process so that the metadata is not inadvertently changed. For some file types, metadata is changed merely by opening the file, so it may be necessary to collect the files in a forensically sound manner and conduct review using copies of the files to keep the originals intact.
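As a simple illustration of that last point, the hypothetical sketch below (Python; the folder and manifest names are made up) records a hash, the modification timestamp and the size of each collected file before review begins, so any later change to the working copies can be detected against the originals:

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

COLLECTION_DIR = Path("collected_files")  # hypothetical collection folder

def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

with open("collection_manifest.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["file", "sha256", "modified_utc", "size_bytes"])
    for path in COLLECTION_DIR.rglob("*"):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()
            writer.writerow([str(path), sha256_of(path), modified, stat.st_size])
```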

Tomorrow, we will talk about preparing the production set and performing QC checks to ensure that the ESI being produced to the requesting party is complete and accurate.

So, what do you think?  Have you had issues with production planning in your cases?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Could This Be the Most Expensive eDiscovery Mistake Ever?

 

Many of you have Android phones.  I do, as well.  As you may know, Android is Google’s operating system for phones and Android phones have become extraordinarily popular.

However, as noted in this Computerworld UK article, it may be a failure in searching that, ironically, could cost Google big time in its litigation with Oracle over the Android operating system.

Google is currently involved in a lawsuit with Oracle over license fees associated with Java.  Oracle acquired Java when it purchased Sun Microsystems, and many companies license Java.  Java forms a critical part of Google's Android operating system, and Google has leveraged the free Android platform to drive mobile phone users to its ecosystem and its extremely profitable search and advertising business.  Android has been so successful for Google that a loss to Oracle could result in billions of dollars in damages.

To cull down a typically large ESI population, Google turned to search technology to help identify potentially responsive and potentially privileged files.  Unfortunately for Google, a key email was produced that could prove damaging to their case.  The email was written by Google engineer Tim Lindholm a few weeks before Oracle filed suit against Google. With Oracle having threatened to sue Google for billions of dollars, Lindholm was instructed by Google executives to identify alternatives to Java for use in Android, presumably to strengthen their negotiating position.

"What we've actually been asked to do (by Larry and Sergey) is to investigate what technical alternatives exist to Java for Android and Chrome," the email reads in part, referring to Google co-founders Larry Page and Sergey Brin. "We've been over a bunch of these, and think they all suck. We conclude that we need to negotiate a license for Java under the terms we need."

Lindholm added the words “Attorney Work Product” and sent the email to Andy Rubin (Google’s top Android executive) and Google in-house attorney Ben Lee.  Unfortunately, Lindholm’s computer saved nine drafts of the email while he was writing it – before he added the words and addressed the email to Lee.  Because Lee's name and the words "attorney work product" weren't on the earlier drafts, they weren't picked up by the eDiscovery software as privileged documents, and they were sent off to Oracle's lawyers.
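To illustrate how a screen like that can miss earlier drafts, here is a deliberately simplified sketch (Python, with invented email records and addresses; this is not Google's or any vendor's actual workflow) of a keyword-based privilege filter that flags only documents containing a privilege marker or addressed to a known attorney:

```python
# Invented example records -- not the actual documents in the case.
documents = [
    {"id": "draft_1", "to": [], "body": "We've been over a bunch of alternatives to Java..."},
    {"id": "draft_9", "to": ["arubin@example.com"], "body": "We need to negotiate a license for Java..."},
    {"id": "final",   "to": ["arubin@example.com", "blee@example.com"],
     "body": "Attorney Work Product. We need to negotiate a license for Java..."},
]

PRIVILEGE_TERMS = ["attorney work product", "attorney-client"]
ATTORNEY_ADDRESSES = {"blee@example.com"}  # hypothetical in-house counsel address

def looks_privileged(doc) -> bool:
    body = doc["body"].lower()
    return any(term in body for term in PRIVILEGE_TERMS) or bool(set(doc["to"]) & ATTORNEY_ADDRESSES)

for doc in documents:
    status = "withhold for privilege review" if looks_privileged(doc) else "produce"
    print(doc["id"], "->", status)
# Only "final" is withheld; the earlier drafts sail through to production.
```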

Oracle's lawyers read from the email at two hearings over the summer and Judge William Alsup of the U.S. District Court in Oakland, California, indicated to Google's lawyers that it might suggest willful infringement of Oracle's patents.  Google filed a motion to "clawback" the email on the grounds it was "unintentionally produced privileged material." Naturally, Oracle objected, and after a three-month legal battle, Alsup refused last month to exclude the document at trial.

How did Google let such a crucial email slip through production?  It’s difficult to say without fully knowing their methodology.  Did they rely too much on technology to identify files for production without providing a full manual review of all files being produced?  Or, did manual review (which can be far from perfect) let the email slip through as well?  Conceivably, organizing the documents into clusters, based on similar content, might have grouped the unsent drafts with the identified “attorney work product” final version and helped to ensure that the drafts were classified as intended.

So, what do you think?  Could this mistake cost Google billions?  Please share any comments you might have or if you’d like to know more about a particular topic.

 

eDiscovery Rewind: Eleven for 11-11-11

 

Since today is one of only 12 days this century where the month, day and year are the same two-digit numbers (not to mention the biggest day for “craps” players to hit Las Vegas since July 7, 2007!), it seems an appropriate time to look back at some of our recent topics.  So, in case you missed them, here are eleven of our recent posts that cover topics that hopefully make eDiscovery less of a “gamble” for you!

eDiscovery Best Practices: Testing Your Search Using Sampling: On April 1, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator. On April 4, we talked about how to make sure the sample set is randomly selected. In this post, we’ll walk through an example of how you can test and refine a search using sampling.

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think: Here’s a sample scenario: You identify custodians relevant to the case and collect files from each. Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!! What happened?!?

eDiscovery Trends: Why Predictive Coding is a Hot Topic: Last month, we considered a recent article about the use of predictive coding in litigation by Judge Andrew Peck, United States magistrate judge for the Southern District of New York. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

eDiscovery Best Practices: Does Anybody Really Know What Time It Is?: Does anybody really know what time it is? Does anybody really care? OK, it’s an old song by Chicago (back then, they were known as the Chicago Transit Authority). But, the question of what time it really is has a significant effect on how eDiscovery is handled.

eDiscovery Best Practices: Message Thread Review Saves Costs and Improves Consistency: Insanity is doing the same thing over and over again and expecting a different result. But, in ESI review, it can be even worse when you get a different result. Most email messages are part of a larger discussion, which could be just between two parties, or include a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.

eDiscovery Best Practices: When Collecting, Image is Not Always Everything: There was a commercial in the early 1990s for Canon cameras in which tennis player Andre Agassi uttered the quote that would haunt him for most of his early career – “Image is everything.” When it comes to eDiscovery preservation and collection, there are times when “Image is everything”, as in a forensic “image” of the media is necessary to preserve all potentially responsive ESI. However, forensic imaging of media is usually not necessary for Discovery purposes.

eDiscovery Trends: If You Use Auto-Delete, Know When to Turn It Off: Federal Rule of Civil Procedure 37(f), adopted in 2006, is known as the “safe harbor” rule. While it’s not always clear to what extent “safe harbor” protection extends, one case from a few years ago, Disability Rights Council of Greater Washington v. Washington Metrop. Trans. Auth., D.D.C. June 2007, seemed to indicate where it does NOT extend – auto-deletion of emails.

eDiscovery Best Practices: Checking for Malware is the First Step to eDiscovery Processing: A little over a month ago, I noted that we hadn’t missed a (business) day yet in publishing a post for the blog. That streak almost came to an end back in May. As I often do in the early mornings before getting ready for work, I spent some time searching for articles to read and identifying potential blog topics and found a link on a site related to “New Federal Rules”. Curious, I clicked on it and…up popped a pop-up window from our virus checking software (AVG Anti-Virus, or so I thought) that the site had found a file containing a “trojan horse” program. The odd thing about the pop-up window is that there was no “Fix” button to fix the trojan horse. So, I chose the best available option to move it to the vault. Then, all hell broke loose.

eDiscovery Trends: An Insufficient Password Will Thwart Even The Most Secure Site: Several months ago, we talked about how most litigators have come to accept that Software-as-a-Service (SaaS) systems are secure. However, according to a recent study by the Ponemon Institute, the chance of any business being hacked in the next 12 months is a “statistical certainty”. No matter how secure a system is, whether it’s local to your office or stored in the “cloud”, an insufficient password that can be easily guessed can allow hackers to get in and steal your data.

eDiscovery Trends: Social Media Lessons Learned Through Football: The NFL Football season began back in September with the kick-off game pitting the last two Super Bowl winners – the New Orleans Saints and the Green Bay Packers – against each other to start the season. An incident associated with my team – the Houston Texans – recently illustrated the issues associated with employees’ use of social media sites, which are being faced by every organization these days and can have eDiscovery impact as social media content has been ruled discoverable in many cases across the country.

eDiscovery Strategy: "Command" Model of eDiscovery Must Make Way for Collaboration: In her article "E-Discovery 'Command' Culture Must Collapse" (via Law Technology News), Monica Bay discusses the old “command” style of eDiscovery, with a senior partner leading his “troops” like General George Patton – a model that summit speakers agree is "doomed to failure" – and reports on the findings put forward by judges and litigators that the time has come for true collaboration.

So, what do you think?  Did you learn something from one of these topics?  If so, which one?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscoveryDaily would like to thank all veterans and the men and women serving in our armed forces for the sacrifices you make for our country.  Thanks to all of you and your families and have a happy and safe Veterans Day!

eDiscovery Best Practices: Cluster Documents for More Effective Review

 

With document review estimated to account for up to 80% of the total cost of the eDiscovery process and the amount of data in the world growing at an exponential rate, it's no wonder that many firms are turning to technology to make the review process more efficient.  Whether it's using the sophisticated searching capabilities of early case assessment (ECA) tools such as FirstPass®, powered by Venio FPR™, to filter collections more effectively, or predictive coding techniques (as discussed in these two recent blog posts) to make the coding process more efficient, technology is playing an important role in saving review costs.  And, of course, review tools that manage the review process (like OnDemand®) make review more efficient simply by delivering documents efficiently and tracking review progress.

How the documents are organized for review can also make a big difference in the efficiency of review, not only saving costs, but also improving accuracy by assigning similar documents to the same reviewer.  This process of organizing documents with similar content into "clusters" (also known as "concepts") helps each reviewer make quicker review decisions (if a single reviewer looks at one document to determine responsiveness and the next few documents are duplicates or mere variations of that first document, he or she can quickly "tag" most of those variations in the same manner or identify the duplicates).  It also promotes consistency by enabling the same reviewer to review all similar documents in a cluster (for example, you don't get one reviewer marking a document as privileged while another reviewer fails to mark a copy of that same document as such, leading to inconsistencies and potential inadvertent disclosures).  Reviewers are human and do make mistakes.

Clustering software such as Hot Neuron’s Clustify™ examines the text in your documents, determines which documents are related to each other, and groups them into clusters.  Clustering organizes the documents according to the structure that arises naturally, without preconceptions or query terms.  It labels each cluster with a set of keywords, providing a quick overview of the cluster.  It also identifies a “representative document” that can be used as a proxy for the cluster.
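Clustify's internals are proprietary, but the general idea of content-based clustering can be sketched in a few lines. The example below (Python with scikit-learn, using made-up document snippets) groups documents by the similarity of their text and labels each cluster with a few keywords; it is a conceptual illustration, not a description of how Clustify actually works.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Made-up document snippets (illustrative only).
docs = [
    "Draft supply agreement between Acme and Widgetco, version 1",
    "Draft supply agreement between Acme and Widgetco, version 2 with pricing edits",
    "Weekly sales report for the eastern region, October",
    "Weekly sales report for the eastern region, November",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # documents represented as TF-IDF vectors

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for cluster_id in range(2):
    members = [i for i, label in enumerate(kmeans.labels_) if label == cluster_id]
    # Label the cluster with its highest-weight terms, as clustering tools typically do.
    top_terms = kmeans.cluster_centers_[cluster_id].argsort()[::-1][:3]
    print(f"cluster {cluster_id}: docs {members}, keywords {[terms[t] for t in top_terms]}")
```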

Examples of types of documents that can be organized into clusters:

  • Email Message Threads: Each message in the thread contains the conversation up to that point, so the ability to group those messages into a cluster enables the reviewer to quickly identify the email(s) containing the entire conversation, categorize those and possibly dismiss the rest as duplicative (if so instructed).
  • Document Versions: As “drafts” of documents are prepared, the content of each draft is similar to the previous version, so a review decision made on one version could be quickly applied to the rest of the versions.
  • Routine Reports: Sometimes, periodic reports are generated that may or may not be responsive – grouping those reports together in a cluster can enable a single reviewer to make that determination and quickly apply it to all documents in the cluster.
  • Published Documents: Have you ever published a file to Adobe PDF format?  Many of you have.  What you end up with is an exact copy of the original file (from Word, Excel or other application) in content, but different in format – hence, these documents won’t be identified as “dupes” based on a HASH value.  Clustering puts those documents together in a group so that the dupes can still be quickly identified and addressed.

Within the parameters of a review tool which manages the review process and delivers documents quickly and effectively for review, organizing documents into clusters can speed decision making during review, saving considerable time and review costs.

So, what do you think?  Have you used software to organize documents into clusters or concepts for more effective review?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: I work for CloudNine Discovery, which provides SaaS-based eDiscovery review applications FirstPass® (for early case assessment) and OnDemand® (for linear review and production).  CloudNine Discovery has an alliance with Hot Neuron and uses Clustify™ software to provide conceptual clustering and near-duplicate identification services for its clients.

eDiscovery Trends: Why Predictive Coding is a Hot Topic

 

Yesterday, we considered a recent article about the use of predictive coding in litigation by Judge Andrew Peck, United States magistrate judge for the Southern District of New York. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

We all know the reasons why predictive coding is considered such a panacea, but it is easy to forget why it is needed and why the legal industry is still grappling with eDiscovery issues after so many years. Jason Baron, Director of Litigation at the U.S. National Archives and Records Administration, recently won the 2011 Emmett Leahy Award for excellence in records and information management. He took the opportunity to step back and consider why exactly the problem won't go away. He believes that technology can help solve our problems, if applied intelligently. "We lawyer types remain stuck in a paradigm that too often relies on people and not automated technologies," he said.

But he also warns that electronically stored data may soon overwhelm the profession. By now, readers of this blog are familiar with the dire and mind-boggling predictions about the volume of discoverable electronic data being created every day. Litigators are obviously concerned that new types of information and growing volumes of data will swamp the courts, but the problem could affect all aspects of modern life. “At the start of the second decade of the 21st century, we need to recognize that the time is now to prevent what I have termed the coming digital dark ages,” Baron said. “The ongoing and exponentially increasing explosion of information means that over the next several decades the world will be seeing records and information growth orders of magnitude greater than anything seen by humankind to date. We all need better ways to search through this information.”

As one of the leaders of the TREC Legal Track, a research experiment into searching large volumes of data more effectively, Baron has an intimate understanding of the challenges ahead, and he has serious concerns. "The paradox of our age is information overload followed by future inability to access anything of importance. We cannot let that future happen," he said, talking to a roomful of records management experts and litigators. "We all need to be smarter in preventing this future dystopia."

eDiscovery blogger Ralph Losey linked to both Judge Peck’s article and Jason’s speech, and expanded on those thoughts. Losey prefers to believe, as he wrote in a post called The Dawn of a Golden Age of Justice, that lawyers will not only survive, but thrive despite the explosion in information. “We must fight fire with fire by harnessing the new (Artificial Intelligence) capacities of computers,” he says. “If we boost our own intelligence and abilities with algorithmic agents we will be able to find the evidence we need in the trillions of documents implicated by even average disputes.”

So, what do you think? Will Artificial Intelligence in the hands of truth-seeking lawyers save us from information overload, or has the glut of electronic information already swamped the world? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: A Green Light for Predictive Coding?

 

There are a handful of judges whose pronouncements on anything eDiscovery-related are bound to get legal technologists talking. Judge Andrew Peck, United States magistrate judge for the Southern District of New York, is one of them. His recent article, Search, Forward, published in Law Technology News, is one of the few judicial pronouncements on the use of predictive coding and has sparked a lively debate.

To date, there is no reported case tackling the use of advanced computer-assisted search technology ("predictive coding" in the current vernacular) despite the growing hype. Many litigators are hoping that judges will soon weigh in and give the profession some real guidance on the use of predictive coding in litigation. Peck says it will likely be a long time before a definitive statement comes from the bench, but in the meantime his article provides perhaps the best insight into at least one judge's thinking.

Judge Peck is probably best known in eDiscovery circles for the March 19, 2009 decision, William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009) (Peck, M.J.). In it, he called for "careful thought, quality control, testing and cooperation with opposing counsel in designing search terms or 'keywords' to be used to produce emails or other electronically stored information".

Peck notes that lawyers are not eager to take the results of computer review before a judge and face possible rejection. However, he says those fears are misplaced, because admissibility is defined by the content of a document, not by how it was found. Peck also relies heavily on research we have discussed on this blog, including the TREC Legal Track, to argue that advanced search technology can provide defensible search methods.

While he stops short of green lighting the use of such technology, he does encourage lawyers in this direction. “Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval,” he writes. “In my opinion, computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”

Silicon Valley consultant Mark Michels agrees with Peck's article, writing in Law Technology News that "the key to (predictive coding's) defensibility is upfront preparation to ensure that the applied tools and techniques are subject to thoughtful quality control during the review process."

But other commenters are quick to point out the limitations of predictive coding. Ralph Losey expands on Peck's argument, describing specific and defensible deployment of predictive coding (or Artificial Intelligence, in Losey's piece). He says predictive coding can speed up the process, but that the failure rate is still too high. "The state of technology and law today still requires eyeballs on all ESI before it goes out the door and into the hands of the enemy," Losey writes. "The negative consequences of disclosure of secrets, especially attorney-client privilege and work product privilege secrets, is simply too high."

Judge Peck's article is just one sign that thoughtful, technology-assisted review can be deployed in litigation. Tomorrow, we will review some darker musings on the likelihood that predictive coding will save eDiscovery from the exploding universe of discoverable data.

So, what do you think? Is predictive coding ready for prime time?  Can lawyers confidently take results from new search technology before a judge without fear of rejection? Please share any comments you might have or if you'd like to know more about a particular topic.