eDiscovery Case Law: Predictive Coding Considered by Judge in New York Case

In Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (S.D.N.Y. Feb. 8, 2012), Magistrate Judge Andrew J. Peck of the U.S. District Court for the Southern District of New York instructed the parties to submit proposals to adopt a protocol for e-discovery that includes the use of predictive coding, perhaps the first known case where a technology assisted review approach was considered by the court.

In this case, the plaintiff, Monique Da Silva Moore, filed a Title VII gender discrimination action against advertising conglomerate Publicis Groupe, on her own behalf and on behalf of other women alleged to have suffered discriminatory job reassignments, demotions and terminations.  Discovery proceeded to address whether Publicis Groupe:

  • Compensated female employees less than comparably situated males through salary, bonuses, or perks;
  • Precluded or delayed selection and promotion of females into higher level jobs held by male employees; and
  • Disproportionately terminated or reassigned female employees when the company was reorganized in 2008.

Consultants provided guidance to the plaintiffs and the court to develop a protocol to use iterative sample sets of 2,399 documents from a collection of 3 million documents to yield a 95 percent confidence level and a 2 percent margin of error (see our previous posts here, here and here on how to determine an appropriate sample size, randomly select files and conduct an iterative approach). In all, the parties expect to review between 15,000 and 20,000 files to create the “seed set” to be used to predictively code the remainder of the collection.
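
The arithmetic behind that 2,399 figure is the standard formula for estimating a proportion (Cochran's formula with a finite population correction). The sketch below is not drawn from the case protocol itself, just a hedged illustration of how the number falls out:

```python
def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Statistical sample size for estimating a proportion.

    z: z-score for the confidence level (1.96 ~ 95 percent).
    margin: desired margin of error (0.02 = 2 percent).
    p: assumed proportion; 0.5 is the most conservative choice.
    """
    # Cochran's formula for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shaves the requirement slightly
    return round(n0 / (1 + (n0 - 1) / population))

print(sample_size(3_000_000))  # → 2399, the figure in the protocol
```

Note how little the collection size matters: a billion-document collection would only push the sample to 2,401.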

The parties were instructed to submit their draft protocols by February 16th, which is today(!).  The February 8th hearing was attended by counsel and their respective ESI experts.  It will be interesting to see the draft protocols the parties submit and the opinion Judge Peck ultimately issues.

So, what do you think?  Should courts order the use of technology such as predictive coding in litigation?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: George Socha of Socha Consulting

 

This is the first of the 2012 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

  1. What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?
  2. Which trend(s), if any, haven’t emerged to this point like you thought they would?
  3. What are your general observations about LTNY this year and how it fits into emerging trends?
  4. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support. George has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey; last year he and Tom Gelbmann converted the Survey into Apersee, an online system for selecting eDiscovery providers and their offerings.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are nine active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the emerging trends in eDiscovery that will have the greatest impact in 2012?

I may have said this last year too, but it holds true even more this year – if there's an emerging trend, it's the trend of people talking about the emerging trend.  It started last year, and this year every person in the industry seems to be delivering an "emerging trend" message.  Not to be too crass about it, but often the message is, "Buy our stuff", a message that is not especially helpful.

Regarding actual emerging trends, each year we all try to sum up legal tech in two or three words.  The two words for this year would be “predictive coding.”  Use whatever name you want, but that's what everyone seems to be hawking and talking about at LegalTech this year.  This does not necessarily mean they really can deliver.  It doesn't mean they know what “predictive coding” is.  And it doesn't mean they've figured out what to do with “predictive coding.”  Having said that, expanding the use of machine assisted review capabilities as part of the e-discovery process is an important step forward.  It has also been a long time coming.  The earliest I can remember working with a client, doing what's now being called predictive coding, was in 2003.  A key difference is that at that time they had to create their own tools.  There wasn't really anything they could buy to help them with the process.

Which trend(s), if any, haven’t emerged to this point like you thought they would?

One thing I don't yet hear is discussion about using predictive coding capabilities as a tool to assist with determining what data to preserve in the first place.  Right now the focus is almost exclusively on what do you do once you’ve “teed up” data for review, and then how to use predictive coding to try to help with the review process.

Think about taking the predictive coding capabilities and using them early on to make defensible decisions about what to and what not to preserve and collect.  Then consider continuing to use those capabilities throughout the e-discovery process.  Finally, look into using those capabilities to more effectively analyze the data you're seeing, not just to determine relevance or privilege, but also to help you figure out how to handle the matter and what to do on a substantive level.

What are your general observations about LTNY this year and how it fits into emerging trends?

Well, LegalTech continues to be dominated by electronic discovery.  As a result, we tend to overlook whole worlds of technologies that can be used to support and enhance the practice of law.  It is unfortunate that, in our hyper-focus on e-discovery, we risk losing track of those other capabilities.

What are you working on that you’d like our readers to know about?

With regard to EDRM, we recently announced that we have hit key milestones in five projects.  Our EDRM Enron Email Data Set has now officially become an Amazon public dataset, which I think will mean wider use of the materials.

We announced the publication of our Model Code of Conduct, which was five years in the making.  We have four signatories so far, and are looking forward to seeing more organizations sign on.

We announced the publication of version 2.0 of our EDRM XML schema.  It's a tightened-up schema, reorganized so that it should be a bit easier to use and more efficient in operation.

With the Metrics project, we are beginning to add information to a database that we've developed to gather metrics, the objective being to make available metrics with an empirical basis, rather than the types of numbers bandied about today, where no one seems to know how they were arrived at. Also, last year the Uniform Task-Based Management System (UTBMS) code set for litigation was updated.  The codes used to track e-discovery activities were expanded from a single code, which covered not just e-discovery but other activities as well, to a number of codes based on the EDRM Metrics code set.

On the Information Governance Reference Model (IGRM) side, we recently published a joint white paper with ARMA.  The paper cross-maps EDRM's Information Governance Reference Model with ARMA's Generally Accepted Recordkeeping Principles (GARP).  We look forward to more collaborative materials coming out of the two organizations.

As for Apersee, we continue to allow consumers to search the data on the site for free, but we are no longer charging providers a fee for their information to be available.  Instead, we now have two sponsors and some advertising on the site.  This means that any provider can put information in, and everyone can search that information.  The more data that goes in, the more useful the searching process becomes.  All this fits our goal of creating a better way to match consumers with the providers who have the services, software, skills and expertise that the consumers actually need.

And on the consulting and testifying side, I continue to work with a broad array of law firms; corporate and governmental consumers of e-discovery services and software; and providers offering those capabilities.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: “Assisted” is the Key Word for Technology Assisted Review

 

As noted in our blog post entitled 2012 Predictions – By The Numbers, almost all of the sets of eDiscovery predictions we reviewed (9 out of 10) predicted a greater emphasis on Technology Assisted Review (TAR) in the coming year.  It was one of our predictions, as well.  And, during all three days at LegalTech New York (LTNY) a couple of weeks ago, sessions were conducted that addressed technology assisted review concepts and best practices.

While some equate technology assisted review with predictive coding, other technology approaches, such as conceptual clustering, are also increasing in popularity and qualify as TAR approaches as well.  However, for purposes of this blog post, we will focus on predictive coding.

Over a year ago, I attended a Virtual LegalTech session entitled Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and wrote a blog post from that entitled What the Heck is “Predictive Coding”?  The speakers for the session were Jason R. Baron, Maura Grossman and Bennett Borden (Jason and Bennett are previous thought leader interviewees on this blog).  The panel gave the best descriptive definition that I’ve seen yet for predictive coding, as follows:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”

It’s very cool technology and capable of efficient and accurate review of the document collection, saving costs without sacrificing quality of review (in some cases, it yields even better results than traditional manual review).  However, there is one key phrase in the definition above that can make or break the success of the predictive coding process: “based on human review of only a subset of the document collection”. 

Key to the success of any review effort, whether linear or technology assisted, is knowledge of the subject matter.  For linear review, knowledge of the subject matter usually results in preparation of high quality review instructions that (assuming the reviewers competently follow those instructions) result in a high quality review.  In the case of predictive coding, use of subject matter experts (SMEs) to review a core subset of documents (typically known as a “seed set”) and make determinations regarding that subset is what enables the technology in predictive coding to “predict” the responsiveness and importance of the remaining documents in the collection.  The more knowledgeable the SMEs are in creating the “seed set”, the more accurate the “predictions” will be.
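
As a toy illustration of the "seed set" idea (a hypothetical scorer for this post, not any vendor's actual algorithm), SME-reviewed documents can train a simple term-frequency model that ranks the rest of the collection from most to least likely responsive:

```python
import math
from collections import Counter

def train_seed_profile(seed_set):
    """seed_set: list of (text, is_responsive) pairs reviewed by SMEs."""
    resp, nonresp = Counter(), Counter()
    for text, is_responsive in seed_set:
        (resp if is_responsive else nonresp).update(text.lower().split())
    return resp, nonresp

def score(text, resp, nonresp):
    """Log-likelihood-style score: higher leans responsive."""
    r_total = sum(resp.values()) or 1
    n_total = sum(nonresp.values()) or 1
    s = 0.0
    for tok in text.lower().split():
        # add-one smoothing so unseen tokens don't zero out the score
        s += math.log((resp[tok] + 1) / r_total) - math.log((nonresp[tok] + 1) / n_total)
    return s

def rank_collection(docs, seed_set):
    resp, nonresp = train_seed_profile(seed_set)
    return sorted(docs, key=lambda d: score(d, resp, nonresp), reverse=True)

# Illustrative seed set: SMEs coded one document responsive, one not
seed = [("salary bonus disparity complaint", True),
        ("lunch menu cafeteria", False)]
ranked = rank_collection(["quarterly bonus and salary review",
                          "cafeteria lunch schedule"], seed)
print(ranked[0])  # → 'quarterly bonus and salary review'
```

The better the SME judgments in the seed set, the better the model's ordering of the unreviewed documents; garbage in, garbage out applies with full force here.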

And, as is the case with other processes such as document searching, sampling the results (by determining the appropriate sample size of responsive and non-responsive items, randomly selecting those samples and reviewing both groups – responsive and non-responsive – to test the results) will enable you to determine how effective the process was in predictively coding the document set.  If sampling shows that the process yielded inadequate results, take what you’ve learned from the sample set review and apply it to create a more accurate “seed set” for re-categorizing the document collection.  Sampling will enable you to defend the accuracy of the predictive coding process, while saving considerable review costs.
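
The sampling step described above can be sketched as stratified validation: draw random samples from the predicted-responsive and predicted-non-responsive piles, have a human review them, and compute the observed error rate with its margin of error (the function names here are illustrative, not from any review platform):

```python
import math
import random

def validate(predictions, sample_size, review_fn, z=1.96):
    """predictions: {doc_id: predicted_responsive_bool}.
    review_fn: a human reviewer's judgment for a sampled doc."""
    report = {}
    for label in (True, False):
        stratum = [d for d, p in predictions.items() if p == label]
        sample = random.sample(stratum, min(sample_size, len(stratum)))
        if not sample:
            continue
        errors = sum(1 for d in sample if review_fn(d) != label)
        rate = errors / len(sample)
        # normal-approximation margin of error for the observed rate
        margin = z * math.sqrt(rate * (1 - rate) / len(sample))
        report["responsive" if label else "non-responsive"] = (rate, margin)
    return report
```

If the observed error rate (plus its margin) is too high in either stratum, the seed set gets refined and the collection re-categorized, exactly as described above.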

So, what do you think?  Have you utilized predictive coding in any of your reviews?  How did it work for you?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Preparing Your 30(b)(6) Witnesses

 

When it comes to questions and potential issues that the receiving party may have about the discovery process of the producing party, one of the most common and direct methods for conducting “discovery about the discovery” is a deposition under Federal Rule 30(b)(6). This rule enables a party to serve a deposition notice on the entity involved in the litigation rather than an individual. The notice identifies the topics to be covered in the deposition, and the entity being deposed must designate one or more people qualified to answer questions on the identified topics.

While those designated to testify may not necessarily have day-to-day responsibility related to the identified topics, they must be educated enough in those issues to sufficiently address them during the testimony. Serving a deposition notice on the entity under Federal Rule 30(b)(6) saves the deposing party from having to identify specific individual(s) to depose while still enabling the topics to be fully explored in a single deposition.

Topics to be covered in a 30(b)(6) deposition can vary widely, depending on the facts and circumstances of the case. However, there are some typical topics that the deponent(s) should be prepared to address.

Legal Hold Process: Perhaps the most common area of focus in a 30(b)(6) deposition is the legal hold process, as spoliation can occur when that process is unsound, and data spoliation is the most common cause of sanctions resulting from the eDiscovery process.  Issues to address include:

  • General description of the legal hold process including all details of that policy and specific steps that were taken in this case to effectuate a hold.
  • Timing of issuing the legal hold and to whom it was issued.
  • Substance of the legal hold communication (if the communication is not considered privileged).
  • Process for selecting sources for legal hold, identification of sources that were eliminated from legal hold, and a description of the rationale behind those decisions.
  • Tracking and follow-up with the legal hold sources to ensure understanding and compliance with the hold process.
  • Whether there are any processes in place in the company to automatically delete data and, if so, what steps were taken to disable them and when those steps were taken.

Collection Process: Logically, the next eDiscovery step discussed in the 30(b)(6) deposition is the process for collecting preserved data:

  • Method of collecting ESI for review, including whether the method preserved all relevant metadata intact.
  • Chain of custody tracking from origination to destination.

Searching and Culling: Once the ESI is collected, the methods for conducting searches and culling the collection down for review must be discussed:

  • Method used to cull the ESI prior to review, including the tools used, the search criteria for inclusion in review and how the search criteria were developed (including potential use of subject matter experts to flesh out search terms).
  • Process for testing and refining search terms used.

Review Process: The 30(b)(6) witness(es) should be prepared to fully describe the review process, including:

  • Methods to conduct review of the ESI including review application(s) used and workflow associated with the review process.
  • Use of technology to assist with the review, such as clustering, predictive coding, duplicate and near-duplicate identification.
  • To the extent the process can be described, methodology for identifying and documenting privileged ESI on the privilege log (this methodology may be important if the producing party requests to “claw back” any inadvertently produced privileged ESI).
  • Personnel employed to conduct ESI review, including their qualifications, experience, and training.

Production Process: Information regarding the production process, including:

  • Methodology for organizing and verifying the production, including confirmation of file counts and spot QC checks of produced files for content.
  • The total volume of ESI collected, reviewed, and produced.

Depending on the specifics of the case and discovery efforts, there may be further topics to be addressed to ensure that the producing party has met its preservation and discovery obligations.

So, what do you think?  Have you had to prepare 30(b)(6) witnesses for deposition?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Joshua Poje

 

This is the fourth of our Holiday Thought Leader Interview series. I interviewed several thought leaders to get their perspectives on various eDiscovery topics.

Today’s thought leader is Joshua Poje.  Joshua is a Research Specialist with the American Bar Association’s Legal Technology Resource Center, which publishes the Annual Legal Technology Survey. He is a graduate of DePaul University College of Law and Augustana College. 

Why does the American Bar Association produce an annual legal technology survey? Why does legal technology demand special attention?

Technology is inescapable for lawyers today. It's integrated into most aspects of the profession, whether that's communicating with clients, interacting with the courts, or marketing a practice. At this point, if you want to understand how lawyers are practicing, you really have to understand how they're using technology.

That's what we're trying to measure with our survey and that's also the reason we direct our survey questionnaires to practicing attorneys rather than to IT staff or vendors. We aren't just interested in learning what tools are on the market or what technology firms are purchasing; we want to know what they're actually using.

How long have you been involved with the ABA Legal Technology Survey, and how has it changed in that time?

The 2011 ABA Legal Technology Survey Report is the fifth edition I've worked on personally, but the survey has been running in various forms for more than 15 years. Aside from moving to electronic publishing via PDF in 2008, the biggest change we've made in the time I've been here was adding a sixth volume–Technology Basics. That volume allowed us to take a deeper dive into basic questions about budgeting, training, and security.

Aside from that, most of the changes in the survey are evolutionary. We sit down every Fall and evaluate the questionnaire, sometimes adding a few questions about new technology and sometimes dropping questions about technology that's fallen out of use. We try to maintain a high level of consistency from year-to-year so that we can take a meaningful look at trends.

Lawyers have a reputation for being late adopters of technology and even technophobic in many respects. Is this an accurate assessment? Has that changed, or is there still an element of truth to the stereotype?

Lawyers are in a difficult position when it comes to new technology. Normal businesses and organizations have to deal with issues like cost, training, and implementation obstacles when they adopt new technology, and the biggest risk is usually just losing money. Lawyers share those challenges and risks, but also have to consider their obligations under their states' rules of professional conduct. A misstep under the rules can have serious and long-lasting professional consequences. So I think it's understandable that some lawyers take a cautious approach to new technology.

That said, lawyers have certainly become more comfortable with new technology over the last few years. Take Twitter, for example. A recent Pew study found that 13 percent of online adults use Twitter. That's right in line with our 2011 survey, where 14 percent of our lawyer respondents reported using Twitter for personal, non-professional purposes. Around 6 percent even use it for professional activities.

In some cases, lawyers actually seem to be leading on technology. A Nielsen study from May 2011 found that just 5 percent of US consumers own a tablet device like the iPad. In our survey, 20 percent of our respondents reported having tablets available at their firms with 12 percent reporting that they personally use the devices.

There seems to be a new trend or buzzword every few years that dominates the legal technology conversation.  At one point it was all about knowledge management, now it seems to be cloud computing, and next it will be whatever comes along after that.  Do you get the sense legal technologists are prone to getting taken in by hype?  Or are they generally practical consumers of technology?

The endless hype cycle is just a reality of the technology sector, legal or otherwise. I think our challenge as legal technology professionals is to navigate the hype to identify the useful, practical tools and strategies that lawyers and other legal professionals can put to good use. We also have to be on alert for the technology that might be problematic for lawyers, given the rules of professional conduct.

There are certainly times when the technology we love doesn't catch on with practicing attorneys. Technology experts have been pushing RSS for years, and yet in 2011 we still had 64 percent of our respondents report that they never use it. But on the other hand, "paperless" was the hot buzzword five or six years ago, and now it's a standard strategy at many law firms of all sizes.

Have the demands of eDiscovery forced the profession to come to grips with their own technology use? Are lawyers more savvy about managing their data?

eDiscovery has certainly been influential for some attorneys, but it's worth noting that 42 percent of our respondents in 2011 reported that they never receive eDiscovery requests on behalf of their clients, and 49 percent reported that they never make eDiscovery requests. Those numbers have barely moved over the last few years.

As you might expect, electronically stored information (ESI) has generally been a bigger concern at the large law firms. In 2011, 77 percent of respondents at firms with 500+ attorneys reported that their firm had been involved in a case requiring processing/review of ESI, compared to just 19 percent of solo practitioners. Those large firms, however, outsource a significant amount of their eDiscovery processing. In 2011, 62 percent reported outsourcing eDiscovery processing to eDiscovery consultants, 50 percent outsourced to computer forensics specialists, and 35 percent outsourced to other lawyers in the U.S.

What trends and technologies are you most interested in following in the next survey?

Cloud computing is definitely a topic to keep an eye on. In 2011, 76 percent of our respondents reported that they had never used a cloud-based tool for legal tasks. Of those, 63 percent cited unfamiliarity with the technology as a reason. A lot of attention has been focused on the cloud this year, though, particularly after Apple's iCloud announcement. It'll be interesting to see how those numbers move in 2012.

Mobile technology should be another interesting area. BlackBerry held onto the overall lead for smartphones in 2011, but iOS and Android made substantial gains. Among our solo and small firm respondents, the iPhone actually led the BlackBerry. Will that carry over to the larger firms in 2012? And on the tablet front, it should be interesting to see how the market shifts. In 2011, 96 percent of the respondents who reported having a tablet available specified the iPad. Apple now has competition from Motorola, Samsung, RIM, HP and others, so it's possible we could see movement in the numbers.

Thanks, Joshua, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Case Law: Another Losing Plaintiff Taxed for eDiscovery Costs

As noted yesterday and back in May, prevailing defendants are becoming increasingly successful in obtaining awards against plaintiffs for reimbursement of eDiscovery costs.

An award of costs to the successful defendants in a patent infringement action included $64,295 in costs for conversion of data to TIFF format and $5,950 for an eDiscovery project manager in Jardin v. DATAllegro, Inc., No. 08-CV-1462-IEG (WVG), (S.D. Cal. Oct. 12, 2011).

Defendants in a patent infringement action obtained summary judgment of non-infringement and submitted bills of costs that included $64,295 in costs for conversion of data to TIFF format and $5,950 for an eDiscovery project manager. Plaintiff contended that the costs should be denied because he had litigated the action and its difficult issues in good faith and there was a significant economic disparity between him and the corporate parent of one of the defendants.

The court concluded that plaintiff had failed to rebut the presumption in Fed. R. Civ. P. 54 in favor of awarding costs. The action was resolved through summary judgment rather than a complicated trial, and there was no case law suggesting that the assets of a parent corporation should be considered in assessing costs. The financial position of the party having to pay the costs might be relevant, but it appeared plaintiff was the founder of a company that had been sold for $500 million.

Taxing of costs for converting files to TIFF format was appropriate, according to the court, because the Federal Rules required production of electronically stored information and “a categorical rule prohibiting costs for converting data into an accessible, readable, and searchable format would ignore the practical realities of discovery in modern litigation.” The court stated: “Therefore, where the circumstances of a particular case necessitate converting e-data from various native formats to the .TIFF or another format accessible to all parties, costs stemming from the process of that conversion are taxable exemplification costs under 28 U.S.C. § 1920(4).”

The court also rejected plaintiff’s argument that costs associated with an eDiscovery “project manager” were not taxable because they related to the intellectual effort involved in document production:

Here, the project manager did not review documents or contribute to any strategic decision-making; he oversaw the process of converting data to the .TIFF format to prevent inconsistent or duplicative processing. Because the project manager’s duties were limited to the physical production of data, the related costs are recoverable.

So, what do you think?  Will more prevailing defendants seek to recover eDiscovery costs from plaintiffs? Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required).  For eDiscovery news and best practices, check out the Applied Discovery Blog here.

eDiscovery Best Practices: Production is the “Ringo” of the eDiscovery Phases

 

Since eDiscovery Daily debuted over 14 months ago, we’ve covered a lot of case law decisions related to eDiscovery.  65 posts related to case law to date, in fact.  We’ve covered cases associated with sanctions related to failure to preserve data, issues associated with incomplete collections, inadequate searching methodologies, and inadvertent disclosures of privileged documents, among other things.  We’ve noted that 80% of the costs associated with eDiscovery are in the Review phase and that volume of data and sources from which to retrieve it (including social media and “cloud” repositories) are growing exponentially.  Most of the “press” associated with eDiscovery ranges from the “left side of the EDRM model” (i.e., Information Management, Identification, Preservation, Collection) through the stages to prepare materials for production (i.e., Processing, Review and Analysis).

All of those phases lead to one inevitable stage in eDiscovery: Production.  Yet, few people talk about the actual production step.  If Preservation, Collection and Review are the “John”, “Paul” and “George” of the eDiscovery process, Production is “Ringo”.

It’s the final crucial step in the process, and if it’s not handled correctly, all of the due diligence spent in the earlier phases could mean nothing.  So, it’s important to plan for production up front and to apply a number of quality control (QC) checks to the actual production set to ensure that the production process goes as smoothly as possible.

Planning for Production Up Front

When discussing the production requirements with opposing counsel, it’s important to ensure that those requirements make sense, not only from a legal standpoint, but from a technical standpoint as well.  Involve litigation support and IT personnel in the process of deciding those parameters, as they will be the people who have to meet them.  Issues to be addressed include, but are not limited to:

  • Format of production (e.g., paper, images or native files);
  • Organization of files (e.g., organized by custodian, legal issue, etc.);
  • Numbering scheme (e.g., Bates labels for images, sequential file names for native files);
  • Handling of confidential and privileged documents, including log requirements and stamps to be applied;
  • Handling of redactions;
  • Format and content of production log;
  • Production media (e.g., CD, DVD, portable hard drive, FTP, etc.).

I was involved in a case recently where opposing counsel was requesting an unusual production format where the names of the files would be the subject line of the emails being produced (for example, “Re: Completed Contract, dated 12/01/2011”).  Two issues with that approach: 1) The proposed format only addressed emails, and 2) Windows file names don’t support certain characters, such as colons (:) or slashes (/).  I provided that feedback to the attorneys so that they could address it with opposing counsel and, hopefully, agree on a revised format that made more sense.  So, let the tech folks confirm the feasibility of the production parameters.
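One way the tech folks can vet a naming convention like that is to run the proposed names through a quick sanitization check.  Here’s a minimal Python sketch of the idea; the `email_to_filename` helper, its hyphen replacement character, and the length cap are illustrative assumptions on my part, not anything agreed to in the case:

```python
import re

# Characters that are invalid in Windows file names: \ / : * ? " < > |
FORBIDDEN = r'[\\/:*?"<>|]'

def email_to_filename(subject, date, max_len=120):
    """Build a Windows-safe file name from an email's subject and date
    (hypothetical helper illustrating the sanitization involved)."""
    name = f"{subject}, dated {date}"
    name = re.sub(FORBIDDEN, '-', name)        # swap out forbidden characters
    name = re.sub(r'\s+', ' ', name).strip()   # collapse stray whitespace
    # Windows also rejects trailing dots and spaces, and very long paths
    return name[:max_len].rstrip('. ')

print(email_to_filename("Re: Completed Contract", "12/01/2011"))
# "Re- Completed Contract, dated 12-01-2011"
```

Even a sketch like this surfaces the problems immediately: the colon in “Re:” and the slashes in the date both have to be replaced, so the produced names can never exactly match the subject lines as requested.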

The workflow throughout the eDiscovery process should also keep in mind the end goal of meeting the agreed upon production requirements.  For example, if you’re producing native files with metadata, you may need to take appropriate steps to keep the metadata intact during the collection and review process so that the metadata is not inadvertently changed. For some file types, metadata is changed merely by opening the file, so it may be necessary to collect the files in a forensically sound manner and conduct review using copies of the files to keep the originals intact.

Tomorrow, we will talk about preparing the production set and performing QC checks to ensure that the ESI being produced to the requesting party is complete and accurate.

So, what do you think?  Have you had issues with production planning in your cases?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Rewind: Eleven for 11-11-11

 

Since today is one of only 12 days this century when the month, day and year are the same two-digit numbers (not to mention the biggest day for “craps” players to hit Las Vegas since July 7, 2007!), it seems an appropriate time to look back at some of our recent topics.  So, in case you missed them, here are eleven of our recent posts that cover topics that hopefully make eDiscovery less of a “gamble” for you!

eDiscovery Best Practices: Testing Your Search Using Sampling: On April 1, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator. On April 4, we talked about how to make sure the sample set is randomly selected. In this post, we’ll walk through an example of how you can test and refine a search using sampling.
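For reference, the standard proportion-based formula behind those sample size calculators fits in a few lines of Python.  This is a generic sketch, not the specific calculator discussed in the post; the `sample_size` helper name and the table of approximate z-scores are my own:

```python
import math

# Approximate two-tailed z-scores for common confidence levels
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(confidence, margin, population, p=0.5):
    """Sample size for estimating a proportion at a given confidence
    level and margin of error, with the finite-population correction.
    p=0.5 is the most conservative (largest sample) assumption."""
    z = Z[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

print(sample_size(0.95, 0.02, 3_000_000))  # about 2,400 documents
```

Note how little the required sample grows with collection size: at a 95 percent confidence level and a 2 percent margin of error, a collection of 3 million documents needs roughly the same sample as one ten times larger, which is what makes sampling such an economical QC technique.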

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think: Here’s a sample scenario: You identify custodians relevant to the case and collect files from each. Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!! What happened?!?

eDiscovery Trends: Why Predictive Coding is a Hot Topic: Last month, we considered a recent article about the use of predictive coding in litigation by Judge Andrew Peck, United States magistrate judge for the Southern District of New York. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

eDiscovery Best Practices: Does Anybody Really Know What Time It Is?: Does anybody really know what time it is? Does anybody really care? OK, it’s an old song by Chicago (back then, they were known as the Chicago Transit Authority). But, the question of what time it really is has a significant effect on how eDiscovery is handled.

eDiscovery Best Practices: Message Thread Review Saves Costs and Improves Consistency: Insanity is doing the same thing over and over again and expecting a different result. But, in ESI review, it can be even worse when you get a different result. Most email messages are part of a larger discussion, which could be just between two parties, or include a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.

eDiscovery Best Practices: When Collecting, Image is Not Always Everything: There was a commercial in the early 1990s for Canon cameras in which tennis player Andre Agassi uttered the quote that would haunt him for most of his early career – “Image is everything.” When it comes to eDiscovery preservation and collection, there are times when “Image is everything”, as in a forensic “image” of the media is necessary to preserve all potentially responsive ESI. However, forensic imaging of media is usually not necessary for discovery purposes.

eDiscovery Trends: If You Use Auto-Delete, Know When to Turn It Off: Federal Rule of Civil Procedure 37(f), adopted in 2006, is known as the “safe harbor” rule. While it’s not always clear to what extent “safe harbor” protection extends, one case from a few years ago, Disability Rights Council of Greater Washington v. Washington Metrop. Trans. Auth., D.D.C. June 2007, seemed to indicate where it does NOT extend – auto-deletion of emails.

eDiscovery Best Practices: Checking for Malware is the First Step to eDiscovery Processing: A little over a month ago, I noted that we hadn’t missed a (business) day yet in publishing a post for the blog. That streak almost came to an end back in May. As I often do in the early mornings before getting ready for work, I spent some time searching for articles to read and identifying potential blog topics and found a link on a site related to “New Federal Rules”. Curious, I clicked on it and…up popped a pop-up window from our virus checking software (AVG Anti-Virus, or so I thought) that the site had found a file containing a “trojan horse” program. The odd thing about the pop-up window is that there was no “Fix” button to fix the trojan horse. So, I chose the best available option to move it to the vault. Then, all hell broke loose.

eDiscovery Trends: An Insufficient Password Will Thwart Even The Most Secure Site: Several months ago, we talked about how most litigators have come to accept that Software-as-a-Service (SaaS) systems are secure. However, according to a recent study by the Ponemon Institute, the chance of any business being hacked in the next 12 months is a “statistical certainty”. No matter how secure a system is, whether it’s local to your office or stored in the “cloud”, an insufficient password that can be easily guessed can allow hackers to get in and steal your data.

eDiscovery Trends: Social Media Lessons Learned Through Football: The NFL Football season began back in September with the kick-off game pitting the last two Super Bowl winners – the New Orleans Saints and the Green Bay Packers – against each other to start the season. An incident associated with my team – the Houston Texans – recently illustrated the issues associated with employees’ use of social media sites, which are being faced by every organization these days and can have eDiscovery impact as social media content has been ruled discoverable in many cases across the country.

eDiscovery Strategy: "Command" Model of eDiscovery Must Make Way for Collaboration: In her article "E-Discovery 'Command' Culture Must Collapse" (via Law Technology News), Monica Bay discusses the old “command” style of eDiscovery, with a senior partner leading his “troops” like General George Patton – a model that summit speakers agree is "doomed to failure" – and reports on the findings put forward by judges and litigators that the time has come for true collaboration.

So, what do you think?  Did you learn something from one of these topics?  If so, which one?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscoveryDaily would like to thank all veterans and the men and women serving in our armed forces for the sacrifices you make for our country.  Thanks to all of you and your families and have a happy and safe Veterans Day!

eDiscovery Trends: When you DE-NIST, A Lot May Be Missed

 

eDiscovery Daily has referenced several articles in the past by Craig Ball, including this one and this one, and also conducted a thought leader interview with him at LegalTech New York earlier this year.  Craig regularly has great observations about eDiscovery trends that are not talked about in other forums, so I try to “keep tabs” on his articles and provide some of those useful insights to this blog.

Last week on his blog, “Ball in your court”, Craig discussed shortcomings associated with “DE-NISTing”, which is the process of removing files from review that are standard components of the computer’s operating system and off-the-shelf software applications such as Microsoft Office applications.  There’s no need to review these files as they are considered system files and would not generally contain work product of the user.  These files are identified by their known HASH values that uniquely identify their content and matched against a list maintained by the National Software Reference Library, a branch of the National Institute of Standards and Technology (NIST – hence the term “DE-NISTing” to reference removing these files from the review set).
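Conceptually, DE-NISTing boils down to hashing each collected file and dropping any file whose hash appears on the known-file list.  Here’s a minimal Python sketch of that matching step, assuming a set of NSRL hashes has already been loaded; the `file_sha1` and `de_nist` helper names are hypothetical, not part of any vendor’s tool:

```python
import hashlib

def file_sha1(path, chunk=1 << 20):
    """SHA-1 of a file's content, read in chunks.  (The NSRL publishes
    hashes of known operating system and application files.)"""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest().upper()

def de_nist(paths, known_hashes):
    """Keep only the files whose hash is NOT on the known-file list."""
    return [p for p in paths if file_sha1(p) not in known_hashes]
```

The key property of hash matching is that it identifies files by content, not by name: a system file renamed to “budget.xlsx” is still culled, and a user document renamed to “ntdll.dll” is still kept.  But as Craig’s test shows, the match is only as good as the list being matched against.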

While the NIST list is updated four times per year, Craig noted that a number of these system files were not being removed during the “DE-NISTing” process on workstations running Windows 7 and the latest release of Microsoft Office.  So, he ran a test by performing a “pristine install” of Windows 7 on a “sterile” hard drive, which resulted in 47,690 files.  Of those, only 7,277 were removed during “DE-NISTing”, meaning that 85% of the files were not removed during this process and could be left in the review set if not removed via any other means.

Why were so many files missed?  Evidently, the NIST list does not yet include Windows 7 files, despite the fact that there are more than 350 million workstations that run Windows 7.  It also doesn’t include Microsoft Office 2010 files yet either.  So, the NIST list is not as up to date as it could be.

As a result, several service providers supplement the NIST list with other files, but as Craig notes, it’s important to be able to trace and defend the supplemented list if required and not try to pass it off as the official NIST list (which Craig likens to selling a “Prada knockoff”).

Supplementing the NIST list by removing system files such as EXE and DLL files is a clearly documentable method to reduce the number of files in the review set.  This method doesn’t depend on HASH values and, assuming that these file types are not responsive (which is usually the case), can be an effective method for eliminating files to review.
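That extension-based supplement can be as simple as the Python sketch below.  The extension list shown is purely illustrative; as Craig advises, the actual list used for a given matter should be documented and defensible:

```python
from pathlib import Path

# Extensions typically treated as non-responsive program/system files.
# This sample list is illustrative only -- document and defend the real one.
SYSTEM_EXTENSIONS = {'.exe', '.dll', '.sys', '.drv', '.ocx'}

def supplemental_cull(paths):
    """Remove files by extension alone -- no hash list required."""
    return [p for p in paths if Path(p).suffix.lower() not in SYSTEM_EXTENSIONS]

print(supplemental_cull(['memo.docx', 'setup.EXE', 'msvcrt.dll']))
# ['memo.docx']
```

Unlike hash matching, this approach cannot distinguish a stock DLL from a custom one, which is exactly why it should be presented as a documented supplement rather than passed off as the official NIST list.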

So, what do you think? Do you depend on the NIST list to remove files from review sets?  Do you use any supplemental methods for further reducing these sets?  Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: Cloud Covered by Ball

 

What is the cloud, why is it becoming so popular and why is it important to eDiscovery? These are the questions being addressed—and very ably answered—in the recent article Cloud Cover (via Law Technology News) by computer forensics and eDiscovery expert Craig Ball, a previous thought leader interviewee on this blog.

Ball believes that the fears about cloud data security are easily dismissed when considering that “neither local storage nor on-premises data centers have proved immune to failure and breach”. And as far as the cloud's importance to the law and to eDiscovery, he says, "the cloud is re-inventing electronic data discovery in marvelous new ways while most lawyers are still grappling with the old."

What kinds of marvelous new ways, and what do they mean for the future of eDiscovery?

What is the Cloud?

First we have to understand just what the cloud is.  The cloud is more than just the Internet, although it's that, too. In fact, what we call "the cloud" is made up of three on-demand services:

  • Software as a Service (SaaS) covers web-based software that performs tasks you once carried out on your computer's own hard drive, without requiring you to perform your own backups or updates. If you check your email on the web with Hotmail or Gmail or keep a Google calendar, you're using SaaS.
  • Platform as a Service (PaaS) happens when companies or individuals rent virtual machines (VMs) to test software applications or to run processes that take up too much hard drive space to run on real machines.
  • Infrastructure as a Service (IaaS) encompasses the use and configuration of virtual machines or hard drive space in whatever manner you need to store, sort, or operate your electronic information.

These three models combine to make up the cloud, a virtual space where electronic storage and processing is faster, easier and more affordable.

How the Cloud Will Change eDiscovery

One reason that processing is faster is through distributed processing, which Ball calls “going wide”.  Here’s his analogy:

“Remember that scene in The Matrix where Neo and Trinity arm themselves from gun racks that appear out of nowhere? That's what it's like to go wide in the cloud. Cloud computing makes it possible to conjure up hundreds of virtual machines and make short work of complex computing tasks. Need a supercomputer-like array of VMs for a day? No problem. When the grunt work's done, those VMs pop like soap bubbles, and usage fees cease. There's no capital expenditure, no amortization, no idle capacity. Want to try the latest concept search tool? There's nothing to buy! Just throw the tool up on a VM and point it at the data.”

Because the cloud is entirely virtual, operating on servers whose locations are unknown and mostly irrelevant, it throws the rules for eDiscovery right out the metaphorical window.

Ball also believes that everything changes once discoverable information goes into the cloud. "Bringing ESI beneath one big tent narrows the gap between retention policy and practice and fosters compatible forms of ESI across web-enabled applications".

"Moving ESI to the cloud," Ball adds, "also spells an end to computer forensics." Where there are no hard drives, there can be no artifacts of deleted information—so, deleted really means deleted.

What's more, “[c]loud computing makes collection unnecessary”. Where discovery requires that information be collected to guarantee its preservation, putting a hold on ESI located in the cloud will safely keep any users from destroying it. And because cloud computing allows for faster processing than can be accomplished on a regular hard drive, the search for discovery documents will move to where they're located, in the cloud. Not only will this approach be easier, it will also save money.

Ball concludes his analysis with the statement, "That e-discovery will live primarily in the cloud isn't a question of whether but when."

So, what do you think? Is cloud computing the future of eDiscovery? Is that future already here? Please share any comments you might have or if you'd like to know more about a particular topic.