Electronic Discovery

eDiscovery Best Practices: Determining Appropriate Sample Size to Test Your Search


We’ve talked about searching best practices quite a bit on this blog.  One component of searching best practices (part of the “STARR” approach I described in an earlier post) is to test your search results (both the result set and the files not retrieved) to determine whether the search you performed is effective at maximizing both precision and recall to the extent possible, so that you retrieve as many responsive files as possible without having to review too many non-responsive ones.  One question I often get is: how many files do you need to review to test the search?

If you remember from statistics class in high school or college, statistical sampling is choosing a portion of the results population at random for inspection to gather information about the population as a whole.  This saves considerable time, effort and cost over reviewing every item in the results population and enables you to obtain a “confidence level” that the characteristics of your sample reflect the population as a whole.  Statistical sampling is used for everything from exit polls that predict elections to marketing surveys that gauge brand popularity, and it is a generally accepted method of drawing conclusions about an overall results population.  You can sample a small portion of a large set to obtain a 95% or 99% confidence level in your findings (with a margin of error, of course).

So, does that mean you have to find your old statistics book and dust off your calculator or (gasp!) slide rule?  Thankfully, no.

There are several sites that provide sample size calculators, including this one.  You’ll simply need to identify a desired confidence level (typically 95% to 99%), an acceptable margin of error (typically 5% or less), and the population size.

So, if you perform a search that retrieves 100,000 files and you want a sample size that provides a 99% confidence level with a margin of error of 5%, you’ll need to review 660 of the retrieved files to achieve that level of confidence in your sample (only 383 files if a 95% confidence level will do).  If 1,000,000 files were not retrieved, you would only need to review 664 of the not-retrieved files to achieve that same level of confidence (99%, with a 5% margin of error) in your sample.  As you can see, the sample size doesn’t need to increase much as the population gets really large, and you can review a relatively small subset to understand your collection and defend your search methodology to the court.
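If you’re curious what those calculators are doing under the hood, here’s a minimal sketch in Python of the standard approach: Cochran’s sample size formula with a finite population correction, assuming the worst-case response distribution of 50% (the hard-coded z-scores are the standard two-tailed values for each confidence level):

```python
import math

# Two-tailed z-scores for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population, confidence=0.95, margin_of_error=0.05, p=0.5):
    """Cochran's formula with a finite population correction; p = 0.5 is
    the worst-case (most conservative) response distribution."""
    z = Z_SCORES[confidence]
    # Required sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Correct for the actual (finite) population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(sample_size(100_000, confidence=0.99))    # 660 retrieved files
print(sample_size(100_000, confidence=0.95))    # 383 files
print(sample_size(1_000_000, confidence=0.99))  # 664 not-retrieved files
```

The outputs match the numbers above; note how slowly the required sample grows as the population increases.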

On Monday, we will talk about how to randomly select the files to review for your sample.  Same bat time, same bat channel!

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: The Evaluation Process


Sometimes selecting a service provider for a project will be a quick, easy process.  You may have a small project, similar to others you’ve handled, that you need to get up and running quickly.  If you have a list of good vendors with which you’ve worked, it may be as easy as a phone call or two to check availability and you’ll be all set.  In other cases, your selection process may be more involved.  Perhaps you are looking to build a preferred vendor program, or you’ve got a large case involving many stakeholders who expect a thorough evaluation.  When that’s the case, here’s a suggested approach:

  1. Make a list of candidates:  Include vendors that have done a good job for you in the past.   Ask peers in the industry for suggestions.  In some cases, stakeholders may ask you to consider vendors with which they have a relationship.
  2. Make initial calls:  Call each vendor to get general information, to ensure they don’t have a conflict of interest, and to gauge their availability and interest in the project.  Revise the list if necessary.
  3. Send out Request for Proposal (RFP) / Request for Information (RFI):  In the next posts in this series, we’ll talk about these documents, so stay tuned.
  4. Review the responses:  Check them for completeness.  If there are holes, you can request the missing information, or you might consider scratching a vendor from the list if there was blatant disregard for the requirements.
  5. Follow-up:  You’ll probably have questions about every proposal, and you’ll want to clarify some points with each vendor.  And, there may be some points you’ll want to negotiate.  Even if a proposal is clear and doesn’t require an explanation, it’s useful to verify your understanding of approach and pricing.
  6. Rank each vendor:  List each evaluation point by importance, and rank each vendor on each point (see the sketch after this list).  While this is an important step and a valuable tool, don’t let it replace good judgment.  Sometimes your instincts may tell you something different than the rankings do, and that should not be ignored!
  7. Check references for the vendors of most interest.  Later in this series, we’ll talk about effectively checking references.
  8. Make your selection (or your recommendation to the stakeholders).
  9. Notify the vendor you’ve selected and agree to a contract.
  10. Contact the other vendors and tell them they were not selected.
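For step 6, a simple weighted scorecard is often all you need.  Here’s a minimal sketch in Python; the criteria, weights, and 1-to-5 scores are hypothetical placeholders for illustration, not a prescribed rubric:

```python
# Importance weight for each evaluation point (hypothetical values)
weights = {"pricing": 5, "quality": 5, "scalability": 4, "experience": 3}

# Each vendor's 1-to-5 score on each point (hypothetical values)
vendor_scores = {
    "Vendor A": {"pricing": 4, "quality": 5, "scalability": 3, "experience": 5},
    "Vendor B": {"pricing": 5, "quality": 3, "scalability": 4, "experience": 4},
}

def weighted_total(scores):
    # Multiply each score by its importance weight and sum the results
    return sum(weights[point] * score for point, score in scores.items())

# Rank vendors from highest weighted total to lowest
for vendor in sorted(vendor_scores,
                     key=lambda v: weighted_total(vendor_scores[v]),
                     reverse=True):
    print(f"{vendor}: {weighted_total(vendor_scores[vendor])}")
```

As noted above, treat the resulting rankings as a tool, not a substitute for judgment.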

What has been your experience with evaluating and selecting service providers?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Case Law: Destroy Data, Pay $1 Million, Lose Case

A federal judge in Chicago has levied sanctions against Rosenthal Collins Group LLC and granted a default judgment to the defendant for misconduct in a patent infringement case, also ordering the Chicago-based futures broker’s counsel to pay “the costs and attorneys fees incurred in litigating this motion.”  The plaintiff’s agent had modified metadata related to relevant source code and wiped several relevant disks and devices prior to their production, and the court found that counsel participated in “presenting misleading, false information, materially altered evidence and willful non-compliance with the Court’s orders.”

In Rosenthal Collins Group, LLC v. Trading Techs. Int’l, No. 05 C 4088, 2011 WL 722467 (N.D. Ill. Feb. 23, 2011), U.S. District Judge Sharon Johnson Coleman assessed a $1 million sanction against Rosenthal Collins Group (RCG) and granted defendant/counter-plaintiff Trading Technologies’ (TT) motion for evidentiary sanctions and default judgment.  Much of the blame lay with the actions of RCG’s agent, Walter Buist.  Here’s why:

  • During Buist’s deposition, he admitted to “turning back the clock” to change the “last modified” date on the previously modified source code to make it appear that the modifications had occurred much earlier.  Despite clear evidence of these facts, RCG continued to deny them, even calling the claims “libelous,” “audacious,” and “Oliver Stone-esque.”
  • Buist also later admitted “wiping” six of seven zip disks that originally contained the relevant source code.  While he did not admit wiping the seventh disk, it was also wiped, and the Court found that it was “impossible to believe that it is merely coincidence that the seventh disk happened to be wiped on May 2, 2006, which just happened to be the same day that TT was scheduled to inspect it.”
  • The Court found that there was evidence that “virtually every piece of media ordered produced by the Court in May 2007 and July 2008 was wiped, altered, or destroyed.”
  • Despite RCG’s (and its counsel’s) attempts to distance itself from “its own agent, employed for the purposes of pursuing this litigation” and to disavow any “actual knowledge” of wrongdoing, Buist was RCG’s agent and, therefore, RCG was bound by his behavior and actions.
  • Even if RCG and its counsel had no knowledge of the destruction of the evidence, the destruction might have been avoided if RCG had complied in a timely manner with the Court’s orders to produce the materials and/or preserved the evidence by taking custody of it.

So, what do you think?  Should parties and their counsel be liable for the actions of an agent on their behalf?  Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Other Evaluation Criteria


In the last posts in this blog series, we talked about evaluating service provider pricing, quality, scalability and flexibility.  There are a few other criteria you may wish to consider as well, which may be especially significant for large, long-term projects or relationships.  Those criteria are:

  1. Litigation Experience:  Select a service provider that has litigation experience, not just general business experience.  A non-litigation service provider that does scanning, for example, may be able to meet your technical requirements.  It is probably not, however, accustomed to the inflexible schedules and changing priorities that are commonplace in litigation work.
  2. Corporate Profile and Tenure:  For a large project, be sure to select a service provider that’s been around for a while and has a proven track record.  You want to be confident that the service provider that starts your project will be around to finish your project.
  3. Security and Confidentiality:  You want to ensure that your documents, data, and information are secure and kept confidential.  This means that you require a secure physical facility, secure systems, and appropriate confidentiality guidelines and agreements.
  4. SaaS Service Providers:  For these, you need to evaluate the technology itself and ensure that it includes the features you require, that those features are easy to access and use, and that access, system reliability, system speed, and system security meet your requirements.
  5. Facility Location and Accessibility:  For many projects and many types of services, it won’t be necessary to spend time at the project site.  For other projects, that might not be the case.  For example, if a service provider is staffing a large document review project at its facility, the litigation team may need to spend time at the facility overseeing work and doing quality control reviews.  In such a case, the geographic location and the facility’s access to airports and hotels may be a consideration.

A lot goes into selecting the right service provider for a project, and it’s worth the time and effort to do a careful, thorough evaluation.  In the next posts in this series, we’ll discuss the vendor evaluation and selection process.

What has been your experience with evaluating and selecting service providers?  What evaluation criteria have you found to be most important?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: Forbes on the Rise of Predictive Coding


First the New York Times with an article about eDiscovery, now Forbes.  Who’s next, The Wall Street Journal?  😉

Forbes published a blog post entitled E-Discovery And the Rise of Predictive Coding a few days ago.  Written by Ben Kerschberg, Founder of Consero Group LLC, it raises some interesting legal issues and considerations regarding predictive coding.  For some background on predictive coding, check out our December blog posts, here and here.

First, the author provides a very brief history of document review, starting with bankers boxes and WordPerfect and, “[a]fter an interim phase best characterized by simple keyword searches and optical character recognition”, evolving to predictive coding.  OK, that’s like saying that Gone with the Wind started with various suitors courting Scarlett O’Hara and, after an interim phase best characterized by the Civil War, marriage and heartache, Rhett says to Scarlett, “Frankly, my dear, I don’t give a damn.”  That’s a bit of an oversimplification of how review has evolved.

Nonetheless, the article gets into a couple of important legal issues raised by predictive coding.  They are:

  • Satisfying Reasonable Search Requirements: Whether counsel can utilize the benefits of predictive coding and still meet legal obligations to conduct a reasonable search for responsive documents under the federal rules.  The question is, what constitutes a reasonable search under Federal Rule 26(g)(1)(A), which requires that the responding attorney attest by signature that “with respect to a disclosure, it is complete and correct as of the time it is made”?
  • Protecting Privilege: Whether counsel can protect attorney-client privilege for their client when a privileged document is inadvertently disclosed.  Federal Rule of Evidence 502 provides that a court may order that a privilege or protection is not waived by disclosure if the disclosure was inadvertent and the holder of the privilege took reasonable steps to prevent disclosure.  Again, what’s reasonable?

The author concludes that the use of predictive coding is reasonable because it (a) makes document review more efficient by providing to the reviewer only those documents that have been selected by the algorithm; (b) makes it more likely that responsive documents will be produced, saving time and resources; and (c) refines relevant subsets for review, which can then be validated statistically.
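To make the idea of documents “selected by the algorithm” concrete, here’s a toy sketch of the general predictive coding workflow in Python, using scikit-learn as a stand-in.  It illustrates the approach (train a model on an attorney-coded seed set, rank the unreviewed documents, then validate with a statistical sample), not any particular vendor’s actual algorithm:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: a small sample of documents already coded by attorneys
seed_docs = [
    "re: breach of the licensing agreement and damages",
    "reminder: the company picnic is on friday",
]
seed_labels = [1, 0]  # 1 = responsive, 0 = non-responsive

# Turn document text into features and train a simple classifier
vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(seed_docs), seed_labels)

# Score the unreviewed documents; reviewers work from the top down,
# and the resulting subsets can be validated with statistical sampling
unreviewed = [
    "follow-up on the licensing agreement dispute",
    "lunch menu for next week",
]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```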

So, what do you think?  Does predictive coding enable attorneys to satisfy these legal issues?   Is it reasonable?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Does Size Matter?


I admit it, with a title like “Does Size Matter?”, I’m looking for a few extra page views….  😉

I frequently get asked how big an ESI collection needs to be to benefit from eDiscovery technology.  In a recent case with one of my clients, the client had a fairly small collection, only about 4 GB.  But, when a judge ruled that they had to start conducting depositions in a week, they needed to review that data in a weekend.  Without FirstPass™, powered by Venio FPR™, to cull the data and OnDemand® to manage the linear review, they would not have been able to make that deadline.  So, they clearly benefited from the use of eDiscovery technology in that case.

But, if you’re not facing a tight deadline, how large does your collection need to be for the use of eDiscovery technology to provide benefits?

I recently conducted a webinar regarding the benefits of First Pass Review, aka Early Case Assessment or, as George Socha regularly points out, the more accurate term, Early Data Assessment.  One of the topics discussed in that webinar was the cost of review for each gigabyte (GB).  Here is a breakdown, extrapolated from an analysis conducted by Anne Kershaw a few years ago (and published in the Gartner report E-Discovery: Project Planning and Budgeting 2008-2011):

Estimated Cost to Review All Documents in a GB:

  • Pages per GB: 75,000
  • Pages per Document: 4
  • Documents per GB: 18,750
  • Review Rate: 50 documents per hour
  • Total Review Hours: 375
  • Reviewer Billing Rate: $50 per hour

Total Cost to Review Each GB: $18,750

Notes: The number of pages per GB can vary widely.  Pages-per-GB estimates tend to range from 50,000 to 100,000, so 75,000 pages (18,750 documents) seems an appropriate average.  Fifty documents reviewed per hour is considered a fast review rate, and $50 per hour is considered a bargain price.  eDiscovery Daily provided an earlier estimate of $16,650 per GB based on assumptions of 20,000 documents per GB and 60 documents reviewed per hour; the assumptions may change somewhat, but, either way, the cost for attorney review of each GB could be expected to range from at least $16,000 to $18,000, possibly more.

Advanced culling and searching capabilities of First Pass Review tools like FirstPass can enable you to cull out 70-80% of most collections as clearly non-responsive without having to conduct attorney review on those files.  If you have a mere 2 GB collection and assume the lowest review cost above of $16,000 per GB, using a First Pass Review tool to cull out 70% of the collection can save $22,400 in attorney review costs.  Is that worth it?
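For those who want to check the math, here’s the arithmetic behind both estimates as a short Python sketch:

```python
# The $18,750-per-GB estimate from the breakdown above
pages_per_gb = 75_000
pages_per_doc = 4
docs_per_hour = 50
billing_rate = 50  # dollars per hour

docs_per_gb = pages_per_gb // pages_per_doc     # 18,750 documents
review_hours = docs_per_gb / docs_per_hour      # 375 hours
cost_per_gb = review_hours * billing_rate       # $18,750

# Savings from culling 70% of a 2 GB collection at the low-end cost
collection_gb = 2
low_cost_per_gb = 16_000  # dollars
cull_rate = 0.70
savings = collection_gb * low_cost_per_gb * cull_rate  # $22,400

print(f"Review cost per GB: ${cost_per_gb:,.0f}")
print(f"Savings on the {collection_gb} GB collection: ${savings:,.0f}")
```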

So, what do you think?  Do you use eDiscovery technology for only the really large cases or ALL cases?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Is Disclosure of Search Terms Required?


I read a terrific article a couple of days ago from the New York Law Journal via Law Technology News entitled Search Terms Are More Than Mere Words, which had some interesting takes on the disclosure of search terms in eDiscovery.  The article was written by David J. Kessler, Robert D. Owen, and Emily Johnston of Fulbright & Jaworski.  Its primary emphasis was the forced disclosure of search terms by courts.

In the age of “meet and confer”, it has become much more common for parties to agree to exchange search terms in a case to limit costs and increase transparency.  However, as the authors correctly note, search terms reflect counsel’s strategy for the case and are, therefore, work product.  Their position is that courts should not force disclosure of search terms and that disclosure of terms is “not appropriate under the Federal Rules of Civil Procedure”.  The article provides a compelling argument as to why forced disclosure is not appropriate, along with some good case cites where courts have accepted or rejected requests to compel provision of search terms.  I won’t try to recap them all here; check out the article for more information.

So, should disclosure of search terms be generally required?  If not, what does that mean in terms of utilizing a defensible approach to searching?

Personally, I agree with the authors that forced disclosure of search terms is generally not appropriate, as it does reflect strategy and work product.  However, each party has an obligation to preserve, collect, review and produce, to the best of its ability, all relevant materials (that are not privileged, of course).  Searching is an integral part of that process.  And, the article does note that “chosen terms may come under scrutiny if there is a defect in the production”, though “[m]ere speculation or unfounded accusations” should not lead to a requirement to disclose search terms.

With that said, the biggest component of most eDiscovery collections today is email, and that email often reflects discussions between parties in the case.  In these cases, it’s much easier for opposing counsel to identify legitimate defects in the production because they have some of the same correspondence and documents and can often easily spot discrepancies in the production set.  If they identify legitimate omissions from the production, those omissions could cause the court to call your search procedures into question.  Therefore, it’s important to take a defensible approach to searching (such as the “STARR” approach I described in an earlier post) so you can defend yourself if those questions arise.  Demonstrating a defensible approach to searching offers the best chance of preserving your right to protect, as work product, the search terms that reflect your case strategy.

So, what do you think?  Do you think that forced disclosure of search terms is appropriate?   Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Capacity, Scalability, and Flexibility


In the last couple of blogs in this series, we talked about evaluating service provider pricing and quality.  The highest-quality, fairest-priced vendor is of no use to you, however, if they can’t get your work done by the time you need it.  And, unfortunately, it’s not as straightforward as telling them what you have, what you need, and when you need it.  Early in an eDiscovery project, you are in a world of “unknowns”.  You are working with assumptions and best guesses, and the only thing you know for sure is that things will change.  The bottom line is, when you start talking to service providers, you probably won’t have good information.

One thing, however, most likely won’t change: your schedule.  Regardless of how big the job gets, you still have production deadlines and interim milestones to meet.  You, therefore, need a vendor that has the capacity to handle your work, that can scale up with the resources needed to deal with increased volume, and that can adapt to changing needs and priorities.  What’s important today may take a backseat to something more important that arises tomorrow.

The best way to deal with this is open communication with the service provider during the evaluation process.  Don’t limit your questions to computing power and capacity.  That’s just part of the picture, and it’s the easy part.  You want a service provider who will go the extra mile and work with you to get you what you need, when you need it.  The technology alone doesn’t do that.

In your conversations with service providers, provide information on what you know, what you are assuming, and what you are guessing.  Ask how changes in volume or requirements will impact their ability to meet your schedule.  Ask about their ability to scale up.  Ask about their procedures for changing priorities in processing a collection.  Give them best-case and worst-case scenarios and ask for commitments for each.  Ask about after-hours resources and their ability and willingness to run multiple shifts if needed.  And ask for references, specifically from people who had last-minute, dramatic changes to the scope of a project.

What has been your experience with service providers meeting your schedule requirements?  Do you have good or bad experiences you can tell us about?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Best Practices: What is “Reduping?”


As emails are sent out to multiple custodians, deduplication (or “deduping”) has become a common practice to eliminate multiple copies of the same email or file from the review collection.  It saves considerable review costs and ensures consistency by preventing different reviewers from applying different responsiveness or privilege determinations to the same file (e.g., one copy of a file designated as privileged while the other is not, which may cause a privileged file to slip into the production set).  Deduping can be performed either across custodians in a case or within each custodian.

Everyone who works in electronic discovery knows what “deduping” is.  But how many of you know what “reduping” is?  Here’s the answer:

“Reduping” is the process of re-introducing duplicates back into the population for production after completing review.  There are a couple of reasons why a producing party may want to “redupe” the collection after review:

  • Deduping Not Requested by Receiving Party: As opposing parties in many cases still don’t conduct a meet and confer or discuss specifications for production, they may not have discussed whether or not to include duplicates in the production set.  In those cases, the producing party may choose to produce the duplicates, giving the receiving party more files to review and driving up their costs.  The attitude of the producing party can be “hey, they didn’t specify, so we’ll give them more than they asked for.”
  • Receiving Party May Want to See Who Has Copies of Specific Files: Sometimes, the receiving party does request that “dupes” be identified, but only within custodians, not across them.  In those cases, it’s because they want to see who had a copy of a specific email or file.  However, the producing party still doesn’t want to review the duplicates (because of the increased costs and the possibility of inconsistent designations), so they review a deduped collection and then redupe after review is complete.

Many review applications support reduping.  For example, FirstPass™, powered by Venio FPR™, suppresses the duplicates from review but applies the same tags to the duplicates of any files tagged during first pass review.  When it’s time to export the collection, either to move the potentially responsive files on to linear review (in a product like OnDemand®) or straight to production, the user can decide whether or not to export the dupes.  Those dupes have the same designations as the primary copies, ensuring consistency in handling them downstream.
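As a rough illustration of the mechanics (and not how FirstPass actually implements it), here’s a Python sketch that identifies duplicates by content hash, suppresses them from review, and then “redupes” by propagating each reviewed file’s tag back onto its duplicates at export time; real tools may also key on email-specific metadata:

```python
import hashlib
from collections import defaultdict

def file_hash(path):
    """Identify duplicates by a hash of file content (a simplification)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def dedupe(paths):
    """Group files by hash; only the first copy in each group gets reviewed."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_hash(path)].append(path)
    # Map each primary (reviewed) copy to its suppressed duplicates
    return {copies[0]: copies[1:] for copies in groups.values()}

def redupe(review_tags, dupe_map):
    """Propagate each primary copy's review tag back onto its duplicates."""
    full_tags = dict(review_tags)
    for primary, dupes in dupe_map.items():
        for dupe in dupes:
            full_tags[dupe] = review_tags[primary]
    return full_tags

# Usage sketch:
#   dupe_map = dedupe(collection_paths)        # review only the primaries
#   tags = attorney_review(dupe_map.keys())    # hypothetical review step
#   production_tags = redupe(tags, dupe_map)   # every copy carries its tag
```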

So, what do you think?  Does your review tool support “reduping”?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily Celebrates its “Sixmonthiversary”


Six months ago yesterday, eDiscovery Daily was launched.  At the time of our launch, we pondered whether we were crazy to commit to a daily blog (albeit restricted to business days).  But I guess it’s a sign of how much the eDiscovery industry has grown that there has been no shortage of topics to address; instead, the challenge has been selecting which topics to address.  And, so far, we haven’t missed a business day yet (knock on wood!).

Six months is 3.5 dog years, but I’m not sure what it is in blog years.  Nonetheless, we’ve learned to crawl, are walking pretty well and are getting ready to run!  We’ve more than doubled viewership since the first month, with our four biggest “hit count” days all coming in the last five weeks, and we’ve more than quadrupled our subscriber base during that time!

And, we have you to thank for our growth to date!  We appreciate the interest you’ve shown in the topics and will do our best to continue to provide interesting and useful eDiscovery news and analysis.  And, as always, please share any comments you might have or if you’d like to know more about a particular topic!

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, The Electronic Discovery Reading Room, Ride the Lightning, Litigation Support Blog.com, Adventures in Document Review, ABA Journal, ABC's of E-Discovery, Above the Law, EDD: Issues, Law, and Solutions, Law.com and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!).  We really appreciate it!

For those of you who are relatively new to eDiscovery Daily, here are some posts back from the early days you may have missed.  Enjoy!

eDiscovery Searching 101: Don’t Get “Wild” with Wildcards

eDiscovery Searching 101: It's a Mistake to Ignore the Mistakes

First Pass Review: Of Your Opponent’s Data

eDiscovery Project Management: Applying Project Management Techniques to Electronic Discovery

eDiscovery Case Study: Term List Searching for Deadline Emergencies!

SaaS and eDiscovery: Load Your Own Data