
eDiscovery Best Practices: A “Random” Idea on Search Sampling

 

Friday, we talked about how to determine an appropriate sample size to test your search results, as well as the items NOT retrieved by the search, using a site that provides a sample size calculator.  Today, we’ll talk about how to make sure the sample itself is randomly selected.

A randomly selected sample gives each file an equal chance of being reviewed and eliminates the chance of bias being introduced into the sample which might skew the results.  Merely selecting the first or last x number of items (or any other group) in the set may not reflect the population as a whole – for example, all of those items could come from a single custodian.  To ensure a fair, defensible sample, it needs to be selected randomly.

So, how do you select the numbers randomly?  Once again, the Internet helps us out here.

One site, Random.org, has a random integer generator which will randomly generate whole numbers.  You simply supply the number of random integers you need and the starting and ending numbers of the range within which they should fall.  The site will then generate a list of numbers that you can copy and paste into a text file or even a spreadsheet.  The site also provides an Advanced mode with options for number format (e.g., decimal, hexadecimal), output format and how the randomization is ‘seeded’ (to generate the numbers).

In the example from Friday, you would request 660 random integers, with a starting number of 1 and an ending number of 100,000, to get a list of random numbers for testing the search that yielded 100,000 files with hits (664, 1 and 1,000,000, respectively, to get a list for testing the non-hits).  You could then paste the numbers into a spreadsheet, sort them, retrieve the file at each of those positions in the result set and review it to determine whether it reflects the intent of the search.  You’ll then have a good sense of how effective your search was, based on the random sample.  And, probably more importantly, using a random sample to test your search results is a highly defensible method for verifying your approach in court.
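If you’d rather script the selection than copy numbers from a website, here is a minimal Python sketch of the same idea (the counts and ranges are the ones from this example; the seed is arbitrary).  Unlike a plain integer generator, random.sample draws without replacement, so it never hands you the same file position twice:

```python
import random

rng = random.Random(2011)  # seeding makes the selection reproducible for your records

# 660 unique positions among the 100,000 files with hits...
hit_sample = sorted(rng.sample(range(1, 100_001), 660))

# ...and 664 unique positions among the 1,000,000 files not retrieved
miss_sample = sorted(rng.sample(range(1, 1_000_001), 664))

print(hit_sample[:5])  # the first few file positions to pull for review
```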

Tomorrow, we'll walk through a sample iteration to show how the sampling will ultimately help us refine our search.

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Determining Appropriate Sample Size to Test Your Search

 

We’ve talked about searching best practices quite a bit on this blog.  One part of searching best practices (as part of the “STARR” approach I described in an earlier post) is to test your search results (both the result set and the files not retrieved) to determine whether the search you performed is effective at maximizing both precision and recall to the extent possible, so that you retrieve as many responsive files as possible without having to review too many non-responsive files.  One question I often get is: how many files do you need to review to test the search?

If you remember from statistics class in high school or college, statistical sampling is choosing a portion of the results population at random for inspection to gather information about the population as a whole.  This saves considerable time, effort and cost over reviewing every item in the results population and enables you to obtain a “confidence level” that the characteristics of your sample reflect those of the population as a whole.  Statistical sampling is used for everything from exit polls that predict elections to marketing surveys that gauge brand popularity, and it is a generally accepted method of drawing conclusions about an overall results population.  You can sample a small portion of a large set to obtain a 95% or 99% confidence level in your findings (with a margin of error, of course).

So, does that mean you have to find your old statistics book and dust off your calculator or (gasp!) slide rule?  Thankfully, no.

There are several sites that provide sample size calculators to help you determine an appropriate sample size, including this one.  You’ll simply need to identify a desired confidence level (typically 95% to 99%), an acceptable margin of error (typically 5% or less) and the population size.

So, if you perform a search that retrieves 100,000 files and you want a sample size that provides a 99% confidence level with a margin of error of 5%, you’ll need to review 660 of the retrieved files to achieve that level of confidence in your sample (only 383 files if a 95% confidence level will do).  If 1,000,000 files were not retrieved, you would only need to review 664 of the not retrieved files to achieve that same level of confidence (99%, with a 5% margin of error) in your sample.  As you can see, the sample size doesn’t need to increase much when the population gets really large and you can review a relatively small subset to understand your collection and defend your search methodology to the court.
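If you’re curious what the calculator is doing under the hood, it’s applying a standard statistical formula – Cochran’s sample size formula with a finite population correction – assuming the conventional worst-case proportion of 0.5.  Here’s a rough Python sketch that reproduces the numbers above:

```python
import math
from statistics import NormalDist

def sample_size(population, confidence=0.99, margin=0.05, p=0.5):
    """Cochran's formula with a finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed z-score
    n0 = (z ** 2) * p * (1 - p) / margin ** 2           # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # adjust for finite N

print(sample_size(100_000))                   # 660 (99% confidence, 5% margin)
print(sample_size(100_000, confidence=0.95))  # 383
print(sample_size(1_000_000))                 # 664
```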

On Monday, we will talk about how to randomly select the files to review for your sample.  Same bat time, same bat channel!

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Other Evaluation Criteria

 

In the last posts in this blog series, we talked about evaluating service provider pricing, quality, scalability and flexibility.  There are a few other things you may wish to look at as well, and they can be especially significant for large, long-term projects or relationships.  Those things are:

  1. Litigation Experience:  Select a service provider that has litigation experience, not just general business experience.   A non-litigation service provider that does scanning — for example — may be able to meet your technical requirements.  It is probably not, however, accustomed to the inflexible schedules and changing priorities that are commonplace in litigation work.
  2. Corporate Profile and Tenure:  For a large project, be sure to select a service provider that’s been around for a while and has a proven track record.  You want to be confident that the service provider that starts your project will be around to finish your project.
  3. Security and Confidentiality:  You want to ensure that your documents, data, and information are secure and kept confidential.  This means that you require a secure physical facility, secure systems, and appropriate confidentiality guidelines and agreements.
  4. SaaS Service Providers:  If you’re considering a SaaS provider, you also need to evaluate the technology itself: ensure that it includes the features you require, that those features are easy to access and use, and that access, system reliability, system speed, and system security meet your requirements.
  5. Facility Location and Accessibility:  For many projects and many types of services, it won’t be necessary to spend time at the project site.   For other projects, that might not be the case.  For example, if a service provider is staffing a large document review project at its facility, the litigation team may need to spend time at the facility overseeing work and doing quality control reviews.  In such a case, the geographic location and the facility’s access to airports and hotels may be a consideration.

A lot goes into selecting the right service provider for a project, and it’s worth the time and effort to do a careful, thorough evaluation.  In the next posts in this series, we’ll discuss the vendor evaluation and selection process.

What has been your experience with evaluating and selecting service providers?  What evaluation criteria have you found to be most important?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: Forbes on the Rise of Predictive Coding

 

First the New York Times with an article about eDiscovery, now Forbes.  Who’s next, The Wall Street Journal?  😉

Forbes published a blog post entitled E-Discovery And the Rise of Predictive Coding a few days ago.  Written by Ben Kerschberg, Founder of Consero Group LLC, it gets into some interesting legal issues and considerations regarding predictive coding.  For some background on predictive coding, check out our December blog posts, here and here.

First, the author provides a very brief history of document review, starting with bankers boxes and WordPerfect and, “[a]fter an interim phase best characterized by simple keyword searches and optical character recognition”, evolving to predictive coding.  OK, that’s like saying that Gone with the Wind started with various suitors courting Scarlett O’Hara and, after an interim phase best characterized by the Civil War, marriage and heartache, Rhett says to Scarlett, “Frankly, my dear, I don’t give a damn.”  A bit of an oversimplification of how review has evolved.

Nonetheless, the article gets into a couple of important legal issues raised by predictive coding.  They are:

  • Satisfying Reasonable Search Requirements: Whether counsel can utilize the benefits of predictive coding and still meet legal obligations to conduct a reasonable search for responsive documents under the federal rules.  The question is, what constitutes a reasonable search under Federal Rule 26(g)(1)(A), which requires that the responding attorney attest by signature that “with respect to a disclosure, it is complete and correct as of the time it is made”?
  • Protecting Privilege: Whether counsel can protect attorney-client privilege for their client when a privileged document is inadvertently disclosed.  Federal Rule of Evidence 502 provides that a court may order that a privilege or protection is not waived by disclosure if the disclosure was inadvertent and the holder of the privilege took reasonable steps to prevent disclosure.  Again, what’s reasonable?

The author concludes that the use of predictive coding is reasonable, because it a) makes document review more efficient by providing only those documents to the reviewer that have been selected by the algorithm; b) makes it more likely that responsive documents will be produced, saving time and resources; and c) refines relevant subsets for review, which can then be validated statistically.

So, what do you think?  Does predictive coding enable attorneys to satisfy these legal issues?   Is it reasonable?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Does Size Matter?

 

I admit it, with a title like “Does Size Matter?”, I’m looking for a few extra page views….  😉

I frequently get asked how big does an ESI collection need to be to benefit from eDiscovery technology.  In a recent case with one of my clients, the client had a fairly small collection – only about 4 GB.  But, when a judge ruled that they had to start conducting depositions in a week, they needed to review that data in a weekend.  Without FirstPass™, powered by Venio FPR™ to cull the data and OnDemand® to manage the linear review, they would not have been able to make that deadline.  So, they clearly benefited from the use of eDiscovery technology in that case.

But, if you’re not facing a tight deadline, how large does your collection need to be for the use of eDiscovery technology to provide benefits?

I recently conducted a webinar regarding the benefits of First Pass Review – aka Early Case Assessment, or, as George Socha regularly points out, the more accurate term, Early Data Assessment.  One of the topics discussed in that webinar was the cost of review for each gigabyte (GB).  Here is a breakdown, extrapolated from an analysis conducted by Anne Kershaw a few years ago (and published in the Gartner report E-Discovery: Project Planning and Budgeting 2008-2011):

Estimated Cost to Review All Documents in a GB:

  • Pages per GB:  75,000
  • Pages per document:  4
  • Documents per GB:  18,750
  • Review rate:  50 documents per hour
  • Total review hours:  375
  • Reviewer billing rate:  $50 per hour

Total Cost to Review Each GB:  $18,750

Notes: The number of pages per GB can vary widely.  Estimates tend to range from 50,000 to 100,000 pages per GB, so 75,000 pages (18,750 documents) seems an appropriate average.  50 documents reviewed per hour is considered a fast review rate, and $50 per hour is considered a bargain price.  eDiscovery Daily provided an earlier estimate of $16,650 per GB based on assumptions of 20,000 documents per GB and 60 documents reviewed per hour – the assumptions may vary somewhat, but, either way, the cost for attorney review of each GB can be expected to run from at least $16,000 to $18,000, possibly more.

Advanced culling and searching capabilities of First Pass Review tools like FirstPass can enable you to cull out 70-80% of most collections as clearly non-responsive without having to conduct attorney review on those files.  If you have a mere 2 GB collection and assume the lowest review cost above of $16,000 per GB, using a First Pass Review tool to cull out 70% of the collection can save $22,400 in attorney review costs.  Is that worth it?
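To make the arithmetic concrete, here’s a small Python sketch of both calculations (the function names are mine; the rates and cull percentage are the assumptions from this post, not universal constants):

```python
def review_cost_per_gb(pages_per_gb=75_000, pages_per_doc=4,
                       docs_per_hour=50, rate_per_hour=50):
    """Estimated attorney review cost for one gigabyte."""
    docs = pages_per_gb / pages_per_doc   # 18,750 documents
    hours = docs / docs_per_hour          # 375 review hours
    return hours * rate_per_hour          # $18,750

def culling_savings(collection_gb, cost_per_gb, cull_rate):
    """Review dollars avoided by culling part of the collection up front."""
    return collection_gb * cost_per_gb * cull_rate

print(review_cost_per_gb())              # 18750.0
print(culling_savings(2, 16_000, 0.70))  # 22400.0 -- the 2 GB example above
```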

So, what do you think?  Do you use eDiscovery technology for only the really large cases or ALL cases?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Is Disclosure of Search Terms Required?

 

I read a terrific article a couple of days ago from the New York Law Journal via Law Technology News entitled Search Terms Are More Than Mere Words, which had some interesting takes on the disclosure of search terms in eDiscovery.  The article was written by David J. Kessler, Robert D. Owen, and Emily Johnston of Fulbright & Jaworski.  Its primary emphasis was the forced disclosure of search terms by courts.

In the age of “meet and confer”, it has become much more common for parties to agree to exchange search terms in a case to limit costs and increase transparency.  However, as the authors correctly note, search terms reflect counsel’s strategy for the case and are, therefore, work product.  Their position is that courts should not force disclosure of search terms and that disclosure of terms is “not appropriate under the Federal Rules of Civil Procedure”.  The article provides a compelling argument as to why forced disclosure is not appropriate and cites some good cases where courts have accepted or rejected requests to compel provision of search terms.  I won’t try to recap them all here – check out the article for more information.

So, should disclosure of search terms be generally required?  If not, what does that mean in terms of utilizing a defensible approach to searching?

Personally, I agree with the authors that forced disclosure of search terms is generally not appropriate, as they do reflect strategy and work product.  However, each party has an obligation to preserve, collect, review and produce, to the best of its ability, all relevant materials (that are not privileged, of course).  Searching is an integral part of that process.  And, the article does note that “chosen terms may come under scrutiny if there is a defect in the production”, though “[m]ere speculation or unfounded accusations” should not lead to a requirement to disclose search terms.

With that said, the biggest component of most eDiscovery collections today is email, and that email often reflects discussions between parties in the case.  In these cases, it’s much easier for opposing counsel to identify legitimate defects in the production because they have some of the same correspondence and documents and can often easily spot discrepancies in the production set.  If they identify legitimate omissions from the production, those omissions could cause the court to call your search procedures into question.  Therefore, it’s important to take a defensible approach to searching (such as the “STARR” approach I described in an earlier post) so you can defend yourself if those questions arise.  Demonstrating a defensible approach to searching offers the best chance of preserving your right to protect, as work product, the search terms that reflect your case strategy.

So, what do you think?  Do you think that forced disclosure of search terms is appropriate?   Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Evaluating Quality

Yesterday, we talked about evaluating service-provider pricing.  That, of course, is just part of the picture.  You need a service provider that can and does provide high-quality work that meets your expectations.

This can be hard to assess when you are evaluating a service provider with which you don’t have prior experience. And, unfortunately, it’s just not possible to know up-front if a service provider will do high-quality work on any given project.  You can, however, determine whether a service provider is likely to do high-quality work.  Here are some suggestions for doing so:

  1. Ask for references, and check them.  Ask for both end-user references and for people who were the point of contact with the service provider.  And ask for references for projects that were similar in size and scope to your project.  Later in this blog series, I’m going to give you some suggestions for doing an effective reference check.
  2. Look at their procedures and processes.  This is important both for labor-intensive tasks and for heavily technology-based ones.  Look at intake procedures, workflow procedures, and status-tracking procedures.
  3. Look at the type and level of quality control that is done.  Find out what is checked 100%, what is sampled, what triggers rework, what computer validation is done, and what is checked manually.
  4. Ask about staff qualifications, experience and training.
  5. Ask about project management.  A well-managed project will yield higher-quality results.  For certain types of projects, you might also want to interview the project manager who will be assigned to your project.
  6. Evaluate the quality of your communication with the service provider during the evaluation process.  Did they understand your questions and your needs?  Were documents submitted to you (proposals and correspondence) clear and free of errors?  I might not eliminate a service provider from consideration for problems in this area, but I’d certainly question the care the service provider might take with my work if they didn’t take care in their communications with me.

What has been your experience with service provider work quality?  Do you have good or bad experiences you can tell us about?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Evaluating Price

 

When you are looking for help with handling discovery materials, there are hundreds of service providers to choose from.  It’s important that you choose one that can meet your schedule, has fair pricing and does high-quality work.  But there are other things you should look at as well. 

In the next few blogs in this series, we’re going to discuss what you should be looking at when you evaluate a service provider.  Note that these points are not covered in order of importance.  The importance of any single evaluation point will vary from case to case and will depend on things like the type of service you are looking for, the duration of the project, the complexity of the project, and the size of the project.

Let’s start with Price.  Obviously, costs are significant and the first thing most people look at when doing an evaluation.  Unfortunately, many people don’t look at anything else.  Don’t fall into that trap.  If a service provider offers prices much lower than everyone else’s, that should sound some alarms.  There’s a chance the service provider doesn’t understand the task or is cutting corners somewhere.  Do a lot of digging and take a close look at the organization’s procedures and technology before selecting a service provider that is comparatively very low-priced. 

There’s another very important consideration when you are comparing service provider pricing:  not all pricing models are the same.  Make sure you understand every component of a service provider’s price, what’s included, what’s not, what exactly you are paying for, and how it affects the bottom line.  Let me give you an example.  Some service providers charge per GB for “input” gigs for electronic discovery processing, while others charge per GB for “output” gigs.  Of course, the ones that charge for “input” gigs charge a lower per gig price, but they are charging for more gigabytes. 
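A quick, hypothetical illustration of why the pricing model matters (the per-GB rates and culling ratio here are invented for the example, not actual provider pricing):

```python
collected_gb = 100   # "input" gigs delivered to the provider
processed_gb = 40    # "output" gigs remaining after culling and deduplication

input_model_total = collected_gb * 150   # $150 per input GB
output_model_total = processed_gb * 300  # $300 per output GB

# The "cheaper" $150 unit rate actually costs more here: $15,000 vs. $12,000
print(input_model_total, output_model_total)
```

Flip the culling ratio and the comparison flips too, which is why an apples-to-apples estimate of total project cost matters more than the unit price.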

Understand how a service provider’s pricing is structured and what it means when you are evaluating prices.  It’s always a good idea to ask a service provider to estimate total costs for a project to verify your understanding.

In the next blogs in this series, we’ll cover other factors you should consider when selecting a vendor.

What has been your experience with service provider work?  Do you have good or bad experiences you can tell us about?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: Despite What NY Times Says, Lawyers Not Going Away

 

There was a TV commercial in the mid-80’s where a soap opera actor delivered the line “I’m not a doctor, but I play one on TV”.  Can you remember the product it was advertising (without clicking on the link)?  If so, you win the trivia award of the day!  😉

I’m a technologist who has been working in litigation support and eDiscovery for over twenty years.  If you’ve been reading eDiscovery Daily for a while, you’ve probably noticed that I’ve written several posts regarding significant case law as it pertains to eDiscovery.  I often feel that I should offer a disclaimer before each of these posts saying “I’m not a lawyer, but I play one on the Web”.  As the disclaimer at the bottom of the page stipulates, these posts aren’t meant to provide legal advice; my intention is merely to identify cases that may be of interest to our readers, provide a basic recap of each and leave it at that.  As Clint Eastwood once said, “A man’s got to know his limitations”.

A few days ago, The New York Times published an article entitled Armies of Expensive Lawyers, Replaced by Cheaper Software which discussed how, using ‘artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost’ (extraneous comma in the title notwithstanding).  The article goes on to discuss linguistic and sociological techniques for retrieval of relevant information and discusses how the Enron Corpus, available in a number of forms, including through EDRM, has enabled software providers to make great strides in analytical capabilities using this large base of data to use in testing.  It also discusses whether this will precipitate a march to the unemployment line for scores of attorneys.

A number of articles and posts since then have offered commentary as to whether that will be the case.  Technology tools will certainly reduce document populations significantly, but, as the article noted, “[t]he documents that the process kicks out still have to be read by someone”.  Not only that, the article still makes the assumption that people too often make with search technology – that it’s a “push a button and get your answer” approach to identifying relevant documents.  But, as has been noted in several cases and also here on this blog, searching is an iterative process where sampling the search results is recommended to confirm that the search maximizes recall and precision to the extent possible.  Who do you think is going to perform that sampling?  Lawyers – that’s who (working with technologists like me, of course!).  And, some searches will require multiple iterations of sampling and analysis before the search is optimized.

Therefore, while the “armies” of lawyers may not need nearly as many members of the infantry, they will still need plenty of corporals, sergeants, captains, colonels and generals.  And, for those entry-level reviewing attorneys that no longer have a place on review projects?  Well, we could always use a few more doctors on TV, right?  😉

So, what do you think?  Are you a review attorney that has been impacted by technology – positively or negatively?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: George Socha of Socha Consulting

 

This is the seventh in the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscovery Daily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support.  George is also co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard for the eDiscovery life cycle, and there are eight active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

On the very “flip” side, the number one trend to date in 2011 is predictions about trends in 2011.  They are part of a consistent and long-term pattern, which is that many of these trend predictions are not trend predictions at all – they are marketing material and the prediction is “you will buy my product or service in the coming year”.

That said, there are a couple of things of note.  Since I understand you talked to Tom about Apersee, it’s worth noting that corporations are struggling with working through a list of providers to find out who provides what services.  You would figure that there are somewhere in the range of 500 or so total providers.  But, my ever-growing list, which includes both external and law firm providers, is at more than 1,200.  Of course, some of those are probably not around anymore, but I am confident that there are at least 200-300 that I do not yet have on the list.  My guess when the list shakes out is that there are roughly 1,100 active providers out there today.  If you look at information from the National Center for State Courts and the Federal Judicial Center, you’ll see that there are about 11 million new lawsuits filed every year.  I saw an article in the Cornell Law Forum a week or two ago which indicated that there are roughly 1.1 million lawyers in the country.  So, there are 11 million lawsuits, 1.1 million lawyers and 1,100 providers.  Most of those lawyers have no experience with eDiscovery and most of those lawsuits have no provider involved, which means eDiscovery is still very much an emerging market, not even close to being a mature market.  As fast as providers disappear, through attrition or acquisition, new providers enter the market to take their place.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  Maybe this is overly optimistic, but part of what I’ve been seeing leading up to the conference, on various web sites and at the conference itself, is that a series of incremental changes taking place over a long period are finally leading to some radical differences.  One of those differences is that we are finally reaching a point where a number of providers can claim to be “end-to-end providers” with some legitimacy.  For as long as we’ve had the EDRM model, we’ve had providers that have professed to cover the full EDRM landscape, by which they generally have meant Identification through Production.  A growing number of providers not only cover that portion of the EDRM spectrum but have some ability to address Information Management, Presentation, or both.  By and large, those providers are getting there by building their software and services based on experience and learning over the past 8 to 12 years, introducing new offerings at the show that reflect that learned experience.

A couple of days ago, I only half-jokingly issued “the Dyson challenge” (as in the Dyson vacuum cleaner).  Every year, come January, our living room carpet is strewn with pine tree needles, and none of the vacuum cleaners we have ever owned has done a good job of picking them up.  The Dyson vacuum cleaner claims its cyclones capture more dirt than anything, but I was convinced that could not include those needles.  Nonetheless, I tried it and, to my surprise, it worked like a charm!  I want to see providers offering products able to perform at that high level, not just meeting but exceeding expectations.

I also see a feeling of excitement and optimism that wasn’t apparent at last year’s show.

What are you working on that you’d like our readers to know about?

As I mentioned, we have launched the Apersee web site, designed to allow consumers to find providers and products that fit their specific needs.  The site is in beta and the link is live.  It’s in beta because we’re still working on features to make it as useful as possible to customers and providers.  We’re hoping it’s a question of weeks, not months, before those features are implemented.  Once we go fully live, we will go two months with the system “wide open” – where every consumer can see all the provider and product information that any provider has put in the system.  After that, consumers will be able to see full provider and product profiles for providers who have purchased blocks of views.  Even if a provider does not purchase views, all selection criteria it enters are searchable, but search results will display only the provider’s name and website name.  Providers will be able to get stats on queries and how many times their information is viewed, but not detailed information as to which customers are connecting and performing the queries.

As for EDRM, we continue to make progress with an array of projects and a growing number of collaborative efforts, such as the work the Data Set group has done with TREC Legal and the work the Metrics group has done with the LEDES Committee.  We not only want to see membership continue to grow, but we also want to continue to push for more active participation to keep making progress in the various working groups.  We’ve just met at the show here regarding the EDRM Testing pilot project to address testing standards.  There are very few guidelines for testing electronic discovery software and services, so the Testing project will become a full EDRM project as of the EDRM annual meeting this May to begin to address the need for those guidelines.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!