Review

eDiscovery Best Practices: Competency Ethics – It’s Not Just About the Law Anymore

 

A few months ago at LegalTech New York, I conducted a thought leader interview with Tom O’Connor of Gulf Coast Legal Technology Center, who didn’t exactly mince words when talking about the trend for attorneys to “finally tak[e] technology seriously”.  As he noted, “lawyers are finally trying to take some time to try to get up to speed – whining and screaming pitifully all the way about how it’s not fair, and the sanctions are too high and there’s too much data.  Get a life, get a grip.  Use the tools that are out there that have been given to you for years.” 

Strong words, indeed.  The American Bar Association (ABA) Model Rules of Professional Conduct (Model Rules) require that an attorney possess and demonstrate a certain requisite level of knowledge in order to be considered competent to handle a given matter.  Specifically, Model Rule 1.1 states that, "[a] lawyer shall provide competent representation to a client. Competent representation requires the legal knowledge, skill, thoroughness, and preparation reasonably necessary for the representation."

Preparation means not only understanding a specific area of the law (for example, antitrust or patent law, both highly specialized), but also having the technical knowledge and skills necessary to serve the client in the area of discovery.

The ethical responsibilities of counsel these days include competently directing and managing the identification, preservation, collection, processing, analysis, review and production of electronically stored information (ESI) required to be produced pursuant to lawful discovery requests.  If counsel does not have that level of competency in a particular area, he or she is obligated either to acquire the knowledge or skill necessary to support those needs, or to include someone else who does have the requisite skills as part of the representation.

Not too long ago, I met with an attorney and discussed how they handled preservation obligations with their clients.  The attorney indicated that he expected his clients to self-manage their own preservation and collection.  When I asked him why he didn’t try to get more involved to make sure it was being handled properly, he said, “I don’t want to alarm them.  They might decide they need a bigger firm.”

Recent case law is full of instances where counsel didn’t fully understand their eDiscovery obligations and got themselves and their clients “burned” in the process.  If your organization gets involved in litigation, make sure to include eDiscovery competence among the factors you consider when evaluating counsel’s qualifications to represent you.

So, what do you think?  Is your counsel eDiscovery savvy?  If not, do they use a provider that is?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: Defendant Can’t Be Plaintiff’s Friend on Facebook

In Piccolo v. Paterson, Bucks County, Pa., Common Pleas Court Judge Albert J. Cepparulo denied the motion from the defendant requesting access to the photos of plaintiff Sara Piccolo posted in her Facebook account.

Piccolo filed an action against the defendants after being injured in a one-car accident while a passenger in a car driven by defendant Lindsay Paterson.  According to the defense motion, filed by attorneys at Moore & Riemenschneider, Piccolo testified that she had a Facebook account and was asked at deposition whether defense counsel could send a “neutral friend request” to Piccolo so that he could review the Facebook postings Piccolo testified she made every day.  Piccolo’s attorney, Benjamin G. Lipman, ultimately denied the request, responding that the “materiality and importance of the evidence … is outweighed by the annoyance, embarrassment, oppression and burden to which it exposes” the plaintiff.

The defense argued that access to Piccolo’s Facebook page would provide necessary and relevant information related to the claims by Piccolo and cited a case, McMillen v. Hummingbird Speedway, Inc. (previously summarized by eDiscoveryDaily here), in which the court ordered the plaintiff to provide his username and password to the defendant’s attorney. The plaintiff’s attorney argued that the defense had only asked for the pictures Piccolo posted on Facebook and that they had already been provided with “as complete a photographic record of the pre-accident and post-accident condition” of Piccolo.

As a result of the accident in May 2007, Piccolo suffered lacerations to her lip and chin when hit in the face with an airbag. She had 95 stitches to her face and then surgery to repair her scarring six months later. With permanent scars on her face, Piccolo allowed the insurer in 2008 to take photographs of her face and gave the defense 20 photos of her face from the week following the accident and five photos from the months just before the accident.

In Piccolo’s response to the defense motion, Lipman argued that defense counsel had only asked at Piccolo’s deposition about the pictures she posted on Facebook, not any textual postings. He said that the defendant had already been provided “as complete a photographic record of the pre-accident and post-accident condition” of Piccolo as she “could reasonably have a right to expect in this case.”

Judge Cepparulo agreed, ruling for the plaintiff and denying the defense access to Piccolo’s Facebook page in a one-paragraph order.

So, what do you think?  Did the judge make the correct call or should he have issued a ruling consistent with McMillen?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: 4 Steps to Effective eDiscovery With Software Analytics

 

I read an article from Texas Lawyer via Law.com entitled “4 Steps to Effective E-Discovery With Software Analytics” that has some interesting takes on project management principles related to eDiscovery, and I’ve interjected some of my thoughts into the analysis below.  A copy of the full article is located here.  The steps are as follows:

1. With the vendor, negotiate clear terms that serve the project's key objectives.  The article notes the importance of tying each collection and review milestone (e.g., collecting and imaging data; filtering data by file type; removing duplicates; processing data for review in a specific review platform; processing data to allow for optical character recognition (OCR) searching; and converting data into a tag image file format (TIFF) for final production to opposing counsel) to contract terms with the vendor.

The specific milestones will vary – for example, conversion to TIFF may not be necessary if the parties agree to a native production – so it’s important to know the size and complexity of the project, and choose only an experienced eDiscovery vendor who can handle the variations.

2. Collect and process data.  Forensically sound data collection and culling of obviously unresponsive files (such as system files) to drastically decrease the overall review costs are key services that a vendor provides in this area.  As we’ve noted many times on this blog, effective culling can save considerable review costs – each gigabyte (GB) culled can save $16-$18K in attorney review costs.

The article notes that a hidden cost is the OCR process of translating extracted text into a searchable form and that it’s an optimal negotiation point with the vendor.  This may have been true when most collections were paper-based, but since most collections today are electronic, the percentage of documents requiring OCR is considerably lower than it used to be.  However, it is important to be prepared for the native files that are “image only”, such as TIFFs and scanned PDFs – those will require OCR to be searched effectively.

3. Select a data and document review platform.  Factors such as ease of use, robustness, and reliability of analytic tools, support staff accessibility to fix software bugs quickly, monthly user and hosting fees, and software training and support fees should be considered when selecting a document review platform.

The article notes that a hidden cost is selecting a platform with which the firm’s litigation support staff has no experience as follow-up consultation with the vendor could be costly.  This can be true, though a good vendor training program and an intuitive interface can minimize or even eliminate this component.

The article also notes that to take advantage of the vendor’s more modern technology “[a] viable option is to use a vendor's review platform that fits the needs of the current data set and then transfer the data to the in-house system”.  I’m not sure why the need exists to transfer the data back – there are a number of vendors that provide a cost-effective solution appropriate for the duration of the case.

4. Designate clear areas of responsibility.  By doing so, you minimize or eliminate inefficiencies in the project.  The article recommends the RACI matrix to determine who is responsible (the individuals performing each task, such as review or litigation support), accountable (the attorney in charge of discovery), consulted (the lead attorney on the case), and informed (the client).

Managing these areas of responsibility effectively is probably the biggest key to project success and the article does a nice job of providing a handy reference model (the RACI matrix) for defining responsibility within the project.
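To make the model concrete, here is a minimal sketch (in Python) of how RACI assignments might be captured as a simple lookup structure; the tasks and role holders shown are illustrative placeholders, not taken from the article.

# A minimal sketch: RACI roles captured as a lookup structure.
# Tasks and role holders below are illustrative placeholders only.
raci_matrix = {
    "Collect and process data": {
        "Responsible": "Litigation support staff",
        "Accountable": "Attorney in charge of discovery",
        "Consulted":   "Lead attorney on the case",
        "Informed":    "Client",
    },
    "Document review": {
        "Responsible": "Review team",
        "Accountable": "Attorney in charge of discovery",
        "Consulted":   "Lead attorney on the case",
        "Informed":    "Client",
    },
}

def who_is(role, task):
    """Return the party holding a given RACI role for a task."""
    return raci_matrix[task][role]

print(who_is("Accountable", "Document review"))   # Attorney in charge of discovery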

So, what do you think?  Do you have any specific thoughts about this article?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Thought Leader Interview with Jeffrey Brandt, Editor of Pinhawk Law Technology Daily Digest

 

As eDiscovery Daily has done in the past, we have periodically interviewed various thought leaders in eDiscovery and legal technology to provide insight as to trends in the industry for our readers to consider.  Recently, I was able to interview Jeffrey Brandt, Editor of the Pinhawk Law Technology Daily Digest and columnist for Legal IT Professionals.

With an educational background in computer science and mathematics from the University of Pittsburgh, Jeff has over twenty-four years of experience in the field of legal automation, working with various organizations in the United States, Canada, and the United Kingdom.  As a technology and management consultant to hundreds of law firms and corporate law departments, he has worked on information management projects including long-range strategic planning, workflow management and reengineering, knowledge management, IT structure and personnel requirements, and budgeting.  Working as a CIO at several large law firms, Jeff has helped bring oversight, coordination and change management to initiatives including knowledge management, library & research services, eDiscovery, records management, technology and more.  Most recently, he served as the Chief Information and Knowledge Officer with an AMLaw 100 law firm based in Washington, DC.

Jeff has also been asked to serve on numerous advisory councils and CIO advisory boards for key vendors in the legal space, advising them on issues of client service and future product direction.  He is a longtime member (and former board member) of the International Legal Technology Association (ILTA) and has taught CLE classes on topics ranging from litigation support to ethics and technology.

What do you consider to be the current significant trends in eDiscovery in 2011 and beyond on which people in the industry are, or should be, focused?

I would say that the biggest two are the project management component and, for lack of a better term, automated or artificial intelligence.

The whole concept and the complexities of what it takes to manage a case today are more challenging than ever, including issues like the number of sources, the amount of data in those sources, the format in which you’re producing, where the data can go and who can see it.  I remember the days when people used to take a couple of bankers boxes, put them in their car and go home and work on the documents.  You simply cannot do that today – the amount of information is just insane.

As for artificial intelligence, as was discussed in the (Pinhawk) digest recently, you’re seeing the emergence of predictive coding and using computers to cull through the massive amounts of information so that a human can take the final pass.  I think more and more we’re going to see people relying on those types of technologies – some because they embrace it, others because there is no other way to humanly do it.

I think if there’s any third trend, it would probably be the question of where we go next to get the data.  In terms of social media, mining Facebook and Twitter and all the other various sources for additional data as part of the discovery process has become a challenge.

You recently became editor of the Pinhawk Law Technology Daily Digest.  Tell me about that and about your plans for the digest.

Well, I think there are several things going forward.  My role is to keep up the good work that Curt Meltzer, the founding editor, started and fill the “big shoes” that Curt left behind.  My goal is to expand the sources of information from which Pinhawk draws.  There are about 400 sources today and I think by the time my sources (and possibly a few others) are added in, there will be over 500.  We’ve also talked about going to our readership and asking them “what are your go-to and must read sources?” to include those sources as well.  We’ll also be looking to incorporate social media tools to hopefully make the experience much more comprehensive and easier to participate in for the Pinhawk digest reader.

And, what should we be looking for in your column in Legal IT Professionals?

Well, I like to dabble in multiple areas – in the small consulting practice that I have, I do a little bit of everything.  I’ve recently done some very interesting work in communities of practice, using social media tools, focusing them inward in law firms to provide the forum for lawyers to open up, share and mentor to others.  I like KM (Knowledge Management) and related topics and we had a recent post in Pinhawk talking about the future of the law firm.  To me, those types of discussions are fascinating.

You take the extremes and you’ve got the “law factory”, you take the high-end and you’ve got the “bet the farm” law firm.  How technology plays a role in whatever culture, whatever focus a law firm puts itself on is interesting.  And then you watch and see some of the rumblings and inklings of what can be done in places like Australia, where you have third-party investment of law firms and the United Kingdom, where they are about to get third-party investment.  There was a recent article about third-party ownership of law firms in North Carolina.  You look at examples like that and you say “is the model of partnership alive?”  When you get into “big law”, are they really partnerships?  Where are they in the spectrum of a thousand sole practitioners operating under one letterhead to a firm of a thousand lawyers?  That’s where I think that communities of practice and social media tools are going to help lawyers know more about their own partners and own firms. 

It’s sad that in some firms the lawyers on the north side of the building don’t even know the lawyers on the south side of the building, let alone the people on the eighth floor vs. the tenth floor.  It’s a changing landscape.  When I got into legal and was first a CIO at Porter, Wright, Morris & Arthur – 250 lawyers in Columbus, Ohio – it was the 83rd largest law firm in the US, an AMLAW 100 firm.  Today, does a firm that size even make it into the AMLAW 250?

In my column at Legal IT Professionals, you’ll see more about KM and change management.  Another part of my practice is mentoring IT executives in how to deal with business problems related to the business of law and I think that might be my next post – free advice to the aspiring CIO.

This might sound odd coming from a technologist, but…it’s not really about the technology.  From a broad standpoint, you can be successful with most software tools.  A law firm isn’t going to be made or broken whether it chose OpenText or iManage as a document management tool or chose a specific litigation support tool.  It is more about the people, the education and the process than it is the actual tool.  Yes, there are some horrible tools that you should avoid, but, all things being equal, it’s really more the other pieces of the equation that determine your success.

Thanks, Jeff, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think

 

Here’s a sample scenario: You identify custodians relevant to the case and collect files from each.  Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians.  You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel.  After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!!  What happened?!?

Did the vendor accidentally “double-bill” you?  That would be great – but no.  There’s a much more logical explanation and, unfortunately, you may wind up paying a lot more to process these files than you expected.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.  For example, as noted above, Outlook emails are typically saved for each custodian in a personal storage (.PST) file format, which is an expanding container file. For most custodians, all of their email (and the corresponding attachments, if present) resides in a few PST files.  The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take up as much space?  The PST file is like one of those vacuum bags – it typically stores the emails and attachments in a compressed format to save space.  When the emails and attachments are processed into a review tool, they are expanded to their normal size.  That expanded size can be 1.5 to 2 times the scanned size (or more).  And, that’s what many vendors will bill on – the expanded size.

There are other types of archive or container files that compress their contents – .zip and .rar files are two examples.  These files are used not only to compress files for storage on hard drives, but also to compact or group a set of files when transmitting them, usually via – you guessed it – email.  With email comprising the majority of most ESI collections, and with the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.  It’s important to be prepared for that and to know your options when processing that data, so you can effectively anticipate those processing costs.
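If you want a rough feel for the impact before the vendor’s invoice arrives, a back-of-the-envelope estimate is easy to script.  This is a minimal sketch that simply applies an assumed expansion factor (the 1.5x to 2x range discussed above) to the size on disk; actual expansion varies by collection.

def estimated_expanded_size(scanned_gb, expansion_factor=1.75):
    """Estimate the post-processing (expanded) volume of a collection.

    scanned_gb       -- size of the collection as it sits on disk (PSTs, ZIPs, etc.)
    expansion_factor -- assumed growth when container files are unpacked;
                        the discussion above suggests roughly 1.5x to 2x (or more)
    """
    return scanned_gb * expansion_factor

# The 100 GB scenario above, bracketed by a few plausible expansion factors
for factor in (1.5, 1.75, 2.0):
    print(f"{factor}x expansion: ~{estimated_expanded_size(100, factor):.0f} GB billed for processing")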

So, what do you think?  Have you ever been surprised by processing costs of your ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Testing Your Search Using Sampling

Friday, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator.  Yesterday, we talked about how to make sure the sample size is randomly selected.

Today, we’ll walk through an example of how you can test and refine a search using sampling.

TEST #1: Let’s say in an oil company we’re looking for documents related to oil rights.  To try to be as inclusive as possible, we will search for “oil” AND “rights”.  Here is the result:

  • Files retrieved with “oil” AND “rights”: 200,000
  • Files NOT retrieved with “oil” AND “rights”: 1,000,000

Using the sample size calculator site we identified before, we determine a sample size of 662 for the retrieved files and 664 for the NOT retrieved files to achieve a 99% confidence level with a margin of error of 5%.  We then use this site to generate random numbers and proceed to review each item in the retrieved and NOT retrieved sample sets to determine responsiveness to the case.  Here are the results:

  • Retrieved Items: 662 reviewed, 24 responsive, 3.6% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 661 non-responsive, 99.5% non-responsive rate.

Nearly every item in the NOT retrieved category was non-responsive, which is good.  But, only 3.6% of the retrieved items were responsive, which means our search was WAY over-inclusive.  At that rate, 192,800 out of 200,000 files retrieved will be NOT responsive and will be a waste of time and resources to review.  Why?  Because, as we determined during the review, almost every published and copyrighted document in our oil company contains the phrase “All Rights Reserved” and will be retrieved.
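Here is a minimal sketch of the arithmetic behind that conclusion: project the responsive rate observed in the sample onto the full retrieved set.  The counts are the ones from Test #1 above (the exact projection differs slightly from the figure above, which rounds the rate to 3.6%).

def project_from_sample(population, sample_size, responsive_in_sample):
    """Project responsive and non-responsive counts for a population
    from the responsive rate observed in a random sample."""
    rate = responsive_in_sample / sample_size
    responsive = round(population * rate)
    return rate, responsive, population - responsive

# Test #1: 200,000 files retrieved, 662 sampled, 24 of the sample responsive
rate, responsive, non_responsive = project_from_sample(200_000, 662, 24)
print(f"Observed responsive rate:  {rate:.1%}")         # 3.6%
print(f"Projected responsive:      {responsive:,}")      # 7,251
print(f"Projected NOT responsive:  {non_responsive:,}")  # 192,749 (the post rounds to 192,800)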

TEST #2: Let’s try again.  This time, we’ll conduct a phrase search for “oil rights” (which requires those words as an exact phrase).  Here is the result:

  • Files retrieved with “oil rights”: 1,500
  • Files NOT retrieved with “oil rights”: 1,198,500

This time, we determine a sample size of 461 for the retrieved files and (again) 664 for the NOT retrieved files to achieve a 99% confidence level with a margin of error of 5%.  Even though we still have a sample size of 664 for the NOT retrieved files, we generate a new list of random numbers to review those items, as well as the 461 randomly selected retrieved items.  Here are the results:

  • Retrieved Items: 461 reviewed, 435 responsive, 94.4% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 523 non-responsive, 78.8% non-responsive rate.

Nearly every item in the retrieved category was responsive, which is good.  But, only 78.8% of the NOT retrieved items were non-responsive, which means over 20% of the NOT retrieved items were actually responsive to the case (we also failed to retrieve 8 of the items identified as responsive in the first iteration).  So, now what?

TEST #3: If you saw this previous post, you know that proximity searching is a good alternative for finding hits that are close to each other without requiring the exact phrase.  So, this time, we’ll conduct a proximity search for “oil within 5 words of rights”.  Here is the result:

  • Files retrieved with “oil within 5 words of rights”: 5,700
  • Files NOT retrieved with “oil within 5 words of rights”: 1,194,300

This time, we determine a sample size of 595 for the retrieved files and (once again) 664 for the NOT retrieved files, generating a new list of random numbers for both sets of items.  Here are the results:

  • Retrieved Items: 595 reviewed, 542 responsive, 91.1% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 655 non-responsive, 98.6% non-responsive rate.

Over 90% of the items in the retrieved category were responsive AND nearly every item in the NOT retrieved category was non-responsive, which is GREAT.  Also, all but one of the items previously identified as responsive was retrieved.  So, this is a search that appears to maximize recall and precision.

Had we proceeded with the original search, we would have reviewed 200,000 files – 192,800 of which would have been NOT responsive to the case.  By testing and refining, we only had to review 8,815 files – the 3,710 sample files reviewed plus the remaining retrieved items from the third search (5,700 – 595 = 5,105) – most of which ARE responsive to the case.  We saved tens of thousands of dollars in review costs while still retrieving most of the responsive files, using a defensible approach.
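For those keeping score, here is a minimal sketch of that arithmetic, using only the sample sizes and retrieval counts from the three tests above.

# Samples reviewed in each test: (retrieved sample, NOT retrieved sample)
samples = {
    "Test 1 (oil AND rights)":         (662, 664),
    "Test 2 ('oil rights' phrase)":    (461, 664),
    "Test 3 (oil within 5 of rights)": (595, 664),
}
sample_total = sum(r + n for r, n in samples.values())          # 3,710 files

retrieved_by_final_search = 5_700
already_sampled_from_it = 595
remaining_to_review = retrieved_by_final_search - already_sampled_from_it   # 5,105 files

total_reviewed = sample_total + remaining_to_review             # 8,815 files
baseline_review = 200_000                                       # the original "oil" AND "rights" result set

print(f"Sampled during testing:    {sample_total:,}")
print(f"Remaining from Test 3:     {remaining_to_review:,}")
print(f"Total files reviewed:      {total_reviewed:,}")
print(f"Files avoided vs. Test 1:  {baseline_review - total_reviewed:,}")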

Keep in mind that this is a simple example — we’re not taking into account misspellings and other variations we may want to include in our criteria.

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: A “Random” Idea on Search Sampling

 

Friday, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator.  Today, we’ll talk about how to make sure the sample size is randomly selected.

A randomly selected sample gives each file an equal chance of being reviewed and eliminates the chance of bias being introduced into the sample which might skew the results.  Merely selecting the first or last x number of items (or any other group) in the set may not reflect the population as a whole – for example, all of those items could come from a single custodian.  To ensure a fair, defensible sample, it needs to be selected randomly.

So, how do you select the numbers randomly?  Once again, the Internet helps us out here.

One site, Random.org, has a random integer generator which will randomly generate whole numbers.  You simply supply the number of random integers to be generated, along with the starting and ending numbers of the range within which the generated numbers should fall.  The site will then produce a list of numbers that you can copy and paste into a text file or even a spreadsheet.  The site also provides an Advanced mode, with options for the number format (e.g., decimal, hexadecimal), the output format and how the randomization is ‘seeded’.

In the example from Friday, you would provide 660 as the number of random integers to be generated, with a starting number of 1 and an ending number of 100,000, to get a list of random numbers for testing a search that yielded 100,000 files with hits (664, 1 and 1,000,000, respectively, to get a list of numbers for testing the non-hits).  You could paste the numbers into a spreadsheet, sort them, and then retrieve the files by their position in the result set based on those random numbers, reviewing each one to determine whether it reflects the intent of the search.  You’ll then have a good sense of how effective your search was, based on the random sample.  And, probably more importantly, using that random sample to test your search results is a highly defensible method for verifying your approach in court.
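If you would rather generate the random positions in a script than on the website, here is a minimal sketch using Python’s standard library; it simply stands in for the Random.org step described above and assumes you can retrieve files by their position in the result set.

import random

def pick_sample_positions(population_size, sample_size, seed=None):
    """Return sorted, unique 1-based positions to pull from a result set."""
    rng = random.Random(seed)   # pass a seed if the selection must be reproducible
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Friday's example: 100,000 files with hits (sample of 660),
# 1,000,000 files without hits (sample of 664)
hit_positions = pick_sample_positions(100_000, 660)
non_hit_positions = pick_sample_positions(1_000_000, 664)
print(hit_positions[:10])       # first few positions to retrieve and review
print(non_hit_positions[:10])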

Tomorrow, we'll walk through a sample iteration to show how the sampling will ultimately help us refine our search.

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Determining Appropriate Sample Size to Test Your Search

 

We’ve talked about searching best practices quite a bit on this blog.  One part of searching best practices (as part of the “STARR” approach I described in an earlier post) is to test your search results (both the result set and the files not retrieved) to determine whether the search you performed is effective at maximizing both precision and recall to the extent possible, so that you retrieve as many responsive files as possible without having to review too many non-responsive files.  One question I often get is: how many files do you need to review to test the search?

If you remember from statistics class in high school or college, statistical sampling is choosing a percentage of the results population at random for inspection to gather information about the population as a whole.  This saves considerable time, effort and cost over reviewing every item in the results population and enables you to obtain a “confidence level” that the characteristics of your sample reflect the population.  Statistical sampling is used for everything from exit polls that predict elections to marketing surveys that poll customers on brand popularity, and it is a generally accepted method of drawing conclusions about an overall results population.  You can sample a small portion of a large set to obtain a 95% or 99% confidence level in your findings (with a margin of error, of course).

So, does that mean you have to find your old statistics book and dust off your calculator or (gasp!) slide rule?  Thankfully, no.

There are several sites that provide sample size calculators to help you determine an appropriate sample size, including this one.  You’ll simply need to identify a desired confidence level (typically 95% to 99%), an acceptable margin of error (typically 5% or less) and the population size.

So, if you perform a search that retrieves 100,000 files and you want a sample size that provides a 99% confidence level with a margin of error of 5%, you’ll need to review 660 of the retrieved files to achieve that level of confidence in your sample (only 383 files if a 95% confidence level will do).  If 1,000,000 files were not retrieved, you would only need to review 664 of the not retrieved files to achieve that same level of confidence (99%, with a 5% margin of error) in your sample.  As you can see, the sample size doesn’t need to increase much when the population gets really large and you can review a relatively small subset to understand your collection and defend your search methodology to the court.
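For readers who want to sanity-check the calculator, here is a minimal sketch of the formula most such calculators use – a normal-approximation sample size with a finite population correction, assuming maximum variability (p = 0.5).  It reproduces the 660, 383 and 664 figures above.

import math

Z_SCORES = {0.95: 1.96, 0.99: 2.576}   # two-tailed z values for common confidence levels

def sample_size(population, confidence=0.99, margin_of_error=0.05, p=0.5):
    """Sample size via the normal approximation with a finite population correction."""
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)   # infinite-population sample size
    return math.ceil(n0 / (1 + (n0 - 1) / population))     # adjust for the finite population

print(sample_size(100_000, confidence=0.99))    # 660 - the retrieved files
print(sample_size(100_000, confidence=0.95))    # 383 - if 95% confidence will do
print(sample_size(1_000_000, confidence=0.99))  # 664 - the NOT retrieved files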

On Monday, we will talk about how to randomly select the files to review for your sample.  Same bat time, same bat channel!

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Forbes on the Rise of Predictive Coding

 

First the New York Times with an article about eDiscovery, now Forbes.  Who’s next, The Wall Street Journal?  😉

Forbes published a blog post entitled E-Discovery And the Rise of Predictive Coding a few days ago.  Written by Ben Kerschberg, Founder of Consero Group LLC, it gets into some legal issues and considerations regarding predictive coding that are interesting.  For some background on predictive coding, check out our December blog posts, here and here.

First, the author provides a very brief history of document review, starting with bankers boxes and WordPerfect, and “[a]fter an interim phase best characterized by simple keyword searches and optical character recognition”, it evolved to predictive coding.  OK, that’s like saying that Gone with the Wind started with various suitors courting Scarlett O’Hara and, after an interim phase best characterized by the Civil War, marriage and heartache, Rhett says to Scarlett, “Frankly, my dear, I don’t give a damn.”  A bit of an oversimplification of how review has evolved.

Nonetheless, the article gets into a couple of important legal issues raised by predictive coding.  They are:

  • Satisfying Reasonable Search Requirements: Whether counsel can utilize the benefits of predictive coding and still meet legal obligations to conduct a reasonable search for responsive documents under the federal rules.  The question is, what constitutes a reasonable search under Federal Rule 26(g)(1)(A), which requires that the responding attorney attest by signature that “with respect to a disclosure, it is complete and correct as of the time it is made”?
  • Protecting Privilege: Whether counsel can protect attorney-client privilege for their client when a privileged document is inadvertently disclosed.  Federal Rule of Evidence 502 provides that a court may order that a privilege or protection is not waived by disclosure if the disclosure was inadvertent and the holder of the privilege took reasonable steps to prevent disclosure.  Again, what’s reasonable?

The author concludes that the use of predictive coding is reasonable because it: a) makes document review more efficient by providing only those documents to the reviewer that have been selected by the algorithm; b) makes it more likely that responsive documents will be produced, saving time and resources; and c) refines relevant subsets for review, which can then be validated statistically.

So, what do you think?  Does predictive coding enable attorneys to satisfy these legal issues?   Is it reasonable?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Does Size Matter?

 

I admit it, with a title like “Does Size Matter?”, I’m looking for a few extra page views….  😉

I frequently get asked how big an ESI collection needs to be to benefit from eDiscovery technology.  In a recent case with one of my clients, the client had a fairly small collection – only about 4 GB.  But, when a judge ruled that they had to start conducting depositions in a week, they needed to review that data in a weekend.  Without FirstPass™, powered by Venio FPR™, to cull the data and OnDemand® to manage the linear review, they would not have been able to make that deadline.  So, they clearly benefited from the use of eDiscovery technology in that case.

But, if you’re not facing a tight deadline, how large does your collection need to be for the use of eDiscovery technology to provide benefits?

I recently conducted a webinar regarding the benefits of First Pass Review – aka Early Case Assessment, or a more accurate term (as George Socha points out regularly), Early Data Assessment.  One of the topics discussed in that webinar was the cost of review for each gigabyte (GB).  Extrapolated from an analysis conducted by Anne Kershaw a few years ago (and published in the Gartner report E-Discovery: Project Planning and Budgeting 2008-2011), here is a breakdown:

Estimated Cost to Review All Documents in a GB:

  • Pages per GB:                75,000
  • Pages per Document:      4
  • Documents Per GB:        18,750
  • Review Rate:                 50 documents per hour
  • Total Review Hours:       375
  • Reviewer Billing Rate:     $50 per hour

Total Cost to Review Each GB:      $18,750

Notes: The number of pages per GB can vary widely.  Estimates tend to range from 50,000 to 100,000 pages per GB, so 75,000 pages (18,750 documents) seems an appropriate average.  Fifty documents reviewed per hour is considered a fast review rate, and $50 per hour is considered a bargain price.  eDiscovery Daily provided an earlier estimate of $16,650 per GB based on assumptions of 20,000 documents per GB and 60 documents reviewed per hour – the assumptions may change somewhat but, either way, the cost for attorney review of each GB could be expected to range from at least $16,000 to $18,000, possibly more.

Advanced culling and searching capabilities of First Pass Review tools like FirstPass can enable you to cull out 70-80% of most collections as clearly non-responsive without having to conduct attorney review on those files.  If you have merely a 2 GB collection and assume the lowest review cost above of $16,000 per GB, the use of a First Pass Review tool to cull out 70% of the collection can save $22,400 in attorney review costs.  Is that worth it?
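Here is a minimal sketch of the arithmetic behind the table and the 2 GB example above, using the same assumptions listed in the notes.

def review_cost_per_gb(pages_per_gb=75_000, pages_per_doc=4,
                       docs_per_hour=50, billing_rate=50):
    """Estimated attorney cost to review every document in one GB."""
    docs_per_gb = pages_per_gb / pages_per_doc      # 18,750 documents
    review_hours = docs_per_gb / docs_per_hour      # 375 hours
    return review_hours * billing_rate              # $18,750

def culling_savings(collection_gb, cull_rate, cost_per_gb):
    """Review cost avoided by culling a fraction of the collection up front."""
    return collection_gb * cull_rate * cost_per_gb

print(f"Cost to review one GB:  ${review_cost_per_gb():,.0f}")               # $18,750
# The 2 GB example above, at the low end of the $16,000-$18,000 per GB range:
print(f"Savings from 70% cull:  ${culling_savings(2, 0.70, 16_000):,.0f}")   # $22,400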

So, what do you think?  Do you use eDiscovery technology for only the really large cases or ALL cases?   Please share any comments you might have or if you’d like to know more about a particular topic.