eDiscovery Budgeting, Part 1: Assumptions and Elements that Contribute to Cost


While attorneys may struggle with the regional and international regulations surrounding eDiscovery, your client is likely to be less concerned with the practical legal details of your discovery request, and more concerned with the financial cost.

Whether you're working with the plaintiff or the defense, one of the most important considerations in preparing for eDiscovery is presenting the expense accurately and completely to the client – and that means understanding for yourself the factors that go into budgeting for eDiscovery. There are two main sets of elements to consider: those that affect budgeting and estimates, and those that will have a direct impact on the ultimate cost of eDiscovery.

Understanding Assumptions in eDiscovery

Because so much of the eDiscovery process cannot be predicted without accurate information, it's important to confirm any estimates from a client or from opposing counsel before proceeding with a budget.

Does your client really know the volume of data that is likely to be contained in certain files or backups, or are they providing generalized figures that may not be accurate? Do you know for certain the precise scope of the information you need to examine for discovery? Attorneys need to verify as many estimates as possible, noting any and all assumptions in their estimates so that the client can prepare for potential changes in eDiscovery costs if those early assumptions prove to be inaccurate.

eDiscovery budgeting is predicated on guesswork and assumptions that may include:

  • Volume
  • Scope
  • Efficiency
  • Risk
  • Timing

Each of these factors will be discussed in a blog post next week detailing the assumptions that go into estimating a budget for eDiscovery.

Breaking Down the Cost of eDiscovery

Once the estimate is complete and you’re ready to tackle the real work of eDiscovery, certain elements account for most of the cost, while others contribute far less.

Some of the major elements comprising the cost of eDiscovery include:

  • Collection: including factors such as travel, retrieval, custodian interviews, and forensic collection (if necessary)
  • Volume of data
  • Number of custodians
  • Human review: the most expensive factor in eDiscovery costs
  • Case complexity

I'll discuss more on each of these factors in an upcoming blog post, as well.

The cost of eDiscovery can also be affected by the degree of open communication with opposing counsel. A cooperative relationship with the opposition can streamline discovery, while a contentious relationship makes it likely that discovery-related motions and court appearances will increase the total cost of this process.

So, what do you think? How much up-front effort goes into your eDiscovery budgeting process? How do you monitor progress against the budget?  Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Best Practices: 4 Steps to Effective eDiscovery With Software Analytics


I read an interesting article from Texas Lawyer entitled “4 Steps to Effective E-Discovery With Software Analytics” that has some interesting takes on project management principles related to eDiscovery, and I’ve interjected some of my thoughts into the analysis below.  A copy of the full article is located here.  The steps are as follows:

1. With the vendor, negotiate clear terms that serve the project's key objectives.  The article notes the importance of tying each collection and review milestone (e.g., collecting and imaging data; filtering data by file type; removing duplicates; processing data for review in a specific review platform; processing data to allow for optical character recognition (OCR) searching; and converting data into a tag image file format (TIFF) for final production to opposing counsel) to contract terms with the vendor.

The specific milestones will vary – for example, conversion to TIFF may not be necessary if the parties agree to a native production – so it’s important to know the size and complexity of the project, and choose only an experienced eDiscovery vendor who can handle the variations.

2. Collect and process data.  Forensically sound data collection and culling of obviously unresponsive files (such as system files) to drastically decrease the overall review costs are key services that a vendor provides in this area.  As we’ve noted many times on this blog, effective culling can save considerable review costs – each gigabyte (GB) culled can save $16-$18K in attorney review costs.

The article notes that a hidden cost is the OCR process of translating extracted text into a searchable form and that it’s an optimal negotiation point with the vendor.  This may have been true when most collections were paper based, but as most collections today are electronic based, the percentage of documents requiring OCR is considerably less than it used to be.  However, it is important to be prepared that there are some native files which will be “image only”, such as TIFFs and scanned PDFs – those will require OCR to be effectively searched.
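To put the culling figure in step 2 into concrete terms, here is a minimal Python sketch of the arithmetic.  It assumes only the $16-$18K per GB range quoted above; the function name and the 25 GB example are hypothetical and are there purely for illustration.

```python
# Per-GB attorney review cost range quoted in the post ($16K to $18K per GB).
REVIEW_COST_PER_GB_LOW = 16_000
REVIEW_COST_PER_GB_HIGH = 18_000


def culling_savings(gb_culled):
    """Return (low, high) estimated review savings in dollars for the GB culled."""
    if gb_culled < 0:
        raise ValueError("gb_culled must be non-negative")
    return (gb_culled * REVIEW_COST_PER_GB_LOW,
            gb_culled * REVIEW_COST_PER_GB_HIGH)


# Hypothetical example: 25 GB of system files and other obviously
# unresponsive data removed before review.
low, high = culling_savings(25)
print(f"Estimated review savings: ${low:,} to ${high:,}")
```

The range is only as good as the per-GB figure behind it, so treat the result as a rough planning number rather than a quote.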

3. Select a data and document review platform.  Factors such as ease of use, robustness, and reliability of analytic tools, support staff accessibility to fix software bugs quickly, monthly user and hosting fees, and software training and support fees should be considered when selecting a document review platform.

The article notes that a hidden cost is selecting a platform with which the firm’s litigation support staff has no experience as follow-up consultation with the vendor could be costly.  This can be true, though a good vendor training program and an intuitive interface can minimize or even eliminate this component.

The article also notes that to take advantage of the vendor’s more modern technology “[a] viable option is to use a vendor's review platform that fits the needs of the current data set and then transfer the data to the in-house system”.  I’m not sure why the need exists to transfer the data back – there are a number of vendors that provide a cost-effective solution appropriate for the duration of the case.

4. Designate clear areas of responsibility.  By doing so, you minimize or eliminate inefficiencies in the project and the article mentions the RACI matrix to determine who is responsible (individuals responsible for performing each task, such as review or litigation support), accountable (the attorney in charge of discovery), consulted (the lead attorney on the case), and informed (the client).

Managing these areas of responsibility effectively is probably the biggest key to project success and the article does a nice job of providing a handy reference model (the RACI matrix) for defining responsibility within the project.
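For readers who like to see it written down, a RACI matrix can be kept as a simple lookup table.  The Python sketch below is only an illustration: the task names and role labels are hypothetical, and the check it runs (at least one responsible party and exactly one accountable party per task) is a common RACI convention rather than something the article prescribes.

```python
# Hypothetical RACI matrix: task -> role -> list of people or teams.
raci = {
    "collection": {
        "responsible": ["litigation support"],
        "accountable": ["discovery attorney"],
        "consulted": ["lead attorney"],
        "informed": ["client"],
    },
    "first pass review": {
        "responsible": ["review team"],
        "accountable": ["discovery attorney"],
        "consulted": ["lead attorney"],
        "informed": ["client"],
    },
}


def raci_problems(matrix):
    """List tasks that lack a responsible party or a single accountable party."""
    problems = []
    for task, roles in matrix.items():
        if not roles.get("responsible"):
            problems.append(f"{task}: no one is responsible")
        if len(roles.get("accountable", [])) != 1:
            problems.append(f"{task}: needs exactly one accountable party")
    return problems


print(raci_problems(raci) or "Every task has clear ownership")
```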

So, what do you think?  Do you have any specific thoughts about this article?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think


Here’s a sample scenario: You identify custodians relevant to the case and collect files from each.  Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians.  You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel.  After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!!  What happened?!?

Did the vendor accidentally “double-bill” you?  That would be great – but no.  There’s a much more logical explanation and, unfortunately, you may wind up paying a lot more to process these files than you expected.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.  For example, as noted above, Outlook emails are typically saved for each custodian in a personal storage (.PST) file format, which is an expanding container file. For most custodians, all of their email (and the corresponding attachments, if present) resides in a few PST files.  The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space?  The PST file is like one of those vacuum bags – it typically stores the emails and attachments in a compressed format to save space.  When the emails and attachments are processed into a review tool, they are expanded into their normal size.  This expanded size can be 1.5 to 2 times larger than the scanned size (or more).  And, that’s what many vendors will bill on – the expanded size.

There are other types of archive container files that compress the contents – .zip and .rar files are two examples of compressed container files.  These files are often used not only to compress files for storage on hard drives, but also to compact or group a set of files when transmitting them, usually in – you guessed it – email.  With email comprising a majority of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.  It’s important to be prepared for that and know your options when processing that data, so you can effectively anticipate those processing costs.
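If you want a rough idea of the expanded size before the bill arrives, the arithmetic is easy to sketch.  The Python example below does two things: it applies the 1.5 to 2 times rule of thumb mentioned above to a collected size, and it reads the compressed and expanded sizes of a .zip file using Python's standard zipfile module.  The function names are hypothetical, the sketch does not look inside nested archives, and PST files need a specialized tool, so treat it as an illustration rather than a processing estimate.

```python
import zipfile


def expanded_range_gb(collected_gb, low_factor=1.5, high_factor=2.0):
    """Estimate the expanded size range from the 1.5x to 2x rule of thumb."""
    return collected_gb * low_factor, collected_gb * high_factor


def zip_sizes(path):
    """Return (compressed_bytes, expanded_bytes) summed over a .zip's members."""
    with zipfile.ZipFile(path) as archive:
        members = archive.infolist()
        compressed = sum(m.compress_size for m in members)
        expanded = sum(m.file_size for m in members)
    return compressed, expanded


# The sample scenario above: roughly 100 GB collected from the custodians.
low, high = expanded_range_gb(100)
print(f"Plan for roughly {low:.0f} to {high:.0f} GB after expansion")
```

Calling zip_sizes on a collected archive gives the actual expanded figure for that one container, which is a better basis for a budget than the rule of thumb.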

So, what do you think?  Have you ever been surprised by processing costs of your ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Evaluating Price


When you are looking for help with handling discovery materials, there are hundreds of service providers to choose from.  It’s important that you choose one that can meet your schedule, has fair pricing and does high-quality work.  But there are other things you should look at as well. 

In the next few blogs in this series, we’re going to discuss what you should be looking at when you evaluate a service provider.  Note that these points are not covered in order of importance.  The importance of any single evaluation point will vary from case to case and will depend on things like the type of service you are looking for, the duration of the project, the complexity of the project, and the size of the project.

Let’s start with Price.  Obviously, costs are significant and the first thing most people look at when doing an evaluation.  Unfortunately, many people don’t look at anything else.  Don’t fall into that trap.  If a service provider offers prices much lower than everyone else’s, that should sound some alarms.  There’s a chance the service provider doesn’t understand the task or is cutting corners somewhere.  Do a lot of digging and take a close look at the organization’s procedures and technology before selecting a service provider that is comparatively very low-priced. 

There’s another very important consideration when you are comparing service provider pricing:  not all pricing models are the same.  Make sure you understand every component of a service provider’s price, what’s included, what’s not, what exactly you are paying for, and how it affects the bottom line.  Let me give you an example.  Some service providers charge per GB for “input” gigs for electronic discovery processing, while others charge per GB for “output” gigs.  Of course, the ones that charge for “input” gigs charge a lower per gig price, but they are charging for more gigabytes. 

Understand how a service provider’s pricing is structured and what it means when you are evaluating prices.  It’s always a good idea to ask a service provider to estimate total costs for a project to verify your understanding.
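Here is a minimal Python sketch of how that comparison might be run.  Every number in it is hypothetical (200 "input" GB, 80 "output" GB, $100 per input GB and $225 per output GB), as are the function names; the point is simply that the lower per-GB price is not automatically the lower total.

```python
def processing_total(billable_gb, price_per_gb):
    """Total processing charge for the gigabytes a provider actually bills."""
    return billable_gb * price_per_gb


def break_even_output_rate(input_gb, output_gb, input_rate):
    """Per-GB 'output' price at which both pricing models cost the same."""
    return input_gb * input_rate / output_gb


# Hypothetical project: 200 GB going in, 80 GB coming out.
input_gb, output_gb = 200, 80

quote_a = processing_total(input_gb, 100)   # provider billing on input GB
quote_b = processing_total(output_gb, 225)  # provider billing on output GB

print(f"Input-based quote:  ${quote_a:,}")
print(f"Output-based quote: ${quote_b:,}")
print(f"Break-even output rate: ${break_even_output_rate(input_gb, output_gb, 100):,.2f} per GB")
```

With these made-up figures, the provider with the higher per-GB price comes out cheaper overall, which is exactly why it pays to ask for a total project estimate.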

In the next blogs in this series, we’ll look at other things you should be looking at when selecting a vendor.

What has been your experience with service provider work?  Do you have good or bad experiences you can tell us about?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: Craig Ball of Craig D. Ball, P.C.


This is the ninth (and final) of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Craig Ball.  Craig is a prolific contributor to continuing legal and professional education programs throughout the United States, having delivered over 600 presentations and papers.  Craig’s articles on forensic technology and electronic discovery frequently appear in the national media, including in American Bar Association, ATLA and American Lawyer Media print and online publications.  He also writes a monthly column on computer forensics and e-discovery for Law Technology News called "Ball in your Court," which was both the 2007 and 2008 Gold Medal honoree as “Best Regular Column” as awarded by Trade Association Business Publications International.  It’s also the 2009 Gold and 2007 Silver Medalist honoree of the American Society of Business Publication Editors as “Best Contributed Column” and their 2006 Silver Medalist honoree as “Best Feature Series” and “Best Contributed Column.”  The presentation, "PowerPersuasion: Craig Ball on PowerPoint," is consistently among the top rated continuing legal educational programs from coast-to-coast.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

Price compression is a major trend.  Consumers are very slowly waking up to the fact that they have been the “drunken sailors on leave” in terms of how they have approached eDiscovery and there have been many “vendors of the night” ready to roll them for their paychecks.  eDiscovery has been more like a third world market where vendors have said “let’s ask for some crazy number” and perhaps they’ll be foolish enough to pay it.  And, if they don’t pay that one, let’s hit them with a little lower number, mention sanctions, give them a copy of something from Judge Scheindlin or Judge Grimm and then try again.  Until finally, they are so dissolved in a pool of their own urine that they’re willing to pay an outrageous price.  Those days are coming to an end and smart vendors are going to be prepared to demonstrate the value and complexity behind their offerings.

I am seeing people recognizing that the “gravy train” is over except for the most egregiously challenging eDiscovery situations where numbers really have little meaning.  When you’re talking about tens of thousands of employees and petabytes of data, the numbers can get astronomical.  But, for the usual case, with a more manageable number of custodians and issues, people are waking up to the fact that we can’t keep reinventing this wheel of great expense, so clients are pushing for more rational approaches and a few forward thinking vendors are starting to put forward some products that will allow you to quantify what your exposure is going to be in eDiscovery.  We’re just not going to see per GB processing prices that are going to be measured in the double and triple digits – that just can’t go, at least when you’re talking about the raw data on the input side.  So, I’m seeing some behind the firewall products, even desktop products, that are going to be able to allow lawyers and people with relatively little technical expertise to handle small and medium sized cases.  Some of the hosting services are putting together pricing that, though I haven’t really tested it in real world situations, is starting to sound rational and less frightening.

I’m continuing to see more fragmentation in the market and I would like to see more integrated products, but it’s still like packaging a rather motley crew of different pieces that don’t always fit together well at all.  You’ve got relatively new review tools, some strong players like Clearwell and stronger than they used to be players like Relativity.  You’ve got people “from down under” that are really changing the game like Nuix.  And, you’ve got some upstarts – products that we’ve really not yet heard of at all.  I’m seeing at this conference that any one of them has the potential of becoming an industry standard.  I’m seeing some real innovation, some real new code bases coming out and that is impressive to me because it just hadn’t been happening before, it’s been “old wine in new bottles” for several years.

I also see some new ideas in collection.  I think people are starting to embrace what George Socha would like for me to aptly call the left side of the EDRM.  A lot of people have turned their heads away from the ugly business of selecting data to process and the collection of it and forensic and chain of custody issues and would gather it up any way they liked and process it.  But, I think there are some new and very viable ways that companies are offering for self-collection, for tracking of collection, for desk side interviews, and for generation and management of legal holds.  We’re seeing a lot of things emerging on that front.  Most of what I see in the legal hold management space is just awful.  That doesn’t mean it’s all awful, but most of it is awful.  It’s a lot of marketing speak, a lot of industry jargon, wrapped around a very uncreative, somewhat impractical, set of tools.  The question really is, are these things really much better than a well designed spreadsheet?  Certainly, they’re more scalable, but some have a “rushed to market” feel to me and I think it’s going to take them some time to mature.  Everyone is jumping on this Pension Committee bandwagon that Judge Scheindlin created for us, and not everyone has brought their Sunday best.

As for social media, it is a big deal because, if you’re paying attention to what’s happening with the generation about to explode on the scene, they simply have marginalized email.  Just as we are starting to get our arms around email, it’s starting to move off center stage.  And, I think the most important contribution to eDiscovery in 2010 has occurred silently and with little fanfare and I’d like to make sure you mention it.  In November, Facebook, the most important social networking site on the planet, very quietly provided the ability for you to package and collect, for personal storage, the entire contents of your Facebook life, including your Wall, your messaging, and your Facemail.  For all of the pieces of your Facebook existence, you can simply click and receive it back in a Zip file.  The ability to preserve and, ultimately, reopen and process that data is the most forward thinking thing that has emerged from the social networking world since there has been a social networking world.  How wonderful that Facebook had the foresight to say “you know, it would be nice if we could give people their entire Facebook stuff in a neat package in a moment in time”.

None of the others have done that yet, but I think that Facebook is so important that it’s going to make that a standard.  It’s going to need to be in Google Apps, it’s going to need to be in Gmail.  If you’re going to live your life “in the cloud”, then you’re going to have to have a way to grab your life from the cloud and move it somewhere else.  Maybe their portability was a way to head off antitrust, for all I know.  Whatever their motivation, I don’t think that most lawyers know that there is essentially this one-click preservation of Facebook.  If a vendor did it, you would hear about it in the elevators here at the show.  Facebook did it for free, and without any fanfare, and it’s an important thing for you to get out there.  The vendor that comes out with a tool that processes these packages that emerge, especially if they announce it when the Oscars come out {laugh}, is well positioned.

So, yes, social networking is important because it means that a lot of things change, forensics change.  You’re just not going to be able to do media forensics anymore on cloud content.  The cloud is going to make eDiscovery simpler, and that’s the one thing I haven’t heard anybody say, because you’ll have less you’ll need to delete and it’s much more likely to be gone – really gone – when you delete it (no forensics needed).  Collection and review can be easier.  What would you rather search, Gmail or Outlook?  Not only can Outlook emails be in several places, but the quality of a Google-based search is better, even though it’s not built for eDiscovery.  If I’m going to stand up in court and say that “I searched all these keywords and I saw all of the communications related to these keywords”, I’d rather do it with the force of Google than with the historically “snake bitten” engine for search that’s been in Outlook.  We always say in eDiscovery that you don’t use Outlook as a review and search tool because we know it isn’t good.  So, we take the container files, PSTs and OSTs and we parse them in better tools.  I think we’ll be able to do it both ways. 

I foresee a day not long off when Google will allow either the repatriation of those collections for use in more powerful tools or will allow different types of searches to be run on the Gmail collections other than just Gmail search.  You may be able to do searches and collect from your own Gmail, to place a hold on that Gmail.  Right now, you’d have to collect it, tag it, move it to a folder – you have to do some gyrations.  I think it will mature and they may open their API, so that there can be add-on tools from the lab or from elsewhere that will allow people to hook into Gmail.  To a degree, you can do that right now, by paying an upgrade fee for Postini, where they can download a PST with your Gmail content.  The problem with that is that Gmail is structured data, you really need to see the threading that Gmail provides to really appreciate the conversation that is Gmail.  Whereas, if you pull it down to PST (except in the latest version of Outlook, which I think 2010 does a pretty good job of threading), I don’t know if that is replicated in the Postini PST.  I’ll have to test that.

Office 2010 is a trend, as well.  Outlook 2010 is the first Microsoft tool that is eDiscovery friendly, by design.  I think Exchange 2010 is going to make our lives easier in eDiscovery.  We’re going to have a lot more “deleted” information hang around in the Windows 7 environment and in the Outlook 2010 and Exchange 2010 environment.  Data is not going away until you jump through some serious hoops to make it go away.

I think the iPad is also going to have quite an impact.  At first, it will be smoke and mirrors, but before 2011 bids us goodbye, I think the iPad is going to find its way into some really practical, gestural interfaces for working with data in eDiscovery.  I’ve yet to see anything but a half-assed version of an app.  Everyone rushed out and you wanted some way to interface with your product, but they didn’t build a purpose-built app for the iPad to really take advantage of its strengths, to be able to gesturally move between screens.  I foresee a day where you’ll have a ring of designations around the screen and you’ll flip a document, like a privileged document, into the appropriate designation and it will light up or something so that you know it went into the correct bin – as if you were at a desk and you were moving paper to different parts of the desk.  Sometimes, I wonder why somebody hasn’t thought of this before.  I’ve done no metrics, I’ve done no ergonomic studies to know that the paper metaphor serves the task well.  But, my gut tells me that we need to teach lawyers to walk before they can run, to help them interact with data in a metaphor that they understand in a graphical user interface.  Point and click, drag and drop, pinch and stretch, which are three dimensional concepts translated into a two dimensional interface. The interface of the iPad is so intuitive that a three year old could figure it out.  Just like Windows Explorer impacted the design of so many applications (“it’s an Explorer-like interface”), the iPad will do the same.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  I think that the show felt well attended, upbeat, fresher than it has in two years.  I give the credit to the vendors showing up with some genuinely new products, instead of renamed, remarketed new products, although there’s still plenty of that.  There were so many announcements of new products before the show that you really wonder how new is this product?  But, there were some that really look like they were built from the ground up and that’s impressive.  There’s some money being spent on development again, and that’s positive.  The traffic was better, I’m glad we finally eliminated the loft area of the exhibit hall that would get so hot and uncomfortable.  I thought the traffic flow was very difficult in a positive way, which is to say that there were a lot of warm bodies out there, walking and talking and looking.

Henry Dicker and his team should be congratulated and I wouldn’t be surprised if they set a record over the past several years at this show.  The budgets were showing, money was freed up and that’s a positive for everyone in this industry.  Also, the quality of the questions being put forward in the educational tracks is head and shoulders better, more incisive and insightful and more advanced.  We’re starting to see the results of people working at the “201 level”, but we still don’t have enough technologists here, it’s still way too lawyer heavy.  This is the New York market, everybody is chasing after the Fortune 500, but everything has to be downward scalable too.  A good show.

What are you working on that you’d like our readers to know about?

The first week of June, I’m going to be teaching a technology for lawyers and litigation support professionals academy with an ultra all star cast of a very small, but dedicated faculty, including Michael Arkfeld, Judge Paul Grimm, Judge John Facciola, and others.  It’s called the eDiscovery Training Academy and will be held at the Georgetown Law School. It’s going to be rigorous, challenging, extremely technical and the hope is that the people emerge from that week genuinely equipped to talk the talk and walk the walk of productive 26(f) conferences and real interaction with IT personnel and records managers.  We’re going to start down at the surface of the magnetic media and we’re going to keep climbing until we can climb no further.

Thanks, Craig, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!