eDiscovery Case Law: No Sanctions for Scrubbing Computers Assumed to be Imaged


When scrubbing data from a computer drive related to litigation, it’s a good idea to make absolutely sure that there is another copy of that data, via backup or forensic image.  Don’t just take someone’s word for it.

In Federal Trade Commission v. First Universal Lending, LLC, No. 09-82322-CIV (S.D. Fla. Feb. 17, 2011), the FTC investigated the defendants’ mortgage modification practices, alleging that defendants had violated the Federal Trade Commission Act and the Telemarketing Sales Rule.  For the duration of the investigation, the court appointed a temporary receiver who took control of defendants’ business premises.

During the discovery stage, the FTC sought to preserve relevant data on defendants’ computers and servers by imaging them.  When defendants were asked about the locations of all relevant computers and servers, they failed to reveal the location of servers containing relevant data.  As a result, those servers were not imaged and the data on them was not preserved.  Due to misleading testimony by defendants, the receiver believed that all computers and servers had been imaged.  Acting on the incorrect belief that all of the relevant data had been preserved, the receiver permitted defendants to scrub the computers and sell them.  It turned out that some of those computers were the ones that had not been imaged.

Defendants filed a motion to enjoin the prosecution and/or dismiss the case due to plaintiff’s spoliation of evidence.  Defendants asserted that the FTC had destroyed, or caused to be destroyed, computer evidence that would prove all of the defendants’ defenses.

The court found no basis for imposing sanctions against the FTC for the destruction of defendants’ computer system and denied defendants’ motion. The court established that it can impose an adverse inference against a party where the court finds that the party has engaged in spoliation of evidence. For this inference to be applicable there has to be a finding of bad faith. A court can make this finding through direct evidence or circumstantial evidence. If bad faith is based on circumstantial evidence, the following prerequisites must be present: (1) evidence once existed that could fairly be supposed to have been material to the proof or defense of a claim at issue in the case; (2) the spoliating party engaged in an affirmative act causing the evidence to be lost; (3) the spoliating party did so while it knew or should have known of its duty to preserve the evidence; and (4) the affirmative act causing the loss cannot be credibly explained as not involving bad faith by the reason proffered by the spoliator.

The court found that there was no direct evidence of bad faith.  Further, it pointed out that defendants failed to establish bad faith by circumstantial evidence, since the FTC had not destroyed the computer systems – rather, the defendants did.  The court went on to state that, even assuming, arguendo, that defendants destroyed the hard drives at the instruction of the receiver’s agent, it did not change the fact that neither the receiver nor the agent is the FTC.

Furthermore, the court noted that, to the extent defendants’ position could be construed as seeking to attribute blame to the FTC for the receiver’s instruction to scrub the computers based on the FTC’s misstatement, no malicious motive on the part of the FTC’s investigator was evident.  This was at most negligence, and negligence is not sufficient to support an adverse inference instruction as a sanction for spoliation.

Further, the defendants did not demonstrate that the absence of the missing data was fatal to their defense because it was established that alternative sources of information existed.

Finally, the court emphasized that the FTC was under no obligation to preserve defendants’ evidence, especially considering that the FTC never had control or dominion over the computers – the receiver did.

So, what do you think?  What are your procedures for ensuring data backup before destruction?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: eLessons Learned Blog.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Facebook’s Self-Collection Mechanism

One of the most enlightening revelations resulting from my interview with Craig Ball at LegalTech (published last Friday) was a feature he mentioned, added by Facebook late last year, that allows any user to download their information.  I thought it was such a significant bit of information that a post dedicated to the feature (in addition to the coverage in the interview) was warranted.

This feature is available via the Account Settings menu and enables users to collect their wall posts, friends lists, photos, videos, messaging, and any other personal content, save it into a Zip file and download the Zip file.  Craig also wrote about the feature in Law Technology News last month – that article is located here.

When you initiate the download, especially if you’re an active Facebook user, it may take Facebook a while to gather all of your information (several minutes or more; mine took about an hour).  Eventually, you’ll get an email to let you know that your information is packaged and ready for download.  Once you verify your identity by providing your password and click “Download Now”, you’ll get a Zip file containing a snapshot of your Facebook environment in a collection of HTML files with your Wall, Profile and other pages and copies of any content files (e.g., photos, videos, etc.) that you had uploaded.
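For readers who want to handle one of these downloads programmatically, here is a minimal sketch of how the Zip archive might be inventoried before use in litigation.  This is not an official Facebook tool – the function name and manifest fields are my own, and the archive’s internal layout may vary – but hashing each file on receipt is the kind of chain-of-custody step a self-collected archive would typically need:

```python
import hashlib
import zipfile

def inventory_archive(zip_path):
    """Build a manifest of a downloaded archive: each entry's
    path, uncompressed size, and SHA-256 hash, so the collected
    contents can be documented before any review begins."""
    manifest = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip folder entries; hash only files
            with zf.open(info) as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest.append({
                "path": info.filename,
                "size": info.file_size,
                "sha256": digest,
            })
    return manifest
```

A reviewer could run this once on receipt of the Zip and again at production time; matching hashes demonstrate that the snapshot was not altered in between.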

Think about the significance of this for a moment.  Now, 500 million users of the most popular social network on the planet (which includes not just individuals, but organizations as well) have a mechanism to “self-collect” their data for their own use and safekeeping.  Or, they can “self-collect” for use in litigation.  In his article, Craig likens Facebook’s download function to Staples’ famous easy button.  How can an attorney argue that collection is overly burdensome when it simply requires the click of a button?

With a social network behemoth like Facebook now offering this feature, will other social network and cloud solution providers soon follow?  Let’s hope so.  As Craig notes in his article, “maybe the cloud isn’t the eDiscovery headache some think”.  Spread the word!

So, what do you think?  Have you been involved in a case that could have benefited from a cloud-based self-collection tool?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Craig Ball of Craig D. Ball, P.C.


This is the ninth (and final) of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Craig Ball.  Craig is a prolific contributor to continuing legal and professional education programs throughout the United States, having delivered over 600 presentations and papers.  Craig’s articles on forensic technology and electronic discovery frequently appear in the national media, including in American Bar Association, ATLA and American Lawyer Media print and online publications.  He also writes a monthly column on computer forensics and e-discovery for Law Technology News called "Ball in your Court," named the Gold Medal honoree as “Best Regular Column” in both 2007 and 2008 by Trade Association Business Publications International.  It’s also the 2009 Gold and 2007 Silver Medalist honoree of the American Society of Business Publication Editors as “Best Contributed Column” and their 2006 Silver Medalist honoree as “Best Feature Series” and “Best Contributed Column.”  The presentation, "PowerPersuasion: Craig Ball on PowerPoint," is consistently among the top rated continuing legal educational programs from coast-to-coast.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

Price compression is a major trend.  Consumers are very slowly waking up to the fact that they have been the “drunken sailors on leave” in terms of how they have approached eDiscovery and there have been many “vendors of the night” ready to roll them for their paychecks.  eDiscovery has been more like a third world market where vendors have said “let’s ask for some crazy number” and perhaps they’ll be foolish enough to pay it.  And, if they don’t pay that one, let’s hit them with a little lower number, mention sanctions, give them a copy of something from Judge Scheindlin or Judge Grimm and then try again.  Until finally, they are so dissolved in a pool of their own urine that they’re willing to pay an outrageous price.  Those days are coming to an end and smart vendors are going to be prepared to demonstrate the value and complexity behind their offerings.

I am seeing people recognizing that the “gravy train” is over except for the most egregiously challenging eDiscovery situations where numbers really have little meaning.  When you’re talking about tens of thousands of employees and petabytes of data, the numbers can get astronomical.  But, for the usual case, with a more manageable number of custodians and issues, people are waking up to the fact that we can’t keep reinventing this wheel of great expense, so clients are pushing for more rational approaches and a few forward-thinking vendors are starting to put forward products that will allow you to quantify what your exposure is going to be in eDiscovery.  We’re just not going to see per GB processing prices measured in the double and triple digits – that just can’t continue, at least when you’re talking about the raw data on the input side.  So, I’m seeing some behind-the-firewall products, even desktop products, that are going to allow lawyers and people with relatively little technical expertise to handle small and medium sized cases.  Some of the hosting services are putting together pricing that, though I haven’t really tested them in real world situations, is starting to sound rational and less frightening.

I’m continuing to see more fragmentation in the market and I would like to see more integrated products, but it’s still like packaging a rather motley crew of different pieces that don’t always fit together well at all.  You’ve got relatively new review tools, some strong players like Clearwell and stronger than they used to be players like Relativity.  You’ve got people “from down under” that are really changing the game like Nuix.  And, you’ve got some upstarts – products that we’ve really not yet heard of at all.  I’m seeing at this conference that any one of them has the potential of becoming an industry standard.  I’m seeing some real innovation, some real new code bases coming out and that is impressive to me because it just hadn’t been happening before, it’s been “old wine in new bottles” for several years.

I also see some new ideas in collection.  I think people are starting to embrace what George Socha would like for me to aptly call the left side of the EDRM.  A lot of people have turned their heads away from the ugly business of selecting data to process and the collection of it and forensic and chain of custody issues and would gather it up any way they liked and process it.  But, I think there are some new and very viable ways that companies are offering for self-collection, for tracking of collection, for desk side interviews, and for generation and management of legal holds.  We’re seeing a lot of things emerging on that front.  Most of what I see in the legal hold management space is just awful.  That doesn’t mean it’s all awful, but most of it is awful.  It’s a lot of marketing speak, a lot of industry jargon, wrapped around a very uncreative, somewhat impractical, set of tools.  The question really is, are these things really much better than a well designed spreadsheet?  Certainly, they’re more scalable, but some have a “rushed to market” feel to me and I think it’s going to take them some time to mature.  Everyone is jumping on this Pension Committee bandwagon that Judge Scheindlin created for us, and not everyone has brought their Sunday best.

As for social media, it is a big deal because, if you’re paying attention to what’s happening with the generation about to explode on the scene, they simply have marginalized email.  Just as we are starting to get our arms around email, it’s starting to move off center stage.  And, I think the most important contribution to eDiscovery in 2010 has occurred silently and with little fanfare and I’d like to make sure you mention it.  In November, Facebook, the most important social networking site on the planet, very quietly provided the ability for you to package and collect, for personal storage, the entire contents of your Facebook life, including your Wall, your messaging, and your Facemail.  For all of the pieces of your Facebook existence, you can simply click and receive it back in a Zip file.  The ability to preserve and, ultimately, reopen and process that data is the most forward thinking thing that has emerged from the social networking world since there has been a social networking world.  How wonderful that Facebook had the foresight to say “you know, it would be nice if we could give people their entire Facebook stuff in a neat package in a moment in time”.

None of the others have done that yet, but I think that Facebook is so important that it’s going to make that a standard.  It’s going to need to be in Google Apps, it’s going to need to be in Gmail.  If you’re going to live your life “in the cloud”, then you’re going to have to have a way to grab your life from the cloud and move it somewhere else.  Maybe their portability was a way to head off antitrust, for all I know.  Whatever their motivation, I don’t think that most lawyers know that there is essentially this one-click preservation of Facebook.  If a vendor did it, you would hear about it in the elevators here at the show.  Facebook did it for free, and without any fanfare, and it’s an important thing for you to get out there.  The vendor that comes out with a tool that processes these packages that emerge, especially if they announce it when the Oscars come out {laugh}, is well positioned.

So, yes, social networking is important because it means that a lot of things change, forensics change.  You’re just not going to be able to do media forensics anymore on cloud content.  The cloud is going to make eDiscovery simpler, and that’s the one thing I haven’t heard anybody say, because you’ll have less you’ll need to delete and it’s much more likely to be gone – really gone – when you delete it (no forensics needed).  Collection and review can be easier.  What would you rather search, Gmail or Outlook?  Not only can Outlook emails be in several places, but the quality of a Google-based search is better, even though it’s not built for eDiscovery.  If I’m going to stand up in court and say that “I searched all these keywords and I saw all of the communications related to these keywords”, I’d rather do it with the force of Google than with the historically “snake bitten” engine for search that’s been in Outlook.  We always say in eDiscovery that you don’t use Outlook as a review and search tool because we know it isn’t good.  So, we take the container files, PSTs and OSTs and we parse them in better tools.  I think we’ll be able to do it both ways. 

I foresee a day not long off when Google will allow either the repatriation of those collections for use in more powerful tools or will allow different types of searches to be run on the Gmail collections other than just Gmail search.  You may be able to do searches and collect from your own Gmail, to place a hold on that Gmail.  Right now, you’d have to collect it, tag it, move it to a folder – you have to do some gyrations.  I think it will mature and they may open their API, so that there can be add-on tools from the lab or from elsewhere that will allow people to hook into Gmail.  To a degree, you can do that right now, by paying an upgrade fee for Postini, where they can download a PST with your Gmail content.  The problem with that is that Gmail is structured data, you really need to see the threading that Gmail provides to really appreciate the conversation that is Gmail.  Whereas, if you pull it down to PST (except in the latest version of Outlook, which I think 2010 does a pretty good job of threading), I don’t know if that is replicated in the Postini PST.  I’ll have to test that.

Office 2010 is a trend, as well.  Outlook 2010 is the first Microsoft tool that is eDiscovery friendly, by design.  I think Exchange 2010 is going to make our lives easier in eDiscovery.  We’re going to have a lot more “deleted” information hang around in the Windows 7 environment and in the Outlook 2010 and Exchange 2010 environment.  Data is not going away until you jump through some serious hoops to make it go away.

I think the iPad is also going to have quite an impact.  At first, it will be smoke and mirrors, but before 2011 bids us goodbye, I think the iPad is going to find its way into some really practical, gestural interfaces for working with data in eDiscovery.  I’ve yet to see anything but a half-assed version of an app.  Everyone rushed out because they wanted some way to interface with their product, but they didn’t build a purpose-built app for the iPad to really take advantage of its strengths, to be able to gesturally move between screens.  I foresee a day where you’ll have a ring of designations around the screen and you’ll flip a document, like a privileged document, into the appropriate designation and it will light up or something so that you know it went into the correct bin – as if you were at a desk and you were moving paper to different parts of the desk.  Sometimes, I wonder why somebody hasn’t thought of this before.  I’ve done no metrics, I’ve done no ergonomic studies to know that the paper metaphor serves the task well.  But, my gut tells me that we need to teach lawyers to walk before they can run, to help them interact with data in a metaphor that they understand in a graphical user interface.  Point and click, drag and drop, pinch and stretch, which are three dimensional concepts translated into a two dimensional interface.  The interface of the iPad is so intuitive that a three year old could figure it out.  Just like Windows Explorer impacted the design of so many applications (“it’s an Explorer-like interface”), the iPad will do the same.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  I think that the show felt well attended, upbeat, fresher than it has been in two years.  I give the credit to the vendors showing up with some genuinely new products, instead of renamed, remarketed new products, although there’s still plenty of that.  There were so many announcements of new products before the show that you really wonder how new is this product?  But, there were some that really look like they were built from the ground up and that’s impressive.  There’s some money being spent on development again, and that’s positive.  The traffic was better, and I’m glad we finally eliminated the loft area of the exhibit hall that would get so hot and uncomfortable.  I thought the traffic flow was very difficult in a positive way, which is to say that there were a lot of warm bodies out there, walking and talking and looking.

Henry Dicker and his team should be congratulated and I wouldn’t be surprised if they set a record over the past several years at this show.  The budgets were showing, money was freed up and that’s a positive for everyone in this industry.  Also, the quality of the questions being put forward in the educational tracks is head and shoulders better, more incisive and insightful and more advanced.  We’re starting to see the results of people working at the “201 level”, but we still don’t have enough technologists here, it’s still way too lawyer heavy.  This is the New York market, everybody is chasing after the Fortune 500, but everything has to be downward scalable too.  A good show.

What are you working on that you’d like our readers to know about?

The first week of June, I’m going to be teaching a technology academy for lawyers and litigation support professionals with an all-star cast of a very small, but dedicated, faculty, including Michael Arkfeld, Judge Paul Grimm, Judge John Facciola, and others.  It’s called the eDiscovery Training Academy and will be held at the Georgetown Law School.  It’s going to be rigorous, challenging, extremely technical and the hope is that the people emerge from that week genuinely equipped to talk the talk and walk the walk of productive 26(f) conferences and real interaction with IT personnel and records managers.  We’re going to start down at the surface of the magnetic media and we’re going to keep climbing until we can climb no further.

Thanks, Craig, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: George Socha of Socha Consulting


This is the seventh of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support.  George has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are eight active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

On the very “flip” side, the number one trend to date in 2011 is predictions about trends in 2011.  They are part of a consistent and long-term pattern, which is that many of these trend predictions are not trend predictions at all – they are marketing material and the prediction is “you will buy my product or service in the coming year”.

That said, there are a couple of things of note.  Since I understand you talked to Tom about Apersee, it’s worth noting that corporations are struggling with working through a list of providers to find out who provides what services.  You would figure that there is somewhere in the range of 500 or so total providers.  But, my ever-growing list, which includes both external and law firm providers, is at more than 1,200.  Of course, some of those are probably not around anymore, but I am confident that there are at least 200-300 that I do not yet have on the list.  My guess when the list shakes out is that there are roughly 1,100 active providers out there today.  If you look at information from the National Center for State Courts and the Federal Judicial Center, you’ll see that there are about 11 million new lawsuits filed every year.  I saw an article in the Cornell Law Forum a week or two ago which indicated that there are roughly 1.1 million lawyers in the country.  So, there are 11 million lawsuits, 1.1 million lawyers and 1,100 providers.  Most of those lawyers have no experience with eDiscovery and most of those lawsuits have no provider involved, which means eDiscovery is still very much an emerging market, not even close to being a mature market.  As fast as providers disappear, through attrition or acquisition, new providers enter the market to take their place.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  Maybe this is overly optimistic, but part of what I’m seeing in the lead-up to the conference, on various web sites and at the conference itself, is that a series of incremental changes taking place over a long period are finally leading to some radical differences.  One of those differences is that we finally are reaching a point where a number of providers can claim to be “end-to-end providers” with some legitimacy.  For as long as we’ve had the EDRM model, we’ve had providers that have professed to cover the full EDRM landscape, by which they generally have meant Identification through Production.  A growing number of providers not only cover that portion of the EDRM spectrum but also have some ability to address Information Management, Presentation, or both.  By and large, those providers are getting there by building their software and services based on experience and learning over the past 8 to 10 to 12 years, introducing new offerings at the show that reflect that learned experience.

A couple of days ago, I only half-jokingly issued “the Dyson challenge” (as in the Dyson vacuum cleaner).  Every year, come January, our living room carpet is strewn with pine tree needles and none of the vacuum cleaners that we have ever had have done a good job of picking up those needles.  The Dyson vacuum cleaner claims its cyclones capture more dirt than anything, but I was convinced that could not include those needles.  Nonetheless I tried, and to my surprise it worked like a charm!  I want to see the providers offering products able to perform at that high level, not just meeting but exceeding expectations.

I also see a feeling of excitement and optimism that wasn’t apparent at last year’s show.

What are you working on that you’d like our readers to know about?

As I mentioned, we have launched the Apersee web site, designed to allow consumers to find providers and products that fit their specific needs.  The site is in beta and the link is live.  It’s in beta because we’re still working on features to make it as useful as possible to customers and providers.  We’re hoping it’s a question of weeks, not months, before those features are implemented.  Once we go fully live, we will go two months with the system “wide open” – where every consumer can see all the provider and product information that any provider has put in the system.  After that, consumers will be able to see full provider and product profiles for providers who have purchased blocks of views.  Even if a provider does not purchase views, all selection criteria it enters are searchable, but search results will display only the provider’s name and website name.  Providers will be able to get stats on queries and how many times their information is viewed, but not detailed information as to which customers are connecting and performing the queries.

As for EDRM, we continue to make progress with an array of projects and a growing number of collaborative efforts, such as the work the Data Set group has done with TREC Legal and the work the Metrics group has done with the LEDES Committee.  We not only want to see membership continue to grow, but we also want to continue to push for more active participation to continue to make progress in the various working groups.  We’ve just met at the show here regarding the EDRM Testing pilot project to address testing standards.  There are very few guidelines for testing of electronic discovery software and services, so the Testing project will become a full EDRM project as of the EDRM annual meeting this May to begin to address the need for those guidelines.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Jim McGann of Index Engines


This is the third of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Jim McGann.  Jim is Vice President of Information Discovery at Index Engines.  Jim has extensive experience with the eDiscovery and Information Management in the Fortune 2000 sector. He has worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes.  In recent years he has worked for technology-based start-ups that provided financial services and information management solutions.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

What we’re seeing is that companies are becoming a bit more proactive.  Over the past few years we’ve seen companies that have simply been reacting to litigation and it’s been a very painful process because ESI collection has been a “fire drill” – a very last minute operation.  Not because lawyers have waited and waited, but because the data collection process has been slow, complex and overly expensive.  But things are changing. Companies are seeing that eDiscovery is here to stay, ESI collection is not going away and the argument of saying that it’s too complex or expensive for us to collect is not holding water. So, companies are starting to take a proactive stance on ESI collection and understanding their data assets proactively.  We’re talking to companies that are not specifically responding to litigation; instead, they’re building a defensible policy that they can apply to their data sources and make data available on demand as needed.    

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY}  Well, in walking the floor as people were setting up, you saw a lot of early case assessment last year; this year you’re seeing a lot of information governance.  That’s showing that eDiscovery is really rolling into the records management/information governance area.  On the CIO and General Counsel level, information governance is getting a lot of exposure and there’s a lot of technology that can solve the problems.  Litigation support’s role will be to help the executives understand the available technology and how it applies to information governance and records management initiatives.  You’ll see more information governance messaging, which is really a higher level records management message.

As for other trends, one that I’ll tie Index Engines into is ESI collection and pricing.  Per GB pricing is going down as the volume of data is going up.  Years ago, prices were a thousand dollars per GB, then hundreds of dollars per GB, etc.  Now the cost is close to tens of dollars per GB. To really manage large volumes of data more cost-effectively, the collection price had to become more affordable.  Because Index Engines can make data on backup tapes searchable very cost-effectively, for as little as $50 per tape, data on tape has become as easy to access and search as online data. Perhaps even easier, because it’s not on a live network.  Backup tapes have a bad reputation because people think of them as complex or expensive, but if you take away the complexity and expense (which is what Index Engines has done), then they really become “full point-in-time” snapshots.  So, if you have litigation from a specific date range, you can request that data snapshot (which is a tape) and perform discovery on it.  Tape is really a natural litigation hold when you think about it, and there is no need to perform the hold retroactively.

So, what does the ease with which information can be indexed from tape do to address the argument that tape retrieval is inaccessible?  That argument has been eroding over the years, thanks to technology like ours.  And, you see decisions from judges like Judge Scheindlin saying “if you cannot find data in your primary network, go to your backup tapes”, indicating that they consider backup tapes as the next source right after online networks.  You also see people like Craig Ball writing that backup tapes may be the most convenient and cost-effective way to get access to data.  If you had a choice between doing a “server crawl” in a corporate environment or just asking for a backup tape of that time frame, tape is the much more convenient and less disruptive option.  So, if your opponent goes to the judge and says it’s going to take millions of dollars to get the information off of twenty tapes, you must know enough to be in front of a judge and say “that’s not accurate”.  Those are old numbers.  There are court cases where parties have been instructed to use tapes as a cost-effective means of getting to the data.  Technology removes the inaccessible argument by making it easier, faster and cheaper to retrieve data from backup tapes.

The erosion of the accessibility burden is sparking the information governance initiatives. We’re seeing companies come to us for legacy data remediation or management projects, basically getting rid of old tapes. They are saying “if I’ve got ten years of backup tapes sitting in offsite storage, I need to manage that proactively and address any liability that’s there” (that they may not even be aware exists).  These projects reflect a proactive focus towards information governance by remediating those tapes and getting rid of data they don’t need.  Ninety-eight percent of the data on old tapes is not going to be relevant to any case.  The remaining two percent can be found and put into the company’s litigation hold system, and then they can get rid of the tapes.

How do incremental backups play into that?  Tapes are very incremental and repetitive.  If you’re backing up the same data over and over again, you may have 50+ copies of the same email.  Index Engines technology automatically gets rid of system files and applies a standard MD5 hash to dedupe.  Also, by using tape cataloguing, you can read the header and say “we have a Saturday full backup and five incremental backups during the week, then another Saturday full backup”. You can ignore the incremental tapes and just go after the full backups.  That’s a significant percentage of the tapes you can ignore.
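The hash-based dedupe Jim describes can be sketched in a few lines. This is only a minimal illustration of content hashing to collapse repeated copies of the same email or file, not Index Engines’ actual implementation; the file names and the list-of-pairs structure are hypothetical:

```python
import hashlib

def md5_digest(content: bytes) -> str:
    """MD5 digest of a file's raw content."""
    return hashlib.md5(content).hexdigest()

def dedupe(files):
    """files is a list of (name, content_bytes) pairs; keep only the
    first copy seen of each unique content hash."""
    seen = {}
    for name, content in files:
        seen.setdefault(md5_digest(content), name)
    return list(seen.values())
```

Two files with byte-identical content hash to the same digest, so the 50+ copies of the same email on successive backups collapse to a single instance.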

What are you working on that you’d like our readers to know about?

Index Engines just announced today a partnership with LeClairRyan. This partnership combines legal expertise for data retention with the technology that makes applying the policy to legacy data possible.   For companies that want to build policy for the retention of legacy data and implement the tape remediation process we have advisors like LeClairRyan that can provide legacy data consultation and oversight.  By proactively managing the potential liability  of legacy data, you are also saving the IT costs to explore that data.

Index Engines  also just announced a new cloud-based tape load service that will provide full identification, search and access to tape data for eDiscovery. The Look & Learn service, starting at $50 per tape, will provide clients with full access to the index of their tape data without the need to install any hardware or software. Customers will be able to search the index and gather knowledge about content, custodians, email and metadata all via cloud access to the Index Engines interface, making discovery of data from tapes even more convenient and affordable.

Thanks, Jim, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Alon Israely, Esq., CISSP of BIA


This is the second of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Alon Israely.  Alon is a Senior Advisor in BIA’s Advisory Services group and when he’s not advising clients on e-discovery issues he works closely with BIA’s product development group for its core technology products.  Alon has over fifteen years of experience in a variety of advanced computing-related technologies and has consulted with law firms and their clients on a variety of technology issues, including expert witness services related to computer forensics, digital evidence management and data security.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

I think one of the important trends for corporate clients and law firms is cost control, whether it’s trying to minimize the amount of project management hours that are being billed or the manner in which the engagement is facilitated.  I’m not suggesting going full-bore necessarily, but taking baby steps to help control costs is a good approach.  I don’t think it’s only about bringing prices down, because I think that the industry in general has been able to do that naturally well.  But, I definitely see a new focus on the manner in which costs are managed and outsourced.  So, very specifically, scoping correctly is key, making sure you’re using the right tool for the right job, keeping efficiencies (whether that’s on the vendor side or the client side) by doing things such as not having five phone calls for a meeting to figure out what the key words are for field searching or just going out and imaging every drive before deciding what’s really needed. Bringing simple efficiencies to the mechanics of doing e-discovery saves tons of money in unnecessary legal, vendor and project management fees.  You can do things that are about creating efficiencies, but are not necessarily changing the process or changing the pricing.

I also see trends in technology, using more focused tools and different tools to facilitate a single project.  Historically, parties would hire three or four different vendors for a single project, but today it may be just one or two vendors or maybe even no vendors, (just the law firm) but, it’s the use of the right technologies for the right situations – maybe not just one piece of software, but leveraging several for different parts of the process.  Overall, I foresee fewer vendors per project, but more vendors increasing their stable of tools.  So, whereas a vendor may have had a review tool and one way of doing collection, now they may have two or three review tools, including an ECA tool, and one or two ways of doing collections. They have a toolkit from which they can choose the best set of tools to bring to the engagement.  Because they have more tools to market, vendors can have the right tool in their back pocket, whereas before the tool belonged to just one service provider so you bought from them, or you just didn’t have it.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY} I think there’s either a little or a lot of a disconnect – depending on how aggressive I want to be with my opinion – between what they’re speaking about in the panels and what we’re seeing on the floor.  But, I think that’s OK in that the conference itself is usually a little bit ahead of the curve with respect to topics, and the technology will catch up.  You have topics such as predictive coding and social networking related issues – those are two big ones that you’ll see.  I think, for example, there are very few companies that have a solution for social networking, though we happen to have one.  And, predictive coding is the same scenario.  You have a lot of providers that talk about it, but you have a handful that actually do it, and you have probably even fewer than that who do it right.  I think that next year you’ll see many predictive coding solutions and technologies and many more tools that have that capability built into them.  So, on the conference side, there is one level of information and on the floor side, a different level.

What are you working on that you’d like our readers to know about?

BIA has a new product, the industry’s first SaaS (software-as-a-service), on-demand collection technology that provides defensible collections.  We just rolled it out, we’re introducing it here at LegalTech and we’re starting a technology preview and signing up people who want to use the application or try it.  It’s specifically for attorneys, corporations, service providers – anyone who’s in the business and needs a tool for defensible data collection performed with agility (always hard to balance) – so without having to buy software or have expert training, users simply login or register and can start immediately.  You don’t have to worry about the traditional business processes to get things set up and started.  Which, if you think about it on the collections side of e-discovery, means that the client’s CEO or VP of Marketing can call you up and say “I’m leaving, I have my PST here, can you just come get it?” and you can facilitate that process through the web, download an application, walk through a wizard, collect it defensibly, encrypt it and then deliver a filtered set, as needed, for review.

The tool is designed to collect defensibly and to move the collected data – or some subset of that data – to delivery; from there you would select your review tool of choice and we hand it off to the selected review tool.  So, we’re not trying to be everything; we’re focused on automating the left side of the EDRM.  We have load files for certain tools, having been a service provider for ten years, and we’re connecting with partners so that we can do the handoff, so when the client says “I’m ready to deliver my data”, they can choose OnDemand or Concordance or another review tool, and then either directly send it or the client can download and ship it.  We’re not trying to be a review tool and not trying to be an ECA tool that helps you find the needle in the haystack; instead, we’re focused on collecting the data, normalizing it, cataloguing it and handing it off for the attorneys to do their work.

Thanks, Alon, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client for which several searches had been conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, version tracking of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
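For illustration, here is a minimal sketch of the kind of word-distance test a proximity search performs. This is not how FirstPass™ or any particular tool actually implements it; the function and its tokenization are a simplified assumption:

```python
import re

def within_n_words(text: str, term1: str, term2: str, n: int = 5) -> bool:
    """True if term1 and term2 occur within n words of each other
    in the text, in either order."""
    words = re.findall(r"\w+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)
```

With this test, “rights to drill for oil” is a hit for oil within 5 words of rights, while a copyright notice’s “All Rights Reserved” sitting far away from a stray mention of “oil” is not – which is exactly the noise the plain AND search could not filter out.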

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Database Discovery Pop Quiz ANSWERS


So, how did you do?  Did you know all the answers from Friday’s post – without “googling” them?  😉

Here are the answers – enjoy!

What is a “Primary Key”? The primary key of a relational table uniquely identifies each record in the table. It can be a normal attribute that you expect to be unique (e.g., Social Security Number); however, it’s usually best to be a sequential ID generated by the Database Management System (DBMS).

What is an “Inner Join” and how does it differ from an “Outer Join”?  An inner join is the most common join operation used in applications, creating a new result table by combining column values of two tables.  An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record in one of the tables – even if no other matching record exists.  Sometimes, there is a reason to keep all of the records in one table in your result, such as a list of all employees, whether or not they participate in the company’s benefits program.
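The employee/benefits example above can be demonstrated with Python’s built-in sqlite3 module (the table names and data here are hypothetical, just to show the two join behaviors):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE benefits (emp_id INTEGER, plan TEXT);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO benefits VALUES (1, 'Health');
""")

# Inner join: only employees with a matching benefits record appear.
inner = conn.execute(
    "SELECT e.name, b.plan FROM employees e "
    "JOIN benefits b ON e.id = b.emp_id ORDER BY e.id"
).fetchall()

# Left outer join: every employee is retained, with NULL (None in
# Python) where no matching benefits record exists.
outer = conn.execute(
    "SELECT e.name, b.plan FROM employees e "
    "LEFT JOIN benefits b ON e.id = b.emp_id ORDER BY e.id"
).fetchall()
```

Here the inner join returns only Ann (who has a benefits record), while the outer join returns both Ann and Bob, with Bob’s plan reported as NULL.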

What is “Normalization”?  Normalization is the process of organizing data to minimize redundancy of that data. Normalization involves organizing a database into multiple tables and defining relationships between the tables.

How does a “View” differ from a “Table”?  A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, it is a query stored as an object.

What does “BLOB” stand for?  A Binary Large OBject (BLOB) is a collection of binary data stored as a single entity in a database management system. BLOBs are typically images or other multimedia objects, though sometimes binary executable code is stored as a blob.  So, if you’re not including databases in your discovery collection process, you could also be missing documents stored as BLOBs.  BTW, if you didn’t click on the link next to the BLOB question in Friday’s blog, it takes you to the amusing trailer for the 1958 movie, The Blob, starring a young Steve McQueen (so early in his career, he was billed as “Steven McQueen”).

What is the difference between a “flat file” and a “relational” database?  A flat file database is a database designed around a single table, like a spreadsheet. The flat file design puts all database information in one table, or list, with fields to represent all parameters. A flat file is prone to considerable duplicate data, as each value is repeated for each item.  A relational database, on the other hand, incorporates multiple tables with methods (such as normalization and inner and outer joins, defined above) to store data efficiently and minimize duplication.

What is a “Trigger”?  A trigger is a procedure which is automatically executed in response to certain events in a database and is typically used for keeping the integrity of the information in the database. For example, when a new record (for a new employee) is added to the employees table, a trigger might create new records in the taxes, vacations, and salaries tables.
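The new-employee example can be sketched with sqlite3 (a simplified, hypothetical schema with just a vacations table standing in for the taxes, vacations, and salaries tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vacations (emp_id INTEGER, days_remaining INTEGER);
    -- When a new employee record is inserted, automatically create
    -- a matching vacations record.
    CREATE TRIGGER new_hire AFTER INSERT ON employees
    BEGIN
        INSERT INTO vacations VALUES (NEW.id, 10);
    END;
""")
conn.execute("INSERT INTO employees (name) VALUES ('Ann')")
vacation_rows = conn.execute(
    "SELECT emp_id, days_remaining FROM vacations"
).fetchall()
```

Only the employees insert is issued explicitly; the vacations record appears because the trigger fired.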

What is “Rollback”?  A rollback is the undoing of partly completed database changes when a database transaction is determined to have failed, thus returning the database to its previous state before the transaction began.  Rollbacks help ensure database integrity by enabling the database to be restored to a clean copy after erroneous operations are performed or database server crashes occur.
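A minimal sqlite3 sketch of a rollback after a simulated mid-transaction failure (the accounts table and figures are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    # Make a partial change inside a transaction, then fail mid-way.
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")
    raise RuntimeError("simulated failure before the transaction completes")
except RuntimeError:
    conn.rollback()  # undo the partial change, restoring the prior state

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
```

After the rollback, the balance is back to its pre-transaction value, exactly as if the failed update had never been attempted.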

What is “Referential Integrity”?  Referential integrity ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity ensures that a record is not added to the table that contains the foreign key unless there is a corresponding record in the linked table. Many databases use cascading updates and cascading deletes to ensure that changes made to the linked table are reflected in the primary table.
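In SQLite, for example, foreign key enforcement must be switched on explicitly; once it is, an insert that references a missing record in the linked table is rejected (the schema here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Legal');
""")
conn.execute("INSERT INTO employees VALUES (1, 1)")  # OK: department 1 exists
try:
    conn.execute("INSERT INTO employees VALUES (2, 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert fails because there is no department 99 for the foreign key to point to – that refusal is referential integrity in action.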

Why is a “Cartesian Product” in SQL almost always a bad thing?  A Cartesian Product occurs in SQL when a join condition (via a WHERE clause in a SQL statement) is omitted, causing all combinations of records from two or more tables to be displayed.  For example, when you go to the Department of Motor Vehicles (DMV) to pay your vehicle registration, they use a database with an Owners and a Vehicles table joined together to determine for which vehicle(s) you need to pay taxes.  Without that join condition, you would have a Cartesian Product and every vehicle in the state would show up as registered to you – that’s a lot of taxes to pay!
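The DMV example can be reproduced in miniature with sqlite3 – omit the join condition and every owner is paired with every vehicle (the data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER, name TEXT);
    CREATE TABLE vehicles (owner_id INTEGER, plate TEXT);
    INSERT INTO owners VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO vehicles VALUES (1, 'ABC-123'), (2, 'XYZ-789');
""")

# With the join condition, each owner is matched only to his or her vehicles.
joined = conn.execute(
    "SELECT o.name, v.plate FROM owners o, vehicles v WHERE o.id = v.owner_id"
).fetchall()

# Omit the WHERE clause and you get the Cartesian product:
# every owner paired with every vehicle.
cartesian = conn.execute(
    "SELECT o.name, v.plate FROM owners o, vehicles v"
).fetchall()
```

With two owners and two vehicles the damage is only four rows, but with a state’s worth of owners and vehicles, the result set (and your apparent tax bill) explodes multiplicatively.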

If you didn’t know the answers to most of these questions, you’re not alone.  But, to effectively provide the information within a database responsive to an eDiscovery request, knowledge of databases at this level is often necessary to collect and produce the appropriate information.    As Craig Ball noted in his article Ubiquitous Databases, “Get the geeks together, and get out of their way”.  Hey, I resemble that remark!

So, what do you think?  Did you learn anything?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Database Discovery Pop Quiz


Databases: You can’t live with them, you can’t live without them.

Or so it seems in eDiscovery.  On a regular basis, I’ve seen various articles and discussions related to discovery of databases and other structured data and I remain very surprised how few legal teams understand database discovery and know how to handle it.  A colleague of mine (whom I’ve known over the years to be honest and reliable), while working for a nationally known eDiscovery provider, even claimed to me a few months back that the provider’s collection procedures actually excluded database files.

Last month, an article written by Craig Ball, called Ubiquitous Databases, provided a lot of good information about database discovery. It included various examples of how databases touch our lives every day, while noting that eDiscovery is still ultra document-centric, even when those “documents” are generated from databases.  There is some really good information in that article about Database Management Software (DBMS), Structured Query Language (SQL), Entity Relationship Diagrams (ERDs) and how they are used to manage, access and understand the information contained in databases.  It’s a really good article, especially for database novices who need to understand more about databases and how they “tick”.

But, maybe you already know all you need to know about databases?  Maybe you’re already prepared to address eDiscovery of your databases today?

Having worked with databases for over 20 years (I stopped counting at 20), I know a few things about databases.  So, here is a brief “pop” quiz on database concepts.  Call them “Database 101” questions.  See how many you can answer!

  • What is a “Primary Key”? (hint: it is not what you start the car with)
  • What is an “Inner Join” and how does it differ from an “Outer Join”?
  • What is “Normalization”?
  • How does a “View” differ from a “Table”?
  • What does “BLOB” stand for? (hint: it’s not this)
  • What is the difference between a “flat file” and a “relational” database?
  • What is a “Trigger”?
  • What is “Rollback”? (hint: it has nothing to do with Wal-Mart prices)
  • What is “Referential Integrity”?
  • Why is a “Cartesian Product” in SQL almost always a bad thing?

So, what do you think?  Are you a database guru or a database novice?  Please share any comments you might have or if you’d like to know more about a particular topic.

Did you think I was going to provide the answers at the bottom?  No cheating!!  I’ll answer the questions on Monday.  Hope you can stand it!!

eDiscovery Trends: 2011 Predictions — By The Numbers


Comedian Nick Bakay always ends his Tale of the Tape skits where he compares everything from Married vs. Single to Divas vs. Hot Dogs with the phrase “It's all so simple when you break things down scientifically.”

The late December/early January time frame is always when various people in eDiscovery make their annual predictions as to what trends to expect in the coming year.  We’ll have some of our own in the next few days (hey, the longer we wait, the more likely we are to be right!).  However, before stating those predictions, I thought we would take a look at other predictions and see if we can spot some common trends among them, so I “googled” for 2011 eDiscovery predictions and organized the predictions into common themes.  I found serious predictions here, here, here, here and here.  Oh, also here and here.

A couple of quick comments: 1) I had NO IDEA how many times predictions are re-posted by other sites, so it took some work to isolate each unique set of predictions.  I even found two sets of predictions from ZL Technologies, one with twelve predictions and another with seven, so I had to pick one set and I chose the one with seven (sorry, eWEEK!). If I have failed to accurately attribute the original source for a set of predictions, please feel free to comment.  2) This is probably not an exhaustive list of predictions (I have other duties in my “day job”, so I couldn’t search forever), so I apologize if I’ve left anybody’s published predictions out.  Again, feel free to comment if you’re aware of other predictions.

Here are some of the common themes:

  • Cloud and SaaS Computing: Six out of seven “prognosticators” indicated that adoption of Software as a Service (SaaS) “cloud” solutions will continue to increase, which will become increasingly relevant in eDiscovery.  No surprise here, given last year’s IDC forecast for SaaS growth and many articles addressing the subject, including a few posts right here on this blog.
  • Collaboration/Integration: Six out of seven “augurs” also had predictions related to various themes associated with collaboration (more collaboration tools, greater legal/IT coordination, etc.) and integration (greater focus by software vendors on data exchange with other systems, etc.).  Two people specifically noted an expectation of greater eDiscovery integration within organization governance, risk management and compliance (GRC) processes.
  • In-House Discovery: Five “pundits” forecasted eDiscovery functions and software will continue to be brought in-house, especially on the “left-side of the EDRM model” (Information Management).
  • Diverse Data Sources: Three “soothsayers” presaged that sources of data will continue to be more diverse, which shouldn’t be a surprise to anyone, given the popularity of gadgets and the rise of social media.
  • Social Media: Speaking of social media, three “prophets” (yes, I’ve been consulting my thesaurus!) expect social media to continue to be a big area to be addressed for eDiscovery.
  • End to End Discovery: Three “psychics” also predicted that there will continue to be more single-source end-to-end eDiscovery offerings in the marketplace.

The “others receiving votes” category (two predicting each of these) included maturing and acceptance of automated review (including predictive coding), early case assessment moving toward the Information Management stage, consolidation within the eDiscovery industry, more focus on proportionality, maturing of global eDiscovery and predictive/disruptive pricing.

Predictive/disruptive pricing (via the respective blogs of Kriss Wilson of Superior Document Services and Charles Skamser of eDiscovery Solutions Group) is a particularly intriguing prediction to me because data volumes are continuing to grow at an astronomical rate, and greater volumes lead to greater costs.  Creativity will be key in how companies deal with the larger volumes effectively, and pressures will become greater for providers (even, dare I say, review attorneys) to price their services more creatively.

Another interesting prediction (via ZL Technologies) is that “Discovery of Databases and other Structured Data will Increase”, which is something I’ve expected to see for some time.  I hope this is finally the year for that.

Finally, I said that I found serious predictions and analyzed them; however, there are a couple of not-so-serious sets of predictions here and here.  My favorite prediction is from The Posse List, as follows: “LegalTech…renames itself “EDiscoveryTech” after survey reveals that of the 422 vendors present, 419 do e-discovery, and the other 3 are Hyundai HotWheels, Speedway Racers and Convert-A-Van who thought they were at the Javits Auto Show.”

So, what do you think?  Care to offer your own “hunches” from your crystal ball?  Please share any comments you might have or if you’d like to know more about a particular topic.