eDiscovery Trends: Craig Ball of Craig D. Ball, P.C.


This is the ninth (and final) of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Craig Ball.  Craig is a prolific contributor to continuing legal and professional education programs throughout the United States, having delivered over 600 presentations and papers.  Craig’s articles on forensic technology and electronic discovery frequently appear in the national media, including in American Bar Association, ATLA and American Lawyer Media print and online publications.  He also writes a monthly column on computer forensics and eDiscovery for Law Technology News called "Ball in Your Court," which Trade Association Business Publications International honored as Gold Medal “Best Regular Column” in both 2007 and 2008.  The American Society of Business Publication Editors named it Gold Medalist for “Best Contributed Column” in 2009, Silver Medalist for the same in 2007, and Silver Medalist for “Best Feature Series” and “Best Contributed Column” in 2006.  His presentation, "PowerPersuasion: Craig Ball on PowerPoint," is consistently among the top-rated continuing legal education programs from coast to coast.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

Price compression is a major trend.  Consumers are very slowly waking up to the fact that they have been the “drunken sailors on leave” in terms of how they have approached eDiscovery and there have been many “vendors of the night” ready to roll them for their paychecks.  eDiscovery has been more like a third-world market where vendors have said “let’s ask for some crazy number” and perhaps the client will be foolish enough to pay it.  And, if they don’t pay that one, let’s hit them with a little lower number, mention sanctions, give them a copy of something from Judge Scheindlin or Judge Grimm and then try again.  Until finally, they are so dissolved in a pool of their own urine that they’re willing to pay an outrageous price.  Those days are coming to an end, and smart vendors are going to be prepared to demonstrate the value and complexity behind their offerings.

I am seeing people recognizing that the “gravy train” is over except for the most egregiously challenging eDiscovery situations, where numbers really have little meaning.  When you’re talking about tens of thousands of employees and petabytes of data, the numbers can get astronomical.  But, for the usual case, with a more manageable number of custodians and issues, people are waking up to the fact that we can’t keep reinventing this wheel of great expense, so clients are pushing for more rational approaches and a few forward-thinking vendors are starting to put forward products that will allow you to quantify what your exposure is going to be in eDiscovery.  We’re just not going to see per-GB processing prices measured in the double and triple digits – that just can’t go on, at least when you’re talking about the raw data on the input side.  So, I’m seeing some behind-the-firewall products, even desktop products, that are going to allow lawyers and people with relatively little technical expertise to handle small and medium sized cases.  Some of the hosting services are putting together pricing that, though I haven’t really tested it in real-world situations, is starting to sound rational and less frightening.

I’m continuing to see more fragmentation in the market and I would like to see more integrated products, but it’s still like packaging a rather motley crew of different pieces that don’t always fit together well.  You’ve got relatively new review tools, some strong players like Clearwell and stronger-than-they-used-to-be players like Relativity.  You’ve got people “from down under” that are really changing the game, like Nuix.  And, you’ve got some upstarts – products that we’ve really not yet heard of at all.  I’m seeing at this conference that any one of them has the potential to become an industry standard.  I’m seeing some real innovation, some real new code bases coming out, and that is impressive to me because it just hadn’t been happening before – it’s been “old wine in new bottles” for several years.

I also see some new ideas in collection.  I think people are starting to embrace what George Socha would like for me to aptly call the left side of the EDRM.  A lot of people have turned their heads away from the ugly business of selecting data to process, the collection of it, and the forensic and chain-of-custody issues, and would gather it up any way they liked and process it.  But, I think there are some new and very viable ways that companies are offering for self-collection, for tracking of collection, for desk-side interviews, and for generation and management of legal holds.  We’re seeing a lot of things emerging on that front.  Most of what I see in the legal hold management space is just awful.  That doesn’t mean it’s all awful, but most of it is awful.  It’s a lot of marketing speak, a lot of industry jargon, wrapped around a very uncreative, somewhat impractical set of tools.  The question really is, are these things really much better than a well-designed spreadsheet?  Certainly, they’re more scalable, but some have a “rushed to market” feel to me and I think it’s going to take them some time to mature.  Everyone is jumping on this Pension Committee bandwagon that Judge Scheindlin created for us, and not everyone has brought their Sunday best.

As for social media, it is a big deal because, if you’re paying attention to what’s happening with the generation about to explode on the scene, that generation has simply marginalized email.  Just as we are starting to get our arms around email, it’s starting to move off center stage.  And, I think the most important contribution to eDiscovery in 2010 has occurred silently and with little fanfare, and I’d like to make sure you mention it.  In November, Facebook, the most important social networking site on the planet, very quietly provided the ability for you to package and collect, for personal storage, the entire contents of your Facebook life, including your Wall, your messaging, and your Facemail.  For all of the pieces of your Facebook existence, you can simply click and receive it back in a Zip file.  The ability to preserve and, ultimately, reopen and process that data is the most forward-thinking thing that has emerged from the social networking world since there has been a social networking world.  How wonderful that Facebook had the foresight to say “you know, it would be nice if we could give people their entire Facebook stuff in a neat package in a moment in time”.

None of the others have done that yet, but I think that Facebook is so important that it’s going to make that a standard.  It’s going to need to be in Google Apps, it’s going to need to be in Gmail.  If you’re going to live your life “in the cloud”, then you’re going to have to have a way to grab your life from the cloud and move it somewhere else.  Maybe their portability was a way to head off antitrust, for all I know.  Whatever their motivation, I don’t think that most lawyers know that there is essentially this one-click preservation of Facebook.  If a vendor did it, you would hear about it in the elevators here at the show.  Facebook did it for free, and without any fanfare, and it’s an important thing for you to get out there.  The vendor that comes out with a tool that processes these packages that emerge, especially if they announce it when the Oscars come out {laugh}, is well positioned.
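For a vendor (or a litigation support team) looking at those export packages, a natural first processing step is simply inventorying what is inside the Zip file so each section can be routed to the right tool.  The sketch below is purely illustrative: the internal folder names (wall, messages, and so on) are assumptions, not Facebook's documented archive layout.

```python
import zipfile
from pathlib import Path

def inventory_export(zip_path):
    """Group the files inside a downloaded archive by top-level folder,
    so each section (e.g. wall, messages) can be routed separately."""
    sections = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            top = Path(name).parts[0]  # top-level folder (or bare filename)
            sections.setdefault(top, []).append(name)
    return sections
```

Given an archive containing `wall/posts.html` and `messages/inbox.html`, this returns a dictionary keyed by `wall` and `messages`, each listing its member files.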

So, yes, social networking is important because it means that a lot of things change; forensics change.  You’re just not going to be able to do media forensics anymore on cloud content.  The cloud is going to make eDiscovery simpler, and that’s the one thing I haven’t heard anybody say, because you’ll have less that you’ll need to delete and it’s much more likely to be gone – really gone – when you delete it (no forensics needed).  Collection and review can be easier.  What would you rather search, Gmail or Outlook?  Not only can Outlook emails be in several places, but the quality of a Google-based search is better, even though it’s not built for eDiscovery.  If I’m going to stand up in court and say that “I searched all these keywords and I saw all of the communications related to these keywords”, I’d rather do it with the force of Google than with the historically “snake bitten” search engine that’s been in Outlook.  We always say in eDiscovery that you don’t use Outlook as a review and search tool because we know it isn’t good.  So, we take the container files, PSTs and OSTs, and we parse them in better tools.  I think we’ll be able to do it both ways.

I foresee a day not long off when Google will allow either the repatriation of those collections for use in more powerful tools or will allow different types of searches to be run on the Gmail collections other than just Gmail search.  You may be able to do searches and collect from your own Gmail, to place a hold on that Gmail.  Right now, you’d have to collect it, tag it, move it to a folder – you have to do some gyrations.  I think it will mature and they may open their API, so that there can be add-on tools from the lab or from elsewhere that will allow people to hook into Gmail.  To a degree, you can do that right now, by paying an upgrade fee for Postini, where they can download a PST with your Gmail content.  The problem with that is that Gmail is structured data; you really need to see the threading that Gmail provides to really appreciate the conversation that is Gmail.  Whereas, if you pull it down to a PST (except in the latest version of Outlook – I think Outlook 2010 does a pretty good job of threading), I don’t know if that threading is replicated in the Postini PST.  I’ll have to test that.

Office 2010 is a trend, as well.  Outlook 2010 is the first Microsoft tool that is eDiscovery friendly, by design.  I think Exchange 2010 is going to make our lives easier in eDiscovery.  We’re going to have a lot more “deleted” information hang around in the Windows 7 environment and in the Outlook 2010 and Exchange 2010 environment.  Data is not going away until you jump through some serious hoops to make it go away.

I think the iPad is also going to have quite an impact.  At first, it will be smoke and mirrors, but before 2011 bids us goodbye, I think the iPad is going to find its way into some really practical, gestural interfaces for working with data in eDiscovery.  I’ve yet to see anything but a half-assed version of an app.  Everyone rushed out wanting some way to interface with their product, but they didn’t build a purpose-built app for the iPad to really take advantage of its strengths, to be able to gesturally move between screens.  I foresee a day where you’ll have a ring of designations around the screen and you’ll flip a document, like a privileged document, into the appropriate designation and it will light up or something so that you know it went into the correct bin – as if you were at a desk and you were moving paper to different parts of the desk.  Sometimes, I wonder why somebody hasn’t thought of this before.  I’ve done no metrics, I’ve done no ergonomic studies to know that the paper metaphor serves the task well.  But, my gut tells me that we need to teach lawyers to walk before they can run, to help them interact with data in a metaphor that they understand in a graphical user interface.  Point and click, drag and drop, pinch and stretch are three-dimensional concepts translated into a two-dimensional interface.  The interface of the iPad is so intuitive that a three year old could figure it out.  Just like Windows Explorer impacted the design of so many applications (“it’s an Explorer-like interface”), the iPad will do the same.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  I think that the show felt well attended, upbeat, fresher than it has felt in two years.  I give the credit to the vendors showing up with some genuinely new products, instead of renamed, remarketed products, although there’s still plenty of that.  There were so many announcements of new products before the show that you really wonder, how new is this product?  But, there were some that really look like they were built from the ground up, and that’s impressive.  There’s some money being spent on development again, and that’s positive.  The traffic was better, and I’m glad we finally eliminated the loft area of the exhibit hall that would get so hot and uncomfortable.  I thought the traffic flow was difficult in a positive way, which is to say that there were a lot of warm bodies out there, walking and talking and looking.

Henry Dicker and his team should be congratulated, and I wouldn’t be surprised if they set a record over the past several years at this show.  The budgets were showing, money was freed up, and that’s a positive for everyone in this industry.  Also, the quality of the questions being put forward in the educational tracks is head and shoulders better – more incisive and insightful and more advanced.  We’re starting to see the results of people working at the “201 level”, but we still don’t have enough technologists here; it’s still way too lawyer heavy.  This is the New York market, everybody is chasing after the Fortune 500, but everything has to be downward scalable too.  A good show.

What are you working on that you’d like our readers to know about?

The first week of June, I’m going to be teaching a technology academy for lawyers and litigation support professionals with an all-star cast of a very small but dedicated faculty, including Michael Arkfeld, Judge Paul Grimm, Judge John Facciola, and others.  It’s called the eDiscovery Training Academy and will be held at the Georgetown Law School.  It’s going to be rigorous, challenging and extremely technical, and the hope is that people emerge from that week genuinely equipped to talk the talk and walk the walk of productive 26(f) conferences and real interaction with IT personnel and records managers.  We’re going to start down at the surface of the magnetic media and we’re going to keep climbing until we can climb no further.

Thanks, Craig, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: George Socha of Socha Consulting


This is the seventh of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is George Socha.  A litigator for 16 years, George is President of Socha Consulting LLC, offering services as an electronic discovery expert witness, special master and advisor to corporations, law firms and their clients, and legal vertical market software and service providers in the areas of electronic discovery and automated litigation support.  George has also been co-author of the leading survey on the electronic discovery market, The Socha-Gelbmann Electronic Discovery Survey.  In 2005, he and Tom Gelbmann launched the Electronic Discovery Reference Model project to establish standards within the eDiscovery industry – today, the EDRM model has become a standard in the industry for the eDiscovery life cycle and there are eight active projects with over 300 members from 81 participating organizations.  George has a J.D. from Cornell Law School and a B.A. from the University of Wisconsin – Madison.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

On the very “flip” side, the number one trend to date in 2011 is predictions about trends in 2011.  They are part of a consistent and long-term pattern, which is that many of these trend predictions are not trend predictions at all – they are marketing material and the prediction is “you will buy my product or service in the coming year”.

That said, there are a couple of things of note.  Since I understand you talked to Tom about Apersee, it’s worth noting that corporations are struggling with working through a list of providers to find out who provides what services.  You would figure that there is somewhere in the range of 500 or so total providers.  But, my ever-growing list, which includes both external and law firm providers, is at more than 1,200.  Of course, some of those are probably not around anymore, but I am confident that there are at least 200-300 that I do not yet have on the list.  My guess when the list shakes out is that there are roughly 1,100 active providers out there today.  If you look at information from the National Center for State Courts and the Federal Judicial Center, you’ll see that there are about 11 million new lawsuits filed every year.  I saw an article in the Cornell Law Forum a week or two ago which indicated that there are roughly 1.1 million lawyers in the country.  So, there are 11 million lawsuits, 1.1 million lawyers and 1,100 providers.  Most of those lawyers have no experience with eDiscovery and most of those lawsuits have no provider involved, which means eDiscovery is still very much an emerging market, not even close to being a mature market.  As fast as providers disappear, through attrition or acquisition, new providers enter the market to take their place.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the second afternoon of LTNY}  Maybe this is overly optimistic, but part of what I’m seeing leading up to the conference, on various web sites and at the conference itself, is that a series of incremental changes taking place over a long period are finally leading to some radical differences.  One of those differences is that we finally are reaching a point where a number of providers can claim to be “end-to-end providers” with some legitimacy.  For as long as we’ve had the EDRM model, we’ve had providers that have professed to cover the full EDRM landscape, by which they generally have meant Identification through Production.  A growing number of providers not only cover that portion of the EDRM spectrum but have some ability to address Information Management, Presentation, or both.  By and large, those providers are getting there by building their software and services based on experience and learning over the past 8 to 12 years, introducing new offerings at the show that reflect that learned experience.

A couple of days ago, I only half-jokingly issued “the Dyson challenge” (as in the Dyson vacuum cleaner).  Every year, come January, our living room carpet is strewn with pine tree needles, and none of the vacuum cleaners that we have ever had have done a good job of picking up those needles.  The Dyson vacuum cleaner claims its cyclones capture more dirt than anything, but I was convinced that could not include those needles.  Nonetheless, I tried, and to my surprise it worked like a charm!  I want to see providers offering products able to perform at that high level, not just meeting but exceeding expectations.

I also see a feeling of excitement and optimism that wasn’t apparent at last year’s show.

What are you working on that you’d like our readers to know about?

As I mentioned, we have launched the Apersee web site, designed to allow consumers to find providers and products that fit their specific needs.  The site is in beta and the link is live.  It’s in beta because we’re still working on features to make it as useful as possible to customers and providers.  We’re hoping it’s a question of weeks, not months, before those features are implemented.  Once we go fully live, we will go two months with the system “wide open” – where every consumer can see all the provider and product information that any provider has put in the system.  After that, consumers will be able to see full provider and product profiles for providers who have purchased blocks of views.  Even if a provider does not purchase views, all selection criteria it enters are searchable, but search results will display only the provider’s name and website name.  Providers will be able to get stats on queries and how many times their information is viewed, but not detailed information as to which customers are connecting and performing the queries.
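The visibility rules Socha describes for Apersee can be summarized in a short sketch.  This is not Apersee's code, just a hypothetical illustration in Python of the policy as stated: everyone sees full profiles during the wide-open period; afterward, full profiles appear only for providers with purchased view blocks, while everyone else surfaces only a name and website.

```python
def visible_profile(provider, views_remaining, wide_open):
    """Return what a searching consumer would see for one provider.
    `provider` is a dict with full profile details; `views_remaining`
    counts purchased views; `wide_open` marks the launch period."""
    if wide_open or views_remaining > 0:
        return provider  # full provider and product profile
    # Otherwise only the provider's name and website are displayed,
    # even though all its selection criteria remain searchable.
    return {"name": provider["name"], "website": provider["website"]}
```

The design incentive is visible in the sketch: a provider stays findable by its criteria either way, but paying for views is what turns a search hit into a full profile.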

As for EDRM, we continue to make progress with an array of projects and a growing number of collaborative efforts, such as the work the Data Set group has done with TREC Legal and the work the Metrics group has done with the LEDES Committee.  We not only want to see membership continue to grow, but we also want to continue to push for more active participation to continue to make progress in the various working groups.  We’ve just met at the show here regarding the EDRM Testing pilot project to address testing standards.  There are very few guidelines for testing of electronic discovery software and services, so the Testing project will become a full EDRM project as of the EDRM annual meeting this May to begin to address the need for those guidelines.

Thanks, George, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Jim McGann of Index Engines


This is the third of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Jim McGann.  Jim is Vice President of Information Discovery at Index Engines.  Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector.  He has worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes.  In recent years he has worked for technology-based start-ups that provided financial services and information management solutions.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

What we’re seeing is that companies are becoming a bit more proactive.  Over the past few years we’ve seen companies that have simply been reacting to litigation and it’s been a very painful process because ESI collection has been a “fire drill” – a very last minute operation.  Not because lawyers have waited and waited, but because the data collection process has been slow, complex and overly expensive.  But things are changing.  Companies are seeing that eDiscovery is here to stay, ESI collection is not going away, and the argument that it’s too complex or expensive to collect is not holding water.  So, companies are starting to take a proactive stance on ESI collection and on understanding their data assets.  We’re talking to companies that are not specifically responding to litigation; instead, they’re building a defensible policy that they can apply to their data sources and make data available on demand as needed.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY}  Well, in walking the floor as people were setting up, you saw a lot of early case assessment last year; this year you’re seeing a lot of information governance.  That’s showing that eDiscovery is really rolling into the records management/information governance area.  On the CIO and General Counsel level, information governance is getting a lot of exposure and there’s a lot of technology that can solve the problems.  Litigation support’s role will be to help the executives understand the available technology and how it applies to information governance and records management initiatives.  You’ll see more information governance messaging, which is really a higher level records management message.

As for other trends, one that I’ll tie Index Engines into is ESI collection and pricing.  Per-GB pricing is going down as the volume of data is going up.  Years ago, prices were a thousand dollars per GB, then hundreds of dollars per GB, etc.  Now the cost is close to tens of dollars per GB.  To really manage large volumes of data more cost-effectively, the collection price had to become more affordable.  Because Index Engines can make data on backup tapes searchable very cost-effectively, for as little as $50 per tape, data on tape has become as easy to access and search as online data – perhaps even easier, because it’s not on a live network.  Backup tapes have a bad reputation because people think of them as complex or expensive, but if you take away the complexity and expense (which is what Index Engines has done), then they really become “full point-in-time” snapshots.  So, if you have litigation from a specific date range, you can request that data snapshot (which is a tape) and perform discovery on it.  Tape is really a natural litigation hold when you think about it, and there is no need to perform the hold retroactively.

So, what does the ease with which the information can be indexed from tape do to address the inaccessibility argument for tape retrieval?  That argument has been eroding over the years, thanks to technology like ours.  And, you see decisions from judges like Judge Scheindlin saying “if you cannot find data in your primary network, go to your backup tapes”, indicating that they consider backup tapes as the next source right after online networks.  You also see people like Craig Ball writing that backup tapes may be the most convenient and cost-effective way to get access to data.  If you had a choice between doing a “server crawl” in a corporate environment or just asking for a backup tape of that time frame, tape is the much more convenient and less disruptive option.  So, if your opponent goes to the judge and says it’s going to take millions of dollars to get the information off of twenty tapes, you must know enough to be in front of a judge and say “that’s not accurate” – those are old numbers.  There are court cases where parties have been instructed to use tapes as a cost-effective means of getting to the data.  Technology removes the inaccessibility argument by making it easier, faster and cheaper to retrieve data from backup tapes.

The erosion of the accessibility burden is sparking the information governance initiatives.  We’re seeing companies come to us for legacy data remediation or management projects, basically getting rid of old tapes.  They are saying “if I’ve got ten years of backup tapes sitting in offsite storage, I need to manage that proactively and address any liability that’s there” (liability they may not even be aware exists).  These projects reflect a proactive focus on information governance: remediating those tapes and getting rid of data they don’t need.  Ninety-eight percent of the data on old tapes is not going to be relevant to any case.  The remaining two percent can be found and put into the company’s litigation hold system, and then they can get rid of the tapes.

How do incremental backups play into that?  Tapes are very incremental and repetitive.  If you’re backing up the same data over and over again, you may have 50+ copies of the same email.  Index Engines technology automatically gets rid of system files and applies a standard MD5 hash to dedupe.  Also, by using tape cataloguing, you can read the header and say “we have a Saturday full backup and five incrementals during the week, then another Saturday full backup”.  You can ignore the incremental tapes and just go after the full backups.  That’s a significant percentage of the tapes you can ignore.
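The two culling steps McGann describes can be sketched in a few lines.  This is not Index Engines' code, just an illustration in Python of deduplicating files by MD5 digest and skipping incremental tapes in a catalogue; the data shapes are assumptions.

```python
import hashlib

def dedupe_by_hash(files):
    """Collapse identical copies from overlapping backups: `files` maps a
    path to its raw bytes, and the first path seen for each MD5 digest is
    kept as the representative copy."""
    seen = {}
    for path, data in files.items():
        seen.setdefault(hashlib.md5(data).hexdigest(), path)
    return sorted(seen.values())

def full_backups_only(tapes):
    """Read the catalogue entries and keep only the full backups, ignoring
    the incrementals taken in between."""
    return [t for t in tapes if t["type"] == "full"]
```

With weekly fulls and daily incrementals, `full_backups_only` alone can discard most of a tape set before any indexing happens, and `dedupe_by_hash` then collapses the repeated copies of unchanged files on the remaining fulls.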

What are you working on that you’d like our readers to know about?

Index Engines just announced today a partnership with LeClairRyan.  This partnership combines legal expertise for data retention with the technology that makes applying the policy to legacy data possible.  For companies that want to build policy for the retention of legacy data and implement the tape remediation process, we have advisors like LeClairRyan that can provide legacy data consultation and oversight.  By proactively managing the potential liability of legacy data, you are also saving the IT costs to explore that data.

Index Engines also just announced a new cloud-based tape load service that will provide full identification, search and access to tape data for eDiscovery.  The Look & Learn service, starting at $50 per tape, will provide clients with full access to the index of their tape data without the need to install any hardware or software.  Customers will be able to search the index and gather knowledge about content, custodians, email and metadata, all via cloud access to the Index Engines interface, making discovery of data from tapes even more convenient and affordable.

Thanks, Jim, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Alon Israely, Esq., CISSP of BIA


This is the second of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Alon Israely.  Alon is a Senior Advisor in BIA’s Advisory Services group and when he’s not advising clients on e-discovery issues he works closely with BIA’s product development group for its core technology products.  Alon has over fifteen years of experience in a variety of advanced computing-related technologies and has consulted with law firms and their clients on a variety of technology issues, including expert witness services related to computer forensics, digital evidence management and data security.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

I think one of the important trends for corporate clients and law firms is cost control, whether it’s trying to minimize the amount of project management hours that are being billed or the manner in which the engagement is facilitated.  I’m not suggesting going full-bore necessarily, but taking baby steps to help control costs is a good approach.  I don’t think it’s only about bringing prices down, because I think that the industry in general has been able to do that naturally well.  But, I definitely see a new focus on the manner in which costs are managed and outsourced.  So, very specifically, scoping correctly is key, making sure you’re using the right tool for the right job, keeping efficiencies (whether that’s on the vendor side or the client side) by doing things such as not having five phone calls for a meeting to figure out what the key words are for field searching or just going out and imaging every drive before deciding what’s really needed. Bringing simple efficiencies to the mechanics of doing e-discovery saves tons of money in unnecessary legal, vendor and project management fees.  You can do things that are about creating efficiencies, but are not necessarily changing the process or changing the pricing.

I also see trends in technology, using more focused tools and different tools to facilitate a single project.  Historically, parties would hire three or four different vendors for a single project, but today it may be just one or two vendors, or maybe even no vendors (just the law firm).  It’s the use of the right technologies for the right situations – maybe not just one piece of software, but leveraging several for different parts of the process.  Overall, I foresee fewer vendors per project, but more vendors increasing their stable of tools.  So, whereas a vendor may have had a review tool and one way of doing collection, now they may have two or three review tools, including an ECA tool, and one or two ways of doing collections. They have a toolkit from which they can choose the best set of tools to bring to the engagement.  Because they have more tools to market, vendors can have the right tool in their back pocket, whereas before the tool belonged to just one service provider, so you bought from them or you just didn’t have it.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY} I think there is either a little or a lot – depending on how aggressive I want to be with my opinion – of a disconnect between what they’re speaking about in the panels and what we’re seeing on the floor.  But, I think that’s OK in that the conference itself is usually a little bit ahead of the curve with respect to topics, and the technology will catch up.  You have topics such as predictive coding and social networking related issues – those are two big ones that you’ll see.  I think, for example, there are very few companies that have a solution for social networking, though we happen to have one.  And, predictive coding is the same scenario.  You have a lot of providers that talk about it, but you have a handful that actually do it, and you have probably even fewer than that who do it right.  I think that next year you’ll see many predictive coding solutions and technologies and many more tools that have that capability built into them.  So, on the conference side, there is one level of information and on the floor side, a different level.

What are you working on that you’d like our readers to know about?

BIA has a new product: the industry’s first SaaS (software-as-a-service), on-demand collection technology that provides defensible collections.  We just rolled it out, we’re introducing it here at LegalTech and we’re starting a technology preview and signing up people who want to use or try the application.  It’s specifically for attorneys, corporations, service providers – anyone who’s in the business and needs a tool for defensible data collection performed with agility (always hard to balance).  Without having to buy software or have expert training, users simply log in or register and can start immediately.  You don’t have to worry about the traditional business processes to get things set up and started.  Which, if you think about it, means that on the collections side of e-discovery the client’s CEO or VP of Marketing can call you up and say “I’m leaving, I have my PST here, can you just come get it?” and you can facilitate that process through the web: download an application, walk through a wizard, collect it defensibly, encrypt it and then deliver a filtered set, as needed, for review.

The tool is designed to collect defensibly and to move the collected data – or some subset of that data – to delivery; from there, you would select your review tool of choice and we hand it off to the selected review tool.  So, we’re not trying to be everything; we’re focused on automating the left side of the EDRM.  We have load files for certain tools, having been a service provider for ten years, and we’re connecting with partners so that we can do the handoff.  So, when the client says “I’m ready to deliver my data”, they can choose OnDemand or Concordance or another review tool, and then either directly send it or download and ship it.  We’re not trying to be a review tool and not trying to be an ECA tool that helps you find the needle in the haystack; instead, we’re focused on collecting the data, normalizing it, cataloguing it and handing it off for the attorneys to do their work.

Thanks, Alon, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client where several searches were conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, version tracking of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
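To illustrate the mechanics (not any particular tool’s implementation), here is a minimal sketch in Python of what a proximity operator does under the hood; the sample documents and the five-word window are invented for illustration:

```python
import re

def proximity_hit(text, term_a, term_b, distance):
    """True if term_a and term_b occur within `distance` words
    of each other, in either order (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

# A phrase search for "oil rights" misses this; "oil within 5 of rights" catches it:
doc1 = "The company acquired rights to drill for oil in the gulf."
# An AND search hits this boilerplate; the proximity search correctly skips it:
doc2 = ("Annual report on oil production figures for the fiscal year. "
        "Copyright 2011 by the company. All Rights Reserved.")

print(proximity_hit(doc1, "oil", "rights", 5))  # True
print(proximity_hit(doc2, "oil", "rights", 5))  # False
```

Because the match works in either order, the same mechanics explain why a search like “doug within 2 of austin” catches both “Doug Austin” and “Austin, Doug”.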

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Database Discovery Pop Quiz ANSWERS


So, how did you do?  Did you know all the answers from Friday’s post – without “googling” them?  😉

Here are the answers – enjoy!

What is a “Primary Key”? The primary key of a relational table uniquely identifies each record in the table. It can be a normal attribute that you expect to be unique (e.g., Social Security Number); however, it’s usually best to be a sequential ID generated by the Database Management System (DBMS).
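As a quick illustration using Python’s built-in sqlite3 module (the employees table and its data are hypothetical), declaring an INTEGER PRIMARY KEY lets the DBMS generate the sequential IDs itself, while the natural candidate (here, SSN) is merely kept unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,  -- surrogate key; the DBMS assigns it sequentially
    ssn    TEXT UNIQUE,          -- natural candidate key: unique, but not primary
    name   TEXT)""")
conn.execute("INSERT INTO employees (ssn, name) VALUES ('123-45-6789', 'Ann')")
conn.execute("INSERT INTO employees (ssn, name) VALUES ('987-65-4321', 'Bob')")
rows = conn.execute("SELECT emp_id, name FROM employees").fetchall()
print(rows)  # [(1, 'Ann'), (2, 'Bob')] -- IDs generated by the DBMS
```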

What is an “Inner Join” and how does it differ from an “Outer Join”?  An inner join is the most common join operation used in applications, creating a new result table by combining column values of two tables.  An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record in one of the tables – even if no other matching record exists.  Sometimes, there is a reason to keep all of the records in one table in your result, such as a list of all employees, whether or not they participate in the company’s benefits program.
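The employees-and-benefits example above can be sketched with Python’s sqlite3 module (table names and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE benefits  (emp_id INTEGER, plan TEXT);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO benefits  VALUES (1, 'Health');
""")
# Inner join: only employees with a matching benefits record.
inner = conn.execute("""SELECT e.name, b.plan FROM employees e
                        JOIN benefits b ON e.emp_id = b.emp_id
                        ORDER BY e.emp_id""").fetchall()
# Left outer join: every employee is retained, with NULL where no match exists.
outer = conn.execute("""SELECT e.name, b.plan FROM employees e
                        LEFT JOIN benefits b ON e.emp_id = b.emp_id
                        ORDER BY e.emp_id""").fetchall()
print(inner)  # [('Ann', 'Health')]
print(outer)  # [('Ann', 'Health'), ('Bob', None)]
```

Bob has no benefits record, so the inner join drops him while the outer join keeps him with a NULL plan, exactly the “all employees, whether or not they participate” case described above.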

What is “Normalization”?  Normalization is the process of organizing data to minimize redundancy of that data. Normalization involves organizing a database into multiple tables and defining relationships between the tables.

How does a “View” differ from a “Table”?  A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, it is a query stored as an object.
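A small sqlite3 sketch (hypothetical table) shows the “stored query” nature of a view: it holds no data of its own, so changes to the base table show through immediately:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO employees VALUES (1, 'Ann', 90000);
CREATE VIEW payroll AS SELECT name, salary FROM employees;  -- a stored query, not stored data
""")
print(conn.execute("SELECT * FROM payroll").fetchall())  # [('Ann', 90000)]
conn.execute("UPDATE employees SET salary = 95000 WHERE name = 'Ann'")
# The update to the base table is immediately visible through the view:
print(conn.execute("SELECT salary FROM payroll").fetchone())  # (95000,)
```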

What does “BLOB” stand for?  A Binary Large OBject (BLOB) is a collection of binary data stored as a single entity in a database management system. BLOBs are typically images or other multimedia objects, though sometimes binary executable code is stored as a blob.  So, if you’re not including databases in your discovery collection process, you could also be missing documents stored as BLOBs.  BTW, if you didn’t click on the link next to the BLOB question in Friday’s blog, it takes you to the amusing trailer for the 1958 movie, The Blob, starring a young Steve McQueen (so early in his career, he was billed as “Steven McQueen”).
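Storing and retrieving a BLOB is straightforward; here is a sqlite3 sketch with stand-in image bytes (the table and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (doc_id INTEGER PRIMARY KEY, content BLOB)")
image_bytes = b"\x89PNG\r\n\x1a\n fake image data"   # stand-in for a real file's bytes
conn.execute("INSERT INTO attachments (content) VALUES (?)", (image_bytes,))
stored = conn.execute("SELECT content FROM attachments").fetchone()[0]
print(stored == image_bytes)  # True -- the bytes round-trip exactly
```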

What is the difference between a “flat file” and a “relational” database?  A flat file database is a database designed around a single table, like a spreadsheet. The flat file design puts all database information in one table, or list, with fields to represent all parameters. A flat file is prone to considerable duplicate data, as each value is repeated for each item.  A relational database, on the other hand, incorporates multiple tables with methods (such as normalization and inner and outer joins, defined above) to store data efficiently and minimize duplication.

What is a “Trigger”?  A trigger is a procedure which is automatically executed in response to certain events in a database and is typically used for keeping the integrity of the information in the database. For example, when a new record (for a new employee) is added to the employees table, a trigger might create new records in the taxes, vacations, and salaries tables.
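The new-employee example above can be sketched in sqlite3 (just a vacations table here, for brevity; all names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vacations (emp_id INTEGER, days_remaining INTEGER);
CREATE TRIGGER new_hire AFTER INSERT ON employees
BEGIN
    -- runs automatically for every insert into employees
    INSERT INTO vacations VALUES (NEW.emp_id, 10);
END;
""")
conn.execute("INSERT INTO employees (name) VALUES ('Ann')")
# The trigger created the vacations record without any explicit second insert:
print(conn.execute("SELECT * FROM vacations").fetchall())  # [(1, 10)]
```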

What is “Rollback”?  A rollback is the undoing of partly completed database changes when a database transaction is determined to have failed, thus returning the database to its previous state before the transaction began.  Rollbacks help ensure database integrity by enabling the database to be restored to a clean copy after erroneous operations are performed or database server crashes occur.
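Here is a sqlite3 sketch of a failed transaction being rolled back (the account data is invented, and the failure is simulated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('escrow', 1000)")
conn.commit()
try:
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'escrow'")
    raise RuntimeError("simulated crash before commit")
except RuntimeError:
    conn.rollback()   # undo the partly completed change
balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
print(balance)  # 1000 -- back to its state before the transaction began
```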

What is “Referential Integrity”?  Referential integrity ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity ensures that a record is not added to the table that contains the foreign key unless there is a corresponding record in the linked table. Many databases use cascading updates and cascading deletes to ensure that changes made to the linked table are reflected in the primary table.
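A sqlite3 sketch of a foreign key rejecting an orphan record (the departments/employees tables are hypothetical; note that SQLite enforces foreign keys only when the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES departments(dept_id));
INSERT INTO departments VALUES (1, 'Legal');
""")
conn.execute("INSERT INTO employees VALUES (1, 'Ann', 1)")      # OK: department 1 exists
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Bob', 99)") # no department 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # FOREIGN KEY constraint failed
```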

Why is a “Cartesian Product” in SQL almost always a bad thing?  A Cartesian Product occurs in SQL when a join condition (via a WHERE clause in a SQL statement) is omitted, causing all combinations of records from two or more tables to be displayed.  For example, when you go to the Department of Motor Vehicles (DMV) to pay your vehicle registration, they use a database with an Owners and a Vehicles table joined together to determine for which vehicle(s) you need to pay taxes.  Without that join condition, you would have a Cartesian Product and every vehicle in the state would show up as registered to you – that’s a lot of taxes to pay!
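The DMV example can be sketched in sqlite3 (invented owners and vehicles): omit the join condition and the row count becomes the product of the two table sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners   (owner_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vehicles (vehicle_id INTEGER PRIMARY KEY, owner_id INTEGER, plate TEXT);
INSERT INTO owners   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cam');
INSERT INTO vehicles VALUES (1, 1, 'ABC-123'), (2, 2, 'XYZ-789');
""")
# Correct join: each vehicle paired with its actual owner.
good = conn.execute("""SELECT o.name, v.plate FROM owners o, vehicles v
                       WHERE o.owner_id = v.owner_id""").fetchall()
# Join condition omitted: every owner pairs with every vehicle (3 x 2 = 6 rows).
bad = conn.execute("SELECT o.name, v.plate FROM owners o, vehicles v").fetchall()
print(len(good), len(bad))  # 2 6
```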

If you didn’t know the answers to most of these questions, you’re not alone.  But, to effectively provide the information within a database responsive to an eDiscovery request, knowledge of databases at this level is often necessary to collect and produce the appropriate information.    As Craig Ball noted in his article Ubiquitous Databases, “Get the geeks together, and get out of their way”.  Hey, I resemble that remark!

So, what do you think?  Did you learn anything?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Database Discovery Pop Quiz


Databases: You can’t live with them, you can’t live without them.

Or so it seems in eDiscovery.  On a regular basis, I’ve seen various articles and discussions related to discovery of databases and other structured data and I remain very surprised how few legal teams understand database discovery and know how to handle it.  A colleague of mine (who I’ve known over the years to be honest and reliable) even told me a few months back, while working for a nationally known eDiscovery provider, that their collection procedures actually excluded database files.

Last month, an article written by Craig Ball, called Ubiquitous Databases, provided a lot of good information about database discovery. It included various examples of how databases touch our lives every day, while noting that eDiscovery is still ultra document-centric, even when those “documents” are generated from databases.  There is some really good information in that article about Database Management Systems (DBMS), Structured Query Language (SQL), Entity Relationship Diagrams (ERDs) and how they are used to manage, access and understand the information contained in databases.  It’s a really good article, especially for database novices who need to understand more about databases and how they “tick”.

But, maybe you already know all you need to know about databases?  Maybe you would already be ready to address eDiscovery on your databases today?

Having worked with databases for over 20 years (I stopped counting at 20), I know a few things about databases.  So, here is a brief “pop” quiz on database concepts.  Call them “Database 101” questions.  See how many you can answer!

  • What is a “Primary Key”? (hint: it is not what you start the car with)
  • What is an “Inner Join” and how does it differ from an “Outer Join”?
  • What is “Normalization”?
  • How does a “View” differ from a “Table”?
  • What does “BLOB” stand for? (hint: it’s not this)
  • What is the difference between a “flat file” and a “relational” database?
  • What is a “Trigger”?
  • What is “Rollback”? (hint: it has nothing to do with Wal-Mart prices)
  • What is “Referential Integrity”?
  • Why is a “Cartesian Product” in SQL almost always a bad thing?

So, what do you think?  Are you a database guru or a database novice?  Please share any comments you might have or if you’d like to know more about a particular topic.

Did you think I was going to provide the answers at the bottom?  No cheating!!  I’ll answer the questions on Monday.  Hope you can stand it!!

eDiscovery Trends: 2011 Predictions — By The Numbers


Comedian Nick Bakay always ends his Tale of the Tape skits, where he compares everything from Married vs. Single to Divas vs. Hot Dogs, with the phrase “It's all so simple when you break things down scientifically.”

The late December/early January time frame is always when various people in eDiscovery make their annual predictions as to what trends to expect in the coming year.  We’ll have some of our own in the next few days (hey, the longer we wait, the more likely we are to be right!).  However, before stating those predictions, I thought we would take a look at other predictions and see if we could spot some common trends among them, so I “googled” for 2011 eDiscovery predictions and organized the predictions into common themes.  I found serious predictions here, here, here, here and here.  Oh, also here and here.

A couple of quick comments: 1) I had NO IDEA how many times that predictions are re-posted by other sites, so it took some work to isolate each unique set of predictions.  I even found two sets of predictions from ZL Technologies, one with twelve predictions and another with seven, so I had to pick one set and I chose the one with seven (sorry, eWEEK!). If I have failed to accurately attribute the original source for a set of predictions, please feel free to comment.  2) This is probably not an exhaustive list of predictions (I have other duties in my “day job”, so I couldn’t search forever), so I apologize if I’ve left anybody’s published predictions out.  Again, feel free to comment if you’re aware of other predictions.

Here are some of the common themes:

  • Cloud and SaaS Computing: Six out of seven “prognosticators” indicated that adoption of Software as a Service (SaaS) “cloud” solutions will continue to increase, which will become increasingly relevant in eDiscovery.  No surprise here, given last year’s IDC forecast for SaaS growth and many articles addressing the subject, including a few posts right here on this blog.
  • Collaboration/Integration: Six out of seven “augurs” also had predictions related to various themes associated with collaboration (more collaboration tools, greater legal/IT coordination, etc.) and integration (greater focus by software vendors on data exchange with other systems, etc.).  Two people specifically noted an expectation of greater eDiscovery integration within organization governance, risk management and compliance (GRC) processes.
  • In-House Discovery: Five “pundits” forecasted eDiscovery functions and software will continue to be brought in-house, especially on the “left-side of the EDRM model” (Information Management).
  • Diverse Data Sources: Three “soothsayers” presaged that sources of data will continue to be more diverse, which shouldn’t be a surprise to anyone, given the popularity of gadgets and the rise of social media.
  • Social Media: Speaking of social media, three “prophets” (yes, I’ve been consulting my thesaurus!) expect social media to continue to be a big area to be addressed for eDiscovery.
  • End to End Discovery: Three “psychics” also predicted that there will continue to be more single-source end-to-end eDiscovery offerings in the marketplace.

The “others receiving votes” category (two predicting each of these) included maturing and acceptance of automated review (including predictive coding), early case assessment moving toward the Information Management stage, consolidation within the eDiscovery industry, more focus on proportionality, maturing of global eDiscovery and predictive/disruptive pricing.

Predictive/disruptive pricing (via the respective blogs of Kriss Wilson of Superior Document Services and Charles Skamser of eDiscovery Solutions Group) is a particularly intriguing prediction to me because data volumes are continuing to grow at an astronomical rate, and greater volumes lead to greater costs.  Creativity will be key in how companies deal with the larger volumes effectively, and pressures will become greater for providers (even, dare I say, review attorneys) to price their services more creatively.

Another interesting prediction (via ZL Technologies) is that “Discovery of Databases and other Structured Data will Increase”, which is something I’ve expected to see for some time.  I hope this is finally the year for that.

Finally, I said that I found serious predictions and analyzed them; however, there are a couple of not-so-serious sets of predictions here and here.  My favorite prediction is from The Posse List, as follows: “LegalTech…renames itself “EDiscoveryTech” after survey reveals that of the 422 vendors present, 419 do e-discovery, and the other 3 are Hyundai HotWheels, Speedway Racers and Convert-A-Van who thought they were at the Javits Auto Show.”

So, what do you think?  Care to offer your own “hunches” from your crystal ball?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: Pension Committee

This holiday week, we’re taking a look back at some of the cases which have had the most significance (from an eDiscovery standpoint) of the year.  The first case we will look at is The Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities, LLC, 2010 U.S. Dist. LEXIS 4546 (S.D.N.Y. Jan. 15, 2010) (as amended May 28, 2010), commonly referred to as “Pension Committee”.

In “Pension Committee”, New York District Court Judge Shira Scheindlin defined negligence, gross negligence, and willfulness from an eDiscovery standpoint, cementing her status as the most famous “Judge Scheindlin” in New York (as opposed to “Judge Judy” Sheindlin, who spells her last name without a “c”).  Judge Scheindlin titled her 85-page opinion Zubulake Revisited: Six Years Later.

This case addresses the preservation obligations of the plaintiffs and the information that should have been preserved after the lawsuit was filed. Judge Scheindlin addresses in considerable detail the levels of culpability — negligence, gross negligence, and willfulness — in the electronic discovery context.

Issues that constituted negligence according to Judge Scheindlin’s opinion included:

  • Failure to obtain records from all employees (some of whom may have had only a passing encounter with the issues in the litigation), as opposed to key players;
  • Failure to take all appropriate measures to preserve ESI;
  • Failure to assess the accuracy and validity of selected search terms.

Issues that constituted gross negligence or willfulness according to Judge Scheindlin’s opinion included:

  • Failure to issue a written litigation hold;
  • Failure to collect information from key players;
  • Destruction of email or backup tapes after the duty to preserve has attached;
  • Failure to collect information from the files of former employees that remain in a party’s possession, custody, or control after the duty to preserve has attached.

The opinion also addresses 1) responsibility to establish the relevance of evidence that is lost as well as responsibility to prove that the absence of the missing material has caused prejudice to the innocent party, 2) a novel burden-shifting test in addressing burden of proof and severity of the sanction requested and 3) guidance on the important issue of preservation of backup tapes.

The result: spoliation sanctions against 13 plaintiffs based on their alleged failure to timely issue written litigation holds and to preserve certain evidence before the filing of the complaint.

Judge Scheindlin based sanctions on the conduct and culpability of the spoliating party, regardless of the relevance of the documents destroyed, which has caused some to label the opinion as “draconian”.  In at least one case, Orbit One Communications Inc. v. Numerex Corp., 2010 WL 4615547 (S.D.N.Y. Oct. 26, 2010), Magistrate Judge James C. Francis concluded that sanctions for spoliation must be based on the loss of at least some information relevant to the dispute.  It will be interesting to see how other cases refer to the Pension Committee case down the road.

So, what do you think?  Is this the most significant eDiscovery case of 2010?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Tips: SaaS and eDiscovery – More Top Considerations

Friday, we began talking about the article regarding Software as a Service (SaaS) and eDiscovery entitled Top 7 Legal Things to Know about Cloud, SaaS and eDiscovery on CIO, written by David Morris and James Shook from EMC.  The article, which relates to storage of ESI within cloud and SaaS providers, can be found here.

The article looks at key eDiscovery issues that must be addressed for organizations using public cloud and SaaS offerings for ESI, and Friday’s post looked at the first three issues.  Here are the remaining four issues from the article (requirements in bold are quoted directly from the article):

4. What if there are technical issues with e-discovery in the cloud?  The article discusses how identifying and collecting large volumes of data can have significant bandwidth, CPU, and storage requirements and that the cloud provider may have to do all of this work for the organization.  It pays to be proactive, determine potential eDiscovery needs for the data up front and, to the extent possible, negotiate eDiscovery requirements into the agreement with the cloud provider.

5. If the cloud/SaaS provider loses or inadvertently deletes our information, aren’t they responsible? As noted above, if the agreement with the cloud provider includes eDiscovery requirements for the cloud provider to meet, then it’s easier to enforce those requirements.  Currently, however, these agreements rarely include these types of requirements.  “Possession, custody or control” over the data points to the cloud provider, but courts usually focus their efforts on the named parties in the case when deciding on spoliation claims.  Sounds like a potential for third party lawsuits.

6. If the cloud/SaaS provider loses or inadvertently deletes our information, what are the potential legal ramifications?  If data was lost because of the cloud provider, the organization will probably want to establish that it is not at fault. But it may take more than establishing who deleted the data – the organization may need to demonstrate that it acted diligently in selecting the provider, negotiating terms with established controls and notifying the provider of hold requirements in a timely manner.  Even then, there is no case law guidance as to whether demonstrating such diligence would shift that responsibility, and most agreements with cloud providers will limit potential damages for loss of data or data access.

7. How do I protect our corporation from fines and sanction for ESI in the cloud?  The article discusses understanding what ESI is potentially relevant and where it’s located.  This can be accomplished, in part, by creating a data map for the organization that covers data in the cloud as well as data stored within the organization.  Again, covering eDiscovery and other compliance requirements with the provider when negotiating the initial agreement can make a big difference.  As always, be proactive to minimize issues when litigation strikes.

Let’s face it, cloud and SaaS solutions are here to stay and they are becoming increasingly popular for organizations of all sizes to avoid the software and infrastructure costs of internal solutions.  Being proactive and including corporate counsel up front in decisions related to SaaS selections will enable your organization to avoid many potential problems down the line.

So, what do you think?  Does your company have mechanisms in place for discovery of your cloud data?  Please share any comments you might have or if you’d like to know more about a particular topic.