
eDiscovery Trends: Jack Halprin of Autonomy

 

This is the fifth of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Jack Halprin.  As Vice President, eDiscovery and Compliance with Autonomy, Jack serves as internal and external legal subject matter expert for best practices and defensible processes around litigation, electronic discovery, legal hold, and compliance issues. He speaks frequently on enterprise legal risk management, compliance, and eDiscovery at industry events and seminars, and has authored numerous articles on eDiscovery, legal hold, social media, and knowledge management. He is actively involved in The Sedona Conference, ACC, and the Electronic Discovery Reference Model (EDRM). With a BA in Chemistry from Yale University, a JD from the University of California-Los Angeles, and admission to the California, Connecticut, Virginia and Patent Bars, Mr. Halprin has varied expertise that lends itself well to both the legal and technical aspects of electronic discovery collection and preservation.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

If I look at the overall trends, social media and the cloud are probably the two hottest topics from a technology perspective and also a data management perspective.  From the legal perspective, you’re looking at preservation issues and sanctions as well as the idea of proportionality.  You also see a greater need for technology that can meet the needs of attorneys and understand the meaning of information.  More and more, everyone is realizing that keyword searches are lacking – they aren’t really as effective as everyone thinks they are.

We’re also starting to see two other technology-related trends.  The industry is consolidating and customers are really starting to look for a single platform.  The current process of importing/exporting data from storage to legal hold collection, to early case assessment, to review, to production and creating several extra copies of the documents in the process is not manageable going forward.  Customers want to be able to preserve in place, to analyze in place, and they don’t want to have to collect and duplicate the data again and again.  If you look at the left side of EDRM, the more proactive side, they don’t want to put data or documents in a special repository unless it’s a true record that no one needs to access on a regular basis.  They want to work with active data where it lives.

You’ll see a reduction in the number of vendors in the next year or two, and the technology will not only be able to handle the current data sources, but the increased data volumes and new types of data we’re seeing.  Everyone is looking at social media and saying “how are we going to handle this”, when it’s really just another data source that has to be addressed.  Yes, it’s challenging because there is so much of it and it is even more conversational than email, taking it to a whole new level, but it’s really no different from other data sources.  A keyword search on a social media site is not going to net you the results you’re looking for, but conceptual search to understand the context of what people mean will help you identify the relevant information.  Growth rates are predicted at more than 60 percent for unstructured information, but social media is growing at a much faster clip.  A lot of people are looking at social media and moving to the cloud to manage this data, reducing some of the infrastructure costs, taking strain off the network and reducing their IT footprint.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first afternoon of LTNY}  I’ll take it first from the Autonomy perspective.  We have social media solutions, which we’ve had for our marketing business (Interwoven) for some time.  We’ve also had social media governance technology for quite some time as well, and we announced today new capabilities for identifying, preserving and collecting social media for eDiscovery, which is part of and builds on our end-to-end solution.  I haven’t spent much time on the floor yet, but based on everything I’ve seen in the eDiscovery space, a lot of people are talking about social media, but no one really understands how to address it.  You’ve got people scraping {social media} pages, but if you scrape the page without the active link or without capturing the context behind it, you’re missing a wealth of information.  We’re taking a different approach: we take the entire page, including the context and active links.

There’s also a wide disparity in terms of the cloud.  Is it public?  Is it private?  How much control do you have over your data when it’s in the cloud?  You’ve got a lot of vendors out there that aren’t transparent about their data centers.  You’ve got vendors that say they’re SAS 70 Type II certified, but it’s their data center, not the vendor itself, that is certified.  So, who’s got the experience?  Every year at LegalTech, there are probably forty new vendors out there and the next year, half or more of them are gone.

As for the tone of the show, I think it’s certainly more upbeat than last year, when attendance was down, and it’s a bit more “bouncy” this year.  With that in mind, you’ll continue to see acquisitions, and you’ll have the issue of companies merged through acquisition using different technologies and different search engines, meaning they’re not on a single platform and not really a single solution.  So, that gets back to the idea that customers are really looking for a single platform with a single engine underneath it.  That’s how we approach it, and I think others are trying to get to that point, but I don’t think there are many vendors there yet.  That’s where the trend is heading.

What are you working on that you’d like our readers to know about?

In addition to the new social media eDiscovery capabilities described above, we’ve announced the Autonomy Chaining Console, which is a dashboard to provide corporate legal departments with greater visibility and defensibility across the entire process and to eliminate those risky data import/export handoffs through each step.  Many of the larger corporations have hundreds of cases, dozens of outside law firms, and terabytes of data to manage.  The process today is very “silo” oriented – data is sent to processing vendors, it is sent to law firms, etc.  So, you get these “weak links in the chain” where data can get lost and risks of spoliation and costs increase.  Autonomy announced the whole idea of chaining last year, promoting the idea that we can seamlessly connect law firms and their corporate clients in a secure manner, so that the law firm can log in to a secure portal and manage the data that they’re allowed to access.  The Chaining Console strengthens that capability, and it adds Autonomy IDOL’s ability to understand meaning and allows corporate and outside counsel to look at the same data on the same solution.  It uses IDOL to determine potential custodians, understand fact patterns and identify other companies that may be involved by really analyzing the data and providing an understanding of what’s there.  It can also monitor and track risk, so you can set up certain policies around key issues; for example, insider trading, securities fraud, FCPA, etc.  Using those policies, it can alert you to the risks that are there and possibly identify the custodians that are engaging in risky behavior.  And, of course, it tracks the data from start to finish, giving corporate counsel, legal IT, IT, litigation support, litigation counsel as well as outside counsel a single view of the data on a single dashboard.  It strengthens our message and takes us to the next step in really providing the end-to-end platform for our clients.

We’ve also announced iManage in the cloud for legal information management.  The cloud-based Information Management platform combines WorkSite, Records Manager, Universal Search, Process Automation and ConflictsManager to help attorneys manage the content throughout the matter lifecycle from inception to disposition.  It uses IDOL’s ability to group concepts, so if you have a conflict with Apple, it knows that you’re searching for terms related to Apple Computer, such as Mac, iPhone, Steve Jobs, Steve Wozniak and Jonathan Ive, and understands that these are related terms and individuals.  And, we’ve just announced the cloud-based version of that.  We’re already managing information governance in the cloud for a lot of our clients, and the platform leverages our private cloud, which is the world’s largest private cloud with over 17 petabytes of data.

And, then we have a market leadership announcement with additional major law firms that are using our solutions, such as Brownstein Hyatt Farber Schreck LLP, Brown Rudnick LLP, Fennemore Craig, etc.  So, we have four press releases with new developments at Autonomy that we’ve announced here at the show.

Thanks, Jack, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Christine Musil of Informative Graphics Corporation (IGC)

 

This is the fourth of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Christine Musil.  Christine has a diverse career in engineering and marketing spanning 15 years. Christine has been with IGC since March 1996, when she started as a technical writer and a quality assurance engineer. After moving to marketing in 2001, she has applied her in-depth knowledge of IGC's products and benefits to marketing initiatives, including branding, overall messaging, and public relations. She has also been a contributing author to a number of publications on archiving formats, redaction, and viewing technology in the enterprise.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

For us, the biggest trend is elevation of the importance of eDiscovery, from what happens the minute you find out you have a lawsuit until the end of the case.  There’s a lot more discussion about how you can prevent it, how you can be better prepared, and I think that’s where the new buzzword, information governance, comes in.  We partner with OpenText and we partner with EMC on their content management side and we definitely see them pushing into the eDiscovery market to provide an end-to-end solution and stop trying to treat eDiscovery as an isolated issue. I think that the elevation of eDiscovery and inclusion of eDiscovery more into the regular business workflow of an organization is a pretty significant trend to watch.

Another trend that I see is an elevation of the use of search and how people can get more out of their searches to save time and cost.  This may be somewhat skewed based on our perspective in the market, but we’ve had a lot of requests for our redaction software to pick up the search that has already been done.  Providers work so hard to come up with amazingly complicated algorithms to find data.  Why reinvent the wheel?  The companies all ask why the other vendors can’t just take those search results and use them.

Since you’ve written a white paper about native review and redaction, where do you see that heading?  Well, I hope that people will stop printing things out, scanning them back in to TIFF, then OCRing them and handing everybody back a disk of flat images and a separate disk with OCR text.  I sort of understand why they do it, but I think a less paranoid or adversarial approach through more effective “meet and confer” agreements on how you are going to present things is going to make it so much easier for everybody.  I hope in three to five years people say “I’m not afraid to hand you my native files because I know how to check them and know what metadata they contain and whether there are any tracked changes or other potential issues”.  So, the paranoia and fear that people have about the unknown that they can’t see in their documents and whether there is a smoking gun in there should die down.  I think people are getting smarter – now that they’re not producing paper – as to what electronic files contain.  Hopefully, they will understand that native format is OK and when they need to redact, it’s OK to use PDF format to do so.  You tell the other side what you’re doing and what they’re going to get and it becomes a more open and well understood process.

I’m also on the EDRM XML committee, and I hope that a standard load file format, one that transmits data seamlessly from one side to the other and contains all the information about what has been redacted, among other things, will make things easier on everybody by moving information through the process more smoothly.  We’re writing white papers about the data set to educate the vendors on how to use it, and I have high hopes for what we will be able to accomplish there.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY}  Well, that’s hard since LegalTech just started [smiles].  I can tell you that in discussions with some of our partners, we’re seeing more support for mobile devices, support for the iPad, etc., to help lawyers work wherever they are and be more efficient wherever they are.  And, I think that literally goes all the way to the courtroom.  So, you’re seeing support for more devices and smaller screens, wherever attorneys get information.

What are you working on that you’d like our readers to know about?

I’m moderating a panel discussion {at LegalTech} titled, The Debate on Native Format Production and Redaction, which includes Craig Ball, George Socha, Tom O’Connor and Browning Marean.  I wrote a white paper last year entitled The Reality of Native Format Production and Redaction, which has inspired this panel discussion here at LegalTech.  So, that should be informative and interesting.  We’ve noticed that there’s just an awful lot of confusion in terms of what’s really required and what’s acceptable and the white paper and panel discussion really speaks to that.  We’re trying to educate our customers and help our partners educate their clients.

The other thing we’re announcing here is the release of integration to OpenText eDOCS.  We’ve been partners with OpenText for content management since 2002 and are very excited to extend our partnership to include this new area. eDOCS has a great presence in the legal space and we look forward to working with them.

Thanks, Christine, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Trends: Jim McGann of Index Engines

 

This is the third of the LegalTech New York (LTNY) Thought Leader Interview series.  eDiscoveryDaily interviewed several thought leaders at LTNY this year and asked each of them the same three questions:

  1. What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?
  2. Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?
  3. What are you working on that you’d like our readers to know about?

Today’s thought leader is Jim McGann.  Jim is Vice President of Information Discovery at Index Engines.  Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector. He has worked for leading software firms, including Information Builders and the France-based engineering software provider Dassault Systemes.  In recent years, he has worked for technology-based start-ups that provided financial services and information management solutions.

What do you consider to be the current significant trends in eDiscovery on which people in the industry are, or should be, focused?

What we’re seeing is that companies are becoming a bit more proactive.  Over the past few years, we’ve seen companies that have simply been reacting to litigation, and it’s been a very painful process because ESI collection has been a “fire drill” – a very last minute operation.  Not because lawyers have waited and waited, but because the data collection process has been slow, complex and overly expensive.  But things are changing. Companies are seeing that eDiscovery is here to stay, ESI collection is not going away, and the argument that it’s too complex or expensive to collect is no longer holding water. So, companies are starting to take a proactive stance on ESI collection and on understanding their data assets.  We’re talking to companies that are not specifically responding to litigation; instead, they’re building a defensible policy that they can apply to their data sources and make data available on demand as needed.

Which of those trends are evident here at LTNY, which are not being talked about enough, and/or what are your general observations about LTNY this year?

{Interviewed on the first morning of LTNY}  Well, in walking the floor as people were setting up, you saw a lot of early case assessment last year; this year you’re seeing a lot of information governance.  That’s showing that eDiscovery is really rolling into the records management/information governance area.  On the CIO and General Counsel level, information governance is getting a lot of exposure and there’s a lot of technology that can solve the problems.  Litigation support’s role will be to help the executives understand the available technology and how it applies to information governance and records management initiatives.  You’ll see more information governance messaging, which is really a higher level records management message.

As for other trends, one that I’ll tie Index Engines into is ESI collection and pricing.  Per-GB pricing is going down as the volume of data is going up.  Years ago, prices were a thousand dollars per GB, then hundreds of dollars per GB, etc.  Now the cost is close to tens of dollars per GB. To really manage large volumes of data more cost-effectively, the collection price had to become more affordable.  Because Index Engines can make data on backup tapes searchable very cost-effectively, for as little as $50 per tape, data on tape has become as easy to access and search as online data. Perhaps even easier, because it’s not on a live network.  Backup tapes have a bad reputation because people think of them as complex or expensive, but if you take away the complexity and expense (which is what Index Engines has done), then they really become “full point-in-time” snapshots.  So, if you have litigation from a specific date range, you can request that data snapshot (which is a tape) and perform discovery on it.  Tape is really a natural litigation hold when you think about it, and there is no need to perform the hold retroactively.

So, what does the ease with which information can be indexed from tape do to address the “inaccessible” argument against tape retrieval?  That argument has been eroding over the years, thanks to technology like ours.  And, you see decisions from judges like Judge Scheindlin saying “if you cannot find data in your primary network, go to your backup tapes”, indicating that they consider backup tapes the next source right after online networks.  You also see people like Craig Ball writing that backup tapes may be the most convenient and cost-effective way to get access to data.  If you had a choice between doing a “server crawl” in a corporate environment or just asking for a backup tape of that time frame, tape is the much more convenient and less disruptive option.  So, if your opponent goes to the judge and says it’s going to take millions of dollars to get the information off of twenty tapes, you must know enough to be in front of a judge and say “that’s not accurate”.  Those are old numbers.  There are court cases where parties have been instructed to use tapes as a cost-effective means of getting to the data.  Technology removes the “inaccessible” argument by making it easier, faster and cheaper to retrieve data from backup tapes.

The erosion of the accessibility burden is sparking the information governance initiatives. We’re seeing companies come to us for legacy data remediation or management projects, basically getting rid of old tapes. They are saying “if I’ve got ten years of backup tapes sitting in offsite storage, I need to manage that proactively and address any liability that’s there” (liability that they may not even be aware exists).  These projects reflect a proactive focus on information governance: remediating those tapes and getting rid of data they don’t need.  Ninety-eight percent of the data on old tapes is not going to be relevant to any case.  The remaining two percent can be found and put into the company’s litigation hold system, and then they can get rid of the tapes.

How do incremental backups play into that?  Tapes are very incremental and repetitive.  If you’re backing up the same data over and over again, you may have 50+ copies of the same email.  Index Engines technology automatically gets rid of system files and applies a standard MD5 hash to dedupe.  Also, by using tape cataloguing, you can read the header and say “we have a Saturday full backup and five incremental backups during the week, then another Saturday full backup”. You can ignore the incremental tapes and just go after the full backups.  That’s a significant percentage of the tapes you can ignore.
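
We didn’t get into the mechanics in the interview, but hash-based deduplication is easy to illustrate.  The Python sketch below is an illustration only (not Index Engines’ actual implementation, and the folder name is hypothetical): it computes an MD5 digest for each restored file and keeps just the first copy of each digest.

```python
import hashlib
from pathlib import Path

def md5_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def dedupe(root: Path) -> list[Path]:
    """Return one representative path per unique file content under root."""
    seen: dict[str, Path] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            seen.setdefault(md5_digest(path), path)  # keep only the first copy seen
    return list(seen.values())

if __name__ == "__main__":
    # "restored_backup_set" is a hypothetical folder of files restored from tape.
    unique_files = dedupe(Path("restored_backup_set"))
    print(f"{len(unique_files)} unique files after deduplication")
```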

What are you working on that you’d like our readers to know about?

Index Engines just announced today a partnership with LeClairRyan. This partnership combines legal expertise for data retention with the technology that makes applying the policy to legacy data possible.  For companies that want to build a policy for the retention of legacy data and implement the tape remediation process, we have advisors like LeClairRyan that can provide legacy data consultation and oversight.  By proactively managing the potential liability of legacy data, you are also saving the IT costs of exploring that data.

Index Engines also just announced a new cloud-based tape load service that will provide full identification, search and access to tape data for eDiscovery. The Look & Learn service, starting at $50 per tape, will provide clients with full access to the index of their tape data without the need to install any hardware or software. Customers will be able to search the index and gather knowledge about content, custodians, email and metadata all via cloud access to the Index Engines interface, making discovery of data from tapes even more convenient and affordable.

Thanks, Jim, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Case Law: Responses to FOIA Requests Must Be Searchable

Southern District of New York Judge Shira A. Scheindlin is at it again!  Her latest ruling is that the federal government must provide documents “in a usable format” when it responds to Freedom of Information Act (FOIA) requests.

Noting that “Once again, this Court is required to rule on an eDiscovery issue that could have been avoided had the parties had the good sense to ‘meet and confer,’ ‘cooperate’ and generally make every effort to ‘communicate’ as to the form in which ESI would be produced,” Judge Scheindlin ruled that federal agencies must turn over documents that include “metadata,” which allows them to be searched and indexed.  Indicating that “common sense dictates” that the handling of FOIA requests should be informed by “the spirit if not the letter” of the Federal Rules of Civil Procedure, Judge Scheindlin said the government offered “a lame excuse” for delivering non-searchable documents.

In National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency, 10 Civ. 3488, the National Day Laborer Organizing Network, Center for Constitutional Rights and the Immigration Justice Clinic at the Benjamin N. Cardozo School of Law sued to require production of a wide range of documents under the Freedom of Information Act in August 2010.  In response, the government agency defendants produced documents grouped together in large files that were not searchable, for which individual documents could not be easily identified, with emails separated from their attachments.

Consistent with the decisions of several state courts regarding their own FOIA statutes, Judge Scheindlin ruled that the federal law requires that metadata, which allows electronic files to be organized and searched, be retained in the records agencies produce.  While the federal act doesn’t specify the form in which documents must be delivered, it does require that documents be provided in any “format” that is “readily reproducible” by the agency in that format.  Metadata, in the FOIA context, is “readily reproducible,” Judge Scheindlin noted.

Judge Scheindlin also observed that “whether or not metadata has been specifically requested,” the production of non-searchable documents is “an inappropriate downgrading” of electronically stored information and provision of files “stripped of all metadata and lumped together without any indication of where a record begins and ends” is not an “acceptable form of production,” she said.

A copy of the opinion and order can be found here.

So, what do you think?  Have you been the recipient of a “lumped together” non-searchable production recently?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Judges’ Guide to Cost-Effective eDiscovery

 

Last week at LegalTech, I met Joe Howie at the blogger’s breakfast on Tuesday morning.  Joe is the founder of Howie Consulting and is the Director of Metrics Development and Communications for the eDiscovery Institute, which is a 501(c)(3) nonprofit research organization for eDiscovery.

eDiscovery Institute has just released a new publication that is a vendor-neutral guide for approaches to considerably reduce discovery costs for ESI.  The Judges’ Guide to Cost-Effective E-Discovery, co-written by Anne Kershaw (co-Founder and President of the eDiscovery Institute) and Joe Howie, also contains a foreword by the Hon. James C. Francis IV, Magistrate Judge for the Southern District of New York.  Joe gave me a copy of the guide, which I read during my flight back to Houston and found to be a terrific publication that details various mechanisms that can reduce the volume of ESI to review by up to 90 percent or more.  You can download the publication here (for personal review, not re-publication), and also read a summary article about it from Joe in InsideCounsel here.

Mechanisms for reducing costs covered in the Guide include:

  • DeNISTing: Excluding files known to be associated with commercial software, such as help files, templates, etc., as compiled by the National Institute of Standards and Technology, can eliminate a high number of files that will clearly not be responsive;
  • Duplicate Consolidation (aka “deduping”): Deduping across custodians, rather than just within each custodian, saves more: a 38% cost reduction for across-custodian deduping versus 21% for within-custodian deduping;
  • Email Threading: The ability to review the entire email thread at once reduces costs by 36% compared to reviewing each email in the thread separately;
  • Domain Name Analysis (aka Domain Categorization): As noted previously in eDiscoveryDaily, the ability to classify items based on the domain of the sender of the email can significantly reduce the collection to be reviewed by identifying emails from parties that are clearly not responsive to the case.  It can also be a great way to quickly identify some of the privileged emails (a rough sketch of this approach appears after this list);
  • Predictive Coding: As noted previously in eDiscoveryDaily, predictive coding is the use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. According to this report, “A recent survey showed that, on average, predictive coding reduced review costs by 45 percent, with several respondents reporting much higher savings in individual cases”.
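
As a rough illustration of the domain categorization idea (my own sketch, not taken from the Guide), the Python snippet below buckets emails by sender domain so that clearly non-responsive domains can be culled and known outside counsel domains can be flagged for privilege review.  The domain lists and message fields are hypothetical; in practice, they would come from reviewing the collection’s domain inventory.

```python
from collections import defaultdict

# Hypothetical domain lists for illustration only.
NON_RESPONSIVE_DOMAINS = {"newsletters.example.com", "travel-deals.example.net"}
PRIVILEGED_DOMAINS = {"outsidecounsel-lawfirm.com"}

def sender_domain(email_address: str) -> str:
    """Extract the domain portion of an email address, lower-cased."""
    return email_address.rsplit("@", 1)[-1].lower()

def categorize(messages: list[dict]) -> dict[str, list[dict]]:
    """Group messages into review buckets based on the sender's domain."""
    buckets = defaultdict(list)
    for msg in messages:
        domain = sender_domain(msg["from"])
        if domain in NON_RESPONSIVE_DOMAINS:
            buckets["cull"].append(msg)
        elif domain in PRIVILEGED_DOMAINS:
            buckets["potentially_privileged"].append(msg)
        else:
            buckets["review"].append(msg)
    return buckets

sample = [{"from": "partner@outsidecounsel-lawfirm.com", "subject": "Draft response"},
          {"from": "promo@travel-deals.example.net", "subject": "Fare sale"}]
print({bucket: len(msgs) for bucket, msgs in categorize(sample).items()})
```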

The publication also addresses concepts such as focused sampling, foreign language translation costs and searching audio records and tape backups.  It even addresses some of the most inefficient (and therefore, costly) practices of ESI processing and review, such as wholesale printing of ESI to paper for review (either in paper form or ultimately converted to TIFF or PDF), which is still more common than you might think.  Finally, it references some key rules of the ABA Model Rules of Professional Conduct to address the ethical duty of attorneys in effective management of ESI.  It’s a comprehensive publication that does a terrific job of explaining best practices for efficient discovery of ESI.

So, what do you think?  How many of these practices have been implemented by your organization?  Please share any comments you might have or if you’d like to know more about a particular topic.

Deadline Extended to Vote for the Most Significant eDiscovery Case of 2010

 

Our ‘little experiment’ to see what the readers of eDiscoveryDaily think about case law developments in 2010 needs more time, as we have not yet received enough votes to have a statistically significant result.  So, we’ve extended the deadline to select the case with the most significant impact on eDiscovery practices in 2010 to February 28.  Evidently, calling out the vote on the last business day before LegalTech is not the best timing.  Live and learn!

As noted previously, we have “nominated” five cases, which we feel were the most significant in different issues of case law, including duty to preserve and sanctions, clawback agreements under Federal Rule of Evidence 502, not reasonably accessible arguments and discoverability of social media content.  If you feel that some other case was the most significant case of 2010, you can select that case instead.  Again, it’s very important to note that you can vote anonymously, so we’re not using this as a “hook” to get your information.  You can select your case without providing any personal information.  However, we would welcome your comments as to why you selected the case you did and you can – optionally – identify yourself as well.

To get more information about the nominated cases (as well as other significant cases), click here.  To cast your vote, click here.

And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Vote for the Most Significant eDiscovery Case of 2010!

 

Since it’s awards season, we thought we would get into the act from an eDiscovery standpoint.  Sure, you have Oscars, Emmys and Grammys – but what about “EDDies”?  (I’ll bet you wondered what Eddie Munster could possibly have to do with eDiscovery, didn’t you?)

So, we’re conducting a ‘little experiment’ to see what the readers of eDiscoveryDaily think about case law developments in 2010.  This is our first annual “EDDies” award to select the case with the most significant impact on eDiscovery practices in 2010.  No cash or prizes being awarded, or even a statuette, but a chance to see what the readers think was the most important case of the year from an eDiscovery standpoint.

We have “nominated” five cases below, which we feel were the most significant in different issues of case law, including duty to preserve and sanctions, clawback agreements under Federal Rule of Evidence 502, not reasonably accessible arguments and discoverability of social media content.  We have a link to review more information about each case, and a link at the bottom of this post to cast your vote.

Very Important!  You can vote anonymously, so we’re not using this as a “hook” to get your information.  You can click on the link at the bottom, select your case and be done with it.  However, we would welcome your comments as to why you selected the case you did and you can – optionally – identify yourself as well.  eDiscoveryDaily will publish selected comments to reflect opinion of the voters as well as the vote results on February 7.  Click here to cast your vote now!

So, here are the cases:

Duty to Preserve/Sanctions

  • The Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities, LLC, 2010 U.S. Dist. Lexis 4546 (S.D.N.Y. Jan. 15, 2010) (as amended May 28, 2010) – “Pension Committee”: The case that defined negligence, gross negligence, and willfulness in the electronic discovery context and demonstrated the consequences (via sanctions) resulting from those activities.  Judge Shira Scheindlin titled her 85-page opinion “Zubulake Revisited: Six Years Later”.  For more on this case, click here.
  • Victor Stanley, Inc. v. Creative Pipe, Inc., 2010 WL 3530097 (D. Md. 2010) – “Victor Stanley II”: The case of “the gang that couldn’t spoliate straight” where one of the defendants faced imprisonment for up to 2 years (subsequently set aside on appeal) and the opinion included a 12-page chart delineating the preservation and spoliation standards in each judicial circuit.  For more on this case, click here and here.

Clawback Agreements

  • Rajala v. McGuire Woods LLP, 2010 WL 2949582 (D. Kan. July 22, 2010) – “Rajala”: The case that addressed the applicability of Federal Rule of Evidence 502(d) and (e) for “clawback” provisions for inadvertently produced privileged documents.  For more on this case, click here.

Not Reasonably Accessible

  • Major Tours, Inc. v. Colorel, 2010 WL 2557250 (D.N.J. June 22, 2010) – “Major Tours”: The case that established a precedent that a party may obtain a Protective Order relieving it of the duty to access backup tapes, even when that party’s failure to issue a litigation hold caused the data not to be available via any other means.  For more on this case, click here.

Social Media Discovery

  • Crispin v. Christian Audigier Inc., 2010 U.S. Dist. Lexis 52832 (C.D. Calif. May 26, 2010) – “Crispin”: The case that used a 24-year-old law (The Stored Communications Act of 1986) to address whether ‘private’ data on social networks is discoverable.  For more on this case, click here.

If you feel that some other case was the most significant case of 2010, you can select that case instead.  Other notable cases include:

  • Rimkus v. Cammarata, 2010 WL 645253 (S.D. Tex. Feb. 19, 2010): Where District Court Judge Lee Rosenthal examined the spoliation laws of each of the 13 federal Circuit Courts of Appeals.
  • Orbit One Communications Inc. v. Numerex Corp., 2010 WL 4615547 (S.D.N.Y. Oct. 26, 2010): Magistrate Judge James C. Francis concluded that sanctions for spoliation must be based on the loss of at least some information relevant to the dispute (differing with “Pension Committee” in this manner).
  • DeGeer v. Gillis, 2010 U.S. Dist. Lexis 97457 (N.D. Ill. Sept. 17, 2010): Demonstration of inadvertent disclosure made FRE 502(d) effective, negating waiver of privilege.
  • Takeda Pharmaceutical Co., Ltd. v. Teva Pharmaceuticals USA, Inc., 2010 WL 2640492 (D. Del. June 21, 2010): Defendants’ motion to compel the production of ESI for a period of 18 years was granted, with imposed cost-shifting.
  • E.E.O.C. v. Simply Storage Management, LLC, 2010 U.S. Dist. Lexis 52766 (S.D. Ind. May 11, 2010): EEOC is ordered to produce certain social networking communications.
  • McMillen v. Hummingbird Speedway, Inc., No. 113-2010 CD (C.P. Jefferson, Sept. 9, 2010): Motion to Compel discovery of social network account log-in names and passwords was granted.

Click here to cast your vote now!  Results will be published in eDiscoveryDaily on February 7.

The success of this ‘little experiment’ will determine whether next year there is a second annual “EDDies” award.  😉

And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching: For Defensible Searching, Be a "STARR"

 

Defensible searching has become a priority in eDiscovery as parties in several cases have experienced significant consequences (including sanctions) for not implementing a defensible search strategy in responding to discovery requests.

Probably the most famous case where search approach has been an issue was Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008), where Judge Paul Grimm noted that the “only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents” and found that privilege on 165 inadvertently produced documents was waived, in part, because of the inadequacy of the search approach.

A defensible search strategy is part using an effective tool (with advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching) and part using an effective approach to test and verify search results.

I have an acronym that I use to reflect the defensible search process.  I call it “STARR” – as in “STAR” with an extra “R” or Green Bay Packer football legend Bart Starr (sorry, Bears fans!).  For each search that you need to conduct, here’s how it goes:

  • Search: Construct the best search you can to maximize recall and precision for the desired result.  An effective tool gives you more options for constructing a more effective search, which should help in maximizing recall and precision.  For example, as noted on this blog a few days ago, a proximity search can, under the right circumstances, provide a more precise search result without sacrificing recall.
  • Test: Once you’ve conducted the search, it’s important to test two datasets to determine the effectiveness of the search (a simple sampling sketch follows this list):
    • Result Set: Test the result set by randomly selecting an appropriate sample percentage of the files and reviewing those to determine their responsiveness to the intent of the search.  The appropriate percentage of files to be reviewed depends on the size of the result set – the smaller the set, the higher the percentage that should be reviewed.
    • Files Not Retrieved: While testing the result set is important, it is also important to randomly select an appropriate sample percentage of the files that were not retrieved in the search and review those as well to see whether the search missed any responsive files.
  • Analyze: Analyze the results of the random sample testing of both the result set and the files not retrieved to determine how effective the search was in retrieving mostly responsive files and whether it missed any responsive files.
  • Revise: If the search retrieved a low percentage of responsive files and retrieved a high percentage of non-responsive files, then precision of the search may need to be improved.  If the files not retrieved contained any responsive files, then recall of the search may need to be improved.  Evaluate the results and see what, if any, revisions can be made to the search to improve precision and/or recall.
  • Repeat: Once you’ve identified revisions you can make to your search, repeat the process.  Search, Test, Analyze and (if necessary) Revise the search again until the precision and recall of the search is maximized to the extent possible.
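
To make the “Test” and “Analyze” steps concrete, here is a minimal Python sampling sketch.  It is my own illustration, with hypothetical sample sizes and data structures; in a real matter, you would choose sample sizes based on the confidence level and margin of error you need, and `is_responsive` stands in for the human reviewer’s call on each sampled file.

```python
import random

def sample(files: list[str], size: int) -> list[str]:
    """Randomly select up to `size` files for manual review."""
    return random.sample(files, min(size, len(files)))

def estimate_search_quality(result_set: list[str], not_retrieved: list[str],
                            is_responsive, sample_size: int = 200) -> dict:
    """Estimate precision from the result set, and recall by scaling up the
    responsive rate found in a sample of the files the search did not retrieve."""
    reviewed_hits = sample(result_set, sample_size)
    responsive_hits = sum(1 for f in reviewed_hits if is_responsive(f))

    reviewed_misses = sample(not_retrieved, sample_size)
    responsive_misses = sum(1 for f in reviewed_misses if is_responsive(f))

    precision = responsive_hits / len(reviewed_hits) if reviewed_hits else 0.0
    # Scale the sampled miss rate up to the full not-retrieved population.
    est_missed = (responsive_misses / len(reviewed_misses)) * len(not_retrieved) if reviewed_misses else 0.0
    est_found = precision * len(result_set)
    est_total_responsive = est_found + est_missed
    recall = est_found / est_total_responsive if est_total_responsive else 0.0
    return {"estimated_precision": precision, "estimated_recall": recall}
```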

While you can’t guarantee that you will retrieve all of the responsive files or eliminate all of the non-responsive ones, a defensible approach to get as close as you can to that goal will minimize the number of files for review, potentially saving considerable costs and making you a “STARR” in the courtroom when defending your search approach.

So, what do you think?  Are you a “STARR” when it comes to defensible searching?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client for which several searches were conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, version tracking of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
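
For readers who want to see what a proximity check actually does, here is a simplified Python sketch (an illustration only, not the commercial tool we used): tokenize the document and keep it only if the two terms appear within the specified number of words of each other, in either order.

```python
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def within_proximity(text: str, term_a: str, term_b: str, distance: int = 5) -> bool:
    """True if term_a and term_b occur within `distance` words of each other, in either order."""
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= distance for a in positions_a for b in positions_b)

# "oil rights" variations hit; "All Rights Reserved" boilerplate far from "oil" does not.
print(within_proximity("rights to drill for oil in the northern tract", "oil", "rights"))       # True
print(within_proximity("Crude oil outlook. (c) 2011 Example Corp. All Rights Reserved.",
                       "oil", "rights", 3))                                                     # False
```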

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Database Discovery Pop Quiz ANSWERS

 

So, how did you do?  Did you know all the answers from Friday’s post – without “googling” them?  😉

Here are the answers – enjoy!

What is a “Primary Key”? The primary key of a relational table uniquely identifies each record in the table. It can be a normal attribute that you expect to be unique (e.g., Social Security Number); however, it’s usually best for it to be a sequential ID generated by the Database Management System (DBMS).
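
Here’s a quick illustration using Python’s built-in sqlite3 module (the table and column names are hypothetical): declaring an INTEGER PRIMARY KEY lets the DBMS assign the sequential ID for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,   -- sequential ID assigned by the DBMS
    ssn         TEXT UNIQUE,           -- a "natural" key, better kept as a unique attribute
    name        TEXT NOT NULL)""")
conn.execute("INSERT INTO employees (ssn, name) VALUES ('123-45-6789', 'Pat Example')")
print(conn.execute("SELECT employee_id, name FROM employees").fetchall())  # [(1, 'Pat Example')]
```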

What is an “Inner Join” and how does it differ from an “Outer Join”?  An inner join is the most common join operation used in applications, creating a new result table by combining column values of two tables for each pair of rows that satisfies the join condition.  An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record from one of the tables, even if no matching record exists in the other.  Sometimes, there is a reason to keep all of the records in one table in your result, such as a list of all employees, whether or not they participate in the company’s benefits program.
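
The difference is easy to see with a couple of hypothetical tables in sqlite3: the inner join drops the employee with no benefits record, while the left outer join keeps every employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE benefits  (employee_id INTEGER, plan TEXT);
INSERT INTO employees VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO benefits  VALUES (1, 'Health Plus');
""")
inner = conn.execute("""SELECT e.name, b.plan FROM employees e
                        INNER JOIN benefits b ON b.employee_id = e.id
                        ORDER BY e.id""").fetchall()
outer = conn.execute("""SELECT e.name, b.plan FROM employees e
                        LEFT OUTER JOIN benefits b ON b.employee_id = e.id
                        ORDER BY e.id""").fetchall()
print(inner)  # [('Ada', 'Health Plus')] - only rows with a match
print(outer)  # [('Ada', 'Health Plus'), ('Grace', None)] - every employee kept
```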

What is “Normalization”?  Normalization is the process of organizing data to minimize redundancy of that data. Normalization involves organizing a database into multiple tables and defining relationships between the tables.

How does a “View” differ from a “Table”?  A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, it is a query stored as an object.
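
A quick sqlite3 illustration (hypothetical tables): the view stores the query, not the data, so it automatically reflects later changes to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matters (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
INSERT INTO matters VALUES (1, 'Smith v. Jones', 'open'), (2, 'Acme audit', 'closed');
CREATE VIEW open_matters AS SELECT id, name FROM matters WHERE status = 'open';
""")
print(conn.execute("SELECT * FROM open_matters ORDER BY id").fetchall())  # [(1, 'Smith v. Jones')]
conn.execute("UPDATE matters SET status = 'open' WHERE id = 2")
print(conn.execute("SELECT * FROM open_matters ORDER BY id").fetchall())  # both matters now appear
```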

What does “BLOB” stand for?  A Binary Large OBject (BLOB) is a collection of binary data stored as a single entity in a database management system. BLOBs are typically images or other multimedia objects, though sometimes binary executable code is stored as a blob.  So, if you’re not including databases in your discovery collection process, you could also be missing documents stored as BLOBs.  BTW, if you didn’t click on the link next to the BLOB question in Friday’s blog, it takes you to the amusing trailer for the 1958 movie, The Blob, starring a young Steve McQueen (so early in his career, he was billed as “Steven McQueen”).
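
Storing and retrieving a BLOB with sqlite3 looks like the hypothetical sketch below; in Python, the binary content is simply passed and returned as a bytes object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, filename TEXT, content BLOB)")

image_bytes = b"\x89PNG\r\n\x1a\n..."  # stand-in for real binary file content
conn.execute("INSERT INTO attachments (filename, content) VALUES (?, ?)",
             ("logo.png", image_bytes))

filename, content = conn.execute("SELECT filename, content FROM attachments").fetchone()
print(filename, len(content), "bytes")  # the bytes come back exactly as stored
```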

What is the difference between a “flat file” and a “relational” database?  A flat file database is a database designed around a single table, like a spreadsheet. The flat file design puts all database information in one table, or list, with fields to represent all parameters. A flat file is prone to considerable duplicate data, as each value is repeated for each item.  A relational database, on the other hand, incorporates multiple tables with methods (such as normalization and inner and outer joins, defined above) to store data efficiently and minimize duplication.

What is a “Trigger”?  A trigger is a procedure which is automatically executed in response to certain events in a database and is typically used for keeping the integrity of the information in the database. For example, when a new record (for a new employee) is added to the employees table, a trigger might create new records in the taxes, vacations, and salaries tables.
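
A minimal sqlite3 sketch (hypothetical tables): the trigger fires automatically when a new employee record is inserted and creates the matching vacations record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vacations (employee_id INTEGER, days_remaining INTEGER);
CREATE TRIGGER add_vacation_record AFTER INSERT ON employees
BEGIN
    INSERT INTO vacations (employee_id, days_remaining) VALUES (NEW.id, 15);
END;
""")
conn.execute("INSERT INTO employees (name) VALUES ('Ada')")
print(conn.execute("SELECT * FROM vacations").fetchall())  # [(1, 15)] - created by the trigger
```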

What is “Rollback”?  A rollback is the undoing of partly completed database changes when a database transaction is determined to have failed, thus returning the database to its previous state before the transaction began.  Rollbacks help ensure database integrity by enabling the database to be restored to a clean copy after erroneous operations are performed or database server crashes occur.
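
Here’s a hypothetical sqlite3 sketch of a rollback: the second update violates a constraint, so the partly completed transfer is undone and the table returns to its previous state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('operating', 100), ('reserve', 50)")
conn.commit()

try:
    # Credit the reserve account first...
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'reserve'")
    # ...then debit operating, which violates the CHECK constraint and fails.
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'operating'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # undo the partly completed transfer

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('operating', 100), ('reserve', 50)] - unchanged after the rollback
```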

What is “Referential Integrity”?  Referential integrity ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity ensures that a record is not added to the table that contains the foreign key unless there is a corresponding record in the linked table. Many databases use cascading updates and cascading deletes to ensure that changes made to the linked table are reflected in the primary table.
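
sqlite3 will enforce this once foreign keys are switched on, as in the hypothetical sketch below: the orphan insert is rejected, and the cascading delete removes the dependent rows along with the parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this per connection
conn.executescript("""
CREATE TABLE custodians (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE documents (id INTEGER PRIMARY KEY,
                        custodian_id INTEGER REFERENCES custodians(id) ON DELETE CASCADE);
INSERT INTO custodians VALUES (1, 'Ada');
INSERT INTO documents VALUES (100, 1);
""")
try:
    conn.execute("INSERT INTO documents VALUES (101, 99)")  # no custodian 99 exists
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed

conn.execute("DELETE FROM custodians WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM documents").fetchone())  # (0,) - cascaded delete
```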

Why is a “Cartesian Product” in SQL almost always a bad thing?  A Cartesian Product occurs in SQL when a join condition (via a WHERE clause in a SQL statement) is omitted, causing all combinations of records from two or more tables to be displayed.  For example, when you go to the Department of Motor Vehicles (DMV) to pay your vehicle registration, they use a database with an Owners and a Vehicles table joined together to determine for which vehicle(s) you need to pay taxes.  Without that join condition, you would have a Cartesian Product and every vehicle in the state would show up as registered to you – that’s a lot of taxes to pay!
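
The hypothetical sqlite3 sketch below shows why: with the join condition omitted, every owner is paired with every vehicle in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vehicles (plate TEXT, owner_id INTEGER);
INSERT INTO owners   VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO vehicles VALUES ('ABC123', 1), ('XYZ789', 2);
""")
cartesian = conn.execute("SELECT o.name, v.plate FROM owners o, vehicles v").fetchall()
joined    = conn.execute("""SELECT o.name, v.plate FROM owners o, vehicles v
                            WHERE v.owner_id = o.id""").fetchall()
print(len(cartesian), cartesian)  # 4 rows - every owner paired with every vehicle
print(len(joined), joined)        # 2 rows - only each owner's own vehicle
```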

If you didn’t know the answers to most of these questions, you’re not alone.  But, to effectively provide the information within a database responsive to an eDiscovery request, knowledge of databases at this level is often necessary to collect and produce the appropriate information.  As Craig Ball noted in his Law.com article Ubiquitous Databases, “Get the geeks together, and get out of their way”.  Hey, I resemble that remark!

So, what do you think?  Did you learn anything?  Please share any comments you might have or if you’d like to know more about a particular topic.