
Is Technology Assisted Review Older than the US Government? – eDiscovery Trends

A lot of people consider Technology Assisted Review (TAR) and Predictive Coding (PC) to be new technology.  We attempted to debunk that myth last year after our third annual thought leader interview series, summarizing comments from thought leaders who noted that TAR and PC really just apply artificial intelligence to the review process.  But the foundation for TAR may go back much farther than you might think.

In the BIA blog post, Technology Assisted Review: It’s not as new as you think it is, Robin Athlyn Thompson and Brian Schrader take a look at the origins of at least one theory behind TAR.  Called the “Naive Bayes classifier”, it’s based on theorems that were essentially introduced to the public in 1812.  But the theorems existed quite a bit earlier than that.

Bayes’s theorem is named after Rev. Thomas Bayes (who died in 1761), who first showed how to use new evidence to update beliefs.  He lived so long ago that no widely accepted portrait of him is known to exist.  His friend Richard Price edited and presented his work in 1763, after Bayes’s death, as An Essay towards solving a Problem in the Doctrine of Chances.  Bayes’s theorem remained largely unknown until it was independently rediscovered and further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 Théorie analytique des probabilités (Analytic Theory of Probabilities).
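To make the theorem’s connection to TAR concrete, here’s a minimal, illustrative sketch (in Python, and not the workings of any actual TAR product) of how a Naive Bayes classifier applies Bayes’s theorem to score documents for relevance, based on examples a reviewer has already coded:

```python
# A minimal Naive Bayes relevance scorer - an illustration of the concept,
# not any vendor's implementation. Documents are lists of tokens; labels
# are "relevant" or "not".
import math
from collections import Counter

def train(docs, labels):
    """Count word frequencies per class and class frequencies overall."""
    counts = {"relevant": Counter(), "not": Counter()}
    priors = Counter(labels)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    return counts, priors

def log_odds_relevant(tokens, counts, priors):
    """Bayes's theorem in log-odds form: prior odds plus per-word evidence."""
    vocab = set(counts["relevant"]) | set(counts["not"])
    odds = math.log(priors["relevant"] / priors["not"])
    for t in tokens:
        # Laplace smoothing keeps unseen words from zeroing out the estimate.
        p_rel = (counts["relevant"][t] + 1) / (sum(counts["relevant"].values()) + len(vocab))
        p_not = (counts["not"][t] + 1) / (sum(counts["not"].values()) + len(vocab))
        odds += math.log(p_rel / p_not)
    return odds

# Two coded training documents, then a score for a new one: a positive
# result means "more likely relevant than not".
counts, priors = train([["rate", "agreement"], ["lunch", "menu"]], ["relevant", "not"])
print(log_odds_relevant(["rate", "filing"], counts, priors))  # > 0, leans relevant
```

It’s the same 1763 idea of updating beliefs with evidence, applied word by word to a document collection.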

Thompson and Schrader go on to discuss more recent uses of artificial intelligence algorithms to map trends, including Amazon’s “More Like This” functionality, which recommends other items that you may like based on previous purchases.  That technology has been around for nearly two decades – can you believe it’s been that long? – and is one of the key factors in Amazon’s success over that time.

So, don’t scoff at the use of TAR because it’s “new technology”; that thinking is “naïve”.  Some of the foundational statistical theories for TAR go back further than the birth of our country.

So, what do you think?  Has your organization used technology assisted review on a case yet?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Though it was “Switching Horses in Midstream”, Court Approves Plaintiff’s Predictive Coding Plan – eDiscovery Case Law

In Bridgestone Americas Inc. v. Int’l Bus. Mach. Corp., No. 3:13-1196 (M.D. Tenn. July 22, 2014), Tennessee Magistrate Judge Joe B. Brown, acknowledging that he was “allowing Plaintiff to switch horses in midstream”, nonetheless ruled that the plaintiff could use predictive coding to search documents for discovery, even though keyword search had already been performed.

In this case, where the plaintiff sued the defendant over a $75 million computer system that it claimed threw its “entire business operation into chaos”, the plaintiff requested that the court allow the use of predictive coding in reviewing over two million documents.  The defendant objected, noting that the request was an unwarranted change to the original case management order, which did not include predictive coding, and that it would be unfair to use predictive coding after an initial screening had been done with keyword search terms.

Judge Brown conducted a lengthy telephone conference with the parties on June 25 and began the analysis in his order by observing that “[p]redictive coding is a rapidly developing field in which the Sedona Conference has devoted a good deal of time and effort to, and has provided various best practices suggestions”, also noting that “Magistrate Judge Peck has written an excellent article on the subject and has issued opinions concerning predictive coding.”  “In the final analysis”, Judge Brown continued, “the uses of predictive coding is a judgment call, hopefully keeping in mind the exhortation of Rule 26 that discovery be tailored by the court to be as efficient and cost-effective as possible.”

As a result, noting that “we are talking about millions of documents to be reviewed with costs likewise in the millions”, Judge Brown permitted the plaintiff “to use predictive coding on the documents that they have presently identified, based on the search terms Defendant provided.”  Judge Brown acknowledged that he was “allowing Plaintiff to switch horses in midstream”, so “openness and transparency in what Plaintiff is doing will be of critical importance.”

This case has similar circumstances to Progressive Cas. Ins. Co. v. Delaney, where that plaintiff also sought to shift from the agreed-upon discovery methodology for privilege review to a predictive coding methodology.  However, in that case, the plaintiff did not consult with either the court or the requesting party regarding its intention to change review methodology, and the plaintiff’s lack of transparency and cooperation resulted in an order to produce documents according to the agreed-upon methodology.  It pays to cooperate!

So, what do you think?  Should the plaintiff have been allowed to shift from the agreed upon methodology or did the volume of the collection warrant the switch?  Please share any comments you might have or if you’d like to know more about a particular topic.


Our 1,000th Post! – eDiscovery Milestones

When we launched nearly four years ago on September 20, 2010, our goal was to be a daily resource for eDiscovery news and analysis.  Now, after doing so each business day (except for one), I’m happy to announce that today is our 1,000th post on eDiscovery Daily!

We’ve covered the gamut in eDiscovery, from case law to industry trends to best practices.  Here are some of the categories that we’ve covered and the number of posts (to date) for each:

We’ve also covered every phase of the EDRM life cycle (177 posts), including:

Every post we have published is still available on the site for your reference, which has made eDiscovery Daily quite a knowledge base!  We’re quite proud of that.

Comparing our first three months of existence to now, we have seen traffic on our site grow an amazing 474%!  Our subscriber base has more than tripled in the last three years!  We want to take this time to thank you, our readers and subscribers, for making that happen.  Thanks for making the eDiscovery Daily blog a regular resource for your eDiscovery news and analysis!  We really appreciate the support!

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan University, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!).  We really appreciate it!

I also want to extend a special thanks to Jane Gennarelli, who has provided several serial topics, ranging from project management to coordinating review teams to what litigation support and discovery used to be like back in the ’80s (to which some of us “old timers” can relate).  Her contributions are always well received and appreciated by the readers – and also especially by me, since I get a day off!

We always end each post with a request: “Please share any comments you might have or if you’d like to know more about a particular topic.”  And, we mean it.  We want to cover the topics you want to hear about, so please let us know.

Tomorrow, we’ll be back with a new, original post.  In the meantime, feel free to click on any of the links above and peruse some of our 999 previous posts.  Now is your chance to catch up!  😉


Court Sides with Defendant in Dispute over Predictive Coding that Plaintiff Requested – eDiscovery Case Law

In the case In re Bridgepoint Educ., Inc., Securities Litigation, 12cv1737 JM (JLB) (S.D. Cal. Aug. 6, 2014), California Magistrate Judge Jill L. Burkhardt ruled that expanding the scope of discovery by nine months was unduly burdensome, despite the plaintiff’s request for the defendant to use predictive coding to fulfill its discovery obligation.  She also approved the defendants’ method of using search terms to identify responsive documents for the three individual defendants already reviewed, directing the parties to meet and confer regarding the additional search terms the plaintiffs requested.

In this case involving several discovery disputes, a telephonic discovery conference was held in the instant action on June 27, during which the Court issued oral orders on three of four discovery disputes.  As to the remaining dispute, the Court requested supplemental briefings from both parties and issued a ruling in this order, along with formalizing the remaining orders.

The unresolved discovery dispute concerned the plaintiffs’ “request for discovery extending beyond the time frame that Defendants have agreed to” for an additional nine months.  In their briefing, the defendants (based on the production efforts to date) claimed that expanding the scope of discovery by nine months would increase their review costs by 26% or $390,000.  The plaintiffs’ reply brief argued that the defendants’ estimate reflected the cost of manual review rather than the predictive coding system that the defendants would use – according to the plaintiffs, the cost of predictive coding was the only cost relevant to the defendants’ burden, estimating the additional burden to be roughly $11,279.

Per the Court’s request, the defendants submitted a reply brief addressing the arguments raised by the plaintiffs, arguing that predictive coding software “does not make manual review for relevance merely elective”.  The defendants contended that the software only assigns a percentage estimate to each document reflecting the probability that the document is relevant; the software is not foolproof, and attorney review is still required to ensure that the documents produced are both relevant and not privileged.

Judge Burkhardt, citing the “proportionality” rule of Federal Rule of Civil Procedure 26(b)(2)(C), denied expanding the scope of discovery by nine months, finding that “Defendants have set forth sufficient evidence to conclude that the additional production would be unduly burdensome”.

The plaintiffs, claiming that the defendants used “unilaterally-selected search terms” to identify the original production, also argued that discovery produced from the three Individual Defendants should be added to the Defendants’ predictive coding software.  But Judge Burkhardt, formalizing the oral order, stated “[t]he Court approved Defendants’ method of using linear screening with the aid of search terms to identify responsive documents with regard to the emails already reviewed for the three Individual Defendants. The parties were directed to meet and confer regarding the additional search terms Plaintiffs would like Defendants to use.”

So, what do you think?  Was the additional discovery scope unduly burdensome or did the plaintiff have a point about reduced discovery costs?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will resume posts on Tuesday.  Happy Labor Day!


When Reviewing and Producing Documents, Don’t Forget the “Mother and Child Reunion” – eDiscovery Best Practices

I love Paul Simon’s music.  One of my favorite songs of his is Mother and Child Reunion.  Of course, I’m such an eDiscovery nerd that every time I think of that song, I think of keeping email and attachment families together.  If you don’t remember the Mother and Child Reunion, you might provide an incomplete production to opposing counsel.

BTW, here’s a little-known fact: Paul Simon took the title of the song Mother and Child Reunion from the name of a chicken-and-egg dish he noticed on a Chinese restaurant’s menu.

Like the rest of us, attorneys don’t like to feel shortchanged, especially in discovery.  While there are exceptions, in most cases these days when an email or attachment is deemed to be responsive, the receiving party expects to also receive any “family” members of the responsive file.  Attorneys like to have the complete family when reviewing the production from the other side, even if some of the individual files aren’t responsive themselves.  Receiving an email without its corresponding attachments or receiving some, but not all, of the attachments to an email tends to raise suspicions.  Most attorneys don’t want to give opposing counsel a reason to be suspicious of their production, so parties typically agree to produce the entire email “family” in these cases.  Here’s a scenario:

The case involves a dispute over negotiated gas rate agreements between energy companies and Federal Energy Regulatory Commission (FERC) approval of those rate agreements.  A supervisor at the company verbally requests a copy of a key contract from one of his employees, along with the latest FERC filings, and the employee sends the contract and a FERC filing summary as attachments to an email with the subject “Requested files” and the body stating “Here you go…” (or something to that effect).  A search for “negotiated w/2 (gas or rate or agreement)” retrieves the contract attachment, but not the email (which doesn’t really have any pertinent information in it) or the FERC filing summary.  Only part of the email “family” is responsive to the search.

If it’s important to produce all communications between parties at the company regarding negotiated gas agreements, this communication could be missed – unless your review protocol includes capturing the family members of responsive files and your review software provides an option to view the family members of responsive files and include them in search results.  I emphasize “option” because there are still a few cases where parties agree to limit production to actual responsive files and not produce the families (though, in my recent experience, those cases are exceptions).

If your case isn’t one of the exceptions, make sure you have a well-thought-out protocol and robust software for including family members in your search results and in your document reviews for responsiveness, as well as automated and manual Quality Assurance (QA) and Quality Control (QC) checks to ensure your production contains complete family groups.
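Here’s a minimal sketch of that family-expansion concept in Python – purely illustrative, with a hypothetical data model (a family_id shared by a parent email and its attachments), not the API of any particular review tool:

```python
# Expand responsive search hits to complete email families.
# Hypothetical data model: each document dict carries a "family_id"
# shared by a parent email and all of its attachments.

def expand_to_families(hits, collection):
    """Return the responsive hits plus every member of their families."""
    responsive_families = {doc["family_id"] for doc in hits}
    return [doc for doc in collection if doc["family_id"] in responsive_families]

# The scenario above: only the contract attachment matches the search,
# but the expansion pulls in the transmittal email and the FERC summary.
collection = [
    {"id": 1, "family_id": "F1", "type": "email",      "text": "Here you go..."},
    {"id": 2, "family_id": "F1", "type": "attachment", "text": "negotiated gas rate agreement"},
    {"id": 3, "family_id": "F1", "type": "attachment", "text": "FERC filing summary"},
]
hits = [collection[1]]  # the search retrieved only the contract
print([d["id"] for d in expand_to_families(hits, collection)])  # [1, 2, 3]
```

A QC check can run the same grouping in reverse: flag any production where a family_id appears on some, but not all, of its members.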

So, what came first, the chicken or the egg?  It doesn’t matter, as long as the family group is intact.  🙂

So, what do you think?  How do you handle family groups in discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.


Court Rules in Dispute Between Parties Regarding ESI Protocol, Suggests Predictive Coding – eDiscovery Case Law

In a dispute over ESI protocols in FDIC v. Bowden, CV413-245 (S.D. Ga. June 6, 2014), Georgia Magistrate Judge G. R. Smith approved the ESI protocol from the FDIC and suggested the parties consider the use of predictive coding.

After FDIC-insured Darby Bank & Trust Co. failed in November 2010, the FDIC took over as receiver (as FDIC-R) and brought a bank mismanagement case against sixteen of Darby’s former directors and officers.  Thus far, the parties had been unable to agree on a Joint Protocol for Electronically Stored Information (ESI) and the dispute ultimately reached the court.  The FDIC-R had already spent $614,000 to digitally scan about “2.01 terabytes of data or 153.6 million pages” at the bank, but the defendants insisted that the FDIC-R shoulder the burden and expense of reviewing the documents and determining their responsiveness to the claims “[e]ven though the Bank’s documents were created under Defendants’ custody and control”.

The defendants also argued for a protocol that would require the FDIC-R to “repeatedly search, review, and re-review myriad ‘second-run’ (Phase II) documents, then turn over to them the documents relevant to both claims and defenses that arise in this litigation.”  The FDIC-R argued for a protocol in which it would produce “categories of documents most likely to contain relevant information” which the defendants could then search, claiming that protocol would be the more “correct allocation of discovery burdens between the parties.”  The defendants contended that “search terms alone won’t suffice” and that the FDIC-R’s proposed protocol does not relieve the receiver of its Rule 34 burden to “locate and produce responsive documents.”

After reviewing the two proposed protocols, Judge Smith ruled that “given the common ground between the dueling protocols here, the FDIC-R’s ESI protocol will be implemented, as modified by the FDIC-R’s ‘briefing concessions’…as well as by the additional guidance set forth in this Order.”  Those briefing concessions included “offering to open ‘all of the Bank’s former documents . . . [so defendants can retrieve them] to the same extent that the FDIC-R can’” and “offering, in ‘Phase II’ of the disclosure process, to ‘meet and confer with Defendants to reach agreement upon a set of reasonable search terms to run across the database of sources of the ESI to identify documents for production’”.  In approving the FDIC-R’s protocol, Judge Smith stated that “the FDIC-R may meaningfully deploy suitable search terms to satisfy its initial disclosure requirements and respond to forthcoming Rule 34 document requests”.

Also, referencing the Da Silva Moore decision of 2012, Judge Smith stated that “the parties shall consider the use of predictive coding” if ESI protocol disagreements persisted, noting that it “has emerged as a far more accurate means of producing responsive ESI in discovery”.

So, what do you think? Should organizations bear the bulk of the discovery burden in cases against individual defendants? Or should the burden be balanced between both parties?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will resume posts on Monday, July 7.  Happy Birthday America!


Court Rules that Unilateral Predictive Coding is Not Progressive – eDiscovery Case Law

In Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678-LRH-PAL (D. Nev. May 19, 2014), Nevada Magistrate Judge Peggy A. Leen determined that the plaintiff’s unannounced shift from the agreed-upon discovery methodology to a predictive coding methodology for privilege review was not cooperative.  Therefore, the plaintiff was ordered to produce documents that met agreed-upon search terms without conducting a privilege review first.

This declaratory relief action had been plagued by delays in discovery production, which led to the defendants filing a Motion to Compel the plaintiffs to produce discovery in a timely fashion. Following a hearing, both sides were ordered to meet and confer, and to hold meaningful discussions about resolving outstanding ESI issues pursuant to discovery. The plaintiff contended that the defendant’s discovery requests, as written, would require it to produce approximately 1.8 million documents, which would be unduly burdensome. Both parties agreed to search terms that would reduce the number of potentially responsive documents to around 565,000, which the plaintiff would manually review for privileged documents before producing discovery to the defendant.

Shortly thereafter, the plaintiff determined that manual review would be too expensive and time-consuming, and therefore, after consulting with a “nationally-recognized authority on eDiscovery,” elected to apply predictive coding to the identified 565,000 documents. The plaintiff selected a software program and began using it to identify relevant documents, with the intention of applying a further predictive coding layer to determine which documents were “more likely privileged” and which were “less likely privileged.”

However, the plaintiff did not consult with either the court or the requesting party regarding its intention to change review methodology. As a result, the defendant objected to the use of predictive coding in this case for several reasons, including the plaintiff’s lack of transparency surrounding its predictive coding methodology and its failure to cooperate, as well as the plaintiff’s failure to adhere to the best practices for the chosen software program that the authority it retained had recommended. Finally, the defendants cited a likelihood of satellite disputes revolving around discovery, should the plaintiff proceed with the current predictive coding, which would further delay production of discovery that had already been “stalled for many months.”

The defendant requested that either the plaintiff be required to proceed with predictive coding according to the defendant’s suggested protocol, which would include applying the predictive methodology to all of the originally collected 1.8 million documents, or that the plaintiff produce the non-privileged keyword hits without any review, but allowing them to be subject to a clawback order—which was a second option included in the originally stipulated ESI protocol that both parties had agreed to. Although this option would shift the burden of discovery to the defendant, it was noted that the defendant was “committed to devot[ing] the resources required to review the documents as expeditiously as possible” in order to allow discovery to move forward.

Judge Leen acknowledged potential support for the general methodology of predictive coding in eDiscovery, and stated that a “transparent mutually agreed upon” protocol for such a method would likely have been approved. However, Judge Leen took issue that the plaintiff had refused to “engage in the type of cooperation and transparency that its own eDiscovery consultant has so comprehensibly and persuasively explained is needed for a predictive coding protocol to be accepted by the court or opposing counsel” and instead had “elected and then abandoned the second option—to manually review and produce responsive ESI documents. It abandoned the option it selected unilaterally, without the [defendant’s] acquiescence or the court’s approval and modification of the parties’ stipulated ESI protocol.”

Therefore, Judge Leen elected to enforce the second option described in the agreed-upon ESI protocol, and required the plaintiff to produce all 565,000 documents that matched the stipulated search terms without review, with a clawback option in place for privileged documents as well as permission to apply privilege filters to the documents at issue, and withhold those documents that returned as “most likely privileged.”

So, what do you think? Should parties need to obtain approval regarding the review methodology that they plan to use?  Please share any comments you might have or if you’d like to know more about a particular topic.


If Your Documents Are Not Logical, Discovery Won’t Be Either – eDiscovery Best Practices

Scanning may no longer be cool, but it’s still necessary.  Electronic discovery still typically includes a paper component.  When it comes to paper, how documents are identified is critical to how useful they will be.  Here’s an example.

Your client collects hard copy documents from various custodians related to the case and organizes them into folders.  In one of the folders is a one page fax cover sheet attached to a two page letter, as well as an unrelated report and four different contracts, each 15-20 pages.  The entire folder is scanned as a single document, as either a TIFF or PDF file.

Only the letter is retrieved in a search as responsive to the case.  But, because it is contained within a “document” containing 70 to 80 other pages, you wind up reviewing 70 to 80 unrelated pages that you would not otherwise have to review.  It complicates production as well – how do you produce partial “documents”?  Also, if the non-responsive report and contracts have duplicates in the collection, you can’t effectively de-dupe those out of the review population because they’re combined into a single file.

It happens more often than you think.  It also can happen – sometimes quite often – with the scanned documents that the other side produces to you.  So, how do you get the documents into a more logical and usable organization?

Logical Document Determination (or LDD) is a process that some eDiscovery providers offer (including – shameless plug warning! – CloudNine Discovery).  It’s a process where each image page in a scanned document set is reviewed and the “logical document breaks” (i.e., each page that starts a new document) are identified.  Then, the documents are re-assembled, based on those logical document breaks.

Once the documents are logically organized, other processes – like Optical Character Recognition (OCR) and clustering (including near-duplicate identification) – can then be performed at the appropriate document level, and the smaller, more precise, unitized documents can be indexed for searching.  Instead of reviewing a 70 to 80 page “document” comprised of several logical documents, your search will retrieve the two page letter that is actually responsive, making your review and production processes more efficient.
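Here’s a minimal sketch of the re-assembly step in Python – illustrative only, with a hypothetical data model, not how any particular LDD provider actually works:

```python
# Re-assemble unitized documents from a scanned page sequence, given the
# logical document breaks identified during LDD review. Hypothetical
# inputs: ordered page numbers, plus the set of pages a reviewer flagged
# as the first page of a new document.

def unitize(pages, break_pages):
    """Split the page sequence into documents at each flagged break page."""
    documents, current = [], []
    for page in pages:
        if page in break_pages and current:
            documents.append(current)
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents

# A 10-page scan with breaks at pages 1, 4, 6 and 8 becomes four documents.
print(unitize(list(range(1, 11)), {1, 4, 6, 8}))
# [[1, 2, 3], [4, 5], [6, 7], [8, 9, 10]]
```

The break identification itself is the manual part; once the breaks exist, the re-assembly, OCR and indexing steps can be automated.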

LDD is typically priced per page reviewed for logical document breaks – prices can vary depending on the volume of pages to be reviewed and where the work is being performed (there are providers in the US and overseas).  While it’s a manual process, it’s well worth it if your collection of imaged documents is poorly unitized.

So, what do you think? Have you ever received a collection of poorly organized image files? If so, did you use Logical Document Determination to organize them properly?  Please share any comments you might have or if you’d like to know more about a particular topic.


Predictive Analytics: It’s Not Just for Review Anymore – eDiscovery Trends

One of the most frequently discussed trends in this year’s annual thought leader interviews that we conducted was the application of analytics (including predictive analytics) to Information Governance.  A recent report published in the Richmond Journal of Law & Technology addresses how analytics can be used to optimize Information Governance.

Written by Bennett B. Borden & Jason R. Baron (who was one of our thought leaders discussing that very topic), Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice, 20 RICH. J.L. & TECH. 7 (2014) is written for those who are not necessarily experts in the field.  It provides a synopsis of why and how predictive coding first emerged in eDiscovery and defines important terms related to the topic, then discusses aspects of an information governance program where application of predictive coding and related analytical techniques is most useful. Most notably, the authors provide a few “early” examples of the use of advanced analytics, like predictive coding, for non-litigation contexts to illustrate the possibilities for applying the technology.  Here is a high-level breakdown of the report:

Introduction (pages 1-3): Provides a high-level introduction of the topics to be discussed.

A. The Path to Da Silva Moore (pages 3-14): Provides important background to the evolution of managing electronically stored information (ESI) and predictive coding (fittingly, it begins with the words “In the beginning”).  Starting on page 9, the authors discuss “The Da Silva Moore Precedent”, providing a detailed account of the Da Silva Moore case (our post here summarizes our coverage of the case) and also references other cases, as well: In re Actos (Pioglitazone) Products Liability Litigation, Global Aerospace Inc., et al, v. Landow Aviation, L.P., Kleen Products v. Packaging Corp. of America, EORHB, Inc. v. HOA Holdings and In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation.  Clearly, the past couple of years have provided several precedents for the use of predictive coding in litigation.

B. Information Governance and Analytics in the Era of Big Data (pages 15-20): This section provides definitions and important context for terms such as “big data”, “analytics” and “Information Governance”.  It’s important to have the background on these concepts before launching into how analytics can be applied to optimize Information Governance.

C. Applying the Lessons of E-Discovery In Using Analytics for Optimal Information Governance: Some Examples (pages 21-31): With the background of sections A and B under your belt, the heart of the report then gets into the actual application of analytics in different scenarios, using “True Life Examples” that are “’ripped from’ the pages of the author’s legal experience, without embellishment”.  These examples where analytics are used include:

  • A corporate client is being sued by a former employee in a whistleblower qui tam action;
  • A highly regulated manufacturing client decided to outsource the safety testing of some of its products, and a director of the department being outsourced, despite being offered a generous severance package, demanded four times the severance amount and threatened to go to the company’s regulator with a list of ten supposed major violations (described in an email) if he did not receive what he was asking for;
  • A major company received a whistleblower letter from a reputable third party alleging that several senior personnel were involved in an elaborate kickback scheme that also involved FCPA violations;
  • An acquisition agreement between parties contained a provision such that, if the disclosures made by the target were found to be off by a certain margin within thirty days of the acquisition, the purchase price would be adjusted.

In each case, the use of analytics either resulted in a quick settlement, proved the alleged violations to be unfounded, or resulted in an appropriate adjustment in the purchase price of the acquired company.  These real world examples truly illustrate how analytics can be applied beyond the document review stage of eDiscovery.

Conclusion (pages 31-32): While noting that the authors’ intent was to “merely scratch the surface” of the topic, they offer some predictions for the end of the decade and note “expected demand on the part of corporate clients for lawyers to be familiar with state of the art practices in the information governance space”.  In other words, your clients are going to expect you to understand this.

The report is an easy read, even for novices to the technology, and is a must-read for anyone looking to understand more about applying analytics to Information Governance.  Bennett and Jason are both with Drinker Biddle & Reath LLP and are also co-chairs of the Information Governance Initiative (here is our recent blog post about IGI).

So, what do you think? Has your organization applied analytics to big data to reduce or eliminate litigation costs? Please share any comments you might have or if you’d like to know more about a particular topic.


Searching for Individuals Isn’t as Straightforward as You Think – eDiscovery Best Practices

I’ve recently worked with a couple of clients who proposed search terms for key individuals that were a bit limited, so I thought this was an appropriate topic to revisit.

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names could be represented, and if you don’t account for each of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

Routinely, when clients give me their preliminary search term lists to review, they include names of individuals that they want to search for as exact phrases, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer to use a proximity search for individual names; it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches – Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely they’re responsive to the name search.
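If you’re curious what a proximity match is actually doing under the hood, here’s a minimal sketch in Python – a conceptual illustration, not how OnDemand or any other review tool implements it:

```python
# A simple proximity match: do two terms appear within n words of each
# other, in either order? Conceptual sketch only; real search engines use
# positional indexes rather than scanning tokens like this.

def within(tokens, term1, term2, n):
    """True if term1 and term2 occur within n word positions of each other."""
    pos1 = [i for i, t in enumerate(tokens) if t == term1]
    pos2 = [i for i, t in enumerate(tokens) if t == term2]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)

doc = "please forward the letter to smith , jim t .".split()
print(within(doc, "jim", "smith", 3))    # True: "Smith, Jim" qualifies
print(within(doc, "doug", "austin", 3))  # False: neither term appears
```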

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is also called “Jim” by some of his other friends and “Jimmy” by a few of his other friends.  Also, some documents may refer to him by his initials – i.e., “J.T. Smith”.  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If so, it may take performing a search of just the last name and sampling several documents until you are able to determine that middle initial for searching (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).

Applying the proximity and name variation concepts into our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 Smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in OnDemand®, CloudNine Discovery’s online review tool.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is often best, though you should always test your terms (and variations of those terms) to maximize the balance between recall and precision.
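Putting the variant list and the proximity idea together, here’s how that “Jim Smith” search might look in code – again a hedged sketch that reuses the hypothetical within() helper from the earlier example, not any tool’s actual syntax:

```python
# OR the explicit name variants together, then require each candidate to
# pass the proximity test against the last name - the code equivalent of
# (jim OR jimmy OR james OR "j.t.") w/3 smith. Assumes the within()
# helper defined in the earlier proximity sketch.

def name_search(tokens, variants, last_name, n=3):
    """True if any name variant appears within n words of the last name."""
    return any(within(tokens, v, last_name, n) for v in variants)

doc = "joe smith jimmied the lock on j.t. smith 's office".split()
print(name_search(doc, ["jim", "jimmy", "james", "j.t."], "smith"))  # True, via "j.t."
```

Note that “jimmied” doesn’t trigger a hit, because whole tokens are matched rather than the wildcard “jim*” – exactly the precision the explicit variant list buys you.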

Of course, there’s another way to retrieve documents that mention key individuals – through their email addresses.  We’ll touch on that topic next week.

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.
