
Why Is TAR Like a Bag of M&M’s?, Part Two: eDiscovery Best Practices

Editor’s Note: Tom O’Connor is a nationally known consultant, speaker, and writer in the field of computerized litigation support systems.  He has also been a great addition to our webinar program, participating with me on several recent webinars.  Tom has also written several terrific informational overview series for CloudNine, including eDiscovery and the GDPR: Ready or Not, Here it Comes (which we covered as a webcast), Understanding eDiscovery in Criminal Cases (which we also covered as a webcast) and ALSP – Not Just Your Daddy’s LPO.  Now, Tom has written another terrific overview regarding Technology Assisted Review titled Why Is TAR Like a Bag of M&M’s? that we’re happy to share on the eDiscovery Daily blog.  Enjoy! – Doug

Tom’s overview is split into four parts, so we’ll cover each part separately.  The first part was covered on Tuesday.  Here’s part two.

History and Evolution of Defining TAR

Most people would begin the discussion by agreeing with this framing statement made by Maura Grossman and Gordon Cormack in their seminal article, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011):

Overall, the myth that exhaustive manual review is the most effective—and therefore, the most defensible—approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.

A technology-assisted review process may involve, in whole or in part, the use of one or more approaches including, but not limited to, keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, and sampling.

So, TAR began as a process and in the early stage of the discussion, it was common to refer to various TAR tools under the heading “analytics” as illustrated by the graphic below from Relativity.

Copyright © Relativity

That general heading was often divided into two main categories:

Structured Analytics

  • Email threading
  • Near duplicate detection
  • Language detection

Conceptual Analytics

  • Keyword expansion
  • Conceptual clustering
  • Categorization
  • Predictive Coding

That definition of Predictive Coding as part of the TAR process held for quite some time. In fact, the current EDRM definition of Predictive Coding still refers to it as:

An industry-specific term generally used to describe a Technology-Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on a Subject Matter Expert’s Coding of a Training Set of Documents
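
To make the EDRM definition above concrete, here is a deliberately tiny sketch of the mechanics it describes: a machine learning algorithm (a naive Bayes text classifier here, one common choice, though any given platform may use something quite different) trained on a Subject Matter Expert’s coding decisions, then used to distinguish relevant from non-relevant documents. The documents and coding decisions are invented for illustration.

```python
import math
from collections import Counter

def train(coded_docs):
    """Tally word counts per label from an SME's coding of a training set."""
    counts = {"relevant": Counter(), "not": Counter()}
    for words, label in coded_docs:
        counts[label].update(words)
    return counts

def classify(counts, words):
    """Score a new document under each label with add-one-smoothed
    naive Bayes and return the more likely label."""
    vocab = set(counts["relevant"]) | set(counts["not"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(
            math.log((c[w] + 1) / (total + len(vocab))) for w in words
        )
    return max(scores, key=scores.get)

# Hypothetical SME-coded training set: (tokenized document, coding decision).
training = [
    (["price", "fixing", "meeting"], "relevant"),
    (["price", "agreement", "supply"], "relevant"),
    (["fantasy", "football", "picks"], "not"),
    (["lunch", "friday", "menu"], "not"),
]

model = train(training)
print(classify(model, ["price", "fixing", "supply"]))  # classifies as "relevant"
```

Real predictive coding engines add feature weighting, iterative training rounds, and relevance ranking on top of this, but the core loop (the SME codes examples, the algorithm generalizes) is the same.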

But before long, the definition began to erode and TAR started to become synonymous with Predictive Coding. Why?  For several reasons I believe.

  1. The Grossman-Cormack glossary of 2013 used the phrase “Predictive Coding” to define both TAR and PC, and I think various parties then conflated the two. (See No. 2 below)

  2. Continued use of the terms interchangeably. See, e.g., Ralph Losey’s “TARCourse” (which is, by the way, an excellent read), where the very beginning of the first chapter states, “We also added a new class on the historical background of the development of predictive coding.”
  3. Any discussion of TAR involves selecting documents using algorithms, and most attorneys react to math the way the Wicked Witch of the West reacted to water.

Again, Ralph Losey provides a good example.  (I’m not trying to pick on Ralph; he is just such a prolific writer that his examples are everywhere…and deservedly so.) He refers to gain curves, x-axis vs. y-axis, Horvitz-Thompson estimators, recall rates, prevalence ranges and my personal favorite, “word-based tf-idf tokenization strategy.”

“Danger. Danger. Warning. Will Robinson.”
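
For readers intimidated by that jargon, the math behind a term like “tf-idf” is more approachable than it sounds: it simply weights how often a word appears in one document against how rare the word is across the whole collection. A minimal sketch in plain Python, with invented toy documents (no review platform required):

```python
import math

def tf_idf(term, doc, corpus):
    """Term frequency (how often the term appears in this document)
    times inverse document frequency (how rare it is in the corpus).
    Assumes the term appears in at least one corpus document."""
    tf = doc.count(term) / len(doc)
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

# Three toy "documents," already tokenized into words.
corpus = [
    ["merger", "agreement", "draft"],
    ["lunch", "menu", "draft"],
    ["merger", "timeline", "risks"],
]

# "merger" scores above zero because it is absent from one document;
# a word appearing in every document would score exactly 0.
print(round(tf_idf("merger", corpus[0], corpus), 4))
```

A recall rate, similarly, is just the fraction of truly responsive documents the process actually found. None of this requires the reviewing attorney to do the math by hand; it is what the software is doing under the hood.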

  4. Marketing: the simple fact is that some vendors sell predictive coding tools. Why talk about other TAR tools when you don’t make them? Easier to call your tool TAR and leave it at that.

The problem became so acute that by 2015, according to a 2016 ACEDS News article, Maura Grossman and Gordon Cormack had trademarked the terms “Continuous Active Learning” and “CAL”, claiming those terms’ first commercial use on April 11, 2013 and January 15, 2014, respectively. In an ACEDS interview earlier in the year, Maura stated that “The primary purpose of our patents is defensive; that is, if we don’t patent our work, someone else will, and that could inhibit us from being able to use it. Similarly, if we don’t protect the marks ‘Continuous Active Learning’ and ‘CAL’ from being diluted or misused, they may go the same route as technology-assisted review and TAR.”

So then, what exactly is TAR? Everyone agrees that manual review is inefficient, but nobody can agree on what software the lawyers should use and how. I still prefer to go back to Maura and Gordon’s original definition. We’re talking about a process, not a product.

TAR isn’t a piece of software. It’s a process that can include many different steps, several pieces of software, and many decisions by the litigation team. Ralph calls it the multi-modal approach: a combination of people and computers to get the best result.

In short, analytics are the individual tools. TAR is the process you use to combine the tools you select.  The next consideration, then, is how to make that selection.

We’ll publish Part 3 – Uses for TAR and When to Use or Not Use It – next Tuesday.

So, what do you think?  How would you define TAR?  And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © Mars, Incorporated and its Affiliates.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Why Is TAR Like a Bag of M&M’s?: eDiscovery Best Practices

Tom’s overview is split into four parts, so we’ll cover each part separately.  Here’s the first part.

Introduction

Over the past year I have asked this question several different ways in blogs and webinars about technology assisted review (TAR). Why is TAR like ice cream? Think Baskin-Robbins. Why is TAR like golf? Think an almost incomprehensible set of rules and explanations. Why is TAR like baseball, basketball or football? Think never-ending arguments about the best team ever.

And now my latest analogy: why is TAR like a bag of M&M’s?  Because there are multiple colors, with a new one thrown in every so often, and sometimes they have peanuts inside and sometimes chocolate.  And every now and then you get a bag of Reese’s Pieces and think to yourself, “hmmmm, this is actually better than M&M’s.”

Two recent cases spurred this new rumination on TAR. First came the decision in Winfield, et al. v. City of New York, No. 15-CV-05236 (LTS) (KHP) (S.D.N.Y. Nov. 27, 2017) (covered by eDiscovery Daily here), where Magistrate Judge Parker ordered the parties to meet and confer on any disputes with regards to a TAR process “with the understanding that reasonableness and proportionality, not perfection and scorched-earth, must be their guiding principles.”  More recently is the wonderfully crafted validation protocol (covered by ACEDS here) from Special Master Maura Grossman in the In Re Broiler Chicken Antitrust Litigation, (Jan. 3, 2018) matter currently pending in the Northern District of Illinois.

Both of these cases harkened back to Aurora Cooperative Elevator Company v. Aventine Renewable Energy and Independent Living Center of Southern California v. City of Los Angeles, 2015 cases where the courts ordered the use of predictive coding after extensive discovery squabbles, and to the 2016 decision in Hyles v. New York City (covered by eDiscovery Daily here), where Judge Peck, in declining to order the parties to use TAR, used the phrase on page 1 of his Order, “TAR (technology assisted review, aka predictive coding) … “.

Which brings me to my main point of discussion. Before we can decide on whether or not to use TAR we have to decide what TAR is.  This discussion will focus on the following topics:

  1. History and Evolution of Defining TAR
  2. Uses for TAR and When to Use or Not Use It
  3. Justification for Using TAR
  4. Conclusions

We’ll publish Part 2 – History and Evolution of Defining TAR – on Thursday.

So, what do you think?  How would you define TAR?  And, as always, please share any comments you might have or if you’d like to know more about a particular topic.


Houstonians, Here’s a Terrific Panel Discussion on TAR Right in Your Own Backyard: eDiscovery Best Practices

Next month, I have the privilege of moderating a panel on the current state of the acceptance of technology assisted review (TAR) with three terrific panelists, courtesy of the Association of Certified E-Discovery Specialists (ACEDS).  If you’re in Houston on April 3rd, you might want to check it out!

The panel is titled From Asking About It to Asking For It: The Evolution of the Acceptance and Use of TAR and it will be held at the offices of BoyarMiller law firm at 2925 Richmond Avenue, Houston, Texas  77098 (their offices are on the 14th floor).  The event will begin at 11:30am and will conclude at 1:30pm.  Lunch will be served!

Our panelists will be Christopher Cafiero, J.D., Southwest Territory Manager of Catalyst Repository Systems (and former trial lawyer), Gary Wiener, Independent eDiscovery Consultant, SME and Attorney and Rohit Kelkar, Vice President of R&D at Servient.  We will discuss several topics related to the current state of TAR, including the state of approval of TAR within the legal community, differences in approaches and preferred methods to TAR, disclosure of the use of TAR to opposing parties, and recommendations for those using TAR for the first time.

If you’re in Houston and you’d like to attend, register by clicking here.  Honestly, I don’t know how many people will be able to attend, so I recommend that you register early (but not often) to make sure you can get in.  If you want to learn about TAR in the Houston area, this is an excellent opportunity!

So, what do you think?  Are you interested in learning about TAR and are you going to be in the Houston area on April 3rd?  If so, we’d love to see you there!  And, as always, please share any comments you might have or if you’d like to know more about a particular topic.


Court Denies Plaintiffs’ Request to Email All Defendant Employees as “Simply Unreasonable”: eDiscovery Case Law

In Firefighters’ Ret. Sys., et al. v. Citco Grp. Ltd., et al., No. 13-373-SDD-EWD (M.D. La. Jan. 3, 2018), Louisiana Magistrate Judge Erin Wilder-Doomes denied the plaintiffs’ renewed motion to compel after the parties previously agreed upon search terms and document custodians, stating that the plaintiffs’ request to “email everyone in every Citco entity to ask whether anyone employed by any Citco entity has knowledge relevant to this litigation, and thereafter require the Citco Defendants to conduct additional electronic and hard copy searches for documents” was “simply unreasonable” and would be “unduly burdensome”.

Case Background

In this case regarding claims of unjust enrichment and breach of contract (among others) regarding fund shares purchased for $100 million that ultimately turned out to be worthless, the plaintiffs previously filed a Motion to Compel seeking an order compelling Citco Group to respond to multiple interrogatories and requests for production based upon the knowledge of entities controlled by Citco Group and/or possession of documents by entities controlled by the Citco Group.  In response, the Citco Defendants argued that granting Plaintiffs’ motion would ignore the substantial discovery efforts already made in the case (as the parties had previously agreed to a scope of 56 search terms to be applied against 21 custodians) and would be incompatible with the proportionality requirement of the federal rules.

The Initial Motion to Compel was discussed during an October 2017 status conference with the parties, and the court found that Plaintiffs’ concerns should be addressed with a 30(b)(6) deposition of defendant’s corporate counsel to describe the process for locating responsive documents (and denied the Initial Motion without prejudice to re-urging following the corporate deposition).  After the 30(b)(6) deposition, the plaintiffs filed a Renewed Motion to Compel, contending that the defendants’ responses to these interrogatories “were incomplete and inaccurate” and resulted in “a flawed list of custodians” and a “flawed electronic search for documents.”  They also contended that the defendants’ 30(b)(6) deponent confirmed that “one email can be sent to everyone in the Citco organization and ask them limited questions about their personal knowledge of the issues in this lawsuit”.  The defendants objected, asserting that any additional searches (beyond the previously agreed scope) based on an e-mail questionnaire to all employees “would be disproportional to the needs of this case”.  A December 2017 status conference failed to resolve the dispute.

Judge’s Ruling

Judge Wilder-Doomes reiterated that “Based on the parties’ correspondence, the parties agreed upon 56 search terms and…21 document custodians”.  She also observed that “Plaintiffs still have not explained why the custodians and search terms used were unreasonable. Moreover, although the Citco Defendants have been willing to add additional search terms during the course of this litigation, and note in opposition to the Renewed Motion to Compel that they are ‘prepared to discuss with Plaintiffs additional document custodians (if Plaintiffs identify any),’ Plaintiffs failed to identify proposed additional custodians in either their Renewed Motion to Compel or during the December 12, 2017 status conference.”

In denying the Renewed Motion to Compel, Judge Wilder-Doomes stated: “Instead, Plaintiffs seek permission from this court to email everyone in every Citco entity to ask whether anyone employed by any Citco entity has knowledge relevant to this litigation, and thereafter require the Citco Defendants to conduct additional electronic and hard copy searches for documents. That is simply unreasonable, and in essence is a request for the Citco Defendants to ‘go back to square one’ of their document production efforts despite the parties’ agreement regarding custodians and search terms, the Citco Defendants apparent willingness to consider additional custodians and search terms, and Plaintiffs failure to identify or explain the necessity of any additional custodians or search terms. Further, such a large scale search raises proportionality concerns and, especially in light of the parties’ previous agreements and efforts, would be unduly burdensome.”

So, what do you think?  Was this request a “fishing expedition” by the plaintiffs?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.


Court Orders Plaintiff to Reproduce ESI and Produce Search Term List As Agreed: eDiscovery Case Law

In Youngevity Int’l Corp., et al. v. Smith, et al., No: 16-cv-00704-BTM (JLB) (S.D. Cal. Dec. 21, 2017), California Magistrate Judge Jill L. Burkhardt granted the defendants’ motion to compel proper productions from the plaintiffs and ordered the plaintiffs to either provide their search hit list to the defendants, meet and confer on the results, and screen the results for responsiveness and privilege OR produce 700,000 additional responsive documents and pay for the defendants to conduct Technology Assisted Review (TAR) on the results.  Judge Burkhardt also ordered the plaintiffs to designate “only qualifying documents” as confidential or Attorney’s Eyes Only (AEO) and to pay for the reasonable expenses, including attorney’s fees, of bringing the motion.

Case Background

In this case regarding alleged unlawful competition filed by the plaintiffs against Wakaya (the defendants’ company, formed by former distributors of the plaintiffs’ company), the defendants proposed during discovery in May 2017 a three-step process by which: “(i) each side proposes a list of search terms for their own documents; (ii) each side offers any supplemental terms to be added to the other side’s proposed list; and (iii) each side may review the total number of results generated by each term in the supplemented lists (i.e., a ‘hit list’ from our third-party vendors) and request that the other side omit any terms appearing to generate a disproportionate number of results.”  Six days later, the plaintiffs stated that “[w]e are amenable to the three step process described in your May 9 e-mail” and the parties exchanged lists of proposed search terms to be run on their own ESI and their opponent’s ESI.

While the defendants provided the plaintiffs with a hit list of the total number of results generated by running each term in the expanded search term list across their ESI, the plaintiffs never produced their hit list.  The plaintiffs also made two large productions of approximately 1.9 million pages and 2.3 million pages and, without reviewing them beforehand, mass designated them all as confidential and/or AEO.  The produced ESI contained numerous non-responsive documents and the parties attempted without success to meet and confer (even with Court assistance) on reducing the number of documents classified as AEO.  The plaintiffs also notified the defendants (around the beginning of October 2017) that they had inadvertently failed to produce an additional 700,000 documents due to a technical error by their discovery vendor.
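
The “hit list” at the center of that three-step protocol is conceptually simple: for each proposed search term, how many documents does it hit? A minimal sketch (the documents and terms are invented, and a real matter would use the review platform’s indexed search rather than brute-force string matching):

```python
def hit_list(documents, terms):
    """Count how many documents each search term appears in, so both
    sides can flag terms that generate a disproportionate number of
    results before anyone commits to reviewing them."""
    return {
        term: sum(1 for doc in documents if term.lower() in doc.lower())
        for term in terms
    }

# Toy "ESI": each string stands in for one document's extracted text.
documents = [
    "Quarterly sales report for the distributor network",
    "Re: distributor agreement and compensation plan",
    "Lunch order for Friday",
]

terms = ["distributor", "compensation", "lunch"]
for term, count in sorted(hit_list(documents, terms).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {count} of {len(documents)} documents")
```

As the opinion goes on to note, a hit on a search term is not the same thing as responsiveness; the hit list only tells the parties where review effort will be spent.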

As a result of all of the issues associated with the plaintiffs’ production, the defendants sought an order under FRCP 26(g) or Rule 37 requiring the plaintiffs to remedy its improper production and pay the costs incurred by the defendants as a result of this motion and the costs associated with reviewing the plaintiffs’ prior productions.

Judge’s Ruling

While considering the defendants’ assertions that the plaintiffs “impermissibly certified its discovery responses because its productions amounted to a ‘document dump’ intended to cause unnecessary delay and needlessly increase the cost of litigation”, Judge Burkhardt determined that “Wakaya fails to establish that Youngevity violated Rule 26(g)”, “declin[ing] to find that Youngevity improperly certified its discovery responses when the record before it does not indicate the content of Youngevity’s written responses, its certification, or a declaration stating that Youngevity in fact certified its responses.”

However, Judge Burkhardt stated that “the record indicates that Youngevity did not produce documents following the protocol to which the parties agreed”, noting that “Youngevity failed to produce its hit list to Wakaya, and instead produced every document that hit upon any proposed search term” and that “the parties negotiated a stipulated protective order, which provides that only the ‘most sensitive’ information should be designated as AEO”.  She also stated that “Youngevity conflates a hit on the parties’ proposed search terms with responsiveness…The two are not synonymous…Search terms are an important tool parties may use to identify potentially responsive documents in cases involving substantial amounts of ESI. Search terms do not, however, replace a party’s requests for production.”

As a result, Judge Burkhardt gave the plaintiffs two options for correcting their discovery productions with specific deadlines:

“1) By December 26, 2017, provide its hit list to Defendant; by January 5, 2018, conclude the meet and confer process as to mutually acceptable search terms based upon the hit list results; by January 12, 2018, run the agreed upon search terms across Plaintiff’s data; by February 15, 2018, screen the resulting documents for responsiveness and privilege; and by February 16, 2018, produce responsive, non-privileged documents with only appropriate designations of “confidential” and “AEO” (said production to include that subset of the not-previously-produced 700,000 documents that are responsive and non-privileged); or

2) By December 26, 2017, provide the not-previously-produced 700,000 documents to Defendant without further review; pay the reasonable costs for Defendant to conduct a TAR of the 700,000 documents and the July 21, 2017 and August 22, 2017 productions for responsiveness; by January 24, 2018, designate only those qualifying documents as “confidential” or “AEO”; by that date, any documents not designated in compliance with this Order will be deemed de-designated.”

Judge Burkhardt also ordered the plaintiffs to pay for the reasonable expenses, including attorney’s fees for bringing the motion and for the expenses incurred by the defendants “as a result of Youngevity’s failure to abide by the Stipulated Protective Order.”

So, what do you think?  Did the plaintiffs abuse the process?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.


Court Disagrees with Plaintiff’s Contentions that Defendant’s TAR Process is Defective: eDiscovery Case Law

In Winfield, et al. v. City of New York, No. 15-CV-05236 (LTS) (KHP) (S.D.N.Y. Nov. 27, 2017), New York Magistrate Judge Katharine H. Parker, after conducting an in camera review of the defendant’s TAR process and a sample set of documents, granted in part and denied in part the plaintiffs’ motion, ordering the defendant to provide copies of specific documents where the parties disagreed on their responsiveness and a random sample of 300 additional documents deemed non-responsive by the defendant.  Judge Parker denied the plaintiffs’ request for information about the defendant’s TAR process, finding no evidence of gross negligence or unreasonableness in its process.

Case Background

In this dispute over alleged discrimination in the City’s affordable housing program, the parties had numerous disputes over the handling of discovery by the defendant in the case.  The plaintiffs lodged numerous complaints about the pace of discovery and document review, which initially involved only manual linear review of documents, so the Court directed the defendant to complete linear review as to certain custodians and begin using Technology Assisted Review (“TAR”) software for the rest of the collection.  After a dispute over the search terms selected for use, the plaintiffs proposed over 800 additional search terms to be run on certain custodians, most of which (after negotiation) were accepted by the defendant (despite a stated additional cost of $248,000 to review the documents).

The defendant proposed to use its TAR software for this review, but the plaintiffs objected, contending that the defendant had over-designated documents as privileged and non-responsive, using an “impermissibly narrow view of responsiveness” during its review process.  To support their contention, the plaintiffs provided certain documents to the Court that the defendant had produced inadvertently (including 5 inadvertently produced slip sheets of documents not produced), which they contended should have been marked responsive and relevant.  As a result, the Court required the defendant to submit a letter for in camera review describing its predictive coding process and training for document reviewers.  The Court also required the defendant to provide a privilege log for a sample set of 80 documents that it designated as privileged in its initial review.  Out of those 80 documents, the defendant maintained its original privilege assertions over only 20 documents, finding 36 of them non-privileged and producing them as responsive and another 15 of them as non-responsive.

As a result, the plaintiffs filed a motion requesting random samples of several categories of documents and also sought information about the TAR ranking system used by the defendant and all materials submitted by the defendant for the Court’s in camera review relating to predictive coding.

Judge’s Ruling

Judge Parker noted that both parties did “misconstrue the Court’s rulings during the February 16, 2017 conference” and ordered the defendant to “expand its search for documents responsive to Plaintiffs’ document requests as it construed this Court’s prior ruling too narrowly”, indicating that the plaintiffs should meet and confer with the defendant after reviewing the additional production if they “believe that the City impermissibly withheld documents responsive to specific requests”.

As for the plaintiffs’ challenges to the defendant’s TAR process, Judge Parker referenced Hyles v. New York City, where Judge Andrew Peck, referencing Sedona Principle 6, stated the producing party is in the best position to “evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”  Judge Parker also noted that “[c]ourts are split as to the degree of transparency required by the producing party as to its predictive coding process”, citing cases that considered seed sets as work product and other cases that supported transparency of seed sets.  Relying on her in camera review of the materials provided by the defendant, Judge Parker concluded “that the City appropriately trained and utilized its TAR system”, noting that the defendant’s seed set “included over 7,200 documents that were reviewed by the City’s document review team and marked as responsive or non-responsive in order to train the system” and that “the City provided detailed training to its document review team as to the issues in the case.”

As a result, Judge Parker ordered the defendant “to produce the five ‘slip-sheeted’ documents and the 15 NR {non-responsive documents reclassified from privileged} Documents”, “to provide to Plaintiffs a sample of 300 non-privileged documents in total from the HPD custodians and the Mayor’s Office” and to “provide Plaintiffs with a random sample of 100 non-privileged, non-responsive documents in total from the DCP/Banks review population” (after applying the plaintiffs’ search terms and utilizing TAR on that collection).  Judge Parker ordered the parties to meet and confer on any disputes “with the understanding that reasonableness and proportionality, not perfection and scorched-earth, must be their guiding principles.”  Judge Parker denied the plaintiffs’ request for information about the defendant’s TAR process (but “encouraged” the defendant to share information with the plaintiffs) and denied the plaintiffs’ request to the defendant’s in camera submissions as being protected by the work product privilege.
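
Random samples like the ones ordered here are typically used to estimate “elusion,” the rate at which responsive documents are hiding in the pile marked non-responsive. The opinion does not spell out a formula, so the arithmetic below is a generic illustration (a simple proportion with a normal-approximation confidence interval), not the method actually used in Winfield:

```python
import math

def elusion_estimate(sample_size, responsive_found, z=1.96):
    """Point estimate and approximate 95% confidence interval for the
    elusion rate, from a random sample of the discard pile."""
    p = responsive_found / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# If 6 documents in a 300-document sample of the "non-responsive" pile
# turn out to be responsive, the point estimate of elusion is 2%.
p, low, high = elusion_estimate(300, 6)
print(f"elusion ~ {p:.1%} (95% CI roughly {low:.1%} to {high:.1%})")
```

A low elusion rate supports the reasonableness of the process; a high one suggests the review or the TAR training missed something.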

So, what do you think?  Should TAR ranking systems and seed sets be considered work product or should they be transparent?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Addressing the Inconsistent Email Address: eDiscovery Best Practices

I recently had a client who was trying to search a fairly sizable archive in CloudNine (about 2.75 TB comprised of several million documents) and searching for emails to and from a given custodian.  That search proved a little more challenging than expected due to a legacy Microsoft Exchange attribute.  Let’s take a look at that scenario, substituting a generic email address.

If you have John Dough, who is an employee at Acme Parts, his email address might look like this: jdough@acmeparts.com.  And, for many emails that he sends to others, that’s how his email address might be represented.  However, it could also be represented this way, especially in his Sent Items folder in Exchange:

/O=ACME PARTS/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=jdough

Why does it look like that and not like the “normal” email address that ends in “acmeparts.com”?  Because it’s a different type of address.

The first example – jdough@acmeparts.com – is an SMTP address.  This is the email address you commonly use and refer to when providing others your email address.  It’s probably even on your business card.

The second example – /O=ACME PARTS/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=jdough – is the Exchange x500 address – it’s the internal Exchange address for your account.  So, why does that address exist?

It’s because, when Microsoft changed the way servers were managed in Exchange 2007, it retained a single administrative group for backwards compatibility and stored details of Exchange 2007 servers there.  The legacyExchangeDN property of the mailbox in Active Directory stores this information and, depending on the setup and version of the Exchange server when the emails are pulled from it, that value can appear as the address shown on some emails (especially those received from internal parties).  I still see it pop up occasionally in some of the email collections that we encounter.
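When you do need to tie an x500 address back to a custodian, the mailbox alias is the final “CN=” component.  Here’s a minimal Python sketch of pulling it out (the address is the hypothetical example above, not a real one):

```python
import re

# Hypothetical Exchange x500 (legacyExchangeDN) address from the example above
X500 = ("/O=ACME PARTS/OU=EXCHANGE ADMINISTRATIVE GROUP "
        "(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=jdough")

def x500_alias(address):
    """Return the final CN= component (the mailbox alias), or None."""
    match = re.search(r"/CN=([^/]+)$", address, re.IGNORECASE)
    return match.group(1) if match else None

print(x500_alias(X500))  # jdough
```

An SMTP address like jdough@acmeparts.com has no “CN=” component, so the same function returns None for it, which also makes this a quick way to tell the two address types apart.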

Fun fact for you: The value “FYDIBOHF23SPDLT” after “Exchange Administrative Group” is actually an encoded version of the string “EXCHANGE12ROCKS”, with each character replaced by the one that follows it (E->F, X->Y, 1->2, etc.).
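That substitution is just a one-character shift, which a couple of lines of Python can undo:

```python
def decode(s):
    # Each character was encoded by advancing it one position
    # (E -> F, 1 -> 2, etc.), so decoding shifts each back by one.
    return "".join(chr(ord(c) - 1) for c in s)

print(decode("FYDIBOHF23SPDLT"))  # EXCHANGE12ROCKS
```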

So, what does that mean to you?  It can mean a more challenging effort to locate all of the emails for a given custodian or key party.

To address the situation, I generally like to perform a search for “exchange administrative group” or “FYDIBOHF23SPDLT” in the email participant fields (i.e., To, From, Cc, Bcc).  If I don’t get any hits, then I don’t have any Exchange x500 addresses and there are no worries.

If I do get hits, then I have to account for these email addresses.  Both the SMTP and Exchange x500 addresses have at least one thing in common – the custodian name.  Typically, that’s first initial and last name, but there are variations, as some organizations (if they’re small enough) use just the first or last name for email addresses.  And, if you have two people with the same first initial and last name, you have to distinguish them, so the address could include a middle initial (e.g., jtsmith) or a number (e.g., jtsmith02).

In its Search form, CloudNine performs an autocomplete of a string typed in for a field, identifying any value for the field that contains that string.  So, an autocomplete for “jdough” in the To, From, Cc or Bcc fields would retrieve both examples at the top of this post if they were present – and also any personal email addresses if he used his first initial and last name on those too.  If it seems apparent that all “jdough” entries are associated with the custodian you’re looking for, then the search can be as simple as “contains jdough” (e.g., From contains jdough to get all variations in the From field).  If it looks like you have email addresses for somebody else, then you may have to search for the specific addresses.  Either way, you can use that technique to ensure retrieval of all of John Dough’s email address variations.
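Outside of any particular review platform, the “contains” approach described above boils down to a case-insensitive substring test, which matches the custodian’s alias in both the SMTP and the x500 form.  A rough sketch, with made-up addresses:

```python
# Hypothetical participant-field values, mixing SMTP and x500 address forms
addresses = [
    "jdough@acmeparts.com",
    "/O=ACME PARTS/OU=EXCHANGE ADMINISTRATIVE GROUP "
    "(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=JDough",
    "bsmith@acmeparts.com",
]

def contains(addrs, fragment):
    """Case-insensitive 'contains' search over address strings."""
    fragment = fragment.lower()
    return [a for a in addrs if fragment in a.lower()]

hits = contains(addresses, "jdough")
print(len(hits))  # 2 -- both forms of John Dough's address, but not bsmith
```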

So, what do you think?  Have you encountered Exchange x500 addresses in your email collections? As always, please share any comments you might have or if you’d like to know more about a particular topic.


Court Chastises Parties for Turning Case into a “Discovery Slugfest”: eDiscovery Case Law

In UnitedHealthcare of Fla., Inc. et al. v. Am. Renal Assoc., Inc. et al., No. 16-cv-81180-Marra/Matthewman (S.D. Fla. Oct. 20, 2017), Florida Magistrate Judge William Matthewman granted in part and denied in part the plaintiffs’ Motion for Reconsideration or Modification of Omnibus Discovery Order, clarifying the Court’s previous order regarding custodians and search terms, while denying the remainder of the plaintiffs’ motion.  Judge Matthewman also chastised both parties for their lack of cooperation on search terms.

In the Court’s August order, the Court permitted the defendants to select an additional 16 custodians and an additional 12 search terms, and to request more at a later date if they had a good-faith basis to do so; it also ruled that the defendants had not waived any privilege and did not have to produce a privilege log.

In the current Motion, the plaintiffs argued that the Court should reconsider or modify its Order because the Court never made a finding that Plaintiffs’ production was deficient, there is no evidence that would support such a conclusion, the Court did not tailor the additional custodians or search terms to “any purported inadequacy nor to any proportionality limits”, the Court did not “provide any mechanism for ensuring that ARA’s custodians and search terms do not capture an overwhelmingly, burdensome, disproportionate amount of information”, and the Court’s Order was “patently unfair”.  The plaintiffs also argued that the Court should reconsider its decision not to compel the defendants to provide a privilege log because they “wrongfully withheld a responsive, non-privileged document, and the Court should not rely on Defendants’ counsel’s representations that they have no additional non-privileged responsive documents.”

Noting that “the only asserted new evidence submitted by Plaintiffs consists of Docket Entries 303-1 through 303-4” (which included email correspondence, a list of the additional 16 custodians, a list of additional 12 search terms and a Declaration from the Director of e-Discovery at the plaintiff company), Judge Matthewman focused on the last paragraph of the Declaration, which stated:

“In my opinion and based on my experience, if additional time is taken to reexamine the search terms to minimize some of the more obvious deficiencies and then, after the search terms are run, allow for the parties to evaluate which terms hit on an excessive number of documents and narrow them accordingly, the process could be sped up significantly as the volume of documents for the steps after collection and indexing will likely be greatly reduced.”

In response, Judge Matthewman stated: “Ironically, this type of cooperation is exactly what this Court has been expecting from the parties and their counsel throughout this case—to work together to arrive at reasonable search terms, to run those search terms and engage in sampling to see if the search terms are producing responsive documents or excessive irrelevant hits, and then to continue to refine the search terms in a cooperative, professional effort until the search terms are appropriately refined and produce relevant documents without including an excessive number of irrelevant documents. However, despite what paragraph 12 of the Declaration suggests, and despite this Court’s suggestions to the parties and their counsel as to the cooperative and professional manner in which the parties should engage in the e-discovery process in this case, there has instead been an apparent lack of cooperation and constant bickering over discovery, especially e-discovery. The alleged new evidence submitted by Plaintiffs, that is, the list of additional search terms and custodians and the Declaration, clearly show that, where, as here, parties in a large civil case do not cooperatively engage in the e-discovery process, the collection and indexing of documents and the production of relevant documents, become much more difficult.”

Indicating that “the parties and their counsel, through their many discovery disputes and their litigiousness, have unnecessarily turned this case into what can best be termed as a ‘discovery slugfest’”, Judge Matthewman noted that “the parties have filed well over 50 discovery motions, responses, replies, notices, and declarations, many of which have been filed under seal” and that the Court “has held at least six discovery hearings in 2017, most of which were lengthy and contentious.”

Judge Matthewman also referenced several resources regarding cooperation for the parties to consider, including The Sedona Conference, the Federal Judges’ Guide to Discovery, as well as comments from Supreme Court Chief Justice John Roberts regarding the 2015 Amendments to Federal Rules of Civil Procedure 1 and 26.  With that in mind, Judge Matthewman granted in part and denied in part the plaintiffs’ Motion for Reconsideration or Modification of Omnibus Discovery Order, clarifying the Court’s previous order regarding custodians and search terms, while denying the remainder of the plaintiffs’ motion, including their dispute over the number of custodians and search terms and their request to require the defendants to produce a privilege log.

So, what do you think?  What can we learn from the parties’ lack of cooperation in this case?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.


Here’s a Chance to Learn What You Need to Do When a Case is First Filed: eDiscovery Best Practices

The first days after a complaint is filed are critical to managing the eDiscovery requirements of the case efficiently and cost-effectively. With a scheduling order required within 120 days of the complaint and a Rule 26(f) “meet and confer” conference required at least 21 days before that, there’s a lot to do and a short time to do it. Where do you begin?

On Wednesday, September 27 at noon CST (1:00pm EST, 10:00am PST), CloudNine will conduct the webcast Holy ****, The Case is Filed! What Do I Do Now? (yes, that’s the actual title). In this one-hour webcast, we’ll take a look at the various issues to consider and decisions to be made to help you “get your ducks in a row” and successfully prepare for the Rule 26(f) “meet and confer” conference within the first 100 days after the case is filed. Topics include:

  • What You Should Consider Doing before a Case is Even Filed
  • Scoping the Discovery Effort
  • Identifying Employees Likely to Have Potentially Responsive ESI
  • Mapping Data within the Organization
  • Timing and Execution of the Litigation Hold
  • Handling of Inaccessible Data
  • Guidelines for Interviewing Custodians
  • Managing ESI Collection and Chain of Custody
  • Search Considerations and Preparation
  • Handling and Clawback of Privileged and Confidential Materials
  • Determining Required Format(s) for Production
  • Timing of Discovery Deliverables and Phased Discovery
  • Identifying eDiscovery Liaison and 30(b)(6) Witnesses
  • Available Resources and Checklists

I’ll be presenting the webcast, along with Tom O’Connor, who is now a Special Consultant to CloudNine!  If you follow our blog, you’re undoubtedly familiar with Tom as a leading eDiscovery thought leader (who we’ve interviewed several times over the years) and I’m excited to have Tom as a participant in this webcast!  To register for it, click here.

So, what do you think?  When a case is filed, do you have your eDiscovery “ducks in a row”?  Please share any comments you might have or if you’d like to know more about a particular topic.


Test Your Searches Before the Meet and Confer: eDiscovery Replay

Sometimes, even blog editors need to take a vacation.  But, instead of “going dark” for the week, we thought we would re-cover some topics from the past, when we had a fraction of the readers we do now.  If it’s new to you, it’s still new, right?  Hope you enjoy!  We’ll return with new posts on Monday, August 7.

This was one of the “pitfalls” and “potholes” in eDiscovery we discussed in a recent webcast.  Click here to learn about others.

One of the very first posts ever on this blog discussed the danger of using wildcards.  For those who haven’t been following the blog from the beginning, here’s a recap.

Years ago, I provided search strategy assistance to a client that had already agreed upon several searches with opposing counsel.  One search related to mining activities, so the attorney decided to use a wildcard of “min*” to retrieve variations like “mine”, “mines” and “mining”.

That one search retrieved over 300,000 files with hits.

Why?  Because there are 269 words in the English language that begin with the letters “min”.  Words like “mink”, “mind”, “mint” and “minion” were all being retrieved in this search for files related to “mining”.  We ultimately had to go back to opposing counsel and attempt to negotiate a revised search that was more appropriate.
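You can see the problem with a quick test against even a small word list – the wildcard matches every word with the prefix, relevant or not.  A tiny illustrative sample (a real dictionary has the full 269 “min” words):

```python
# Small sample word list to illustrate wildcard overbreadth
words = ["mine", "mines", "mining", "miner", "mink", "mind",
         "mint", "minion", "minute", "minimal", "minister"]

wildcard_hits = [w for w in words if w.startswith("min")]  # what "min*" matches
mining_terms = {"mine", "mines", "mining", "miner"}        # terms actually on point
false_hits = [w for w in wildcard_hits if w not in mining_terms]

print(false_hits)  # the noise the wildcard drags in
```

Even in this eleven-word sample, the majority of the wildcard’s hits have nothing to do with mining – the same ratio that turned the real search into 300,000 files.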

What made that process difficult was the negotiation with opposing counsel.  My client had already agreed on over 200 terms with opposing counsel and had proposed many of those terms, including this one.  The attorneys had prepared these terms without assistance from a technology consultant (I was brought into the project after the terms were negotiated and agreed upon) and without testing any of the terms.

Since they had been agreed upon, opposing counsel was understandably resistant to modifying the terms.  The fact that my client faced having to review all of these files was not their problem.  We were ultimately able to provide a clear indication that many of the files retrieved by this search were non-responsive and were able to get opposing counsel to agree to a modified list of variations of “mine” that included “minable”, “mine”, “mineable”, “mined”, “minefield”, “minefields”, “miner”, “miners”, “mines”, “mining” and “minings”.  We were able to sort through the “minutia” and “minimize” the result set to less than 12,000 files with hits, saving our client a “mint”, which they certainly didn’t “mind”.  OK, I’ll stop now.

However, there were several other inefficient terms that opposing counsel refused to renegotiate and my client was forced to review thousands of additional files that they shouldn’t have had to review, which was a real “mindblower” (sorry, I couldn’t resist).  Had the client included a technical member on the team and had they tested each of these searches before negotiating terms with opposing counsel, they would have been able to figure out which terms were overbroad and would have been better prepared to negotiate favorable search terms for retrieving potentially responsive data.

When litigation is anticipated, it’s never too early to begin collecting potentially responsive data and assessing it by performing searches and testing the results.  However, if you wait until after the meet and confer with opposing counsel, it can be too late.

So, what do you think?  What steps do you take to assess your data before negotiating search terms?  Please share any comments you might have or if you’d like to know more about a particular topic.
