Searching

What’s in a Name? Potentially, a Lot of Permutations: eDiscovery Throwback Thursdays

Here’s the latest post in our Throwback Thursdays series, where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discussing whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on November 13, 2012 – when eDiscovery Daily was early into its third year of existence.  Back then, the use of predictive coding instead of keyword searching was very uncommon, as we had just had our first case (Da Silva Moore) approving the use of technology assisted review earlier that year.  Now, the use of predictive coding technologies and approaches is much more common, but many (if not most) attorneys still use keyword searching for most cases.  With that in mind, let’s talk about considerations for searching names – they’re still valid close to seven years later!  Enjoy!

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names could be represented and if you don’t account for each one of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

When clients give me their preliminary search term lists to review, they routinely include the names of individuals they want to search for, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer to use a proximity search for individual names: it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches – Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely they’re responsive to the name search.
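
If you’re curious how that works under the hood, here’s a minimal Python sketch of a proximity check (an illustration only – not CloudNine’s actual implementation):

```python
import re

def within_n_words(text, term_a, term_b, n=3):
    """Return True if term_a and term_b appear within n words of each other
    (in either order).  Case-insensitive; punctuation is ignored, so
    "Smith, Jim" still matches."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)

# "Jim within 3 words of Smith" catches all three variations;
# a phrase search for "Jim Smith" would only match the first.
for doc in ["Jim Smith attended", "Smith, Jim - re: contract", "Jim T. Smith wrote"]:
    print(doc, "->", within_n_words(doc, "jim", "smith"))
```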

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is called “Jim” by some of his friends and “Jimmy” by a few others.  Also, some documents may refer to him by his initials – e.g., “J.T. Smith”.  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If not, it may take searching just the last name and sampling several documents until you are able to determine that middle initial (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).

Applying the proximity and name variation concepts to our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in our CloudNine Review platform.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is usually best.
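
To make the wildcard risk concrete, here’s a small illustration using Python regular expressions as a stand-in for search syntax (regex isn’t the CloudNine query language – this just demonstrates the behavior):

```python
import re

text = "Joe Smith jimmied the lock; later Jim Smith and James Smith reviewed it."

# A wildcard like jim* behaves as prefix matching: it picks up the false
# hit "jimmied" and still misses "James" (which doesn't start with "jim").
print(re.findall(r"\bjim\w*", text, flags=re.IGNORECASE))
# ['jimmied', 'Jim']

# Listing the specific variations you want avoids both problems.
print(re.findall(r"\b(?:jim|jimmy|james)\b", text, flags=re.IGNORECASE))
# ['Jim', 'James']
```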

Next week, we will talk about another way to retrieve documents that mention key individuals – through their email addresses.  Same bat time, same bat channel!

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Testing Your Search Using Sampling: eDiscovery Throwback Thursdays

Here is the third and final part in our Throwback Thursday series on sampling. Two weeks ago, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator.  Last week, we talked about how to make sure the sample size is randomly selected.  Today, we’ll walk through an example of how you can test and refine a search using sampling.

This post was originally published on April 5, 2011.  It was part of a three-post series that we have revisited over the past couple of weeks.  We have continued to touch on this topic over the years, including our webcast just last month.  One of our best!

The example below is a somewhat simplified version of a real-life search scenario I encountered several years ago, where I went through these steps to arrive at a search term that provided the right balance of recall and precision.

TEST #1: Let’s say we’re at an oil company and we’re looking for documents related to oil rights.  To be as inclusive as possible, we will search for “oil” AND “rights”.  Here is the result:

  • Files retrieved with “oil” AND “rights”: 200,000
  • Files NOT retrieved with “oil” AND “rights”: 1,000,000

Using the sample size calculator site we identified before, we determine a sample size of 662 for the retrieved files and 664 for the non-retrieved files to achieve a 99% confidence level with a margin of error of 5%.  We then use this site to generate random numbers and proceed to review each item in the retrieved and NOT retrieved item sets to determine responsiveness to the case.  Here are the results:

  • Retrieved Items: 662 reviewed, 24 responsive, 3.6% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 661 non-responsive, 99.5% non-responsive rate.

Nearly every item in the NOT retrieved category was non-responsive, which is good.  But, only 3.6% of the retrieved items were responsive, which means our search was WAY over-inclusive.  At that rate, 192,800 out of 200,000 files retrieved will be NOT responsive and will be a waste of time and resources to review.  Why?  Because, as we determined during the review, almost every published and copyrighted document in our oil company has the phrase “All Rights Reserved” in the document and will be retrieved.
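
For those who like to see the math, here’s a quick sketch of that extrapolation (a simple point estimate from the sample; rounding the responsive rate to 3.6% before extrapolating gives the 192,800 figure above):

```python
retrieved_total = 200_000
sample_reviewed, sample_responsive = 662, 24

responsive_rate = sample_responsive / sample_reviewed         # ~3.6%
est_non_responsive = retrieved_total * (1 - responsive_rate)  # ~192,750 (about 192,800 if the rate is rounded to 3.6%)

print(f"Responsive rate: {responsive_rate:.1%}")
print(f"Estimated non-responsive files in the retrieved set: {est_non_responsive:,.0f}")
```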

TEST #2: Let’s try again.  This time, we’ll conduct a phrase search for “oil rights” (which requires those words as an exact phrase).  Here is the result:

  • Files retrieved with “oil rights”: 1,500
  • Files NOT retrieved with “oil rights”: 1,198,500

This time, we determine a sample size of 461 for the retrieved files and (again) 664 for the NOT retrieved files to achieve a 99% confidence level with a margin of error of 5%.  Even though we still have a sample size of 664 for the NOT retrieved files, we generate a new list of random numbers to review those items, as well as the 461 randomly selected retrieved items.  Here are the results:

  • Retrieved Items: 461 reviewed, 435 responsive, 94.4% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 523 non-responsive, 78.8% non-responsive rate.

Nearly every item in the retrieved category was responsive, which is good.  But, only 78.8% of the NOT retrieved items were not responsive, which means over 20% of the NOT retrieved items were actually responsive to the case (we also failed to retrieve 8 of the items identified as responsive in the first iteration).  So, now what?

TEST #3: This time, we’ll conduct a proximity search for “oil within 5 words of rights”.  Here is the result:

  • Files retrieved with “oil w/5 rights”: 5,700
  • Files NOT retrieved with “oil w/5 rights”: 1,194,300

This time, we determine a sample size of 595 for the retrieved files and (once again) 664 for the NOT retrieved files, generating a new list of random numbers for both sets of items.  Here are the results:

  • Retrieved Items: 595 reviewed, 542 responsive, 91.1% responsive rate.
  • NOT Retrieved Items: 664 reviewed, 655 non-responsive, 98.6% non-responsive rate.

Over 90% of the items in the retrieved category were responsive AND nearly every item in the NOT retrieved category was non-responsive, which is GREAT.  Also, all but one of the items previously identified as responsive was retrieved.  So, this is a search that appears to maximize recall and precision.

Had we proceeded with the original search, we would have reviewed 200,000 files – 192,800 of which would have been NOT responsive to the case.  By testing and refining, we only had to review 8,815 files – 3,710 sample files reviewed plus the remaining retrieved items from the third search (5,700 – 595 = 5,105) – most of which ARE responsive to the case.  We saved tens of thousands in review costs while still retrieving most of the responsive files, using a defensible approach.

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

Determining Appropriate Sample Size to Test Your Search: eDiscovery Throwback Thursdays

If you missed it last week, we started a new series – Throwback Thursdays – here on the blog, where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discussing whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on April 1, 2011 – no fooling!  It was part of a three-post series that we will revisit over the next three weeks – we have continued to touch on this topic over the years, including our webcast just last month.  One of our best!

One part of searching best practices is to test your search results (both the result set and the files not retrieved) to determine whether the search you performed is effective at maximizing both precision and recall to the extent possible, so that you retrieve as many responsive files as possible without having to review too many non-responsive files.  One question I often get is: how many files do you need to review to test the search?

If you remember from statistics class in high school or college, statistical sampling is choosing a percentage of the results population at random for inspection to gather information about the population as a whole.  This saves considerable time, effort and cost over reviewing every item in the results population and enables you to obtain a “confidence level” that your sample reflects the characteristics of the population as a whole.  Statistical sampling is used for everything from exit polls that predict elections to marketing surveys that gauge brand popularity, and it is a generally accepted method of drawing conclusions about an overall results population.  You can sample a small portion of a large set to obtain a 95% or 99% confidence level in your findings (with a margin of error, of course).

So, does that mean you have to find your old statistics book and dust off your calculator or (gasp!) slide rule?  Thankfully, no.

There are several sites that provide sample size calculators to help you determine an appropriate sample size, including this one.  Many eDiscovery platforms do so as well.  You’ll simply need to identify a desired confidence level (typically 95% to 99%), an acceptable margin of error (typically 5% or less) and the population size.

So, if you perform a search that retrieves 100,000 files and you want a sample size that provides a 99% confidence level with a margin of error of 5%, you’ll need to review 660 of the retrieved files to achieve that level of confidence in your sample (only 383 files if a 95% confidence level will do).  Here’s an illustration of that using the site referenced above.

If 1,000,000 files were not retrieved, you would only need to review 664 of the not retrieved files to achieve that same level of confidence (99%, with a 5% margin of error) in your sample – only four more files to review than the previous sample, even though the collection is 900,000 files larger!  Don’t believe me?  See for yourself here.
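
If you’re curious about the math behind those calculators, here’s a sketch of the standard approach (Cochran’s formula with a finite population correction, assuming maximum variability of p = 0.5); it reproduces the numbers above:

```python
import math

Z = {90: 1.645, 95: 1.960, 99: 2.576}  # z-scores for common confidence levels

def sample_size(population, confidence=99, margin_of_error=0.05, p=0.5):
    """Cochran's formula, corrected for a finite population."""
    z = Z[confidence]
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)                  # finite population correction
    return math.ceil(n)

print(sample_size(100_000, confidence=99))    # 660
print(sample_size(100_000, confidence=95))    # 383
print(sample_size(1_000_000, confidence=99))  # 664
```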

As you can see, the sample size doesn’t need to increase much when the population gets really large and you can review a relatively small subset to understand your collection and defend your search methodology to the court.

Next week, we will talk about how to randomly select the files to review for your sample.

So, what do you think?  Do you use sampling to test your search results?   Please share any comments you might have or if you’d like to know more about a particular topic.

Court Orders Plaintiff to Share in Discovery Costs of Non-Party: eDiscovery Case Law

In Lotus Indus., LLC v. Archer, No. 2:17-cv-13482 (E.D. Mich. May 24, 2019), Michigan Magistrate Judge Anthony P. Patti granted in part and denied in part without prejudice non-party City of Detroit Downtown Development Authority’s (DDA) motion for protective order in connection with the Court’s order granting in part and denying in part the plaintiff’s motion to compel documents requested by subpoena, ordering the plaintiff to pay some of DDA’s discovery costs, but not as much as DDA requested.

Case Background

In this civil RICO and First Amendment retaliation case associated with redevelopment of property in Detroit, the plaintiff filed a motion in January 2019 to compel production of documents requested in his September 2018 subpoena to nonparty DDA.  A hearing was held on the plaintiff’s motion on March 26, 2019, after which the Court entered an order granting in part and denying in part the plaintiff’s motion, ordering DDA to produce, by April 26, 2019, documents responsive to Request Nos. 4-6 of the plaintiff’s subpoena for the November 19, 2016 to present time period, and to produce a privilege log for any documents withheld on the basis of privilege.

On April 19, 2019, DDA filed the instant motion for protective order, seeking an extension of time to produce responsive documents and requesting that the plaintiff pay DDA its share of the expenses of production before being obligated to begin to comply with the Court’s order, contending that the volume of potentially responsive documents was substantially larger than anticipated (48.5 GB of data) and would impose a significant expense on DDA to produce and require far more time to complete than allowed by the Court’s order.  DDA initially anticipated the total expense of production at $127,653.00, which included $21,875.00 in costs to upload the data and approximately $105,778.00 in attorney’s fees in connection with a privilege review. DDA requested Plaintiff pay the $21,875 in costs and 25% of the anticipated attorney’s fees ($26,444.50); in response, the plaintiff opposed that motion and questioned why the costs were so high.

At the May 8, 2019 hearing on the motion, the parties agreed on new search terms to further refine the number of responsive documents and the Court scheduled a status conference for May 23, 2019 to discuss the results of that search. On May 22, 2019, DDA submitted a supplemental brief explaining that the revised search yielded 8.5 GB of data that must be reviewed for privilege, at a cost of $2,125.00 to upload the data to counsel’s eDiscovery platform and anticipated costs of $44,705.00 in attorneys’ fees to conduct a privilege review, so it sought an order for the plaintiff to pay DDA $2,125.00 in costs and $11,176.25 in attorneys’ fees (still 25% of the total attorneys’ fees anticipated).

Judge’s Ruling

Judge Patti found that “DDA has sufficiently established that it will be forced to incur $2,125.00 in fixed costs to upload the 8.5 GB of data to its third-party e-discovery platform in order to review it for production, and that it anticipates incurring $44,705 in attorneys’ fees to conduct a privilege review, prepare a privilege log and prepare the non-privileged documents for production.”

He also noted that “DDA has demonstrated that it has no interest in the outcome of this litigation, as it is not a party and Plaintiff’s prior case against it was dismissed as a sanction for Plaintiff’s ‘repeated misrepresentations’ and ‘failures to comply with discovery orders — despite warnings and the imposition of less severe sanctions…While DDA may more readily bear the expense of production than Plaintiff, that factor alone does not dictate that Plaintiff is relieved of the obligation to pay for some of the expense of production, particularly where this litigation has no particularized public importance and considering the ‘unusual circumstances’ in this case, including that Plaintiff’s prior lawsuit against the DDA was dismissed as a sanction, and he and his clients have been at the receiving end of multiple sanction awards in related and unrelated litigation, significant portions of which this particular plaintiff and his counsel have apparently failed to pay…In addition, the subpoena was directed in part at the general counsel for DDA, and Plaintiff should have anticipated that production of documents in response would require a robust privilege review prior to production, especially given the litigation history between Plaintiff and the DDA.”

As a result, DDA’s motion was granted in part and denied in part without prejudice and Judge Patti ordered that:

  1. “Plaintiff must pay to DDA the sum of $4,360.25, which constitutes the $2,125.00 in costs to upload the 8.5 GB of data to DDA’s counsel’s e-discovery platform, and $2,235.25 in attorneys’ fees (5% of the anticipated attorneys’ fees to conduct a privilege review, prepare a privilege log and prepare the non-privileged documents for production).
  2. Plaintiff must deliver the $4,360.25 check payable to City of Detroit Downtown Development Authority (although the check can be delivered to counsel for DDA) on or before 5:00 p.m. Monday, June 3, 2019. Plaintiff shall also promptly certify such payment to the Court, and include a copy of the check.
  3. DDA need not continue further efforts to produce documents until it is paid in full. Once paid, DDA shall have 45 days from that date to produce responsive documents, and a privilege log for any documents withheld on the basis of privilege.”

So, what do you think?  Do you agree with the distribution of costs?  Please let us know if you have any comments or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Court Establishes Search Protocol to Address Plaintiff’s Motion to Compel: eDiscovery Case Law

In Lawson v. Spirit Aerosystems, Inc., No. 18-1100-EFM-ADM (D. Kan. Apr. 26, 2019), Kansas Magistrate Judge Angel D. Mitchell granted in part and denied in part the plaintiff’s motion to compel, ordering the defendant to produce documents related to two requests and, with regard to a third request, ordering the defendant to “produce these documents to the extent that such documents are captured by the ESI search protocol.”

Case Background

This case involved the defendant’s alleged breach of a retirement agreement with the plaintiff, based on an investment firm’s plans to install the plaintiff as CEO of an aircraft component manufacturer (“Arconic”).  The defendant withheld the plaintiff’s retirement benefits, claiming that he violated the non-compete provision in his retirement agreement.

In discovery, the plaintiff filed a motion to compel, seeking “the court’s intervention regarding discovery related to the “Business” of Spirit and Arconic. Specifically, Mr. Lawson asks the court to compel Spirit to produce (1) its contracts with Boeing and Airbus; (2) its antitrust filings relating to its planned acquisition of Asco Industries; (3) documents related to the aspects of Spirit’s business that Spirit alleges overlap with Arconic’s business; and (4) documents related to Spirit’s relationship with Arconic.”  At a subsequent hearing, the plaintiff clarified that he was not seeking to compel the full scope of documents sought in the original Requests for Production, but rather only the smaller subset of documents that were the subject of his motion to compel.

Judge’s Ruling

With regard to the Boeing and Airbus Contracts, Judge Mitchell granted the plaintiff’s motion “with respect to the portions of these contracts (or amendments, addenda, exhibits, schedules, data compilations, or lists) that relate to Spirit’s deliverables to Boeing and Airbus.”  And, with regard to Antitrust Filings, Judge Mitchell granted the plaintiff’s motion “with respect to the portion of these filings relating to Spirit’s business and market/marketing positioning, including the index(es) for these filings, the “4(c) documents,” and related white papers.”  Judge Mitchell ordered the defendant to produce documents related to both categories “on or before May 7, 2019.”

With regard to Product Overlaps and Spirit’s Relationship with Arconic, Judge Mitchell granted these aspects of the motion in part and denied them in part, ordering the defendant to “produce these documents to the extent that such documents are captured by the ESI search protocol.”  That protocol was as follows:

“After consultation with the parties, the court orders the parties to comply with the following ESI search protocol:

  • By May 3, 2019, Mr. Lawson shall identify up to seven categories of documents for which it seeks ESI.
  • By May 20, 2019, for each category of documents, Spirit shall serve a list of the top three custodians most likely to have relevant ESI, from the most likely to the least likely, along with a brief explanation as to why Spirit believes each custodian will have relevant information.
  • By May 23, 2019, Mr. Lawson shall serve a list of five custodians and proposed search terms for each custodian.
  • Spirit shall search the identified custodians’ ESI using these proposed search terms. Spirit shall use sampling techniques to assess whether the search has produced an unreasonably large number of non-responsive or irrelevant results and, if so, Spirit shall suggest modified search terms (e.g., different keywords, negative search restrictions, etc.) by May 30, 2019.
  • The parties shall meet and confer about search terms and try to achieve an estimated responsive hit rate of at least 85%.
  • Spirit shall produce responsive documents from the first five custodians on or before June 21, 2019.
  • Meanwhile, the parties shall begin this same process for the next five custodians. By May 30, 2019, Mr. Lawson will produce to Spirit a list of the next five custodians and proposed search terms for each custodian. If Spirit finds that the estimated responsive hit rate is not at or above 85%, Mr. Lawson shall suggest modified search terms by June 6, 2019. The court will set a deadline for Spirit to produce documents from the second set of five custodians at a later time.

If Mr. Lawson wishes to seek ESI from additional custodians beyond the ten described in this protocol, the parties are directed to contact the court for further guidance.”

Judge Mitchell also denied the plaintiff’s request to order the defendant to pay his attorneys’ fees and costs associated with the motion to compel.

So, what do you think?  Do you think the ordered responsive hit rate of 85% is reasonable?  Please let us know if you have any comments or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Court Denies Plaintiff’s Motion to Compel Production of ESI Related to 34 Searches: eDiscovery Case Law

In Lareau v. Nw. Med. Ctr., No. 2:17-cv-81 (D. Vt. Mar. 27, 2019), Vermont District Judge William K. Sessions III denied the plaintiff’s motion to compel production of ESI related to 34 search terms proposed by the plaintiff during meet and confer with the defendant, based on the extrapolation from a single search term that the plaintiff’s production request would require 170 hours of attorney and paralegal time and would produce little, if any, relevant information.

Case Background

In this case related to claims of wrongful termination stemming (at least in part) from the plaintiff’s disability, the plaintiff initially asked the defendant to produce ESI using 18 search terms. Using only seven of those 18 terms, the defendant produced over 3,000 pages of documents and objected to the scope of the request. The plaintiff moved to compel, and the Court issued an order requiring the parties to confer and agree upon appropriate search terms.

The plaintiff subsequently proposed 34 search terms, some of which were in the original list to which the defendant had objected. The defendant informed plaintiff’s counsel that using just the first four of the proposed 34 terms, it had spent over 20 hours retrieving 2,912 documents totaling 5,336 pages. The plaintiff’s counsel later acknowledged in an email that the initial production was voluminous and unwieldy, and suggested that the defendant use only the newly-proposed search terms.

The defendant made another effort to comply, performing a search using the suggested term “Experian.”  The process of searching, coding, and producing reportedly took five hours and identified 472 documents.  The defendant represented to the plaintiff’s counsel that few of those documents were relevant.  Extrapolating that work to 34 search terms, the defendant contended that the plaintiff’s production request would require 170 hours of attorney and paralegal time and would produce little, if any, relevant information.  As a result, the defendant informed opposing counsel that given the burden of production and the limited relevance of the search results, it would not expend any additional time performing the requested searches.  The plaintiff’s counsel invited the defendant to offer additional suggestions as to search terms, but the defendant declined that invitation, leading to the plaintiff’s motion.

Judge’s Ruling

Judge Sessions noted that, under the FRCP, “a party is required to provide ESI unless it shows that the source of such information is ‘not reasonably accessible because of undue burden or cost.’”  With that in mind, Judge Sessions stated:

“Here, the Court ordered cooperation among counsel, and counsel’s efforts did not produce a workable solution. NMC has tried to comply and shown that, to date, the information sought using Lareau’s proposed search terms is not reasonably accessible. Indeed, NMC has expended considerable time and expense producing documents that reportedly have little relevance to this case.”

While noting that he “could nonetheless compel discovery for good cause shown”, Judge Sessions determined that “[h]ere, there has been no such showing.”  Judge Sessions stated: “Since the Court issued its prior Order, NMC has produced 3,384 additional documents containing little relevant information. Without any showing that additional searches are likely to result in a higher rate of success, the Court will not order NMC to engage in further problem-solving.”  As a result, he denied the plaintiff’s motion to compel.

So, what do you think?  Was the defendant’s analysis of expected effort a valid representative sample?  Please let us know if you have any comments or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

The March Toward Technology Competence (and Possibly Predictive Coding Adoption) Continues: eDiscovery Best Practices

I know, because it’s “March”, right?  :o)  Anyway, it’s about time is all I can say.  My home state of Texas has finally added its name to the list of states that have adopted the ethical duty of technology competence for lawyers, becoming the 36th state to do so.  And, we have a new predictive coding survey to check out.

As discussed on Bob Ambrogi’s LawSites blog, just last week (February 26), the Supreme Court of Texas entered an order amending Paragraph 8 of the comment to Rule 1.01 of the Texas Disciplinary Rules of Professional Conduct. The amended comment now reads:

Maintaining Competence

  1. Because of the vital role of lawyers in the legal process, each lawyer should strive to become and remain proficient and competent in the practice of law, including the benefits and risks associated with relevant technology. To maintain the requisite knowledge and skill of a competent practitioner, a lawyer should engage in continuing study and education. If a system of peer review has been established, the lawyer should consider making use of it in appropriate circumstances. Isolated instances of faulty conduct or decision should be identified for purposes of additional study or instruction.

The new phrase – “including the benefits and risks associated with relevant technology” – mirrors the one adopted in 2012 by the American Bar Association in amending the Model Rules of Professional Conduct to make clear that lawyers have a duty to be competent not only in the law and its practice, but also in technology.  Hard to believe it’s been seven years already!  Now, we’re up to 36 states that have formally adopted this duty of technology competence.  Just 14 to go!

Also, this weekend, Rob Robinson published the results of the Predictive Coding Technologies and Protocols Spring 2019 Survey on his excellent Complex Discovery blog.  Like the first version of the survey he conducted back in September last year, this “non-scientific” survey is designed to help provide a general understanding of the use of predictive coding technologies, protocols, and workflows by data discovery and legal discovery professionals within the eDiscovery ecosystem.  This survey had 40 respondents, up from 31 the last time.

I won’t steal Rob’s thunder, but here are a couple of notable stats:

  • 62.5% of responders use more than one predictive coding technology in their predictive coding efforts: That’s considerably higher than I would have guessed;
  • Continuous Active Learning (CAL) was the most used predictive coding protocol, with 80% of responders reporting that they use it in their predictive coding efforts: I would have expected CAL to be the leader, but not as dominant as these stats show; and
  • 95% of responders use technology-assisted review in more than one area of data and legal discovery: That seems like a good sign to me that practitioners aren’t just limiting it to identifying relevant documents in review anymore.

Rob’s findings, including several charts, can be found here.

So, what do you think?  Which state will be next to adopt an ethical duty of technology competence for lawyers?  Please share any comments you might have or if you’d like to know more about a particular topic.

EDRM Releases the Final Version of its TAR Guidelines: eDiscovery Best Practices

During last year’s EDRM Spring Workshop, I discussed on this blog that EDRM had released the preliminary draft of its Technology Assisted Review (TAR) Guidelines for public comment.  They gave a mid-July deadline for comments and I even challenged the people who didn’t understand TAR very well to review it and provide feedback – after all, those are the people who would hopefully stand to benefit the most from these guidelines.  Now, over half a year later, EDRM has released the final version of its TAR Guidelines.

The TAR Guidelines (available here) have certainly gone through a lot of review.  In addition to the public comment period last year, it was discussed in the last two EDRM Spring meetings (2017 and 2018), presented at the Duke Distinguished Lawyers’ conference on Technology Assisted Review in 2017 for feedback, and worked on extensively during that time.

As indicated in the press release, more than 50 volunteer judges, practitioners, and eDiscovery experts contributed to the drafting process over a two-year period. Three drafting teams worked on various iterations of the document, led by Matt Poplawski of Winston & Strawn, Mike Quartararo of eDPM Advisory Services, and Adam Strayer of Paul, Weiss, Rifkind, Wharton & Garrison. Tim Opsitnick of TCDI and U.S. Magistrate Judge James Francis IV (Southern District of New York, Ret.), assisted in editing the document and incorporating comments from the public comment period.

“We wanted to address the growing confusion about TAR, particularly marketing claims and counterclaims that undercut the benefits of various versions of TAR software,” said John Rabiej, deputy director of the Bolch Judicial Institute of Duke Law School, which oversees EDRM. “These guidelines provide guidance to all users of TAR and apply across the different variations of TAR. We avoided taking a position on which variation of TAR is more effective, because that very much depends on facts specific to each case. Instead, our goal was to create a definitive document that could explain what TAR is and how it is used, to help demystify it and to help encourage more widespread adoption.”  EDRM/Duke Law also provide a TAR Q&A with Rabiej here.

The 50-page document contains four chapters: The first chapter defines technology assisted review and the TAR process. The second chapter lays out a standard workflow for the TAR process. The third chapter examines alternative tasks for applying TAR, including prioritization, categorization, privilege review, and quality and quantity control. Chapter four discusses factors to consider when deciding whether to use TAR, such as document set, cost, timing, and jurisdiction.

“Judges generally lack the technical expertise to feel comfortable adjudicating disputes involving sophisticated search methodologies. I know I did,” said Magistrate Judge Francis, who assisted in editing the document. “These guidelines are intended, in part, to provide judges with sufficient information to ask the right questions about TAR. When judges are equipped with at least this fundamental knowledge, counsel and their clients will be more willing to use newer, more efficient technologies, recognizing that they run less risk of being caught up in a discovery quagmire because a judge just doesn’t understand TAR. This, in turn, will further the goals of Rule 1 of the Federal Rules of Civil Procedure: to secure the just, speedy, and inexpensive determination of litigation.”

EDRM just announced the release of the final version of the TAR guidelines yesterday, so I haven’t had a chance to read it completely through yet, but a quick comparison to the public comment version from last May seems to indicate the same topics and sub-topics that were covered back then, so there certainly appears to be no major rewrite as a result of the public comment feedback.  I look forward to reading it in detail and determining what specific changes were made.

So, what do you think?  Will these guidelines help the average attorney or judge better understand TAR?  Please share any comments you might have or if you’d like to know more about a particular topic.

Court Orders Defendants to Sample Disputed Documents to Help Settle Dispute: eDiscovery Case Law

In Updateme Inc. v. Axel Springer SE, No. 17-cv-05054-SI (LB) (N.D. Cal. Oct. 11, 2018), California Magistrate Judge Laurel Beeler ordered the defendants to review a random sample of unreviewed documents in dispute and produce any responsive documents reviewed (along with a privilege log, if applicable) and report on the number of documents and families reviewed and the rate of responsiveness within one week.

Case Background

In this case, the plaintiff, creator of a news-aggregator cell-phone app, claimed that the defendants “stole” its platform and released a copycat app, and learned that the defendants used the code name “Ajax” to refer to its product.  The defendants determined that there were 5,126 unique documents (including associated family members) within the previously collected ESI that hit on the term “Ajax”, but they had not reviewed those documents for responsiveness.  The plaintiff asked the court to order the defendants to review those documents and produce responsive documents within two weeks.

The defendants claimed that the term “Ajax” is a project name that they created to refer to the plaintiff’s threatened litigation, not the product itself, and claimed that “a sampling of the `Ajax’ documents confirms that, in every responsive document, the term `Ajax’ was used to refer to the dispute itself.”  But, the plaintiff cited 93 produced documents generally and two documents in particular (which the defendants were attempting to claw back as privileged) that referred to its product.  The defendants also claimed that it would be unduly burdensome and expensive to review the “Ajax” documents at this stage of the litigation and argued that the term “Ajax” was not included in the ESI Protocol that the parties agreed upon months ago and should not be added at this late stage.

Judge’s Ruling

Judge Beeler observed this: “Whether ‘Ajax’ refers to Updateme or only the defendants’ dispute with Updateme is in some sense a distinction without a difference. Either way, the search term ‘Ajax’ is likely to return documents that are responsive to Updateme’s request for “[a]ll communications . . . concerning Updateme or the updaemi® application[.]” Documents concerning the defendants’ dispute with Updateme are likely documents concerning Updateme.” 

Judge Beeler also noted that “even if ‘Ajax’ refers to the dispute, that does not mean that documents that contain ‘Ajax’ are necessarily more likely to be privileged or protected from disclosure”, using a hypothetical scenario where two non-lawyers might discuss the impact of the “Ajax” dispute on profits.  She concluded her analysis with this statement: “To the extent the defendants are suggesting that if ‘Ajax’ purportedly refers to their dispute with Updateme, ESI containing ‘Ajax’ should remain outside the scope of discovery, the court is not convinced.”

As a result, Judge Beeler ordered the defendants to “randomly select 10% of the unreviewed documents {in dispute}, review them (and their associated family members) for responsiveness, produce responsive documents (and a privilege log for any responsive documents that are withheld), and provide a chart listing the number of documents and families reviewed and the rate of responsiveness” within one week.  Judge Beeler stated that the parties should then meet and confer if they continued to have disputes regarding these documents.
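
As a side note, pulling a 10% random sample like the one ordered here is simple in practice.  Here’s a minimal sketch (the document IDs are hypothetical):

```python
import random

# Hypothetical IDs standing in for the 5,126 unreviewed "Ajax" documents
unreviewed_doc_ids = [f"DOC-{i:05d}" for i in range(1, 5127)]

random.seed(42)  # a fixed seed makes the selection reproducible (and easier to defend)
sample = random.sample(unreviewed_doc_ids, k=round(len(unreviewed_doc_ids) * 0.10))

print(f"Review {len(sample)} of {len(unreviewed_doc_ids):,} documents")
# After review, report: responsive count / len(sample) = responsiveness rate
```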

So, what do you think?  Should random sampling be used more to settle proportionality disputes or should it be a last resort?  Please let us know if you have any comments or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.

Mike Q Says the Weakest Link in TAR is Humans: eDiscovery Best Practices

We started the week with a post from Tom O’Connor (his final post in his eDiscovery Project Management from Both Sides series).  And, we’re ending the week covering an article from Mike Quartararo on Technology Assisted Review (TAR).  You would think we were inadvertently promoting our webcast next week or something.  :o)

Remember The Weakest Link?  That was the early 2000s game show with the sharp-tongued British hostess (Anne Robinson) telling contestants who were eliminated “You are the weakest link.  Goodbye!”  Anyway, in Above the Law (Are Humans The Weak Link In Technology-Assisted Review?), Mike takes a look at the debate as to which tool is superior for conducting TAR and notes the lack of scientific studies that point to any particular TAR software or algorithm being dramatically better or, more importantly, significantly more accurate, than any other.  So, if it’s not the tool that determines the success or failure of a TAR project, what is it?  Mike says when TAR has problems, it’s because of the people.

Of course, Mike knows quite a bit about TAR.  He’s managed his “share of” projects, has used “various flavors of TAR” and notes that “none of them are perfect and not all of them exceed all expectations in all circumstances”.  Mike has also been associated with the EDRM TAR project (which we covered earlier this year here) for two years as a team leader, working with others to draft proposed standards.

When it comes to observations about TAR that everyone should be able to agree on, Mike identifies three: 1) that TAR is not Artificial Intelligence, just “machine learning – nothing more, nothing less”, 2) that TAR technology works and “TAR applications effectively analyze, categorize, and rank text-based documents”, and 3) “using a TAR application — any TAR application — saves time and money and results in a reasonable and proportional outcome.”  Seems logical to me.

So, when TAR doesn’t work, “the blame may fairly be placed at the feet (and in the minds) of humans.”  We train the software by categorizing the training documents, we operate the software, we analyze the outcome.  So, it’s our fault.

Last month, we covered this case where the plaintiffs successfully requested additional time for discovery when defendant United Airlines, using TAR to manage its review process, produced 3.5 million documents.  However, sampling by the plaintiffs (and later confirmed by United) found that the production contained only 600,000 documents that were responsive to their requests (about 17% of the total production).  That seems like a far less than ideal TAR result to me.  Was that because of human failure?  Perhaps, when it comes down to it, the success of TAR being dependent on humans points us back to the long-used phrase regarding humans and computers: Garbage In, Garbage Out.

So, what do you think?  Should TAR be considered Artificial Intelligence?  As always, please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © British Broadcasting Corporation
