Searching

DESI Got Your Input, and Here It Is: eDiscovery Trends

Back in January, we discussed the Discovery of Electronically Stored Information (DESI, not to be confused with Desi Arnaz) workshop and its call for papers describing research or practice for the DESI VI workshop, which was held last week at the University of San Diego as part of the 15th International Conference on Artificial Intelligence & Law (ICAIL 2015). Now, links to those papers are available on the workshop’s web site.

The DESI VI workshop aims to bring together researchers and practitioners to explore innovation and the development of best practices for the application of search, classification, language processing, data management, visualization, and related techniques to institutional and organizational records in eDiscovery, information governance, public records access, and other legal settings. The broader aim of the DESI workshop series has been to foster a continuing dialogue leading to the adoption of further best practice guidelines or standards for using machine learning, most notably in the eDiscovery space. Organizing committee members include Jason R. Baron of Drinker Biddle & Reath LLP and Douglas W. Oard of the University of Maryland.

The workshop included keynote addresses by Bennett Borden and Jeremy Pickens, a session regarding Topics in Information Governance moderated by Jason R. Baron, presentations of some of the “refereed” papers and other moderated discussions. Sounds like a very informative day!

As for the papers themselves, here is a list from the site with links to each paper:

Refereed Papers

Position Papers

If you’re interested in discovery of ESI, Information Governance and artificial intelligence, these papers are for you! Kudos to all of the authors who submitted them. Over the next few weeks, we plan to dive deeper into at least a few of them.

So, what do you think? Did you attend DESI VI? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Resolves Dispute Over Scope of Databases and Searches to be Performed: eDiscovery Case Law

After a week of reviewing previous cases we’ve covered this year with a couple of pop quizzes, we’re back in the saddle covering new cases…

In Willett, et al. v. Redflex Traffic Systems, Inc., No. 1:13-cv-1241-JCH/LAM (D.N.M. May 8, 2015), New Mexico District Judge Lourdes A. Martinez ordered the defendants to produce a spreadsheet of file folders, with information for the files on their virtual server(s); ordered the plaintiffs to provide the defendants with a reasonable list of search terms, limited to the relevant time frame, parties, and issues of the case; and ordered the defendants to perform the searches specified by the plaintiffs within ten days of receiving them.

Case Background

In this class action case, the plaintiffs alleged that the defendants engaged in nonconsensual automated calls to the plaintiffs’ cellular telephones, in violation of the Telephone Consumer Protection Act, in order to collect fines imposed by the City of Albuquerque for traffic violations. The plaintiffs submitted requests for admission (RFAs) asking the defendants to admit that they obtained the telephone numbers for specific plaintiffs from a skip tracing service. As for the plaintiffs’ document requests, the defendants produced an initial set of 19,000 Bates-labeled pages of documents in response to those requests, but the plaintiffs argued that the production was inadequate and moved to compel a larger production. In turn, the defendants filed their own motion opposing the plaintiffs’ motion, arguing that the plaintiffs had refused to engage in a search term discussion regarding the defendants’ database, which contained 1.6 terabytes of data.

The defendants also noted that the cost of processing their entire virtual server to enable more targeted searches would be between $100,000 and $160,000, but that if the parties were to agree to limit the data to be processed, such as by file type, keywords, and creation dates, the defendants might be able to perform those searches at a reasonable cost; otherwise, the cost could be shifted to the plaintiffs or split between the parties.

Judge’s Ruling

With regard to the defendants’ objections to the plaintiffs’ requests for admission, Judge Martinez found that “Defendants’ objections are without merit and should be overruled” and stated that “Defendants’ use of boilerplate, blanket objections are improper” and that the defendants’ “objections that these RFAs do not relate to the parties in this case are especially baffling since the requests specifically name the three Plaintiffs”.

As for the document requests, Judge Martinez ruled that she would “not order CWGP and Credit Control to conduct a search of the entire virtual server because it does not appear that conducting a search of the entire 1.6 terabytes of data in the virtual server at a cost of $100,000 to $160,000 would be proportional to the likely benefit of such a search”. She also found that “limiting the search of the virtual server by file type, keywords, and creation dates, is a reasonable solution”. As a result, Judge Martinez ordered the defendants to produce a spreadsheet of file folders, with information for the files on their virtual server(s); ordered the plaintiffs to provide the defendants with a reasonable list of search terms, limited to the relevant time frame, parties, and issues of the case; and ordered the defendants to perform the searches specified by the plaintiffs within ten days of receiving them.

So, what do you think? Was the judge’s decision a reasonable compromise regarding the parties’ search disputes? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

When Collecting Emails, Make Sure You Have a Complete Outlook: eDiscovery Best Practices

I’m out of the office this week, taking the kiddos on a family vacation (can you guess where?). Instead of going dark for the week (which we almost never do), I decided to use the opportunity to give you a chance to catch up on cases we’ve covered so far this year with a couple of case law pop quizzes, sandwiched around a popular post from the past that you may have missed. Today’s post takes a look back at Outlook files and the different forms they take. How many do you know?

Most discovery requests include a request for emails of parties involved in the case. Email data is often the best resource for establishing a timeline of communications in the case, and Microsoft® Outlook is the most common email program used in business today. Outlook emails can be stored in several different forms, so it’s important to be able to account for each file format when collecting emails that may be responsive to the discovery request.

There are several different file types that contain Outlook emails, including:

EDB (Exchange Database): The server files for Microsoft Exchange, which is the server environment that manages Outlook emails in an organization. In the EDB file, a user account is created for each person authorized at the company to use email (usually, but not always, employees). The EDB file stores all of the information related to email messages, calendar appointments, tasks, and contacts for all authorized email users at the company. EDB files are the server-side collection of Outlook emails for an organization that uses Exchange, so they are a primary source of responsive emails for those organizations. Not all organizations that use Outlook use Exchange, but larger organizations almost always do.

OST (Outlook Offline Storage Table): Outlook can be configured to keep a local copy of a user’s items on their computer in an Outlook data file known as an Offline Outlook Data File (OST). This allows the user to work offline when a connection to the Exchange server may not be possible or wanted. The OST file is synchronized with the Exchange server when a connection is available. If the synchronization is not current for a particular user, their OST file could contain emails that are not in the EDB file on the server, so OST files may also need to be searched for responsive emails.

PST (Outlook Personal Storage Table): A PST file is another Outlook data file that stores a user’s messages and other items on their computer. It’s the most common file format for home users or small organizations that don’t use Exchange, but instead use an ISP to connect to the Internet (typically through POP3 and IMAP). In addition, Exchange users may move or archive messages to a PST file (either manually or via auto-archiving) to move them out of the primary mailbox, typically to keep their mailbox size manageable. PST files often contain emails not found in either the EDB or OST files (especially when Exchange is not used), so it’s important to search them for responsive emails as well.

MSG (Outlook MSG File): MSG is a file extension for a mail message file format used by Microsoft Outlook and Exchange. Each MSG file is a self-contained unit for the message “family” (email and its attachments) and individual MSG files can be saved simply by dragging messages out of Outlook to a folder on the computer (which could then be stored on portable media, such as CDs or flash drives). As these individual emails may no longer be contained in the other Outlook file types, it’s important to determine where they are located and search them for responsiveness. MSG is also a common format for native production of individual responsive Outlook emails, though HTML is also used (as Outlook emails, by default, are already HTML formatted files).

Other Outlook file types that might contain responsive information are EML (Electronic Mail), which is the Outlook Express e-mail format, and PAB (Personal Address Book), which, as the name implies, stores the user’s contact information.

Of course, Outlook emails are not just stored within EDB files on the server or these other file types on the local workstation or portable media; they can also be stored within an email archiving system or synchronized to phones and other portable devices. Regardless, it’s important to account for the different file types when collecting potentially responsive Outlook emails for discovery.
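For those who like to script a first pass, here’s a minimal Python sketch (a hypothetical illustration, not any particular collection tool) that walks a collection folder and tallies the Outlook container formats discussed above; the folder path and the extension descriptions are examples only:

```python
import os
from collections import Counter

# Outlook-related container formats discussed above (illustrative, not exhaustive)
OUTLOOK_EXTENSIONS = {
    ".edb": "Exchange Database (server-side mail store)",
    ".ost": "Offline Storage Table (local synchronized copy)",
    ".pst": "Personal Storage Table (local or archive mail store)",
    ".msg": "individual Outlook message",
    ".eml": "Outlook Express / internet e-mail message",
    ".pab": "Personal Address Book",
}

def inventory_outlook_files(root):
    """Walk a collection folder and count Outlook container files by extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in OUTLOOK_EXTENSIONS:
                counts[ext] += 1
    return counts

if __name__ == "__main__":
    # "/data/collection" is a hypothetical path to a collected file set
    for ext, count in sorted(inventory_outlook_files("/data/collection").items()):
        print(f"{ext}  {count:6}  {OUTLOOK_EXTENSIONS[ext]}")
```

An inventory like this is only a starting point, of course; actually extracting individual messages from EDB, OST and PST containers requires dedicated processing tools.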

So, what do you think? Are you searching all of these file types for responsive Outlook emails? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Want to Save Review Costs? Be the Master of Your Domain(s): eDiscovery Best Practices

Yesterday, we discussed how some BigLaw firms mark up reviewer billing rates two to three times (or more) when billing their clients. But, even if that’s not the case, review is still by far the most expensive phase of eDiscovery. One way to minimize those costs is to identify documents that need little or no review, and domain categorization can help identify those documents.

Even though the types of electronically stored information (ESI) continue to become more diverse, with social media and other sources of ESI becoming more prominent, email is still generally the biggest component of most ESI collections. Each participant in an email communication belongs to a domain associated with the email server that manages his or her email.

Several review platforms, including (shameless plug warning!) our CloudNine™ platform, support domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection. Domain categorization provides several benefits when reviewing your collection by identifying groups of documents, such as:

  • Non-Responsive ESI: Let’s face it, even if we cull the collection based on search terms, certain non-responsive documents will get through. For example, if custodians have received fantasy football emails from ESPN.com or weekly business newsletters from Fortune.com and those slip through the search criteria, reviewing that clearly non-responsive ESI adds cost. Instead, with domain categorization, domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains (and their attachments) can be “group-tagged” as non-responsive.
  • Potentially Privileged ESI: If there are any emails associated with outside counsel’s domain, they could obviously represent attorney work product or attorney-client privileged communications (or both). Domain categorization is a quick way to “group-tag” them as potentially privileged, so that they can be reviewed for privilege and dealt with quickly and effectively.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly and get them prioritized for review.

In essence, domain categorization enables you to put groups of documents into “buckets” to either eliminate them from review entirely or to classify them for a specific workflow for review, saving time and cost during the review process. Time is money!
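For the technically inclined, here’s a minimal sketch of the idea behind domain categorization, assuming your platform can hand you parsed email headers; the message dicts and the list of non-responsive domains below are hypothetical examples:

```python
from collections import Counter
from email.utils import getaddresses

def domain_counts(messages):
    """Tally participant domains across a collection of messages.

    `messages` is an iterable of dicts with 'from', 'to' and 'cc' header
    strings; a hypothetical stand-in for however your review platform
    exposes parsed email metadata.
    """
    counts = Counter()
    for msg in messages:
        header_values = [msg.get(h, "") for h in ("from", "to", "cc")]
        for _display_name, address in getaddresses(header_values):
            if "@" in address:
                counts[address.rsplit("@", 1)[1].lower()] += 1
    return counts

# Hypothetical example: flag obviously non-responsive domains for group-tagging
NON_RESPONSIVE_DOMAINS = {"espn.com", "fortune.com"}
messages = [
    {"from": "scores@espn.com", "to": "jdoe@enron.com"},
    {"from": "counsel@outsidefirm.com", "to": "jdoe@enron.com"},
]
for domain, n in domain_counts(messages).most_common():
    flag = "candidate for group-tagging" if domain in NON_RESPONSIVE_DOMAINS else ""
    print(f"{domain:20} {n:4}  {flag}")
```

A real review platform computes these counts across the entire collection and exposes them in the interface for group-tagging; the point is simply that a participant’s domain is cheap to derive from email metadata.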

So, what do you think? Does your review platform provide a mechanism for domain categorization? If so, do you use it to help manage the review process and control costs? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Tired of the “Crap”, Court Sanctions Investors and Lawyers for Several Instances of Spoliation: eDiscovery Case Law

In Clear-View Technologies, Inc. v. Rasnick et al, 5:13-cv-02744-BLF (N.D. Cal. May 13, 2015), California Magistrate Judge Paul S. Grewal sanctioned the defendants $212,320 and also granted a permissive adverse jury instruction allowing the presumption that the defendants spoliated documents, due to a series of “transgressions” by the defendants and their prior counsel.

You’ve got to love an order that begins this way:

“Deployment of ‘Crap Cleaner’ software—with a motion to compel pending. Lost media with relevant documents. False certification that document production was complete. Failure to take any steps to preserve or collect relevant documents for two years after discussing this very suit. Any one of these transgressions by {the defendants} and their prior counsel might justify sanctions. Taken together, there can be no doubt.”

This case arose from the defendants’ alleged conspiracy with certain of the plaintiff’s former employees to take over the plaintiff’s company or, failing that, to divert its personnel, intellectual property and investors to a competing enterprise to commercialize the plaintiff’s alcohol tracking product known as the “BarMaster”. As early as May 2011, the plaintiff threatened the defendants with litigation for interfering with the plaintiff’s operations, ultimately filing suit in June 2013.

After the plaintiff’s discovery requests yielded just 422 pages produced by the defendants (including no communications solely between defendants and virtually no communications between defendants and any “co-conspirator” identified in the plaintiff’s requests), the plaintiff moved to compel further production. In September 2014, the court granted the motion and ordered that “(i) Defendants appear by September 23 for depositions regarding ‘document preservation and production,’ and (ii) the parties meet and confer in order to submit to the court by September 30 ‘a plan to retain an independent consultant to do a limited forensic collection and analysis of the media associated with each named defendant.’”

During the depositions, the individual defendants admitted having deleted numerous emails and text messages, failing to preserve devices on which potentially responsive data was stored, failing to search key media and failing to use obvious search terms in the searches that they did perform. Meanwhile, in October 2014, per the parties’ joint agreement, the Court selected a digital forensics firm (at the defendants’ expense) to perform a forensic analysis of the defendants’ media and email accounts, with the order calling for the defendants to produce over 40 specified electronic media and email accounts for forensic imaging.

The digital forensics firm ultimately found 2,593 relevant documents totaling 12,467 pages – over 12,000 pages more than the defendants had previously produced – and also determined that “four separate system optimization and computer cleaning programs were run” (including CCleaner, aka “Crap Cleaner”) on one defendant’s laptop. These programs were loaded onto his laptop and executed on July 22, 2014 – just six days after the filing of the plaintiff’s motion to compel – and resulted in the deletion of “over 50,000 files”. For that and other apparent instances of spoliation of data among the defendants, the plaintiff requested monetary sanctions, an adverse inference instruction and terminating sanctions.

Judge’s Ruling

With regard to the duty to preserve, Judge Grewal stated: “Once upon a time, the federal courts debated exactly when the duty to preserve documents arises. No more. The duty to preserve evidence begins when litigation is ‘pending or reasonably foreseeable.’”

Finding that the defendants “were on notice of foreseeable litigation well before spoliation occurred”, that their “spoliation occurred with the required culpable mindset” and that they “failed to produce thousands of documents that contained key terms that the parties designated as relevant to the litigation”, Judge Grewal ruled that “In sum, sanctions are warranted. The only question is what kind.”

Ultimately, Judge Grewal awarded “expenses and fees in this discovery dispute under Fed. R. Civ. P. 37(b)(2)(C)” of $212,320 and granted the request for an adverse instruction that the unproduced material may be deemed to support the plaintiff’s contentions. He also ruled that “Defendants’ prior counsel also must be sanctioned for improperly certifying Defendants’ discovery responses, and for subsequently failing to intervene even after ‘obvious red flags’ arose, such as Defendants’ failure to produce incriminating documents CVT obtained from their third parties.” Also, based on information that the defendants had “stiffed” the digital forensics firm on the bill, Judge Grewal ruled that “Defendants shall show cause why they should not face further sanctions for this failure.”

Judge Grewal, however, declined to recommend terminating sanctions “in light of public policy and the sufficiency of monetary sanctions and an adverse jury instruction”.

So, what do you think? Should the request for terminating sanctions have been granted? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

For Better Document Review, You Need to Approach a ZEN State: eDiscovery Best Practices

Among the many definitions of the word “zen”, the Urban Dictionary provides perhaps the most appropriate (non-religious) one: a total state of focus that incorporates a total togetherness of body and mind. However, when it comes to document review, a new web site by eDiscovery thought leader Ralph Losey may change your way of thinking about the word “ZEN”.

Ralph’s new site, ZEN Document Review, introduces ‘ZEN’ as an acronym: Zero Error Numerics. As stated on the site, “ZEN document review is designed to attain the highest possible level of efficiency and quality in computer assisted review. The goal is zero error. The methods to attain that goal include active machine learning, random sampling, objective measurements, and comparative analysis using simple, repeatable systems.”

The ZEN methods were developed by Ralph Losey’s e-Discovery Team (and many of them are documented on his excellent e-Discovery Team® blog). They rely on focused attention and full, clear communication between review team members.

In the intro video on his site, Ralph acknowledges that it’s impossible to have zero error in any large, complex project, but “with the help of the latest tools and using the right mindset, we can come pretty damn close”. One of the graphics on the site depicts an “upside down champagne glass” illustrating 99.9% of probable relevant documents identified correctly during the review process at the top of the graph and 0.1% identified incorrectly at the bottom of the graph.

The ZEN approach includes everything from “predictive coding analytics, a type of artificial intelligence, actively managed by skilled human analysts in a hybrid approach” to “quiet, uninterrupted, single-minded focus” where “dual tasking during review is prohibited” to “judgmental and random sampling and analysis such as i-Recall” and even high ethics, with the goal being to “find and disclose the truth in compliance with local laws, not win a particular case”. And thirteen other factors, as well. Hey, nobody said that attaining ZEN is easy!

Attaining zero error in document review is a lofty goal – I admire Ralph for setting the bar high. Using the right tools, methods and attitude, can we come “pretty damn close”?  What do you think? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

You Should Check the Level of Your Fuzzy When Searching: eDiscovery Best Practices

If the title seems odd, let me clarify. I’m talking about “fuzzy” searching, which is a mechanism for finding alternate words that are close in spelling to the word you’re looking for. Fuzzy searching will expand your search recall, but too much “fuzzy” will leave you reviewing a lot of non-responsive hits.

Attorneys may know what terms they’re looking for, but those terms may not always be spelled correctly. Let’s face it, we all make mistakes. For example, if you’re searching for emails that relate to management decisions, can you be certain that “management” is spelled perfectly throughout the collection? Unlikely. It could be spelled “managment” or “mangement”. Also, you may have a number of image-only files that require Optical Character Recognition (OCR), which is usually not 100% accurate. Without an effective search mechanism, you could miss key documents.

That’s where fuzzy searching comes in. Fuzzy searching enables you to find not just the exact matches of the word or words you’re seeking, but also alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). For example, if you’re looking for the term “petroleum”, you can find variations such as “peroleum”, “petoleum” or even “petroleom” – misspellings, OCR errors or other variations (such as the term in a foreign language) that could be relevant.

However, fuzzy searching can also retrieve other legitimate words that are not relevant. Let’s take the term “concept” – if you perform a fuzzy search which retrieves words that are up to two characters off, you’ll get variations like “consent”, “content” and “concern”. So, it’s important to test your results to evaluate your level of precision vs. recall.
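Under the hood, fuzzy searching is typically built on edit distance: the number of single-character insertions, deletions and substitutions needed to turn one word into another. Here’s a minimal Python sketch (an illustration of the concept, not CloudNine’s actual implementation) that finds index words within a given number of edits:

```python
def edit_distance(a, b):
    """Levenshtein distance between two words via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def fuzzy_matches(term, vocabulary, fuzziness=2):
    """Return indexed words within `fuzziness` edits of `term`."""
    return [w for w in vocabulary
            if edit_distance(term.lower(), w.lower()) <= fuzziness]

# Hypothetical index vocabulary drawn from the examples in this post
vocabulary = ["petroleum", "peroleum", "petroleom", "concept",
              "consent", "content", "concern"]
print(fuzzy_matches("petroleum", vocabulary))  # ['petroleum', 'peroleum', 'petroleom']
print(fuzzy_matches("concept", vocabulary))    # ['concept', 'consent', 'content', 'concern']
```

Notice that two edits is tight enough for “petroleum” but already pulls in unrelated words for “concept”, which is exactly the precision trade-off shown in the examples below.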

In CloudNine’s review platform, our search interface provides a check box to apply fuzzy searching to the entire term, along with a drop-down to select the level of “fuzzy” (from 1 to 10; the higher the number, the more “fuzzy” the search results). But, we also enable the user to apply “fuzzy” to individual terms via the ‘%’ character, generally placed after the first character to represent words that are one or two characters off. This enables you to perform a search to find documents with only fuzzy hits. Here are a couple of text search examples using an Enron demo set of over 117,000 documents:

  • p%%etroleum and not petroleum: Retrieves all documents that have words within two characters of “petroleum”, but not the word “petroleum” itself. In this case, 59 total documents were retrieved and the variations retrieved included words like “petróleos”, “petróleo” and “pertroleum”. The first two variations are Spanish language variations of “petroleum”, the third appears to be a misspelling. All of these terms appear responsive, so the precision is still good at this level and we retrieved 59 additional documents that are likely responsive that we wouldn’t have retrieved without fuzzy searching.
  • c%%oncept and not concept: Retrieves all documents that have words within two characters of “concept”, but not the word “concept” itself. In this case, 5,304 total documents were retrieved and the variations retrieved included words like “consent”, “Concast”, “content” and “concern”. We retrieved a high number of documents with clearly non-responsive terms, so this search is proving to be overbroad and we may need to dial it back. If we reduce the search to words within one character of “concept” (still excluding “concept” itself), we get 291 total documents retrieved, and a number of those non-responsive variations are eliminated, giving us a more precise search.

Think of fuzzy searching as a “dial”. If you “dial” it up a little bit, you can retrieve additional responsive hits without sacrificing precision in your search. If you “dial” it up too much, you’ll be reviewing a lot of non-responsive hits and documents. Test your results to play with the “dial” until you get the most appropriate balance of recall and precision in your search.

So, what do you think? Does your keyword search strategy include the use of fuzzy searching? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will return on Monday. Have a nice Easter!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Agrees with Plaintiffs, Orders Provision for Qualitative Sampling of Disputed Search Terms: eDiscovery Case Law

In the case In Re: Lithium Ion Batteries Antitrust Litigation, No. 13-MD-02420 YGR (DMR) (N.D. Cal., Feb. 24, 2015), California Magistrate Judge Donna M. Ryu ordered the defendants to comply with the plaintiffs’ proposed qualitative sampling process for keyword search terms, citing Da Silva Moore for the observation that keywords “often are overinclusive”.

Case Background

In this multi-district litigation (MDL), the court ordered the parties to meet and confer to negotiate a protocol for the use of search terms in December 2014. The parties agreed upon an iterative process for the development and testing of search terms, summarized as follows:

  1. The producing/responding party will develop an initial list of proposed search terms and provide those terms to the requesting party;
  2. Within 30 days, the requesting party may propose modifications to the list of terms or provide additional terms (up to 125 additional terms or modifications); and
  3. Upon receipt of any additional terms or modifications, the producing/responding party will evaluate the terms and either:
  4. Run all additional/modified terms upon which the parties can agree and review the results of those searches for responsiveness, privilege, and necessary redactions, or
  5. For those additional/modified terms to which the producing/responding party objects on the basis of overbreadth or identification of a disproportionate number of irrelevant documents, that party will provide the requesting party with certain quantitative metrics and meet and confer to determine whether the parties can agree on modifications to such terms. Among other things, the quantitative metrics include the number of documents returned by a search term and the nature and type of irrelevant documents that the search term returns. In the event the parties are unable to reach agreement regarding additional/modified search terms, the parties may file a joint letter regarding the dispute.

The parties requested the court’s guidance on a single remaining issue regarding their search term protocol: the steps the parties needed to take if they could not resolve a disagreement over a particular term. The plaintiffs wanted the defendants to conduct a randomized qualitative sampling of documents retrieved by searching for any disputed terms, and to then allow the plaintiffs to review the resulting documents following a privilege review.

The defendants objected to the proposed sampling provision “solely on the grounds that it will provide Plaintiffs with access to non-responsive, irrelevant documents that will be generated through the procedure.” They argued that the provision was unnecessary due to the detailed quantitative information that they agreed to produce regarding disputed search terms and because “there has been no showing that any Defendant’s production is incomplete.” The plaintiffs countered “that the proposed provision incorporates ESI best practices, including those embodied in materials developed by this Court” and contended that “the best way to refine searches and eliminate unhelpful search terms is to analyze a random sample of documents, including irrelevant ones, to modify the search in an effort to improve precision.”

Judge’s Opinion

With regard to the plaintiffs’ argument, Judge Ryu stated simply, “The court agrees. The point of random sampling is to eliminate irrelevant documents from the group identified by a computerized search and focus the parties’ search on relevant documents only. As the court noted in Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), a problem with keywords ‘is that they often are overinclusive, that is, they find responsive documents but also large numbers of irrelevant documents.’”

Noting, however, that the defendants “raise a valid concern that the sampling protocol will result in the production of irrelevant information”, Judge Ryu ordered the following parameters to alleviate that concern:

  • At the hearing, the plaintiffs agreed that the defendants “may review the random qualitative sample and remove any irrelevant document(s) from the sample for any reason, provided that they replace the document(s) with an equal number of randomly generated document(s)”;
  • The parties also agreed that the defendants would conduct the qualitative sampling only after they had exhausted an agreed-upon quantitative evaluation process;
  • Judge Ryu ordered that irrelevant documents in the sample “shall be used only for the purpose of resolving disputes regarding search terms in this action, and for no other purpose in this litigation or in any other litigation” and that those irrelevant documents, as well as any attorney notes regarding the sample, “shall be destroyed within fourteen days of resolution of the search term dispute”;
  • Only one attorney from each law firm designated co-lead class counsel for Direct Purchaser Plaintiffs and Indirect Purchaser Plaintiffs (total of six attorneys) would be allowed to review the random sample;
  • The plaintiffs could invoke the random sampling process with respect to no more than five search terms per defendant group.
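To make the sampling idea concrete, here’s a minimal sketch of how a random qualitative sample of a disputed term’s hits could be drawn and turned into a precision estimate; the document IDs, sample size and simulated relevance calls are hypothetical stand-ins for a real platform’s data and actual human review:

```python
import random

def qualitative_sample(hit_doc_ids, sample_size=50, seed=42):
    """Draw a random sample of the documents a disputed term retrieved,
    for manual relevance review by the reviewing attorneys."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(hit_doc_ids, min(sample_size, len(hit_doc_ids)))

def estimate_precision(relevance_calls):
    """Estimate the term's precision from reviewers' relevant/not-relevant calls."""
    return sum(relevance_calls) / len(relevance_calls)

# Hypothetical workflow: sample 50 of the documents a disputed term hit,
# code each one for relevance, then meet and confer over the estimate.
hits = list(range(5304))                        # stand-in document IDs
sample = qualitative_sample(hits)
calls = [doc_id % 7 == 0 for doc_id in sample]  # stand-in for human review
print(f"Estimated precision: {estimate_precision(calls):.0%}")
```

A precision estimate from a sample like this gives the parties an objective basis for keeping, modifying or dropping a disputed term.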

So, what do you think? Was the court right to order random sampling? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Judge Peck Wades Back into the TAR Pits with ‘Da Silva Moore Revisited’: eDiscovery Case Law

In Rio Tinto Plc v. Vale S.A., 14 Civ. 3042 (RMB)(AJP) (S.D.N.Y. Mar. 2, 2015), New York Magistrate Judge Andrew J. Peck approved the proposed protocol for technology assisted review (TAR) presented by the parties, but was careful to note that “the Court’s approval ‘does not mean. . . that the exact ESI protocol approved here will be appropriate in all [or any] future cases that utilize [TAR].’”

Judge’s Opinion

Judge Peck began by stating that it had been “three years since my February 24, 2012 decision in Da Silva Moore v. Publicis Groupe & MSL Grp., 287 F.R.D. 182 (S.D.N.Y. 2012)” (see our original post about that case here), where he stated:

“This judicial opinion now recognizes that computer-assisted review [i.e., TAR] is an acceptable way to search for relevant ESI in appropriate cases.”

Judge Peck then went on to state that “[i]n the three years since Da Silva Moore, the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.” (Here are links to cases we’ve covered related to TAR in the last three years). He also referenced the Dynamo Holdings case from last year, calling it “instructive” in its approval of TAR, noting that the tax court ruled that “courts leave it to the parties to decide how best to respond to discovery requests”.

According to Judge Peck, the TAR issue still to be addressed overall “is how transparent and cooperative the parties need to be with respect to the seed or training set(s)”, commenting that “where the parties do not agree to transparency, the decisions are split and the debate in the discovery literature is robust”. While observing that the court “need not rule on the need for seed set transparency in this case, because the parties agreed to a protocol that discloses all non-privileged documents in the control sets”, Judge Peck stated:

“One point must be stressed — it is inappropriate to hold TAR to a higher standard than keywords or manual review. Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.”

While approving the parties’ TAR protocol, Judge Peck indicated that he wrote this opinion, “rather than merely signing the parties’ stipulated TAR protocol, because of the interest within the ediscovery community about TAR cases and protocols.” And, he referenced Da Silva Moore once more, stating “the Court’s approval ‘does not mean. . . that the exact ESI protocol approved here will be appropriate in all [or any] future cases that utilize [TAR]. Nor does this Opinion endorse any vendor . . ., nor any particular [TAR] tool.’”

So, what do you think? How transparent should the technology assisted review process be? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Rules on Dispute about Search Terms and Organization of Produced Documents: eDiscovery Case Law

In Lutzeier v. Citigroup Inc., 4:14-cv-00183-RLW (E.D. Mo. Feb. 2, 2015), Missouri District Judge Ronnie L. White ruled on two motions to compel discovery by the plaintiff, addressing (among other things) disagreement on the search terms to be used by the defendants and the lack of organization and labeling of the defendants’ production to date.

Case Summary

In this employment termination dispute, the plaintiff filed a motion to compel the defendants’ discovery in several areas, including asking the Court to order the defendants to add five categories of search terms, as follows:

(1) “Executive training” and/or “leadership development training program”;

(2) “PEP” and/or “program expenditure proposal” and/or “internal control”,

(3) “OCC,” “office of comptroller of currency,” “FRB,” “federal reserve board,” and/or “consent order”;

(4) “Insufficient assurance”; and

(5) “Whistleblower,” “retaliate,” “retaliation,” “SOX,” “Sarbanes Oxley,” and/or “Dodd Frank.”

The defendants claimed that the new categories of search terms were “so common and generic that they will return a significant volume of irrelevant documents that it is not sufficient to justify the additional burden”, maintaining that using the search protocol for “Fred,” “Lutzeier,” “LOIS,” “COSMOS,” and “Champney” would produce all of the relevant documents. The defendants also claimed that adding these additional search terms would produce an additional 555,909 documents and, therefore, the burden “greatly outweighs the likelihood that these searches will yield additional documents not already captured by Defendants’ search protocol.”

In the plaintiff’s second motion to compel, he complained that the defendants had produced in excess of 46,217 documents without providing any indication as to which documents are responsive to which of Plaintiff’s 58 requests for production. The defendants acknowledged that they did not organize and label their production, but argued that the ESI agreement dictates the method of production and further claimed that, even if Rule 34(b)(2)(E) controls, they had complied with its requirements as the document production was fully searchable, “which negates any need to organize the production”.

Judge’s Ruling

Judge White agreed that “the majority of the search terms suggested by Plaintiff are too generic and are likely to produce a large number of documents that are irrelevant to this case” and found that “the current search criteria adequately ensures that the proper documents that are relevant to Plaintiff’s causes of action are produced”. As a result, he denied the plaintiff’s request to add additional search criteria, except for the phrase “consent order”, because “there appears to be some confusion as to whether other consent orders exist that are relevant to this case”.

As for organization of the production, Judge White ruled that the method of the defendants’ production “complies with both the ESI agreement and with Rule 34”. Both parties relied on Venture Corp. Ltd. v. Barrett in their arguments, and Judge White held that the defendants “have complied with the requirements outlined there”, finding “that Defendants’ production is in a reasonably usable form or forms and/or the production is searchable, sortable and paired with relevant metadata.”

So, what do you think? What information should courts require to be able to rule on the relevance of search terms? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.