Sometimes, the Data You Receive Isn’t Ready to Rock and Roll: eDiscovery Best Practices

Having just encountered a similar situation with one of my clients, I thought this was a topic worth revisiting.  Just because data has been produced to you doesn’t mean that data is ready to “rock and roll”.

Here’s a case in point: I once worked with a client that received a multi-part production from the other side (via another party involved in the litigation, per agreement between the parties) that included image files, OCR text files and metadata (yes, the dreaded “load file” production).  The files that my client received were produced over several months to several other parties in the litigation.  The production contained numerous emails, each of which (of course) included an email sent date.  Can you guess which format the email sent date was provided in?  Here are some choices (using today’s date and 1:00 PM as an example):

  • 09/11/2017 13:00:00
  • 9/11/2017 1:00 PM
  • September 11, 2017 1:00 PM
  • Sep-11-2017 1:00 PM
  • 2017/09/11 13:00:00

The answer: all of them.

Because there were several productions to different parties with (apparently) different format agreements, my client didn’t have the option of requesting that the data be reproduced in a standard format.  Not only that, the name of the produced metadata field wasn’t consistent between productions – in about 15 percent of the documents the producing party named the field email_date_sent; in the rest, it was simply named date_sent.

What a mess, right?

If you know how to fix this issue, then – congrats! – you can probably stop reading.  Our client (both then and recently) didn’t know how.  Fortunately, at CloudNine, there are plenty of computer “geeks” to address problems like this (including me).

In the example above, we had to standardize the dates into a single format in a single field.  We used SQL queries to consolidate the data into one field, then used string commands and regular expressions to re-parse dates that didn’t fit a standard SQL date format.  For example, the date 2017/09/11 was re-parsed into 09/11/2017.
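To illustrate the re-parsing step, here’s a minimal sketch in Python (rather than the SQL we actually used) that tries each date format observed in the productions until one matches, then emits a single standard format.  The list of formats is illustrative – in a real project you’d extend it as new variants turn up in the data:

```python
import re
from datetime import datetime

# Candidate formats observed across the productions (illustrative list).
KNOWN_FORMATS = [
    "%m/%d/%Y %H:%M:%S",   # 09/11/2017 13:00:00
    "%m/%d/%Y %I:%M %p",   # 9/11/2017 1:00 PM
    "%B %d, %Y %I:%M %p",  # September 11, 2017 1:00 PM
    "%b-%d-%Y %I:%M %p",   # Sep-11-2017 1:00 PM
    "%Y/%m/%d %H:%M:%S",   # 2017/09/11 13:00:00
]

def normalize_date(raw: str) -> str:
    """Re-parse a date string in any known format into MM/DD/YYYY HH:MM:SS."""
    raw = re.sub(r"\s+", " ", raw.strip())  # collapse stray whitespace first
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

The same try-each-format-until-one-parses approach works in SQL (via CASE expressions and CONVERT/TO_DATE) – the point is that every value ends up in one consistent, sortable format.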

Getting the dates into a standard format in a single field not only enabled us to load that data successfully into the CloudNine platform, it also enabled us to then identify (in combination with other standard email metadata fields) duplicates in the collection based on those metadata fields.  As a result, we were able to exclude a significant percentage of the emails as duplicates, which wouldn’t have been possible before the data was converted and standardized.
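Once the dates were standardized, identifying duplicates became a matter of comparing metadata fields.  Here’s a hedged sketch of that idea – the field names (date_sent, email_from, etc.) are hypothetical stand-ins for whatever the load file actually contains:

```python
import hashlib

def dedup_key(doc: dict) -> str:
    """Build a deduplication key from standardized email metadata fields.
    Field names here are illustrative; actual load-file fields vary."""
    parts = [
        doc.get("date_sent", ""),               # already normalized to one format
        doc.get("email_from", "").lower().strip(),
        doc.get("email_to", "").lower().strip(),
        doc.get("subject", "").strip(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def find_duplicates(docs):
    """Return the documents whose metadata key was already seen."""
    seen, dupes = set(), []
    for doc in docs:
        key = dedup_key(doc)
        if key in seen:
            dupes.append(doc)
        else:
            seen.add(key)
    return dupes
```

Note that this only works after standardization – “9/11/2017 1:00 PM” and “2017/09/11 13:00:00” would otherwise produce different keys for the same email.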

Over the years, I’ve seen many examples where data (either from our side or the other side) needs to be converted.  It happens more than you think.  When that happens, it’s good to work with a solutions provider that has several “geeks” on their team that can provide that service.  Sometimes, having data that’s ready to “rock and roll” takes some work.

So, what do you think?  Have you received productions that needed conversion?  If so, what did you do?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Sure, No Keyword Before TAR, But What About Keyword Instead of TAR?: eDiscovery Best Practices

Last month, we discussed whether to perform keyword search culling before performing Predictive Coding/Technology Assisted Review (TAR) and, like many have concluded before (even a judge in FCA US, LLC v. Cummins, Inc.), we agree that you shouldn’t perform keyword search culling before TAR.  But, should TAR be performed instead of keyword search – in all cases?  Is TAR always preferable to keyword search?

I was asked that question earlier this week by a colleague, so I thought I would relay what I essentially told him.

Many attorneys that I have observed over the years have typically tried to approach keyword search this way: 1) Identify a bunch of potentially responsive terms, 2) string them together with OR operators in between (i.e., {term 1} OR {term 2}, etc.), 3) run the search, 4) add family members (emails and attachments linked to the files with hits) to the results, and 5) begin review.

If that’s the keyword search methodology you plan to use, then, yes, a sound TAR approach is preferable pretty much every time.  Sure, proportionality concerns can affect the decision, but I would recommend a sound approach over an unsound one every time.  Unfortunately, that’s still the approach a lot of attorneys use when it comes to keyword search.

However, it’s important to remember that the “A” in TAR stands for “Assisted” and that TAR is not just about the technology, it’s as much about the process that accompanies the technology.  A bad approach to using TAR will generally lead to bad results with the technology, or at least inefficient results.  “Good TAR” includes a sound process for identifying training candidates for the software, reviewing those candidates and repeating the process iteratively until the collection has been classified at a level that’s appropriate to meet the needs of the case.

What about keyword search?  “Good keyword search” also includes a sound process for identifying potentially responsive terms, using various mechanisms to refine those terms (which can include variations, at an appropriate level, that can also be responsive), performing a search for each term, testing the result set (to determine if the term is precise enough and not overbroad) and testing what was not retrieved (to determine what, if anything, might have been missed).  We covered some useful resources for testing and sampling earlier this week here.

Speaking of this week, apparently, this is my week for the “wayback machine” on this blog.  In early 2011, I described a defensible search approach for keyword search for which I created an acronym – “STARR”.  Not Ringo or Bart, but Search, Test, Analyze, Revise (if necessary), Repeat (the first four steps until precision and recall are properly balanced).  While you might think that “STARR” sounds a lot like “TAR”, I coined my acronym for the keyword search approach well before the TAR acronym became popular (just sayin’).

Regardless of whether you use STARR or TAR, the key is a sound approach.  Keyword search, performed with a sound approach, can still be an appropriate choice for many cases and document collections.

So, what do you think?  Do you think that keyword search still has a place in eDiscovery?  If not, why not?  Please share any comments you might have or if you’d like to know more about a particular topic.

To Keyword Cull or Not to Keyword Cull? That is the Question: eDiscovery Trends

We’re seeing a lot of discussion about whether to perform keyword searching before predictive coding.  We’ve even seen a recent case where a judge weighed in as to whether TAR with or without keyword searching is preferable.  Now, we have a new article published in the Richmond Journal of Law and Technology that weighs in as well.

In Calling an End to Culling: Predictive Coding and the New Federal Rules of Civil Procedure (PDF version here), Stephanie Serhan, a law student, looks at the 2015 Federal Rules amendments (particularly Rules 1 and 26(b)(1)) as justification for applying predictive coding “at the outset on the entire universe of documents in a case.”  Serhan concludes that doing so is “far more accurate, and is not more costly or time-consuming, especially when the parties collaborate at the outset.”

Serhan discusses the importance of timing to predictive coding and explains the technical difference between predictive coding at the outset of a case vs. predictive coding after performing keyword searches.  One issue with keyword culling that Serhan notes is that it “is not as accurate because the party may lose many relevant documents if the documents do not contain the specified search terms, have typographical errors, or use alternative phraseologies.”  Serhan assumes that those “relevant documents removed by keyword culling would likely have been identified using predictive coding at the outset instead.”

Serhan also takes a look at the impact on efficiency and cost between the two methods and concludes that the “actual cost of predictive coding will likely be substantially equal in both methods since the majority of the costs will be incurred in both methods.”  She also looks at TAR related cases, both before and after the 2015 Rules changes.

More and more people have concluded that predictive coding should be done without keyword culling, and with good reason.  Applying predictive coding to a set unaltered by keywords would likely be not only more accurate but also more efficient, as keyword searching requires its own methodology that includes testing of results (and of documents not retrieved) before moving on.  Unless there’s a need to limit the volume of collected data because of cost considerations, there is no need to apply keyword culling before predictive coding.

Culling that does make sense includes hash-based deduplication, elimination of clearly non-responsive domains, and other steps that remove clearly redundant or non-responsive ESI from the collection.  That’s a different type of culling altogether.
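To make that distinction concrete, here’s a minimal Python sketch of the kind of culling that does make sense – dropping exact-duplicate files by hash and skipping clearly non-responsive sender domains.  The (domain, bytes) tuples and the MD5 choice are simplifications for illustration; real collections carry far richer metadata:

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Hash a file's raw bytes; identical files produce identical hashes.
    MD5 is commonly used for eDiscovery dedup, though SHA-1/SHA-256 also work."""
    return hashlib.md5(data).hexdigest()

def cull(files, junk_domains):
    """Drop exact duplicates and items from clearly non-responsive domains.
    `files` is a list of (sender_domain, raw_bytes) tuples -- a simplified
    stand-in for a real collection."""
    seen, kept = set(), []
    for domain, data in files:
        if domain in junk_domains:
            continue  # clearly non-responsive sender domain
        h = file_hash(data)
        if h in seen:
            continue  # exact duplicate of a file already kept
        seen.add(h)
        kept.append((domain, data))
    return kept
```

Unlike keyword culling, this removes only content that is demonstrably redundant or demonstrably irrelevant – it can’t silently drop a responsive document the way an imperfect search term can.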

So, what do you think?  To keyword cull or not to keyword cull?  Please share any comments you might have or if you’d like to know more about a particular topic.

Court Determines TAR Without Keyword Search Culling First is Preferable: eDiscovery Case Law

In FCA US, LLC v. Cummins, Inc., No. 16-12883 (E.D. Mich., Mar. 28, 2017), Michigan District Judge Avern Cohn “rather reluctantly” decided a dispute between the plaintiff and defendant over whether the universe of electronic material subject to Technology Assisted Review (TAR) should first be culled by the use of search terms, agreeing with the plaintiff that “[a]pplying TAR to the universe of electronic material before any keyword search reduces the universe of electronic material is the preferred method.”

Case Background

In this dispute over the allocation of the cost incurred for an auto part that became the subject of a recall, the parties agreed on many issues relating to discovery, particularly electronic discovery.  However, one issue they couldn’t agree on was whether the universe of electronic material subject to TAR should first be culled by the use of search terms.  The plaintiff took the position that it should not, while the defendant took the position that a pre-TAR culling is appropriate.

Judge’s Ruling

Noting that the Court decides “rather reluctantly” to rule on the issue, Judge Cohn stated:

“Given the magnitude of the dispute and the substantial matters upon which they agree, the parties should have been able to resolve the discovery issue without the Court as decision maker. Be that as it may, having reviewed the letters and proposed orders together with some technical in-house assistance including a read of The Sedona Conference TAR Case Law Primer, 18 Sedona Con. J. ___ (forthcoming 2017), the Court is satisfied that FCA has the better postion (sic). Applying TAR to the universe of electronic material before any keyword search reduces the universe of electronic material is the preferred method. The TAR results can then be culled by the use of search terms or other methods.”

As a result, Judge Cohn agreed to enter the plaintiff’s proposed order regarding the TAR approach.

So, what do you think?  Should TAR be performed with no pre-search culling beforehand?  Should courts rule on a preferred TAR approach?  Please share any comments you might have or if you’d like to know more about a particular topic.