eDiscovery Daily Blog

To Keyword Cull or Not to Keyword Cull? That is the Question: eDiscovery Trends

We’re seeing a lot of discussion about whether to perform keyword searching before predictive coding.  We’ve even seen a recent case where a judge weighed in on whether technology assisted review (TAR) is preferable with or without keyword searching.  Now, a new article published in the Richmond Journal of Law and Technology weighs in as well.

In Calling an End to Culling: Predictive Coding and the New Federal Rules of Civil Procedure (PDF version here), Stephanie Serhan, a law student, looks at the 2015 Federal Rules amendments (particularly Rules 1 and 26(b)(1)) as justification for applying predictive coding “at the outset on the entire universe of documents in a case.”  Serhan concludes that doing so is “far more accurate, and is not more costly or time-consuming, especially when the parties collaborate at the outset.”

Serhan discusses the importance of timing to predictive coding and explains the technical difference between predictive coding at the outset of a case and predictive coding after performing keyword searches.  One problem with keyword culling that Serhan notes is that it “is not as accurate because the party may lose many relevant documents if the documents do not contain the specified search terms, have typographical errors, or use alternative phraseologies.”  Serhan assumes that those “relevant documents removed by keyword culling would likely have been identified using predictive coding at the outset instead.”
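To make that concern concrete, here is a minimal sketch of how a keyword cull can drop relevant documents.  Everything in it is invented for illustration: the four document IDs and their text, the set of "relevant" documents, and the single search term are all assumptions, and the whole-word match is only one simple way a keyword search might work, not how any particular review platform does it.

```python
import re

# Hypothetical mini-collection; IDs and text are invented for illustration.
documents = {
    "DOC-001": "Please review the merger agreement before Friday.",
    "DOC-002": "The merjer paperwork is attached.",              # typographical error
    "DOC-003": "We should finalize the acquisition deal soon.",  # alternative phrasing
    "DOC-004": "Lunch menu for the office party.",               # not relevant
}

# Hypothetical reviewer judgments of which documents are actually relevant.
relevant_ids = {"DOC-001", "DOC-002", "DOC-003"}

# Hypothetical agreed search term list.
search_terms = ["merger"]


def keyword_cull(docs, terms):
    """Return the IDs of documents containing at least one term (whole word, case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


retained = keyword_cull(documents, search_terms)
missed = relevant_ids - retained
recall = len(relevant_ids & retained) / len(relevant_ids)

print("Retained after cull:", sorted(retained))
print("Relevant but culled:", sorted(missed))
print(f"Recall of the cull: {recall:.2f}")
```

In this toy set, the misspelled document and the one that says "acquisition" instead of "merger" never reach the predictive coding stage, which is the loss Serhan describes.  Adding more terms or fuzzy matching narrows the gap, but each addition has to be tested in its own right.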

Serhan also compares the efficiency and cost of the two methods and concludes that the “actual cost of predictive coding will likely be substantially equal in both methods since the majority of the costs will be incurred in both methods.”  She also looks at TAR-related cases, both before and after the 2015 Rules changes.

More and more people have concluded that predictive coding should be done without keyword culling, and with good reason.  Applying predictive coding to a set unaltered by keywords would likely be not only more accurate but also more efficient, because keyword searching requires its own methodology, including testing of the results (and of the documents not retrieved), before moving on.  Unless there’s a need to limit the volume of collected data because of cost considerations, there is no need to apply keyword culling before predictive coding.

Culling that does make sense includes hash-based deduplication, elimination of clearly non-responsive domains and other activities that remove clearly redundant or non-responsive ESI from the collection.  That is a different type of culling from keyword culling.
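For readers less familiar with hash-based deduplication, the idea can be sketched in a few lines: compute a hash of each item's content and keep only the first item seen for each hash value.  The file names and contents below are hypothetical, and the choice of SHA-256 over the raw bytes is an assumption made for this example.  Processing tools differ in which hash algorithm they use and in which fields they hash (for email in particular), so treat this as an illustration of the principle rather than of any product's behavior.

```python
import hashlib


def dedupe_by_hash(files):
    """Keep the first item seen for each distinct content hash.

    `files` is an iterable of (name, content_bytes) pairs.
    Returns (unique_names, duplicates), where duplicates is a list of
    (duplicate_name, name_of_item_kept) pairs.
    """
    seen = {}        # hash digest -> name of the first item with that content
    duplicates = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return list(seen.values()), duplicates


# Hypothetical collection: b.msg is byte-for-byte identical to a.msg,
# while c.msg differs by a single character.
collection = [
    ("a.msg", b"Quarterly report attached."),
    ("b.msg", b"Quarterly report attached."),
    ("c.msg", b"Quarterly report attached!"),
]

unique, duplicates = dedupe_by_hash(collection)
print("Kept:", unique)
print("Removed as duplicates:", duplicates)
```

Only exact copies are removed, so nothing with different content is lost.  That is why this kind of culling does not carry the same risk as culling by search terms.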

So, what do you think?  To keyword cull or not to keyword cull?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.