eDiscovery Daily Blog

eDiscovery Trends: “Assisted” is the Key Word for Technology Assisted Review


As noted in our blog post entitled 2012 Predictions – By The Numbers, nearly all of the sets of eDiscovery predictions we reviewed (9 out of 10) predicted a greater emphasis on Technology Assisted Review (TAR) in the coming year.  It was one of our predictions, as well.  And, during all three days of LegalTech New York (LTNY) a couple of weeks ago, sessions addressed technology assisted review concepts and best practices.

While some equate technology assisted review with predictive coding, other technology approaches, such as conceptual clustering, also qualify as TAR and are increasing in popularity.  For purposes of this blog post, however, we will focus on predictive coding.

Over a year ago, I attended a Virtual LegalTech session entitled Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and wrote a blog post about it entitled What the Heck is “Predictive Coding”?  The speakers for the session were Jason R. Baron, Maura Grossman and Bennett Borden (Jason and Bennett have previously been thought leader interviewees on this blog).  The panel gave the best descriptive definition of predictive coding that I’ve seen yet:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”
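To make the “rank and cut” idea in that definition a little more concrete, here is a minimal Python sketch. It assumes each document already carries a responsiveness score between 0 and 1 from some predictive coding engine; the document IDs, scores, bucket names and the two cut-off values are all hypothetical, chosen only for illustration, and actual platforms set and report their cut-offs in their own ways.

```python
from typing import Dict, List, Tuple

def rank_documents(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order documents from most to least likely to be responsive."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def partition(ranked: List[Tuple[str, float]],
              responsive_cut: float,
              review_cut: float) -> Dict[str, List[str]]:
    """Cut the ranked list into three (hypothetical) categories.

    Scores at or above responsive_cut are treated as likely responsive,
    scores between review_cut and responsive_cut go to further review,
    and everything below review_cut is treated as likely non-responsive.
    """
    buckets: Dict[str, List[str]] = {
        "likely responsive": [],
        "needs further review": [],
        "likely non-responsive": [],
    }
    for doc_id, score in ranked:
        if score >= responsive_cut:
            buckets["likely responsive"].append(doc_id)
        elif score >= review_cut:
            buckets["needs further review"].append(doc_id)
        else:
            buckets["likely non-responsive"].append(doc_id)
    return buckets

if __name__ == "__main__":
    # Hypothetical scores for four documents.
    scores = {"DOC-001": 0.92, "DOC-002": 0.15, "DOC-003": 0.55, "DOC-004": 0.71}
    print(partition(rank_documents(scores), responsive_cut=0.7, review_cut=0.4))
```

Moving either cut-off trades review cost against the risk of missing responsive documents, which is one reason the cut points themselves need to be tested and defended.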

It’s very cool technology, capable of reviewing a document collection efficiently and accurately and saving costs without sacrificing review quality (in some cases, it yields even better results than traditional manual review).  However, one key phrase in the definition above can make or break the success of the predictive coding process: “based on human review of only a subset of the document collection”.

Key to the success of any review effort, whether linear or technology assisted, is knowledge of the subject matter.  For linear review, that knowledge usually produces high-quality review instructions which, assuming the reviewers follow them competently, result in a high-quality review.  For predictive coding, subject matter experts (SMEs) review a core subset of documents (typically known as a “seed set”) and make determinations about it; those determinations are what enable the technology to “predict” the responsiveness and importance of the remaining documents in the collection.  The more knowledgeable the SMEs who create the “seed set”, the more accurate the “predictions” will be.
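As a rough illustration of how SME decisions on a seed set drive the “predictions”, the sketch below trains a tiny naive Bayes text classifier on a hypothetical SME-coded seed set and then scores unreviewed documents. Naive Bayes is swapped in here purely because it is short and easy to read; it is not necessarily what any commercial predictive coding tool uses, and the seed documents, the `SeedSetModel` class and its method names are invented for this example.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple

def tokenize(text: str) -> List[str]:
    """Lower-case alphabetic terms; a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z]+", text.lower())

class SeedSetModel:
    """Toy multinomial naive Bayes trained on SME-coded seed documents (hypothetical, illustrative only)."""

    def __init__(self, seed_set: Iterable[Tuple[str, bool]]):
        self.doc_counts: Counter = Counter()
        self.term_counts = {True: Counter(), False: Counter()}
        for text, responsive in seed_set:
            self.doc_counts[responsive] += 1
            self.term_counts[responsive].update(tokenize(text))
        self.vocab = set(self.term_counts[True]) | set(self.term_counts[False])
        self.totals = {label: sum(c.values()) for label, c in self.term_counts.items()}

    def _log_score(self, tokens: List[str], label: bool) -> float:
        n_docs = sum(self.doc_counts.values())
        # Add-one (Laplace) smoothing so a label or term missing from the seed set never zeroes a score.
        score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
        denom = self.totals[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # terms never seen in the seed set carry no evidence here
                score += math.log((self.term_counts[label][tok] + 1) / denom)
        return score

    def probability_responsive(self, text: str) -> float:
        """Estimated probability that an unreviewed document is responsive."""
        tokens = tokenize(text)
        pos, neg = self._log_score(tokens, True), self._log_score(tokens, False)
        m = max(pos, neg)  # subtract the max before exponentiating to avoid underflow
        return math.exp(pos - m) / (math.exp(pos - m) + math.exp(neg - m))

if __name__ == "__main__":
    # Hypothetical seed set: (document text, SME determination of responsiveness).
    seed = [
        ("pricing agreement with competitor", True),
        ("fixed pricing discussion competitor meeting", True),
        ("office holiday party lunch", False),
        ("parking garage lunch menu", False),
    ]
    model = SeedSetModel(seed)
    for doc in ("Call with competitor about pricing", "Lunch party on Friday"):
        print(doc, round(model.probability_responsive(doc), 2))
```

Because the model knows only what the seed set teaches it, a mis-coded or unrepresentative seed set skews every score that follows, which is exactly why SME knowledge matters so much.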

And, as with other processes such as document searching, sampling the results (determining the appropriate sample size for the responsive and non-responsive groups, randomly selecting the samples and reviewing both groups to test the results) will enable you to determine how effectively the process coded the document set.  If sampling shows that the process yielded inadequate results, apply what you’ve learned from the sample review to create a more accurate “seed set” and re-categorize the document collection.  Sampling will enable you to defend the accuracy of the predictive coding process while still saving considerable review costs.
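Here is one way the sampling step might look in Python, offered as a sketch under stated assumptions rather than a prescribed protocol. The sample size uses the standard formula for estimating a proportion, with a finite population correction; the 95% confidence level (z = 1.96), the ±5% margin of error and the conservative p = 0.5 are common choices, not requirements. The function names and the `human_review` callable (a stand-in for your reviewers’ call on each sampled document) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Documents to sample to estimate a proportion within +/- margin.

    Defaults are assumptions: 95% confidence (z = 1.96), +/-5% margin,
    and p = 0.5, which gives the largest (most conservative) sample.
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # sample size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return min(population, math.ceil(n))

def draw_validation_samples(responsive_ids: Sequence[str],
                            non_responsive_ids: Sequence[str],
                            seed: int = 2012) -> Dict[str, List[str]]:
    """Randomly sample both predicted groups; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return {
        "responsive": rng.sample(list(responsive_ids), sample_size(len(responsive_ids))),
        "non_responsive": rng.sample(list(non_responsive_ids), sample_size(len(non_responsive_ids))),
    }

def observed_responsive_rate(sample: Sequence[str], human_review: Callable[[str], bool]) -> float:
    """Share of sampled documents that a human reviewer codes as responsive."""
    if not sample:
        return 0.0
    return sum(1 for doc_id in sample if human_review(doc_id)) / len(sample)
```

In the sample drawn from the predicted-responsive group you would hope to see a high observed rate (the predictions hold up); in the sample drawn from the predicted non-responsive group, a low one (few responsive documents slipped through). What counts as “high” or “low” enough is a judgment for the case at hand, not something the code decides.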

So, what do you think?  Have you used predictive coding in any of your reviews?  How did it work for you?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.