Five Common Myths About Predictive Coding – eDiscovery Best Practices

March 11, 2013

During my interviews with various thought leaders (a list of which can be found here, with links to each interview), we discussed various aspects of predictive coding, including some of the perceived myths about it and what it means for the review process. I thought it would be a good idea to recap some of those myths and how they compare to the “reality” (at least as some of us see it). Or maybe just me. 🙂

1. Predictive Coding is New Technology

Actually, with all due respect to the various vendors that have their own custom predictive coding algorithms, the technology behind predictive coding as a whole is not new. Ever heard of artificial intelligence? Predictive coding, in fact, applies artificial intelligence to the review process. With all of the acronyms we use to describe predictive coding, here’s one more for consideration: “Artificial Intelligence for Review” or “AIR”. It may not catch on, but I like it.

Maybe attorneys would be more receptive to it if they understood it as artificial intelligence? As Laura Zubulake pointed out in my interview with her, “For years, algorithms have been used in government, law enforcement, and Wall Street. It is not a new concept.” With that in mind, Ralph Losey predicts that “The future is artificial intelligence leveraging your human intelligence and teaching a computer what you know about a particular case and then letting the computer do what it does best – which is read at 1 million miles per hour and be totally consistent.”

2. Predictive Coding is Just Technology

Treating predictive coding as just the algorithm that “reviews” the documents is shortsighted. Predictive coding is a process that includes the algorithm. Without a sound approach for identifying appropriate example documents from the collection, ensuring that educated, knowledgeable reviewers code those documents appropriately, and testing and evaluating the results to confirm success, the algorithm alone would simply be another case of “garbage in, garbage out” and doomed to fail.

As discussed by both George Socha and Tom Gelbmann during their interviews with this blog, EDRM’s Search project has published the Computer Assisted Review Reference Model (CARRM), which has taken steps to define that sound approach. Nigel Murray also noted that “The people who really understand computer assisted review understand that it requires a process.” So, it’s more than just the technology.

3. Predictive Coding and Keyword Searching are Mutually Exclusive

I’ve talked to some people who think that predictive coding and keyword searching are mutually exclusive, i.e., that you wouldn’t perform keyword searching on a case where you plan to use predictive coding. Not necessarily. Ralph Losey advocates a “multimodal” approach, describing it as: “more than one kind of search – using predictive coding, but also using keyword search, concept search, similarity search, all kinds of other methods that we have developed over the years to help train the machine. The main goal is to train the machine.”
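
To make the multimodal idea a bit more concrete, here’s a minimal sketch of one way keyword search can feed training: a search nominates candidate documents, which knowledgeable reviewers would then code before anything is used to train the machine. It assumes a toy in-memory collection; the helper name `keyword_seed_candidates` and the sample documents are hypothetical, and real review platforms have their own search and training workflows.

```python
import re


def keyword_seed_candidates(documents, keywords):
    """Hypothetical helper: return IDs of documents that hit any keyword.

    Hits are only *candidates* for the training set; a knowledgeable
    reviewer still has to code each one before it is used for training.
    """
    # Whole-word, case-insensitive pattern for each search term
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
                for k in keywords]
    return [doc_id for doc_id, text in documents.items()
            if any(p.search(text) for p in patterns)]


# Hypothetical toy collection
docs = {
    "DOC-001": "Q3 pricing agreement with the distributor",
    "DOC-002": "Lunch order for Friday",
    "DOC-003": "Draft Agreement, revised pricing terms",
}
print(keyword_seed_candidates(docs, ["pricing", "agreement"]))
# ['DOC-001', 'DOC-003']
```

Whole-word matching is a deliberate choice here: a search for “price” would not pull in “pricing”, which is exactly the kind of gap that concept or similarity search can help fill in a multimodal approach.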

4. Predictive Coding Eliminates Manual Review

Many people think of predictive coding as the death of manual review, with all attorney reviewers being replaced by machines. Actually, manual review is part of the predictive coding process in several ways, including: 1) reviewers with subject matter knowledge are needed to review and code a training set of documents for the technology; 2) after the process is run, samples are drawn from both sets (the included and the excluded documents) and reviewed to determine the effectiveness of the process; and 3) the resulting responsive set is generally reviewed to confirm responsiveness and to determine whether any documents are privileged. Without manual review to train the technology and verify the results, the process would fail.
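
The sampling step (item 2) can be sketched as follows, assuming simple random samples from each set and hypothetical counts that are not taken from any real matter. The function names are hypothetical, and the sketch computes only point estimates: the precision of the included set, the “elusion” rate (share of relevant documents) in the excluded set, and the recall those two figures imply. Actual validation protocols vary and typically add confidence intervals.

```python
import random


def draw_sample(doc_ids, n, seed=0):
    """Hypothetical helper: draw a simple random sample of n IDs for manual review."""
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced later
    return rng.sample(list(doc_ids), n)


def estimate_effectiveness(included_size, included_sample, included_hits,
                           excluded_size, excluded_sample, excluded_hits):
    """Hypothetical helper: point estimates only, no confidence intervals.

    *_sample is the number of documents reviewed from each set;
    *_hits is how many of those the reviewers coded as relevant.
    """
    precision = included_hits / included_sample  # relevant share of the included set
    elusion = excluded_hits / excluded_sample    # relevant share of the excluded set
    est_found = precision * included_size        # estimated relevant docs included
    est_missed = elusion * excluded_size         # estimated relevant docs left behind
    total = est_found + est_missed
    recall = est_found / total if total else None  # undefined if nothing relevant found
    return {"precision": precision, "elusion": elusion,
            "est_missed": est_missed, "recall": recall}


# Hypothetical matter: 10,000 included and 90,000 excluded documents,
# with 400 documents reviewed from each set
excluded_ids = [f"DOC-{i:06d}" for i in range(90_000)]
sample_ids = draw_sample(excluded_ids, 400)  # reviewers would code these 400
result = estimate_effectiveness(10_000, 400, 320, 90_000, 400, 8)
print(round(result["recall"], 3))  # 0.816
```

In this made-up example, 8 relevant documents in a 400-document sample of the excluded set suggests roughly 1,800 relevant documents left behind. Only human review of the samples can surface that figure.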

5. Predictive Coding Has to Be Perfect to Be Useful

Detractors note that predictive coding can miss plenty of responsive documents and is nowhere near 100% accurate. In one recent case, the producing party estimated that as many as 31,000 relevant documents may have been missed by the predictive coding process. However, they also estimated that a much more costly manual review would have missed as many as 62,000 relevant documents.
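
The comparison in that case comes down to simple arithmetic. Here is a minimal sketch using the two estimates quoted above (both upper bounds, “as many as,” rather than measured counts) and a hypothetical helper name:

```python
def relative_reduction(missed_tar, missed_manual):
    """Hypothetical helper: fraction by which TAR cuts the estimated missed documents."""
    return (missed_manual - missed_tar) / missed_manual


# Estimates cited for the case above
print(relative_reduction(31_000, 62_000))  # 0.5
```

In other words, by the producing party’s own estimates, predictive coding would have missed half as many relevant documents as the manual review, and at a lower cost.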

Craig Ball’s analogy about the two hikers who encounter an angry grizzly bear is apt: one hiker doesn’t have to outrun the bear, just the other hiker. Craig notes: “That is how I look at technology assisted review. It does not have to be vastly superior to human review; it only has to outrun human review. It just has to be as good or better while being faster and cheaper.”

So, what do you think? Do you agree that these are myths? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Daily Blog