eDiscovery Daily Blog

You May Be a User of Predictive Coding Technology and Not Realize It: eDiscovery Trends

At the Houston ACEDS luncheon/TAR panel last week, we asked the audience a few questions to gauge their understanding of and experience with Technology Assisted Review (TAR).  Some of the questions (like “have you used TAR on a case?”) were obvious.  Others might not have been so obvious.

Like, “do you watch movies and TV shows on Netflix or Amazon Prime?”  Or, “do you listen to music on Pandora or Spotify?”

So, why would we ask a question like that on a TAR panel?

Because those sites are examples of artificial intelligence and supervised machine learning in action.

But first, this week’s eDiscovery Tech Tip of the Week is about Boolean Searching.  When performing searches, the ability to combine multiple criteria into a single search is key to achieving a proper balance of recall and precision.  Using OR operators between search terms expands recall by retrieving documents that meet ANY of the criteria, while using AND or AND NOT operators improves precision by retrieving only documents that include all of the terms (AND) or that exclude certain terms (AND NOT).
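To see that trade-off in numbers, here is a minimal sketch that runs OR, AND and AND NOT searches over a tiny collection and scores each result for recall and precision.  The documents, the responsiveness labels, the naive whitespace tokenizer and the function names are all hypothetical, made up for illustration; a real review platform indexes and parses text far more carefully.

```python
# Hypothetical mini-collection: document ID -> text (invented for illustration).
DOCS = {
    1: "contract breach damages",
    2: "contract renewal schedule",
    3: "breach notice sent to vendor",
    4: "holiday party schedule",
    5: "contract breach draft",
}
# Hypothetical "ground truth": the documents a reviewer would call responsive.
RESPONSIVE = {1, 3, 5}


def terms(doc_id):
    """Lower-cased set of words in a document (naive whitespace tokenizer)."""
    return set(DOCS[doc_id].lower().split())


def search_or(words):
    """Documents containing ANY of the words."""
    return {d for d in DOCS if terms(d) & set(words)}


def search_and(words):
    """Documents containing ALL of the words."""
    return {d for d in DOCS if set(words) <= terms(d)}


def search_and_not(words, excluded):
    """Documents containing all of `words` and none of `excluded`."""
    return {d for d in search_and(words) if not terms(d) & set(excluded)}


def recall_precision(hits):
    """Recall = share of responsive docs found; precision = share of hits that are responsive."""
    found = hits & RESPONSIVE
    recall = len(found) / len(RESPONSIVE)
    precision = len(found) / len(hits) if hits else 0.0
    return recall, precision


for label, hits in [
    ("contract OR breach", search_or(["contract", "breach"])),
    ("contract AND breach", search_and(["contract", "breach"])),
    ("contract AND NOT renewal", search_and_not(["contract"], ["renewal"])),
]:
    r, p = recall_precision(hits)
    print(f"{label:<26} hits={sorted(hits)} recall={r:.2f} precision={p:.2f}")
```

In this toy set, the OR search finds every responsive document but also pulls in a non-responsive one, while the AND and AND NOT searches return only responsive documents at the cost of missing one.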

Grouping those parameters properly is important as well.  My first name is Dozier, so a search for my name could be represented as Doug or Douglas or Dozier and Austin, or it could be represented as (Doug or Douglas or Dozier) and Austin.  One of them is right.  Guess which one!  Regardless, Boolean searching is an important part of efficiently searching for and retrieving documents to meet discovery requirements.
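For anyone who wants to check their guess, here is a small sketch that evaluates both versions of the name search against a handful of hypothetical documents.  It leans on Python’s own operator precedence, where `and` binds tighter than `or`; many search engines treat AND and OR the same way, but others may differ, so check your platform’s documentation.  The documents and function names are made up for illustration.

```python
# Hypothetical documents (invented for illustration).
DOCS = {
    "a": "Doug Austin spoke at the luncheon",
    "b": "Douglas Brown approved the budget",
    "c": "Dozier Austin signed the letter",
    "d": "Doug Smith called about the invoice",
    "e": "Austin office lease renewal",
}


def has(text, word):
    """True if `word` appears as a whole word in `text` (case-insensitive)."""
    return word.lower() in text.lower().split()


def matches_ungrouped(text):
    # No parentheses: because `and` binds tighter than `or`, this is evaluated as
    # Doug OR Douglas OR (Dozier AND Austin), so any Doug or Douglas matches.
    return has(text, "doug") or has(text, "douglas") or has(text, "dozier") and has(text, "austin")


def matches_grouped(text):
    # Parentheses force the intended meaning: (Doug OR Douglas OR Dozier) AND Austin.
    return (has(text, "doug") or has(text, "douglas") or has(text, "dozier")) and has(text, "austin")


ungrouped_hits = sorted(k for k, t in DOCS.items() if matches_ungrouped(t))
grouped_hits = sorted(k for k, t in DOCS.items() if matches_grouped(t))
print("ungrouped:", ungrouped_hits)
print("grouped:  ", grouped_hits)
```

The ungrouped query also returns the documents about Douglas Brown and Doug Smith, which have nothing to do with anyone named Austin; the grouped query returns only the two documents that mention both a first-name variant and Austin.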

To see an example of how Boolean Searching is conducted using our CloudNine platform, click here (requires a BrightTALK account, which is free).

Anyway, back to the topic of the day.  Let’s take Pandora, for example.  I was born in the 1960s – yes, I look GREAT for my age :o) – so I’m a fan of classic rock.  Pandora is a site where you can set up “stations” based on your favorite artists.  If you’re a fan of classic rock and you were born in the 1960s, you probably love an artist like Jimi Hendrix.  Right?

Well, I do, and I have a Pandora account, so I set up a Jimi Hendrix “station”.  But Pandora doesn’t just play Jimi Hendrix on that station; it also plays songs by other artists in a similar genre that it thinks I might like.  Artists like Stevie Ray Vaughan (The Sky Is Crying), Led Zeppelin (Kashmir), The Doors (Peace Frog) and Ten Years After (I’d Love to Change the World).  For each song, you can listen to it, skip it, or give it a “thumbs up” or “thumbs down” (for the record, I wouldn’t give any of the above songs a “thumbs down”).  If you give a song a “thumbs up”, you’re more likely to hear it again, and if you give it a “thumbs down”, you’re less likely to hear it again (at least in theory).

Does something sound familiar about that?

You’re training the system.  Pandora uses the feedback you give it to (hopefully) deliver more of the songs you like and fewer of the songs you don’t, improving your listening experience.  One nice thing about it is that you get to hear songs or artists you may not have heard before and learn to enjoy them as well (that’s how I became a fan of The Black Keys, for example).
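As a toy illustration of that feedback loop (emphatically not Pandora’s actual algorithm, which this post doesn’t describe), here is a sketch in which each song on a station carries a selection weight that a thumbs up doubles and a thumbs down halves.  The track names, the doubling/halving factors and the function names are all hypothetical.

```python
import random

# Hypothetical station: every track starts with the same selection weight.
weights = {"Track A": 1.0, "Track B": 1.0, "Track C": 1.0, "Track D": 1.0}


def feedback(song, thumbs_up):
    """Double a song's weight on a thumbs up, halve it on a thumbs down (arbitrary factors)."""
    weights[song] *= 2.0 if thumbs_up else 0.5


def play_probability(song):
    """Chance that `song` is picked next, proportional to its weight."""
    return weights[song] / sum(weights.values())


def next_song(rng):
    """Pick the next song at random, weighted by the listener's feedback so far."""
    songs = list(weights)
    return rng.choices(songs, weights=[weights[s] for s in songs], k=1)[0]


feedback("Track B", thumbs_up=True)
feedback("Track C", thumbs_up=False)
for song in weights:
    print(f"{play_probability(song):.2f}  {song}")
print("Next up:", next_song(random.Random(42)))
```

After one thumbs up and one thumbs down, Track B is four times as likely to play next as Track C; more feedback keeps shifting the odds toward what the listener likes.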

If you watch a show or movie on Netflix and log in sometime afterward, Netflix will suggest shows for you to watch based on what you’ve viewed previously (especially if you rated what you watched highly).

That’s what supervised machine learning is and what a predictive coding algorithm does.  “Thumbs up” is the same as marking a document responsive; “thumbs down” is the same as marking a document non-responsive.  The more documents (or songs or movies) you classify, the more likely you are to receive relevant and useful documents (or songs or movies) going forward.
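To make the analogy concrete, here is a minimal sketch of supervised learning applied to document review: a reviewer’s responsive/non-responsive calls train a classifier, which then ranks the unreviewed documents.  It uses multinomial Naive Bayes with add-one smoothing, a textbook classifier chosen only because it fits in a few lines; it is not presented as the algorithm behind any particular predictive coding product.  The training documents, labels and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical reviewer decisions: 1 = responsive ("thumbs up"), 0 = non-responsive ("thumbs down").
TRAINING = [
    ("merger pricing terms discussed with acquirer", 1),
    ("draft merger agreement pricing schedule", 1),
    ("acquirer due diligence request", 1),
    ("office holiday party rsvp", 0),
    ("parking garage closed friday", 0),
    ("lunch order for friday", 0),
]
UNREVIEWED = [
    "revised pricing for the merger",
    "friday lunch and parking update",
    "acquirer questions on diligence",
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def responsive_score(text, model):
    """Log-odds that `text` is responsive (positive = leans responsive)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_odds = math.log(doc_counts[1] / total_docs) - math.log(doc_counts[0] / total_docs)
    for label, sign in ((1, 1), (0, -1)):
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for word in tokenize(text):
            if word in vocab:  # words never seen in training carry no evidence
                log_odds += sign * math.log((word_counts[label][word] + 1) / denom)
    return log_odds


model = train(TRAINING)
ranked = sorted(UNREVIEWED, key=lambda t: responsive_score(t, model), reverse=True)
for text in ranked:
    print(f"{responsive_score(text, model):+.2f}  {text}")
```

Adding more labeled documents to `TRAINING` and retraining is the code equivalent of giving more thumbs up and thumbs down: the word counts, and therefore the rankings, reflect more of the reviewer’s judgment.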

When it comes to teaching the legal community about predictive coding, “I’d love to change the world, but I don’t know what to do”.  Maybe I can start by teaching people about Pandora!  So, you say you’ve never used a predictive coding algorithm before?  Maybe you have, after all.  :o)

Speaking of predictive coding, is it the same thing as TAR or not?  If you want to learn more about what TAR is and what it could also be, check out our webcast Getting Off the Sidelines and into the Game using Technology Assisted Review on Wednesday, April 25.  Tom O’Connor and I will discuss a lot of topics related to the use of TAR, including what TAR is (or what people think it is), considerations and challenges in using TAR, and how to get started using it.  To register, click here!

So, what do you think?  Have you used a predictive coding algorithm before?  Has your answer changed after reading this post?  :o)  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.