eDiscovery Daily Blog

eDiscovery Searching: A Great Example of Why Search Results Need to Be Tested

 

In my efforts to stay abreast of current developments in eDiscovery (and also to identify great blog post ideas!), I subscribe to and read a number of different information sources.  That includes “web crawling” services that identify articles, press releases and other publications, such as the Pinhawk Law Technology Daily Digest, which is one of my favorite resources and always has interesting stories to read.  I also have a Google Alert set up to deliver stories on “e-Discovery” via a daily email.

So, I got a chuckle out of one of the stories that both sources (and probably others, as well) highlighted last week:

A+E, Discovery get ready to roll out

The story is about two of the biggest players in global TV, A+E Networks and Discovery Networks, rolling out their channels into India and Latin America, respectively.  The article goes on to discuss the challenges of launching these channels in markets with varying requirements and several languages and dialects.

This story has nothing to do with eDiscovery.

Why did it wind up in the list of eDiscovery stories returned by these two services?  Because the story title “A+E, Discovery get ready to roll out” retrieved a hit on “e-Discovery”.  Many search engines are set to ignore punctuation when searching, so a search for “e-Discovery” actually looks like a search for “e Discovery” to a search engine (keep in mind that searches are also usually case insensitive).  The same treatment applies to the text being searched, so a document with a title of “A+E, Discovery get ready to roll out” could actually be viewed by a search engine as “a e discovery get ready to roll out”, causing the document to be considered a “hit” for “e discovery”.
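To make the mechanics concrete, here is a minimal Python sketch of how that false hit can arise.  It is an illustration only, not how Pinhawk or Google actually index content: it assumes a simplified engine that lowercases text, treats every punctuation character as a word separator, and then looks for the query words in sequence.  The function names normalize and phrase_hit are made up for this example.

```python
import re

def normalize(text):
    """Lowercase the text and split it into words, dropping all punctuation."""
    # Assumption: the engine treats any non-alphanumeric character as a separator.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_hit(query, document):
    """Return True if the normalized query words appear consecutively in the document."""
    q = normalize(query)
    d = normalize(document)
    if not q:
        return False
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))

title = "A+E, Discovery get ready to roll out"

print(normalize("e-Discovery"))           # ['e', 'discovery']
print(normalize(title))                   # ['a', 'e', 'discovery', 'get', 'ready', 'to', 'roll', 'out']
print(phrase_hit("e-Discovery", title))   # True
```

Once the “+” and the comma are dropped, the “E” from “A+E” sits right next to “Discovery”, which is exactly the word sequence the query was reduced to.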

This is just one example of why searches can retrieve unexpected results, and why a defensible search process (such as the “STARR” approach outlined here) that involves testing and refining searches is vital to maximizing your search recall and precision.
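For readers who want to see what measuring recall and precision looks like in practice, here is a small Python sketch.  It is not the STARR approach itself, just the standard arithmetic behind the two terms, and the document IDs are hypothetical sample data: “retrieved” stands for what a search returned, and “relevant” for what a reviewer judged to be truly responsive in the tested set.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one search against a reviewed sample."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant  # documents that were both returned and relevant
    # Precision: what share of the returned documents were actually relevant?
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant documents did the search find?
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical sample: the search returned 4 documents, 3 of them relevant,
# and the reviewer found 5 relevant documents in total.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc1", "doc2", "doc3", "doc5", "doc6"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.75 0.6
```

A false hit like the TV story lowers precision, while a search that is too narrow lowers recall, which is why both numbers are worth checking each time a search is refined.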

BTW, this can happen with any search engine, so it’s not a reflection on either Pinhawk or Google.  Both are excellent resources that can occasionally retrieve non-relevant results, just like any other “web crawling” service.

So, what do you think?  Did you see this story crop up in the eDiscovery listings?  Have you encountered similar examples of search anomalies?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.
