eDiscovery Daily Blog

eDiscovery Searching: Don’t Let Stop Words Stop You From Effective Searching

 

When providing search assistance to my clients and reviewing their proposed lists of search terms, one of the things I check is whether those terms contain any potential “stop” words that might affect the search results.  Stop words (also known as noise words) are words, such as “to,” “or,” and “not,” that are so common they are not considered useful in searches.

Search engines rely on indexes to find information quickly; these indexes are built and updated each time documents are loaded into the database.  To save time and space, stop words are not indexed and are ignored in indexed searches.  The advantage of excluding these words is smaller indexes and faster indexing and searching.
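To make that trade-off concrete, here is a minimal Python sketch of an inverted index that skips stop words at indexing time. The stop list, the sample documents, and the function names (`tokenize`, `build_index`) are hypothetical illustrations, not the behavior of any particular review platform; real tools use their own tokenizers and stop lists.

```python
import re
from collections import defaultdict

# Hypothetical stop list for illustration; real tools ship their own lists.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or",
              "the", "to", "under", "was"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs, stop_words=STOP_WORDS):
    """Map each indexed term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in stop_words:  # stop words are skipped at index time
                index[token].add(doc_id)
    return dict(index)

docs = {
    1: "The deed of trust was recorded in the county.",
    2: "Payment is due to the lender under the note.",
}
with_stop_list = build_index(docs)
without_stop_list = build_index(docs, stop_words=set())
print(sorted(with_stop_list))
print(len(with_stop_list), "terms vs", len(without_stop_list), "without a stop list")
```

On these two sample sentences the stop list cuts the index from 15 distinct terms to 8, which is the size and speed benefit described above; the cost is that words like “the,” “of,” and “to” can no longer be looked up at all.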

However, stop words have drawbacks.  If your searches often target common phrases, you may not be able to search with precision: you may either retrieve additional non-responsive results or (even worse) miss responsive ones.

Leave it to Craig Ball (who discussed this during his presentation at Law Tech Texas a couple of weeks ago and also referenced it in an article for Law Technology News) to identify the perfect phrase to illustrate the problem with stop words:

“To Be or Not to Be”

This famous line from Shakespeare’s Hamlet would not be indexed at all by most search engines, because every word in it is a typical stop word.
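A quick way to see the Hamlet problem is to run the phrase through a stop-word filter. This sketch assumes a small, hypothetical stop list that includes “be” and “not” (stop lists vary by tool), and `indexable_terms` is an illustrative name rather than any product’s function.

```python
import re

# Hypothetical stop list; "be" and "not" are assumed to be on it, as they are
# on many (but not all) stop lists.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or", "the", "to"}

def indexable_terms(text, stop_words=STOP_WORDS):
    """Return the tokens of text that survive stop-word filtering."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(indexable_terms("To Be or Not to Be"))
print(indexable_terms("To Be or Not to Be, that is the question"))
```

The first call returns an empty list, so the index holds nothing for a search on that phrase to match. Adding the rest of the line leaves only “that” and “question,” which are far less distinctive than the quote itself.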

If a quoted phrase in a search query includes a stop word, the results may include documents with any word in place of the stop word.  For example, a search for "deed of trust" might return documents containing the phrases "deed and trust" or "deed under trust."
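The "deed of trust" behavior can be reproduced with a small positional index. This sketch assumes one common approach, not how any specific product works: stop words are dropped from the index but still occupy a word position, so a stop word inside a quoted phrase acts as a one-word wildcard. The stop list, sample documents, and functions (`build_positional_index`, `phrase_search`) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop list for illustration; real tools ship their own.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or", "the", "to", "under"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_positional_index(docs):
    """Map term -> doc_id -> positions. Stop words are not stored, but they
    still consume a position, so a gap remains where each one appeared."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            if token not in STOP_WORDS:
                index[token][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return IDs of docs matching the phrase; each stop word in the
    phrase acts as a wildcard for exactly one word."""
    terms = [(offset, t) for offset, t in enumerate(tokenize(phrase))
             if t not in STOP_WORDS]
    if not terms:
        return set()  # nothing indexable is left to search for
    first_offset, first_term = terms[0]
    hits = set()
    for doc_id, positions in index.get(first_term, {}).items():
        for start in positions:
            base = start - first_offset  # where the phrase would begin
            if all(base + off in index.get(t, {}).get(doc_id, [])
                   for off, t in terms[1:]):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "Record the deed of trust with the clerk.",
    2: "A deed and trust agreement was signed.",
    3: "The deed under trust is void.",
    4: "Trust the deed.",
}
index = build_positional_index(docs)
print(phrase_search(index, "deed of trust"))
print(phrase_search(index, "to be or not to be"))
```

Here the "deed of trust" search returns documents 1, 2, and 3, because "and" and "under" sit in the gap that "of" leaves. Document 4 contains both indexed words but not in phrase order, so it does not match, and a phrase made up entirely of stop words returns nothing.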

Some search tools can provide the list of stop words they use, so that you can adjust your searches accordingly.  Some even allow the list to be modified, so, depending on the requirements of your case, you may be able to remove certain stop words (or add others) to change how the data is indexed.  If the search tool allows this, make the change before loading and indexing documents, or make sure you can reindex the data if documents are already loaded.
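Once you have a tool’s stop list, checking proposed search terms against it is easy to automate. This is a minimal sketch: `DEFAULT_STOP_WORDS` is a hypothetical stand-in for a list exported from your search tool, and `customize_stop_list` and `flag_stop_words` are illustrative helpers, not part of any product.

```python
import re

# Hypothetical default stop list standing in for one exported from a search tool.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "be", "in", "is", "not", "of",
                                "or", "the", "to", "under"})

def customize_stop_list(default, remove=(), add=()):
    """Return a new stop list with some words removed and others added."""
    return (set(default) - set(remove)) | set(add)

def flag_stop_words(search_terms, stop_words):
    """Map each proposed search term to the stop words it contains."""
    flagged = {}
    for term in search_terms:
        hits = [w for w in re.findall(r"[a-z0-9']+", term.lower()) if w in stop_words]
        if hits:
            flagged[term] = hits
    return flagged

terms = ["deed of trust", "foreclosure", "to be or not to be"]
print(flag_stop_words(terms, DEFAULT_STOP_WORDS))

# Case-specific list that keeps "of" searchable. Documents already indexed with
# the default list still lack "of", so they would have to be reindexed.
case_stop_words = customize_stop_list(DEFAULT_STOP_WORDS, remove={"of"})
print(flag_stop_words(terms, case_stop_words))
```

With the default list, both "deed of trust" and the Hamlet phrase are flagged; after removing "of," only the Hamlet phrase still contains stop words.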

When preparing a list of search terms, it’s important to remember that stop words exist and they could affect your search results.  Don’t let them stop you!

So, what do you think?  Have you encountered issues with stop words in your searches?  How have you addressed those issues?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
