eDiscovery Daily Blog

Sometimes, Your Wildcard May Not Be “Wild” Enough: eDiscovery Best Practices

On the very first day we launched this blog nearly six years ago (next Tuesday is our six-year anniversary), one of our first posts was called “Don’t Get ‘Wild’ with Wildcards”.  In it, we showed how a poorly constructed wildcard of “min*”, intended to retrieve variations like “mine”, “mines” and “mining”, actually retrieved over 300,000 files with hits, because 269 words in the English language begin with the letters “min” (such as “mink”, “mind”, “mint” and “minion”).  Sometimes, though, you have the opposite problem: your wildcard isn’t “wild” enough.

Last week a client of mine provided some search terms to me for review.  One of the searches he proposed included a wildcard term for depreciate* to reflect assets that depreciate.  See any problem with that term?

That wildcard would have picked up variations such as depreciates and depreciated, but would have missed other obvious variations like depreciating and, of course, depreciation, because those words drop the final “e” that the wildcard term requires.  Oops.
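To see the gap concretely, here is a minimal Python sketch that mimics a trailing wildcard using the standard library's `fnmatch` module. It is only an illustration: the function name `wildcard_hits` and the word list are made up for this example, and real review platforms may treat wildcards differently (case handling, stemming and so on), so check your own tool's syntax.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, words):
    """Return the words matching a shell-style wildcard pattern.

    Hypothetical helper for illustration; '*' matches zero or more characters.
    """
    return [w for w in words if fnmatchcase(w, pattern)]

# The variations we actually want the search to retrieve.
variations = ["depreciate", "depreciates", "depreciated",
              "depreciating", "depreciation"]

# Wildcard placed after the final "e": misses the "-ing" and "-ion" forms.
print(wildcard_hits("depreciate*", variations))
# ['depreciate', 'depreciates', 'depreciated']

# Wildcard placed one character earlier: catches all five variations.
print(wildcard_hits("depreciat*", variations))
```

Moving the wildcard one character earlier covers every variation in this list, but the shorter the stem, the more unrelated words it can sweep in, which is the “min*” problem all over again.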

So, how do you find the actual variations of the word you want?  One way, as we noted back in September 2010, is to list all of the words that begin with your search string.  Morewords.com is one site that shows such a list.  So, to get all 269 words beginning with “min”, look up “min” there; simply substitute any characters for “min” to see the words that start with those characters.  You can choose the variations you want and incorporate them into the search instead of the wildcard.  For example, use “(mine or mines or mining)” instead of “min*” to retrieve a more relevant result set.
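If you have a word list handy, the same approach can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a recipe: `words_with_prefix`, `build_or_query` and the tiny sample list are hypothetical, and the lowercase “or” operator simply mirrors the example above, so adjust the syntax to whatever your search tool expects.

```python
def words_with_prefix(prefix, word_list):
    """Return the sorted, de-duplicated words that start with the prefix."""
    prefix = prefix.lower()
    return sorted({w.lower() for w in word_list if w.lower().startswith(prefix)})

def build_or_query(terms):
    """Join the chosen terms into a single parenthesized OR expression."""
    return "(" + " or ".join(terms) + ")"

# Tiny stand-in for a full word list (a real one would have all 269 "min" words).
sample_words = ["mine", "mines", "mining", "mink", "mind", "mint", "minion"]

# Step 1: see everything the wildcard "min*" would match.
candidates = words_with_prefix("min", sample_words)

# Step 2: keep only the variations that are actually relevant.
relevant = {"mine", "mines", "mining"}
wanted = [w for w in candidates if w in relevant]

# Step 3: search for the explicit terms instead of the wildcard.
print(build_or_query(wanted))  # (mine or mines or mining)
```

The human judgment sits in step 2: the code only lists the candidates, and someone who knows the case still has to decide which ones belong in the search.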

However, if you don’t want to search through 269 words to get the ones you want, or if you placed your wildcard character so that not all of the desired terms even display, there’s another way.  As we discussed a couple of years ago, you can use a dictionary.

Dictionary.com, that is.  Type in the word that you want at the top of the form and find all of its uses (e.g., “the yellow sweater is mine”, which tells you that not all of the hits may be relevant to mining terms) and also variations of a term (e.g., depreciated, depreciating, depreciation).  You can even find synonyms of the word (e.g., reserve, excavate) on the left-hand side of the form (via Thesaurus.com) that might lead to additional terms you may want to include in your search.

Believe it or not, a poorly placed wildcard may sometimes not be “wild” enough.  If you want to make sure you cover all of the variations you need (and only those variations), use a dictionary.

So, what do you think? Do you use wildcards in your eDiscovery searches? If so, how do you check them to ensure that they are neither over-inclusive nor under-inclusive?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.