eDiscovery Daily Blog

What’s in a Name? Potentially, a Lot of Permutations: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discussing whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on November 13, 2012 – when eDiscovery Daily was early into its third year of existence.  Back then, the use of predictive coding instead of keyword searching was very uncommon, as we had just had our first case (Da Silva Moore) approving the use of technology assisted review earlier in the year.  Now, the use of predictive coding technologies and approaches is much more common, but many (if not most) attorneys still use keyword searching for most cases.  With that in mind, let’s talk about considerations for searching names – they’re still valid close to seven years later!  Enjoy!

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names could be represented and if you don’t account for each one of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

Routinely, when clients give me their preliminary search term lists to review, they include names of individuals that they want to search for, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer to use a proximity search for individual names: it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches – Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely they’re responsive to the name search.
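To make the difference concrete, here is a minimal Python sketch of the three search styles compared above.  It is only an illustration, not how any review platform actually implements searching: the function names (phrase_match, and_match, within) are made up for this example, and I’m assuming that “within 3 words” means the two words’ positions differ by at most 3 once punctuation is stripped.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(text, phrase):
    """True if the words of `phrase` appear consecutively and in order."""
    words, target = tokenize(text), tokenize(phrase)
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def and_match(text, a, b):
    """True if both words appear anywhere in the text."""
    words = set(tokenize(text))
    return a.lower() in words and b.lower() in words


def within(text, a, b, distance=3):
    """True if some occurrence of `a` is at most `distance` words from `b`."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


# The phrase search misses the reversed and middle-initial forms...
print(phrase_match("Smith, Jim", "Jim Smith"))      # False
print(phrase_match("Jim T. Smith", "Jim Smith"))    # False

# ...while the proximity search catches both.
print(within("Smith, Jim", "jim", "smith"))         # True
print(within("Jim T. Smith", "jim", "smith"))       # True

# AND matches a travel note; proximity does not (the words are 4 apart).
trip = "Doug is traveling to Austin next week"
print(and_match(trip, "doug", "austin"))            # True
print(within(trip, "doug", "austin"))               # False
```

Note that a very short sentence (say, “Doug is in Austin”) would still satisfy the proximity test in this sketch, which is why proximity makes a false hit less likely rather than impossible.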

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is also called “Jim” by some of his other friends and “Jimmy” by a few of his other friends.  Also, some documents may refer to him by his initials – i.e., “J.T. Smith”.  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If so, it may take performing a search of just the last name and sampling several documents until you are able to determine that middle initial for searching (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).
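If you have the sampled documents as plain text, the sampling step can be roughed out in a few lines of Python.  This is just a sketch under my own assumptions: the helper name words_before and the two-word window are hypothetical, the sample sentences are invented, and it only looks at words that come before the last name (so it won’t catch “Smith, Jim” style references).

```python
import re
from collections import Counter


def words_before(documents, last_name, window=2):
    """Count the words that appear just before `last_name` in each document.

    Periods are dropped, so an initial like "T." is counted as "t".
    """
    counts = Counter()
    target = last_name.lower()
    for doc in documents:
        raw = re.findall(r"[a-z0-9.]+", doc.lower())
        words = [w.replace(".", "") for w in raw if w.strip(".")]
        for i, word in enumerate(words):
            if word == target:
                # Tally up to `window` words immediately preceding the name.
                counts.update(words[max(0, i - window):i])
    return counts


sample = [
    "Jim T. Smith approved the budget.",
    "Ask JayDog Smith about it.",
    "James T. Smith signed the agreement.",
]
for word, count in words_before(sample, "Smith").most_common():
    print(word, count)
```

A word like “t” showing up repeatedly suggests a middle initial worth adding to the search, and an oddball like “jaydog” is a nickname candidate to check by eye.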

Applying the proximity and name variation concepts to our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in our CloudNine Review platform.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.
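For readers who like to see the logic spelled out, here is a rough Python equivalent of that combined search.  It is a sketch only, not the CloudNine implementation: the any_within helper is a name I made up, “w/3” is assumed to mean a position difference of at most 3 words, and periods are stripped so that “J.T.” is treated as the single token “jt”.

```python
import re


def tokenize(text):
    """Lowercase and split into words; drop periods so "J.T." becomes "jt"."""
    raw = re.findall(r"[a-z0-9.]+", text.lower())
    return [w.replace(".", "") for w in raw if w.strip(".")]


def any_within(text, variants, anchor, distance=3):
    """True if any name variant is at most `distance` words from `anchor`."""
    words = tokenize(text)
    wanted = {v.lower().replace(".", "") for v in variants}
    target = anchor.lower()
    anchor_pos = [i for i, w in enumerate(words) if w == target]
    variant_pos = [i for i, w in enumerate(words) if w in wanted]
    return any(abs(i - j) <= distance
               for i in variant_pos for j in anchor_pos)


# Mirrors: (jim OR jimmy OR james OR "j.t.") w/3 smith
JIM_VARIANTS = ["jim", "jimmy", "james", "j.t."]

print(any_within("Smith, James", JIM_VARIANTS, "smith"))              # True
print(any_within("Per J.T. Smith, we are done.", JIM_VARIANTS, "smith"))  # True
print(any_within("Joe Smith jimmied the lock.", JIM_VARIANTS, "smith"))   # False
```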

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is usually best.
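Here is a quick Python illustration of that wildcard problem.  It is a sketch with made-up helper names (wildcard_hits, explicit_hits) and an invented sample sentence, using the standard library’s fnmatchcase to stand in for a search engine’s “jim*” wildcard; real platforms may expand wildcards differently.

```python
import re
from fnmatch import fnmatchcase


def wildcard_hits(text, pattern):
    """Return the distinct words in `text` that match a wildcard pattern."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if fnmatchcase(w, pattern)})


def explicit_hits(text, terms):
    """Return the distinct words in `text` that are in an explicit term list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    return sorted(set(words) & wanted)


text = "Joe Smith jimmied the lock before Jimmy Smith arrived."

print(wildcard_hits(text, "jim*"))             # ['jimmied', 'jimmy']
print(explicit_hits(text, ["jim", "jimmy"]))   # ['jimmy']
```

The wildcard picks up “jimmied” along with “Jimmy”, while the explicit list only returns the variation I actually asked for.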

Next week, we will talk about another way to retrieve documents that mention key individuals – through their email addresses.  Same bat time, same bat channel!

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.