eDiscovery Daily Blog

Searching for Individuals Isn’t as Straightforward as You Think – eDiscovery Best Practices

I’ve recently worked with a couple of clients who proposed search terms for key individuals that were a bit limited, so I thought this was an appropriate topic to revisit.

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names could be represented and if you don’t account for each one of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

When clients give me their preliminary search term lists to review, they routinely include names of individuals that they want to search for, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer to use a proximity search for individual names: it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches.  Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely they’re responsive to the name search.
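
To make the difference concrete, here is a minimal Python sketch of the three approaches.  It is an illustration only, not how any particular review tool works: the function names (tokenize, phrase_match, and_match, within) are hypothetical, and I'm assuming that "within 3 words" means the two terms sit at most three token positions apart, in either order.  Real search engines tokenize and count distance in their own ways, so test against your own platform.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """True only if the words of the phrase appear consecutively, in order."""
    tokens, words = tokenize(text), tokenize(phrase)
    n = len(words)
    return n > 0 and any(tokens[i:i + n] == words
                         for i in range(len(tokens) - n + 1))

def and_match(text, term_a, term_b):
    """True if both terms appear anywhere in the text."""
    tokens = set(tokenize(text))
    return term_a.lower() in tokens and term_b.lower() in tokens

def within(text, term_a, term_b, distance=3):
    """True if the terms occur at most `distance` token positions apart
    (assumed meaning of 'within N words'), in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

# Made-up sample documents for illustration.
docs = [
    "Jim Smith approved the budget.",
    "Smith, Jim",
    "Jim T. Smith signed the lease.",
]
for doc in docs:
    print(doc, "| phrase:", phrase_match(doc, "Jim Smith"),
          "| w/3:", within(doc, "Jim", "Smith"))
```

In this sketch, the phrase search only finds the first sample document, while the proximity search finds all three.  Likewise, and_match would hit on a sentence like "Doug will be traveling to the Austin office next week", but within would not, because the two words are more than three positions apart.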

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is called “Jim” by some of his friends and “Jimmy” by a few others.  Also, some documents may refer to him by his initials (e.g., “J.T. Smith”).  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If you don’t, it may take performing a search of just the last name and sampling several documents until you are able to determine that middle initial for searching (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).

Applying the proximity and name variation concepts to our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 Smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in OnDemand®, CloudNine Discovery’s online review tool.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.
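
For readers who like to see the logic spelled out, the combined search above can be sketched in Python.  This is a rough illustration under my own assumptions, not a description of how OnDemand or any other tool evaluates the query: the names TOKEN, JIM_VARIANTS, and any_within are hypothetical, "within 3 words" is assumed to mean at most three token positions apart, and the tokenizer is deliberately written to keep dotted initials such as "j.t." together as one token.

```python
import re

# Keep runs of dotted initials ("j.t.") as one token; otherwise split on
# anything that is not a letter or digit.  This tokenizer is an assumption.
TOKEN = re.compile(r"(?:[a-z]\.){2,}|[a-z0-9]+")

# Hypothetical variant list mirroring the search terms in the text.
JIM_VARIANTS = ["jim", "jimmy", "james", "j.t."]

def tokenize(text):
    """Lowercase the text and return its tokens."""
    return TOKEN.findall(text.lower())

def any_within(text, first_names, surname, distance=3):
    """True if any listed first-name variant occurs at most `distance`
    token positions from the surname, in either order."""
    tokens = tokenize(text)
    firsts = {name.lower() for name in first_names}
    first_pos = [i for i, t in enumerate(tokens) if t in firsts]
    last_pos = [i for i, t in enumerate(tokens) if t == surname.lower()]
    return any(abs(f - s) <= distance for f in first_pos for s in last_pos)

# Made-up sample documents for illustration.
for doc in ["Per J.T. Smith, the deal closed.",
            "Smith, James",
            "Jimmy Smith called this morning.",
            "Joe Smith called this morning."]:
    print(doc, "->", any_within(doc, JIM_VARIANTS, "Smith"))
```

One caveat this sketch makes visible: "J. T. Smith" (with a space between the initials) or "JT Smith" would not match the "j.t." term here, which is one more reason to sample documents and see how names actually appear in your collection.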

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is often best, though you should always test your terms (and variations of those terms) to strike the best balance between recall and precision.
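
One simple way to test a wildcard before you commit to it is to list every distinct word it would actually match in your collection.  The Python sketch below shows the idea for a trailing wildcard like "jim*".  The function names (wildcard_hits, unexpected_terms) and the sample text are hypothetical, and the prefix matching is my own simplification of how a wildcard behaves; your search tool may expand wildcards differently.

```python
import re

def wildcard_hits(text, pattern):
    """Return the distinct words in the text that a trailing-* wildcard
    such as 'jim*' would match, using simple prefix matching."""
    prefix = pattern.rstrip("*").lower()
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({t for t in tokens if t.startswith(prefix)})

def unexpected_terms(text, pattern, wanted):
    """Return wildcard matches that are not in the list of wanted variations."""
    wanted = {w.lower() for w in wanted}
    return [t for t in wildcard_hits(text, pattern) if t not in wanted]

# Made-up sample text for illustration.
sample = "Jim Smith said Jimmy was out. Joe Smith jimmied the lock."
print(wildcard_hits(sample, "jim*"))
print(unexpected_terms(sample, "jim*", ["jim", "jimmy"]))
```

On the sample text, the wildcard picks up "jimmied" along with the two variations we actually wanted, which is exactly the kind of surprise that testing your terms is meant to catch.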

Of course, there’s another way to retrieve documents that mention key individuals – through their email addresses.  We’ll touch on that topic next week.

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.