eDiscovery Daily Blog

eDiscovery Best Practices: First Name Searches Are Not Always Proper

I’ve worked with numerous clients over the years on search best practices that maximize recall without sacrificing precision, including the use of fuzzy and synonym searches to identify additional potentially responsive files and sampling to test the effectiveness of searches.  In several cases, the initial list of proposed search terms sent to me by the client has included the first names of several individuals as standalone terms.  Unfortunately, first names don’t always make the best search terms.

Why?  Because, in many cases, the first names are so common that they apply to many people, not just the individuals you are looking for.  Depending on the size of the collection, searching for names like “Bob”, “Bill”, “Cathy”, “Jim”, “Karen” or “Pat” could retrieve many additional files that mention people other than those specifically sought, potentially driving up review costs unnecessarily.

Another issue with first name searches is the potential variations in first names that must be included to ensure that retrieval is complete.  Take this name, for example:

“Billy Bob Byrd”

To adequately perform a first name search, your search might need to include the following: “Billy”, “Bill”, “William”, “WR” (for “William Robert”), “Bob”, “Bobby”, “Robert” and maybe even “BB” (or “BBB”).  Searching for all these terms could yield many additional hits that are probably not responsive, costing time and money to review.  While emails and other informal communications may just refer to him as “Billy Bob”, more formal communications such as financial documents would probably refer to him differently.  So, it’s important to include all potential variations, several of which could add considerably more false hits.

There is also the possibility that the name has another meaning.  For example, “Bill” can be a person’s name, but “bill” is another word for invoice (keep in mind that most search engines are case insensitive, so it doesn’t matter if it’s capitalized or not).  So, searching for “bill” as a person would also yield every instance where an invoice is referred to as a “bill”.
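To make the last two points concrete, here is a minimal Python sketch of a case-insensitive, whole-word search across the “Billy Bob Byrd” variations listed above.  It is only an illustration built on Python’s standard `re` module, not how any particular review platform works; the function names and the sample sentence are hypothetical.

```python
import re

# Variations of "Billy Bob Byrd" taken from the list discussed above.
VARIANTS = ["Billy", "Bill", "William", "WR", "Bob", "Bobby", "Robert", "BB", "BBB"]

def build_variant_pattern(variants):
    """Compile one case-insensitive, whole-word pattern covering every variant."""
    # \b word boundaries keep "Bob" from matching inside "Bobsled".
    ordered = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def find_variant_hits(text, pattern):
    """Return every matched term, in the order it appears in the text."""
    return [m.group(0) for m in pattern.finditer(text)]

pattern = build_variant_pattern(VARIANTS)

# Hypothetical sample text: two real references to the person and one invoice.
sample = "William R. Byrd signed it; Billy Bob approved. Please pay the bill."
hits = find_variant_hits(sample, pattern)
print(hits)
```

Because the match ignores case, the lowercase “bill” in the last sentence of the sample is returned right alongside the genuine name hits, which is exactly the kind of false hit described above.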

With that in mind, it’s important to get the complete names of the people you’re searching for, as well as any known nicknames, so that you can then make decisions on the best terms to use to retrieve the most hits for each person.  Consider these names:

  • Terry Bradshaw: “Terry” is a fairly common name, so I might opt to search for “Bradshaw” first and see what I get.  Or, to limit further, retrieve only documents where both “Terry” and “Bradshaw” are mentioned.
  • Jay Leno: Same here, “Jay” is common, “Leno” is more distinctive.
  • Jennifer Lopez: “Jennifer” is more common than “Lopez”, though both are fairly common.  I would search for “Lopez” first, but assuming that the client provided the nickname “JLo”, I would search for that alternative also (if not, that would hopefully fall out during review as an additional term to search for).
  • Shaquille O’Neal: This is one case where the first name is actually more unusual than the last name, so I might prefer to search for “Shaquille” and would also search for the nickname of “Shaq”.

Of course, there may be occasions where only the first name is mentioned in a document without the last name.  If you can, try to combine with some other criteria to refine the broad search for the first name, such as email address of the individual in question or email addresses of those most likely to be talking about that individual.
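As a rough sketch of that idea, the Python below keeps only documents that mention the first name and also involve a given email address as sender or recipient.  The document layout (`from`, `to`, `body`), the addresses and the function name are all assumptions made up for illustration; a real review platform would expose this through its own metadata filters.

```python
import re

def first_name_with_address(docs, first_name, addresses):
    """Return docs that mention first_name AND involve one of the addresses."""
    name_re = re.compile(r"\b" + re.escape(first_name) + r"\b", re.IGNORECASE)
    wanted = {a.lower() for a in addresses}
    results = []
    for doc in docs:
        # Sender plus all recipients, compared case-insensitively.
        participants = {a.lower() for a in [doc["from"], *doc["to"]]}
        if name_re.search(doc["body"]) and participants & wanted:
            results.append(doc)
    return results

# Hypothetical documents and addresses, for illustration only.
docs = [
    {"id": 1, "from": "tbradshaw@example.com", "to": ["ann@example.com"],
     "body": "Terry here, see attached."},
    {"id": 2, "from": "pat@example.com", "to": ["lee@example.com"],
     "body": "Terry from accounting called."},
    {"id": 3, "from": "ann@example.com", "to": ["tbradshaw@example.com"],
     "body": "Contract draft attached."},
]

matches = first_name_with_address(docs, "Terry", ["tbradshaw@example.com"])
matched_ids = [d["id"] for d in matches]
print(matched_ids)
```

In this made-up set, only the first document qualifies: the second mentions “Terry” but involves neither address, and the third involves the address but never mentions the name.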

What about the instances where both the first and last names are common?  What about my name, “Doug Austin”?  “Doug” isn’t an extremely common first name, but it’s somewhat common, and “Austin” is the name of a city.  Searching for either term by itself could be overbroad.  So, it makes sense to try to combine them.  Doing so in a phrase search, however, could be limiting, as searching for “Doug Austin” could miss occurrences of “Austin, Doug”.  Conducting the search as a proximity search (e.g., “Doug within 3 words of Austin”) will catch variations, regardless of order.
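Here is a minimal Python sketch of what a proximity search like that does.  Search engines differ in their proximity syntax and in exactly how they count words, so this assumes one simple convention: split the text into word tokens, drop punctuation, and measure the difference in token positions.  The function name is hypothetical.

```python
import re

def within_n_words(text, term_a, term_b, max_distance=3):
    """True if term_a and term_b occur within max_distance tokens, in either order."""
    # Simple tokenizer: runs of letters, digits and apostrophes; commas etc. are dropped.
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9']+", text)]
    a, b = term_a.lower(), term_b.lower()
    positions_a = [i for i, t in enumerate(tokens) if t == a]
    positions_b = [i for i, t in enumerate(tokens) if t == b]
    # abs() makes the check order-independent.
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

print(within_n_words("Prepared by Doug Austin", "Doug", "Austin"))
print(within_n_words("Attendees: Austin, Doug; Lopez, Jennifer", "Doug", "Austin"))
print(within_n_words("Doug drove to the city of Austin", "Doug", "Austin"))
```

Under that convention, the first two calls match (the names are adjacent in either order), while the third does not, because six token positions separate “Doug” from the city name.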

This is just one example of why keyword searching isn’t an exact science.  These aren’t necessarily hard and fast rules and each situation is different.  It’s important to randomly sample and test search terms to ensure an appropriate balance of recall and precision.  Of course, parties sometimes agree that it may be necessary to include first names as standalone terms, even when they are common and may retrieve a high number of additional files that are not responsive.  Even so, testing those terms before negotiating with opposing counsel can help you negotiate a more favorable set of terms.
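For the sampling step, here is a small Python sketch of how you might estimate the precision of a single term: draw a random sample of its hits, have a reviewer code each sampled document, and compute the responsive fraction.  It covers precision only (estimating recall also requires sampling the documents the term did not retrieve), and the function name, the `is_responsive` callback and the stand-in data are hypothetical.

```python
import random

def estimate_precision(hit_ids, is_responsive, sample_size, seed=0):
    """Estimate a term's precision from a simple random sample of its hits."""
    hit_ids = list(hit_ids)
    if not hit_ids:
        raise ValueError("the term retrieved no documents to sample")
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(hit_ids, min(sample_size, len(hit_ids)))
    responsive = sum(1 for doc_id in sample if is_responsive(doc_id))
    return responsive / len(sample), sample

# Stand-in for human review: pretend every fourth hit is responsive.
def fake_review(doc_id):
    return doc_id % 4 == 0

hits_for_bill = range(1000)  # hypothetical document IDs retrieved by "bill"
precision, sampled = estimate_precision(hits_for_bill, fake_review, sample_size=100)
print(f"Estimated precision from {len(sampled)} sampled hits: {precision:.0%}")
```

A low estimate for a standalone first name is the kind of evidence that can support narrowing the term (for example, by pairing it with the last name) before you sit down with opposing counsel.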

So, what do you think?  Do your search term lists include standalone first names?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
