Searching

Have you considered the implications of time zones when it comes to your litigation needs?

by: Trent Livingston, Chief Technology Officer

Most of today’s legal technology platforms require that a time zone be selected when data is ingested. Or, in the case of forensic software, the time stamp is displayed with an offset based upon the device’s time zone setting. When conducting a review, however, the de facto time zone for the litigation is often determined ahead of time based upon subjective information, typically the region in which the primary custodian resides. Once that time zone is selected, everything is adjusted to it; it is “set in stone,” so to speak. In some cases this is fine, but in others it can complicate things, especially if you want to change the time zone mid-review.

Let’s start by understanding time zones, which immediately raises the question: how many time zones are there in the world? After all, it can’t be that many, right? Well, don’t start up your time machine just yet! Summarizing a Quora answer (https://www.quora.com/How-many-timezones-do-we-have-in-the-world), we arrive at the following confusing mess.

There are a total of 41 different time zones spanning the globe. Given that number, “shifting time” (so to speak) can be of the utmost importance when examining evidentiary data.

If everything is set to Eastern Standard Time but the software does not properly account for time zone changes, time stamps can be altered inconsistently, and consistency is what really matters! What happens if two of the parties to a matter are in New York while two are in Arizona? Arizona does not observe Daylight Saving Time. This could throw a set of timestamps off by an hour across roughly five months of the data set (based upon Daylight Saving Time rules). Communication responses that may have happened within minutes now seemingly occur an hour later (or earlier, depending on how you look at it). Forensic records could fall out of sync with other evidentiary data and communications or, worse yet, sworn testimony. The key is to ensure consistency to avoid confusion.
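That Daylight Saving skew is easy to demonstrate with Python’s standard-library `zoneinfo` module (Python 3.9+). This is purely an illustrative sketch with made-up message times, not output from any review platform:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

# The same two instants, recorded in UTC.
july_msg = datetime(2021, 7, 1, 18, 30, tzinfo=timezone.utc)     # DST in effect in New York
january_msg = datetime(2021, 1, 1, 18, 30, tzinfo=timezone.utc)  # DST not in effect

for zone in ("America/New_York", "America/Phoenix"):
    tz = ZoneInfo(zone)
    print(zone,
          july_msg.astimezone(tz).strftime("%H:%M %Z"),
          january_msg.astimezone(tz).strftime("%H:%M %Z"))
```

New York renders the July message at 14:30 EDT but the January one at 13:30 EST, while Phoenix shows 11:30 MST for both; the apparent gap between the two cities silently changes by an hour depending on the date.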

CloudNine’s ESI Analyst (ESIA) normalizes everything to Coordinated Universal Time (UTC) upon ingestion, leveraging the original time zone or offset. By doing this, ESIA can display time stamps in the time zone of the project manager’s choosing (either set at the project level or by the specific user’s account time zone setting). This allows the time stamp display of any evidence to be changed at any time, across an entire project, to the desired time zone, providing a dynamic view of time stamps. Not only can the time zone be changed during a review, it can also be set at export. All original metadata is stored and available during export, so that the adjusted time stamp can be leveraged for timelines while the original time stamp and time zone settings are preserved for evidentiary purposes.
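For the technically curious, the general normalize-then-display approach can be sketched in a few lines of Python (standard-library `zoneinfo`). This is an illustration of the technique only, not ESI Analyst’s actual implementation; the record layout and timestamps are hypothetical:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def ingest(raw_local: str, source_tz: str) -> dict:
    """Normalize a device-local timestamp to UTC, preserving the original values."""
    local = datetime.fromisoformat(raw_local).replace(tzinfo=ZoneInfo(source_tz))
    return {
        "utc": local.astimezone(ZoneInfo("UTC")),
        "original": raw_local,        # preserved for evidentiary purposes
        "original_tz": source_tz,
    }

def display(record: dict, review_tz: str) -> str:
    """Render the stored UTC instant in whatever zone the review currently requires."""
    return record["utc"].astimezone(ZoneInfo(review_tz)).strftime("%Y-%m-%d %H:%M %Z")

msg = ingest("2021-03-15 09:00", "Europe/London")  # hypothetical device-local stamp
print(display(msg, "America/New_York"))
print(display(msg, "America/Chicago"))  # switch the whole review's display zone at any time
```

Because only the display layer changes, the review can be re-pointed at a new time zone mid-project while the original metadata stays intact.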

When performing analysis of disparate data sets, this methodology allows users to view time stamps relative to a particular party involved in the investigation. For example, an investigation may involve multiple parties located in different time zones, and those parties may also travel to different countries. Adjusting everything to Eastern Time may show text messages arriving and being answered in the late hours of the day, without accounting for the fact that the user may have been abroad and actually responding during normal business hours.

While seemingly innocuous, it can make a big difference in how a jury perceives the action of the party, depending on the nature of the investigation.

As they say… “timing is everything!” especially when it comes to digital evidence in today’s modern era.

Now, where did I leave my keys to my DeLorean?

Learn more about CloudNine ESI Analyst and its ability to deduplicate, search, filter, and adjust time zones across all data types at once here.

Kroll Leverages ESI Analyst for Case Insights: CloudNine Podcasts

Without the right tools, sorting through a large dataset is akin to stumbling in the dark. Before deep-diving into voluminous data, legal teams need to know what to look for. The sooner those insights are found, the better. For years, attorneys uploaded data to traditional review platforms to give their clients and firms a head start. Since the platforms offered minimal searching tools, attorneys meticulously combed through mobile device data text by text. This process is not only time-consuming but also inefficient. Valuable case insights are easy to miss when hidden amongst other information.

CloudNine Senior Director, Rick Clark, kicks off the new 360 Innovate Podcast with an interview of Phil Hodgkins, Director of Data Insights and Forensics at Kroll. As a growing global practice, Kroll is well-versed in managing data-heavy projects involving compliance, investigations, and litigation. While conducting an internal investigation, Kroll learned how ESI Analyst’s capabilities surpassed those of two traditional review platforms. Through its various identification and visualization features, ESI Analyst surfaced more insights at a much faster rate. To learn how the Kroll team utilized ESI Analyst to strategically navigate a broad dataset, visit this link: https://cloudnine.com/webcasts/kroll-innovate/?pg=ediscoverydaily/searching/kroll-leverages-esi-analyst-for-case-insights-cloudnine-podcasts

Getting the Most out of Your Keyword Searches

Though a more basic searching technique, keyword searches allow professionals to identify documents containing specific words or phrases. Nowadays, keyword searches are considered inferior to their successor, predictive coding (TAR). In comparison to TAR, the “outdated” search method is more expensive and time-consuming. Keyword searches are also less predictable; when filtering through the same data set, they tend to yield fewer relevant results. Based on these flaws, some would argue that keyword searches are a dying technique. So, why bother talking about them at all? Though keyword searches have their flaws, they are far from obsolete. Some legal teams prefer to utilize manual review, recognizing it as a tried-and-true method. For example, in Coventry Capital U.S., LLC v. EEA Life Settlements, Inc., the plaintiffs sought in 2020 to compel the use of TAR in the fraud case, arguing that the parties’ search term process had been “protracted and contentious.” Judge Sarah L. Cave nonetheless declined to compel the inclusion of TAR. [1] Similar outcomes occurred in cases such as Hyles v. New York City (2016) and In re Viagra (Sildenafil Citrate) Prods. Liab. Lit. (2016). In both cases, the court refused to mandate the usage of TAR when the responding party demonstrated a clear preference for keyword searching. [2] With this knowledge in mind, it’s important to recognize that keyword searches are still effective when done right.

Five Tips for Effective Keyword Searches

  1. Good communication is crucial.

Consult your custodians before running your searches. Use those conversations to identify any specific terms or abbreviations that may be relevant to your review. You may also want to speak with an experienced advisor, whose expertise can assist you with the sampling and testing process. Advisors are a great way to save time and money for everyone involved.

  2. Create and test your initial set of terms.

Everyone has to start somewhere. Your initial search terms don’t have to be perfect. While constructing your list, estimate how many results you expect each term to yield. Once you’ve run your test, evaluate how the search results compare to your expectations. If you received significantly fewer results than anticipated, adjust the search terms as needed. You may have to refine your search list multiple times. Anticipate this possibility to avoid missing any deadlines.  [3]

  3. Limit searches that include wildcards and/or numbers.

When searching for words with slight differences, it’s better to search for each variation rather than use wildcards. For example, you should set up individual searches for “email” and “emails” instead of using “email*” as a search term. Numbers can also be a problem if not handled correctly (i.e. searching for the number 10 will show results for 100, 1,000, etc.). Make sure to place the number in quotes to avoid this issue.
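Why quoting the exact token matters can be shown with a generic Python sketch. This uses plain regular expressions on a handful of invented snippets, not the query syntax of any particular review platform:

```python
import re

docs = ["Invoice 100 attached", "see invoice 10", "emails sent",
        "emailed him yesterday", "his email bounced"]

# A bare substring search for 10 also sweeps in 100...
loose = [d for d in docs if "10" in d]

# ...while matching the exact token (the effect of quoting the number) does not.
exact = re.compile(r"\b10\b")
tight = [d for d in docs if exact.search(d)]

# Likewise, listing each variation explicitly beats the wildcard email*,
# which would also pull in "emailed".
variants = re.compile(r"\b(emails|email)\b")
hits = [d for d in docs if variants.search(d)]
print(loose, tight, hits)
```

The loose search returns both invoice snippets; the quoted-token search returns only the one actually mentioning 10, and the explicit variants match “emails” and “email” without dragging in “emailed.”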

  4. Count the characters.

Search terms with four or fewer characters are likely to yield false hits. Short words or abbreviations like HR or IT may be identified in longer, unrelated results. Filtering out the false hits requires extra review time and money.

  5. Know how to search for names properly.

Avoid searching for custodian names. Their names will most likely be attached to more documents and hits than expected or desired. When searching for non-custodians, place “w/2” between their first and last name. Doing so will catch variations of the full name. Finally, consider searching for nicknames to get even more results. Ask the client what nicknames they respond to before making your search term list. [4]
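The “w/2” connector is a proximity operator. A crude stand-in can be sketched in Python (a toy tokenizer, not any platform’s actual search engine), showing why it catches middle initials and surname-first orderings that an exact phrase search would miss:

```python
import re

def within(text: str, a: str, b: str, n: int = 2) -> bool:
    """Crude version of the 'a w/n b' proximity connector: True when
    tokens a and b occur within n tokens of each other."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

print(within("Jane Q. Doe signed the lease", "Jane", "Doe"))          # middle initial
print(within("Doe, Jane called at noon", "Jane", "Doe"))              # surname first
print(within("Jane met John, and later Doe arrived", "Jane", "Doe"))  # too far apart
```

The first two snippets match even though the phrase “Jane Doe” never appears verbatim; the third does not, because the names are more than two tokens apart.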

 

[1] Doug Austin, “Court Rules for Defendant on TAR and (Mostly) Custodian Disputes: eDiscovery Case Law,” eDiscovery Today, January 12, 2021.

[2] “How Courts Treat ‘Technology Assisted Review’ in Discovery,” Rivkin Radler, March 13, 2019.

[3] “Improving the effectiveness of keyword search terms,” E-discovery Consulting, November 11, 2021.

[4] Kathryn Cole, “Key Word Searching – What Is It? And How Do I Do It (Well)?,” All About eDiscovery, December 9, 2016.

Here’s a Terrific Listing of eDiscovery Workstream Processes and Tasks: eDiscovery Best Practices

Let’s face it – workflows and workstreams in eDiscovery are as varied as the organizations that conduct eDiscovery itself.  Every organization seems to do it a little bit differently, with a different combination of tasks, methodologies and software solutions than anyone else.  But, could a lot of organizations improve their eDiscovery workstreams?  Sure.  Here’s a resource (that you probably already know well) which could help them do just that.

Rob Robinson’s post yesterday on his terrific Complex Discovery site is titled The Workstream of eDiscovery: Considering Processes and Tasks and it provides a very comprehensive list of tasks for eDiscovery processes throughout the life cycle.  As Rob notes:

“From the trigger point for audits, investigations, and litigation to the conclusion of cases and matters with the defensible disposition of data, there are countless ways data discovery and legal discovery professionals approach and administer the discipline of eDiscovery.  Based on an aggregation of research from leading eDiscovery educators, developers, and providers, the following eDiscovery Processes and Tasks listing may be helpful as a planning tool for guiding business and technology discussions and decisions related to the conduct of eDiscovery projects. The processes and tasks highlighted in this listing are not all-inclusive and represent only one of the myriads of approaches to eDiscovery.”

Duly noted.  Nonetheless, the list of processes and tasks is comprehensive.  Here is the number of tasks for each process:

  • Initiation (8 tasks)
  • Legal Hold (11 tasks)
  • Collection (8 tasks)
  • Ingestion (17 tasks)
  • Processing (6 tasks)
  • Analytics (11 tasks)
  • Predictive Coding (6 tasks)*
  • Review (17 tasks)
  • Production/Export (6 tasks)
  • Data Disposition (6 tasks)

That’s 96 total tasks!  But, that’s not all.  There are separate lists of tasks for each method of predictive coding, as well.  Some of the tasks are common to all methods, while others are unique to each method:

  • TAR 1.0 – Simple Active Learning (12 tasks)
  • TAR 1.0 – Simple Passive Learning (9 tasks)
  • TAR 2.0 – Continuous Active Learning (7 tasks)
  • TAR 3.0 – Cluster-Centric CAL (8 tasks)

The complete list of processes and tasks can be found here.  While every organization has a different approach to eDiscovery, many have room for improvement, especially when it comes to exercising due diligence during each process.  Rob provides a comprehensive list of tasks within eDiscovery processes that could help organizations identify steps they could be missing in their processes.

So, what do you think?  How many steps do you have in your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Special Master Declines to Order Defendant to Use TAR, Rules on Other Search Protocol Disputes: eDiscovery Case Law

In the case In re Mercedes-Benz Emissions Litig., No. 2:16-cv-881 (KM) (ESK) (D.N.J. Jan. 9, 2020), Special Master Dennis Cavanaugh (U.S.D.J., Ret.) issued an order and opinion stating that he would not compel defendants to use technology assisted review (TAR), and instead adopted the search term protocol negotiated by the parties, with three areas of dispute resolved by his ruling.

Case Background

In this emissions test class action involving an automobile manufacturer, the plaintiffs proposed that the defendants use predictive coding/TAR, asserting that TAR yields significantly better results than either traditional human “eyes on” review of the full data set or the use of search terms.  The plaintiffs also argued that if the Court were to decline to compel the defendants to adopt TAR, the Court should enter its proposed Search Term Protocol.

The defendants argued that there is no authority for imposing TAR on an objecting party and that this case presented a number of unique issues that would make developing an appropriate and effective seed set challenging, such as language and translation issues, unique acronyms and identifiers, redacted documents, and technical documents. As a result, they contended that they should be permitted to utilize their preferred custodian-and-search term approach.

Judge’s Ruling

Citing Rio Tinto Plc v. Vale S.A., Special Master Cavanaugh quoted from that case in stating: “While ‘the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it’…, no court has ordered a party to engage in TAR over the objection of that party. The few courts that have considered this issue have all declined to compel predictive coding.”  Citing Hyles v. New York City (another case ruling by now retired New York Magistrate Judge Andrew J. Peck), Special Master Cavanaugh stated: “Despite the fact that it is widely recognized that ‘TAR is cheaper, more efficient and superior to keyword searching’…, courts also recognize that responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for producing their own electronically stored information.”

As a result, Special Master Cavanaugh ruled: “While the Special Master believes TAR would likely be a more cost effective and efficient methodology for identifying responsive documents, Defendants may evaluate and decide for themselves the appropriate technology for producing their ESI. Therefore, the Special Master will not order Defendants to utilize TAR at this time. However, Defendants are cautioned that the Special Master will not look favorably on any future arguments related to burden of discovery requests, specifically cost and proportionality, when Defendants have chosen to utilize the custodian-and-search term approach despite wide acceptance that TAR is cheaper, more efficient and superior to keyword searching. Additionally, the denial of Plaintiffs’ request to compel Defendants to utilize TAR is without prejudice to revisiting this issue if Plaintiffs contend that Defendants’ actual production is deficient.”

Special Master Cavanaugh also ruled on areas of dispute regarding the proposed Search Term Protocol, as follows:

  • Validation: Special Master Cavanaugh noted that “the parties have been able to reach agreement on the terms of Defendants’ validation process, [but] the parties are at an impasse regarding the level of validation of Plaintiffs’ search term results”, observing that “Plaintiffs’ proposal does not articulate how it will perform appropriate sampling and quality control measures to achieve the appropriate level of validation.” As a result, Special Master Cavanaugh, while encouraging the parties to work together to develop a reasonable procedure for the validation of Plaintiffs’ search terms, ruled: “As no articulable alternative process has been proposed by Plaintiffs, the Special Master will adopt Defendants’ protocol to the extent that it will require the parties, at Defendants’ request, to meet and confer concerning the application of validation procedures described in paragraph 12(a) to Plaintiffs, if the parties are unable to agree to a procedure.”
  • Known Responsive Documents & Discrete Collections: The defendants objected to the plaintiffs’ protocol to require the production of all documents and ESI “known” to be responsive as “vague, exceedingly burdensome, and provides no clear standard for the court to administer or the parties to apply”. The defendants also objected to the plaintiffs’ request for “folders or collections of information that are known to contain documents likely to be responsive to a discovery request” as “overly broad and flouts the requirement that discovery be proportional to the needs of the case.”  Noting that “Defendants already agreed to produce materials that are known to be responsive at the November status conference”, Special Master Cavanaugh decided to “modify the Search Term Protocol to require production of materials that are ‘reasonably known’ to be responsive.”  He also decided to require the parties to collect folders or collections of information “to the extent it is reasonably known to the producing party”, also requiring “the parties to meet and confer if a party believes a discrete document folder or collection of information that is relevant to a claim or defense is too voluminous to make review of each document proportional to the needs of the case.”

So, what do you think?  Should a decision not to use TAR negatively impact a party’s ability to make burden of discovery arguments?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Related to this topic, Rob Robinson’s Complex Discovery site published its Predictive Coding Technologies and Protocols Spring 2020 Survey results last week, which (as always) provides results on most often used primary predictive coding platforms and technologies, as well as most-often used TAR protocols and areas where TAR is most used (among other results).  You can check it out at the link directly above.

Case opinion link courtesy of eDiscovery Assistant.


Plaintiffs’ Failure to “Hurry” Leads to Denial of Motion to Compel: eDiscovery Case Law

Sorry, I couldn’t resist… ;o)

In Hurry Family Revocable Trust, et al. v. Frankel, No. 8:18-cv-2869-T-33CPT (M.D. Fla. Jan. 14, 2020), the Florida District Court judge denied the Plaintiffs’ Motion to Compel Production of Documents and Request for Sanctions, ruling the motion to be untimely, given that the extended discovery deadline had passed, and also rejected the plaintiffs’ argument that the defendant had willfully avoided producing certain emails.

Case Background

In this dispute involving a former employee of the plaintiffs and claims that he used their confidential information and trade secrets, the Court entered a Case Management and Scheduling Order (CMSO) in January 2019 establishing various deadlines, including a discovery deadline of July 26, 2019, and a trial date of February 3, 2020.  The CMSO warned the parties that “[t]he Court may deny as untimely all motions to compel filed after the discovery deadline.”  In May 2019, the plaintiffs filed a motion to modify the CMSO and the Court extended the discovery deadline to August 9, 2019, but also cautioned the parties, however, that it would “be disinclined to extend…the [discovery] deadline[ ] further.”  Nonetheless, the plaintiffs sought to modify the CMSO two more times – the second time after the discovery deadline on August 12, 2019 – but the court denied both motions, stating after the second one:

“The Court has already extended the discovery deadline in this case to August 9, 2019, at the Plaintiffs’ request. The Court has also repeatedly warned Plaintiffs that it would be disinclined to extend deadlines further. Yet Plaintiffs filed this third motion to modify the Case Management and Scheduling Order on August 12, 2019, after the extended discovery deadline had passed…. As for the documents that Plaintiffs claim Defendant has failed to produce, Plaintiffs were aware of those missing documents since August 6 and/or 7, 2019, and failed to file a motion to compel prior to the discovery deadline. As the Court advised in its Case Management and Scheduling Order, ‘[f]ailure to complete discovery within the time established by this Order shall not constitute cause for a continuance.’”

Roughly four months after the Court’s August 20 Order, the plaintiffs filed the instant motion to compel after receiving five emails from third parties that had not been produced by the defendant.  The plaintiffs requested an order directing that: (1) the defendant’s “email accounts, cloud storage, and digital devices” be subjected to a “third party search” for responsive documents at his expense; (2) “[Frankel] be precluded from testifying or offering evidence on issues related to categories of discovery withheld by [Frankel];” and (3) “adverse inferences be made against [Frankel] related to categories of discovery withheld by [Frankel].”

Judge’s Ruling

Noting that “Hurry waited to submit the instant motion until four months after the discovery deadline and only two months before trial”, the court stated: “Hurry’s proffered excuse for this extended delay is unpersuasive. When pressed on the matter at the hearing, Hurry conceded that it knew about the Koonce and FINRA emails by no later than early August 2019. It also admitted that it elected to place the instant motion on the ‘backburner’ while it dealt with its motion for summary judgment. Hurry’s evident lack of diligence in pursuing its motion to compel alone is fatal to that request.”

Continuing, the court stated: “Even were that not the case, Hurry has not shown that it is entitled to the relief it seeks. The central premise of its motion is that Frankel willfully avoided producing the Koonce and FINRA emails. In both his response and at the hearing, however, Frankel persuasively argued that his failure to produce these emails was not purposeful, but stemmed from the fact that the emails were not detected during the search Frankel conducted in connection with Hurry’s production requests. Frankel also noted he informed Hurry of the parameters of that search in advance, and Hurry did not object to those parameters.”

So, what do you think?  Should identification of new emails from third parties justify re-opening discovery?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.


It’s a Mistake to Ignore the Mistakes: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

These posts were originally published on September 22, 2010 and September 23, 2010, the third and fourth days of eDiscovery Daily’s existence (yep, I combined two into one for this throwback edition and updated the misspelling site resources to replace a couple of defunct ones with current ones).  It continues to amaze me how some of the attorneys I work with fail to account for potential misspellings and typos in ESI when developing a search strategy (especially requesting parties looking to maximize recall of potentially responsive ESI).  FWIW, since publishing these two blog posts, I had an actual case where searching for misspellings of the word “management” yielded additional responsive documents, so it works.  Enjoy!

How many times have you received an email sent to “All Employees” like this?  “I am pleased to announce that Joe Smith has been promoted to the position of Operations Manger.”

Do you cringe when you see an email like that?  I do.  I cringe even more when an email like that comes from me, which happens more often than I’d like to admit.

Of course, we all make mistakes.  And, forgetting that fact can be costly when searching for, or requesting, relevant documents in eDiscovery.  For example, if you’re searching for e-mails that relate to management decisions, can you be certain that “management” is spelled perfectly throughout the collection?  Unlikely.  It could be spelled “managment” or “mangement” and you would miss those potentially critical emails without an effective plan to look for them.

Finding Misspellings Using Fuzzy Searching

How do you find them if you don’t know how they might be misspelled?  Pretty much any eDiscovery application these days (including CloudNine products) supports fuzzy searching.  So, if you’re looking for someone named “Brian”, you can find variations such as “Bryan” or even “brain” – that could be relevant but were simply misspelled.  Fuzzy searching is the best way to broaden your search to include potential misspellings.
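To get a feel for how fuzzy matching works, here’s a small sketch using Python’s standard-library `difflib` (production fuzzy search engines typically use edit distance or n-gram indexes, but the idea is similar). The term list is hypothetical:

```python
from difflib import get_close_matches

# Hypothetical terms pulled from a collection's index.
index_terms = ["bryan", "brain", "brian", "brine", "briana", "bruce", "brief"]

# Terms sufficiently similar to "brian" become candidate misspellings to review.
hits = get_close_matches("brian", index_terms, n=6, cutoff=0.6)
print(hits)
```

Note that the hits include noise such as “brine” and “brief” alongside the genuine variants “bryan” and “brain”; that over-inclusiveness is exactly why the next section’s targeted misspelling lists can be a useful complement.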

Examples of Sites for Common Misspellings

However, another way to identify misspellings is to use a resource that tracks the most typical misspellings for common words and search for them explicitly.  The advantage of that is that you can pinpoint the likeliest misspellings while excluding other hits retrieved via fuzzy search that might be other terms altogether.  Here are a few sites you can check for common misspellings:

At Dumbtionary.com, you can check words against a list of over 10,000 misspelled words.  Simply type the correct word into the search box with a “plus” before it (e.g., “+management”) to get the common misspellings for that word.  You can also search for misspelled names and places.

Wikipedia has a list of common misspellings as well.  It breaks the list down by starting letter, as well as variations on 0-9 (e.g., “3pm” or “3 pm”).  You can go to the starting letter you want to search, then do a “find” on the page (by pressing Ctrl+F) and type in the string to search.

Macmillan Dictionary, EnglishClub and ESL-Lounge also provide lists of 50, 100 and 118 commonly misspelled words, respectively.  There are numerous sites out there with common misspellings available, and they are usually listed alphabetically so that you can quickly check for common misspellings of potential search terms you plan to use.

So, what do you think?  Do you have any real-world examples of how fuzzy searching or searches for misspelled words have aided in eDiscovery search and retrieval?  Please share any comments you might have or if you’d like to know more about a particular topic.


Don’t Get “Wild” with Wildcards: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on September 20, 2010 – which was the day eDiscovery Daily was launched!  We launched that day with an announcement post, this post and our first case law post where Judge Paul Grimm actually ordered the defendant to be imprisoned for up to two years or until he paid the plaintiff “the attorney’s fees and costs that will be awarded to Plaintiff as the prevailing party pursuant to Fed. R. Civ. P. 37(b)(2)(C).”  (Spoiler alert – the defendant didn’t ultimately go to jail, but was ordered to pay over 1 million dollars to the plaintiff)…

Even before the 2015 Federal Rules changes, we didn’t see any other cases where the parties were threatened with jail time.  But I personally have seen several instances where parties still want to get “wild” with wildcards.  We even covered a case where the parties negotiated terms that included the wildcard “app*” because they were looking for phone applications or apps (an even more extreme example than the one I detail below).  Check that one out too.  And enjoy this one as well!  It’s as relevant today as it was (almost) nine years ago!

A while ago, I provided search strategy assistance to a client that had already agreed upon several searches with opposing counsel.  One search related to mining activities, so the attorney decided to use a wildcard of “min*” to retrieve variations like “mine”, “mines” and “mining”.

That one search retrieved over 300,000 files with hits.

Why?  Because there are 269 words in the English language that begin with the letters “min”.  Words like “mink”, “mind”, “mint” and “minion” were all being retrieved in this search for files related to “mining”.  We ultimately had to go back to opposing counsel and negotiate a revised search that was more appropriate.
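To illustrate the problem, here’s a minimal Python sketch of how a prefix wildcard expands against an index vocabulary.  The word list is a small hypothetical sample, not a real index:

```python
# Hypothetical sample of indexed words; a real search index would
# expand "min*" against its full vocabulary (269 "min" words in English).
word_list = ["mine", "mines", "mining", "mink", "mind",
             "mint", "minion", "minute", "minimum", "minor"]

def expand_wildcard(prefix, vocabulary):
    """Return every indexed word that a prefix wildcard would match."""
    return [w for w in vocabulary if w.startswith(prefix)]

matches = expand_wildcard("min", word_list)

# All ten sample words match "min*", but only three relate to mining.
relevant = {"mine", "mines", "mining"}
noise = [w for w in matches if w not in relevant]
```

Even in this tiny sample, seven of the ten matches are noise – which is exactly how a single wildcard search can balloon to over 300,000 files with hits.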

How do you ensure that you’re retrieving all variations of your search term?

Stem Searches

One way to capture the variations is with stem searching.  Applications that support stem searching give you the ability to enter a root word (e.g., mine) and locate that word and its variations.  Stem searching finds all variations of a word without requiring wildcards.
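Here’s a hedged sketch of the idea in Python.  The `simple_stem` function is a toy suffix-stripper invented for illustration; real stem search engines use a proper algorithm such as Porter or Snowball:

```python
def simple_stem(word):
    """Toy suffix-stripping stemmer (illustrative only; real engines
    use Porter, Snowball, or similar algorithms)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3] + "e"   # crude restoration: mining -> mine
    if word.endswith("ed") and len(word) > 4:
        return word[:-1]         # mined -> mine
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]         # mines -> mine
    return word

def stem_search(root, vocabulary):
    """Find vocabulary words whose stem matches the stem of the root."""
    target = simple_stem(root)
    return [w for w in vocabulary if simple_stem(w) == target]

vocab = ["mine", "mines", "mining", "mined", "mink", "minion"]
hits = stem_search("mine", vocab)   # matches the variations, not "mink"
```

Note that, unlike the “min*” wildcard, the stem search matches “mines”, “mining” and “mined” without also pulling in “mink” or “minion”.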

Other Methods

If your application doesn’t support stem searches, Morewords.com shows a list of words that begin with your search string (e.g., to get all 269 words beginning with “min”, go here – simply substitute any characters for “min” to see the words that start with those characters).  Choose the variations you want and incorporate them into the search instead of the wildcard – i.e., use (“mine” or “mines” or “mining”) instead of “min*” to retrieve a more relevant result set.

Many applications let you preview the wildcard variations you wish to use before running them.  For example, our CloudNine Review solution (shameless plug warning!) performs a preview when you start to type in a search term to show you words within the collection that begin with that string.  As a result, you can identify an overbroad term before you agree to it.
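One common way to implement that kind of preview is binary search over a sorted vocabulary.  This sketch is a generic illustration (the term list is hypothetical), not a description of how any particular product does it:

```python
import bisect

# Hypothetical sorted vocabulary extracted from a document collection.
index_terms = sorted(["mine", "mines", "mining", "mink", "mind",
                      "mint", "minion", "mitigate", "model"])

def preview(prefix, terms):
    """Return all indexed terms starting with prefix, via binary search."""
    lo = bisect.bisect_left(terms, prefix)
    hi = bisect.bisect_left(terms, prefix + "\uffff")  # just past the prefix range
    return terms[lo:hi]

# preview("min", index_terms) surfaces every "min" term before you
# commit to the wildcard; preview("mini", ...) narrows it further.
```

Because the vocabulary is sorted, each keystroke of the preview costs a pair of O(log n) lookups rather than a scan of the whole term list.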

So, what do you think?  Have you ever been “burned” by wildcard searching?  Do you have any other suggested methods for effectively handling them?  Please share any comments you might have or if you’d like to know more about a particular topic.
Searching for Email Addresses Can Have Lots of Permutations Too: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on November 15, 2012 – when eDiscovery Daily was early into its third year of existence and continues the two-part series we started last week.  Email addresses still provide the same opportunities and challenges for identifying documents associated with individuals that they did nearly seven years ago.  Enjoy!

Last week, we discussed the various permutations of names of individuals to include in your searching for a more complete result set, as well as the benefits of proximity searching (broader than a phrase search, more precise than an AND search) to search for names of individuals.  Another way to identify documents associated with individuals is through their email addresses.

Variations of Email Addresses within a Domain

You may be planning to search for an individual based on their name and the email domain of their company (e.g., daustin@cloudnine.com), but that’s not always inclusive of all possible email addresses for that individual.  Email addresses for an individual’s domain might appear to be straightforward, but there might be aliases or other variations to search for to retrieve emails to and from that individual at that domain.  For example, here are three of the email addresses to which I can receive email as a member of CloudNine:

To retrieve all of the emails to and from me, you would have to include all of the above addresses (and others too).  There are other variations you may need to account for, as well.  Here are a couple:

  • Jim Smith[/O=FIRST ORGANIZATION/OU=EXCHANGE ADMINISTRATIVE GROUP (GZEJCPIG34TQEMU)/CN=RECIPIENTS/CN=JimSmith] (legacy Exchange distinguished name from old versions of Microsoft Exchange);
  • IMCEANOTES-Andy+20Zipper_Corp_Enron+40ECT@ENRON.com (an internal Lotus Notes representation of an email address from the Enron Data Set).

As you can see, email addresses from the business domain can be represented several different ways, so it’s important to account for that in your searching for emails for your key individuals.
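When deduplicating or searching across these formats, it helps to normalize each representation to a comparable key.  Here’s a hedged Python sketch; the regular expressions are illustrative and not exhaustive (for example, they don’t handle the Lotus Notes format above):

```python
import re

def address_key(raw):
    """Reduce an SMTP address or legacy Exchange DN to a lowercase key.
    Illustrative patterns only -- real collections need broader handling."""
    raw = raw.strip()
    # Legacy Exchange distinguished name: take the final CN= component.
    m = re.search(r"/CN=([^/\]]+)\]?$", raw, re.IGNORECASE)
    if m:
        return m.group(1).lower()
    # Ordinary SMTP address: keep the mailbox part before the @.
    m = re.match(r"([^@]+)@", raw)
    if m:
        return m.group(1).lower()
    return raw.lower()
```

For example, an SMTP address and a legacy DN ending in the same CN value would reduce to the same key, letting you group an individual’s messages across formats.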

Personal Email Addresses

Raise your hand if you’ve ever sent any emails from your personal email account(s) through the business domain, even if it’s to remind you of something.  I suspect most of your hands are raised – I know mine is.  Identifying personal email accounts for key individuals can be important for two reasons: 1) those emails within your collection may also be relevant, and 2) you may have to request additional emails from the personal email addresses in discovery if it can be demonstrated that those accounts contain relevant emails.

Searching for Email Addresses

To find all of the relevant email addresses (including the personal ones), you may need to perform searches of the email fields for variations of the person’s name.  So, for example, to find emails for “Jim Smith”, you may need to find occurrences of “Jim”, “James”, “Jimmy”, “JT” and “Smith” within the “To”, “From”, “Cc” and “Bcc” fields.  Then, you have to go through the list and identify the email addresses that appear to belong to Jim Smith.  For any email addresses where you’re not sure whether they belong to the individual (e.g., does jsmith1963@gmail.com belong to Jim Smith or Joe Smith?), you may need to retrieve and examine some of the emails to make that determination.  If he uses nicknames for his personal email addresses (e.g., huggybear2012@msn.com), you should be able to identify those through emails that he sends to his business account.
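As a sketch of that first pass, here’s a minimal Python example that scans the address fields of a hypothetical set of email records for name variants.  Note that a nickname account like huggybear2012@msn.com won’t surface this way; it has to be linked by examining messages, as described above:

```python
# Hypothetical name variants and sample records for illustration.
variants = ["jim", "james", "jimmy", "jt", "smith"]

emails = [
    {"from": "jsmith1963@gmail.com", "to": "daustin@cloudnine.com"},
    {"from": "huggybear2012@msn.com", "to": "jim.smith@example.com"},
    {"from": "mary.jones@example.com", "to": "bob@example.com"},
]

def candidate_addresses(records, name_variants):
    """Collect addresses from To/From/Cc/Bcc fields containing any variant."""
    found = set()
    for rec in records:
        for field in ("to", "from", "cc", "bcc"):
            for addr in rec.get(field, "").split(";"):
                addr = addr.strip().lower()
                if addr and any(v in addr for v in name_variants):
                    found.add(addr)
    return sorted(found)

candidates = candidate_addresses(emails, variants)
# Flags jsmith1963@gmail.com and jim.smith@example.com for review, but
# not huggybear2012@msn.com -- the nickname contains no name variant.
```

Each flagged address is only a candidate; as noted above, you still have to examine some of the messages to confirm whose account it actually is.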

To summarize, searching by email address is another way to identify documents pertaining to a key individual.  The key is making sure your search includes all the email addresses possible for that individual.

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.

Fall 2019 Predictive Coding Technologies and Protocols Survey Results: eDiscovery Trends

So many topics, so little time!  Rob Robinson published the latest Predictive Coding Technologies and Protocols Survey on his excellent ComplexDiscovery site last week, but this is the first chance I’ve had to cover it.  The results are in, and here are some of the findings from the largest response group for this survey yet.

As Rob notes in the results post here, the third Predictive Coding Technologies and Protocols Survey was initiated on August 23 and concluded on September 5, with individuals invited to participate directly by ComplexDiscovery and indirectly through industry website, blog, and newsletter mentions – including a big assist from the Association of Certified E-Discovery Specialists (ACEDS).  It’s a non-scientific survey designed to provide a general understanding of the use of predictive coding technologies and protocols by data discovery and legal discovery professionals within the eDiscovery ecosystem, and it had two primary educational objectives:

  • To provide a consolidated listing of potential predictive coding technology and protocol definitions. While not all-inclusive or comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it should be useful in educational efforts.
  • To ask eDiscovery ecosystem professionals about their usage and preferences of predictive coding platforms, technologies, and protocols.

There were 100 total respondents in the survey (a nice, round number!).  Here are some of the more notable results:

  • 39 percent of responders were from law firms, 37 percent of responders were from software or services provider organizations, and the remaining 24 percent of responders were either part of a consultancy (12 percent), a corporation (6 percent), the government (3 percent), or another type of entity (3 percent).
  • 86 percent of responders shared that they did have a specific primary platform for predictive coding versus 14 percent who indicated they did not.
  • There were 31 different platforms noted as primary predictive coding platforms by responders; nine of them received more than one vote and together accounted for more than three-quarters of responses (76 percent).
  • Active Learning was the most used predictive coding technology, with 86 percent reporting that they use it in their predictive coding efforts.
  • Just over half (51 percent) of responders reported using only one predictive coding technology in their predictive coding efforts.
  • Continuous Active Learning (CAL) was (by far) the most used predictive coding protocol, with 82 percent reporting that they use it in their predictive coding efforts.
  • Maybe the most interesting stat: 91 percent of responders reported using technology-assisted review in more than one area of data and legal discovery. So, the uses of TAR are certainly expanding!

Rob has reported several other results and provided graphs for additional details.  To check out all of the results, click here.  Want to compare to the previous two surveys?  They’re here and here. :o)

So, what do you think?  Do any of the results surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © FremantleMedia North America, Inc.
