Searching

TIFFs, PDFs, or Neither: How to Select the Best Production Format

Under Rule 34(b) of the FRCP, the requesting party may select the form(s) of production based on the needs of the case. Though this flexibility better serves the client, it also raises a few important questions: What is the best form of production? Is there one right answer? Since there are multiple types of ESI, it’s hard to say definitively that one format is superior. Arguably, any form is acceptable so long as it facilitates “orderly, efficient, and cost-effective discovery.” Requesting parties may ask for ESI to be produced as native, PDF, TIFF, or paper files. Determinations typically consider the production software’s capabilities as well as the resources available to the responding party. [1] The purpose of this article is to weigh the advantages and disadvantages of each type so that legal teams can make informed decisions in the future.

Production Options

  1. Native – As the often-preferred option, native files are produced in the same format in which the ESI was created. Since native files require no conversion, they save litigants time and money. True natives also contain metadata and other information that TIFF and PDF files may lack. Litigants may also value native files for their clear presentation of dynamic content (such as comments and animations), which TIFFs and PDFs can only represent through overlapping static images – a cluttered format that is often confusing and hard to decipher. Though useful, metadata and dynamic content must be handled carefully because they may contain sensitive or privileged information. [2] Native files may seem like the superior choice, but they aren’t always an option. Unfortunately, some ESI types cannot be reviewed unless they are converted into a different form. Additionally, reviewers working in this format cannot add labels or redactions to individual pages.
  2. TIFF – TIFFs (Tagged Image File Format files) are black-and-white, single-page image conversions of native files. Controllable metadata fields, document-level text, and an image load file are included in this format. Though TIFFs are more expensive to produce than native files, they offer security in that they cannot be easily manipulated. TIFFs also support branding, Bates numbering, and redaction. [3] To be searchable, TIFFs must undergo Optical Character Recognition (OCR), which creates a text version of the TIFF document for searching purposes.
  3. PDF – Similar to TIFFs, PDFs also render ESI as static images. PDFs can become searchable in two ways: the reviewer may simply save the file as a searchable document, or they can generate OCR text to accompany the PDF. However, OCR cannot guarantee accurate search results for TIFFs or PDFs. [1] Advocates for PDFs cite the format’s universal compatibility, small file size, quick download speeds, clear imaging, and separate pages. [4]
  4. Paper – As the least expensive option, paper production may be used for physical documents or printed digital documents. Many litigants prefer to avoid paper productions because they don’t permit electronic review methods; all redactions and Bates stamps must be applied manually. This may be manageable for a case that involves a small amount of ESI, but manually sorting and searching through thousands of documents is time-consuming and exhausting. Litigants who opt for this format also miss out on potentially relevant metadata. [3]
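For reference, the “image load file” mentioned under the TIFF option cross-references each page image with its Bates number and document boundaries. Formats vary by vendor, but a typical Opticon-style (.opt) load file looks roughly like this (all values purely illustrative):

```
ABC0000001,VOL001,\IMAGES\001\ABC0000001.tif,Y,,,2
ABC0000002,VOL001,\IMAGES\001\ABC0000002.tif,,,,
```

The comma-separated fields are the page ID (typically the Bates number), volume, image path, a document-break flag (Y marks the first page of a document), folder and box breaks (often blank), and page count.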

 

[1] Clinton P. Sanko and Cheryl Proctor, “The New E-Discovery Battle of the Forms,” For The Defense, 2007.

[2] “Native File,” Thomson Reuters Practical Law.

[3] Farrell Fritz, P.C., “In What Format Should I Make My Production? And, Does Format Matter?” All About eDiscovery, May 30, 2019.

[4] “PDF vs. TIFF,” eDiscovery Navigator, February 13, 2007.

Working with CloudNine Explore and PST Attachments

Did You Know: Yes, users really DO attach PSTs to emails!  When examining your early case data, you need complete content visibility, including the multiple layers of PST and OST attachments within email containers.

Older processing engines have trouble extracting certain archive containers, especially when those containers have PSTs and OSTs attached to emails.  In these cases, the processing may skip over the attached email container or record it as an error.

CloudNine Explore fully expands the data container and processes its contents in full, including multi-layered PST and OST files, without creating duplicate files.

Consider an example: a custodian creates a PST file containing several dozen messages about a particular topic and emails it to a co-worker.

Earlier processing engines could process the email sent to the co-worker but could not expand the attached PST to process its contents without requiring a separate and manual process.

 

Visual example of a PST file in a Zip file, with a PST attachment:

 

Explore uses the newest extraction technologies to fully expand the attached PST and collect the metadata and emails contained within.  The manual processes are not necessary, and the data is fully expanded and available for searching and review.
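Explore’s internals aren’t public, but the recursive idea behind multi-layer expansion can be sketched generically. In this illustration, nested ZIP files stand in for PST/OST containers (parsing real PSTs requires specialized libraries), and the walker keeps descending until it reaches leaf files:

```python
import io
import zipfile

def expand_container(data, name, depth=0):
    """Recursively expand container files, yielding every leaf file no
    matter how deeply it is nested inside other containers."""
    if name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for inner in zf.namelist():
                yield from expand_container(zf.read(inner), inner, depth + 1)
    else:
        yield name, depth, data

# Build a container-within-a-container to mimic a PST attached inside
# an email archive.
inner_buf = io.BytesIO()
with zipfile.ZipFile(inner_buf, "w") as zf:
    zf.writestr("message.txt", "quarterly numbers attached")

outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as zf:
    zf.writestr("attachment.zip", inner_buf.getvalue())

leaves = list(expand_container(outer_buf.getvalue(), "email.zip"))
print(leaves)  # [('message.txt', 2, b'quarterly numbers attached')]
```

A processing engine that stops at depth 1 would record only `attachment.zip` (or an error); the recursive walk surfaces the message two layers down.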

 

Example display of extracted email and metadata within Explore:

Have the assurance of a thorough early case assessment to find hidden or multi-layered files with CloudNine Explore.

Learn how to automate your eDiscovery with the legal industry’s most powerful processing and early case assessment tool.  Click the button below to schedule a demo with a CloudNine eDiscovery specialist.

Here’s a Terrific Listing of eDiscovery Workstream Processes and Tasks: eDiscovery Best Practices

Let’s face it – workflows and workstreams in eDiscovery are as varied as the organizations that conduct eDiscovery.  Every organization seems to do it a little bit differently, with a different combination of tasks, methodologies and software solutions than anyone else.  But, could a lot of organizations improve their eDiscovery workstreams?  Sure.  Here’s a resource (that you probably already know well) which could help them do just that.

Rob Robinson’s post yesterday on his terrific Complex Discovery site is titled The Workstream of eDiscovery: Considering Processes and Tasks and it provides a very comprehensive list of tasks for eDiscovery processes throughout the life cycle.  As Rob notes:

“From the trigger point for audits, investigations, and litigation to the conclusion of cases and matters with the defensible disposition of data, there are countless ways data discovery and legal discovery professionals approach and administer the discipline of eDiscovery.  Based on an aggregation of research from leading eDiscovery educators, developers, and providers, the following eDiscovery Processes and Tasks listing may be helpful as a planning tool for guiding business and technology discussions and decisions related to the conduct of eDiscovery projects. The processes and tasks highlighted in this listing are not all-inclusive and represent only one of the myriads of approaches to eDiscovery.”

Duly noted.  Nonetheless, the list of processes and tasks is comprehensive.  Here is the number of tasks for each process:

  • Initiation (8 tasks)
  • Legal Hold (11 tasks)
  • Collection (8 tasks)
  • Ingestion (17 tasks)
  • Processing (6 tasks)
  • Analytics (11 tasks)
  • Predictive Coding (6 tasks)*
  • Review (17 tasks)
  • Production/Export (6 tasks)
  • Data Disposition (6 tasks)

That’s 96 total tasks!  But, that’s not all.  There are separate lists of tasks for each method of predictive coding, as well.  Some of the tasks are common to all methods, while others are unique to each method:

  • TAR 1.0 – Simple Active Learning (12 tasks)
  • TAR 1.0 – Simple Passive Learning (9 tasks)
  • TAR 2.0 – Continuous Active Learning (7 tasks)
  • TAR 3.0 – Cluster-Centric CAL (8 tasks)

The complete list of processes and tasks can be found here.  While every organization has a different approach to eDiscovery, many have room for improvement, especially when it comes to exercising due diligence during each process.  Rob provides a comprehensive list of tasks within eDiscovery processes that could help organizations identify steps they could be missing in their processes.

So, what do you think?  How many steps do you have in your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Special Master Declines to Order Defendant to Use TAR, Rules on Other Search Protocol Disputes: eDiscovery Case Law

In the case In re Mercedes-Benz Emissions Litig., No. 2:16-cv-881 (KM) (ESK) (D.N.J. Jan. 9, 2020), Special Master Dennis Cavanaugh (U.S.D.J., Ret.) issued an order and opinion stating that he would not compel defendants to use technology assisted review (TAR), and instead adopted the search term protocol negotiated by the parties, with three areas of dispute resolved by his ruling.

Case Background

In this emissions test class action involving an automobile manufacturer, the plaintiffs proposed that the defendants use predictive coding/TAR, asserting that TAR yields significantly better results than either traditional human “eyes on” review of the full data set or the use of search terms.  The plaintiffs also argued that if the Court were to decline to compel the defendants to adopt TAR, the Court should enter its proposed Search Term Protocol.

The defendants argued that there is no authority for imposing TAR on an objecting party and that this case presented a number of unique issues that would make developing an appropriate and effective seed set challenging, such as language and translation issues, unique acronyms and identifiers, redacted documents, and technical documents. As a result, they contended that they should be permitted to utilize their preferred custodian-and-search term approach.

Judge’s Ruling

Citing Rio Tinto Plc v. Vale S.A., Special Master Cavanaugh quoted from that case in stating: “While ‘the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it’…, no court has ordered a party to engage in TAR over the objection of that party. The few courts that have considered this issue have all declined to compel predictive coding.”  Citing Hyles v. New York City (another case ruling by now retired New York Magistrate Judge Andrew J. Peck), Special Master Cavanaugh stated: “Despite the fact that it is widely recognized that ‘TAR is cheaper, more efficient and superior to keyword searching’…, courts also recognize that responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for producing their own electronically stored information.”

As a result, Special Master Cavanaugh ruled: “While the Special Master believes TAR would likely be a more cost effective and efficient methodology for identifying responsive documents, Defendants may evaluate and decide for themselves the appropriate technology for producing their ESI. Therefore, the Special Master will not order Defendants to utilize TAR at this time. However, Defendants are cautioned that the Special Master will not look favorably on any future arguments related to burden of discovery requests, specifically cost and proportionality, when Defendants have chosen to utilize the custodian-and-search term approach despite wide acceptance that TAR is cheaper, more efficient and superior to keyword searching. Additionally, the denial of Plaintiffs’ request to compel Defendants to utilize TAR is without prejudice to revisiting this issue if Plaintiffs contend that Defendants’ actual production is deficient.”

Special Master Cavanaugh also ruled on areas of dispute regarding the proposed Search Term Protocol, as follows:

  • Validation: Special Master Cavanaugh noted that “the parties have been able to reach agreement on the terms of Defendants’ validation process, [but] the parties are at an impasse regarding the level of validation of Plaintiffs’ search term results”, observing that “Plaintiffs’ proposal does not articulate how it will perform appropriate sampling and quality control measures to achieve the appropriate level of validation.” As a result, Special Master Cavanaugh, while encouraging the parties to work together to develop a reasonable procedure for the validation of Plaintiffs’ search terms, ruled: “As no articulable alternative process has been proposed by Plaintiffs, the Special Master will adopt Defendants’ protocol to the extent that it will require the parties, at Defendants’ request, to meet and confer concerning the application of validation procedures described in paragraph 12(a) to Plaintiffs, if the parties are unable to agree to a procedure.”
  • Known Responsive Documents & Discrete Collections: The defendants objected to the plaintiffs’ protocol to require the production of all documents and ESI “known” to be responsive as “vague, exceedingly burdensome, and provides no clear standard for the court to administer or the parties to apply”. The defendants also objected to the plaintiffs’ request for “folders or collections of information that are known to contain documents likely to be responsive to a discovery request” as “overly broad and flouts the requirement that discovery be proportional to the needs of the case.”  Noting that “Defendants already agreed to produce materials that are known to be responsive at the November status conference”, Special Master Cavanaugh decided to “modify the Search Term Protocol to require production of materials that are ‘reasonably known’ to be responsive.”  He also decided to require the parties to collect folders or collections of information “to the extent it is reasonably known to the producing party”, also requiring “the parties to meet and confer if a party believes a discrete document folder or collection of information that is relevant to a claim or defense is too voluminous to make review of each document proportional to the needs of the case.”

So, what do you think?  Should a decision not to use TAR negatively impact a party’s ability to make burden of discovery arguments?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Related to this topic, Rob Robinson’s Complex Discovery site published its Predictive Coding Technologies and Protocols Spring 2020 Survey results last week, which (as always) provides results on most often used primary predictive coding platforms and technologies, as well as most-often used TAR protocols and areas where TAR is most used (among other results).  You can check it out at the link directly above.

Case opinion link courtesy of eDiscovery Assistant.


Plaintiffs’ Failure to “Hurry” Leads to Denial of Motion to Compel: eDiscovery Case Law

Sorry, I couldn’t resist… ;o)

In Hurry Family Revocable Trust, et al. v. Frankel, No. 8:18-cv-2869-T-33CPT (M.D. Fla. Jan. 14, 2020), the Florida district court denied the plaintiffs’ Motion to Compel Production of Documents and Request for Sanctions, ruling the motion untimely given that the extended discovery deadline had passed, and also rejected the plaintiffs’ argument that the defendant had willfully avoided producing certain emails.

Case Background

In this dispute involving a former employee of the plaintiffs and claims that he used their confidential information and trade secrets, the Court entered a Case Management and Scheduling Order (CMSO) in January 2019 establishing various deadlines, including a discovery deadline of July 26, 2019, and a trial date of February 3, 2020.  The CMSO warned the parties that “[t]he Court may deny as untimely all motions to compel filed after the discovery deadline.”  In May 2019, the plaintiffs filed a motion to modify the CMSO and the Court extended the discovery deadline to August 9, 2019, but also cautioned the parties, however, that it would “be disinclined to extend…the [discovery] deadline[ ] further.”  Nonetheless, the plaintiffs sought to modify the CMSO two more times – the second time after the discovery deadline on August 12, 2019 – but the court denied both motions, stating after the second one:

“The Court has already extended the discovery deadline in this case to August 9, 2019, at the Plaintiffs’ request. The Court has also repeatedly warned Plaintiffs that it would be disinclined to extend deadlines further. Yet Plaintiffs filed this third motion to modify the Case Management and Scheduling Order on August 12, 2019, after the extended discovery deadline had passed…..As for the documents that Plaintiffs claim Defendant has failed to produce, Plaintiffs were aware of those missing documents since August 6 and/or 7, 2019, and failed to file a motion to compel prior to the discovery deadline. As the Court advised in its Case Management and Scheduling Order, ‘[f]ailure to complete discovery within the time established by this Order shall not constitute cause for a continuance.’”

Roughly four months after the Court’s August 20 Order, the plaintiffs filed the instant motion to compel after receiving five emails from third parties that had not been produced by the defendant.  The plaintiffs requested an order directing that: (1) the defendant’s “email accounts, cloud storage, and digital devices” be subjected to a “third party search” for responsive documents at his expense; (2) “[Frankel] be precluded from testifying or offering evidence on issues related to categories of discovery withheld by [Frankel];” and (3) “adverse inferences be made against [Frankel] related to categories of discovery withheld by [Frankel].”

Judge’s Ruling

Noting that “Hurry waited to submit the instant motion until four months after the discovery deadline and only two months before trial”, the court stated: “Hurry’s proffered excuse for this extended delay is unpersuasive. When pressed on the matter at the hearing, Hurry conceded that it knew about the Koonce and FINRA emails by no later than early August 2019. It also admitted that it elected to place the instant motion on the ‘backburner’ while it dealt with its motion for summary judgment. Hurry’s evident lack of diligence in pursuing its motion to compel alone is fatal to that request.”

Continuing, the court stated: “Even were that not the case, Hurry has not shown that it is entitled to the relief it seeks. The central premise of its motion is that Frankel willfully avoided producing the Koonce and FINRA emails. In both his response and at the hearing, however, Frankel persuasively argued that his failure to produce these emails was not purposeful, but stemmed from the fact that the emails were not detected during the search Frankel conducted in connection with Hurry’s production requests. Frankel also noted he informed Hurry of the parameters of that search in advance, and Hurry did not object to those parameters.”

So, what do you think?  Should identification of new emails from third parties justify re-opening discovery?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant.


It’s a Mistake to Ignore the Mistakes: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

These posts were originally published on September 22, 2010 and September 23, 2010, the third and fourth days of eDiscovery Daily’s existence (yep, I combined two into one for this throwback edition and updated the misspelling site resources, replacing a couple of defunct ones with current ones).  It continues to amaze me how some of the attorneys I work with fail to account for potential misspellings and typos in ESI when developing a search strategy (especially requesting parties looking to maximize recall of potentially responsive ESI).  FWIW, since publishing these two blog posts, I had an actual case where searching for misspellings of the word “management” yielded additional responsive documents, so it works.  Enjoy!

How many times have you received an email sent to “All Employees” like this?  “I am pleased to announce that Joe Smith has been promoted to the position of Operations Manger.”

Do you cringe when you see an email like that?  I do.  I cringe even more when an email like that comes from me, which happens more often than I’d like to admit.

Of course, we all make mistakes.  And, forgetting that fact can be costly when searching for, or requesting, relevant documents in eDiscovery.  For example, if you’re searching for e-mails that relate to management decisions, can you be certain that “management” is spelled perfectly throughout the collection?  Unlikely.  It could be spelled “managment” or “mangement” and you would miss those potentially critical emails without an effective plan to look for them.

Finding Misspellings Using Fuzzy Searching

How do you find them if you don’t know how they might be misspelled?  Pretty much any eDiscovery application these days (including CloudNine products) supports fuzzy searching.  So, if you’re looking for someone named “Brian”, you can find variations such as “Bryan” or even “brain” – that could be relevant but were simply misspelled.  Fuzzy searching is the best way to broaden your search to include potential misspellings.
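As a rough sketch of how fuzzy matching catches misspellings (Python’s standard difflib stands in here for the fuzzy engine; this doesn’t reflect any particular eDiscovery product’s implementation, and the word list is hypothetical):

```python
import difflib

# Hypothetical terms harvested from a collection's search index.
index_terms = ["management", "managment", "mangement", "marketing",
               "brian", "bryan", "brain"]

def fuzzy_hits(term, vocabulary, cutoff=0.8):
    """Return vocabulary words similar enough to `term` to be likely misspellings."""
    return difflib.get_close_matches(term, vocabulary, n=10, cutoff=cutoff)

print(fuzzy_hits("management", index_terms))  # surfaces "managment" and "mangement"
print(fuzzy_hits("brian", index_terms))       # surfaces "bryan" and "brain"
```

Lowering the cutoff broadens the net (more misspellings, more noise); raising it tightens the match, which mirrors the precision/recall trade-off of fuzzy search settings in review tools.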

Examples of Sites for Common Misspellings

However, another way to identify misspellings is to use a resource that tracks the most typical misspellings for common words and search for them explicitly.  The advantage of that is that you can pinpoint the likeliest misspellings while excluding other hits retrieved via fuzzy search that might be other terms altogether.  Here are a few sites you can check for common misspellings:

At Dumbtionary.com, you can check words against a list of over 10,000 misspelled words.  Simply type the correct word into the search box with a “plus” before it (e.g., “+management”) to get the common misspellings for that word.  You can also search for misspelled names and places.

Wikipedia has a list of common misspellings as well.  It breaks the list down by starting letter, as well as variations on 0-9 (e.g., “3pm” or “3 pm”).  You can go to the starting letter you want to search, then do a “find” on the page (by pressing Ctrl+F) and type in the string to search.

Macmillan Dictionary, EnglishClub and ESL-Lounge also provide lists of 50, 100 and 118 commonly misspelled words, respectively.  There are numerous sites out there with common misspellings available, and they are usually listed alphabetically so that you can quickly check for common misspellings of potential search terms you plan to use.

So, what do you think?  Do you have any real-world examples of how fuzzy searching or searches for misspelled words have aided in eDiscovery search and retrieval?  Please share any comments you might have or if you’d like to know more about a particular topic.


Don’t Get “Wild” with Wildcards: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on September 20, 2010 – which was the day eDiscovery Daily was launched!  We launched that day with an announcement post, this post and our first case law post, where Judge Paul Grimm actually ordered the defendant to be imprisoned for up to two years or until he paid the plaintiff “the attorney’s fees and costs that will be awarded to Plaintiff as the prevailing party pursuant to Fed. R. Civ. P. 37(b)(2)(C).”  (Spoiler alert – the defendant didn’t ultimately go to jail, but was ordered to pay over $1 million to the plaintiff)…

Even before the 2015 Federal Rules changes, we didn’t see any other cases where the parties were threatened with jail time.  But I personally have seen several instances where parties still want to get “wild” with wildcards.  We even covered a case where the parties negotiated terms that included the wildcard for “app*” because they were looking for phone applications or apps (an even more extreme example than the one I detail below).  Check it out too.  And, enjoy this one as well!  It’s as relevant today as it was (almost) nine years ago!

A while ago, I provided search strategy assistance to a client that had already agreed upon several searches with opposing counsel.  One search related to mining activities, so the attorney decided to use a wildcard of “min*” to retrieve variations like “mine”, “mines” and “mining”.

That one search retrieved over 300,000 files with hits.

Why?  Because there are 269 words in the English language that begin with the letters “min”.  Words like “mink”, “mind”, “mint” and “minion” were all being retrieved in this search for files related to “mining”.  We ultimately had to go back to opposing counsel and negotiate a revised search that was more appropriate.

How do you ensure that you’re retrieving all variations of your search term?

Stem Searches

One way to capture the variations is with stem searching.  Applications that support stem searching give you the ability to enter a root word (e.g., mine), and the application will locate that word and its variations.  Stem searching finds all variations of a word without requiring wildcards.
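To make the idea concrete, here is a deliberately naive suffix-stripping sketch (real engines use proper linguistic stemmers such as Porter’s; the documents and suffix list are hypothetical). Both the query root and every indexed word are reduced to a stem, and documents match when the stems agree:

```python
import re

# A deliberately naive suffix list, just to illustrate the concept.
SUFFIXES = ("ing", "es", "ed", "e", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

documents = {
    1: "The mine was closed last year",
    2: "Mining operations resumed in March",
    3: "He mines copper and zinc",
    4: "Keep this in mind",
}

def stem_search(root, docs):
    """Return ids of documents containing any word sharing the root's stem."""
    target = naive_stem(root)
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(naive_stem(w) == target for w in re.findall(r"[A-Za-z]+", text))
    )

print(stem_search("mine", docs=documents))  # [1, 2, 3] -- "mind" is not matched
```

Note how “mine”, “mines” and “mining” all reduce to the same stem while “mind” does not, which is exactly the discrimination a wildcard like “min*” lacks.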

Other Methods

If your application doesn’t support stem searches, Morewords.com shows a list of words that begin with your search string (e.g., to get all 269 words beginning with “min”, go here – simply substitute any characters for “min” to see the words that start with those characters).  Choose the variations you want and incorporate them into the search instead of the wildcard – i.e., use “(mine OR mines OR mining)” instead of “min*” to retrieve a more relevant result set.
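The same preview-then-narrow workflow can be sketched in a few lines (the word list is hypothetical; Python’s standard fnmatch stands in for the search engine’s wildcard matcher):

```python
import fnmatch

# Hypothetical word list pulled from a collection's search index.
collection_words = ["mine", "mines", "mining", "mink", "mind",
                    "mint", "minion", "mineral"]

def preview_wildcard(pattern, words):
    """Show every indexed word a wildcard would match, so an overbroad
    term can be spotted before it is agreed to."""
    return sorted(w for w in words if fnmatch.fnmatch(w, pattern))

hits = preview_wildcard("min*", collection_words)
print(hits)  # all eight "min" words, relevant or not

# Hand-pick the relevant variants and build an explicit OR search instead.
relevant = ["mine", "mines", "mining"]
query = "(" + " OR ".join(relevant) + ")"
print(query)  # (mine OR mines OR mining)
```

Seeing “mink”, “mind” and “mint” in the preview is the cue to replace the wildcard with the explicit OR list before the term is negotiated with opposing counsel.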

Many applications let you preview the wildcard variations you wish to use before running them.  For example, our CloudNine Review solution (shameless plug warning!) performs a preview when you start to type in a search term to show you words within the collection that begin with that string.  As a result, you can identify an overbroad term before you agree to it.

So, what do you think?  Have you ever been “burned” by wildcard searching?  Do you have any other suggested methods for effectively handling them?  Please share any comments you might have or if you’d like to know more about a particular topic.


Searching for Email Addresses Can Have Lots of Permutations Too: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on November 15, 2012 – when eDiscovery Daily was early into its third year of existence and continues the two-part series we started last week.  Email addresses still provide the same opportunities and challenges for identifying documents associated with individuals that they did nearly seven years ago.  Enjoy!

Last week, we discussed the various permutations of names of individuals to include in your searching for a more complete result set, as well as the benefits of proximity searching (broader than a phrase search, more precise than an AND search) to search for names of individuals.  Another way to identify documents associated with individuals is through their email addresses.

Variations of Email Addresses within a Domain

You may be planning to search for an individual based on their name and the email domain of their company (e.g., daustin@cloudnine.com), but that’s not always inclusive of all possible email addresses for that individual.  Email addresses for an individual’s domain might appear to be straightforward, but there might be aliases or other variations to search for to retrieve emails to and from that individual at that domain.  For example, here are three of the email addresses to which I can receive email as a member of CloudNine:

To retrieve all of the emails to and from me, you would have to include all of the above addresses (and others too).  There are other variations you may need to account for, as well.  Here are a couple:

  • Jim Smith[/O=FIRST ORGANIZATION/OU=EXCHANGE ADMINISTRATIVE GROUP (GZEJCPIG34TQEMU)/CN=RECIPIENTS/CN=JimSmith] (legacy Exchange distinguished name from old versions of Microsoft Exchange);
  • IMCEANOTES-Andy+20Zipper_Corp_Enron+40ECT@ENRON.com (an internal Lotus Notes representation of an email address from the Enron Data Set).
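A rough sketch in Python of how you might flag these variant representations programmatically. The patterns are illustrative simplifications, not complete parsers for SMTP, Exchange distinguished name, or Notes address formats:

```python
import re

def address_belongs_to(address, last_name):
    """Rough check of whether an address string (SMTP, legacy Exchange DN,
    or Notes-style) appears to reference a given surname. The patterns are
    illustrative, not a complete parser for any of these formats."""
    addr = address.lower()
    name = last_name.lower()
    # Plain SMTP address: jsmith@..., jim.smith@...
    if re.search(rf"\b[\w.+-]*{name}[\w.+-]*@", addr):
        return True
    # Legacy Exchange distinguished name: .../CN=RECIPIENTS/CN=JIMSMITH
    if re.search(rf"/cn=[\w.]*{name}", addr):
        return True
    # Notes-style internal representation: IMCEANOTES-...Smith...@...
    if addr.startswith("imceanotes-") and name in addr:
        return True
    return False

# Hypothetical examples modeled on the variations above
candidates = [
    "jsmith@cloudnine.com",
    "Jim Smith[/O=FIRST ORGANIZATION/CN=RECIPIENTS/CN=JimSmith]",
    "IMCEANOTES-Jim+20Smith_Corp+40ECT@ENRON.com",
    "daustin@cloudnine.com",
]
hits = [a for a in candidates if address_belongs_to(a, "Smith")]
```

A real workflow would still require manual review of the flagged addresses, but a pass like this narrows the list considerably.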

As you can see, email addresses from the business domain can be represented several different ways, so it’s important to account for that in your searching for emails for your key individuals.

Personal Email Addresses

Raise your hand if you’ve ever sent any emails from your personal email account(s) through the business domain, even if it’s to remind you of something.  I suspect most of your hands are raised – I know mine is.  Identifying personal email accounts for key individuals can be important for two reasons: 1) those emails within your collection may also be relevant, and 2) you may have to request additional emails from the personal email addresses in discovery if it can be demonstrated that those accounts contain relevant emails.

Searching for Email Addresses

To find all of the relevant email addresses (including the personal ones), you may need to perform searches of the email fields for variations of the person’s name.  So, for example, to find emails for “Jim Smith”, you may need to find occurrences of “Jim”, “James”, “Jimmy”, “JT” and “Smith” within the “To”, “From”, “Cc” and “Bcc” fields.  Then, you go through the list and identify the email addresses that appear to belong to Jim Smith.  For any address you’re not sure about (e.g., does jsmith1963@gmail.com belong to Jim Smith or Joe Smith?), you may need to retrieve and examine some of the emails to make that determination.  If he uses nicknames for his personal email addresses (e.g., huggybear2012@msn.com), you should hopefully be able to identify those through emails that he sends to his business account.
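That field-scanning step can be sketched in Python. The record layout and field names here are hypothetical, standing in for whatever your review platform exports:

```python
def candidate_addresses(emails, name_variants):
    """Collect distinct addresses from the To/From/Cc/Bcc fields of a set of
    email records that contain any of the given name variants.  Matches here
    are only candidates -- they still need human review."""
    variants = [v.lower() for v in name_variants]
    found = set()
    for msg in emails:
        for field in ("from", "to", "cc", "bcc"):
            for addr in msg.get(field, []):
                if any(v in addr.lower() for v in variants):
                    found.add(addr)
    return sorted(found)

# Hypothetical records from a collection
emails = [
    {"from": ["jsmith1963@gmail.com"], "to": ["daustin@cloudnine.com"]},
    {"from": ["huggybear2012@msn.com"], "to": ["james.smith@cloudnine.com"]},
]
hits = candidate_addresses(emails, ["jim", "james", "jimmy", "jt", "smith"])
```

Note that a nickname address like huggybear2012@msn.com won’t be caught by name matching at all – that’s why examining the emails a person sends to his own business account remains important.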

To summarize, searching by email address is another way to identify documents pertaining to a key individual.  The key is making sure your search includes all the email addresses possible for that individual.

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.

Fall 2019 Predictive Coding Technologies and Protocols Survey Results: eDiscovery Trends

So many topics, so little time!  Rob Robinson published the latest Predictive Coding Technologies and Protocols Survey on his excellent ComplexDiscovery site last week, but this is the first chance I’ve had to cover it.  The results are in and here are some of the findings in the largest response group for this survey yet.

As Rob notes in the results post here, the third Predictive Coding Technologies and Protocols Survey was initiated on August 23 and concluded on September 5, with individuals invited to participate directly by ComplexDiscovery and indirectly by industry website, blog, and newsletter mentions – including a big assist from the Association of Certified E-Discovery Specialists (ACEDS).  It’s a non-scientific survey designed to provide a general understanding of the use of predictive coding technologies and protocols by data discovery and legal discovery professionals within the eDiscovery ecosystem, and it had two primary educational objectives:

  • To provide a consolidated listing of potential predictive coding technology and protocol definitions. While not comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it should be useful for educational efforts.
  • To ask eDiscovery ecosystem professionals about their usage and preferences of predictive coding platforms, technologies, and protocols.

There were 100 total respondents in the survey (a nice, round number!).  Here are some of the more notable results:

  • 39 percent of responders were from law firms, 37 percent of responders were from software or services provider organizations, and the remaining 24 percent of responders were either part of a consultancy (12 percent), a corporation (6 percent), the government (3 percent), or another type of entity (3 percent).
  • 86 percent of responders shared that they did have a specific primary platform for predictive coding versus 14 percent who indicated they did not.
  • There were 31 different platforms noted as primary predictive coding platforms by responders, nine of which received more than one vote; those nine accounted for more than three-quarters of responses (76 percent).
  • Active Learning was the most used predictive coding technology, with 86 percent reporting that they use it in their predictive coding efforts.
  • Just over half (51 percent) of responders reported using only one predictive coding technology in their predictive coding efforts.
  • Continuous Active Learning (CAL) was (by far) the most used predictive coding protocol, with 82 percent reporting that they use it in their predictive coding efforts.
  • Maybe the most interesting stat: 91 percent of responders reported using technology-assisted review in more than one area of data and legal discovery. So, the uses of TAR are certainly expanding!

Rob has reported several other results and provided graphs for additional details.  To check out all of the results, click here.  Want to compare to the previous two surveys?  They’re here and here. :o)

So, what do you think?  Do any of the results surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

What’s in a Name? Potentially, a Lot of Permutations: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on November 13, 2012 – when eDiscovery Daily was early into its third year of existence.  Back then, the use of predictive coding instead of keyword searching was very uncommon, as we had just had our first case (Da Silva Moore) approving the use of technology assisted review earlier in the year.  Now, the use of predictive coding technologies and approaches is much more common, but many (if not most) attorneys still use keyword searching for most cases.  With that in mind, let’s talk about considerations for searching names – they’re still valid close to seven years later!  Enjoy!

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names could be represented, and if you don’t account for each of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

Routinely, when clients give me their preliminary search term lists to review, they include names of individuals they want to search for, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer to use a proximity search for individual names: it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches – Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely they’re responsive to the name search.
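A minimal Python model of how a “within n words” test works. Real search engines tokenize, normalize, and count positions differently, so treat this as the core idea rather than any product’s actual semantics:

```python
import re

def within_n_words(text, term1, term2, n):
    """Return True if term1 and term2 appear within n word positions of each
    other, in either order -- a simplified model of a 'w/n' proximity search."""
    words = re.findall(r"[a-z.]+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == term1]
    pos2 = [i for i, w in enumerate(words) if w == term2]
    return any(abs(i - j) <= n for i in pos1 for j in pos2)
```

Because only word distance matters, “Jim Smith”, “Smith, Jim”, and “Jim T. Smith” all satisfy `within_n_words(text, "jim", "smith", 3)`, while “Doug” and “Austin” separated by several words in a travel email do not.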

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is also called “Jim” by some of his other friends and “Jimmy” by a few of his other friends.  Also, some documents may refer to him by his initials – i.e., “J.T. Smith”.  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If you don’t, it may take performing a search of just the last name and sampling several documents until you are able to determine that middle initial for searching (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).

Applying the proximity and name variation concepts into our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in our CloudNine Review platform.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.
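The combined search can be modeled the same way in Python. This is a sketch of the semantics – any of the first-name variants within three words of the last name – not the actual CloudNine query engine:

```python
import re

def name_search(text, first_variants, last_name, n=3):
    """Evaluate an '(a OR b OR ...) w/n last' style search: True if any of
    the first-name variants appears within n words of the last name."""
    words = re.findall(r"[\w.]+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w in first_variants]
    last_pos = [i for i, w in enumerate(words) if w == last_name]
    return any(abs(i - j) <= n for i in first_pos for j in last_pos)

variants = {"jim", "jimmy", "james", "j.t."}
```

Note that because `variants` lists the exact terms rather than a `jim*` wildcard, a document saying “Joe Smith jimmied the lock” is correctly excluded.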

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is usually best.

Next week, we will talk about another way to retrieve documents that mention key individuals – through their email addresses.  Same bat time, same bat channel!

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.
