Project Management

Now You Can Weigh in on Principles and Guidelines for Developing and Implementing a Sound eDiscovery Process: eDiscovery Best Practices

The Sedona Conference® has had a busy summer (yes, it’s still technically summer).  Last month, they finalized their guide for “possession, custody, or control” as it’s used in Federal Rules 34 and 45 and also issued a Public Comment Version of a new TAR Case Law Primer.  Now, they have also issued a Public Comment Version of a new set of Principles and Guidelines for Developing and Implementing a Sound E-Discovery Process, a project of their Working Group on Electronic Document Retention and Production (WG1).

As noted in the Preface, the Commentary “represents the culmination of five years of spirited dialogue within WG1 on a number of sensitive topics that go to the heart of what it means to be a competent advocate and officer of the court in an age of increasing technological complexity. It addresses the tension between the principle of party-controlled discovery, and the need for accountability in the discovery process, by establishing a series of reasonable expectations and by providing practical guidance to meet these competing interests. The overriding goal of the principles and guidelines set forth in this Commentary is to reduce the cost and burden typically associated with modern discovery by helping litigants prepare for – or better yet, avoid altogether – challenges to their chosen discovery processes, and by providing guidance to the courts in the (ideally) rare instances in which they are called upon to examine a party’s discovery conduct.”

The preliminary 55-page PDF guide includes an Introduction and the following 13 principles:

  • Principle 1: An e-discovery process is not required to be perfect, or even the best available, but it should be reasonable under the circumstances. When evaluating the reasonableness of an e-discovery process, parties and the court should consider issues of proportionality, including the benefits and burdens of a particular process.
  • Principle 2: An e-discovery process should be developed and implemented by a responding party after reasonable due diligence, including consultation with persons with subject-matter expertise, and technical knowledge and competence.
  • Principle 3: Responding parties are best situated to evaluate and select the procedures, methodologies, and technologies for their e-discovery process.
  • Principle 4: Parties may reduce or eliminate the likelihood of formal discovery or expensive and time-consuming motion practice about an e-discovery process by conferring and exchanging non-privileged information about that process.
  • Principle 5: When developing and implementing an e-discovery process, a responding party should consider how it would demonstrate the reasonableness of its process if required to do so. Documentation of significant decisions made during e-discovery may be helpful in demonstrating that the process was reasonable.
  • Principle 6: An e-discovery process should include reasonable validation.
  • Principle 7: A reasonable e-discovery process may use search terms and other culling methods to remove ESI that is duplicative, cumulative, or not reasonably likely to contain information within the scope of discovery.
  • Principle 8: A review process can be reasonable even if it does not include manual review of all potentially responsive ESI.
  • Principle 9: Technology-assisted review should be held to the same standard of reasonableness as any other e-discovery process.
  • Principle 10: A party may use any reasonable process, including a technology-assisted process, to identify and withhold privileged or otherwise protected information. A party should not be required to use any process that does not adequately protect its rights to withhold privileged or otherwise protected information from production.
  • Principle 11: Whenever possible, a dispute about an e-discovery process should be timely resolved through informal mechanisms, such as mediation between the parties and conferences with the court, rather than through formal motion practice and hearings.
  • Principle 12: A party should not be required to provide discovery about its e-discovery process without good cause.
  • Principle 13: The court should not decide a motion regarding the adequacy of an e-discovery process without a sufficient factual record. In many instances, such a motion may not be ripe for determination before there has been substantial or complete production.

Principles 1 through 5 are General Principles, 6 through 10 are Specific Applications of the General Principles, and 11 through 13 are principles related to Defending the E-Discovery Process.

As usual, the Commentary is free and you can download it here.  The Sedona Conference welcomes input on the Commentary through November 15, 2016. Questions and comments regarding the Commentary may be sent to comments@sedonaconference.org.

So, what do you think?  Will these new principles help organizations implement a sound eDiscovery process?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Spring has Sprung! Don’t Plant – Build – a (Decision) Tree: eDiscovery Best Practices

Having recently needed to walk a client through a decision process to determine how to proceed to index and search a huge volume of data, it seems timely to revisit this topic.

When a new case is filed, there are several important decisions that the lead attorney has to make.  Decisions made early in the life cycle of a case can significantly affect how discovery is managed and how costly the discovery process will be for that case.  Decision trees enable attorneys to work through the decision process up front, helping them make sound, logical decisions that can lead to more efficient management of the discovery process.

What is a Decision Tree?

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences.  It is essentially a flowchart in which each internal node represents a test on an attribute, each branch represents the outcome of that test, and each leaf node represents the decision taken after evaluating all attributes.

If you have ever prepared an analysis at the outset of a case to estimate the probability of winning the case and determine whether to litigate or attempt to settle, you may have already prepared some sort of decision tree to make those decisions.  You probably looked at the probability of winning and the probabilities of different award amounts, weighed the costs of litigating against the potential award amounts, and used that to decide how to proceed.  The graphic above provides an example of what a decision tree, drawn as a flowchart, might look like to represent that process.
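To make that concrete, here is a minimal sketch, in Python, of the litigate-or-settle analysis described above.  All of the probabilities and dollar amounts are hypothetical examples, not figures from any actual case; the point is simply that each branch of the tree yields an expected cost you can compare.

```python
# Minimal decision tree sketch: expected cost of "litigate" vs. "settle".
# All probabilities and dollar figures are hypothetical examples.

LITIGATION_COST = 250_000     # estimated cost to take the case to trial
SETTLEMENT_OFFER = 400_000    # amount we would pay to settle now

# Possible trial outcomes for a defendant: (probability, award we would pay)
trial_outcomes = [
    (0.50, 0),          # win: no award
    (0.30, 500_000),    # partial loss
    (0.20, 1_200_000),  # full loss
]

# Expected cost of the "litigate" branch = litigation cost + expected award
expected_litigation = LITIGATION_COST + sum(p * award for p, award in trial_outcomes)

print(f"Expected cost to litigate: ${expected_litigation:,.0f}")
print(f"Cost to settle:            ${SETTLEMENT_OFFER:,.0f}")
print("Recommendation:", "settle" if SETTLEMENT_OFFER < expected_litigation else "litigate")
```

Each leaf corresponds to a possible outcome, and walking the branches yields the expected cost of each course of action – which is exactly the logic a decision tree captures visually.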

Uses of Decision Trees in Discovery

Decision trees identify the available alternatives for tackling a particular business problem and can help pinpoint the conditions conducive to each alternative.  In discovery, a decision tree might be warranted to help you:

  • Decide whether to outsource litigation support and discovery activities or keep them in-house;
  • Select an appropriate discovery solution to meet your organization’s needs within its budget;
  • Decide when to implement a litigation hold and determine how to comply with your organization’s ongoing duty to preserve data;
  • Determine how to manage collection procedures in discovery that identify the appropriate custodians for each type of case;
  • Decide whether to perform responsiveness and privilege review of native files or convert to an image format such as TIFF or PDF to support those review processes;
  • Determine whether to agree to produce native files or converted TIFF or PDF images to opposing counsel.

Decision trees promote efficiency during the discovery process by encouraging up-front planning and walking through the logic of the decision-making process, and they also reduce mistakes by making the process more predictable and repeatable, promoting consistency in how cases are handled.  Once you have the decision process documented via a decision tree (and the underlying assumptions don’t change), the plan of action will remain consistent.  If assumptions do change over time, your decision tree can evolve just like a real tree – adding or removing “branches” as needed to reflect the current decision-making process.

So, what do you think?  Does your organization use decision trees in your discovery process?   Please share any comments you might have or if you’d like to know more about a particular topic.

Graphic source: Wikipedia.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Here are Some Questions You Might Not Think to Ask Your Technology Provider: eDiscovery Best Practices

I love Rob Robinson’s Complex Discovery site.  Whether it’s information on eDiscovery provider mergers and acquisitions, a software and services mashup of the eDiscovery market, or links to many other useful resources, his is a site I check out pretty much daily.  His latest post discusses some questions that you might not think to ask your technology provider, because they might be “uncomfortable”.

In Six Uncomfortable Questions to Ask Your Technology Provider Immediately, Rob discusses the failure to fully investigate a potential technology provider’s risks related to conflicts of interest, financial integrity, and adherence to the law when vetting those providers before entering into agreements.  As Rob notes, failure to ask questions like these could be due to not knowing what to ask, or it could be because those conversations are simply “uncomfortable”.

With that in mind, Rob identifies “six uncomfortable questions that all sourcing organizations should consider asking their technology providers today”.  They are:

  1. Is any member of your organization involved in any activity that may result in competing loyalties that could cause your organization to benefit at our expense?
  2. Is your organization prevented from engaging with any specific organization(s) by a contractual agreement, temporary restraining order, or a legal judgment?
  3. Has your organization withdrawn any publicly released announcements or materials because of the inability to substantiate claims?
  4. Does your organization have any unpaid federal, state, local or foreign income and employment taxes (as required) for the most recent three years of your organization’s existence?
  5. Is your organization involved in any current litigation or under the threat of potential litigation?
  6. Does your organization have any unsatisfied judgments?

Let’s face it: the quality of the technology offered by a provider may be moot if that provider has legal or financial issues, or conflicts of interest, that could affect the technology or services it delivers.

Additional questions I like to ask relate to how long an organization has been in business and the average tenure of its key employees or executive team.  It’s good to know that you’re not dealing with a “fly-by-night” company that may not last.  While tenure and experience aren’t a guarantee of success, they certainly help.  Have you looked at the standings in the National Football League lately?  It’s no accident that all of the undefeated teams have had the same coaching staffs for the last five to fifteen years (except one, and that team has Peyton Manning).  Experience makes a difference.

Rob’s article hits on a topic that people don’t talk much about.  If the provider is solid, those questions don’t have to be “uncomfortable”.  Thanks, Rob!

So, what do you think?  What “uncomfortable” questions do you ask your technology provider(s)?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Do You Test Your Search Terms Before Proposing Them to Opposing Counsel?: eDiscovery Best Practices

If you don’t, you should.  When litigation is anticipated, it’s never too early to begin collecting potentially responsive data and assessing it by performing searches and testing the results.  However, if you wait until after the meet and confer with opposing counsel, it can be too late.

On the very first day we introduced eDiscovery Daily, we discussed the danger of using wildcards in your searches (and how they can retrieve vastly different results than you intended).  Let me recap that example.

Several years ago, I provided search strategy assistance to a client that had already agreed upon several searches with opposing counsel.  One search related to mining activities, so the attorney decided to use a wildcard of “min*” to retrieve variations like “mine”, “mines” and “mining”.

That one search retrieved over 300,000 files with hits.

Why?  Because there are 269 words in the English language that begin with the letters “min”.  Words like “mink”, “mind”, “mint” and “minion” were all being retrieved in this search for files related to “mining”.  We ultimately had to go back to opposing counsel and attempt to negotiate a revised search that was more appropriate.

What made that process difficult was the negotiation with opposing counsel.  My client had already agreed on over 200 terms with opposing counsel and had proposed many of those terms, including this one.  The attorneys had prepared these terms without assistance from a technology consultant (or “geek” if you prefer) – I was brought into the project after the terms were negotiated and agreed upon – and without testing any of the terms.

Since the terms had been agreed upon, opposing counsel was understandably resistant to modifying any of them.  It wasn’t their problem that my client faced having to review all of these files – it was my client’s proposed term that they now wanted to modify.  Fortunately, for this term, we were ultimately able to provide a clear indication that many of the retrieved documents in this search were non-responsive and were able to get opposing counsel to agree to a modified list of variations of “mine” that included “minable”, “minefield”, “minefields”, “miner” and “minings” (among other variations).  We were able to reduce the result set to less than 12,000 files with hits, saving our client a “mint”, which they certainly didn’t “mind” (because we were able to drop “mint” and “mind” and over 200 other words from the responsive hit list).

However, there were several other inefficient terms that opposing counsel refused to renegotiate, and my client was forced to review thousands of additional files that they shouldn’t have had to review.  Had the client included a technical member on the team at the beginning and tested each of these searches before negotiating terms with opposing counsel, they could have identified which terms were overbroad and proposed more efficient search terms, saving thousands of dollars in review costs.
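This is exactly the kind of problem that a little testing before the meet and confer can surface.  Here is a minimal Python sketch of that testing idea; the sample text and the list of intended variations are made up for illustration, and in practice you would run something like this against text extracted from a representative slice of the collection.

```python
import re
from collections import Counter

# Hypothetical sample of extracted document text; in practice, extract text
# from a representative slice of the collection instead.
sample_text = """
The mining company reported mine closures. Never mind the mint-condition
report; management kept the minutes and a minimum of minor minutiae.
"""

# The variations the attorney actually intended to capture
intended = {"mine", "mines", "mining", "miner", "miners", "minable"}

# Every word the wildcard "min*" would actually match
hits = Counter(w.lower() for w in re.findall(r"\bmin\w*", sample_text, re.IGNORECASE))

unintended = {w for w in hits if w not in intended}
print("Words matched by min*:", dict(hits))
print("Unintended matches:   ", sorted(unintended))

# Reviewing the unintended list before proposing terms lets you replace the
# wildcard with an explicit list of variations (or a narrower pattern).
```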

So, what do you think?  Do you test your search terms before proposing them to opposing counsel?  If not, why not?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

If You Play “Tag” Too Often, You Might Find Yourself Playing “Hide and Seek”: eDiscovery Best Practices

If you’ve used any review tool, you’re familiar with the “tag” field to classify documents.  Whether classifying documents as responsive, non-responsive, privileged, or applicable to any of a number of issues, you’ve probably used a tag field to simply check a document to indicate that the associated characteristic of the document is “true”.  But, if you fall in love with the tag field too much, your database can become unmanageable and you may find yourself playing “hide and seek” to try to find the desired tag.

So, what is a “tag” field?

In databases such as SQL Server (which many review platforms use for managing the data associated with ESI being reviewed), a “tag” field is typically a “bit” field, also known as a yes/no or true/false Boolean field.  As a “bit” field, its valid values are 0 (false) and 1 (true).  In the review platform, the tag field is typically represented by a check box that can simply be clicked to mark it as true (or clicked again to turn it back to false).  Easy, right?

One of the most popular features of CloudNine’s review platform (shameless plug warning!) is the ability for the users to create their own fields – as many as they want.  This can be useful for classifying documents in a variety of ways – in many cases, using the aforementioned “tag” field.  So, the user can create their fields and organize them in the order they want to make review more efficient.  Easy, right?

Sometimes, too much of a good thing can be a bad thing.

I have worked with some clients who have used tag fields to classify virtually everything they track within their collection – in some cases, to the point where their field collections grew to over 200 data fields!  Try finding the data field you need quickly when you have that many.  Not easy, right?  Here are a couple of examples where use of the tag field was probably not the best choice:

  • Document Types: I have seen instances where clients have created a tag field for each type of document. So, instead of creating one text-based “DocType” field and populating it with the description of the type of document (e.g., Bank Statements, Correspondence, Reports, Tax Documents, etc.), the client created a tag field for each separate document type.  For clients who have identified 15-20 distinct document types (or more), it can become quite difficult to find the right tag to classify the type of document.
  • Account Numbers: Once again, instead of creating one text-based field for tracking key account numbers mentioned in a document, I have seen clients create a separate tag field for each key account number, which can drive the data field count up quite a bit.

Up-front planning is one key to avoiding “playing tag” too often.  Identify the classifications that you intend to track and look for common themes among larger groups of classifications (e.g., document types, organizations mentioned, account numbers, etc.).  Develop an approach for standardizing descriptions for those within text-based fields (which can then be searched effectively using “equal to” or “contains” searches, depending on what you’re trying to accomplish) and you can keep your data field count to a manageable level.  That will keep your game of “tag” from turning into “hide and seek”.
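As a rough sketch of that design choice, here is a Python example using the built-in sqlite3 module (standing in for SQL Server, with hypothetical table and field names): the first table shows the tag-field-per-document-type anti-pattern, and the second shows a single text-based DocType field that supports both “equal to” and “contains” searches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Anti-pattern: one Boolean "tag" column per document type (grows without bound).
cur.execute("""CREATE TABLE docs_tagged (
    doc_id INTEGER PRIMARY KEY,
    is_bank_statement INTEGER DEFAULT 0,  -- 0/1 stands in for SQL Server's bit type
    is_correspondence INTEGER DEFAULT 0,
    is_report INTEGER DEFAULT 0
    -- ...and a new column for every additional document type
)""")

# Preferred: one text-based DocType field with standardized descriptions.
cur.execute("""CREATE TABLE docs (
    doc_id  INTEGER PRIMARY KEY,
    doctype TEXT
)""")
cur.executemany("INSERT INTO docs VALUES (?, ?)",
                [(1, "Bank Statement"), (2, "Correspondence"), (3, "Report")])

# "Equal to" and "contains" searches both work against the single field.
print(cur.execute("SELECT doc_id FROM docs WHERE doctype = 'Report'").fetchall())
print(cur.execute("SELECT doc_id FROM docs WHERE doctype LIKE '%Statement%'").fetchall())
```

The single DocType field stays manageable no matter how many document types you eventually identify, while the tagged version needs a schema change (and another field for reviewers to hunt for) every time a new type turns up.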

So, what do you think?  Have you worked with databases that have so many data fields that it becomes difficult to find the right field?   Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Got Problems with Your eDiscovery Processes? “SWOT” Them Away: eDiscovery Best Practices

Having recently helped a client put one of these together, it seemed appropriate to revisit this topic…

Understanding the internal and external challenges that your organization faces allows it to approach ongoing and future discovery more strategically.  A “SWOT” analysis is a tool that can be used to develop that understanding.

A “SWOT” analysis is a structured planning method used to evaluate the Strengths, Weaknesses, Opportunities, and Threats associated with a specific business objective.  That business objective can be a specific project or all of the activities of a business unit.  The analysis involves specifying the objective and identifying the internal and external factors that are favorable and unfavorable to achieving it.  The SWOT analysis is broken down as follows:

  • Strengths: characteristics of the business or project that give it an advantage over others;
  • Weaknesses: characteristics that place the team at a disadvantage relative to others;
  • Opportunities: elements in the environment that the project could exploit to its advantage;
  • Threats: elements in the environment that could cause trouble for the business or project.

“SWOT”, get it?

From an eDiscovery perspective, a SWOT analysis enables you to take an objective look at how your organization handles discovery issues – what you do well and where you need to improve – and the external factors that can affect how your organization addresses its discovery challenges.  The SWOT analysis enables you to assess how your organization handles each phase of the discovery process – from Information Governance to Presentation – to evaluate where your strengths and weaknesses exist so that you can capitalize on your strengths and implement changes to address your weaknesses.

How solid is your information governance program?  How well does your legal department communicate with IT?  How well formalized is your coordination with outside counsel and vendors?  Do you have a formalized process for implementing and tracking litigation holds?  These are examples of questions you might ask about your organization and, based on the answers, identify your organization’s strengths and weaknesses in managing the discovery process.

However, if you only look within your organization, that’s only half the battle.  You also need to look at external factors and how they affect your organization’s handling of discovery issues.  Trends such as the growth of social media and changes to state or federal rules addressing the handling of electronically stored information (ESI) need to be considered in your organization’s strategic discovery plan.

Having worked through the strategic analysis process with several organizations over a number of years, I find that the SWOT analysis is a useful tool for summarizing where the organization currently stands with regard to managing discovery, which naturally identifies areas for improvement that can be addressed.

So, what do you think?  Has your organization performed a SWOT analysis of your discovery process?   Please share any comments you might have or if you’d like to know more about a particular topic.

Graphic source: Wikipedia.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Keyword Searching Isn’t Dead, If It’s Done Correctly: eDiscovery Best Practices

In the latest post on the Advanced Discovery blog, Tom O’Connor (an industry thought leader who has been a thought leader interviewee on this blog several times) posed an interesting question: Is Keyword Searching Dead?

In his post, Tom recapped the discussion of a session with the same name at the recent Today’s General Counsel Institute in New York City where Tom was a co-moderator of the session along with Maura Grossman, a recognized Technology Assisted Review (TAR) expert, who was recently appointed as Special Master in the Rio Tinto case.  Tom then went on to cover some of the arguments for and against keyword searching as discussed by the panelists and participants in the session, while also noting that numerous polls and client surveys show that the majority of people are NOT using TAR today.  So, they must be using keyword searching, right?

Should they be?  Is there still room for keyword searching in today’s eDiscovery landscape, given the advances that have been made in recent years in TAR technology?

There is, if it’s done correctly.  Tom quotes Maura in the article as stating that “TAR is a process, not a product.”  The same could be said for keyword searching.  If the process within which the keyword searches are performed is flawed, you could either retrieve far more documents for review than necessary and drive up eDiscovery costs, or leave yourself open to challenges in the courtroom regarding your approach.  Many lawyers at corporations and law firms identify search terms to be used (and, in many cases, agree on those terms with opposing counsel) without any testing done to confirm the validity of those terms.

Way back in the first few months of this blog (over four years ago), I advocated an approach to searching that I called “STARR”: Search, Test, Analyze, Revise (if necessary) and Repeat (also, if necessary).  With an effective platform (one with advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching), experience with that platform, and knowledge of search best practices, you can start with a well-planned search that can be confirmed or adjusted using the “STARR” approach.

And, even when you’ve been searching databases for as long as I have (decades now), an effective process is key because you never know what you will find until you test the results.  The favorite example that I have used in recent years (and walked through in this earlier post) involved work for a petroleum (oil) company looking for documents related to “oil rights”: a search for “oil AND rights” retrieved almost every published and copyrighted document in the company.  Why?  Because almost every published and copyrighted document in the company included the phrase “All Rights Reserved”.  Testing and an iterative process eventually enabled me to find the search that offered the best balance of recall and precision.
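To show why that iterative testing mattered, here is a minimal Python sketch, with made-up document snippets, contrasting a plain AND search for “oil” and “rights” with a simple proximity-style search that only matches when the two words appear within a few words of each other.

```python
import re

# Hypothetical document snippets
docs = [
    "Assignment of oil and gas rights for the leased acreage.",                     # relevant
    "Oil production summary for the second quarter of 2015. All Rights Reserved.",  # boilerplate
    "Annual shareholder letter. All Rights Reserved.",                               # boilerplate
]

def and_search(text):
    t = text.lower()
    return "oil" in t and "rights" in t

def proximity_search(text, distance=5):
    # True only if "oil" and "rights" occur within `distance` words of each other
    words = re.findall(r"\w+", text.lower())
    oil_idx = [i for i, w in enumerate(words) if w == "oil"]
    rights_idx = [i for i, w in enumerate(words) if w == "rights"]
    return any(abs(i - j) <= distance for i in oil_idx for j in rights_idx)

for d in docs:
    print(f"AND: {and_search(d)!s:5}  w/5: {proximity_search(d)!s:5}  {d[:50]}")
```

The AND search hits the copyright boilerplate; the proximity version does not – which is exactly the kind of adjustment an iterative “STARR” cycle is meant to surface.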

Like TAR, keyword searching is a process, not a product.  And, you can quote me on that.  (-:

So, what do you think?  Is keyword searching dead?  And, please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Pitfalls Associated with Self-Collection of Data by Custodians: eDiscovery Best Practices

In a prior article, we covered the Burd v. Ford Motor Co. case where the court granted the plaintiff’s motion for a deposition of a Rule 30(b)(6) witness on the defendant’s search and collection methodology involving self-collection of responsive documents by custodians based on search instructions provided by counsel.  In light of that case and a recent client experience of mine, I thought it would be appropriate to revisit this topic that we addressed a couple of years ago.

I’ve worked with a number of attorneys who have turned over the collection of potentially responsive files to the individual custodians of those files, or to someone in the organization responsible for collecting those files (typically, an IT person).  Self-collection by custodians, unless managed closely, can be a wildly inconsistent process (at best).  In some cases, those attorneys have instructed those individuals to perform various searches to turn “self-collection” into “self-culling”.  Self-culling can cause at least two issues:

  1. You have to go back to the custodians and repeat the process if additional search terms are identified.
  2. Potentially responsive image-only files will be missed with self-culling.

It’s not uncommon for additional searches to be required over the course of a case, even when search terms are agreed to by the parties up front (search terms are frequently renegotiated), so the self-culling process has to be repeated when new or modified terms are identified.

It’s also common to have a number of image-only files within any collection, especially if the custodians frequently scan executed documents or use fax software to receive documents from other parties.  Image-only PDF or TIFF files can make up as much as 20% of a collection.  When custodians are asked to perform “self-culling” by running their own searches of their data, these files will typically be missed.

For these reasons, I usually advise against self-culling by custodians in litigation.  I also typically don’t recommend that the organization’s internal IT department perform self-culling either, unless they have the capability to process that data to identify image-only files and perform Optical Character Recognition (OCR) on them to capture text.  If your IT department doesn’t have the capabilities and experience to do so (which includes a well-documented process and chain of custody), it’s generally best to collect all potentially responsive files from the custodians and turn them over to a qualified eDiscovery provider to perform the culling.  Most qualified eDiscovery providers, including (shameless plug warning!) CloudNine™, perform OCR as needed to include image-only files in the resulting potentially responsive document set before culling.  With the full data set available, there is also no need to go back to the custodians to perform additional searches to collect additional data (unless, of course, the case requires supplemental productions).
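As an illustration of why image-only files slip past keyword-based self-culling, here is a minimal Python sketch that flags PDFs with little or no extractable text as candidates for OCR.  It assumes the third-party PyPDF2 library and a hypothetical collection folder; a real process would use proper processing tools (with documentation and chain of custody) rather than an ad hoc script like this.

```python
from pathlib import Path
from PyPDF2 import PdfReader  # third-party library: pip install PyPDF2

def is_image_only(pdf_path, min_chars=20):
    """Return True if the PDF has essentially no extractable text,
    a strong hint that it is a scan or fax that needs OCR."""
    reader = PdfReader(pdf_path)
    text = "".join((page.extract_text() or "") for page in reader.pages)
    return len(text.strip()) < min_chars

# Hypothetical collection folder
for pdf in Path("collection/").rglob("*.pdf"):
    if is_image_only(pdf):
        print(f"Needs OCR before searching: {pdf}")
```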


Most organizations that have their custodians perform self-collection of files for eDiscovery probably don’t expect that they will have to explain that process to the court.  Ford sure didn’t.  If your organization plans to have its custodians self-collect, you’d better be prepared to explain that process, which includes discussing your approach for handling image-only files.

So, what do you think?  Do you self-collect data for discovery purposes?  If so, how do you account for image-only files?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Quality Control, Making Sure the Numbers Add Up: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

Friday, we wrote about tracking file counts from collection to production, the concept of expanded file counts, and the categorization of files during processing.  Today, let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We discussed the typical categories of excluded files after processing – obviously, what’s not excluded is available for searching and review.  Even if your approach includes technology assisted review (TAR) as part of your methodology, it’s still likely that you will want to do some culling out of files that are clearly non-responsive.

Documents may be classified in a number of ways during review, but most commonly they are classified as responsive, non-responsive, or privileged.  Privileged documents are also often classified as responsive or non-responsive, so that only the responsive documents that are privileged need be identified on a privilege log.  Responsive documents that are not privileged are then produced to opposing counsel.

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a fairly simple eMail-based example.  We conduct a targeted collection of a PST file from each of seven custodians in a given case.  The relevant time period for the case is January 1, 2013 through December 31, 2014.  Other than date range, we plan to do no other filtering of files during processing.  Identified duplicates will not be reviewed or produced.  We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged.  Here’s what this collection might look like:

  • Collected Files: After expansion and processing, 7 PST files expand to 101,852 eMails and attachments.
  • Filtered Files: Filtering eMails outside of the relevant date range eliminates 23,564 files.
  • Remaining Files after Filtering: After filtering, there are 78,288 files to be processed.
  • NIST/System Files: eMail collections typically don’t have NIST or system files, so we’ll assume zero (0) files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
  • Exception Files: Let’s assume that a little less than 1% of the collection (912) is exception files like password protected, corrupted or empty files.
  • Duplicate Files: It’s fairly common for approximately 30% or more of the collection to include duplicates, so we’ll assume 24,215 files here.
  • Remaining Files after Processing: We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
  • Files Culled During Searching: If we assume that we are able to cull out 67% (approximately two-thirds) of the remaining collection as clearly non-responsive, that removes 35,618 files.
  • Remaining Files for Review: After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
  • Files Tagged as Non-Responsive: If approximately 40% of the document collection is tagged as non-responsive, that would be 7,017 files tagged as such.
  • Remaining Files Tagged as Responsive: After QC to ensure that all documents are either tagged as responsive or non-responsive, this leaves 10,526 documents as responsive.
  • Responsive Files Tagged as Privileged: If roughly 8% of the responsive documents are determined to be privileged during review, that would be 842 privileged documents.
  • Produced Files: After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them.  The key is the category counts above – Filtered, NIST/System, Exception, Duplicate, Culled During Searching, Tagged as Non-Responsive, Responsive Tagged as Privileged, and Produced.  Excluding the interim counts, these are the categories in which every file in the collection should wind up.  What happens if you add those category counts together?  You should get 101,852 – the number of collected files after expanding the PST files.  As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery.  That’s the way it should be.  If not, investigation is required to determine where files were missed.
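A reconciliation like this is easy to automate.  Here is a minimal Python sketch, using the example numbers from this post, that checks that the excluded and produced categories sum back to the expanded collection count.

```python
# Example counts from the scenario above
collected = 101_852  # expanded count after processing the 7 PST files

categories = {
    "filtered (outside date range)":    23_564,
    "NIST/system files":                0,
    "exception files":                  912,
    "duplicate files":                  24_215,
    "culled as clearly non-responsive": 35_618,
    "tagged non-responsive in review":  7_017,
    "responsive but privileged":        842,
    "produced":                         9_684,
}

accounted = sum(categories.values())
print(f"Collected: {collected:,}   Accounted for: {accounted:,}")
if accounted == collected:
    print("Every collected file is accounted for.")
else:
    print(f"Investigate: {collected - accounted:,} files unaccounted for!")
```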

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Quality Control By The Numbers: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

A while back, we wrote about Quality Assurance (QA) and Quality Control (QC) in the eDiscovery process.  Both are important in improving the quality of work product and making the eDiscovery process more defensible overall.  With regard to QC, one overall QC mechanism is tracking document counts through the discovery process, especially from collection to production, to identify how every collected file was handled and why each non-produced document was not produced.

Expanded File Counts

The count of files as collected is not the same as the expanded file count.  There are certain container file types, like Outlook PST files and ZIP archives, that exist essentially to store a collection of other files.  The count that is important to track is the “expanded” file count after processing, which includes all of the files contained within those container files.  In a simple scenario where you collect Outlook PST files from seven custodians, the actual number of documents (emails and attachments) within those PST files could be in the tens of thousands.  That’s the starting count that matters if your goal is to account for every document or file in the discovery process.
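As a small illustration of the difference between collected and expanded counts, here is a Python sketch, using a hypothetical collection folder, that counts ZIP containers as collected versus the files stored inside them.  (PST expansion works on the same principle but requires specialized processing tools rather than a simple script.)

```python
import zipfile
from pathlib import Path

collection_dir = Path("collected/")  # hypothetical collection folder
zip_files = list(collection_dir.rglob("*.zip"))

expanded_count = 0
for zf in zip_files:
    with zipfile.ZipFile(zf) as archive:
        # Count only real files, not directory entries
        expanded_count += sum(1 for info in archive.infolist() if not info.is_dir())

print(f"Collected containers: {len(zip_files)}")
print(f"Expanded file count:  {expanded_count}")
```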

Categorization of Files During Processing

Of course, not every document gets reviewed or even included in the search process.  During processing, files are usually categorized, with some categories of files usually being set aside and excluded from review.  Here are some typical categories of excluded files in most collections:

  • Filtered Files: Some files may be collected, and then filtered during processing. A common filter for the file collection is the relevant date range of the case.  If you’re collecting custodians’ source PST files, those may include messages outside the relevant date range; if so, those messages may need to be filtered out of the review set.  Files may also be filtered based on type of file or other reasons for exclusion.
  • NIST and System Files: Many file collections also contain system files, like executable files (EXEs) or dynamic link libraries (DLLs), that are part of the software on a computer and do not contain client data, so those are typically excluded from the review set. NIST files are included on the National Institute of Standards and Technology list of files that are known to have no evidentiary value, so any files in the collection matching those on the list are “De-NISTed”.
  • Exception Files: These are files that cannot be processed or indexed, for whatever reason. For example, they may be password-protected or corrupted.  Just because these files cannot be processed doesn’t mean they can be ignored; depending on your agreement with opposing counsel, you may need to at least provide a list of them on an exception log to prove they were addressed, if not attempt to repair them or make them accessible (BTW, it’s good to establish that agreement for disposition of exception files up front).
  • Duplicate Files: During processing, files that are exact duplicates may be put aside to avoid redundant review (and potential inconsistencies). Exact duplicates of loose files are typically identified based on the HASH value, which is a digital fingerprint generated from the content and format of the file – if two files have the same HASH value, they have the same exact content and format (a minimal hashing sketch follows below).  Emails (and their attachments) may be identified as duplicates based on key metadata fields, so an attachment cannot be “de-duped” out of the collection by a standalone copy of the same file.
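Here is the hashing sketch mentioned above: a minimal Python example that computes a content-based fingerprint for each loose file in a hypothetical collection folder and groups exact duplicates by HASH value.  (Email de-duplication based on metadata fields would work differently, as noted above.)

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path, algorithm="sha1"):
    """Digital fingerprint based on the file's exact content."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical folder of loose collected files
groups = defaultdict(list)
for p in Path("collected/").rglob("*"):
    if p.is_file():
        groups[file_hash(p)].append(p)

for digest, paths in groups.items():
    if len(paths) > 1:
        print(f"{digest[:12]}...  {len(paths)} exact duplicates: {[str(p) for p in paths]}")
```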

All of these categories of excluded files can reduce the set of files to actually be searched and reviewed.  On Monday, we’ll walk through an example of a file set from collection to production to illustrate how each file is accounted for during the discovery process.

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.