Electronic Discovery

eDiscovery Searching: For Defensible Searching, Be a "STARR"

 

Defensible searching has become a priority in eDiscovery as parties in several cases have faced significant consequences (including sanctions) for failing to implement a defensible search strategy when responding to discovery requests.

Probably the most famous case where the search approach was at issue is Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008), where Judge Paul Grimm noted that the “only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents” and found that privilege on 165 inadvertently produced documents was waived, in part, because of the inadequacy of the search approach.

A defensible search strategy is partly about using an effective tool (one with advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching) and partly about using an effective approach to test and verify the search results.

I have an acronym that I use to reflect the defensible search process.  I call it “STARR” – as in “STAR” with an extra “R” or Green Bay Packer football legend Bart Starr (sorry, Bears fans!).  For each search that you need to conduct, here’s how it goes:

  • Search: Construct the best search you can to maximize recall and precision for the desired result.  An effective tool gives you more options for constructing a more effective search, which should help in maximizing recall and precision.  For example, as noted on this blog a few days ago, a proximity search can, under the right circumstances, provide a more precise search result without sacrificing recall.
  • Test: Once you’ve conducted the search, it’s important to test two datasets to determine the effectiveness of the search:
    • Result Set: Test the result set by randomly selecting an appropriate sample percentage of the files and reviewing them to determine their responsiveness to the intent of the search.  The appropriate percentage of files to review depends on the size of the result set – the smaller the set, the higher the percentage that should be reviewed.
    • Files Not Retrieved: While testing the result set is important, it is also important to randomly select an appropriate sample percentage of the files that were not retrieved by the search and review them as well, to see whether any responsive files were missed.
  • Analyze: Analyze the results of the random sample testing of both the result set and the files not retrieved to determine how effective the search was at retrieving mostly responsive files and whether any responsive files were missed (a simple sampling and metrics sketch follows this list).
  • Revise: If the search retrieved a low percentage of responsive files and retrieved a high percentage of non-responsive files, then precision of the search may need to be improved.  If the files not retrieved contained any responsive files, then recall of the search may need to be improved.  Evaluate the results and see what, if any, revisions can be made to the search to improve precision and/or recall.
  • Repeat: Once you’ve identified revisions you can make to your search, repeat the process.  Search, Test, Analyze and (if necessary) Revise the search again until the precision and recall of the search are maximized to the extent possible.
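To make the Test and Analyze steps concrete, here is a minimal sketch, in Python, of drawing random review samples from both the result set and the files not retrieved, then estimating precision and recall from the reviewers’ calls.  The file lists, sample fractions and the is_responsive review function are hypothetical placeholders; actual sample sizes should follow whatever sampling methodology you can defend.

import random

def sample_for_review(files, fraction, minimum=50):
    """Randomly select a review sample; smaller sets get a proportionally larger look."""
    size = min(len(files), max(minimum, int(len(files) * fraction)))
    return random.sample(files, size)

def estimate_precision_recall(result_set, not_retrieved, is_responsive):
    """Estimate precision and recall from samples of hits and non-hits.
    is_responsive(file) stands in for the human reviewer's call on each sampled file."""
    hit_sample = sample_for_review(result_set, fraction=0.05)
    miss_sample = sample_for_review(not_retrieved, fraction=0.01)
    responsive_hit_rate = sum(map(is_responsive, hit_sample)) / len(hit_sample)
    responsive_miss_rate = sum(map(is_responsive, miss_sample)) / len(miss_sample)
    precision = responsive_hit_rate
    est_responsive_retrieved = responsive_hit_rate * len(result_set)
    est_responsive_missed = responsive_miss_rate * len(not_retrieved)
    total_responsive = est_responsive_retrieved + est_responsive_missed
    recall = est_responsive_retrieved / total_responsive if total_responsive else 0.0
    return precision, recall

If precision comes back low, tighten the terms (Revise); if responsive files turn up in the not-retrieved sample, broaden them, and then Repeat.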

While you can’t guarantee that you will retrieve all of the responsive files or eliminate all of the non-responsive ones, a defensible approach to get as close as you can to that goal will minimize the number of files for review, potentially saving considerable costs and making you a “STARR” in the courtroom when defending your search approach.

So, what do you think?  Are you a “STARR” when it comes to defensible searching?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Metadata Mining Ethics

 

Years ago, I put together a CLE course about metadata awareness and how hidden data (such as tracked changes and comments) can cause embarrassment or even inadvertent disclosures in eDiscovery.  The production of metadata with ESI continues to be a big issue in eDiscovery and organizations need to consider how to handle that metadata (especially if it’s hidden), to avoid issues.

For those who don’t know, metadata can be simply defined as “data about data”, which is to say it’s the data that describes each file and includes information such as when it was created, when it was last modified and who last modified it.  Metadata can often be used in identifying responsive files based on time frame (of creation or last editing) or other criteria.
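As a simple illustration of the file-level portion of that metadata, here is a minimal sketch that pulls basic file-system properties with Python’s standard library.  Note that “who last modified it” and similar application-level properties live inside the file itself (in formats such as Office documents), not in the file system, so this only scratches the surface; the file name is hypothetical.

import os
from datetime import datetime

def basic_file_metadata(path):
    """Return basic file-system metadata for a file (times shown in local time)."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "last_modified": datetime.fromtimestamp(st.st_mtime),
        "created_or_changed": datetime.fromtimestamp(st.st_ctime),  # meaning is platform-dependent
    }

print(basic_file_metadata("contract_draft.docx"))  # hypothetical file name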

Many types of files can also contain hidden metadata, such as a record of the changes made to a file, who made those changes, and any comments those parties may have added (for example, Microsoft Word has Tracked Changes and Comments that aid in collaboration by gathering feedback from one or multiple parties regarding the content of the document).  Embedded objects can also be hidden; for example, depending on how you embed an Excel table into a Word document, the entire Excel file may be accessible within the document, even though only a small part of it is displayed.
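For a rough way to spot that kind of hidden content before producing a file, here is a minimal sketch that inspects a modern .docx file (which is simply a ZIP archive of XML parts) for tracked changes, comments and embedded objects.  The checks are illustrative rather than exhaustive, and the file name is hypothetical.

import zipfile

def hidden_content_flags(docx_path):
    """Flag common hidden content in a .docx package (illustrative checks only)."""
    flags = set()
    with zipfile.ZipFile(docx_path) as z:
        names = z.namelist()
        body = z.read("word/document.xml").decode("utf-8", errors="ignore")
        if "<w:ins " in body or "<w:del " in body:
            flags.add("tracked changes")
        if "word/comments.xml" in names:
            flags.add("comments")
        if any(n.startswith("word/embeddings/") for n in names):
            flags.add("embedded objects")
    return flags

print(hidden_content_flags("contract_draft.docx"))  # hypothetical file name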

Last fall, the American Bar Association published an article with a look at metadata ethics opinions, which was also recently referenced in this article.  The opinions issued to date have focused on three topics with regard to metadata production:

  • The sender's responsibility when transmitting or producing electronic files;
  • The recipient's right to examine (or "mine") files for metadata; and
  • The recipient's duty to notify the sender if sensitive data is discovered.

Sender’s Responsibility

Jurisdictions agree that an attorney sending or producing ESI has a duty to exercise caution to avoid inadvertently disclosing confidential information, though the level of caution required may vary depending upon the jurisdiction and situation.  In SBA Ethics Opinion 07-03, the State Bar of Arizona's Ethics Committee indicated that the level of caution may depend upon "the sensitivity of the information, the potential consequences of its inadvertent disclosure, whether further disclosure is restricted by statute, protective order, or confidentiality agreement, and any special instructions given by the client."

Ignorance of technology is no excuse.  The Colorado Bar Association Ethics Committee states that attorneys cannot limit their duty "by remaining ignorant of technology relating to metadata or failing to obtain competent computer support." (CBA Ethics Opinion 119).

Recipient’s Right to Examine

There is less jurisdictional agreement here.  Colorado, Washington D.C. and West Virginia allow metadata mining unless the recipient is aware that the data was sent unintentionally. On the other hand, New York and Maine prohibit metadata mining – the New York State Bar Association's Committee on Professional Ethics based its decision in part on the "strong public policy in favor of protecting attorney-client confidentiality." (NYSBA Opinion 749).  Minnesota and Pennsylvania have not set a bright-line rule, stating that the decision to allow or prohibit metadata mining should depend on the case.

Recipient’s Duty to Notify

Most jurisdictions rely on their local variation of ABA Model Rule of Professional Conduct 4.4(b), which indicates that an attorney who receives confidential data inadvertently sent is obligated to notify the sender.  Maryland is one exception to that position, stating that "the receiving attorney can, and probably should, communicate with his or her client concerning the pros and cons of whether to notify the sending attorney." (MSBA Ethics Docket 2007-09).

Bottom Line

You may not be able to control what a recipient can do with your inadvertently produced metadata, but you can take steps to avoid the inadvertent production in the first place.  Office 2007 and later include a built-in Document Inspector that can remove hidden metadata from Office files, while publishing files to PDF will remove some metadata (how much depends on the settings).  You can also use a metadata “scrubber” application such as Workshare Protect or Metadata Assistant to remove the metadata – most of these will even integrate with email so that you have the option to “scrub” the file before sending.
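For a rough sense of what a scrubber does under the hood, here is a minimal sketch using the third-party python-docx library to blank out a Word file’s core document properties.  It only clears those properties (author, title and so on), so it is not a substitute for a full metadata removal tool like the ones mentioned above; the file names are hypothetical.

from docx import Document  # third-party library: pip install python-docx

def scrub_core_properties(in_path, out_path):
    """Blank out the core document properties of a .docx file (a partial scrub only)."""
    doc = Document(in_path)
    props = doc.core_properties
    for attr in ("author", "last_modified_by", "title", "subject",
                 "keywords", "comments", "category"):
        setattr(props, attr, "")
    doc.save(out_path)

scrub_core_properties("contract_draft.docx", "contract_draft_scrubbed.docx")  # hypothetical names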

So, what do you think?  Have you been “stung” by hidden metadata?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: Privilege Waived for Produced Servers

If you were at the International Legal Technology Association (ILTA) trade show this past August, you may have noticed a huge unfinished building in the middle of the Strip – the Fontainebleau Resort.  It sits idle after financing was pulled, forcing Fontainebleau Las Vegas LLC to file for Chapter 11 bankruptcy in June of 2009.  Naturally, lawsuits followed between the Term Lenders and Fontainebleau Resort, LLC (FRLLC), the third-party parent of Fontainebleau Las Vegas – In re Fontainebleau Las Vegas Contract Litig. (S.D. Fla. Jan. 7, 2011).

A company that responded to a third-party subpoena and court orders compelling production by handing over three servers to lenders, without conducting any relevancy review and without reviewing two of the servers for privileged materials, waived privilege for the documents on the two servers that were not reviewed.

The parent company of a resort in bankruptcy proceedings was served by lenders to the resort with a subpoena for production of documents. The company did not object to the scope of the subpoena, and the court granted a motion of the lenders to compel production. Counsel for the company then halted work by an e-discovery vendor who had completed screening the company’s email server for responsive documents but had not started a privilege review because of concerns that the company could not pay for the services. Counsel for the company also sought to withdraw from the case, but the company was unable to find new counsel.

Rather than seeking a stay or challenging discovery rulings from the court, the company turned over data from a document server, an accounting server, and an email server. According to the court, the three servers were turned over to the lenders without any meaningful review for relevancy or responsiveness. Despite an agreement with the lenders on search terms for the email server, the company produced a 126 gigabyte disk with 700,000 emails from that server and then, without asking for leave of court, was late in producing a privilege log for data on the email server. The lenders sought direction from the court on waiver of privilege and their obligation if they found privileged materials in the data produced by the company. The company for the first time then raised objections to the burdensomeness of the original subpoena served over six months earlier given the company’s lack of resources or employees to conduct a document review.

The court held that the company “waived the attorney-client privilege and work product protection, and any other applicable privileges, for the materials it produced from two of three computer servers in what can fairly be described as a data dump as part of a significantly tardy response to a subpoena and to court-ordered production deadlines.” The court stated that in effect, the company “took the two servers, which it never reviewed for privilege or responsiveness, and said to the Term Lenders ‘here, you go figure it out.’”

However, because the company prepared a privilege log for the email server, the court added that privileges were not waived for materials from the email server. Also, the lenders were directed to alert the company to any “clearly privileged material they may find during their review of the production on the documents and accounting servers.” Although the court was not ruling on admissibility at trial of that privileged material, the lenders would be allowed to use it during pre-trial preparations, including depositions.

So, what do you think?  Was justice served?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required).  For eDiscovery news and best practices, check out the Applied Discovery Blog here.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Managing an eDiscovery Contract Review Team: Drafting Privileged Criteria

Yesterday, we covered drafting criteria for responsiveness.  You may, however, be asking the review team to do more than identify responsive documents.  You might, for example, also ask them to identify privileged documents, significant documents, documents that need to be redacted, documents that need to be reviewed by an expert, and so on.  In this issue, we’ll talk about reviewing for privilege.

First, let’s clarify what you’ll be asking the review team to do.  If you are using a team of contract reviewers, it is unlikely that you’ll be asking them to make privilege decisions.  You might, however, ask them to identify and flag potentially privileged documents.  Under this approach, attorneys on your team who can make privilege decisions would do a subsequent review of the potentially privileged documents.  That’s when privilege decisions will be made.

Of course, you’ll need to give the contract team criteria for potentially privileged materials.  Consider including these information points and instructions in the criteria:

  • The names and initials of individual attorneys, both outside counsel and corporate in-house attorneys.
  • The names and initials of legal assistants and other litigation team members of outside counsel and the corporate legal department (work done by these individuals under the direction of counsel may be privileged).
  • The names of law firms that have served as outside counsel.
  • Documents on law firm letterhead.
  • Documents stamped “Work Product”, “Attorney Client”, “Privileged” or other designations indicating confidentiality re litigation.
  • Legal documents such as pleadings or briefs in draft form.
  • Handwritten annotations on documents that may be authored by counsel or litigation team members under the direction of counsel.
  • Subject areas of privileged communication.

In addition, provide instructions for documents that will not be privileged.  Every collection contains certain types of documents that won’t be privileged unless they bear privileged annotations.  Examples are published literature, press releases, advertisements, corporate annual reports, brochures, user manuals…  in short, any documents that are public in nature.  Likewise, most document collections will include internal documents that fall into the same category, such as insurance policies, invoices and manufacturing reports.  Create a list of these document types and include it in the criteria instructions.
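To illustrate how criteria like these can translate into an objective first-pass screen, here is a minimal sketch that flags a document as potentially privileged when it contains names or stamps from lists like those above, unless it appears to be a public-type document with no privileged annotations.  The name lists are hypothetical placeholders, and the flag is only a screen; attorneys on your team still make the actual privilege decisions.

ATTORNEY_NAMES = ["jane doe", "j. doe"]                # hypothetical counsel names and initials
LAW_FIRMS = ["doe & associates"]                       # hypothetical outside counsel firms
PRIVILEGE_STAMPS = ["work product", "attorney client", "attorney-client", "privileged"]
PUBLIC_DOC_MARKERS = ["press release", "annual report", "user manual", "brochure"]

def potentially_privileged(text):
    """First-pass flag only; this does not decide privilege."""
    t = text.lower()
    has_annotation = any(s in t for s in PRIVILEGE_STAMPS)
    looks_public = any(p in t for p in PUBLIC_DOC_MARKERS)
    if looks_public and not has_annotation:
        return False
    return has_annotation or any(n in t for n in ATTORNEY_NAMES + LAW_FIRMS)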

Have you drafted criteria for a privilege review of a large collection?  How did you approach it and how well did it work?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

Managing an eDiscovery Contract Review Team: Drafting Responsive Criteria – a Step-by-Step Guide

 

The criteria that you prepare for the review will be governed by the objectives that you established for the review.  At a minimum, you’ll draft criteria for responsive documents.  In addition, you may draft criteria for privileged documents, hot documents, and so on.  Let’s start with drafting responsive criteria.  For this step, you’ll need the request for production and the notes that you took when you sampled the document collection.

For each separate point on the request for production, do the following:

  • Expand on the definition.  Make it clearer and more detailed.  Make sure that the language you use is understandable to lay people.
  • List topic areas that are likely to appear in responsive documents.  Make sure these topic areas are objective in nature and that they minimize the need for judgment.  For example, don’t include criteria like “documents that demonstrate negligence in operations”.  Rather, break this down into real-life objective examples like “documents that discuss accidents”, “documents that discuss poor employee performance” and so on.  Use real examples from the documents – examples that you came across during your sampling of the collection.
  • List date ranges of responsive materials.
  • Based on your review of the documents, list as many examples as you can of document types that are responsive, and attach examples to the criteria. 
  • Based on your review of the documents, include as many examples as you can of responsive text.

Several members of the litigation team should review the draft criteria.  Once all suggestions for modifications and additions are agreed upon, put the criteria in “final” form – “final” meaning the document that you will use at the start of the review project.  As you move forward, update the criteria with more examples and clearer definitions as you learn more about the collection.

In the next issue, we’ll cover criteria for other review objectives you might have established (for example, you might be screening for privilege or significance).

Have you drafted criteria for a document review of a large collection?  How did you approach it and how well did it work?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Best Practices: EDRM Data Set for Great Test Data

 

In its almost six years of existence, the Electronic Discovery Reference Model (EDRM) Project has implemented a number of mechanisms to standardize the practice of eDiscovery.  Having worked on the EDRM Metrics project for the past four years, I have seen some of those mechanisms implemented firsthand.

One of the most significant recent accomplishments by EDRM is the EDRM Data Set.  Anyone who works with eDiscovery applications and processes understands the importance of being able to test those applications in as many ways as possible using realistic data that will illustrate expected results.  The use of test data is extremely useful in crafting a defensible discovery approach, by enabling you to determine the expected results within those applications and processes before using them with your organization’s live data.  It can also help you identify potential anomalies (those never occur, right?) up front so that you can proactively develop an approach to address them before encountering them in your own data.

Using public domain data from Enron Corporation (originating from the Federal Energy Regulatory Commission Enron Investigation), the EDRM Data Set Project provides industry-standard, reference data sets of electronically stored information (ESI) to test those eDiscovery applications and processes.  In 2009, the EDRM Data Set project released its first version of the Enron Data Set, comprised of Enron e-mail messages and attachments within Outlook PST files, organized in 32 zipped files.

This past November, the EDRM Data Set project launched Version 2 of the EDRM Enron Email Data Set.  Straight from the press release announcing the launch, here are some of the improvements in the newest version:

  • Larger Data Set: Contains 1,227,255 emails with 493,384 attachments (included in the emails) covering 151 custodians;
  • Rich Metadata: Includes threading information, tracking IDs, and general Internet headers;
  • Multiple Email Formats: Provision of both full and de-duplicated email in PST, MIME and EDRM XML, which allows organizations to test and compare results across formats.

The Text REtrieval Conference (TREC) Legal Track project provided input for this version of the data set, which, as noted previously on this blog, has used the EDRM data set for its research.  Kudos to John Wang, Project Lead for the EDRM Data Set Project and Product Manager at ZL Technologies, Inc., and the rest of the Data Set team for such an extensive test set collection!
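As a quick illustration of working with the MIME version of the data set, here is a minimal sketch that walks a folder of .eml files and tallies messages, attachments and senders using Python’s standard email library.  The folder name is an assumption; adjust it to wherever you unpack the set.

import email
from email import policy
from pathlib import Path
from collections import Counter

def summarize_mime_set(root):
    """Tally messages, attachments and senders across a folder tree of .eml files."""
    messages = attachments = 0
    senders = Counter()
    for eml in Path(root).rglob("*.eml"):
        with open(eml, "rb") as f:
            msg = email.message_from_binary_file(f, policy=policy.default)
        messages += 1
        senders[msg.get("From", "unknown")] += 1
        attachments += sum(1 for part in msg.walk()
                           if part.get_content_disposition() == "attachment")
    return messages, attachments, senders

msgs, atts, senders = summarize_mime_set("edrm-enron-v2-mime")  # hypothetical unpack location
print(msgs, "messages,", atts, "attachments,", len(senders), "senders")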

So, what do you think?  Do you use the EDRM Data Set for testing your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client with several searches conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, version tracking of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say you’re at an oil company looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear but have nothing to do with each other.  In fact, a search for “oil” AND “rights” in an oil company’s DMS may retrieve every published and copyrighted document in the system mentioning the word “oil”.  Why?  Because almost every published and copyrighted document contains the phrase “All Rights Reserved”.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.
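For readers curious what a proximity search does mechanically, here is a minimal sketch of a token-window test: it tokenizes the text and returns True only if the two terms appear within the specified number of words of each other, in either order.  It is a simplification of what a real review tool does (no stemming, stop words or phrase handling), but it shows why a variation like “Austin, Doug” is caught while an unrelated “All Rights Reserved” far from “oil” is not.

import re

def within_words(text, term_a, term_b, distance):
    """True if term_a and term_b occur within `distance` words of each other, in either order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(a - b) <= distance for a in positions_a for b in positions_b)

print(within_words("rights to drill for oil in the leased tract", "oil", "rights", 5))  # True
print(within_words("Please ask Austin, Doug about the lease", "doug", "austin", 3))     # True
print(within_words("All Rights Reserved.  This annual report reviews refinery throughput, "
                   "pipeline capacity and exploration spending before discussing oil.",
                   "oil", "rights", 5))                                                 # False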

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

Managing an eDiscovery Contract Review Team: First Steps in Drafting Criteria

 

In theory, responsive documents are described in the other side’s request for production.  In practice, those requests are often open to interpretation.  Your goal in drafting responsive criteria is to distill those requests and create a clear set of objective rules that leave little room for interpretation – a set of rules that can be applied correctly and consistently to the document collection.  This step is important for a couple of reasons:

  • It is difficult to get consistent results from a group of people doing the same task.  No two people will make exactly the same decision about every document – not even attorneys.  Even an individual attorney will not always make the same decision about duplicates of the same document.  Thorough, clear, detailed and objective criteria will minimize inconsistencies.
  • If discovery disputes arise, it may be necessary to demonstrate a good-faith effort.  Thorough, detailed criteria will help.  Judges understand the human error factor.  They are less tolerant of work that was approached casually or sloppily.  Clear, detailed criteria will demonstrate a carefully thought-out approach.

Where do you start?  First, do a little preparation.  There are some basic materials and information that you’ll need:

  • The complaint.
  • The request for production.
  • Knowledge of the document collection (in the last blog in this series, we talked about sampling the collection).
  • Knowledge of the strategy for defending or prosecuting the case.

Once you’ve read the complaint and the document request and you’ve sampled the collection, you’ll have a feel for the materials that reviewers are likely to see and how those documents relate to the facts and legal issues in the case.  If a strategy for defending or prosecuting the case has been developed, make sure you understand that strategy.  It is likely that an understanding of the allegations and the strategy will broaden your view of what is responsive and important.

After these preparation steps, you’ll be ready to develop a first draft of the criteria.  In the next issue, we’ll talk about how to structure and write effective criteria.

Have you drafted criteria for a document review of a large collection?  How did you approach it and how well did it work?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

Managing an eDiscovery Contract Review Team: Get a Handle on the Document Collection

 

Once you’ve defined the objectives of the review, you need to move forward with other preparation steps: You need to draft review criteria, you need to identify the type of people that are appropriate for the review (do you need a staff of attorneys?  lay people?  staff with expertise in a specific subject matter?), and you need to pull that team together.  

Before moving forward with these steps, you need a bit more information.  You need to know what’s in the document collection.  You need to know what types of documents are in the collection and you need to know what type of content is in the documents.  Once you’ve got a handle on the collection, you’ll be in a better position to make decisions on subsequent steps.

Start by interviewing custodians.  You don’t need to talk to every custodian, but talk to a representative sample.  For example, if you are collecting documents from a corporate client, speak to at least one person from each department from which you’ve collected documents.  The person you speak to should probably be a manager or someone who has a good handle on the overall operation of the department.  Find out about the department’s operations and determine its role in the events that are at issue in the case.  Ask about the types of documents that are generated and retained.  Information that you glean here will help in the next step:  sampling the collection.

After you’ve collected information from the custodians, take a look at the documents.  Review a representative sample.  Look at documents from each custodian.  Take notes on what you are finding and make copies of documents that can be used as examples to illustrate the criteria you’ll be drafting and to be used in training.
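If the collection has already been loaded into a database or review tool, a simple stratified sample (a fixed number of documents per custodian) is one way to structure this first look.  Here is a minimal sketch; the custodian-to-documents mapping and the sample size are hypothetical.

import random

def sample_by_custodian(docs_by_custodian, per_custodian=25):
    """Draw up to `per_custodian` random documents from each custodian's set."""
    sample = {}
    for custodian, docs in docs_by_custodian.items():
        docs = list(docs)
        sample[custodian] = random.sample(docs, min(per_custodian, len(docs)))
    return sample

# Hypothetical usage: keys are custodian names, values are lists of document IDs.
review_sample = sample_by_custodian({"A. Smith": ["DOC-001", "DOC-002"], "B. Jones": ["DOC-101"]})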

Your ultimate goal is to develop a set of objective rules that a well-trained staff can apply effectively and consistently to the collection during the review.  The more you learn about the documents in advance, the better you’ll be able to do that.  So spend the time up front learning what you can about what’s in your document collection.

Do you typically sample an eDiscovery document collection before a review?  How did you approach it?  Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: Sanctions Down in 2010 — at least thru December 1

Recently, this blog cited a Duke Law Journal study indicating that eDiscovery sanctions were at an all-time high through 2009.  Then, a couple of weeks ago, I saw a story from Williams Mullen recapping the 2010 year in eDiscovery.  It provides a very thorough recap, including 2010 trends in sanctions (identifying several cases where sanctions were at issue), advances made during the year in cooperation and proportionality, challenges associated with privacy concerns in foreign jurisdictions and trends in litigation dealing with social media.  It’s a very comprehensive summary of the year in eDiscovery.

One noteworthy finding is that, according to the report, sanctions were sought and awarded in fewer cases in 2010.  Some notable stats from the report (a quick arithmetic check of the 2010 percentages follows the list):

  • There were 208 eDiscovery opinions in 2009 versus 209 through December 1, 2010.
  • Out of 209 cases with eDiscovery opinions in 2010, sanctions were sought in 79 of them (38%) and awarded in 49 (62% of those cases, and 23% of all eDiscovery cases).
  • Compare that with 2009 when sanctions were sought in 42% of eDiscovery cases and were awarded in 70% of the cases in which they were requested (30% of all eDiscovery cases).
  • While overall requests for sanctions decreased, motions to compel more than doubled in 2010, being filed in 43% of all e-discovery cases, compared to 20% in 2009.
  • Costs and fees were by far the most common sanction, being awarded in 60% of the cases involving sanctions.
  • However, each type of sanction declined: costs and fees (from 33 to 29 total sanctions), adverse inference (13 to 7), terminating (10 to 7), additional discovery (10 to 6) and preclusion (5 to 3).
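For those who want to see how the 2010 percentages relate to the raw counts, here is a quick arithmetic check using the figures summarized above.

cases_2010 = 209      # eDiscovery opinions through December 1, 2010
sought_2010 = 79      # cases in which sanctions were sought
awarded_2010 = 49     # cases in which sanctions were awarded

print(round(sought_2010 / cases_2010 * 100))    # 38 -> sought in ~38% of all cases
print(round(awarded_2010 / sought_2010 * 100))  # 62 -> awarded in ~62% of cases where sought
print(round(awarded_2010 / cases_2010 * 100))   # 23 -> awarded in ~23% of all cases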

The date of the report was December 17, and it noted a total of 209 eDiscovery cases as of December 1, 2010, so final tallies for the year were not yet tabulated.  It will be interesting to see whether the decline in sanctions held true once the entire year is considered.

So, what do you think?  Is this a significant indication that more organizations are getting a handle on their eDiscovery obligations – or just a “blip on the radar”?  Please share any comments you might have or if you’d like to know more about a particular topic.