
Good Processing Requires a Sound Process – Best of eDiscovery Daily

Home at last!  Today, we are recovering from our trip, after arriving back home one day late and without our luggage.  Satan, thy name is Lufthansa!  Anyway, for these past two weeks, except for Jane Gennarelli’s Throwback Thursday series, we have been re-publishing some of our more popular and frequently referenced posts.  Today’s post is a topic that comes up often with our clients.  Enjoy!  New posts next week!

As we discussed Wednesday, working with electronic files in a review tool is NOT simply a matter of loading the files and getting started.  Electronic files are diverse and can represent a whole collection of issues that must be addressed in order to process them for loading.  To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production.  As a result, we are aware of the information that must be provided by the client to ensure that the resulting processed data meets their needs, and we have created an EDD processing spec sheet to gather that information before processing.  Here are some examples of the information we collect from our clients:

  • Do you need de-duplication?  If so, should it be performed at the case or the custodian level?  (See the sketch after this list.)
  • Should Outlook emails be extracted in MSG or HTM format?
  • What time zone should we use for email extraction?  Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT).  If you don’t think that matters, consider this example: an email sent at 11:30 PM Central time on December 31 carries a January 1 date in GMT, which could move it into or out of a responsive date range.
  • Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text?  If we don’t OCR those files, responsive documents could be missed during searching.
  • If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
  • Should the collection be culled based on a responsive date range?
  • Should the collection be culled based on key terms?
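
To illustrate the first question, here is a minimal sketch (in Python, purely illustrative and not CloudNine’s actual pipeline) of how case-level and custodian-level de-duplication typically differ: files are hashed, and the duplicate key either spans all custodians or includes the custodian.

```python
# Minimal sketch of case-level vs. custodian-level de-duplication.
# Illustrative only; real processing tools typically hash normalized
# metadata (especially for email) rather than just raw bytes.
import hashlib

def file_hash(path):
    """SHA-256 digest of the file's raw bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def dedupe(files, scope="case"):
    """files: list of (custodian, path) tuples.  Returns the files to keep."""
    seen, keep = set(), []
    for custodian, path in files:
        digest = file_hash(path)
        # Case level: one copy survives across the entire collection.
        # Custodian level: one copy survives per custodian.
        key = digest if scope == "case" else (custodian, digest)
        if key not in seen:
            seen.add(key)
            keep.append((custodian, path))
    return keep
```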

Those are some general examples for native processing.  If the client requests creation of image files (many still do, despite the well-documented advantages of native files), there are a number of additional questions we ask regarding the image processing.  Some examples:

  • Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non-text-searchable PDF?
  • Should color images be created when appropriate?
  • Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?  (See the sketch after this list.)
  • Should we create images of Excel files?  If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
  • Should we endorse the images?  If so, how?
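
As a hypothetical illustration of two of these options, the sketch below uses the Pillow imaging library to write a multi-page TIFF and to generate a simple placeholder image for a file that could not be converted.  It sketches the concepts only; it is not any vendor’s actual imaging engine, and the page size and wording are assumptions.

```python
# Sketch: multi-page TIFF output plus a placeholder page for a corrupt file.
# Requires the Pillow library (pip install Pillow); illustrative only.
from PIL import Image, ImageDraw

def save_multipage_tiff(pages, out_path):
    """pages: list of PIL Images, one per document page."""
    pages[0].save(out_path, save_all=True, append_images=pages[1:])

def make_placeholder(file_name, reason="File could not be converted"):
    """A one-page stand-in so the document still occupies a Bates slot."""
    img = Image.new("RGB", (850, 1100), "white")  # roughly letter size at 100 DPI
    draw = ImageDraw.Draw(img)
    draw.text((50, 500), f"PLACEHOLDER: {file_name}\n{reason}", fill="black")
    return img

pages = [make_placeholder("broken.xlsx", "Corrupt file; repair failed")]
save_multipage_tiff(pages, "DOC000001.tiff")
```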

Those are just some examples.  Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use.  Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.
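
For readers unfamiliar with load files, here is a hypothetical sketch of generating one record in a Concordance-style DAT format, whose commonly used default delimiters are ASCII 254 (þ) as the text qualifier and ASCII 20 as the field separator.  The field names, values, and encoding below are invented for illustration; delimiter and field choices like these are exactly the kinds of parameters we ask about.

```python
# Sketch: writing a Concordance-style DAT load file with common defaults.
# Field names, values, and encoding are hypothetical; parameters vary by tool.
QUOTE, SEP = chr(254), chr(20)  # common Concordance defaults

def dat_line(fields):
    return SEP.join(f"{QUOTE}{f}{QUOTE}" for f in fields)

rows = [
    ["BEGBATES", "ENDBATES", "CUSTODIAN", "DOCDATE"],        # header row
    ["DOC000001", "DOC000003", "Smith, Jim", "2014-06-25"],  # hypothetical record
]
with open("loadfile.dat", "w", encoding="cp1252") as f:
    for row in rows:
        f.write(dat_line(row) + "\n")
```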

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data.  Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.

So, what do you think?  How do you handle processing of electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.


The Files are Already Electronic, How Hard Can They Be to Load? – Best of eDiscovery Daily

Come fly with me!  Today we are winding our way back home from Paris, by way of Frankfurt.  For the next two weeks, except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post relates to a question that I get asked often.  Enjoy!

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started.  Unfortunately, it’s not that simple!

Back when most discovery was paper-based, the usefulness of the documents was understandably limited.  Documents were paper, and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately), and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.).  It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is.  They don’t just represent one problem; they can represent a whole collection of problems.  For example:

  • Image-only files (such as scanned documents saved as PDF or TIFF) have no searchable text until they are OCRed;
  • Password-protected or encrypted files can’t be processed until the passwords are cracked (or the files are logged as exceptions);
  • Corrupt files may not be viewable or convertible at all;
  • Duplicate files are often spread across multiple custodians;
  • Emails with attachments and other container files must be extracted, with family relationships preserved.

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward.  Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom.  They’re way more useful than paper documents.  So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available.  But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think?  Have you encountered difficulties or challenges when processing electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.


Is Technology Assisted Review Older than the US Government? – eDiscovery Trends

A lot of people consider Technology Assisted Review (TAR) and Predictive Coding (PC) to be new technology.  We attempted to debunk that myth last year after our third annual thought leader interview series by summarizing comments from some of the thought leaders, who noted that TAR and PC really just apply artificial intelligence to the review process.  But the foundation for TAR may go back much further than you might think.

In the BIA blog, Technology Assisted Review: It’s not as new as you think it is, Robin Athlyn Thompson and Brian Schrader take a look at the origins of at least one theory behind TAR.  Called the “Naive Bayes classifier”, it’s based on theorems that were essentially introduced to the public in 1812.  But, the theorems existed quite a bit earlier than that.

Bayes’s theorem is named after Rev. Thomas Bayes (who died in 1761), who first showed how to use new evidence to update beliefs.  He lived so long ago that no widely accepted portrait of him is known.  His friend Richard Price edited and presented his work in 1763, after Bayes’s death, as An Essay towards solving a Problem in the Doctrine of Chances.  Bayes’s theorem remained unknown until it was independently rediscovered and further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 Théorie analytique des probabilités (Analytic Theory of Probabilities).
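
To make the connection to review concrete, here is a minimal sketch (in Python with scikit-learn, purely for illustration; commercial TAR platforms use a variety of algorithms, not necessarily this one) of a Naive Bayes classifier ranking unreviewed documents by estimated probability of relevance, based on a small attorney-reviewed seed set:

```python
# Illustrative only: rank unreviewed documents by estimated probability of
# relevance using a Naive Bayes classifier (Bayes' theorem plus a "naive"
# assumption that words occur independently given the class).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny hypothetical seed set of reviewed documents (1 = relevant, 0 = not).
seed_docs = [
    "kickback payment to vendor approved",
    "quarterly safety test results attached",
    "lunch menu for the holiday party",
    "reminder: parking garage closed friday",
]
seed_labels = [1, 1, 0, 0]

unreviewed = [
    "vendor invoice references the kickback arrangement",
    "new menu options in the cafeteria",
]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(seed_docs), seed_labels)

# P(relevant | document), highest first.
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, p in sorted(zip(unreviewed, scores), key=lambda t: -t[1]):
    print(f"{p:.2f}  {doc}")
```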

Thompson and Schrader go on to discuss more recent uses of artificial intelligence algorithms to map trends, including the “More Like This” functionality that Amazon uses to recommend other items you may like based on previous purchases.  That technology has been around for nearly two decades – can you believe it’s been that long? – and has been one of the key factors in Amazon’s success over that time.

So, don’t scoff at the use of TAR because it’s “new technology”; that thinking is “naïve”.  Some of the foundational statistical theories behind TAR go back further than the birth of our country.

So, what do you think?  Has your organization used technology assisted review on a case yet?  Please share any comments you might have or if you’d like to know more about a particular topic.


Though it was “Switching Horses in Midstream”, Court Approves Plaintiff’s Predictive Coding Plan – eDiscovery Case Law

In Bridgestone Americas Inc. v. Int’l Bus. Mach. Corp., No. 3:13-1196 (M.D. Tenn. July 22, 2014), Tennessee Magistrate Judge Joe B. Brown, acknowledging that he was “allowing Plaintiff to switch horses in midstream”, nonetheless ruled that the plaintiff could use predictive coding to search documents for discovery, even though keyword search had already been performed.

In this case, in which the plaintiff sued the defendant over a $75 million computer system that it claimed threw its “entire business operation into chaos”, the plaintiff requested that the court allow the use of predictive coding in reviewing over two million documents.  The defendant objected, noting that the request was an unwarranted change to the original case management order, which did not include predictive coding, and that it would be unfair to use predictive coding after an initial screening had been done with keyword search terms.

Judge Brown conducted a lengthy telephone conference with the parties on June 25 and began the analysis in his order by observing that “[p]redictive coding is a rapidly developing field in which the Sedona Conference has devoted a good deal of time and effort to, and has provided various best practices suggestions”, also noting that “Magistrate Judge Peck has written an excellent article on the subject and has issued opinions concerning predictive coding.”  “In the final analysis”, Judge Brown continued, “the uses of predictive coding is a judgment call, hopefully keeping in mind the exhortation of Rule 26 that discovery be tailored by the court to be as efficient and cost-effective as possible.”

As a result, noting that “we are talking about millions of documents to be reviewed with costs likewise in the millions”, Judge Brown permitted the plaintiff “to use predictive coding on the documents that they have presently identified, based on the search terms Defendant provided.”  Judge Brown acknowledged that he was “allowing Plaintiff to switch horses in midstream”, so “openness and transparency in what Plaintiff is doing will be of critical importance.”

This case has similar circumstances to Progressive Cas. Ins. Co. v. Delaney, in which that plaintiff also sought to shift from the agreed-upon discovery methodology for privilege review to a predictive coding methodology.  However, in that case, the plaintiff did not consult with either the court or the requesting party regarding its intention to change review methodology, and the plaintiff’s lack of transparency and cooperation resulted in its being ordered to produce documents according to the agreed-upon methodology.  It pays to cooperate!

So, what do you think?  Should the plaintiff have been allowed to shift from the agreed upon methodology or did the volume of the collection warrant the switch?  Please share any comments you might have or if you’d like to know more about a particular topic.


Our 1,000th Post! – eDiscovery Milestones

When we launched nearly four years ago on September 20, 2010, our goal was to be a daily resource for eDiscovery news and analysis.  Now, after doing so each business day (except for one), I’m happy to announce that today is our 1,000th post on eDiscovery Daily!

We’ve covered the gamut in eDiscovery, from case law to industry trends to best practices, across dozens of topic categories, and we’ve covered every phase of the EDRM life cycle (177 posts).

Every post we have published is still available on the site for your reference, which has made eDiscovery Daily into quite a knowledgebase!  We’re quite proud of that.

Comparing our first three months of existence to now, we have seen traffic on our site grow an amazing 474%!  Our subscriber base has more than tripled in the last three years!  We want to take this time to thank you, our readers and subscribers, for making that happen.  Thanks for making the eDiscoveryDaily blog a regular resource for your eDiscovery news and analysis!  We really appreciate the support!

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan University, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!).  We really appreciate it!

I also want to extend a special thanks to Jane Gennarelli, who has provided some serial topics, ranging from project management to coordinating review teams to what litigation support and discovery used to be like back in the ’80s (to which some of us “old timers” can relate).  Her contributions are always well received and appreciated by the readers – and also especially by me, since I get a day off!

We always end each post with a request: “Please share any comments you might have or if you’d like to know more about a particular topic.”  And, we mean it.  We want to cover the topics you want to hear about, so please let us know.

Tomorrow, we’ll be back with a new, original post.  In the meantime, feel free to click on any of the links above and peruse some of our 999 previous posts.  Now is your chance to catch up!  😉


Court Sides with Defendant in Dispute over Predictive Coding that Plaintiff Requested – eDiscovery Case Law

In In re Bridgepoint Educ., Inc., Securities Litigation, 12cv1737 JM (JLB) (S.D. Cal. Aug. 6, 2014), California Magistrate Judge Jill L. Burkhardt ruled that expanding the scope of discovery by nine months would be unduly burdensome, despite the plaintiffs’ argument that the defendants could use predictive coding to fulfill their discovery obligation.  Judge Burkhardt also approved the defendants’ method of using search terms to identify responsive documents for the three individual defendants already reviewed, directing the parties to meet and confer regarding the additional search terms the plaintiffs requested.

In this case involving several discovery disputes, a telephonic discovery conference was held on June 27, during which the Court issued oral orders on three of the four discovery disputes.  As to the remaining dispute, the Court requested supplemental briefings from both parties and issued its ruling in this order, which also formalized the earlier oral orders.

The unresolved discovery dispute concerned the plaintiffs’ “request for discovery extending beyond the time frame that Defendants have agreed to” for an additional nine months.  In their briefing, the defendants (based on the production efforts to date) claimed that expanding the scope of discovery by nine months would increase their review costs by 26%, or $390,000.  The plaintiffs’ reply brief argued that the defendants’ estimate reflected the cost of manual review rather than the predictive coding system that the defendants would use – according to the plaintiffs, the cost of predictive coding was the only cost relevant to the defendants’ burden, and they estimated the additional burden to be roughly $11,279.

Per the Court’s request, the defendants submitted a reply brief addressing the arguments raised by the plaintiffs, arguing that predictive coding software “does not make manual review for relevance merely elective”.  The defendants contended that the software only assigns a percentage estimate to each document reflecting the assessed probability that the document is relevant, that the software is not foolproof, and that attorney review is still required to ensure that the documents produced are both relevant and not privileged.

Judge Burkhardt, citing the “proportionality” rule of Federal Rule of Civil Procedure 26(b)(2)(C), denied expanding the scope of discovery by nine months, finding that “Defendants have set forth sufficient evidence to conclude that the additional production would be unduly burdensome”.

The plaintiffs, claiming that the defendants “unilaterally-selected search terms” to identify the original production, also argued that discovery produced from the three Individual Defendants should be added to the Defendants’ predictive coding software.  But Judge Burkhardt, formalizing the oral order, stated “[t]he Court approved Defendants’ method of using linear screening with the aid of search terms to identify responsive documents with regard to the emails already reviewed for the three Individual Defendants. The parties were directed to meet and confer regarding the additional search terms Plaintiffs would like Defendants to use.”

So, what do you think?  Was the additional discovery scope unduly burdensome or did the plaintiff have a point about reduced discovery costs?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will resume posts on Tuesday.  Happy Labor Day!


Court Rules in Dispute Between Parties Regarding ESI Protocol, Suggests Predictive Coding – eDiscovery Case Law

In a dispute over ESI protocols in FDIC v. Bowden, CV413-245 (S.D. Ga. June 6, 2014), Georgia Magistrate Judge G. R. Smith approved the ESI protocol from the FDIC and suggested the parties consider the use of predictive coding.

After FDIC-insured Darby Bank & Trust Co. failed in November 2010, the FDIC took over as receiver (as FDIC-R) and brought a bank mismanagement case against sixteen of Darby’s former directors and officers.  Thus far, the parties had been unable to agree on a Joint Protocol for Electronically Stored Information (ESI) and the dispute ultimately reached the court.  The FDIC-R had already spent $614,000 to digitally scan about “2.01 terabytes of data or 153.6 million pages” at the bank, but the defendants insisted that the FDIC-R shoulder the burden and expense of reviewing the documents and determining their responsiveness to the claims “[e]ven though the Bank’s documents were created under Defendants’ custody and control”.

The defendants also argued for a protocol that would require the FDIC-R to “repeatedly search, review, and re-review myriad ‘second-run’ (Phase II) documents, then turn over to them the documents relevant to both claims and defenses that arise in this litigation.”  The FDIC-R argued for a protocol in which it would produce “categories of documents most likely to contain relevant information”, which the defendants could then search, claiming that protocol would be the more “correct allocation of discovery burdens between the parties.”  The defendants contended that “search terms alone won’t suffice” and that the FDIC-R’s proposed protocol does not relieve the receiver of its Rule 34 burden to “locate and produce responsive documents.”

After reviewing the two proposed protocols, Judge Smith ruled that “given the common ground between the dueling protocols here, the FDIC-R’s ESI protocol will be implemented, as modified by the FDIC-R’s ‘briefing concessions’…as well as by the additional guidance set forth in this Order.”  Those briefing concessions included “offering to open ‘all of the Bank’s former documents . . . [so defendants can retrieve them] to the same extent that the FDIC-R can’” and “offering, in ‘Phase II’ of the disclosure process, to ‘meet and confer with Defendants to reach agreement upon a set of reasonable search terms to run across the database of sources of the ESI to identify documents for production’”.  In approving the FDIC-R’s protocol, Judge Smith stated that “the FDIC-R may meaningfully deploy suitable search terms to satisfy its initial disclosure requirements and respond to forthcoming Rule 34 document requests”.

Also, referencing the Da Silva Moore decision of 2012, Judge Smith stated that “the parties shall consider the use of predictive coding” if ESI protocol disagreements persisted, noting that it “has emerged as a far more accurate means of producing responsive ESI in discovery”.

So, what do you think? Should organizations bear the bulk of the discovery burden in cases against individual defendants? Or should the burden be balanced between both parties?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will resume posts on Monday, July 7.  Happy Birthday America!


Court Rules that Unilateral Predictive Coding is Not Progressive – eDiscovery Case Law

In Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678-LRH-PAL (D. Nev. May 19, 2014), Nevada Magistrate Judge Peggy A. Leen determined that the plaintiff’s unannounced shift from the agreed-upon discovery methodology to a predictive coding methodology for privilege review was not cooperative.  Therefore, the plaintiff was ordered to produce documents that matched the agreed-upon search terms without conducting a privilege review first.

This declaratory relief action had been plagued by delays in discovery production, which led to the defendants filing a Motion to Compel the plaintiff to produce discovery in a timely fashion. Following a hearing, both sides were ordered to meet and confer and hold meaningful discussions about resolving outstanding ESI discovery issues. The plaintiff contended that the defendant’s discovery requests, as they stood, would require it to produce approximately 1.8 million documents, which would be unduly burdensome. Both parties agreed to search terms that would reduce the number of potentially responsive documents to around 565,000, which the plaintiff would manually review for privileged documents before producing discovery to the defendant.

Shortly thereafter, the plaintiff determined that manual review would be too expensive and time-consuming and, after consulting with a “nationally-recognized authority on eDiscovery,” elected to apply predictive coding to the identified 565,000 documents. The plaintiff selected a software program and began using it to identify relevant documents, intending to apply a further predictive coding layer to determine which documents were “more likely privileged” and which were “less likely privileged.”

However, the plaintiff did not consult with either the court or the requesting party regarding its intention to change review methodology. As a result, the defendant objected to the use of predictive coding in this case for several reasons, including the plaintiff’s lack of transparency surrounding its predictive coding methodology, its failure to cooperate, and its failure to adhere to the best practices recommended for the chosen software program by the very authority the plaintiff had consulted. Finally, the defendant cited a likelihood of satellite disputes revolving around discovery should the plaintiff proceed with the current predictive coding approach, which would further delay discovery production that had already been “stalled for many months.”

The defendant requested either that the plaintiff be required to proceed with predictive coding according to the defendant’s suggested protocol, which would include applying the predictive methodology to all 1.8 million originally collected documents, or that the plaintiff produce the non-privileged keyword hits without any review, subject to a clawback order—the second option included in the originally stipulated ESI protocol that both parties had agreed to. Although this option would shift the burden of discovery to the defendant, it was noted that the defendant was “committed to devot[ing] the resources required to review the documents as expeditiously as possible” in order to allow discovery to move forward.

Judge Leen acknowledged potential support for the general methodology of predictive coding in eDiscovery, and stated that a “transparent mutually agreed upon” protocol for such a method would likely have been approved. However, Judge Leen took issue with the plaintiff’s refusal to “engage in the type of cooperation and transparency that its own eDiscovery consultant has so comprehensibly and persuasively explained is needed for a predictive coding protocol to be accepted by the court or opposing counsel”; instead, the plaintiff had “elected and then abandoned the second option—to manually review and produce responsive ESI documents. It abandoned the option it selected unilaterally, without the [defendant’s] acquiescence or the court’s approval and modification of the parties’ stipulated ESI protocol.”

Therefore, Judge Leen elected to enforce the second option described in the agreed-upon ESI protocol, requiring the plaintiff to produce all 565,000 documents that matched the stipulated search terms without review, with a clawback option in place for privileged documents, along with permission to apply privilege filters to the documents at issue and to withhold those that returned as “most likely privileged.”

So, what do you think? Should parties need to obtain approval regarding the review methodology that they plan to use?  Please share any comments you might have or if you’d like to know more about a particular topic.


Predictive Analytics: It’s Not Just for Review Anymore – eDiscovery Trends

One of the most frequently discussed trends in this year’s annual thought leader interviews was the application of analytics (including predictive analytics) to Information Governance.  A recent report published in the Richmond Journal of Law & Technology addresses how analytics can be used to optimize Information Governance.

Written by Bennett B. Borden & Jason R. Baron (who was one of our thought leaders discussing that very topic), Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice, 20 RICH. J.L. & TECH. 7 (2014) is written for those who are not necessarily experts in the field.  It provides a synopsis of why and how predictive coding first emerged in eDiscovery and defines important terms related to the topic, then discusses aspects of an information governance program where application of predictive coding and related analytical techniques is most useful. Most notably, the authors provide a few “early” examples of the use of advanced analytics, like predictive coding, for non-litigation contexts to illustrate the possibilities for applying the technology.  Here is a high-level breakdown of the report:

Introduction (pages 1-3): Provides a high-level introduction of the topics to be discussed.

A. The Path to Da Silva Moore (pages 3-14): Provides important background to the evolution of managing electronically stored information (ESI) and predictive coding (fittingly, it begins with the words “In the beginning”).  Starting on page 9, the authors discuss “The Da Silva Moore Precedent”, providing a detailed account of the Da Silva Moore case (our post here summarizes our coverage of the case) and also referencing other cases: In re Actos (Pioglitazone) Products Liability Litigation, Global Aerospace Inc., et al. v. Landow Aviation, L.P., Kleen Products v. Packaging Corp. of America, EORHB, Inc. v. HOA Holdings and In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation.  Clearly, the past couple of years have provided several precedents for the use of predictive coding in litigation.

B. Information Governance and Analytics in the Era of Big Data (pages 15-20): This section provides definitions and important context for terms such as “big data”, “analytics” and “Information Governance”.  It’s important to have the background on these concepts before launching into how analytics can be applied to optimize Information Governance.

C. Applying the Lessons of E-Discovery In Using Analytics for Optimal Information Governance: Some Examples (pages 21-31): With the background of sections A and B under your belt, the heart of the report then gets into the actual application of analytics in different scenarios, using “True Life Examples” that are “‘ripped from’ the pages of the author’s legal experience, without embellishment”.  These examples of analytics in use include:

  • A corporate client being sued by a former employee in a whistleblower qui tam action;
  • A highly regulated manufacturing client that decided to outsource the safety testing of some of its products, after which a director of the department being outsourced, despite being offered a generous severance package, demanded four times the severance amount and threatened, in an email, to go to the company’s regulator with a list of ten supposed major violations if he did not receive what he was asking for;
  • A major company that received a whistleblower letter from a reputable third party alleging that several senior personnel were involved in an elaborate kickback scheme that also involved FCPA violations; and
  • An acquisition agreement providing that, if the disclosures made by the target were found to be off by a certain margin within thirty days of the acquisition, the purchase price would be adjusted.

In each case, the use of analytics either resulted in a quick settlement, proved the alleged violations to be unfounded, or resulted in an appropriate adjustment in the purchase price of the acquired company.  These real-world examples truly illustrate how analytics can be applied beyond the document review stage of eDiscovery.

Conclusion (pages 31-32): While noting that the authors’ intent was to “merely scratch the surface” of the topic, they offer some predictions for the end of the decade and note “expected demand on the part of corporate clients for lawyers to be familiar with state of the art practices in the information governance space”.  In other words, your clients are going to expect you to understand this.

The report is an easy read, even for novices to the technology, and is a must-read for anyone looking to understand more about applying analytics to Information Governance.  Bennett and Jason are both with Drinker Biddle & Reath LLP and are also co-chairs of the Information Governance Initiative (here is our recent blog post about IGI).

So, what do you think? Has your organization applied analytics to big data to reduce or eliminate litigation costs? Please share any comments you might have or if you’d like to know more about a particular topic.


Searching for Individuals Isn’t as Straightforward as You Think – eDiscovery Best Practices

I’ve recently worked with a couple of clients who proposed search terms for key individuals that were a bit limited, so I thought this was an appropriate topic to revisit.

When looking for documents in your collection that mention key individuals, conducting a name search for those individuals isn’t always as straightforward as you might think.  There are potentially a number of different ways names can be represented, and if you don’t account for each of them, you might fail to retrieve key responsive documents – OR retrieve way too many non-responsive documents.  Here are some considerations for conducting name searches.

The Ever-Limited Phrase Search vs. Proximity Searching

Routinely, when clients give me their preliminary search term lists to review, they include the names of individuals they want to search for, like this:

  • “Jim Smith”
  • “Doug Austin”

Phrase searches are the most limited alternative for searching because the search must exactly match the phrase.  For example, a phrase search of “Jim Smith” won’t retrieve “Smith, Jim” if his name appears that way in the documents.

That’s why I prefer a proximity search for individual names: it catches several variations and expands the recall of the search.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  A proximity search for “Jim within 3 words of Smith” will retrieve “Jim Smith”, “Smith, Jim”, and even “Jim T. Smith”.  Proximity searching is also a more precise option in most cases than “AND” searches – Doug AND Austin will retrieve any document where someone named Doug is in (or traveling to) Austin, whereas “Doug within 3 words of Austin” will ensure those words are near each other, making it much more likely that they’re responsive to the name search.
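
To make the mechanics concrete, here is a minimal sketch of a “within N words” search in Python.  It assumes a naive tokenizer and an in-memory document; real review tools work from positional indexes, but the idea is the same.

```python
# Minimal sketch of a "within N words" proximity search.
# Assumes a naive tokenizer; illustrative only, not a review tool's engine.
import re

def within_n_words(text, term_a, term_b, n=3):
    """True if term_a appears within n words of term_b in text."""
    words = re.findall(r"[a-z.]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)

print(within_n_words("Smith, Jim T.", "jim", "smith"))               # True
print(within_n_words("Doug is flying to Austin", "doug", "austin"))  # False at w/3
```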

Accounting for Name Variations

Proximity searches won’t always account for all variations in a person’s name.  What are other variations of the name “Jim”?  How about “James” or “Jimmy”?  Or even “Jimbo”?  I have a friend named “James” who is called “Jim” by some of his friends and “Jimmy” by a few others.  Also, some documents may refer to him by his initials – i.e., “J.T. Smith”.  All are potential variations to search for in your collection.

Common name derivations like those above can be deduced in many cases, but you may not always know the middle name or initial.  If you don’t, it may take searching just the last name and sampling several documents until you are able to determine that middle initial for searching (this may also enable you to identify nicknames like “JayDog”, which could be important given the frequently informal tone of emails, even business emails).

Applying the proximity and name variation concepts to our search, we might perform something like this to get our “Jim Smith” documents:

(jim OR jimmy OR james OR “j.t.”) w/3 Smith, where “w/3” is “within 3 words of”.  This is the syntax you would use to perform the search in OnDemand®, CloudNine Discovery’s online review tool.

That’s a bit more inclusive than the “Jim Smith” phrase search the client originally gave me.
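
Expressed against the toy within_n_words() helper from the sketch above (again, an illustration only, not OnDemand®’s actual engine), the expanded query amounts to this:

```python
# Hypothetical re-use of the within_n_words() helper defined above.
variants = ["jim", "jimmy", "james", "j.t."]
doc_text = "Please forward the contract to James T. Smith by Friday."
hit = any(within_n_words(doc_text, v, "smith") for v in variants)
print(hit)  # True: "james" appears within 3 words of "smith"
```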

BTW, why did I use “jim OR jimmy” instead of the wildcard “jim*”?  Because wildcard searches could yield additional terms I might not want (e.g., Joe Smith jimmied the lock).  Don’t get wild with wildcards!  Using the specific variations you want (e.g., “jim OR jimmy”) is often best, though you should always test your terms (and variations of those terms) to maximize the balance between recall and precision.

Of course, there’s another way to retrieve documents that mention key individuals – through their email addresses.  We’ll touch on that topic next week.

So, what do you think?  How do you handle searching for key individuals within your document collections?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.