
Organize Your Collection by Message Thread to Save Costs During Review: eDiscovery Best Practices

This topic came up recently with a client, so I thought it was timely to revisit…

Not only is insanity doing the same thing over and over again and expecting a different result, but in eDiscovery review, it can be even worse when you do get a different result.

One of the biggest challenges when reviewing electronically stored information (ESI) is identifying duplicates so that your reviewers aren’t reviewing the same files again and again.  Not only does that drive up costs unnecessarily, but it could lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

There are a few ways to identify duplicates.  Exact duplicates (files with exactly the same content in the same file format) can be identified through hash values, which serve as a digital fingerprint of a file’s content.  MD5 and SHA-1 are the most popular hashing algorithms; files with matching hash values are exact duplicates and can be removed from the review population.  Since the same emails are sent to multiple parties and the same files are stored on different drives, deduplication through hashing can save considerable review costs.
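As a minimal sketch of how hash-based deduplication works (not tied to any particular review platform, and assuming a local folder of collected files), here is the basic idea using Python’s standard library:

```python
# Minimal sketch of exact-duplicate identification via hashing.
# The folder path and chunk size are illustrative assumptions.
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to handle large files."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root):
    """Group files under 'root' by hash; any group with 2+ files is a duplicate set."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_hash(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_exact_duplicates("./collection").items():
        print(digest, [str(p) for p in paths])
```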

Sometimes, files are exact (or nearly exact) duplicates in content but not in format.  One example is a Word document published to an Adobe PDF file – the content is the same, but the file format is different, so the hash value will be different.  Near-deduplication can be used to identify files where most or all of the content matches so they can be verified as duplicates and eliminated from review.
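Review platforms implement near-deduplication in different ways; as one simple illustration (assuming the text has already been extracted from, say, the Word and PDF versions of a document), overlapping word shingles and a Jaccard similarity threshold can flag likely near-duplicates for verification:

```python
# Minimal near-duplicate sketch: compare extracted text (not file bytes).
# The shingle size and 0.9 threshold are illustrative assumptions.
import re

def shingles(text, size=3):
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.9):
    """True if the two extracted texts share most of their word shingles."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```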

Another way to identify duplicative content is through message thread analysis.  Many email messages are part of a larger discussion, which could be just between two parties, or include a number of parties in the discussion.  To review each email in the discussion thread would result in much of the same information being reviewed over and over again.  Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.  That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about lunch plans or whether anyone saw The Walking Dead last night).
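For illustration only (this is not how any particular platform implements it, and Outlook conversation indexes add more nuance), here is a minimal sketch of the underlying idea: grouping loose .eml files into threads using standard email headers.  The directory path is an assumption.

```python
# Minimal sketch of grouping emails into threads via Message-ID,
# In-Reply-To and References headers.
from email import policy
from email.parser import BytesParser
from collections import defaultdict
from pathlib import Path

def thread_key(msg):
    """Use the earliest ancestor in References, else In-Reply-To, else the message's own ID."""
    refs = str(msg.get("References", "")).split()
    if refs:
        return refs[0]
    reply_to = str(msg.get("In-Reply-To", "")).strip()
    return reply_to or str(msg.get("Message-ID", ""))

def group_threads(directory):
    """Return a mapping of thread key -> list of subjects in that thread."""
    threads = defaultdict(list)
    for path in sorted(Path(directory).glob("*.eml")):
        with open(path, "rb") as f:
            msg = BytesParser(policy=policy.default).parse(f)
        threads[thread_key(msg)].append(str(msg.get("Subject", "(no subject)")))
    return threads

if __name__ == "__main__":
    for key, subjects in group_threads("./emails").items():
        print(key, len(subjects), "messages")
```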

CloudNine’s review platform (shameless plug warning!) is one example of an application that provides a mechanism for message thread analysis of Outlook emails that pulls the entire thread into one conversation for review in a popup window.  By doing so, you can focus your review on the last emails in each conversation to see what is said without having to review each email.

With message thread analysis, you can minimize review of duplicative information within emails, saving time and cost and ensuring consistency in the review.

So, what do you think?  Does your review tool support message thread analysis?   Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Here’s a New Dataset Option, Thanks to EDRM: eDiscovery Trends

For several years, the Enron data set (converted to Outlook by the EDRM Data Set team back in November of 2010) has been the only viable set of public domain data available for testing and demonstration of eDiscovery processing and review applications.  Chances are, if you’ve seen a demo of an eDiscovery application in the last few years, it was using Enron data.  Now, the EDRM Data Set team has begun to offer some new dataset options.

Yesterday, EDRM announced the release of the first of its “Micro Datasets.”  As noted in the announcement, the datasets are designed for eDiscovery data testing and process validation. Software vendors, litigation support organizations, law firms and others may use these smaller sets to qualify support, test speed and accuracy in indexing and search, and conduct more forensically oriented analytics exercises throughout the eDiscovery workflow.

The initial offering is a 136.9 MB zip file containing everything from the latest versions of Microsoft Office and Adobe Acrobat files to image files, along with EDRM-specific work product files and data from public websites. There are even some uncommon formats, including .mbox email storage files and .gz archive files!  The EDRM Dataset group has scoured the internet and found usable, freely available data at universities, government sites and elsewhere, a selection of which is included in the zip file.

The first EDRM Micro Dataset zip file is available now for download here.  While it’s an initial small set, EDRM has promised “advanced” data sets to come.  Those advanced data sets, to be released in the near future, will be available exclusively to EDRM members.  Members will be notified by email with instructions for file downloading.   Organizations interested in EDRM membership will find information at https://www.edrm.net/join/.  Now, there is more reason than ever to join!
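If you do download the set, a quick inventory of what it contains can help confirm that your processing tool supports every format before you start testing.  Here is a minimal sketch; the zip file name is an illustrative assumption.

```python
# Minimal sketch: count files in a test dataset zip by extension.
import zipfile
from collections import Counter
from pathlib import PurePosixPath

def inventory(zip_path):
    counts = Counter()
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if not name.endswith("/"):  # skip directory entries
                counts[PurePosixPath(name).suffix.lower() or "(no extension)"] += 1
    return counts

if __name__ == "__main__":
    for ext, n in inventory("edrm_micro_dataset.zip").most_common():
        print(f"{ext}: {n}")
```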

So, what do you think?  Are you tired of using the Enron data set and looking forward to alternatives?   If so, today is your lucky day!  Please share any comments you might have or if you’d like to know more about a particular topic.


Got Problems with Your eDiscovery Processes? “SWOT” Them Away: eDiscovery Best Practices

Having recently helped a client put one of these together, it seemed appropriate to revisit this topic…

Understanding the internal and external challenges that your organization faces allows it to approach ongoing and future discovery more strategically.  A “SWOT” analysis is a tool that can be used to develop that understanding.

A “SWOT” analysis is a structured planning method used to evaluate the Strengths, Weaknesses, Opportunities, and Threats associated with a specific business objective, which can be a single project or all of the activities of a business unit.  It involves specifying that objective and identifying the internal and external factors that are favorable and unfavorable to achieving it.  The SWOT analysis is broken down as follows:

  • Strengths: characteristics of the business or project that give it an advantage over others;
  • Weaknesses: characteristics that place the team at a disadvantage relative to others;
  • Opportunities: elements in the environment that the project could exploit to its advantage;
  • Threats: elements in the environment that could cause trouble for the business or project.

“SWOT”, get it?

From an eDiscovery perspective, a SWOT analysis enables you to take an objective look at how your organization handles discovery issues – what you do well and where you need to improve – and the external factors that can affect how your organization addresses its discovery challenges.  The SWOT analysis enables you to assess how your organization handles each phase of the discovery process – from Information Governance to Presentation – to evaluate where your strengths and weaknesses exist so that you can capitalize on your strengths and implement changes to address your weaknesses.

How solid is your information governance program?  How well does your legal department communicate with IT?  How well formalized is your coordination with outside counsel and vendors?  Do you have a formalized process for implementing and tracking litigation holds?  These are examples of questions you might ask about your organization and, based on the answers, identify your organization’s strengths and weaknesses in managing the discovery process.

However, if you only look within your organization, that’s only half the battle.  You also need to look at external factors and how they affect your organization’s handling of discovery issues.  Trends such as the growth of social media and changes to state or federal rules addressing the handling of electronically stored information (ESI) need to be considered in your organization’s strategic discovery plan.

Having worked through the strategic analysis process with several organizations over a number of years, I find that the SWOT analysis is a useful tool for summarizing where the organization currently stands with regard to managing discovery, which naturally identifies areas for improvement that can be addressed.

So, what do you think?  Has your organization performed a SWOT analysis of your discovery process?   Please share any comments you might have or if you’d like to know more about a particular topic.



Here are a Few Common Myths About Technology Assisted Review: eDiscovery Best Practices

A couple of years ago, after my annual LegalTech New York interviews with various eDiscovery thought leaders (a list of which can be found here, with links to each interview), I wrote a post about some of the perceived myths that exist regarding Technology Assisted Review (TAR) and what it means to the review process.  After a recent discussion with a client where their misperceptions regarding TAR were evident, it seemed appropriate to revisit this topic and debunk a few myths that others may believe as well.

  1. TAR is New Technology

Actually, with all due respect to each of the various vendors that have their own custom algorithm for TAR, the technology behind TAR as a whole is not new.  Ever heard of artificial intelligence?  TAR, in fact, applies artificial intelligence to the review process.  With all of the acronyms we use to describe TAR, here’s one more for consideration: “Artificial Intelligence for Review” or “AIR”.  It may not catch on, but I like it. (much to my disappointment, it didn’t)…

Maybe attorneys would be more receptive to it if they understood it as artificial intelligence?  As Laura Zubulake pointed out in my interview with her, “For years, algorithms have been used in government, law enforcement, and Wall Street.  It is not a new concept.”  With that in mind, Ralph Losey predicts that “The future is artificial intelligence leveraging your human intelligence and teaching a computer what you know about a particular case and then letting the computer do what it does best – which is read at 1 million miles per hour and be totally consistent.”

  2. TAR is Just Technology

Treating TAR as just the algorithm that “reviews” the documents is shortsighted.  TAR is a process that includes the algorithm.  Without a sound approach for identifying appropriate example documents for the collection, ensuring educated and knowledgeable reviewers to appropriately code those documents and testing and evaluating the results to confirm success, the algorithm alone would simply be another case of “garbage in, garbage out” and doomed to fail.  In a post from last week, we referenced Tom O’Connor’s recent post where he quoted Maura Grossman, probably the most recognized TAR expert, who stated that “TAR is a process, not a product.”  True that.

  3. TAR and Keyword Searching are Mutually Exclusive

I’ve talked to some people who think that TAR and keyword searching are mutually exclusive, i.e., that you wouldn’t perform keyword searching on a case where you plan to use TAR.  Not necessarily.  Ralph Losey continues to advocate a “multimodal” approach, describing it as: “more than one kind of search – using TAR, but also using keyword search, concept search, similarity search, all kinds of other methods that we have developed over the years to help train the machine.  The main goal is to train the machine.”

  4. TAR Eliminates Manual Review

Many people (including the New York Times) think of TAR as the death of manual review, with all attorney reviewers being replaced by machines.  Actually, manual review is a part of the TAR process in several aspects, including: 1) Subject matter knowledgeable reviewers are necessary to perform review to create a training set of documents for the technology, 2) After the process is performed, both sets (the included and excluded documents) are sampled and the samples are reviewed to determine the effectiveness of the process, and 3) The resulting responsive set is generally reviewed to confirm responsiveness and also to determine whether the documents are privileged.  Without manual review to train the technology and verify the results, the process would fail.
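For the sampling step in point 2, here is a minimal sketch (all counts are illustrative assumptions, not from any particular matter) of how a random sample of the excluded set can be reviewed and used to project how many responsive documents the process may have missed:

```python
# Minimal sketch of validating TAR by sampling the excluded set.
import random

def sample_for_review(excluded_doc_ids, sample_size, seed=42):
    """Draw a random sample of excluded documents for manual review."""
    rng = random.Random(seed)
    return rng.sample(excluded_doc_ids, sample_size)

def estimate_missed(excluded_count, sample_size, responsive_in_sample):
    """Project the sample's responsive rate onto the full excluded set."""
    rate = responsive_in_sample / sample_size
    return rate, round(rate * excluded_count)

if __name__ == "__main__":
    # Illustrative numbers: 100,000 excluded documents, 1,500 reviewed, 12 found responsive.
    rate, projected = estimate_missed(100_000, 1_500, 12)
    print(f"Estimated responsive rate in excluded set: {rate:.2%}")  # 0.80%
    print(f"Projected responsive documents missed: ~{projected}")    # ~800
```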

  5. TAR Has to Be Perfect to Be Useful

Detractors of TAR note that it can miss plenty of responsive documents and is nowhere near 100% accurate.  In one recent case, the producing party estimated that as many as 31,000 relevant documents may have been missed by the TAR process.  However, they also estimated that a much more costly manual review would have missed as many as 62,000 relevant documents.

Craig Ball’s analogy about the two hikers who encounter the angry grizzly bear is appropriate – one hiker doesn’t have to outrun the bear, just the other hiker.  Craig notes: “That is how I look at technology assisted review.  It does not have to be vastly superior to human review; it only has to outrun human review.  It just has to be as good or better while being faster and cheaper.”

So, what do you think?  Do you agree that these are myths?  Can you think of any others?  Please share any comments you might have or if you’d like to know more about a particular topic.


Keyword Searching Isn’t Dead, If It’s Done Correctly: eDiscovery Best Practices

In the latest post on the Advanced Discovery blog, Tom O’Connor (an industry thought leader who has been a thought leader interviewee on this blog several times) posed an interesting question: Is Keyword Searching Dead?

In his post, Tom recapped the discussion of a session with the same name at the recent Today’s General Counsel Institute in New York City where Tom was a co-moderator of the session along with Maura Grossman, a recognized Technology Assisted Review (TAR) expert, who was recently appointed as Special Master in the Rio Tinto case.  Tom then went on to cover some of the arguments for and against keyword searching as discussed by the panelists and participants in the session, while also noting that numerous polls and client surveys show that the majority of people are NOT using TAR today.  So, they must be using keyword searching, right?

Should they be?  Is there still room for keyword searching in today’s eDiscovery landscape, given the advances that have been made in recent years in TAR technology?

There is, if it’s done correctly.  Tom quotes Maura in the article as stating that “TAR is a process, not a product.”  The same could be said for keyword searching.  If the process within which the keyword searches are being performed is flawed, you could either retrieve far more documents to review than necessary and drive up eDiscovery costs, or leave yourself open to challenges in the courtroom regarding your approach.  Many lawyers at corporations and law firms identify search terms (and, in many cases, agree on those terms with opposing counsel) without any testing done to confirm the validity of those terms.

Way back in the first few months of this blog (over four years ago), I advocated an approach to searching that I called “STARR”: Search, Test, Analyze, Revise (if necessary) and Repeat (also, if necessary).  With an effective platform (using advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching), knowledge and experience of that platform, and knowledge of search best practices, you can start with a well-planned search that can be confirmed or adjusted using the “STARR” approach.

And, even when you’ve been searching databases for as long as I have (decades now), an effective process is key because you never know what you will find until you test the results.  The favorite example that I have used over recent years (and walked through in this earlier post) comes from work I did for a petroleum (oil) company looking for documents related to “oil rights”: a search of “oil AND rights” retrieved almost every published and copyrighted document in the company.  Why?  Because almost every published and copyrighted document in the company had the phrase “All Rights Reserved”.  Testing and an iterative process eventually enabled me to find the search that offered the best balance of recall and precision.
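To make that concrete, here is a minimal sketch of why testing matters, using the “oil rights” example: a plain AND search hits every document containing “All Rights Reserved”, while a proximity-style search does not.  The sample texts and word window are illustrative assumptions; in practice you would use your platform’s own proximity operator.

```python
# Minimal sketch comparing an AND search to a simple word-proximity search.
import re

def and_search(text, *terms):
    """True if every term appears somewhere in the text."""
    return all(re.search(rf"\b{t}\b", text, re.IGNORECASE) for t in terms)

def proximity_search(text, term_a, term_b, window=3):
    """True if term_a and term_b appear within 'window' words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= window for a in positions_a for b in positions_b)

docs = {
    "lease memo":  "The lease conveys oil and mineral rights to the operator.",
    "annual report": "Acme Oil Company annual drilling report. Copyright 2015. All Rights Reserved.",
}

for name, text in docs.items():
    print(name,
          "| AND hit:", and_search(text, "oil", "rights"),
          "| proximity hit:", proximity_search(text, "oil", "rights"))
```

The lease memo hits under both searches, but the annual report is a false positive that only the AND search returns – exactly the kind of thing an iterative “STARR” pass exposes.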

Like TAR, keyword searching is a process, not a product.  And, you can quote me on that.  (-:

So, what do you think?  Is keyword searching dead?  And, please share any comments you might have or if you’d like to know more about a particular topic.


“Da Silva Moore Revisited” Will Be Visited by a Newly Appointed Special Master: eDiscovery Case Law

In Rio Tinto Plc v. Vale S.A., 14 Civ. 3042 (RMB)(AJP) (S.D.N.Y. Jul. 15, 2015), New York Magistrate Judge Andrew J. Peck, at the request of the defendant, entered an Order appointing Maura Grossman as a special master in this case to assist with issues concerning Technology-Assisted Review (TAR).

Back in March (as covered here on this blog), Judge Peck approved the proposed protocol for technology assisted review (TAR) presented by the parties, titling his opinion “Predictive Coding a.k.a. Computer Assisted Review a.k.a. Technology Assisted Review (TAR) — Da Silva Moore Revisited”.  Alas, as some unresolved issues remained regarding the parties’ TAR-based productions, Judge Peck decided to prepare the order appointing Grossman as special master for the case.  Grossman, of course, is a recognized TAR expert, who (along with Gordon Cormack) wrote Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review and also the Grossman-Cormack Glossary of Technology Assisted Review (covered on our blog here).

While noting that it has “no objection to Ms. Grossman’s qualifications”, the plaintiff issued several objections to the appointment, including:

  • The defendant should have agreed much earlier to appointment of a special master: Judge Peck’s response was that “The Court certainly agrees, but as the saying goes, better late than never. There still are issues regarding the parties’ TAR-based productions (including an unresolved issue raised at the most recent conference) about which Ms. Grossman’s expertise will be helpful to the parties and to the Court.”
  • The plaintiff stated a “fear that [Ms. Grossman’s] appointment today will only cause the parties to revisit, rehash, and reargue settled issues”: Judge Peck stated that “the Court will not allow that to happen. As I have stated before, the standard for TAR is not perfection (nor of using the best practices that Ms. Grossman might use in her own firm’s work), but rather what is reasonable and proportional under the circumstances. The same standard will be applied by the special master.”
  • One of the defendant’s lawyers had three conversations with Ms. Grossman about TAR issues: Judge Peck noted that the plaintiff did not explain how one contact in connection with The Sedona Conference “should or does prevent Ms. Grossman from serving as special master”, and noted that, in the other two, the plaintiff “does not suggest that Ms. Grossman did anything improper in responding to counsel’s question, and Ms. Grossman has made clear that she sees no reason why she cannot serve as a neutral special master”, agreeing with that statement.

Judge Peck did agree with the plaintiff on allocation of the special master’s fees, stating that the defendant’s “propsal [sic] is inconsistent with this Court’s stated requirement in this case that whoever agreed to appointment of a special master would have to agree to pay, subject to the Court reallocating costs if warranted”.

So, what do you think?  Was the appointment of a special master (albeit an eminently qualified one) appropriate at this stage of the case?  Please share any comments you might have or if you’d like to know more about a particular topic.


This Study Discusses the Benefits of Including Metadata in Machine Learning for TAR: eDiscovery Trends

A month ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop, which was held earlier that month, and we covered one of those papers a couple of weeks later.  Today, let’s cover another paper from the workshop.

The Role of Metadata in Machine Learning for Technology Assisted Review (by Amanda Jones, Marzieh Bazrafshan, Fernando Delgado, Tania Lihatsh and Tamara Schuyler) examines the role of metadata in machine learning for technology assisted review (TAR), particularly with respect to the algorithm development process.

Let’s face it, we all generally agree that metadata is a critical component of ESI for eDiscovery.  But opinions are mixed as to its value in the TAR process.  For example, the Grossman-Cormack Glossary of Technology Assisted Review (which we covered here in 2012) includes metadata as one of the “typical” features of a document used as input to a machine learning algorithm.  However, a couple of eDiscovery software vendors have produced documentation stating that “machine learning systems typically rely upon extracted text only and that experts engaged in providing document assessments for training should, therefore, avoid considering metadata values in making responsiveness calls”.

So, the authors decided to conduct a study to establish the potential benefit of incorporating metadata into TAR algorithm development processes, as well as to evaluate the benefits of using extended metadata and of using the field origins of that metadata.  Extended metadata fields included Primary Custodian, Record Type, Attachment Name, Bates Start, Company/Organization, Native File Size, Parent Date and Family Count, to name a few.  They evaluated three distinct data sets (one drawn from Topic 301 of the TREC 2010 Interactive Task, and two proprietary business data sets) and generated a random sample of 4,500 individual documents for each (split into a 3,000 document Control Set and a 1,500 document Training Set).

The metric they used throughout to compare model performance is Area Under the Receiver Operating Characteristic Curve (AUROC). Say what?  According to the report, the metric indicates the probability that a given model will assign a higher ranking to a randomly selected responsive document than a randomly selected non-responsive document.
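As a rough illustration of that kind of comparison (this is not the study’s actual implementation), here is a minimal sketch assuming scikit-learn and labeled training and control sets in pandas DataFrames, with hypothetical column names like body_text, primary_custodian and record_type: it scores a text-only model against a text-plus-metadata model using AUROC.

```python
# Minimal sketch: compare a text-only model to a text+metadata model by AUROC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def auroc(pipeline, X_train, y_train, X_control, y_control):
    """Fit on the training set, score the control set, return AUROC."""
    pipeline.fit(X_train, y_train)
    scores = pipeline.predict_proba(X_control)[:, 1]
    return roc_auc_score(y_control, scores)

# Text-only model: TF-IDF over the extracted body text.
text_only = make_pipeline(
    ColumnTransformer([("text", TfidfVectorizer(), "body_text")]),
    LogisticRegression(max_iter=1000),
)

# Text + metadata model: add custodian and record type as categorical features.
text_plus_meta = make_pipeline(
    ColumnTransformer([
        ("text", TfidfVectorizer(), "body_text"),
        ("meta", OneHotEncoder(handle_unknown="ignore"), ["primary_custodian", "record_type"]),
    ]),
    LogisticRegression(max_iter=1000),
)

# Usage (with pandas DataFrames train_df / control_df containing those columns):
# print(auroc(text_only, train_df, train_df["responsive"], control_df, control_df["responsive"]))
# print(auroc(text_plus_meta, train_df, train_df["responsive"], control_df, control_df["responsive"]))
```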

Their findings were that incorporating metadata as an integral component of machine learning processes for TAR improved results (based on the AUROC metric).  In particular, models incorporating Extended metadata significantly outperformed models based on body text alone in each condition for every data set.  While there’s still a lot to learn about the use of metadata in modeling for TAR, it’s an interesting study and a good start to the discussion.

A copy of the twelve page study (including Bibliography and Appendix) is available here.  There is also a link to the PowerPoint presentation file from the workshop, which is a condensed way to look at the study, if desired.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.


Craig Ball Explains Hash Deduplication As Only He Can: eDiscovery Best Practices

Ever wonder why some documents are identified as duplicates and others are not, even though they appear to be identical?  Leave it to Craig Ball to explain it in plain terms.

In the latest post (Deduplication: Why Computers See Differences in Files that Look Alike) in his excellent Ball in your Court blog, Craig states that “Most people regard a Word document file, a PDF or TIFF image made from the document file, a printout of the file and a scan of the printout as being essentially “the same thing.”  Understandably, they focus on content and pay little heed to form.  But when it comes to electronically stored information, the form of the data—the structure, encoding and medium employed to store and deliver content–matters a great deal.”  The end result is that two documents may look the same, but may not be considered duplicates because of their format.

Craig also references a post from “exactly” three years ago (it’s four days off, Craig, just sayin’) that provides a “quick primer on deduplication”, showing the three approaches to deduplication, including the most common approach of using hash values (MD5 or SHA-1).

My favorite example of how two seemingly duplicate documents can be different is the publication of documents to Adobe Portable Document Format (PDF).  As I noted in our post from (nowhere near exactly) three years ago, I “publish” marketing slicks created in Microsoft® Publisher, “publish” finalized client proposals created in Microsoft Word and “publish” presentations created in Microsoft PowerPoint to PDF format regularly (still do).  With a free PDF print driver, you can conceivably create a PDF file for just about anything that you can print.  Of course, scans of printed documents that were originally electronic are another way in which two seemingly duplicate documents can be different.

The best part of Craig’s post is the exercise that he describes at the end of it – creating a Word document of the text of the Gettysburg Address (saved as both .DOC and .DOCX), generating a PDF file using both the Save As and Print As PDF methods, and scanning the printed document to both TIFF and PDF at different resolutions.  He shows the MD5 hash value and the file size of each file.  Because the format of the file is different each time, the MD5 hash value is different each time.  When that happens for the same content, you have what some of us call “near dupes”, which have to be analyzed based on the text content of the file.

The file size is different in almost every case too.  We performed a similar test (still not exactly) three years ago (but much closer).  In our test, we took one of our one page blog posts about the memorable Apple v. Samsung litigation and saved it to several different formats, including TXT, HTML, XLSX, DOCX, PDF and MSG – the sizes ranged from 10 KB all the way up to 221 KB.  So, as you can see, the same content can vary widely in both hash value and file size, depending on the file format and how it was created.
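You can see the same effect in a few lines of code.  Here is a minimal sketch (file names are illustrative, and text extraction from the PDF or TIFF versions is assumed to have already happened) showing why the raw file hashes and sizes differ while a fingerprint of the normalized extracted text can still tie the versions together:

```python
# Minimal sketch: file-level vs. text-level fingerprints of the "same" document.
import hashlib
import os
import re

def file_fingerprint(path):
    """Hash and size of the file as stored on disk."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest(), os.path.getsize(path)

def text_fingerprint(extracted_text):
    """Hash of the extracted text with whitespace and case normalized."""
    normalized = re.sub(r"\s+", " ", extracted_text).strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Usage: file_fingerprint("gettysburg.docx") and file_fingerprint("gettysburg.pdf")
# will differ in both hash and size, while text_fingerprint() of the text
# extracted from each will match (barring OCR errors on the scanned versions).
```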

As usual, I’ve tried not to steal all of Craig’s thunder from his post, so please check it out here.

So, what do you think?  What has been your most unique deduplication challenge?  Please share any comments you might have or if you’d like to know more about a particular topic.


Here’s One Study That Shows Potential Savings from Technology Assisted Review: eDiscovery Trends

A couple of weeks ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop that was held earlier this month.  Today, let’s cover one of those papers.

The Case for Technology Assisted Review and Statistical Sampling in Discovery (by Christopher H Paskach, F. Eli Nelson and Matthew Schwab) aims to show how Technology Assisted Review (TAR) and statistical sampling can significantly reduce risk and improve productivity in eDiscovery processes.  The easy-to-read six page report concludes with the observation that, with measures like statistical sampling, “attorney stakeholders can make informed decisions about the reliability and accuracy of the review process, thus quantifying actual risk of error and using that measurement to maximize the value of expensive manual review. Law firms that adopt these techniques are demonstrably faster, more informed and productive than firms who rely solely on attorney reviewers who eschew TAR or statistical sampling.”

The report begins with an introduction that includes a history of eDiscovery, starting with printing documents, “Bates” stamping them, scanning them and using Optical Character Recognition (OCR) programs to capture text for searching.  As the report notes, “Today we would laugh at such processes, but in a profession based on ‘stare decisis,’ changing processes takes time.”  Of course, as we know now, “studies have concluded that machine learning techniques can outperform manual document review by lawyers”.  The report also references key cases such as DaSilva Moore, Kleen Products and Global Aerospace, which were among the first of many cases to approve the use of technology assisted review for eDiscovery.

Probably the most interesting portion of the report is the section titled Cost Impact of TAR, which illustrates a case scenario comparing the cost of TAR to the cost of manual review.  On a strictly relevance-based review of 90,000 documents (after keyword filtering, which implies a multimodal approach to TAR), the TAR approach was over $57,000 less expensive ($136,225 vs. $193,500 for manual review).  The report illustrates the comparison with both a spreadsheet of the numbers and a pie chart comparison of costs, based on the assumptions provided.  Sounds like the basis for a budgeting tool!
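In that budgeting-tool spirit, here is a minimal sketch of such a calculator.  The review speeds, hourly rate, technology fee and the share of documents that still get eyes-on review after TAR are illustrative assumptions of mine, not the figures used in the paper.

```python
# Minimal sketch of a TAR vs. manual review cost calculator.
def manual_review_cost(doc_count, docs_per_hour=50, hourly_rate=60.0):
    """Cost of reviewing every document by hand."""
    return doc_count / docs_per_hour * hourly_rate

def tar_review_cost(doc_count, training_docs=3000, eyes_on_share=0.35,
                    docs_per_hour=50, hourly_rate=60.0, technology_fee=15000.0):
    """Cost of training the tool plus eyes-on review of the predicted-responsive share."""
    reviewed = training_docs + doc_count * eyes_on_share
    return reviewed / docs_per_hour * hourly_rate + technology_fee

if __name__ == "__main__":
    docs = 90_000
    manual = manual_review_cost(docs)
    tar = tar_review_cost(docs)
    print(f"Manual review:  ${manual:,.0f}")
    print(f"TAR-based flow: ${tar:,.0f}")
    print(f"Savings:        ${manual - tar:,.0f}")
```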

Anyway, the report goes on to discuss the benefits of statistical sampling to validate the results, demonstrating that the only way to attempt to do so in a manual review scenario is to review the documents multiple times, which is prone to human error and inconsistent assessments of responsiveness.  The report then covers necessary process changes to realize the benefits of TAR and statistical sampling and concludes with the declaration that:

“Companies and law firms that take advantage of the rapid advances in TAR will be able to keep eDiscovery review costs down and reduce the investment in discovery by getting to the relevant facts faster. Those firms who stick with unassisted manual review processes will likely be left behind.”
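On the statistical sampling point, a common first step is deciding how large a validation sample needs to be for a given confidence level and margin of error.  Here is a minimal sketch using the standard formula for a proportion, n = z² · p(1 − p) / e², with a finite population correction; the confidence level, margin of error and population size shown are illustrative assumptions.

```python
# Minimal sketch of sizing a validation sample for review QC.
import math

Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(population, confidence=0.95, margin_of_error=0.02, p=0.5):
    """Documents to review for the chosen confidence level and margin of error."""
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

if __name__ == "__main__":
    print(sample_size(90_000))                          # ~2,339 documents at 95% / ±2%
    print(sample_size(90_000, margin_of_error=0.05))    # ~383 documents at 95% / ±5%
```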

The report is a quick, easy read and can be viewed here.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.


DESI Got Your Input, and Here It Is: eDiscovery Trends

Back in January, we discussed the Discovery of Electronically Stored Information (DESI, not to be confused with Desi Arnaz) workshop and its call for papers describing research or practice for the DESI VI workshop, which was held last week at the University of San Diego as part of the 15th International Conference on Artificial Intelligence & Law (ICAIL 2015). Now, links to those papers are available on the workshop’s web site.

The DESI VI workshop aims to bring together researchers and practitioners to explore innovation and the development of best practices for application of search, classification, language processing, data management, visualization, and related techniques to institutional and organizational records in eDiscovery, information governance, public records access, and other legal settings. Ideally, the aim of the DESI workshop series has been to foster a continuing dialogue leading to the adoption of further best practice guidelines or standards in using machine learning, most notably in the eDiscovery space. Organizing committee members include Jason R. Baron of Drinker Biddle & Reath LLP and Douglas W. Oard of the University of Maryland.

The workshop included keynote addresses by Bennett Borden and Jeremy Pickens, a session regarding Topics in Information Governance moderated by Jason R. Baron, presentations of some of the “refereed” papers and other moderated discussions. Sounds like a very informative day!

As for the papers themselves, here is a list from the site with links to each paper:

Refereed Papers

Position Papers

If you’re interested in discovery of ESI, Information Governance and artificial intelligence, these papers are for you! Kudos to all of the authors who submitted them. Over the next few weeks, we plan to dive deeper into at least a few of them.

So, what do you think? Did you attend DESI VI? Please share any comments you might have or if you’d like to know more about a particular topic.
