

Judge Peck Refuses to Order Defendant to Use Technology Assisted Review: eDiscovery Case Law

We’re beginning to see more disputes between parties regarding the use of technology assisted review (TAR) in discovery.  Usually in these disputes, one party wants to use TAR and the other party objects.  In this case, the dispute was a bit different…

In Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP) (S.D.N.Y. Aug. 1, 2016), New York Magistrate Judge Andrew J. Peck, indicating that the key issue before the court in the discovery dispute between the parties was whether (at the plaintiff’s request) the defendant could be forced to use technology assisted review, refused to force the defendant to do so, stating “The short answer is a decisive ‘NO.’”

Case Background

In this discrimination case brought by a former employee of the defendant, the parties had several discovery disputes after a number of delays in discovery.  They filed a joint letter with the court, seeking rulings as to the proper scope of ESI discovery (mostly issues as to custodians and date range) and search methodology – whether to use keywords (which the defendant wanted to do) or TAR (which the plaintiff wanted the defendant to use).

With regard to date range, the parties agreed to a start date for discovery of September 1, 2005 but disagreed on the end date.  In the discovery conference held on July 27, 2016, Judge Peck ruled on a date in between those proposed by the plaintiff and the defendant – April 30, 2010 – without prejudice to the plaintiff seeking documents or ESI from a later period, if justified, on a more targeted inquiry basis.  As to custodians, the City agreed to search the files of nine custodians, but not the six additional custodians that the plaintiff requested.  The Court ruled that discovery should be staged, starting with the agreed upon nine custodians. After reviewing the production from those nine custodians, if the plaintiff could demonstrate that other custodians had relevant, unique and proportional ESI, the Court would consider targeted searches from those custodians.

After the parties had initial discussions about the City using keywords, the plaintiff’s counsel consulted an eDiscovery vendor and proposed that the defendant should use TAR as a “more cost-effective and efficient method of obtaining ESI from Defendants.”  The defendant declined, both because of cost and because of concerns that the parties, based on their history of scope negotiations, would not be able to collaborate to develop the seed set for a TAR process.

Judge’s Ruling

Judge Peck noted that “Hyles absolutely is correct that in general, TAR is cheaper, more efficient and superior to keyword searching” and referenced his “seminal” DaSilva Moore decision and also his 2015 Rio Tinto decision where he wrote that “the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”  Judge Peck also noted that “Hyles’ counsel is correct that parties should cooperate in discovery”, but stated that “[c]ooperation principles, however, do not give the requesting party, or the Court, the power to force cooperation or to force the responding party to use TAR.”

Judge Peck, while acknowledging that he is “a judicial advocate for the use of TAR in appropriate cases”, also noted that he is also “a firm believer in the Sedona Principles, particularly Principle 6, which clearly provides that:

Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

Judge Peck went on to state: “Under Sedona Principle 6, the City as the responding party is best situated to decide how to search for and produce ESI responsive to Hyles’ document requests. Hyles’ counsel candidly admitted at the conference that they have no authority to support their request to force the City to use TAR. The City can use the search method of its choice. If Hyles later demonstrates deficiencies in the City’s production, the City may have to re-do its search.  But that is not a basis for Court intervention at this stage of the case.”  As a result, Judge Peck denied the plaintiff’s application to force the defendants to use TAR.

So, what do you think?  Are you surprised by that ruling?  Please share any comments you might have or if you’d like to know more about a particular topic.

Don’t forget that next Wednesday at 1:00pm ET, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Our panel discussion will provide an overview of the eDiscovery automation technologies and we will really take a hard look at the technology and definition of TAR and the limitations associated with both.  This time, Mary Mack, Executive Director of ACEDS will be moderating and I will be one of the panelists, along with Bill Dimm, CEO of Hot Neuron and Bill Speros, Evidence Consulting Attorney with Speros & Associates, LLC.  Click on the link here to register.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

ACEDS Adds its Weight to the eDiscovery Business Confidence Survey: eDiscovery Trends

We’ve covered two rounds of the quarterly eDiscovery Business Confidence Survey created by Rob Robinson and conducted on his terrific Complex Discovery site (previous results are here and here).  It’s time for the Summer 2016 Survey.  Befitting of the season, the survey has a HOT new affiliation with the Association of Certified eDiscovery Specialists (ACEDS).

As before, the eDiscovery Business Confidence Survey is a non-scientific survey designed to provide insight into the business confidence level of individuals working in the eDiscovery ecosystem. The term ‘business’ represents the economic factors that impact the creation, delivery, and consumption of eDiscovery products and services.  The purpose of the survey is to provide a subjective baseline for understanding the trajectory of the business of eDiscovery through the eyes of industry professionals.

Also as before, the survey asks how you rate general business conditions for your segment of the eDiscovery market, both currently and six months from now; where you expect revenue and profits for your segment of the market to be in six months; and which issue you think will most impact the business of eDiscovery over the next six months, among other questions.  It’s a simple nine-question survey that literally takes about a minute to complete.  Who hasn’t got a minute to provide useful information?

Individual answers are kept confidential, with the aggregate results to be published on the ACEDS website (News & Press), on the Complex Discovery blog, and on selected ACEDS Affiliate websites and blogs (we’re one of those and we’ll cover the results as we have for the first two surveys) upon completion of the response period, which started on August 1 and goes through Wednesday, August 31.

What are experts saying about the survey?  Here are a couple of notable quotes:

Mary Mack, Executive Director of ACEDS stated: “The business of eDiscovery is an ever-present and important variable in the equation of legal discovery.  As financial factors are a primary driver in eDiscovery decisions ranging from sourcing and staffing to development and deployment, ACEDS sees value in regularly checking the business pulse of eDiscovery professionals. The eDiscovery Business Confidence Survey provides a tool to help take that pulse on a systematic basis and ACEDS looks forward to sponsoring, participating, and reporting on the results of this salient survey each quarter.”

George Socha, Co-Founder of EDRM and Managing Director of Thought Leadership of BDO stated: “In my experience, the successful conduct of eDiscovery is comprised of a balance of in-depth education, practical execution, and experience-based excellence.  The eDiscovery Business Confidence survey being highlighted by ACEDS is one of many industry surveys that positively contributes to this balance, as it provides a quarterly snapshot into the business of discovery. I highly encourage serious eDiscovery professionals to complete and consider this survey as a key tool for understanding the business challenges and opportunities in our profession.”

The more respondents there are, the more useful the results will be!  What more do you need?  Click here to take the survey yourself.  Don’t forget!

So, what do you think?  Are you confident in the state of business within the eDiscovery industry?  Share your thoughts in the survey and, as always, please share any comments you might have with us or let us know if you’d like to know more about a particular topic.



How Automation is Revolutionizing eDiscovery: eDiscovery Trends

I thought about titling this post “Less Than Half of Automation is Revolutionizing eDiscovery” to keep the streak alive, but (alas) all good streaks must come to an end… :o)

If you missed our panel session last month in New York City at The Masters Conference, you missed a terrific discussion about automation in eDiscovery and, in particular, an in-depth discussion about technology assisted review (TAR) and whether it lives up to the current hype.  Now, you get another chance to check it out, thanks to ACEDS.

Next Wednesday, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Our panel discussion will provide an overview of the eDiscovery automation technologies and we will really take a hard look at the technology and definition of TAR and the limitations associated with both.  This time, Mary Mack, Executive Director of ACEDS will be moderating and I will be one of the panelists, along with Bill Dimm, CEO of Hot Neuron and Bill Speros, Evidence Consulting Attorney with Speros & Associates, LLC.

The webinar will be conducted at 1:00 pm ET (which is 12:00 pm CT, 11:00 am MT and 10:00 am PT).  Oh, and 5:00 pm GMT (Greenwich Mean Time).  If you’re in any other time zone, you’ll have to figure it out for yourself.  Click on the link here to register.

If you’re interested in learning about various ways in which automation is being used in eDiscovery and getting a chance to look at the current state of TAR, possible warts and all, I encourage you to sign up and attend.  It should be an enjoyable and educational hour.  Thanks to our friends at ACEDS for conducting the session!

So, what do you think?  Do you think automation is revolutionizing eDiscovery?  As always, please share any comments you might have or if you’d like to know more about a particular topic.



Cooperation in Predictive Coding Exercise Fails to Avoid Disputed Production: eDiscovery Case Law

In Dynamo Holdings v. Commissioner of Internal Revenue, Docket Nos. 2685-11, 8393-12 (U.S. Tax Ct. July 13, 2016), U.S. Tax Court Judge Ronald Buch denied the respondent’s Motion to Compel Production of Documents Containing Certain Terms, finding that there is “no question that petitioners satisfied our Rules when they responded using predictive coding”.

Case Background

In this case involving various transfers from one entity to a related entity where the respondent determined that the transfers were disguised gifts to the petitioner’s owners and the petitioners asserted that the transfers were loans, the parties previously disputed the use of predictive coding for this case and, in September 2014 (covered by us here), Judge Buch ruled that “[p]etitioners may use predictive coding in responding to respondent’s discovery request. If, after reviewing the results, respondent believes that the response to the discovery request is incomplete, he may file a motion to compel at that time.”

At the outset of this ruling, Judge Buch noted that “[t]he parties are to be commended for working together to develop a predictive coding protocol from which they worked”.  As indicated by the parties’ joint status reports, the parties agreed to and followed a framework for producing the electronically stored information (ESI) using predictive coding: (1) restoring and processing backup tapes; (2) selecting and reviewing seed sets; (3) establishing and applying the predictive coding algorithm; and (4) reviewing and returning the production set.

While the petitioners were restoring the first backup tape, the respondent requested that the petitioners conduct a Boolean search and provided petitioners with a list of 76 search terms to run against the processed data.  That search yielded over 406,000 documents, from which two samples of 1,000 documents each were drawn and provided to the respondent for review.  After the model was run against the second 1,000 documents, the petitioners’ technical professionals reported that the model was not performing well, so the parties agreed that the petitioners would select an additional 1,000 documents that the algorithm had ranked high for likely relevancy, and the respondent reviewed them as well.  The respondent declined to review one more validation sample of 1,000 documents when the petitioners’ technical professionals explained that the additional review would be unlikely to improve the model.

Ultimately, using the respondent’s selected recall rate of 95 percent, the petitioners ran the algorithm against the 406,000 documents to identify documents to produce (followed by a second algorithm to identify privileged materials).  Between January and March 2016, the petitioners delivered a production set of approximately 180,000 total documents on a portable device for the respondent to review, including a relevancy score for each document.  In the end, the respondent found only 5,796 documents to be responsive (barely over 3% of the production) and returned the rest.
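For readers wondering how a recall target like 95 percent is typically validated, here is a minimal sketch of estimating recall from a random validation sample.  The numbers are hypothetical, purely for illustration – they are not from the Dynamo record – and the normal-approximation interval is just one common choice among several.

```python
import math

def estimate_recall(responsive_in_sample, found_by_model, z=1.96):
    """Estimate recall (with a normal-approximation confidence interval)
    from a random validation sample that reviewers coded by hand."""
    if responsive_in_sample == 0:
        raise ValueError("No responsive documents in sample; draw a larger sample.")
    recall = found_by_model / responsive_in_sample
    margin = z * math.sqrt(recall * (1 - recall) / responsive_in_sample)
    return recall, max(0.0, recall - margin), min(1.0, recall + margin)

# Hypothetical numbers: reviewers found 120 responsive documents in a
# 1,000-document sample, and the algorithm had flagged 114 of those 120.
point, low, high = estimate_recall(120, 114)
print(f"Estimated recall: {point:.1%} (roughly {low:.1%} to {high:.1%})")
```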

On June 17, 2016, the respondent filed a motion to compel production of the documents identified in the Boolean search that were not produced in the production set (claiming that 1,353 of the 1,645 documents containing those terms had not been produced), asserting that those documents were “highly likely to be relevant.”  Ten days later, the petitioners filed an objection to the respondent’s motion to compel, challenging the respondent’s calculations by noting that only 1,360 documents actually contained those terms, that 440 of them had actually been produced and that many of the remaining documents predated or postdated the relevant time period.  They also argued that the documents were selected by the predictive coding algorithm based on selection criteria set by the respondent.

Judge’s Ruling

Judge Buch noted that “[r]espondent’s motion is predicated on two myths”: 1) the myth that “manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured”, and 2) the myth of a perfect response to the respondent’s discovery request, which the Tax Court Rules don’t require.  Judge Buch cited Rio Tinto where Judge Andrew Peck stated:

“One point must be stressed – it is inappropriate to hold TAR [technology assisted review] to a higher standard than keywords or manual review.  Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.”

Stating that “[t]here is no question that petitioners satisfied our Rules when they responded using predictive coding”, Judge Buch denied the respondent’s Motion to Compel Production of Documents Containing Certain Terms.

So, what do you think?  If parties agree to the predictive coding process, should they accept the results no matter what?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Rules that Judges Can Consider Predictive Algorithms in Sentencing: eDiscovery Trends

Score one for big data analytics.  According to The Wall Street Journal Law Blog, the Wisconsin Supreme Court ruled last week that sentencing judges may take into account algorithms that score offenders based on their risk of committing future crimes.

As noted in Court: Judges Can Consider Predictive Algorithms in Sentencing (written by Joe Palazzolo), the Wisconsin Supreme Court, in a unanimous ruling, upheld a six-year prison sentence for 34-year-old Eric Loomis, who was deemed a high risk of re-offending by a popular tool known as COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a 137-question test that covers criminal and parole history, age, employment status, social life, education level, community ties, drug use and beliefs.

“Ultimately, we conclude that if used properly, observing the limitations and cautions set forth herein, a circuit court’s consideration of a COMPAS risk assessment at sentencing does not violate a defendant’s right to due process,” wrote Justice Ann Walsh Bradley of the Wisconsin Supreme Court.

After pleading guilty to eluding an officer and no contest to operating a vehicle without the owner’s consent, Loomis, a registered sex offender, was sentenced to six years in prison because his score on the COMPAS test noted he was a “high risk” to the community.  During his appeal in April, Loomis challenged the use of the test’s score, saying it violated his right to due process of law because he was unable to review the algorithm and raise questions about it.

As part of the ruling, Justice Bradley ordered state officials to inform the sentencing court about several cautions regarding a COMPAS risk assessment’s accuracy: (1) the proprietary nature of COMPAS has been invoked to prevent disclosure of information relating to how factors are weighed or how risk scores are to be determined; (2) risk assessment compares defendants to a national sample, but no cross-validation study for a Wisconsin population has yet been completed; (3) some studies of COMPAS risk assessment scores have raised questions about whether they disproportionately classify minority offenders as having a higher risk of recidivism; and (4) risk assessment tools must be constantly monitored and re-normed for accuracy due to changing populations and subpopulations.

The court also provided guidance on how the scores should be used:

“Although it cannot be determinative, a sentencing court may use a COMPAS risk assessment as a relevant factor for such matters as: (1) diverting low-risk prison-bound offenders to a non-prison alternative; (2) assessing whether an offender can be supervised safely and effectively in the community; and (3) imposing terms and conditions of probation, supervision, and responses to violations.”

So, while the sentencing judge may take COMPAS scores into consideration, the scores can’t be used to justify making a sentence longer or shorter, or serve as the sole factor in determining whether someone should be sentenced to prison or released into the community.  As Justice Bradley wrote in her opinion, “Using a risk assessment tool to determine the length and severity of a sentence is a poor fit”.

So, what do you think?  Should algorithms that have a significant effect on people’s lives be secret?  Please share any comments you might have or if you’d like to know more about a particular topic.


Here’s One Group of People Who May Not Be a Fan of Big Data Analytics: eDiscovery Trends

Most of us love the idea of big data analytics and how it can ultimately benefit us, not just in the litigation process, but in business and life overall.  But, there may be one group of people who may not be as big a fan of big data analytics as the rest of us: criminals who are being sentenced at least partly on the basis of predictive data analysis regarding the likelihood that they will be a repeat offender.

This article in the ABA Journal (Legality of using predictive data to determine sentences challenged in Wisconsin Supreme Court case, written by Sony Kassam), discusses the case of 34-year-old Eric Loomis, who was arrested in Wisconsin in February 2013 for driving a car that had been used in a shooting.  He ultimately pled guilty to eluding an officer and no contest to operating a vehicle without the owner’s consent. Loomis, a registered sex offender, was then sentenced to six years in prison because a score on a test noted he was a “high risk” to the community.

During his appeal in April, Loomis challenged the use of the test’s score, saying it violated his right to due process of law because he was unable to review the algorithm and raise questions about it.

As described in The New York Times, the algorithm used is known as COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).  COMPAS is an algorithm developed by a private company, Northpointe Inc., that calculates the likelihood of someone committing another crime and suggests what kind of supervision a defendant should receive in prison. The algorithm’s results come from a survey of the defendant and information about his or her past conduct.  Company officials at Northpointe say the algorithm’s results are backed by research, but the methodology is “proprietary”. While Northpointe does acknowledge that men, women and juveniles all receive different assessments, the factors considered and the weight given to each are kept secret.

The secrecy, and the use of different scales for men and women, are at the heart of Loomis’ appeal, which an appellate court has referred to the Wisconsin Supreme Court; that court could rule on the appeal in the coming days or weeks.

Other states also use algorithms, including Utah and Virginia, the latter of which has used algorithms for more than a decade.  According to The New York Times, at least one previous prison sentence involving COMPAS was appealed in Wisconsin and upheld.  And, algorithms have also been used to predict potential crime hot spots: police in Chicago have used data to identify people who are likely to shoot or get shot, and authorities in Kansas City, Mo. have used data to identify possible criminals.  We’re one step closer to pre-crime, folks.

So, what do you think?  Should algorithms that have a significant effect on people’s lives be secret?  Please share any comments you might have or if you’d like to know more about a particular topic.


Data May Be Doubling Every Couple of Years, But How Much of it is Original?: Best of eDiscovery Daily

Even those of us at eDiscovery Daily have to take an occasional vacation (which means taking the kids to their favorite water park); however, instead of “going dark” for a few days, we thought we would take a look back at some topics that we’ve covered in the past.  Today’s post takes a look back at the challenge of managing duplicative ESI during eDiscovery.  Enjoy!

______________________________

According to the Compliance, Governance and Oversight Council (CGOC), information volume in most organizations doubles every 18-24 months (now, it’s more like every 1.2 years). However, just because it doubles doesn’t mean that it’s all original. Like a bad cover band singing Free Bird, the rendition may be unique, but the content is the same. The key is limiting review to unique content.

When reviewers are reviewing the same files again and again, it not only drives up costs unnecessarily, but it could also lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, we all know the importance of identifying exact duplicates (files that contain exactly the same content in the same file format), which can be detected by comparing MD5 or SHA-1 hash values and removed from the review population to save considerable review costs.
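As a concrete (if simplified) illustration, here’s a minimal Python sketch of hash-based deduplication: hash every file, group by digest, and treat any group with more than one member as a duplicate set.  The “collection” folder name is just a placeholder, and real eDiscovery platforms do this at scale with far more metadata tracking.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def hash_file(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large collections don't exhaust memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root):
    """Group files under `root` by hash; any group larger than one
    is a set of exact duplicates."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[hash_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

# Hypothetical usage: report duplicate sets in a collection folder.
for digest, paths in find_exact_duplicates("collection").items():
    print(digest, [str(p) for p in paths])
```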

Identifying near duplicates that contain the same (or almost the same) information (such as a Word document published to an Adobe PDF file where the content is the same, but the file format is different, so the hash value will be different) also reduces redundant review and saves costs.
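Near-duplicate detection can’t rely on hash values, since changing even one byte changes the digest; instead, tools compare content similarity.  Here’s a toy sketch using word-shingle Jaccard similarity – one common approach among several (the actual algorithms in commercial tools vary and are usually more sophisticated).

```python
import re

def shingles(text, k=3):
    """Lowercased word k-gram 'shingles' from a document's extracted text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Set similarity: size of intersection over size of union (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

# Pairs scoring above a chosen threshold (say, 0.8 for full-length documents)
# become near-duplicate candidates for a reviewer to confirm.
doc1 = "The quarterly report shows revenue grew five percent over last year."
doc2 = "The quarterly report shows that revenue grew five percent over last year."
print(f"{jaccard(shingles(doc1), shingles(doc2)):.2f}")
```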

Then, there is message thread analysis. Many email messages are part of a larger discussion, sometimes just between two parties, and, other times, between a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Pulling those messages together and enabling them to be reviewed as an entire discussion can eliminate that redundant review. That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about the latest misstep by Anthony Weiner).
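For a sense of the mechanics, here’s a minimal sketch that groups email messages into threads using the standard RFC 2822 headers (References/In-Reply-To), falling back to a normalized subject line.  Commercial threading engines go much further – matching quoted bodies, detecting side conversations and identifying which messages are “inclusive” – so treat this as a starting point only.

```python
import email
from collections import defaultdict

def thread_key(msg):
    """Prefer the thread's root message ID from the References header;
    fall back to In-Reply-To, then to a normalized subject line."""
    refs = msg.get("References", "") or msg.get("In-Reply-To", "")
    if refs:
        return refs.split()[0]
    subject = (msg.get("Subject") or "").strip().lower()
    while subject.startswith(("re:", "fw:", "fwd:")):
        subject = subject.split(":", 1)[1].strip()
    return subject

def group_threads(raw_messages):
    """Map each raw RFC 2822 message string to a thread bucket."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        threads[thread_key(msg)].append(msg)
    return threads
```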

Clustering is a process which pulls similar documents together based on content so that the duplicative information can be identified more quickly and eliminated to reduce redundancy. With clustering, you can minimize review of duplicative information within documents and emails, saving time and cost and ensuring consistency in the review. As a result, even if the data in your organization doubles every couple of years, the cost of your review shouldn’t.
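To make the idea concrete, here’s a toy sketch using scikit-learn (my choice purely for illustration – this post doesn’t endorse any particular library) that vectorizes document text with TF-IDF and clusters it with k-means.  Review platforms use their own, more sophisticated variations on this theme.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical extracted text from three documents
docs = [
    "Quarterly revenue report for the northeast region",
    "Northeast region revenue report, third quarter",
    "Lunch order for the team offsite next Friday",
]

# TF-IDF turns each document into a weighted term vector
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

# k-means groups similar vectors; the two revenue documents should share a label
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)
```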

So, what do you think? Does your review tool support clustering technology to pull similar content together for review? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Number of Files in Each Gigabyte Can Vary Widely: eDiscovery Best Practices

Now and then, I am asked by clients how many documents (files) are typically contained in one gigabyte (GB) of data.  When estimating the costs for review, a good estimate of the number of files is important, since review costs are largely driven by document counts.  However, because the number of files per GB can vary widely, estimating review costs accurately can be a challenge.

About four years ago, I conducted a little (unscientific) experiment to show how the number of pages in each GB can vary widely, depending on the file formats that comprise that GB.  Since we now tend to think more about files per GB than pages, I have taken a fresh look, with updated examples below.

Each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  Even files within the same application can vary, depending on the version in which they are stored.  For example, newer versions of Office files (e.g., .docx, .xlsx) incorporate zip compression of the text, so the data sizes tend to be smaller than their older counterparts.  So, estimating file counts with any degree of precision can be somewhat difficult.

To illustrate this, I decided to put the content from yesterday’s case law blog post into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results – rounded to the nearest kilobyte (KB):

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 4 KB, it would take 262,144 text files at 4 KB each to equal 1 GB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 57 KB, it would take 18,396 HTML files at 57 KB each to equal 1 GB;
  • Microsoft Excel 97-2003 Format (XLS): Created by copying the contents of the blog post and pasting it into a blank Excel XLS workbook – 325 KB, it would take 3,226 XLS files at 325 KB each to equal 1 GB;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel XLSX workbook – 296 KB, it would take 3,542 XLSX files at 296 KB each to equal 1 GB;
  • Microsoft Word 97-2003 Format (DOC): Created by copying the contents of the blog post and pasting it into a blank Word DOC document – 312 KB, it would take 3,361 DOC files at 312 KB each to equal 1 GB;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word DOCX document – 299 KB, it would take 3,507 DOCX files at 299 KB each to equal 1 GB;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 328 KB, it would take 3,197 MSG files at 328 KB each to equal 1 GB;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 1,550 KB, it would take 677 PDF files at 1,550 KB each to equal 1 GB.

The HTML and PDF examples weren’t exactly an “apples to apples” comparison to the other formats – they included other content from the web page as well.  Nonetheless, the examples above hopefully illustrate that, to estimate the number of files in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.  Performing an Early Data Assessment on your data beforehand can provide the file counts you need to more accurately estimate your review costs.
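If you want to run the same arithmetic yourself, here’s a small sketch that reproduces the counts above from the measured average file sizes (treating 1 GB as 1,048,576 KB and rounding, which matches the numbers in the list):

```python
GB_IN_KB = 1024 * 1024  # 1,048,576 KB per GB

# Average file sizes (KB) measured in the experiment above
avg_size_kb = {
    "TXT": 4, "HTML": 57, "XLS": 325, "XLSX": 296,
    "DOC": 312, "DOCX": 299, "MSG": 328, "PDF": 1550,
}

for fmt, kb in avg_size_kb.items():
    print(f"{fmt}: ~{round(GB_IN_KB / kb):,} files per GB")
```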

So, what do you think?  Was the 2016 example useful, highly flawed or both?  Please share any comments you might have or if you’d like to know more about a particular topic.


At Litigation Time, the Cost of Data Storage May Not Be As Low As You Think: eDiscovery Best Practices

One of my favorite all-time graphics that we’ve posted on the blog (from one of our very first posts) is this ad from the early 1980s for a 10 MB disk drive – for $3,398!  That’s MB (megabytes), not GB (gigabytes) or TB (terabytes).  These days, data storage costs mere pennies per GB, which is a big reason why the total amount of data being captured and stored by industry doubles every 1.2 years.  But, at litigation time, all that data can cost you – big.

When I checked on prices for external hard drives back in 2010 (not network drives, which are still more expensive), prices for a 2 TB external drive at Best Buy were as low as $140 (roughly 7 cents per GB).  Now, they’re as low as $81.99 (roughly 4.1 cents per GB).  And, these days, you can go bigger – a 5 TB drive for as low as $129.99 (roughly 2.6 cents per GB).  I promise that I don’t have a side job at Best Buy and am not trying to sell you hard drives (even from the back of a van).

No wonder organizations are storing more and more data and managing Big Data in organizations has become such a challenge!

Because organizations are storing so much data (and in more diverse places than ever before), information governance within those organizations has become vitally important in keeping that data as manageable as possible.  And, when litigation or regulatory requests hit, the ability to quickly search and cull potentially responsive data is more important than ever.

Back in 2010, I illustrated how each additional GB that has to be reviewed can cost as much as $16,650 (even with fairly inexpensive contract reviewers).  And, that doesn’t even take into consideration the costs to identify, preserve, collect, and produce each additional GB.  Of course, that was before Da Silva Moore and several other cases that ushered in the era of technology assisted review (even though more cases are still not using it than are using it).  Regardless, that statistic illustrates how the cost of data storage may not be as low as you think at litigation time – each GB could cost hundreds or even thousands to manage (even in the era of eDiscovery automation and falling prices for eDiscovery software and services).
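The inputs behind a per-GB review cost figure are worth making explicit.  Here’s a back-of-the-envelope sketch; the document count, review rate and hourly rate below are my illustrative assumptions (not the exact inputs behind the 2010 figure), but they show how quickly the math gets into that neighborhood:

```python
# Illustrative assumptions only -- adjust for your own matters
docs_per_gb = 10_000      # assumed average documents per GB
docs_per_hour = 60        # assumed reviewer throughput (docs/hour)
rate_per_hour = 100       # assumed loaded hourly cost of review

cost_per_gb = docs_per_gb / docs_per_hour * rate_per_hour
print(f"~${cost_per_gb:,.0f} per reviewed GB")  # ~$16,667 with these inputs
```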

Converting the early 1980s ad above to GB, that equates to about $330,000 per GB!  But, if you go all the way back to 1950, the cost of a 5 MB drive from IBM was $50,000, which equates to about $10 million per GB!  Check out this interactive chart of hard drive prices from 1950-2010, courtesy of That Data Dude (yes, that really is the name of the site), where you can click on different years and see how the price per GB has dropped over the years.  It’s way cool!
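Here’s the same price-per-GB arithmetic in a short sketch, using the figures cited in this post (and 1 GB = 1,000 MB for round numbers, which lands the 1980s figure in the neighborhood of the “about $330,000” estimate above):

```python
# (name, price in dollars, capacity in GB), using 1 GB = 1,000 MB
drives = [
    ("1950 IBM 5 MB drive", 50_000.00, 5 / 1000),
    ("Early-1980s 10 MB drive", 3_398.00, 10 / 1000),
    ("2010 2 TB external", 140.00, 2_000),
    ("2016 5 TB external", 129.99, 5_000),
]

for name, price, capacity_gb in drives:
    print(f"{name}: ${price / capacity_gb:,.2f} per GB")
```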

So, what do you think?  Do you track GB metrics for your cases?  Please share any comments you might have or if you’d like to know more about a particular topic.


Ralph Losey of Jackson Lewis, LLP: eDiscovery Trends

This is the eighth and final installment of the 2016 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscovery Daily interviewed several thought leaders at LTNY this year to get their observations regarding trends at the show and generally within the eDiscovery industry.  Unlike previous years, some of the questions posed to each thought leader were tailored to their position in the industry, so we have dispensed with the standard questions we normally ask all thought leaders.


Today’s thought leader is Ralph Losey. Ralph is an attorney in private practice with the law firm of Jackson Lewis, LLP, where he is a Shareholder and the firm’s National e-Discovery Counsel. Ralph is also a prolific author of eDiscovery books and articles, the principal author and publisher of the popular e-Discovery Team® Blog, founder and owner of e-Discovery Team Training, an online training program with attorney and technical students all over the world, and founder of the new Electronic Discovery Best Practices (EDBP) lawyer-centric workflow model. Ralph is also the publisher of LegalSearchScience.com and PreSuit.com on predictive coding methods and applications.

What are your general observations about LTNY this year and about eDiscovery trends in general?

{Interviewed the second day of LTNY}

I have not been on the vendor floor yet, but I hope to get there.  I have been in several meetings and I was able to attend the keynote on cybersecurity today by Eric O’Neill, who was a terrific speaker.  They started out by showing the movie that was made of the big event in his life where they caught the biggest spy America has ever had.  He talked about that incident and cybersecurity and it was very good.  Of course, cybersecurity is something that I’m very interested in, but not so much as an expert in the field, but just as an observer.  My interest in cybersecurity is only as it relates to eDiscovery.  O’Neill was talking about the big picture of catching spies and industrial espionage and the Chinese stealing American secrets.  It was very good and the auditorium was filled.

Otherwise, the show seems quite alive and vibrant, with orange people and Star Wars characters here and there as a couple of examples of what the providers were doing to get attention here at the show.  I have been live “tweeting” during the show.  Of course, I’ve seen old friends pretty much everywhere I walk and everybody is here as usual.  LTNY remains the premier event.

One trend that I’ll comment on is the new rules.  I didn’t think the rules would make that much difference.  Maybe they would be somewhat helpful.  But, what I’m seeing in practice is that they’ve been very helpful.  They really seem to help lawyers to “get it”.  Proportionality is not a new message for me, but having it in the rules, I have found more helpful than I thought.  So far, so good, knock on wood – that has been a pleasant surprise.  I’m upbeat about that and the whole notion of proportionality, which we’ve really needed.  I’ve been talking about proportionality for at least five years and, finally, it really seems to have caught on now, particularly with having the rules, so I’m upbeat about that.

I’ve observed that there seems to be a drop off in sessions this year discussing predictive coding and technology assisted review (TAR).  Do you agree and, if so, why do you think that is?

I read that too, but it seems like I’ve seen several sessions that are discussing TAR.  I’ve noticed at least four, maybe five sessions that are covering it.  I noticed that FTI was sponsoring sessions related to TAR and Kroll was as well.  So, I’m not sure that I agree with that 100%.  I think that the industry’s near obsession with it in some of the prior shows is maybe not a fair benchmark in terms of how much attention it is getting.  Since it’s my area of special expertise, I would probably always want to see it get more attention, but I realize that there are a number of other concerns.  One possible reason for less coverage, if that is the case, is that TAR is less controversial than it once was.  Judges have all accepted it – nobody has said “no, it’s too risky”.  So, I think a lot of the initial “newsworthiness” of it is gone.

As I stated in my talk today, the reality is that the use of TAR requires training via the old fashioned legal apprenticeship tradition.  I teach people how to do it by their shadowing me, just like when I first learned how to try a case when I carried the briefcase of the trial lawyer.  And, after a while, somebody carried my briefcase.  Predictive coding is the same way.  People are carrying my briefcase now and learning how to do it, and pretty soon, they’ll do it on their own.  It only takes a couple of matters watching how I do it for somebody to pick it up.  After that, they might contact me if they run into something unusual and troublesome.  Otherwise, I think it’s just getting a lot simpler – the software is getting better and it’s easier to do.  You don’t need to be a rocket scientist.

My big thing is to expose the misuse of the secret control set that was making it way too complicated.  No one has stood up in defense of the secret control set, so I think I’m succeeding in getting rid of one of the last obstacles to adopting predictive coding – this nonsense about reviewing and coding 10,000 random documents before you even start looking for the evidence.  That was crazy.  I’ve shown, and others have too, that it’s just not necessary.  It overcomplicates matters and, if anything, it allows for a greater potential for error, not less as was its intent.  We’ve cleaned up predictive coding, gotten rid of some mistaken approaches, the software is getting better and people are getting more knowledgeable, so there’s just no longer the need to have every other session be about predictive coding.

One trend that I’ve observed is an increased focus on automation and considerable growth of, and investment in, eDiscovery automation providers.  What are your thoughts about that trend?

It is the trend and it will be the trend for the next 20 or 30 years.  We’re just seeing the very beginning of it.  The first way it has impacted the legal profession is through document review and the things that I’m doing.  I love artificial intelligence because I need the help of artificial intelligence to boost my own limited intelligence.  I can only remember so many things at once, I make mistakes, I’m only human.  So, I believe that AI is going to augment the lawyers that are able to use it and they are going to be able to do much, much more than before.  I can do the work of one hundred linear reviewers with no problem, by using a software AI enhancement.

It’s not going to put lawyers out of work, but it is going to reduce the volume of menial tasks in the law.  For mental tasks that a lawyer can do that require just simple logic, a computer can do those tasks better than a human can do them.  Simple rules-based applications, reviewing documents – there are many things that lawyers do that a computer can do better.  But, there are also many, many things that only a human can do.  We’re nowhere near actually replacing lawyers and I don’t think we ever will.

Just like all of the great technology doesn’t replace doctors in the medical profession – it just makes them better, makes them able to do miraculous things.  The same thing will happen in the law.  There will be lawyers, but they will be able to do what, by today’s standards, would look miraculous.  How did that lawyer know how that judge was going to rule so exactly?  That’s one of the areas we’re getting into with AI – predicting not just the coding of documents, but predicting how judges will rule.  Right now, that’s an art form, but that’s the next big step in big data.  They are already starting to do that in the patent world where they already have a pretty good idea how certain judges will rule on certain things.  So, that’s the next application of AI that is coming down the road.

I think the continued advancement of AI and automation will be good for lawyers who adapt.  For the lawyers that get technology and spend the time to learn it, the future looks good.  For those who don’t and want to keep holding on to the “buggy whip”, they will find that the cars pass them by.

It seems like acquisition and investment in the eDiscovery market is accelerating, with several acquisitions and VC investments in providers in just the past few months.  Do you feel that we are beginning to see true consolidation in the market?

Yes, I think it’s more than just beginning – I think it’s well underway.  And, I think that’s a good thing.  Why?  Because there are so many operations that are not solid, that, in a more sophisticated market, wouldn’t survive.  But, because many legal markets around the country are not sophisticated about eDiscovery, they are able to sell services to people who just don’t know any better and I don’t think these people are helping the legal profession.  So, consolidation is good.  I’m not saying that “new blood” isn’t good too, if those providers are really good at what they do.  But, I think that’s a natural result of the marketplace itself becoming more sophisticated.

However, I do think the entire industry is vulnerable someday to extreme consolidation if Google and IBM decide to take an interest in it.  I’ve long predicted that, at the end of the day, there will be three or four players.  Aside from Google and IBM, who that will be, I don’t know.  Maybe Google and IBM will never go into it.  But, I believe Google will go into it and I think IBM will do so too.  While I don’t have any inside knowledge to that effect, I think they’re probably researching it.  I think they would be silly not to research it, but I don’t think they have a big staff devoted to it.

I read about this a lot because I’m curious about IBM in particular and I think that IBM is focusing all of its resources right now on medicine and doctors.  They do have a booth here and they do have some eDiscovery focus, particularly on preservation and the left side of the EDRM model.  What they don’t have yet is “Watson, the review lawyer”.  In fact, I have said on my Twitter account that if there ever is a “Watson, the review lawyer”, I challenge him.  They can beat Jeopardy, but when it comes to things as sophisticated as legal analysis, I don’t think they’re there yet. Several existing e-Discovery vendors’ software is better. Anybody could beat a regular human, but when it comes to beating an “automated human”, I don’t think IBM is there yet. I bet IBM will have to buy out another e-discovery vendor to enhance their Watson algorithms.  I hope I’m still practicing when they are ready, because I’d like to take them on.  Maybe I’ll get beaten, but it would be fun to try and I think I can win, unless they happen to buy the vendor I use. Regardless, I think it’s clear that technology is going to keep getting better and better, but so will the tech savvy lawyers who use the technology to augment their human abilities of search and legal analysis. The key is the combination of Man and Machine, which is what I call the “hybrid” approach.

What are you working on that you’d like our readers to know about?

I am looking into the feasibility of having an eDiscovery “hackathon”.  If you’ve heard of a regular “hackathon”, you get the idea.  This would be a 24 hour event where the technology providers who think they are the best in document review come together and compete.  It would be a fair and open contest, run by scientists, where everybody has the same chance.  Scientists will compute the scores and determine who obtained the best recall and best precision.  It would be a way for us to generate interest the same way that cybersecurity does, using a live event to allow people to watch how high-tech lawyers do it.  I think you would be amazed how much information can be found in 24 hours, if you’re using the technology right.  It will be a proving ground for those vendors who think they have good software.  Basically, I’m saying “show me”, “put up or shut up”.

The reality is, my presentation today was on TREC and I showed up with Kroll Ontrack – the only other vendor to show up was Catalyst, nobody else did.  So, I’m going to make it easier and say “it’s 24 hours, compete!”  Anybody can say that they’re great, but show me – I want to see it to believe it.  Everybody loves competition – it’s fun.  My concern is all the other vendors will be too risk-averse to compete against us. They are just empty suits.

For me, it’s exciting to do document review.  I enjoy document review and if you don’t enjoy document review, you’re doing something wrong.  You’re not really harnessing the power of artificial intelligence.  Because working with a robot at your side that’s helping you find evidence can be a lot of fun.  It’s somewhat like an Easter egg hunt – it’s fun to look for things when you have the help of AI to do the heavy lifting for you.   Review a million documents?  No problem if you have a good AI robot at your side.

So, I’m thinking of ways to show the world what eDiscovery can do and, within our community, to see who among us is really the best.  I have won before, so I think I can do it again, but you never know. There are many other great search attorneys out there. If we do pull it off with a hackathon, or something like that, there may not be one clear winner, but there may be a few that do better than others. It’s never been done before and I like to do things that have never been done before. But it will not happen unless other vendors step up to the plate and have the confidence to dare to compete. Time will tell…

Thanks, Ralph, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!
