Review

Don’t Miss Today’s Webinar – How Automation is Revolutionizing eDiscovery!: eDiscovery Trends

Today is your chance to catch a terrific discussion about automation in eDiscovery and, particularly, an in-depth discussion about technology assisted review (TAR) and whether it lives up to the current hype!

Today, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Our panel discussion will provide an overview of eDiscovery automation technologies and take a hard look at the technology and definition of TAR and the limitations associated with both.  This time, Mary Mack, Executive Director of ACEDS, will be moderating and I will be one of the panelists, along with Bill Dimm, CEO of Hot Neuron, and Bill Speros, Evidence Consulting Attorney with Speros & Associates, LLC.

The webinar will be conducted at 1:00 pm ET (which is 12:00 pm CT, 11:00 am MT and 10:00 am PT).  Oh, and 5:00 pm GMT (Greenwich Mean Time).  If you’re in any other time zone, you’ll have to figure it out for yourself.  Click on the link here to register.

If you’re interested in learning about various ways in which automation is being used in eDiscovery and getting a chance to look at the current state of TAR, possible warts and all, I encourage you to sign up and attend.  It should be an enjoyable and educational hour.  Thanks to our friends at ACEDS for presenting today’s webinar!

So, what do you think?  Do you think automation is revolutionizing eDiscovery?  As always, please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

English Court Rules that Respondents Can Use Predictive Coding in Contested Case: eDiscovery Case Law

In Brown v BCA Trading, et al. [2016] EWHC 1464 (Ch), Mr. Registrar Jones ruled that, with “nothing, as yet, to suggest that predictive coding will not be able to identify the documents which would otherwise be identified through, for example, keyword search”, “predictive coding must be the way forward” in this dispute between parties as to whether the Respondents could use predictive coding to respond to eDisclosure requests.

The May 17 order began by noting that “the question whether or not electronic disclosure by the Respondents should be provided, as they ask, using predictive coding or via a more traditional keyword approach instead” was “contested”.  With the “majority of the documents that may be relevant for the purposes of trial…in the hands of the First Respondent”, the order noted that this fact is “relevant to take into account when considering the Respondents’ assertion, presented from their own view and on advice received professionally, that they think predictive coding will be the most reasonable and proportionate method of disclosure.”  The cost for predictive coding was estimated “in the region of £132,000” whereas the cost for a keyword search approach was estimated to be “at least £250,000” and could “even reach £338,000 on a worst case scenario” (emphasis added).  In the order, it was acknowledged that the cost “is relevant and persuasive only to the extent that predictive coding will be effective and achieve the disclosure required.”

With that in mind, Mr. Registrar Jones stated the following: “I reach the conclusion based on cost that predictive coding must be the way forward. There is nothing, as yet, to suggest that predictive coding will not be able to identify the documents which would otherwise be identified through, for example, keyword search and, more importantly, with the full cost of employees/agents having to carry out extensive investigations as to whether documents should be disclosed or not. It appears from the information received from the Respondents that predictive coding will be considerably cheaper than key word disclosure.”

The order also referenced the ten factors set out by Master Matthews in the Pyrrho Investments case (the first case in England to approve predictive coding) to help determine that predictive coding was appropriate for that case, with essentially all factors applying to this case as well, except for factor 10 (the parties have agreed on the use of the software, and also how to use it).

So, what do you think?  Do you think parties should always have the right to use predictive coding to support their production efforts absent strong evidence that it is not as effective as other means?  Please share any comments you might have or if you’d like to know more about a particular topic.

For more reading about this case, check out Chris Dale’s post here and Adam Kuhn’s post here.

Don’t forget that tomorrow at 1:00pm ET, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Click on the link here to register.



Judge Peck Refuses to Order Defendant to Use Technology Assisted Review: eDiscovery Case Law

We’re beginning to see more disputes between parties regarding the use of technology assisted review (TAR) in discovery.  Usually in these disputes, one party wants to use TAR and the other party objects.  In this case, the dispute was a bit different…

In Hyles v. New York City, No. 10 Civ. 3119 (AT)(AJP) (S.D.N.Y. Aug. 1, 2016), New York Magistrate Judge Andrew J. Peck, indicating that the key issue before the court in the discovery dispute between parties was whether (at the plaintiff’s request) the defendants can be forced to use technology assisted review, refused to force the defendant to do so, stating “The short answer is a decisive ‘NO.’”

Case Background

In this discrimination case brought by a former employee of the defendant, the parties had a number of discovery disputes after repeated delays in discovery.  They filed a joint letter with the court, seeking rulings as to the proper scope of ESI discovery (mostly issues as to custodians and date range) and search methodology – whether to use keywords (which the defendants wanted to do) or TAR (which the plaintiff wanted the defendant to do).

With regard to date range, the parties agreed to a start date for discovery of September 1, 2005 but disagreed on the end date.  In the discovery conference held on July 27, 2016, Judge Peck ruled on a date in between those proposed by the plaintiff and defendants – April 30, 2010, without prejudice to the plaintiff seeking documents or ESI from a later period, if justified, on a more targeted inquiry basis.  As to custodians, the City agreed to search the files of nine custodians, but not six additional custodians that the plaintiff requested.  The Court ruled that discovery should be staged, by starting with the agreed upon nine custodians. After reviewing the production from the nine custodians, if the plaintiff could demonstrate that other custodians had relevant, unique and proportional ESI, the Court would consider targeted searches from those custodians.

After the parties had initial discussions about the City using keywords, the plaintiff’s counsel consulted an ediscovery vendor and proposed that the defendants should use TAR as a “more cost-effective and efficient method of obtaining ESI from Defendants.”  The defendants declined, both because of cost and concerns that the parties, based on their history of scope negotiations, would not be able to collaborate to develop the seed set for a TAR process.

Judge’s Ruling

Judge Peck noted that “Hyles absolutely is correct that in general, TAR is cheaper, more efficient and superior to keyword searching” and referenced his “seminal” DaSilva Moore decision and also his 2015 Rio Tinto decision where he wrote that “the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”  Judge Peck also noted that “Hyles’ counsel is correct that parties should cooperate in discovery”, but stated that “[c]ooperation principles, however, do not give the requesting party, or the Court, the power to force cooperation or to force the responding party to use TAR.”

Judge Peck, while acknowledging that he is “a judicial advocate for the use of TAR in appropriate cases”, also noted that he is also “a firm believer in the Sedona Principles, particularly Principle 6, which clearly provides that:

Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

Judge Peck went on to state: “Under Sedona Principle 6, the City as the responding party is best situated to decide how to search for and produce ESI responsive to Hyles’ document requests. Hyles’ counsel candidly admitted at the conference that they have no authority to support their request to force the City to use TAR. The City can use the search method of its choice. If Hyles later demonstrates deficiencies in the City’s production, the City may have to re-do its search.  But that is not a basis for Court intervention at this stage of the case.”  As a result, Judge Peck denied the plaintiff’s application to force the defendants to use TAR.

So, what do you think?  Are you surprised by that ruling?  Please share any comments you might have or if you’d like to know more about a particular topic.

Don’t forget that next Wednesday at 1:00pm ET, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Click on the link here to register.


ACEDS Adds its Weight to the eDiscovery Business Confidence Survey: eDiscovery Trends

We’ve covered two rounds of the quarterly eDiscovery Business Confidence Survey created by Rob Robinson and conducted on his terrific Complex Discovery site (previous results are here and here).  It’s time for the Summer 2016 Survey.  Befitting the season, the survey has a HOT new affiliation with the Association of Certified eDiscovery Specialists (ACEDS).

As before, the eDiscovery Business Confidence Survey is a non-scientific survey designed to provide insight into the business confidence level of individuals working in the eDiscovery ecosystem. The term ‘business’ represents the economic factors that impact the creation, delivery, and consumption of eDiscovery products and services.  The purpose of the survey is to provide a subjective baseline for understanding the trajectory of the business of eDiscovery through the eyes of industry professionals.

Also as before, the survey asks questions related to how you rate general business conditions for eDiscovery in your segment of the market, both currently and six months from now, where you think revenue and profits will be for your segment in six months, and which issue you think will most impact the business of eDiscovery over the next six months, among other questions.  It’s a simple nine-question survey that literally takes about a minute to complete.  Who hasn’t got a minute to provide useful information?

Individual answers are kept confidential, with the aggregate results to be published on the ACEDS website (News & Press), on the Complex Discovery blog, and on selected ACEDS Affiliate websites and blogs (we’re one of those and we’ll cover the results as we have for the first two surveys) upon completion of the response period, which started on August 1 and goes through Wednesday, August 31.

What are experts saying about the survey?  Here are a couple of notable quotes:

Mary Mack, Executive Director of ACEDS stated: “The business of eDiscovery is an ever-present and important variable in the equation of legal discovery.  As financial factors are a primary driver in eDiscovery decisions ranging from sourcing and staffing to development and deployment, ACEDS sees value in regularly checking the business pulse of eDiscovery professionals. The eDiscovery Business Confidence Survey provides a tool to help take that pulse on a systematic basis and ACEDS looks forward to sponsoring, participating, and reporting on the results of this salient survey each quarter.”

George Socha, Co-Founder of EDRM and Managing Director of Thought Leadership of BDO stated: “In my experience, the successful conduct of eDiscovery is comprised of a balance of in-depth education, practical execution, and experience-based excellence.  The eDiscovery Business Confidence survey being highlighted by ACEDS is one of many industry surveys that positively contributes to this balance, as it provides a quarterly snapshot into the business of discovery. I highly encourage serious eDiscovery professionals to complete and consider this survey as a key tool for understanding the business challenges and opportunities in our profession.”

The more respondents there are, the more useful the results will be!  What more do you need?  Click here to take the survey yourself.  Don’t forget!

So, what do you think?  Are you confident in the state of business within the eDiscovery industry?  Share your thoughts in the survey and, as always, please share any comments you might have with us or let us know if you’d like to know more about a particular topic.

Don’t forget that next Wednesday at 1:00pm ET, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Click on the link here to register.


How Automation is Revolutionizing eDiscovery: eDiscovery Trends

I thought about titling this post “Less Than Half of Automation is Revolutionizing eDiscovery” to keep the streak alive, but (alas) all good streaks must come to an end… :o)

If you missed our panel session last month in New York City at The Masters Conference, you missed a terrific discussion about automation in eDiscovery and, particularly an in-depth discussion about technology assisted review (TAR) and whether it lives up to the current hype.  Now, you get another chance to check it out, thanks to ACEDS.

Next Wednesday, ACEDS will be conducting a webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, sponsored by CloudNine.  Our panel discussion will provide an overview of eDiscovery automation technologies and take a hard look at the technology and definition of TAR and the limitations associated with both.  This time, Mary Mack, Executive Director of ACEDS, will be moderating and I will be one of the panelists, along with Bill Dimm, CEO of Hot Neuron, and Bill Speros, Evidence Consulting Attorney with Speros & Associates, LLC.

The webinar will be conducted at 1:00 pm ET (which is 12:00 pm CT, 11:00 am MT and 10:00 am PT).  Oh, and 5:00 pm GMT (Greenwich Mean Time).  If you’re in any other time zone, you’ll have to figure it out for yourself.  Click on the link here to register.

If you’re interested in learning about various ways in which automation is being used in eDiscovery and getting a chance to look at the current state of TAR, possible warts and all, I encourage you to sign up and attend.  It should be an enjoyable and educational hour.  Thanks to our friends at ACEDS for conducting the session!

So, what do you think?  Do you think automation is revolutionizing eDiscovery?  As always, please share any comments you might have or if you’d like to know more about a particular topic.



Cooperation in Predictive Coding Exercise Fails to Avoid Disputed Production: eDiscovery Case Law

In Dynamo Holdings v. Commissioner of Internal Revenue, Docket Nos. 2685-11, 8393-12 (U.S. Tax Ct. July 13, 2016), Texas Tax Court Judge Ronald Buch denied the respondent’s Motion to Compel Production of Documents Containing Certain Terms, finding that there is “no question that petitioners satisfied our Rules when they responded using predictive coding”.

Case Background

In this case involving various transfers from one entity to a related entity where the respondent determined that the transfers were disguised gifts to the petitioner’s owners and the petitioners asserted that the transfers were loans, the parties previously disputed the use of predictive coding for this case and, in September 2014 (covered by us here), Judge Buch ruled that “[p]etitioners may use predictive coding in responding to respondent’s discovery request. If, after reviewing the results, respondent believes that the response to the discovery request is incomplete, he may file a motion to compel at that time.”

At the outset of this ruling, Judge Buch noted that “[t]he parties are to be commended for working together to develop a predictive coding protocol from which they worked”.  As indicated by the parties’ joint status reports, the parties agreed to and followed a framework for producing the electronically stored information (ESI) using predictive coding: (1) restoring and processing backup tapes; (2) selecting and reviewing seed sets; (3) establishing and applying the predictive coding algorithm; and (4) reviewing and returning the production set.

While the petitioners were restoring the first backup tape, the respondent requested that the petitioners conduct a Boolean search and provided petitioners with a list of 76 search terms to run against the processed data.  That search yielded over 406,000 documents, from which two samples of 1,000 documents each were drawn and provided to the respondent for review.  After the model was run against the second 1,000 documents, the petitioners’ technical professionals reported that the model was not performing well, so the parties agreed that the petitioners would select an additional 1,000 documents that the algorithm had ranked high for likely relevancy and the respondent reviewed them as well.  The respondent declined to review one more validation sample of 1,000 documents when the petitioners’ technical professionals explained that the additional review would be unlikely to improve the model.

Using the respondent’s selected recall rate of 95 percent, the petitioners ran the algorithm against the 406,000 documents to identify documents to produce (followed by a second algorithm to identify privileged materials).  Between January and March 2016, the petitioners delivered a production set of approximately 180,000 total documents on a portable device for the respondent to review, including a relevancy score for each document.  Ultimately, the respondent found only 5,796 to be responsive (barely over 3% of the production) and returned the rest.

On June 17, 2016, the respondent filed a motion to compel production of the documents identified in the Boolean search that were not produced in the production set (1,353 of 1,645 documents containing those terms that they claimed were not produced), asserting that those documents were “highly likely to be relevant.”  Ten days later, the petitioners filed an objection to the respondent’s motion to compel, challenging the respondent’s calculations by noting that only 1,360 documents actually contained those terms, that 440 of them had actually been produced and that many of the remaining documents predated or postdated the relevant time period.  They also argued that the documents were selected by the predictive coding algorithm based on selection criteria set by the respondent.

Judge’s Ruling

Judge Buch noted that “[r]espondent’s motion is predicated on two myths”: 1) the myth that “manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured”, and 2) the myth of a perfect response to the respondent’s discovery request, which the Tax Court Rules don’t require.  Judge Buch cited Rio Tinto where Judge Andrew Peck stated:

“One point must be stressed – it is inappropriate to hold TAR [technology assisted review] to a higher standard than keywords or manual review.  Doing so discourages parties from using TAR for fear of spending more in motion practice than the savings from using TAR for review.”

Stating that “[t]here is no question that petitioners satisfied our Rules when they responded using predictive coding”, Judge Buch denied the respondent’s Motion to Compel Production of Documents Containing Certain Terms.

So, what do you think?  If parties agree to the predictive coding process, should they accept the results no matter what?  Please share any comments you might have or if you’d like to know more about a particular topic.


Inaugural Best of Corporate Counsel Survey Highlights CloudNine

Extract of survey from Corporate Counsel

The inaugural Best of Corporate Counsel reader ranking survey of top providers to the in-house corporate legal marketplace was published in the July edition of Corporate Counsel Magazine. Voting was conducted via online ballot and limited to those working within in-house corporate legal and compliance departments. The ballot consisted of 55 categories, and more than 1,500 votes were cast.

CloudNine was highlighted in the 2016 reader ranking survey as the second leading online review platform behind Relativity from kCura.

Source: Corporate Counsel Magazine

For access to the complete survey, click here.


Is a Blended Document Review Rate of $466 Per Hour Excessive?: Best of eDiscovery Daily

Even those of us at eDiscovery Daily have to take an occasional vacation (which, as you can see by the picture above, means taking the kids to their favorite water park); however, instead of “going dark” for a few days, we thought we would take a look back at some topics that we’ve covered in the past.  Today’s post is our all-time most viewed post ever.  I guess it struck a nerve with our readers!  Enjoy!

______________________________

Remember when we raised the question as to whether it is time to ditch the per hour model for document review?  One of the cases we highlighted for perceived overbilling was ruled upon here.

In the case In re Citigroup Inc. Securities Litigation, No. 09 MD 2070 (SHS), 07 Civ. 9901 (SHS) (S.D.N.Y. Aug. 1, 2013), New York District Judge Sidney H. Stein rejected as unreasonable the plaintiffs’ lead counsel’s proffered blended rate of more than $400 for contract attorneys—more than the blended rate charged for associate attorneys—most of whom were tasked with routine document review work.

In this securities fraud matter, a class of plaintiffs claimed Citigroup understated the risks of assets backed by subprime mortgages. After the parties settled the matter for $590 million, Judge Stein had to evaluate whether the settlement was “fair, reasonable, and adequate and what a reasonable fee for plaintiffs’ attorneys should be.” The court issued a preliminary approval of the settlement and certified the class. In his opinion, Judge Stein considered the plaintiffs’ motion for final approval of the settlement and allocation and the plaintiffs’ lead counsel’s motion for attorneys’ fees and costs of $97.5 million. After approving the settlement and allocation, Judge Stein decided that the plaintiffs’ counsel was entitled to a fee award and reimbursement of expenses but in an amount less than the lead counsel proposed.

One shareholder objected to the lead counsel’s billing practices, claiming the contract attorneys’ rates were exorbitant.

Judge Stein carefully scrutinized the contract attorneys’ proposed hourly rates “not only because those rates are overstated, but also because the total proposed lodestar for contract attorneys dwarfs that of the firm associates, counsel, and partners: $28.6 million for contract attorneys compared to a combined $17 million for all other attorneys.” The proposed blended hourly rate was $402 for firm associates and $632 for firm partners. However, the firm asked for contract attorney hourly rates as high as $550 with a blended rate of $466. The plaintiff explained that these “contract attorneys performed the work of, and have the qualifications of, law firm associates and so should be billed at rates commensurate with the rates of associates of similar experience levels.” In response, the complaining shareholder suggested that a more appropriate rate for contract attorneys would be significantly lower: “no reasonable paying client would accept a rate above $100 per hour.” (emphasis added)

Judge Stein rejected the plaintiffs’ argument that the contract attorneys should be billed at rates comparable to firm attorneys, citing authority that “clients generally pay less for the work of contract attorneys than for that of firm associates”:

“There is little excuse in this day and age for delegating document review (particularly primary review or first pass review) to anyone other than extremely low-cost, low-overhead temporary employees (read, contract attorneys)—and there is absolutely no excuse for paying those temporary, low-overhead employees $40 or $50 an hour and then marking up their pay ten times for billing purposes.”

Furthermore, “[o]nly a very few of the scores of contract attorneys here participated in depositions or supervised others’ work, while the vast majority spent their time reviewing documents.” Accordingly, the court decided the appropriate rate would be $200, taking into account the attorneys’ qualifications, work performed, and market rates.

For this and other reasons, the court found the lead counsel’s proposed lodestar “significantly overstated” and made a number of reductions. The reductions included the following amounts:

  • $7.5 million for document review by contract attorneys that happened after the parties agreed to settle; 20 of the contract attorneys were hired on or about the day of the settlement.
  • $12 million for reducing the blended hourly rate of contract attorneys from $466 to $200 for 45,300 hours, particularly where the bills reflected that these attorneys performed document review—not higher-level work—all day.
  • 10% off the “remaining balance to account for waste and inefficiency which, the Court concludes, a reasonable hypothetical client would not accept.”
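For readers who like to check the math, the dollar figures above reconcile; the script below is just our back-of-the-envelope check using the numbers from the opinion:

```python
# Figures taken from the opinion; the court's own text rounds to "$12 million".
hours = 45_300          # contract-attorney hours spent on document review
proposed_rate = 466     # blended hourly rate sought by lead counsel ($)
awarded_rate = 200      # blended hourly rate the court found appropriate ($)

rate_reduction = hours * (proposed_rate - awarded_rate)
print(f"Rate reduction: ${rate_reduction:,}")  # Rate reduction: $12,049,800

# The final award as a share of the $590 million common fund:
award, fund = 70.8e6, 590e6
print(f"Share of fund: {award / fund:.0%}")    # Share of fund: 12%
```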

As a result, the court awarded a reduced amount of $70.8 million in attorneys’ fees, or 12% of the $590 million common fund.

So, what do you think?  Was the requested amount excessive?   Please share any comments you might have or if you’d like to know more about a particular topic.


Data May Be Doubling Every Couple of Years, But How Much of it is Original?: Best of eDiscovery Daily

Even those of us at eDiscovery Daily have to take an occasional vacation (which, as you can see by the picture above, means taking the kids to their favorite water park); however, instead of “going dark” for a few days, we thought we would take a look back at some topics that we’ve covered in the past.  Today’s post takes a look back at the challenge of managing duplicative ESI during eDiscovery.  Enjoy!

______________________________

According to the Compliance, Governance and Oversight Council (CGOC), information volume in most organizations doubles every 18-24 months (now, it’s more like every 1.2 years). However, just because it doubles doesn’t mean that it’s all original. Like a bad cover band singing Free Bird, the rendition may be unique, but the content is the same. The key is limiting review to unique content.

When reviewers are reviewing the same files again and again, it not only drives up costs unnecessarily, but it could also lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, we all know the importance of identifying exact duplicates (files with exactly the same content in the same file format), which can be identified through MD5 or SHA-1 hash values, so that they can be removed from the review population, saving considerable review costs.
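As a minimal sketch of how that works (the file paths are whatever your collection contains), each file’s bytes are hashed and any files sharing a digest are exact duplicates:

```python
import hashlib
from collections import defaultdict

def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file's raw bytes in chunks so large files don't exhaust memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_exact_duplicates(paths):
    """Group files by SHA-1 digest; any group with more than one member
    is a set of exact duplicates that can be removed from review."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha1_of_file(path)].append(path)
    return [files for files in groups.values() if len(files) > 1]
```

Note that this matches files byte for byte, which is exactly why it misses the near duplicates discussed next.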

Identifying near duplicates that contain the same (or almost the same) information (such as a Word document published to an Adobe PDF file where the content is the same, but the file format is different, so the hash value will be different) also reduces redundant review and saves costs.
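Because any change to a file’s bytes changes its hash value, near duplicates have to be compared by their extracted text instead. One simple approach, sketched below with word shingles and Jaccard similarity (the 0.8 threshold is an arbitrary illustration), scores how much of two documents’ text overlaps:

```python
def shingles(text, k=3):
    """Break extracted text into overlapping k-word sequences ('shingles')."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity: shared shingles over total distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.8):
    """Return pairs of document ids whose extracted text overlaps
    above the threshold, regardless of the underlying file format."""
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    return [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]
            if jaccard(sigs[x], sigs[y]) >= threshold]
```

A Word document and the PDF published from it would produce nearly identical shingle sets, so they pair up here even though their hash values differ.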

Then, there is message thread analysis. Many email messages are part of a larger discussion, sometimes just between two parties and other times among several parties. Reviewing each email in the thread individually means reviewing much of the same information over and over again. Pulling those messages together and enabling them to be reviewed as an entire discussion eliminates that redundant review, including any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about the latest misstep by Anthony Weiner).

Clustering is a process which pulls similar documents together based on content so that the duplicative information can be identified more quickly and eliminated to reduce redundancy. With clustering, you can minimize review of duplicative information within documents and emails, saving time and cost and ensuring consistency in the review. As a result, even if the data in your organization doubles every couple of years, the cost of your review shouldn’t.
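Real review platforms use far more sophisticated text analytics for clustering, but the core idea can be sketched in a few lines: greedily group documents whose word overlap with a cluster’s seed document exceeds a threshold (the similarity measure and the 0.5 threshold here are arbitrary illustrations):

```python
def word_set(text):
    return set(text.lower().split())

def similarity(a, b):
    """Jaccard similarity of two word sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster(docs, threshold=0.5):
    """Greedy clustering: each document joins the first cluster whose
    seed it resembles; otherwise it starts a new cluster."""
    clusters = []  # list of (seed_word_set, [doc_ids])
    for doc_id, text in docs.items():
        words = word_set(text)
        for seed, members in clusters:
            if similarity(seed, words) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((words, [doc_id]))
    return [members for _, members in clusters]
```

Reviewers can then work cluster by cluster, so similar content is seen together and categorized consistently.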

So, what do you think? Does your review tool support clustering technology to pull similar content together for review? Please share any comments you might have or if you’d like to know more about a particular topic.


The Number of Files in Each Gigabyte Can Vary Widely: eDiscovery Best Practices

Now and then, I am asked by clients how many documents (files) are typically contained in one gigabyte (GB) of data.  A good estimate of the file count is important for projecting review costs.  However, because the number of files per GB can vary widely, estimating those costs accurately can be a challenge.

About four years ago, I conducted a little (unscientific) experiment to show how the number of pages in each GB can vary widely, depending on the file formats that comprise that GB.  Since we now tend to think more about files per GB than pages, I have taken a fresh look using the updated estimate below.

Each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  Even files within the same application can vary, depending on the version in which they are stored.  For example, newer versions of Office files (e.g., .docx, .xlsx) incorporate zip compression of the text, so the data sizes tend to be smaller than their older counterparts.  So, estimating file counts with any degree of precision can be somewhat difficult.

To illustrate this, I saved the content from yesterday’s case law blog post in several different file formats to show how much the size can vary, even when the content is essentially the same.  Here are the results, rounded to the nearest kilobyte (KB):

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 4 KB, it would take 262,144 text files at 4 KB each to equal 1 GB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 57 KB, it would take 18,396 HTML files at 57 KB each to equal 1 GB;
  • Microsoft Excel 97-2003 Format (XLS): Created by copying the contents of the blog post and pasting it into a blank Excel XLS workbook – 325 KB, it would take 3,226 XLS files at 325 KB each to equal 1 GB;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel XLSX workbook – 296 KB, it would take 3,542 XLSX files at 296 KB each to equal 1 GB;
  • Microsoft Word 97-2003 Format (DOC): Created by copying the contents of the blog post and pasting it into a blank Word DOC document – 312 KB, it would take 3,361 DOC files at 312 KB each to equal 1 GB;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word DOCX document – 299 KB, it would take 3,507 DOCX files at 299 KB each to equal 1 GB;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 328 KB, it would take 3,197 MSG files at 328 KB each to equal 1 GB;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 1,550 KB, it would take 677 PDF files at 1,550 KB each to equal 1 GB.

The HTML and PDF examples weren’t exactly an “apples to apples” comparison to the other formats – they included other content from the web page as well.  Nonetheless, the examples above hopefully illustrate that, to estimate the number of files in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.  Performing an Early Data Assessment on your data beforehand can provide the file counts you need to more accurately estimate your review costs.
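The per-format counts above follow from simple division, since 1 GB is 1,048,576 KB. A quick sketch reproducing them from the rounded file sizes in the experiment:

```python
KB_PER_GB = 1024 * 1024  # 1,048,576 KB in a gigabyte

# Rounded file sizes (KB) from the experiment above
sizes_kb = {"TXT": 4, "HTML": 57, "XLS": 325, "XLSX": 296,
            "DOC": 312, "DOCX": 299, "MSG": 328, "PDF": 1550}

for fmt, size in sizes_kb.items():
    files_per_gb = round(KB_PER_GB / size)
    print(f"{fmt}: about {files_per_gb:,} files per GB")
```

Swapping in the average file sizes from your own collection gives a collection-specific estimate, which is exactly what an Early Data Assessment provides.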

So, what do you think?  Was the 2016 example useful, highly flawed or both?  Please share any comments you might have or if you’d like to know more about a particular topic.
