eDiscoveryDaily

Google Beats Oracle (Again): eDiscovery Trends

In a litigation that has been going on since 2010 (we started covering it in 2011), a federal jury concluded last Thursday that Google’s Android operating system does not infringe Oracle-owned copyrights because its re-implementation of 37 Java APIs is protected by “fair use.”

As reported by Ars Technica (Google beats Oracle—Android makes “fair use” of Java APIs, written by Joe Mullin), there was only one question on the special verdict form: whether Google’s use of the Java APIs was a “fair use” under copyright law.  The jury unanimously answered “yes,” in Google’s favor.  The verdict ends the trial, which began earlier this month.  Had Oracle won, the same jury would have moved on to a “damages phase” to determine how much Google should pay.  Because Google won, the trial is over – for now, at least.  Oracle vowed to appeal, as it did after the 2012 decision in which Google was found not to have infringed Oracle’s patents, a result that came despite Google’s inadvertent disclosure of draft emails (to which recipients and the words “Attorney Work Product” hadn’t yet been added) in which a Google engineer discussed the need to negotiate terms with Oracle.

Oracle’s previous appeal was heard in December 2013, and in May 2014 the appellate court reversed the district court on the central issue, holding that the “structure, sequence and organization” of an API was copyrightable.  The case was remanded to the district court for reconsideration only on the basis of the fair use doctrine.

This time, before the trial started earlier this month, U.S. District Judge William Alsup issued an order urging both sides to respect the privacy of jurors after it became clear that both sides wanted time to “scrub Facebook, Twitter, LinkedIn, and other Internet sites to extract personal data” and, when asked about it, “counsel admitted this.”  Ultimately, as a result of Judge Alsup’s order, both sides agreed not to mine the jurors’ social media data.  Maybe Oracle wishes it had?  It will be interesting to see whether Oracle can obtain another reversal on appeal.

So, what do you think?  Is this case finally over?  Or, will it keep going and going and going?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Cloud is a “Rush” Project’s Best Friend: eDiscovery Best Practices

Today is Friday.  While many of you can look forward to a long, enjoyable Memorial Day weekend, chances are that at least a few of you will be making weekend plans when, late in the day, you receive a CD, DVD, hard drive or link to data on a server somewhere that needs to be reviewed over the weekend.  There goes your weekend!

Not only that, good luck connecting with your in-house litigation support person or a vendor for assistance late on a Friday – you may play a game of “phone tag” or wait for email responses for a bit.  Lit support people and vendors have weekend plans too.  Even if you do get in touch with them, you then have to fill out a form and arrange to get the data to them, which can be tricky.  It’s a lot of time, hassle and cost to get started – especially if you’re at a small law firm that doesn’t already have an eDiscovery software application to support processing and review of the data.

When consumers quickly need to find that special item to buy, or that new cool song to download, or need to stream the new season of Bloodline (available starting today on Netflix) for binge watching, they turn to the cloud.  More than ever, attorneys are turning to the cloud as well to help them get their “rush” project started immediately.  And, you don’t even have to own the software or interact with anyone to get started.

As an eDiscovery provider that offers a no-risk free trial, CloudNine (shameless plug warning!) sees at least one or two clients a week that give our software a try (many of them with “rush” projects just like this).  The trend toward automation and the cloud in the industry has not only made eDiscovery more affordable than ever, it has also made it easier than ever to get a “rush” project off and running.

If you find yourself in that situation later today, here are three easy steps to get started:

  1. Sign up for a free account here. You will receive an email with your credentials (including a temporary password) to get started.
  2. When you first log in, you’ll see a button to “Upload Data”. That will take you to a form to download the CloudNine Discovery client (a Windows-based client application that resides on your desktop) for uploading data for processing.  Download and install the client to upload data.
  3. Once the client is downloaded and installed, launch the client, log in with your newly created credentials and simply follow the wizard prompts to upload the desired data set and put it into the project of your choice (which you can create if it doesn’t already exist). It’s that easy!

We can’t get you out of working this weekend.  But, we can take the hassle out of getting started.  You’re welcome.  :o)

So, what do you think?  Have you been faced with any “rush” eDiscovery projects lately?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will return on Tuesday as we remember this Memorial Day the people who gave their lives while serving in our armed forces.


Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery: eDiscovery Trends

We had a terrific session on Tuesday discussing how automation is revolutionizing eDiscovery at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event.  If you’re disappointed that you missed it, you’re in luck – there’s a recording of the session!

The Masters Conference brings together leading experts and professionals from law firms, corporations and the bench to develop strategies, practices and resources for managing the information life cycle.  There were a number of terrific sessions this Tuesday and a wonderful speech from (nearly 82-year-old) Jesse White, the Illinois Secretary of State.  What an amazing life he has had – from being a paratrooper in the Army to playing minor league baseball with the Cubs to founding the Jesse White Tumbling Team to serve as a positive alternative for children to his time as a Chicago schoolteacher and his long tenure as Illinois Secretary of State.  He says he can still do a handstand today.  I hope I have that much energy when I am his age.

Anyway, CloudNine sponsored the session Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery at 4:15, where the panelists (Rob Robinson, Managing Director of Complex Discovery; Kevin Clark, Executive Managing Director of Discovery Service for Hire Counsel; and Jay Lieb, Founder and Managing Member of NexLP) and I discussed a variety of current and emerging eDiscovery automation technologies.  The attendees were engaged and asked several good questions, which made for a very interesting free-form discussion of eDiscovery automation, including Technology Assisted Review, automated processing and pre-litigation artificial intelligence analysis.

Rob arranged for Kaylee Walstad of ACEDS to record the session and Rob has posted it on his Complex Discovery site here.  Thanks so much to Kaylee for recording the session!  Feel free to check it out.

The Masters Conference also has an event coming up in New York City in July and Washington DC in October.  Click here for more information on remaining scheduled events for the year.

So, what do you think?  Do you think that automation is revolutionizing eDiscovery?  As always, please share any comments you might have or if you’d like to know more about a particular topic.


Court Rules Lack of Bad Faith in Denying Sanctions for Defendants’ Deletion of ESI: eDiscovery Case Law

In Martin v. Stoops Buick, Inc. et al., No. 14-00298 (S.D. Ind., Apr. 25, 2016), Indiana Chief District Judge Richard L. Young ruled that the plaintiff did not carry her burden of proving that the defendants deliberately destroyed evidence in bad faith; therefore, he denied her Motion for Sanctions Against Defendants for the Spoliation of Evidence.

Case Background

In this wrongful termination case, the plaintiff worked for the defendants for nearly a year as a part-time employee before being offered full-time employment in February 2013.  However, two weeks after her full-time work began, the defendants terminated the plaintiff’s employment, stating that “she [was] not a good fit for [the] position,” and replaced her with a new hire.  The plaintiff claimed that, immediately after she was terminated, she informed the defendants’ General Manager that she was going to file a discrimination claim against the dealership; she filed an Equal Employment Opportunity Commission (“EEOC”) claim within two weeks, and the defendants were notified three days later.  After hearing from both sides, the EEOC dismissed the charge in November 2013, and the plaintiff filed suit in February 2014.

In December 2015, the plaintiff filed the instant motion for sanctions against the defendants for spoliation of evidence, claiming that they destroyed and/or replaced her work computer, which precluded her from obtaining evidence in support of her claims, and that her supervisor (Debra Trauner) deleted her e-mail communications with the plaintiff’s replacement (Lisa Goodin) that allegedly occurred before Trauner received Goodin’s resume.

The defendants’ unwritten data retention policy called for the files of terminated employees to be preserved for at least 30 days.  Trauner claimed that, shortly after the plaintiff was terminated, she asked the IT department to preserve all of the plaintiff’s computer data and, according to Trauner, “they said they would.”  However, when she later requested the plaintiff’s email files and work documents, IT said they had been deleted.  Trauner also claimed she deleted her sent e-mail as a matter of course “whenever [her] computer would tell [her] that [she] can’t send e-mails anymore,” so the emails with the new employee were no longer available.

Judge’s Ruling

Judge Young, referencing Malibu Media, LLC v. Tashiro, noted that “[t]he court’s determination of whether spoliation occurred requires a two-part inquiry… First, the court must determine whether the defendant was under a duty to preserve evidence; second, it must determine whether the defendant destroyed evidence in bad faith.”

Regarding the duty to preserve, Judge Young stated: “Although Trauner testified to placing a litigation hold on Plaintiff’s work e-mails, there is no evidence in the record to support her statement. There is no evidence of a ticket generated by the IT department regarding the request, and neither Prow, Nolan, Nelson, Jarvis, Stocking, nor Robinson could verify such a request. The court therefore finds Defendants breached their duty to preserve evidence.”

Regarding whether the defendants destroyed the evidence in bad faith, Judge Young noted that the defendant “did produce those documents responsive to Plaintiff’s First Request for Production of Documents that were in its possession” and characterized Trauner’s testimony that she deleted the emails with Goodin to make room on the server as “credible.”  He also stated: “Lastly, and most significantly, Plaintiff’s own expert admitted that, after hearing all of the evidence, Stoops did not destroy evidence in bad faith. (Tr. at 110 (“Q: But you did not — it’s your opinion, based upon your background and experience, that what you’ve seen and heard and read and that’s been provided to you, that you do not find bad faith here? A: Right. Correct.”). Plaintiff, therefore, has failed to establish the required element of bad faith.”

As a result, Judge Young ruled that the plaintiff did not carry her burden of proving that the defendants deliberately destroyed evidence in bad faith and denied her Motion for Sanctions.

So, what do you think?  Was the court right to deny sanctions due to lack of bad faith?  Please share any comments you might have or if you’d like to know more about a particular topic.


Today’s the Day to “Master” Your Knowledge of eDiscovery in Chicago!: eDiscovery Trends

Today’s the day!  If you’re in the Chicago area today, join me and other legal technology experts and professionals at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event for a full day of educational sessions covering a wide range of topics!  It’s not too late to register and attend!

The Masters Conference brings together leading experts and professionals from law firms, corporations and the bench to develop strategies, practices and resources for managing the information life cycle.  This year’s Chicago event covers topics ranging from vendor selection to the benefits and challenges of creating an Information Governance (IG) program to how to handle cross-border data in the wake of the Schrems decision and the new Privacy Shield.  The Internet of Things (IoT), cybersecurity and social media discovery are covered too.

The event will be held at the Metropolitan Club, 233 South Wacker Drive, 67th Floor, Chicago, IL 60606.  Registration begins at 8am, with sessions starting right after that, at 8:30am.

CloudNine will be sponsoring the session Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery at 4:15.  I will be moderating it with Rob Robinson, Managing Director of Complex Discovery, Kevin Clark, Executive Managing Director of Discovery Service for Hire Counsel and Jay Lieb, Founder and Managing Member of NexLP, as panelists.

Our panel discussion will provide an overview of the evolution of electronic discovery technologies and also share with attendees ways that they can consider and compare technology offerings from the large ecosystem of providers supporting litigation, investigations, and audits.  It should be a very informative discussion with a very knowledgeable panel!  Hope you can join us!

Click here to register for the conference.  If you’re a non-vendor, the cost is only $100 to attend for the full day!

The Masters Conference also has an event coming up in New York City in July and Washington DC in October.  Click here for more information on remaining scheduled events for the year.

So, what do you think?  Are you attending today’s conference?  As always, please share any comments you might have or if you’d like to know more about a particular topic.


Rule Change Could Facilitate the Government’s Ability to Access ESI in Criminal Investigations: eDiscovery Trends

A rule modification adopted by the United States Supreme Court that significantly changes the way in which the government can obtain search warrants to access computer systems and electronically stored information (ESI) of suspected hackers could go into effect on December 1.

On April 28, the Supreme Court submitted to Congress the amendments to the Federal Rules of Criminal Procedure that it had adopted pursuant to Section 2072 of Title 28, United States Code.  One of those proposed rule changes, to Federal Rule of Criminal Procedure 41, would provide that “a magistrate judge with authority in any district where activities related to a crime may have occurred has authority to issue a warrant to use remote access to search electronic storage media and to seize or copy electronically stored information located within or outside that district if:”

  • “the district where the media or information is located has been concealed through technological means; or”
  • “in an investigation of a violation of 18 U.S.C. § 1030(a)(5), the media are protected computers that have been damaged without authorization and are located in five or more districts.”

Currently, the government can only obtain a warrant to access ESI from a magistrate in the district where the computer with the stored information is physically located.

As reported in JD Supra Business Advisor (Come Back With a Warrant: Proposed Rule Change Expands the Government’s Ability to Access Electronically Stored Information in Criminal Investigations, written by Thomas Kurland and Peter Nelson), proponents of the rule change say it is necessary to allow the government to respond quickly to cyber-attacks of unknown origin – particularly malicious “botnets” – which are becoming increasingly common as hackers become ever more sophisticated.

However, others say the rule change will significantly expand the government’s power to search computers without their owners’ consent – regardless of whether those computers belong to criminals or even to the victims of a crime.  One US senator, Ron Wyden of Oregon, has called for Congress to reject the rule changes, stating that they “will massively expand the government’s hacking and surveillance powers” and “will have significant consequences for Americans’ privacy”.  He has indicated a “plan to introduce legislation to reverse these amendments shortly, and to request details on the opaque process for the authorization and use of hacking techniques by the government”.

So, what do you think?  Will Congress reverse these amendments?  Should they?  Please share any comments you might have or if you’d like to know more about a particular topic.

Just a reminder that I will be moderating a panel at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event tomorrow (we covered it here) as part of a full day of educational sessions covering a wide range of topics.  CloudNine will be sponsoring that session, titled Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery at 4:15.  Click here to register for the conference.  If you’re a non-vendor, the cost is only $100 to attend for the full day!


Judge Scheindlin Speaks!: eDiscovery Trends

Kudos to Jason Krause at ACEDS for getting the first “post-bench retirement” (at least that I know of) interview with (now former) U.S. District Judge Shira A. Scheindlin!

In the interview (which is available here), Judge Scheindlin comments on everything from the significance of her landmark Zubulake and Pension Committee rulings to the differences between circuits in sanctioning spoliation of ESI to thoughts about the amended Rule 37 to issues that most need attention now to even the departure of Judge Paul Grewal from the bench to join Facebook (wow!).  While she is retired from the bench, it appears that she will still be quite actively involved in litigation via special master work and via work in arbitration and mediation.  It’s an interesting and enlightening discussion and write-up.  Great job, Jason!

So, what do you think?  Will the retirement of influential judges like Judge Scheindlin and Judge Grewal adversely affect the judiciary’s handling of eDiscovery issues?  Or will other judges step up to continue their legacy?  Please share any comments you might have or if you’d like to know more about a particular topic.

Just a reminder that I will be moderating a panel at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event next Tuesday, May 24 (we covered it here) as part of a full day of educational sessions covering a wide range of topics.  CloudNine will be sponsoring that session, titled Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery at 4:15.  Click here to register for the conference.  If you’re a non-vendor, the cost is only $100 to attend for the full day!


Court Orders Plaintiff to Perform a “Download Your Info” From Facebook: eDiscovery Case Law

In Rhone v. Schneider Nat’l Carriers, Inc., No. 4:15-cv-01096-NCC, (E.D. Mo. Apr. 21, 2016), Missouri Magistrate Judge Noelle C. Collins ordered the plaintiff to disclose a complete list of her social media accounts to the defendant and also provide a “Download Your Info” report from her Facebook account from June 2, 2014 to the present within fourteen days and ordered the defendant to disclose to the plaintiff any and all posts, photos or other media from the report it intends to use in support of its defense.

Case Background

In this case, the plaintiff asserted that she sustained “severe physical injuries” as a result of a motor vehicle accident that occurred on June 2, 2014, when the vehicle driven by Third Party Defendant Charles Quinn, in which the plaintiff was a passenger, was struck from behind by Defendant Schneider National Carriers, Inc.’s (“Schneider”) vehicle driven by Defendant Dean Lilly.  The defendant requested production of any social media postings, photographs and/or videos posted by the plaintiff to any social media accounts since the date of the accident; in turn, the plaintiff objected and did not acknowledge the existence of any social media accounts.

However, according to Defendant Schneider, its own independent investigation uncovered that the plaintiff did have a Facebook account and may have also had a LinkedIn account and the information uncovered included “relevant information; specifically, comments and photos regarding physical activity such as dancing”.  The plaintiff initially objected to the defendant’s request for social media information as irrelevant, then provided a supplemental answer to the defendant’s request to indicate that no social media information was “related to this incident”.

As a result, the defendant requested that the plaintiff be required to provide a “Download Your Info” report from her Facebook account from the date of the accident, June 2, 2014, to the present. In the alternative, in the event the account or other social media content has been deleted, the defendant requested sanctions in the form of dismissal of the action with prejudice and attorney’s fees.  In response, the plaintiff indicated that the defendant had “failed to show that any evidence, whether relevant or irrelevant, has been deleted” and that the motion was moot as the defendant had already accessed the plaintiff’s Facebook account and printed at least 264 pages of Facebook postings; the defendant countered that sanctions are warranted because, although the plaintiff claimed not to have deleted any posts, its January 2016 download from the plaintiff’s Facebook account produced 441 pages of material whereas the same method in March 2016 retrieved only 226 pages of material.

Judge’s Ruling

In light of the information available, Judge Collins found that “Plaintiff has not fully and completely responded to Schneider’s production requests, even in light of her objections. Plaintiff did not initially disclose the existence of any social media accounts. However, Plaintiff does not deny that the Facebook account in question belongs to her. Furthermore, there is some indication that Plaintiff may have other social media accounts.  Accordingly, Plaintiff shall disclose to Schneider a complete list of Plaintiff’s social media accounts during the requested time periods.”

Judge Collins also ruled that “Plaintiff is directed to provide a ‘Download Your Info’ report from her Facebook account from the date of the accident, June 2, 2014 to the present. Plaintiff and Schneider shall consult regarding the process and the most effective means of disclosing this information. Thereafter, Schneider shall produce to Plaintiff, from this download, any and all posts, photographs, videos or other material that it intends to rely on for its case. However, the Court finds that, at this time, sanctions are unwarranted. Not only is it unclear whether Plaintiff has deleted any information, such a download from Facebook may afford Plaintiff the ability to recover any, even innocuous, information that may have been deleted.”

So, what do you think?  Should the court have required the plaintiff to download the info from her Facebook account?  Please share any comments you might have or if you’d like to know more about a particular topic.


Former IT Administrator with “Keys to the Kingdom” Charged with Hacking into Former Employer: eDiscovery Trends

A former IT administrator pled not guilty earlier this month to federal charges of hacking into the computer system of Blue Stone Strategy Group – an Irvine-based company and the man’s former employer – and deleting files.

As announced by the U.S. Attorney’s Office for the Central District of California, Nikishna Polequaptewa, 34, surrendered to federal authorities after being indicted by a federal grand jury in March on one count of unauthorized impairment of a protected computer.  At his arraignment, he entered a not guilty plea, was ordered released on a $25,000 bond and was ordered to stand trial on June 28.

“IT administrators often hold the ‘keys to the kingdom’ for companies,” said United States Attorney Eileen M. Decker. “Disgruntled IT administrators can therefore pose a grave threat to businesses, which must take measures to protect themselves when letting such an employee go.”

According to the indictment, Blue Stone provided consulting services to Native American tribal governments throughout the United States. Polequaptewa was responsible for information technology at Blue Stone until November 2014, when he was relieved of his duties, which led to his resignation. The indictment states that Polequaptewa repeatedly accessed the Blue Stone internal server, a desktop computer, and remote accounts held by Blue Stone immediately following his resignation, and allegedly deleted various files belonging to the company.  The computer hacking charge in the indictment carries a statutory maximum penalty of 20 years in federal prison.

Of course, as the announcement notes, “[e]very defendant is presumed to be innocent until and unless proven guilty in court”.  Nonetheless, as US Attorney Decker points out, organizations need to have a plan in place for protecting themselves that at least includes closing accounts and changing credentials when key IT personnel leave the company.

So, what do you think?  Does your organization have a plan in place to lock down access when IT personnel leave?  Please share any comments you might have or if you’d like to know more about a particular topic.

Thanks to Peter S. Vogel’s Internet, Information Technology & e-Discovery Blog for the tip!

Just a reminder that I will be moderating a panel at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event next Tuesday, May 24 (we covered it here) as part of a full day of educational sessions covering a wide range of topics.  CloudNine will be sponsoring that session, titled Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery at 4:15.  Click here to register for the conference.  If you’re a non-vendor, the cost is only $100 to attend for the full day!


The Number of Files in Each Gigabyte Can Vary Widely: eDiscovery Best Practices

Now and then, clients ask me how many documents (files) are typically contained in one gigabyte (GB) of data.  A good estimate of the number of files is important for estimating review costs; however, because the number of files per GB can vary widely, estimating review costs accurately can be a challenge.
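To see why the file count matters so much, the underlying arithmetic can be sketched in a few lines.  This is a minimal illustration, not CloudNine’s pricing method: it assumes review cost scales linearly with document count, and the review pace (documents per hour) and hourly rate used below are hypothetical placeholders, not industry benchmarks.

```python
from dataclasses import dataclass


@dataclass
class ReviewEstimate:
    files: float   # estimated number of documents to review
    hours: float   # reviewer hours at the assumed pace
    cost: float    # hours multiplied by the assumed hourly rate


def estimate_review(gigabytes: float, files_per_gb: float,
                    docs_per_hour: float, hourly_rate: float) -> ReviewEstimate:
    """Rough linear estimate: data volume -> file count -> hours -> cost.

    Every input is an assumption supplied by the caller; files_per_gb is
    the uncertain figure this post is about.
    """
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    files = gigabytes * files_per_gb
    hours = files / docs_per_hour
    return ReviewEstimate(files, hours, hours * hourly_rate)


# Hypothetical inputs: 10 GB reviewed at 50 docs/hour and $60/hour.
# Only the files-per-GB assumption differs between the two runs.
low = estimate_review(10, 3_000, 50, 60)
high = estimate_review(10, 18_000, 50, 60)
print(f"{low.files:,.0f} files, {low.hours:,.0f} hours, ${low.cost:,.0f}")
print(f"{high.files:,.0f} files, {high.hours:,.0f} hours, ${high.cost:,.0f}")
```

The two runs print 30,000 files, 600 hours and $36,000 versus 180,000 files, 3,600 hours and $216,000: with everything else held constant, a sixfold difference in files per GB becomes a sixfold difference in the estimated review cost.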

About four years ago, I conducted a little (unscientific) experiment to show how the number of pages in each GB can vary widely, depending on the file formats that comprise that GB.  Since we now tend to think more about files per GB than pages per GB, I have taken a fresh look, this time estimating files per GB.

Each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  Even files within the same application can vary, depending on the version in which they are stored.  For example, newer versions of Office files (e.g., .docx, .xlsx) incorporate zip compression of the text, so the data sizes tend to be smaller than their older counterparts.  So, estimating file counts with any degree of precision can be somewhat difficult.
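The size effect of zip compression can be sketched in a few lines of Python. Office Open XML files such as .docx and .xlsx are ZIP packages of XML parts; the snippet below does not build a real Office file, but stores the same hypothetical, repetitive XML-like text in a ZIP archive twice, once uncompressed and once with DEFLATE, using only the standard library’s `zipfile` module. The sample text and the member name `content.xml` are illustrative assumptions; how much a real document shrinks depends on its content, embedded images and other parts.

```python
import io
import zipfile


def archive_size(payload: bytes, compression: int) -> int:
    """Size in bytes of an in-memory ZIP archive holding `payload` as a single member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as archive:
        archive.writestr("content.xml", payload)  # hypothetical member name
    # ZipFile leaves a caller-supplied buffer open, so it can be read after closing.
    return len(buffer.getvalue())


# Hypothetical, highly repetitive XML-like text standing in for a document body.
SAMPLE = ("<p>The court granted the motion to compel production.</p>\n" * 500).encode("utf-8")

if __name__ == "__main__":
    stored = archive_size(SAMPLE, zipfile.ZIP_STORED)      # zip container, no compression
    deflated = archive_size(SAMPLE, zipfile.ZIP_DEFLATED)  # zip container, DEFLATE compression
    print(f"raw text:      {len(SAMPLE):>7,} bytes")
    print(f"zip, stored:   {stored:>7,} bytes")
    print(f"zip, deflated: {deflated:>7,} bytes")
```

Text with lots of repeated markup, as in this sample, compresses very well; real documents usually shrink less.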

To illustrate this, I reproduced the content from yesterday’s case law blog post in several different file formats to show how much the size can vary, even when the content is essentially the same.  Here are the results – sizes rounded to the nearest kilobyte (KB), with file counts based on 1 GB = 1,024 × 1,024 KB = 1,048,576 KB:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 4 KB, so it would take 262,144 text files at 4 KB each to equal 1 GB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 57 KB, so it would take 18,396 HTML files at 57 KB each to equal 1 GB;
  • Microsoft Excel 97-2003 Format (XLS): Created by copying the contents of the blog post and pasting them into a blank Excel XLS workbook – 325 KB, so it would take 3,226 XLS files at 325 KB each to equal 1 GB;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting them into a blank Excel XLSX workbook – 296 KB, so it would take 3,542 XLSX files at 296 KB each to equal 1 GB;
  • Microsoft Word 97-2003 Format (DOC): Created by copying the contents of the blog post and pasting them into a blank Word DOC document – 312 KB, so it would take 3,361 DOC files at 312 KB each to equal 1 GB;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting them into a blank Word DOCX document – 299 KB, so it would take 3,507 DOCX files at 299 KB each to equal 1 GB;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting them into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 328 KB, so it would take 3,197 MSG files at 328 KB each to equal 1 GB;
  • Adobe PDF Format (PDF): Created by printing the blog post to a PDF file using the CutePDF printer driver – 1,550 KB, so it would take 677 PDF files at 1,550 KB each to equal 1 GB.
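The file counts above are simply 1 GB divided by each file’s size. Judging from the figures (262,144 × 4 KB = 1,048,576 KB), the post treats 1 GB as 1,024 × 1,024 KB and rounds to the nearest whole file. A minimal Python sketch, assuming those two conventions, reproduces the list; the function name `files_per_gb` is hypothetical.

```python
# Sizes (in KB) reported in the post for the same blog-post content.
FILE_SIZES_KB = {
    "TXT": 4,
    "HTML": 57,
    "XLS": 325,
    "XLSX": 296,
    "DOC": 312,
    "DOCX": 299,
    "MSG": 328,
    "PDF": 1550,
}

# Binary gigabyte, as implied by the post's arithmetic: 1 GB = 1,024 MB x 1,024 KB.
KB_PER_GB = 1024 * 1024


def files_per_gb(size_kb: int) -> int:
    """Files of `size_kb` needed to fill 1 GB, rounded half-up to the nearest whole file."""
    if size_kb <= 0:
        raise ValueError("file size must be positive")
    # Integer arithmetic: (2N + d) // (2d) rounds N / d half-up without float error.
    return (2 * KB_PER_GB + size_kb) // (2 * size_kb)


if __name__ == "__main__":
    for fmt, size in FILE_SIZES_KB.items():
        print(f"{fmt:<5} {size:>6,} KB -> {files_per_gb(size):>8,} files per GB")
```

Using a decimal gigabyte (1,000,000 KB) instead would give somewhat lower counts, so it is worth knowing which convention a vendor’s estimate uses.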

The HTML and PDF examples weren’t exactly an “apples to apples” comparison to the other formats – they also included other content from the web page.  Nonetheless, the examples above hopefully illustrate that, to estimate the number of files in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection but also its makeup.  Performing an Early Data Assessment on your data beforehand can provide the file counts you need to estimate your review costs more accurately.
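To see why the makeup matters, the sketch below estimates the file count of a collection from its total size and the share of bytes each format occupies. It assumes, purely for illustration, that the single-document sizes from the experiment above are typical average sizes for each format, which real collections rarely bear out; the collection makeups and the `estimate_file_count` helper are hypothetical.

```python
import math

# Per-format sizes (KB) from the post's single-document experiment.
# ASSUMPTION: treating these as average file sizes is purely illustrative.
AVG_SIZE_KB = {"TXT": 4, "DOCX": 299, "MSG": 328, "PDF": 1550}

KB_PER_GB = 1024 * 1024  # binary gigabyte, matching the post's arithmetic


def estimate_file_count(total_gb: float, makeup: dict[str, float]) -> float:
    """Estimate the files in a collection of `total_gb` GB.

    `makeup` maps a format name to the fraction of the collection's size it occupies;
    the fractions must sum to 1.
    """
    if not math.isclose(sum(makeup.values()), 1.0):
        raise ValueError("format fractions must sum to 1")
    total_kb = total_gb * KB_PER_GB
    # Each format contributes (its share of the bytes) / (its average file size).
    return sum(total_kb * share / AVG_SIZE_KB[fmt] for fmt, share in makeup.items())


if __name__ == "__main__":
    # Hypothetical 10 GB collections with different makeups.
    mostly_email = {"MSG": 0.6, "DOCX": 0.3, "PDF": 0.1}
    all_pdf = {"PDF": 1.0}
    for label, makeup in [("mostly email", mostly_email), ("all PDF", all_pdf)]:
        print(f"{label}: ~{estimate_file_count(10, makeup):,.0f} files")
```

Under these assumptions, a 10 GB collection that is 60% MSG, 30% DOCX and 10% PDF by size works out to roughly 30,000 files, while 10 GB of nothing but PDFs works out to under 7,000. Averages measured in an Early Data Assessment would replace the illustrative sizes.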

So, what do you think?  Was the 2016 example useful, highly flawed or both?  Please share any comments you might have, or tell us if you’d like to know more about a particular topic.
