Electronic Discovery

Most Big Companies Have a Big Data Program, But They’re Not Crazy about the Term “Big Data” – eDiscovery Trends

Yesterday, we discussed some amazing facts about just how “BIG” Big Data has gotten.  Today, let’s look at what BIG companies are doing about BIG data.

NewVantage Partners has just released its third annual survey of Fortune 1000 senior business and technology executives regarding their companies’ investments in Big Data, entitled Big Data Executive Survey 2014: An Update on the Progress of Big Data in the Large Corporate World.  Survey respondents have a vested interest in the success of their organization’s data, analytics and Big Data initiatives.  This year, 59 companies participated, with 125 individual executive respondents.  78% of the participating organizations were in the financial services sector; participants included companies such as American Express, Fidelity Investments, General Electric, Johnson & Johnson, Lincoln Financial and Wells Fargo.  Here’s a link to the Executive Summary for the report.

As noted in their press release, here are some key findings from the survey:

  • Big Data is Becoming Mainstream: Executives report that their corporate investments in Big Data are projected to grow from 35% to 75% by 2017 for investments greater than $10MM, and by a remarkable 6% to 28% for investments greater than $50MM.  67% of executives now report that they have Big Data initiatives running in production within the corporation.
  • Enthusiasm for Big Data Initiatives is Widespread: 82% of executives say that Big Data is “important or mission critical” to their organizations and 74% believe that its value “warrants serious attention.”
  • Business-IT Partnership is Key to Big Data Adoption: 88% of executives cited the importance of a strong business-IT partnership, with 77% citing business leadership and sponsorship, and partnership and organizational alignment as being the most critical factors in ensuring successful adoption of Big Data initiatives within the corporation.
  • The Chief Data Officer is an Emerging Role: 43% of executives report that their organization has established a Chief Data Officer (CDO) function, up from only 19% in 2012.

While big companies are embracing programs to manage Big Data, they’re not too keen on the term “Big Data”.  Fewer than 1 in 5 respondents (17%) feel that the name is “apt and descriptive,” and the rest dislike it (30%) or view it as overstated (53%).  As discussed in the Executive Summary, that finding raises the question of whether everyone means the same thing when they’re talking about Big Data.  Either way, it’s clear that large organizations are becoming seriously invested in programs to manage Big Data, whatever they want to call it.

So, what do you think? Does your organization have a plan for managing Big Data?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Just How “BIG” is Big Data Getting? Check Out These Facts – eDiscovery Trends

If you work with information as an attorney, paralegal, litigation support professional or information technology (IT) professional, you have probably heard the term “big data” at an ever-increasing rate.  But just how BIG is big data getting?  Check out these facts.

An article by Bernard Marr on SmartData Collective (Big Data: 25 Amazing Need-to-Know Facts) provides some startling facts that you might be surprised to know.  Here are a few examples (with sources linked):

  • Every 2 days we create as much information as we did from the beginning of time until 2003;
  • Over 90% of all the data in the world was created in the past 2 years;
  • It is expected that by 2020 the amount of digital information in existence will have grown from 3.2 zettabytes today to 40 zettabytes (FYI, a zettabyte is one billion terabytes!);
  • The total amount of data being captured and stored by industry doubles every 1.2 years;
  • If you burned all of the data created in just one day onto DVDs, you could stack them on top of each other and reach the moon – twice;
  • 570 new websites spring into existence every minute of every day;
  • 1.9 million IT jobs will be created in the US by 2015 to carry out big data projects. Each of those will be supported by 3 new jobs created outside of IT – meaning a total of 6 million new jobs thanks to big data;
  • The big data industry is expected to grow from US$10.2 billion in 2013 to about US$54.3 billion by 2017.

With this level of data growth in the world, it’s no wonder that information governance and eDiscovery continue to become more challenging!

Check out Bernard’s article here for the entire list of 25 facts (it even includes a slide deck!).  And, thanks to Rob Robinson’s excellent ComplexDiscovery site for the heads up!

Tomorrow, we will take a look at what big companies think about (and what they’re doing about) big data.  Speaking of something BIG, check this out.

So, what do you think? Does your organization have a plan for managing big data?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Plaintiff Slips, But Defendant Takes the Fall – eDiscovery Case Law

In Riley v. Marriott Int’l, 12-CV-6242P (W.D. N.Y. Sept. 25, 2014), New York Magistrate Judge Marian W. Payson agreed with the plaintiffs that spoliation of data had occurred when the defendant failed to preserve video surveillance and “sweep logs” after one of the plaintiffs slipped and fell in the defendant’s hotel garage, and that the defendant was at least grossly negligent for not preserving the information.  However, the judge denied the plaintiffs’ request for summary judgment, granting an adverse inference instruction instead.

Case Background

The plaintiffs filed suit, claiming that Linda Riley slipped and fell on the floor of the parking garage at the defendant’s hotel in Maui after exiting an elevator, because the floor was wet from rainwater that had been permitted to pool there.  The defendant had a surveillance camera that monitored and recorded the area of Linda's accident twenty-four hours a day, and the loss prevention manager at the hotel testified that the recordings are maintained for thirty days, at which time the stored recordings are overwritten by new recordings.

According to the loss prevention manager, once he is notified of a potential claim against the Hotel, he is responsible for preserving information relating to that claim.  He testified that the video showed Linda's fall, her removal from the scene in a wheelchair, and hotel employees placing wet floor signs and sweeping up the water on the floor.  He also testified that he turned the recording over to the hotel's liability insurance carrier.  The plaintiffs stated that the defendant provided only approximately seven minutes of the footage, beginning about one minute before Linda's accident, and filed the motion for summary judgment because the rest of the video, as well as the “sweep logs” that kept a record of floor maintenance by employees, was unavailable.  The defendant did not dispute that the sweep logs and video footage existed or that it had a duty to preserve them, but objected, claiming that the plaintiffs failed to demonstrate prejudice due to the destruction of data.

Judge’s Ruling

Noting that “Marriott has not challenged the Rileys' contention that it had a duty to preserve the destroyed evidence” and that “no genuine question exists that video footage depicting the scene of an accident and sweep logs reflecting maintenance performed at the scene of an accident is likely to contain relevant information”, Judge Payson “easily conclude[d] that Marriott had a duty to preserve both the sweep logs and the video footage from the day of the accident”.  She also stated that “Marriott has failed to offer any justification for its failure to preserve the evidence” and “failed to offer any facts concerning how or why the evidence was destroyed”; therefore, “Marriott's failure to preserve the entire video footage relating to Linda's accident and the sweep logs for the day in question despite the Hotel's loss prevention employee's testimony that he knew that he had a duty to preserve relevant evidence constitutes, at a minimum, gross negligence.” 

However, Judge Payson stopped short of granting the plaintiffs’ motion for summary judgment, concluding “that an adverse inference instruction is both appropriate and sufficient to deter Marriott from similar future conduct, to shift the risk of an erroneous judgment to Marriott and to restore the Rileys' position in this litigation.”

So, what do you think?  Did the judge go far enough, or should the motion for summary judgment have been granted?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Be Afraid, Be Very Afraid – eDiscovery Horrors!

Today is Halloween.  Every year at this time, because (after all) we’re an eDiscovery blog, we try to “scare” you with tales of eDiscovery horrors.  This is our fifth year of doing so, so let’s see how we do this year.  Be afraid, be very afraid!

Did you know that overlaying Bates numbers on image-only Adobe PDF files causes the text of the image not to be captured by eDiscovery processing applications?

What about this?

Finding that the information was relevant and that the defendants “acted with a culpable state of mind” when they failed to preserve the data in its original form, New York Magistrate Judge Ronald L. Ellis granted the plaintiff’s motion for spoliation sanctions against the defendant, ordering the defendant to bear the cost of obtaining all the relevant data in question from a third party as well as paying for plaintiff attorney fees in filing the motion.

Or this?

It’s Friday at 5:00 and I need 15 gigabytes of data processed to review this weekend.

How about this?

Ultimately, it became clear that the defendant had not exported or preserved the data from salesforce.com and had re-used the plaintiffs’ accounts, spoliating the only information that could have addressed the defendant’s claim that the terminations were performance related (the defendant claimed it did not conduct performance reviews of its sales representatives).  As a result, Judge Kemp stated that the “only realistic solution to this problem is to preclude Tellermate from using any evidence which would tend to show that the Browns were terminated for performance-related reasons”.

Or maybe this?

Could an “unconscionable” eDiscovery vendor actually charge nearly $190,000 to process 505 GB and host it for three months?  Could another vendor charge over $800,000 to re-process and host data (that it had previously hosted) for approximately two months?  Yes, in both cases (though, at least in the second case, the court disallowed over $700,000 of the billed costs).

Scary, huh?  If the possibility of additional processing charges for your PDF files, sanctions because you didn’t preserve data in its original format or preserve it in your cloud-based system or inflated eDiscovery vendor charges scares you, then the folks at eDiscovery Daily will do our best to provide useful information and best practices to enable you to relax and sleep soundly, even on Halloween!

Then again, if it really is Friday at 5:00 and you need 15 gigabytes of data processed to review this weekend (inexpensively, no less), maybe you should check this out.

Of course, if you seriously want to get into the spirit of Halloween, click here.  This will really terrify you!  (Rest in Peace, Robin)

What do you think?  Is there a particular eDiscovery issue that scares you?  Please share your comments and let us know if you’d like more information on a particular topic.

Happy Halloween!

eDiscovery in Arbitration Has Become Less…Arbitrary – eDiscovery Trends

When you think of eDiscovery, you typically think of it as it relates to litigation – two sides of a case requesting and producing electronically stored information (ESI) as one means of identifying evidence designed to lead to resolution of a lawsuit.  But litigation is just one method for dispute resolution.  Another method is arbitration.  But, do arbitrators really “get” eDiscovery?

According to a new article in Corporate Counsel (Arbitrators Finally 'Get' E-discovery, written by Josh M. Leavitt), they finally do – thanks to the issuance of new rules (though some of those rules have actually been around for a while).  Leavitt observes that proponents of arbitration have considered the cost and delays of discovery “inconsistent with core principles of arbitration such as efficiency and cost-effectiveness” and that it was “not uncommon to go through substantial arbitrations without participating in anything remotely resembling either a federal e-discovery conference with opposing counsel or a prearbitration conference where the arbitration panel engaged the parties in meaningful and technically sound discussions about e-discovery.”

The end result was often an ineffective, costly and/or manipulated discovery process.  However, as Leavitt notes, arbitral bodies JAMS and the American Arbitration Association (AAA) “now have protocols for e-discovery, as do several of the international arbitration providers”.

JAMS

The former Judicial Arbitration and Mediation Services, now known as JAMS, Inc., published in January 2010 its Recommended Arbitration Discovery Protocols to “provide JAMS arbitrators with an effective tool that will help them exercise their sound judgment in furtherance of achieving an efficient, cost-effective process which affords the parties a fair opportunity to be heard.”  Also, Rule 16 of JAMS Comprehensive Arbitration Rules (also added in 2010) covers topics such as preliminary conferences, formats of production, metadata, custodians and cost shifting.

American Arbitration Association (AAA)

Last October, the AAA added new rules R-22 and R-23 to its Commercial Arbitration Rules and Mediation Procedures, which establish parameters for arbitrators to manage the exchange of ESI, impose ESI search parameters, make cost allocations and sanction noncompliance.  Also, the International Centre for Dispute Resolution® (ICDR), the international arm of the AAA, has published a three-page Guidelines for Arbitrators Concerning Exchanges of Information to establish the authority for arbitrators to manage ESI and impose sanctions for noncompliance with their ESI orders.

With these resources available, arbitrators can make the process of eDiscovery less…arbitrary.

So, what do you think? Have you managed discovery in arbitration? Was it efficient?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Despite 18 Missing Emails in Production, Court Denies Request for “Discovery on Discovery” – eDiscovery Case Law

In Freedman v. Weatherford Int’l, 12 Civ. 2121 (LAK) (JCF) (S.D.N.Y. Sept. 12, 2014), New York Magistrate Judge James C. Francis, IV denied the plaintiff’s request to, among other things, require the defendant to produce “certain reports comparing the electronic search results from discovery in this action to the results from prior searches” – despite the fact that the plaintiff identified 18 emails that the defendant did not produce that were ultimately produced by a third party.

Case Background

In this securities fraud class action, Judge Francis had previously denied three motions to compel by the plaintiffs seeking production of “(1) ‘certain reports comparing the electronic search results from discovery in this action to the results from prior searches’; (2) ‘documents concerning an investigation undertaken by [the] Audit Committee’ of [the] defendant…; and (3) ‘documents concerning an investigation undertaken by the law firm Latham & Watkins LLP’.”  In denying the motions, Judge Francis stated that “Although I recognized that such ‘discovery on discovery’ is sometimes warranted, I nevertheless denied the request because the plaintiffs had not ‘proffered an adequate factual basis for their belief that the current production is deficient.’”

However, Judge Francis granted reconsideration and asked for further briefing on the second item, based on the plaintiffs’ presentation of “new evidence, unavailable at the time [they] filed their [earlier] motion, which allegedly reveals deficiencies in [Weatherford’s] current production.”

Eighteen Missing Emails

The new evidence referenced by the plaintiffs consisted of 18 emails from “critical custodians at Weatherford” that were produced (after briefing on the original motion to compel was complete) not by the defendants, but by a third party, causing the plaintiffs to contend that Weatherford’s production is “significantly deficient.”  The plaintiffs contended that “providing them with a ‘report of the documents “hit”’ by search terms used in connection with the Latham and Audit Committee Investigations will identify additional relevant documents that have not been produced here.”

Judge’s Ruling

However, Judge Francis disagreed, stating “the suggested remedy is not suited to the task. The plaintiffs admit that of those 18 e-mails only three, at most, would have been identified by a search using the terms from the investigations.”  He also cited Da Silva Moore, noting that “[T]he Federal Rules of Civil Procedure do not require perfection…Weatherford has reviewed “millions of documents [] and [produced] hundreds of thousands,” comprising “nearly 4.4 million pages,” in this case…It is unsurprising that some relevant documents may have fallen through the cracks. But, most importantly, the plaintiffs’ proposed exercise is unlikely to remedy the alleged discovery defects. In light of its dubious value, I will not require Weatherford to provide the requested report.”

So, what do you think?  Was the decision justified, or should the defendant have been held to a higher standard?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

“The Decade of Discovery” On Tour – eDiscovery Trends

A few months ago, we told you about an intriguing documentary about eDiscovery that premiered in the New York area.  Now, that documentary is making the rounds and may be coming to a theatre near you.

The Decade of Discovery was written and directed by Joe Looby, who, according to his LinkedIn profile, served in the U.S. Navy’s Judge Advocate General Corps, practiced as an environmental enforcement attorney for New York state and was a founder of the forensic technology practices at Deloitte and FTI.  His film production company is called 10th Mountain Films, named in honor of his father, who served in the 10th Mountain Division, a U.S. Army ski patrol that fought in World War II.

The film is described as a “documentary about a government attorney on a quest to find a better way to search White House e-mail, and a teacher who takes a stand for civil justice on the electronic frontier”.  Looby notes in a radio interview with the Mid Hudson News that the documentary includes comments by “a government attorney, a teacher, seven judges and two professors”, including several well-known names in eDiscovery: U.S. District Judge Shira Scheindlin, of the Southern District of New York, Jason R. Baron, former director of litigation for the U.S. National Archives and Records Administration and now of counsel at Drinker Biddle & Reath, and the late Richard Braman, founder of The Sedona Conference, among others.  Looby refers to those who have driven the tremendous progress made over the past decade in eDiscovery practice as “true American heroes”.

The movie covers the considerable advancements made to address problems like this in both the government and litigation arenas.

Now, the movie has some additional showings scheduled in other parts of the country, including South Carolina, Florida, Chicago, Washington DC, San Francisco and Houston (yay! – I already have tickets).  You can get more information on scheduled showings – and view the trailer – here.

So, what do you think? Is this a movie you would like to see? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Apple Recovers Part, But Not All, of its Requested eDiscovery Costs from Samsung – eDiscovery Case Law

Apple won several battles with Samsung, including being awarded over $1 billion in verdicts, as well as a $2 million sanction for the inadvertent disclosure by Samsung’s outside counsel firm (Quinn Emanuel Urquhart & Sullivan LLP), commonly known as “patentgate”, but ultimately may have lost the war when the court refused to ban Samsung from selling products that were found to have infringed on Apple products.  Now, they’re fighting over relative chicken-feed: a few million that Apple sought to recover in eDiscovery costs.

On December 5, 2013, Apple submitted its Bill of Costs seeking a total of over $6.2 million in three categories of taxable costs: “printed or electronically recorded transcripts;” “exemplification and the costs of making copies;” and “[c]ompensation of interpreters.” Samsung filed objections on January 24 of this year. Apple then filed an Amended Bill of Costs on February 6, 2014, waiving and withdrawing certain costs, including the costs related to its sanctions motion against Samsung. Apple’s Amended Bill of Costs sought a total of nearly $5.9 million in costs (of which nearly $1.5 million related to eDiscovery costs).  Yet, on February 20, Samsung again filed objections.

On June 6, the Clerk taxed costs in the amount of $2,064,940.55, disallowing $193,884.17 in transcript costs, $3,346,652.74 in costs for exemplification and copies, and $282,500 in compensation of interpreters.  Both parties sought judicial review of the Clerk’s assessment.  Apple requested that the Court increase the costs award to the full amount requested in its Amended Bill of Costs, while Samsung made multiple arguments against the assessment of costs, including that it was appealing the award, that Apple received only a partial recovery and that millions of dollars requested were either untaxable or unjustified.

California District Judge Lucy H. Koh found that there was “no basis to defer a decision on the bill of costs pending Samsung’s appeal” and also concluded that Apple is the prevailing party because “[t]he large jury damages award in favor of Apple clearly “materially alter[ed] the legal relationship between the parties” in this case. Moreover, Samsung did not prevail on any of its counterclaims.”

With regard to the eDiscovery costs, Samsung argued that 1) Apple failed to prove that these costs were the functional equivalent of making copies and not costs for intellectual effort, 2) Apple’s documentation failed to prove whether the costs requested are tied to documents actually produced to Samsung and 3) Apple’s “extremely high per page e-discovery rate is excessive and therefore impermissible.”

Judge Koh focused in on the second argument, stating that “it is somewhat unclear from Apple’s documentation of its e-discovery costs whether and to what extent Apple’s claimed costs cover only the costs of documents produced to Samsung. However, in the briefing on the parties’ Cross-Motions, Apple acknowledges that many of its claimed e-discovery costs relate to documents not produced to Samsung.”  As a result, she ruled that “Using Apple’s own figures, Apple estimates that it uploaded a total of 18,264,712 pages of which 2,944,467 pages were ultimately produced…Based on this, the Court calculates that approximately 16.12% of Apple’s e-discovery costs were spent on documents produced to Samsung. The Court will therefore award Apple e-discovery costs in the amount of $238,102.66.” [emphasis added]
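The proportion the Court describes is easy to check.  Here is a minimal sketch in Python that uses only the two page counts quoted above; the function names are ours, and the dollar figure in the usage note below is a hypothetical round number, since the exact eDiscovery total that Apple claimed is not stated in this post.

```python
from decimal import Decimal, ROUND_HALF_UP

# Page counts quoted in the ruling.
PAGES_UPLOADED = 18_264_712
PAGES_PRODUCED = 2_944_467


def produced_share_percent() -> Decimal:
    """Percentage of uploaded pages that were produced, rounded to two places."""
    share = Decimal(PAGES_PRODUCED) / Decimal(PAGES_UPLOADED) * 100
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def taxable_amount(claimed_costs: Decimal) -> Decimal:
    """Apply the rounded share to a claimed cost total (hypothetical input)."""
    amount = claimed_costs * produced_share_percent() / 100
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(produced_share_percent())  # 16.12
```

For example, `taxable_amount(Decimal("1000000"))` applies the same 16.12% share to a hypothetical $1,000,000 claim.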

The awarded eDiscovery costs were part of an overall award to Apple of nearly $1.9 million.

So, what do you think?  Should the eDiscovery portion of the award have been limited to documents Apple produced?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Those Pesky Email Signatures and Disclaimers – eDiscovery Best Practices

Are email signatures and disclaimers causing more trouble than they’re worth?  According to one author, perhaps they are.

Earlier this week, Jeff Bennion wrote an interesting post on the Above the Law blog (‘Please Consider the Environment Before Printing’ Email Signatures Are Hurting the Environment) where he noted that, about 5 years ago, people started putting ‘Please consider the environment before printing this e-mail’ in their email signature (along with a webdings font character of a tree).

Bennion states that this is “the Kony 2012 of the environmental battles – it’s a noble war, but a pointless battle” and that the printing of emails is only a tiny fraction of the paper that lawyers waste.  Instead, he notes, “the ‘please consider the environment’ email signature is more like one of those ‘I voted’ stickers — both serve no purpose other than proclaiming your self-righteousness for performing a civic duty”.

In fact, per a Time magazine article, the internet accounts for a good deal of the pollution in the world. In a 2011 article, cleantechnica.com reported that there were about 500,000 data centers in the world and each used 10 megawatts of energy a month.  That’s a lot more than 1.21 gigawatts.  Great Scott!

When comparing Word files containing data that might go into an email with the same data that also includes the email signature, Bennion observes that the one with the email signature contains 0.3 KB more data than the one without the signature.  He extrapolates that out to 27,000 GB of extra useless data being added to internet storage servers every day (10 million GB per year) over all business emails, while acknowledging that not all 90 billion business emails include the signature.  “The point is that it is a pointless gesture that, as a whole, does more harm than good”, Bennion states.
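Bennion’s extrapolation can be reproduced with a few lines of arithmetic.  The sketch below is ours, not his, and it rests on two assumptions: that the 90 billion figure is a daily count of business emails, and that decimal units are used (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes).

```python
# Assumptions (not stated in the source): decimal units, 90 billion emails per day.
EXTRA_BYTES_PER_EMAIL = 300            # 0.3 KB of signature text
BUSINESS_EMAILS_PER_DAY = 90_000_000_000
BYTES_PER_GB = 1_000_000_000


def extra_gb_per_day() -> int:
    """Extra storage per day if every business email carried the signature."""
    return EXTRA_BYTES_PER_EMAIL * BUSINESS_EMAILS_PER_DAY // BYTES_PER_GB


def extra_gb_per_year(days: int = 365) -> int:
    """Extra storage per year, before rounding to the article's 10 million GB."""
    return extra_gb_per_day() * days


print(extra_gb_per_day())   # 27000
print(extra_gb_per_year())  # 9855000
```

The yearly total comes to 9,855,000 GB, which rounds to the 10 million GB per year quoted above.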

And, the same holds true for those confidential and privileged email disclaimers at the bottom of emails, which he observes “take up about 10-20 times more wasted space than the ‘please stop printing your emails’ disclaimer” – “roughly the environmental equivalent of clubbing 3 baby seals a month”.  Some interesting takes.

These email signatures and disclaimers also affect eDiscovery costs, in terms of extra data to both process and host.  They can also lead to false hits when searching text and affect conceptual clustering or predictive coding of documents (which are based on the text content of the documents) unless steps are taken to remove them from indices and ignore the text when performing those processes.  All of this can lead to extra work and extra cost.
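One way to picture the “steps taken to remove those from indices” is a pre-indexing filter.  The following is only an illustrative sketch: the two patterns and the `strip_boilerplate` name are hypothetical, and a real collection would need patterns tuned to the signatures and disclaimers it actually contains.

```python
import re

# Hypothetical boilerplate patterns, for illustration only.
BOILERPLATE_PATTERNS = [
    re.compile(
        r"please consider the environment before printing this e-?mail\.?",
        re.IGNORECASE,
    ),
    # A disclaimer paragraph: runs from its opening words to the next blank line.
    re.compile(
        r"this e-?mail is confidential.*?(?=\n\n|\Z)",
        re.IGNORECASE | re.DOTALL,
    ),
]


def strip_boilerplate(body: str) -> str:
    """Remove known signature and disclaimer text before indexing."""
    cleaned = body
    for pattern in BOILERPLATE_PATTERNS:
        cleaned = pattern.sub("", cleaned)
    # Collapse the runs of blank lines left behind by the removals.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


email = (
    "Draft attached for review.\n\n"
    "Please consider the environment before printing this e-mail.\n\n"
    "This email is confidential and may be privileged. "
    "If you are not the intended recipient, delete it."
)
print(strip_boilerplate(email))  # Draft attached for review.
```

With the disclaimer stripped, a search for “confidential” or “privileged” no longer hits this message on the strength of its footer alone.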

So, what do you think?  Do you use “please stop printing your emails” signatures and confidential and privileged email disclaimers?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Text Overlays on Image-Only PDF Files Can Be Problematic – eDiscovery Best Practices

Recently, we at CloudNine Discovery received a set of Adobe PDF files from a client that raised an issue regarding the handling of those files for searching and reviewing purposes.   The issue serves as a cautionary tale for those working with image-only PDFs in their document collection.  Here’s a recap of the issue.

The client was using OnDemand Discovery®, which is our new Client Side add-on to OnDemand® that allows clients to upload their own native data for automated processing and loading into new or existing projects.  The collection was purported to consist mostly of image-only PDF files.  PDF files are created in two ways:

  1. By saving or printing from applications to a PDF file: Many applications, such as Microsoft Office applications like Word, Excel and PowerPoint, provide the ability to save the document or spreadsheet that you’ve created to a PDF file, which is common when you want to “publish” the document.  If the application you’re using doesn’t provide that option, you can print the document to PDF using any of several PDF printer drivers available (some of which are free).  PDFs created this way usually include the text of the file from which they were created.
  2. By scanning or otherwise creating an image to a PDF file: Typically, this occurs either by scanning hard copy documents to PDF or by receiving the document in image-only PDF form (such as through fax software).  PDFs created this way are images and do not include the text of the document from which they came.

Like many processing tools, such as LAW PreDiscovery®, OnDemand Discovery is programmed to handle PDF files by extracting the text if present or, if not, performing OCR on the files to capture text from the image.  Text from the file is always preferable to OCR text because it’s a lot more accurate, which is why OCR is typically only performed on PDF files lacking text.
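The decision described above can be sketched in a few lines.  This is a simplified illustration, not the actual logic of OnDemand Discovery or LAW PreDiscovery; the function name and the `run_ocr` callback are hypothetical stand-ins for whatever extraction and OCR engines a tool uses.

```python
from typing import Callable, Tuple


def text_for_indexing(
    embedded_text: str, run_ocr: Callable[[], str]
) -> Tuple[str, str]:
    """Prefer embedded text; fall back to OCR only when none is present.

    Returns the text to index and a label naming where it came from.
    """
    if embedded_text.strip():
        # Any extractable text at all means OCR is skipped.
        return embedded_text, "extracted"
    return run_ocr(), "ocr"


if __name__ == "__main__":
    fake_ocr = lambda: "text recognized from the page image"
    print(text_for_indexing("Text saved from Word", fake_ocr))
    print(text_for_indexing("", fake_ocr))
```

Note that the check is all-or-nothing: the presence of any extractable text, however little, is enough to skip OCR.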

After the client loaded their data, we did a spot Quality Control check (like we always do) and discovered that the text for several of the documents consisted only of Bates numbers.

Why?

Because the Bates numbers were added as text overlays to the pre-existing image-only PDF files.  When the processing software examined each file, it found that there was extractable text, so it extracted that text instead of OCRing the PDF file.  In effect, adding the Bates numbers as text overlays meant that the file was no longer treated as an image-only PDF.  Therefore, the actual content of the pages was never captured as text, so it wasn’t available for indexing and searching.  These documents were essentially non-searchable even after processing.
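One way to catch this kind of file is to check whether the extracted text is nothing but Bates numbers and, if so, flag the document for OCR anyway.  The sketch below is a heuristic under stated assumptions, not a feature of any particular product: the Bates pattern (a short letter prefix, an optional separator, then four or more digits) is our assumption and would need to match the numbering scheme actually used.

```python
import re
from typing import List

# Assumed Bates format: 1-10 letters, optional separator, 4+ digits.
BATES_TOKEN = re.compile(r"^[A-Za-z]{1,10}[-_ ]?\d{4,}$")


def is_bates_only(page_text: str) -> bool:
    """True if every non-blank line on the page looks like a Bates number."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    return bool(lines) and all(BATES_TOKEN.match(line) for line in lines)


def needs_ocr(pages: List[str]) -> bool:
    """True if no page has text other than Bates numbers (or no text at all)."""
    return bool(pages) and all(
        not page.strip() or is_bates_only(page) for page in pages
    )


if __name__ == "__main__":
    print(needs_ocr(["ABC0000001", "ABC0000002"]))
    print(needs_ocr(["ABC0000001", "Dear Bob,\nABC0000002"]))
```

A check like this can produce false positives (a page containing only a short reference code, for example), so flagged files are best treated as candidates for review rather than re-processed blindly.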

How did this happen?  Likely through Adobe Acrobat’s Bates Numbering functionality, which is available in later versions of Acrobat (version 8 and higher).  It does exactly that: it applies a text overlay Bates number to each page of the document.  Once that happens, eDiscovery processing software that skips OCR whenever it finds extractable text will not perform OCR on the image-only PDF.

What can you do about it?  If you haven’t applied Bates numbers on the files yet (or have a backup of the files before they were applied – highly recommended) and they haven’t been produced, you should process the files before putting Bates numbers on the images to ensure that you capture the most text available.  And, if opposing counsel will be producing any image-only PDF files, you will want to request the text as well (along with a load file) so that you can maximize your ability to search their production (of course, your first choice should be to receive native format productions whenever possible – here’s a link to an excellent guide on that subject).

If the Bates numbers are already applied and you don’t have a backup of the files without the Bates numbers (oops!), you’re faced with additional processing charges to convert them to TIFF and perform OCR of both the text and the Bates number, a charge that is entirely avoidable if you plan ahead.

So, what do you think?  Have you dealt with image-only PDF files with text overlaid Bates numbers?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.
