Processing Archives

When a Text File Doesn’t Match the Image or Native Excel File, What Do You Do?: eDiscovery Best Practices

October 1, 2015

Even when you’ve been in the business for 25+ years, you sometimes encounter situations you can’t explain (at least initially). Here is a story about a document that I encountered yesterday that initially didn’t make sense to me. Thankfully, I’m extremely curious and ultimately figured it out (with some help). See if it will be obvious to you.

The Issue

In a document collection produced by the opposing party to our client (where we received agreed-upon images, text files, native files and metadata), I was performing searches in our CloudNine review platform looking for documents related to a key investment account disputed between the two parties. On one of the documents, I found a hit in the searchable text referencing key information related to the account, noted by an accountant whom we had not previously encountered. This appeared to be an important document.

To get a better look at the document, I decided to look at the image that was provided. That text entry was not there.

Since we had the produced Excel file, I downloaded a copy of it (from CloudNine) to take a look at it and the text did not appear to be present in the original native Excel file either. When I performed a search for the accountant’s last name in the entire workbook, Excel retrieved no hits.

What? How can that be?

Figuring It Out

My first thought was that there were hidden columns, rows or worksheets within the Excel file that were not being searched. As it turned out, there was one hidden sheet (which I unhid), but repeating my search for the accountant’s last name in the entire workbook still retrieved no hits.

At this point, I was wondering whether the opposing party may have doctored the image and the Excel file, but forgot to doctor the produced extracted text. You hate to believe the worst of people, but it happens.

Out of ideas, I took the issue to CloudNine’s production manager, Jesus Arellano. After he looked at the Excel file and performed the same search (finding nothing, which made me feel better), he then decided to perform a text extract of the Excel file using LAW PreDiscovery® (which was later reproduced with our own CloudNine Discovery Client processing software). We looked at the results in the text and, behold, there was the note from the accountant!

What the hell is going on out here?

Finally, The Answer

Taking another look at the Excel file, we finally noticed that little red triangle in the corner of some of the cells. Excel comments. Of course.

When I put the cursor over the cell, the comment popped up, revealing the note (that should have been a clue) from the accountant. Excel comments aren’t normally displayed unless you put the cursor on the cell where the comment is contained (you can show all comments under the Review tab, but hardly anybody ever does). When the Excel file is “printed” to an image file, only the main portion of the workbook is “printed”, not the hidden comments. The same is true for other Microsoft Office applications. So, don’t expect to see the hidden comments in an image of an Excel workbook, Word document or other Office file.

As for searching the hidden comments in Excel, you can do so using Ctrl+F; you just have to make sure you change the “Look in” field to Comments to search those specifically (my example used my last name, Austin).
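If you would rather check a produced workbook programmatically, here is a minimal sketch of the same idea. It assumes the file is an .xlsx (which is a ZIP container) and that the comments are classic cell comments stored in parts named like xl/comments1.xml; newer "threaded" comments live elsewhere and are not covered. The function name find_in_comments is made up for illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by the comments parts of an .xlsx file.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def find_in_comments(xlsx_file, needle):
    """Return (part_name, cell_ref, comment_text) for each cell comment
    whose text contains needle (case-insensitive).

    xlsx_file may be a path or a file-like object.
    """
    hits = []
    with zipfile.ZipFile(xlsx_file) as zf:
        for name in zf.namelist():
            # Assumption: classic comments are stored as xl/commentsN.xml.
            if not re.fullmatch(r"xl/comments\d*\.xml", name):
                continue
            root = ET.fromstring(zf.read(name))
            for comment in root.iter(NS + "comment"):
                # A comment's text can be split across several runs.
                text = "".join(t.text or "" for t in comment.iter(NS + "t"))
                if needle.lower() in text.lower():
                    hits.append((name, comment.get("ref"), text))
    return hits


# Example call (the file name is hypothetical):
# for part, cell, text in find_in_comments("produced_workbook.xlsx", "Austin"):
#     print(part, cell, text)
```

Note that the part name tells you which comments file the hit came from, not the worksheet's display name; mapping one to the other takes an extra lookup that this sketch leaves out.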

Perhaps, if it hadn’t been at the end of a long day, I would have caught it more quickly (that’s my excuse, anyway). Nonetheless, it serves as an excellent example of how hidden metadata can contain important information. Due to this find, resulting from the original text search I did, we identified an individual for our client to depose!

So, what do you think? Have you ever encountered data important to a case in the hidden metadata of a file? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Is Your Outlook PST File Corrupt? All is Not Lost!: eDiscovery Best Practices

September 16, 2015

With our five-year anniversary coming up this weekend and my having experienced this issue with a client recently, it seemed to make sense to revisit this topic.

Though we’d like to believe that there will never be any problems with the data that we preserve, collect and process for eDiscovery purposes, data is not perfect. Sometimes the most critical data may be difficult or impossible to use. For example, key files could be password protected from being opened or they could be corrupted. If an Outlook Personal Storage Table (PST) file is corrupted, that file corruption could literally make tens of thousands of documents unavailable for discovery unless the file can be repaired.

I have a client who regularly sends us PST files to be processed and loaded via CloudNine’s Discovery Client processing application into our CloudNine review platform (double shameless plug warning!). But, sometimes, the PST files that we have received have been corrupted. Once, I had a case where 40% of the collection was contained in 2 corrupt Outlook PST files. Without the ability to repair those files, we would have been unable to access key portions of the collections in these cases that needed to be processed and reviewed.

Fortunately, there is a repair tool for Outlook designed to repair corrupted PST files. It’s called SCANPST. It’s an official repair tool that has been included since Office 2007. Despite the fact that it’s a “tool”, you won’t find SCANPST in the Microsoft Office Tools folder within the Microsoft Office folder in Program Files. Instead, you’ll have to navigate to the C:\Program Files\Microsoft Office\Office14 folder (for Office 2010) or C:\Program Files\Microsoft Office\Office15 (for Office 2013) to find the SCANPST.EXE utility.

Double-click this file to open Microsoft Outlook Inbox Repair Tool. The utility will prompt for the path and name of the PST file (with a Browse button to browse to the corrupted PST file). There is also an Options button to enable you to log activity to a new log file, append to an existing log file or choose not to write to a log file. Before you start, you’ll need to close Outlook and all mail-enabled applications.

Once ready, press the Start button and the application will begin checking for errors. When the process is complete, it should indicate that it found errors on the corrupted file, along with a count of folders and items found in the PST file. The utility will also provide a check box to make a backup of the scanned file before repairing. ALWAYS make a backup – you never know what might happen during the repair process. Click the Repair button when ready and the utility will hopefully repair the corrupted PST file.
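If you want your own copy of the file before the repair (in addition to the backup that the utility offers), a small script can make one and record a hash of the original so that you can document its pre-repair state. This is only a sketch under my own assumptions: the function names, the ".bak" suffix and the choice of MD5 are illustrative, not anything SCANPST requires.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that a multi-gigabyte PST is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_before_repair(pst_path, backup_dir):
    """Copy the PST to backup_dir and return (backup_path, md5_of_original).

    Refuses to overwrite an existing backup and verifies the copy by hash.
    """
    src = Path(pst_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.name + ".bak")  # hypothetical naming convention
    if dest.exists():
        raise FileExistsError(f"backup already exists: {dest}")
    shutil.copy2(src, dest)  # copy2 also tries to preserve file timestamps
    original_hash = md5_of_file(src)
    if md5_of_file(dest) != original_hash:
        raise OSError("backup does not match the original file")
    return dest, original_hash


# Example call (paths are hypothetical):
# backup, digest = backup_before_repair(r"D:\collection\custodian1.pst", r"D:\backups")
```

Keeping the recorded hash alongside the backup gives you something concrete to point to if the pre-repair state of the file is ever questioned.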

If SCANPST.EXE fails to repair the file, then there are some third party utilities available that may succeed where SCANPST failed. If all else fails, you can hire a data recovery expert (like us). Of course, sometimes files are beyond repair, regardless of the utility.

By repairing the PST file, you are technically changing the file, so if the PST file is discoverable, it may be necessary to disclose the corruption to opposing counsel and the intent to attempt to repair the file to avoid potential spoliation claims.

So, what do you think? Have you encountered corrupted PST files in discovery? Please share any comments you might have or if you’d like to know more about a particular topic.

Craig Ball Explains HASH Deduplication As Only He Can: eDiscovery Best Practices

July 10, 2015

Ever wonder why some documents are identified as duplicates and others are not, even though they appear to be identical? Leave it to Craig Ball to explain it in plain terms.

In the latest post (Deduplication: Why Computers See Differences in Files that Look Alike) in his excellent Ball in your Court blog, Craig states that “Most people regard a Word document file, a PDF or TIFF image made from the document file, a printout of the file and a scan of the printout as being essentially “the same thing.” Understandably, they focus on content and pay little heed to form. But when it comes to electronically stored information, the form of the data—the structure, encoding and medium employed to store and deliver content–matters a great deal.” The end result is that two documents may look the same, but may not be considered duplicates because of their format.

Craig also references a post from “exactly” three years ago (it’s four days off Craig, just sayin’) that provides a “quick primer on deduplication” that shows the three approaches where deduplication can occur, including the most common approach of using HASH values (MD5 or SHA-1).

My favorite example of how two seemingly duplicate documents can be different is the publication of documents to Adobe Portable Document Format (PDF). As I noted in our post from (nowhere near exactly) three years ago, I “publish” marketing slicks created in Microsoft® Publisher, “publish” finalized client proposals created in Microsoft Word and “publish” presentations created in Microsoft PowerPoint to PDF format regularly (still do). With a free PDF print driver, you can conceivably create a PDF file for just about anything that you can print. Of course, scans of printed documents that were originally electronic are another way where two seemingly duplicate documents can be different.

The best part of Craig’s post is the exercise that he describes at the end of it – creating a Word document of the text of the Gettysburg Address (saved as both .DOC and .DOCX), generating a PDF file using the Save As and Print As PDF file methods and scanning the printed document to both TIFF and PDF at different resolutions. He shows the MD5 hash value and the file size of each file. Because the format of the file is different each time, the MD5 hash value is different each time. When that happens for the same content, you have what some of us call “near dupes”, which have to be analyzed based on the text content of the file.
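You can see the same effect on a much smaller scale without Word or a scanner. The sketch below is my own simplification, not Craig’s exercise: it stores one sentence in two different encodings (standing in for two file formats) and compares the MD5 hash and size of each. The function name compare_encodings is made up for illustration.

```python
import hashlib


def compare_encodings(text):
    """Store the same text two ways and report the MD5 hash and size of each."""
    results = {}
    for encoding in ("utf-8", "utf-16"):
        data = text.encode(encoding)
        results[encoding] = {
            "md5": hashlib.md5(data).hexdigest(),
            "size": len(data),
            # Decoding recovers identical content from both forms.
            "text": data.decode(encoding),
        }
    return results


report = compare_encodings("Four score and seven years ago")
for encoding, info in report.items():
    print(encoding, info["md5"], info["size"], "bytes")
```

The content is identical to a human reader, but the bytes are not, so a hash-based comparison treats the two as different files.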

The file size is different in almost every case too. We performed a similar test (still not exactly) three years ago (but much closer). In our test, we took one of our one page blog posts about the memorable Apple v. Samsung litigation and saved it to several different formats, including TXT, HTML, XLSX, DOCX, PDF and MSG – the sizes ranged from 10 KB all the way up to 221 KB. So, as you can see, the same content can vary widely in both HASH value and file size, depending on the file format and how it was created.

As usual, I’ve tried not to steal all of Craig’s thunder from his post, so please check it out here.

So, what do you think? What has been your most unique deduplication challenge? Please share any comments you might have or if you’d like to know more about a particular topic.

For a Successful Outcome to Your Discovery Project, Work Backwards: eDiscovery Best Practices

May 22, 2015

Based on a recent experience with a client, it seemed appropriate to revisit this topic. Plus, it’s always fun to play with the EDRM model. Notice anything different? 🙂

While the Electronic Discovery Reference Model from EDRM has become the standard model for the workflow of the process for handling electronically stored information (ESI) in discovery, it might be helpful to think about the EDRM model and work backwards, whether you’re the producing party or the receiving party.

Why work backwards?

You can’t have a successful outcome without envisioning the successful outcome that you want to achieve. The end of the discovery process includes the production and presentation stages, so it’s important to determine what you want to get out of those stages. Let’s look at them.

Presentation

Whether you’re a receiving party or a producing party, it’s important to think about what types of evidence you need to support your case when presenting at depositions and at trial – this is the type of information that needs to be included in your production requests at the beginning of the case as well as the type of information that you’ll need to preserve as a producing party.

Production

The format of the ESI produced is important to both sides in the case. For the receiving party, it’s important to get as much useful information included in the production as possible. This includes metadata and searchable text for the produced documents, typically with an index or load file to facilitate loading into a review application. The most useful form of production is native format files with all metadata preserved as used in the normal course of business.

For the producing party, it’s important to be efficient and minimize costs, so it’s important to agree to a production format that minimizes production costs. Converting files to an image based format (such as TIFF) adds costs, so producing in native format can be cost effective for the producing party as well. It’s also important to determine how to handle issues such as privilege logs and redaction of privileged or confidential information.

Addressing production format issues up front will maximize cost savings and enable each party to get what they want out of the production of ESI. If you don’t, you could be arguing in court like our case participants from yesterday’s post.

Processing-Review-Analysis

It also pays to make decisions early in the process that affect processing, review and analysis. How should exception files be handled? What do you do about files that are infected with malware? These are examples of issues that need to be decided up front to determine how processing will be handled.

As for review, the review tool being used may impact how quick and easy it is to get started, to load data and to use the tool, among other considerations. If it’s Friday at 5 and you have to review data over the weekend, is it easy to get started? As for analysis, surely you test search terms to determine their effectiveness before you agree on those terms with opposing counsel, right?

Preservation-Collection-Identification

Long before you have to conduct preservation and collection for a case, you need to establish procedures for implementing and monitoring litigation holds, as well as prepare a data map to identify where corporate information is stored for identification, preservation and collection purposes.

And, before a case even begins, you need an effective Information Governance program to minimize the amount of data that you might have to consider for responsiveness in the first place.

As you can see, at the beginning of a case (and even before), it’s important to think backwards within the EDRM model to ensure a successful discovery process. Decisions made at the beginning of the case affect the success of those latter stages, so working backwards can help ensure a successful outcome!

So, what do you think? What do you do at the beginning of a case to ensure success at the end? Please share any comments you might have or if you’d like to know more about a particular topic.

Managing Email Signature Logos During Review: eDiscovery Best Practices

April 8, 2015

Yesterday, we discussed how corporate logo graphic files in email signatures can add complexity when managing those emails in eDiscovery, as these logos, repeated over and over again, can add up to a significant percentage of your collection on a file count basis. Today, we are going to discuss a couple of ways that I have worked with clients to manage those files during the review process.

These corporate logos cause several eDiscovery complications, such as slowing page refreshes in review tools, wasting reviewer time and making review even more tedious. I’ll focus on those particular issues below.

It should be noted that, as VP of Professional Services at CloudNine, my (recent) experience in assisting clients has primarily been using CloudNine’s review platform, so, with all due respect to those “technically astute vendor colleagues” that Craig Ball referred to in his excellent post last week, I’ll be discussing how I have handled the situation with logos in Outlook emails at CloudNine (shameless plug warning!).

Processing Embedded Graphics within Emails

I think it’s safe to say as a general rule, when it comes to processing of Outlook format emails (whether those originated from EDB, OST, PST or MSG files), most eDiscovery processing applications (including LAW and CloudNine’s processing application, Discovery Client) treat embedded graphic files as attachments to the email and those are loaded into most review platforms as attachments linked to the parent email. So, a “family” that consists of an email with two attached PDF files and a corporate logo graphic file would actually have four “family” members with the corporate logo graphic file (assuming that there is just one) as one of the four “family” members.

This basically adds an extra “document” to each email with a logo that is included in the review population (more than one per email if there are additional logo graphics for links to the organization’s social media sites). These files don’t require any thought during review, but they still have to be clicked through and marked as reviewed during a manual review process. This adds time and tedium to an already tedious process. If those files could be excluded from the review population, reviewers could focus on more substantive files in the collection.

In Discovery Client, an MD5 hash value is computed for each individual file, including each email attachment (including embedded graphics). So, if the same GIF file is used over and over again for a corporate logo, it would have the same MD5 hash value in each case. CloudNine provides a Quick Search function that enables you to retrieve all documents in the collection with the same value as the current document. So, if you’re currently viewing a corporate logo file, it’s easy to retrieve all documents with the same MD5 hash value, apply a tag to those documents and then use the tag to exclude them from review. I’ve worked with clients to do this before to enable them to shorten the review process while establishing more reliable metrics for the remaining documents being reviewed.
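To illustrate the idea outside of any particular platform, here is a rough sketch of grouping attachments by MD5 hash and flagging the ones that repeat often enough to be candidates for a “logo” tag. It is not how Discovery Client or CloudNine implement this; the function name, the (document ID, bytes) input shape and the threshold are all assumptions for the example.

```python
import hashlib
from collections import defaultdict


def find_repeated_attachments(attachments, min_count=3):
    """Group attachments by MD5 hash and keep the groups that repeat.

    attachments: iterable of (document_id, file_bytes) pairs.
    Returns {md5_hex: [document_id, ...]} for hashes seen at least
    min_count times.
    """
    by_hash = defaultdict(list)
    for doc_id, data in attachments:
        by_hash[hashlib.md5(data).hexdigest()].append(doc_id)
    return {h: ids for h, ids in by_hash.items() if len(ids) >= min_count}


# Made-up sample data: the same logo bytes attached to three emails.
logo = b"GIF89a...corporate logo bytes..."
sample = [
    ("EML0001.002", logo),
    ("EML0002.001", logo),
    ("EML0003.004", logo),
    ("EML0003.001", b"%PDF-1.4 a real attachment"),
]
for md5_hex, doc_ids in find_repeated_attachments(sample).items():
    print(md5_hex, "appears", len(doc_ids), "times:", doc_ids)
```

A high repeat count alone does not prove that a file is a logo (a widely circulated PDF would repeat too), so someone should still eyeball one member of each group before tagging the rest.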

It should be noted that doing so doesn’t preclude you from assigning responsiveness settings from the rest of the “family” to the corporate logo later if you plan to produce those corporate logos as separate attachments to opposing counsel.

Viewing Emails with Embedded Logos

Embedded logos and other graphics files can slow down the retrieval of emails for viewing in some document viewers, depending on how they render those graphics. By default, Outlook emails are already formatted in HTML and CloudNine provides an HTML view option that enables the user to view the email without the embedded graphics. As a result, the email retrieves more quickly, so, in many cases, where the graphics don’t add value, the HTML view option will speed up the review process (users can still view the full native file with embedded graphics as needed). In working with clients, I’ve recommended the HTML view tab as the default view in CloudNine as a way of speeding retrieval of files for review, which helps speed up the overall review process.

So, what do you think? Do you find that corporate logo graphics files are adding complexity to your own eDiscovery processes? If so, how do you address the issue? Please share any comments you might have or if you’d like to know more about a particular topic.

What Time Is It? That is an Important Question When it Comes to Your Document Collection: eDiscovery Best Practices

March 26, 2015

It may not be game time (hoo!), but the question of what time it really is has a significant effect on how eDiscovery is handled.

Our clients that process their electronically stored information (ESI) with CloudNine’s Discovery Client processing application (shameless plug warning!) generally find the wizard based application for processing and loading data to our review platform easy to use (as of this week, you can now process your data for loading in your own preferred review platform). But, for those few clients who have questions, we get one question WAY more than any other:

Why is the application asking me in which time zone would I like for my dates to be displayed?

Most ESI is stored in UTC (Coordinated Universal Time), which is the primary time standard by which the world regulates clocks and time. That’s not the same as Greenwich Mean Time (GMT), which is actually a time zone, not a time standard like UTC. The user’s operating system uses regional settings on the user’s system to convert the UTC time to the user’s local time zone. In many litigation cases, one of the issues that should be decided up front is the time zone to apply to the produced files. Why is it a big deal? Consider this example:

A multinational corporation has offices from coast to coast and potentially responsive emails are routinely sent between people in New York and Los Angeles offices. If an email is sent from one custodian in the Los Angeles office at 10 PM on June 30, 2013 and is received by another custodian in the New York office at 1 AM on July 1, 2013, and the relevant date range is from July 1, 2013 thru December 31, 2014, then the choice of time zones will determine whether or not that email falls within the relevant date range. Because the time zone is based on the workstation setting, the two employees could actually even be in the same office when the email is sent (if someone is traveling).
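The example above can be checked with a few lines of code. This sketch uses fixed UTC offsets for Pacific and Eastern daylight time (an assumption that holds for June 30, 2013, but a real tool would use a full time zone database), and the function name in_relevant_range is made up for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: daylight time was in effect, so fixed offsets are good enough here.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")

# The email from the example: sent at 10 PM in Los Angeles on June 30, 2013.
sent = datetime(2013, 6, 30, 22, 0, tzinfo=PACIFIC_DAYLIGHT)

RANGE_START = date(2013, 7, 1)
RANGE_END = date(2014, 12, 31)


def in_relevant_range(moment, case_zone):
    """Check the date range using the calendar date as seen in case_zone."""
    local_date = moment.astimezone(case_zone).date()
    return RANGE_START <= local_date <= RANGE_END


for label, zone in [("Pacific", PACIFIC_DAYLIGHT),
                    ("Eastern", EASTERN_DAYLIGHT),
                    ("UTC", timezone.utc)]:
    local = sent.astimezone(zone)
    print(label, local.strftime("%Y-%m-%d %H:%M"), in_relevant_range(sent, zone))
```

It is the same instant in every case; only the time zone chosen for the case decides which side of the July 1 cutoff the email lands on.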

As noted in the recently released EDRM Data Processing Standards Guide (which we covered here), if the processing time zone for the case is not standardized across the entire collection, then the email metadata for custodians in the different time zones will be different – because the time (and, possibly, the date, as indicated in the example above) would be different. As a result, two copies of the same email (one in the New York custodian’s email collection and one in the Los Angeles custodian’s email collection), would fail to be de-duplicated. Not to mention that the different time zones would create a convoluted chronology or, as in the example above, a convoluted relevant date range.
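Here is a simplified sketch of why that de-duplication fails. It assumes a hypothetical dedup key built by hashing the sender, subject and sent time as text; real processing tools choose their own fields and formats, so treat the key layout, the function name dedup_key and the sample values as illustrative only.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Assumption: fixed daylight-time offsets, valid for the June 2013 example.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")


def dedup_key(sender, subject, sent, zone):
    """Hypothetical dedup key: MD5 of sender, subject and sent time as shown in zone."""
    stamp = sent.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")
    return hashlib.md5("|".join([sender, subject, stamp]).encode("utf-8")).hexdigest()


# One email, one instant in time, held by two custodians.
sent = datetime(2013, 6, 30, 22, 0, tzinfo=PACIFIC_DAYLIGHT)
fields = ("la.custodian@example.com", "Q2 account summary")

# Processed with each custodian's local time zone: the keys differ.
la_key = dedup_key(*fields, sent, PACIFIC_DAYLIGHT)
ny_key = dedup_key(*fields, sent, EASTERN_DAYLIGHT)
print("local zones match:", la_key == ny_key)

# Processed with one standard zone for the whole case: the keys agree.
la_utc = dedup_key(*fields, sent, timezone.utc)
ny_utc = dedup_key(*fields, sent.astimezone(EASTERN_DAYLIGHT), timezone.utc)
print("standard zone match:", la_utc == ny_utc)
```

The point is simply that any hash or comparison that includes a displayed time will only line up if every copy is rendered in the same zone.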

As a result, most eDiscovery processing software (including ours) expects you to use a standard time zone for all files in the case. That can be the predominant time zone where the producing party is located – for example, an organization has offices throughout the country, but its headquarters is (along with most of the producing custodians) based in Houston, TX – so you might choose Central Standard Time as the time zone for the case. Or if the producing party is fairly evenly spread out across multiple time zones, you can choose to standardize to UTC.

So, what do you think? Have you had any date disputes in your eDiscovery projects? Please share any comments you might have or if you’d like to know more about a particular topic.

A New Processing Standards Guide from EDRM: eDiscovery Best Practices

March 16, 2015

When dealing with electronic data, some attorneys think that since the files are already electronic, how hard can they be to load? Unfortunately, it’s not as simple as that. To be useable in discovery, electronic files need to be processed and good processing requires a sound process. Leave it to EDRM to offer a new standards guide to establish a set of basic standards for processing various types of data for eDiscovery.

Let’s face it, at some point in nearly every eDiscovery life cycle, it is necessary to “process” data from an electronic storage device into a database so the data may be used in subsequent e-discovery steps. So, last Tuesday, EDRM released its new “software agnostic”* EDRM Data Processing Standards Guide, which is designed to help eDiscovery professionals ask the right questions and be knowledgeable about the tools available (*while the guide is meant to be software-agnostic, it does draw heavily on examples from kCura’s system, Relativity).

Written by experienced practitioners, the guide addresses considerations and concerns that arise when one processes data from an electronic storage device into an eDiscovery database and is intended to be a resource for anyone who would like to use the processing stage of eDiscovery to streamline review and improve analysis of information in the database. It covers everything from virus protection, container files, deduplication and de-NISTing to HASH values, time zone considerations, passwords and exception handling. It also identifies key metadata fields necessary for searching, sorting and production purposes and a basic glossary of terms. And, as processing has numerous potential permutations, the guide identifies some of the topics that aren’t yet covered in the “Potential Future Topics” section, such as language identification, EML files (Outlook Express) and processing Lotus Notes email.

The draft guide is available here and is open for public comment until tomorrow, March 17 (extra credit for submitting your comments in green ink – just kidding!), after which time input will be reviewed and considered for incorporation before the new guide is finalized. If you’re used to simply turning over your electronic files to a vendor for processing and want to know what that vendor is actually doing with them, it’s a good guide to help you understand the steps involved in making your data usable for review.

So, what do you think? Have you read the guide yet? If so, did you find it useful? Please share any comments you might have or if you’d like to know more about a particular topic.

Brad Jenkins of CloudNine: eDiscovery Trends

February 23, 2015

This is the first of the 2015 LegalTech New York (LTNY) Thought Leader Interview series. eDiscovery Daily interviewed several thought leaders at LTNY this year and generally asked each of them the following questions:

What are your general observations about LTNY this year and how it fits into emerging trends? Do you think American Lawyer Media (ALM) should consider moving LTNY to a different time of year to minimize travel disruptions due to weather?
After our discussion last year regarding the new amendments to discovery provisions of the Federal Rules of Civil Procedure, additional changes were made to Rule 37(e). Do you see those changes as being positive and do you see the new amendments passing through Congress this year?
Last year, most thought leaders agreed that, despite numerous resources in the industry, most attorneys still don’t know a lot about eDiscovery. Do you think anything has been done in the past year to improve the situation?
What are you working on that you’d like our readers to know about?

Today’s thought leader is Brad Jenkins of CloudNine™. Brad has over 20 years of experience as an entrepreneur, as well as 15 years leading customer focused companies in the litigation support arena. Brad has authored several articles on document management and litigation support issues, and has appeared as a speaker before national audiences on document management practices and solutions. He’s also my boss! 🙂

What are your general observations about LTNY this year and how it fits into emerging trends? Do you think American Lawyer Media (ALM) should consider moving LTNY to a different time of year to minimize travel disruptions due to weather?

LTNY seemed reasonably well attended this year and I think it was a good show. I have noticed a drop in the number of listed exhibitors though, from 225 a couple of years ago to 199 this year. Not sure if that’s a reflection of consolidation in the industry or providers simply choosing to market to prospects in other ways. I guess we’ll see. Nonetheless, I thought there were several good sessions, especially the three judges’ sessions that addressed key cases, the rules changes and general problems with discovery. I liked the fact that those were free and available to all attendees, not just paid ones. Not surprisingly, those sessions were very well attended.

Overall, I thought the primary focus of this show’s curriculum was in three areas: information governance (which had its own educational track at the show), cybersecurity and data privacy. With the amazing pace at which Big Data is growing, I expect information governance to be a major topic for some time to come, especially with regard to the use of technology to manage growing data volumes. And, as we discussed in this blog a couple of weeks ago, data breaches continue to be on the rise and we’ve already had a major one involving over 80 million records this year. That’s also going to continue to be a major focus.

One issue at the show that I think affected several attendees was the sudden lack of meeting space. The Hilton got rid of its lobby lounge, replacing it with a smaller executive lounge limited to hotel guests. And, ALM booked up the Bridges Bar for private events throughout the show. Meetings and discussions are a big part of LTNY and I hope ALM will take that into account next year and at least make the Bridges Bar available for meetings.

As for whether ALM should consider moving LTNY to a different time of year, there are pros and cons to that. As a person who missed the show entirely last year due to weather and travel issues and was delayed a few hours this year, it would be nice to minimize the chance of weather delays. On the other hand, I suspect that part of the reason that the show is in the winter is that it’s less costly to host then. Certainly, vendors would need an advance heads-up of at least a year if ALM were to decide to move the show to a different time of year. I don’t expect that to happen, despite the recent travel issues for remote attendees.

After our discussion last year regarding the new amendments to discovery provisions of the Federal Rules of Civil Procedure, additional changes were made to Rule 37(e). Do you see those changes as being positive and do you see the new amendments passing through Congress this year?

I’m not an attorney and am no expert on the rules, but, based on everything that I’ve heard, it sounds as though they should pass. I know that large organizations are counting on Rule 37(e) to reduce their preservation burden. I think whether it will or not will depend on judges’ interpretation of Rule 37(e)(2) (which enables more severe sanctions “only upon finding that the party acted with the intent to deprive another party of the information’s use”). That section may result in lesser sanctions in at least some cases, but we’ll see. At eDiscovery Daily, we’ve covered over 60 cases per year each of the past three years, so at some point in a year or two, it will be interesting to look back at trends and what they show.

Last year, most thought leaders agreed that, despite numerous resources in the industry, most attorneys still don’t know a lot about eDiscovery. Do you think anything has been done in the past year to improve the situation?

I think it’s still a battle. We continue to work with a lot of firms whose attorneys lack basic eDiscovery fundamentals. We continue to provide education through this blog, and we consult with attorneys to assist them with the technical language in requests for production so that they receive the form of production most useful to them: native files with included metadata. I think it’s imperative for providers like us to continue to do what we can to simplify the discovery process for our clients – through education and through streamlining of processes and process improvement. That’s our corporate mission and it continues to be a major focus for CloudNine.

What are you working on that you’d like our readers to know about?

Well, speaking of whether “anything has been done in the past year to improve the situation”, in November we released CloudNine’s new easy-to-use Discovery Client application to automate the processing and uploading of raw native data into our CloudNine platform. Many of our clients have struggled with having data dumped on their desk at 4:00 on a Friday afternoon and having to fill out forms, swap emails and play phone tag with vendors to get the data up quickly so that they can review it over the weekend. With CloudNine’s Discovery Client, they can get data processed and loaded themselves without having to contact a vendor, whether it is load ready or not.

The application will extract data from archives such as ZIP and PST files, extract metadata, extract and index text (and OCR documents without text), render native files to HTML and identify duplicates based on MD5 hash value. The application will also generate key data assessment analytics such as domain categorization to enable attorneys to develop an understanding of their data more quickly. And, we are just about to release a new version of the Discovery Client that will enable clients to simply process the data and retrieve the processed data to load into their own preferred platform (if it’s not CloudNine), so we can support you even if you use a different review platform.
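
To illustrate the duplicate identification step mentioned above, here is a minimal Python sketch that groups files by MD5 hash value. It is an illustration only, not the Discovery Client's actual implementation: the function names md5_of_file and find_duplicates are hypothetical, and the sketch assumes that two files count as duplicates when their full contents hash to the same value.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are never loaded into memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each MD5 hash to the files sharing it, keeping only hashes seen more than once."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            by_hash[md5_of_file(path)].append(path)
    return {md5: paths for md5, paths in by_hash.items() if len(paths) > 1}
```

Calling find_duplicates on a folder of extracted files returns a dictionary keyed by hash, where each value lists the paths with identical content. Processing tools often hash emails on selected metadata fields rather than on the raw file, so this whole-file approach should be read as the simplest case.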

Our do-it-yourself features such as loading your own data, adding your own users and fields, accessing audit logs and setting user rights give our clients unique control of their review process and make it easier for them to understand eDiscovery and feel in control of the process. Simplifying discovery and taking the worry out of it (as much as possible) is what CloudNine is all about.

Thanks, Brad, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Court Allows Costs for TIFF Conversion and OCR, Likens it to “Making Copies” – eDiscovery Case Law

November 21, 2014

In Kuznyetsov v. West Penn Allegheny Health Sys., No. 10-948 (W.D. Pa. Oct. 23, 2014), Pennsylvania Senior District Judge Donetta W. Ambrose upheld the Clerk of Courts’ issuance of Taxation of Costs for $60,890.97 in favor of the defendants and against the named plaintiffs, including costs for “scanning and conversion of native files to the agreed-upon format for production of ESI”.

Case Background

The plaintiffs filed a collective action pursuant to §216(b) of the Fair Labor Standards Act (“FLSA”) against the defendants, which was ultimately decertified as Judge Ambrose ruled that the 824 opt-in plaintiffs were not similarly situated. After that, the plaintiffs filed a Motion for Voluntary Dismissal, which Judge Ambrose granted, dismissing the claims of the opt-in Plaintiffs without prejudice and dismissing the claims of the named Plaintiffs with prejudice (the plaintiffs appealed and the Third Circuit dismissed the appeal for lack of jurisdiction).

On October 15, 2013, the defendants filed a Bill of Costs seeking a total of $78,561.77. On October 31, 2013, the Clerk of Courts filed a Letter calling for objections to the Bill of Costs, which was followed in January of this year by objections from the named plaintiffs (to which the Defendants filed a response). On August 1, the Clerk of Courts issued his Taxation of Costs in the amount of $60,890.97 in favor of Defendants and against the named Plaintiffs.

Judge’s Ruling

Stating that “Rule 54(d)(1) creates a strong presumption that costs are to be awarded to the prevailing party”, Judge Ambrose analyzed the costs as defined in 28 U.S.C. § 1920, including §1920(4), which covers “Fees for exemplification and the costs of making copies of any material where the copies are necessarily obtained for use in the case”.

Addressing the plaintiffs’ contention that the eDiscovery costs awarded were not necessary and were awarded at unreasonably high rates, and referencing the Race Tires case in her ruling, Judge Ambrose stated:

“With regard to unnecessary e-discovery costs and unreasonably high rates, Plaintiffs first argue that the costs associated with Optical Character Recognition (‘OCR’) were unnecessary…As Defendants point out, however, Plaintiffs requested the information be produced in, inter alia, OCR format…The ‘scanning and conversion of native files to the agreed-upon format for production of ESI constitutes `making copies of materials’ as pursuant to §1920(4)…Accordingly, I find the costs associated with OCR conversion are taxable.

Furthermore, I do not find the cost of 5 cents per page for TIFF services to be unreasonably high, nor do I find 24 cents per page for scanning paper documents to be unreasonably high…Consequently, I find no merit to this argument either.”

Rejecting the plaintiffs’ arguments that “1) Defendants have unclean hands; 2) Plaintiffs are unable to pay the costs; and 3) it would be inequitable to force the three named Plaintiffs to pay the entire costs of defending against the claims of the opt-in Plaintiffs”, Judge Ambrose affirmed the amount of $60,890.97 in favor of Defendants.

So, what do you think? Should the costs have been allowed for conversion of native files when they may have already been usable as is? Please share any comments you might have or if you’d like to know more about a particular topic.

Want an Automated, Easy and Inexpensive Way to Process Your Data? Read On – eDiscovery Trends

November 10, 2014

A couple of months ago, we had a laugh at Ralph Losey’s post that took a humorous look at the scenario where it’s Friday at 5 and you need data processed to be reviewed over the weekend. It was a funny take on a real problem that most of us have experienced from time to time. But, there may be a solution to this problem that’s automated, easy and inexpensive.

Anytime we talk about something that relates to our company, CloudNine Discovery, we always add the “shameless plug warning” to let people know that the topic relates to our software or a service we offer. If you’re a regular reader of our blog, you know it doesn’t happen that often. But, we have just made a major announcement that we believe will interest many of you.

Today, we are officially announcing the release of OnDemand Discovery®, our new application that enables you to upload your native data and have it processed and loaded directly into OnDemand®, our cloud-based online review tool.

It’s a 100% automated upload process that includes native file extraction from container files (such as Outlook PSTs and ZIP Files), metadata & text extraction and indexing, OCR of image files, duplicate identification and HTML creation, streamlining the process to get started reviewing documents for discovery. The process automatically notifies you when we’ve received your data and then again when we’ve loaded and indexed it and when all processing (including advanced analytics for early data assessment) is complete. So, you never have to wonder about the status of your processing job.
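
As a rough illustration of the first step above, native file extraction from container files, here is a minimal Python sketch that unpacks a ZIP file and recurses into any ZIP files nested inside it. This is an assumption-laden sketch, not how OnDemand Discovery itself works: the function name extract_container and the "_extracted" folder suffix are hypothetical, and PST files are left out because they require a dedicated library rather than the Python standard library.

```python
import zipfile
from pathlib import Path


def extract_container(zip_path, out_dir):
    """Extract a ZIP file, recursing into nested .zip files; return the non-ZIP files found."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)

    extracted = []
    # sorted() materializes the listing first, so folders created while
    # recursing below are not visited a second time by this loop.
    for path in sorted(out_dir.rglob("*")):
        if not path.is_file():
            continue
        # Match on the .zip extension only: Office files such as .xlsx are
        # also ZIP containers internally but should stay intact as natives.
        if path.suffix.lower() == ".zip":
            nested_dir = path.with_name(path.name + "_extracted")
            extracted.extend(extract_container(path, nested_dir))
        else:
            extracted.append(path)
    return extracted
```

The returned list of paths is what the later steps (metadata and text extraction, OCR, duplicate identification) would operate on in a pipeline of this kind.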

It’s ideal for situations where you receive data late on a Friday afternoon and have to get it ready to review over the weekend and also ideal for preparing small batches of files for review without having to run them through cumbersome processing software built for multiple gigabytes, not a small batch of files. OnDemand Discovery is designed to handle two megabytes, two gigabytes or two hundred gigabytes or more!

There are three easy steps to give it a try:

1. Sign up for a free account here. You will receive an email with your credentials (including a temporary password) to get started.
2. When you first log in, you’ll see a button to “Upload Data”. That will take you to a form to download the OnDemand Discovery client (which is a Windows-based client application that resides on your desktop) for uploading data for processing. Download and install the client to upload data.
3. Once the client is downloaded and installed, launch the client, log in with your newly created credentials and simply follow the wizard prompts to upload the desired data set and put it into the project of your choice (which you can create if it doesn’t already exist). It’s that easy!

For more information, feel free to check out our press release on our news page here. You can also contact me at daustin@cloudnincloudnine.comm.

And, as always, please share any comments you might have or if you’d like to know more about a particular topic.
