Processing

Process This! – Close Outlook Before Compressing or Zipping PST Files for Processing: eDiscovery Best Practices

Having recently experienced this with a client, I thought I would revisit this helpful tip.  This is one of the tips Tom O’Connor and I will be covering this Friday – E-Discovery Day – on our webcast Murphy’s eDiscovery Law: How to Keep What Could Go Wrong From Going Wrong at noon CST (1:00pm EST, 10:00am PST).  Click here to register for Friday’s webcast.

As you may know, at CloudNine (shameless plug warning!), we have an automated processing capability that enables clients to load and process their own data into our review platform.  They can even process and load data straight into Relativity using our Outpost for Relativity module.

Regardless of whether they load data into CloudNine or Relativity, most of our users are using the processing capability to process emails, usually from Outlook Personal Storage Table (PST) files.  Despite increasing volumes of social media and other types of electronically stored information, emails are still predominant in eDiscovery.  And, for users trying to process and load that data, one issue comes up more than any other with those Outlook emails:

They still have Outlook open with the PST file opened when they attempt to upload that PST file or when they try to create a ZIP file containing the Outlook PST.

When that happens, the resulting ZIP file that is created (either by the user or by our client application if the data is not already contained in an archive file) will almost invariably be corrupted or empty.  Either way, this will result in a failure during processing of the loaded data – because the data being processed will simply be corrupt.

This is true not only for CloudNine processing but also for any application that you use for processing, such as LAW PreDiscovery.  So, before attempting to create a ZIP (or RAR or other type of archive) of a PST file (or before you upload it to a platform like CloudNine for processing), make sure that Outlook is closed or at least that the PST file is closed within Outlook.  That’s the best way to have a positive “outlook” on discovering emails.  Get it?  :o)
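
Closing Outlook first is the fix. As an extra safeguard, you could check an archive before uploading it. The minimal Python sketch below is an illustration only, not a CloudNine or LAW feature, and its file names are hypothetical. It hashes the PST before and after zipping, then compares both hashes to the copy inside the archive, so a file that changed while it was being archived is caught. It will not detect every kind of problem. If another program holds a lock on the file, opening it may simply fail with a PermissionError.

```python
import hashlib
import zipfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def zip_and_verify(src: Path, archive: Path) -> bool:
    """Zip a single file and confirm the archived copy matches the source.

    Returns False if the source changed while it was being zipped, if a
    member fails its CRC check, or if the archived bytes differ from the
    source. Raises OSError (e.g. PermissionError) if src cannot be opened.
    """
    before = md5_of_file(src)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname=src.name)
    after = md5_of_file(src)

    with zipfile.ZipFile(archive) as zf:
        if zf.testzip() is not None:  # name of the first corrupt member, if any
            return False
        h = hashlib.md5()
        with zf.open(src.name) as member:
            for chunk in iter(lambda: member.read(1 << 20), b""):
                h.update(chunk)
        archived = h.hexdigest()

    return before == after == archived
```

If zip_and_verify returns False, close Outlook (or at least close the PST within Outlook) and create the archive again.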

So, what do you think?  Is email still the predominant source of discoverable ESI in your organization?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Brad Jenkins of CloudNine: eDiscovery Trends

This is the first of the 2017 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscovery Daily interviewed several thought leaders at LTNY (aka LegalWeek) this year to get their observations regarding trends at the show and generally within the eDiscovery industry.

Today’s thought leader is Brad Jenkins of CloudNine™.  Brad has over 20 years of experience as an entrepreneur, as well as 15 years leading customer-focused companies in the litigation technology arena. Brad also has authored several articles on document management and litigation support issues, and has appeared as a speaker before national audiences on document management practices and solutions.  He’s also my boss!  🙂

What are your observations about LTNY this year and how it compared to other LTNY shows that you have attended?

Once again, a majority of my time at LTNY was spent in meetings with colleagues and business partners as CloudNine had a suite and we had several meetings set up over the course of the three days of the show.  It seems that the meetings outside the show have become as big as the show itself.  Several people that I met with had hardly spent any time (if any) at the show when I met with them.  Because it’s the biggest conference of the year, LTNY provides a unique opportunity for face-to-face meetings you don’t get during the rest of the year, so it pays to take advantage of that opportunity.  Unfortunately, that comes at the expense of attending most of the conference itself.

I was able to attend some of the conference and spent a little time in the exhibit hall.  Based on what I saw, attendance seemed down this year and some of the exhibitors that I spoke with seemed to agree.  I assume the decision by ALM to charge a fee for the Exhibits Plus passes for the first time ever had an impact on attendance in the exhibit hall.  Not surprisingly, some criticized that decision, so it will be interesting to see if exhibitors push back on that and if ALM decides to charge that fee again next year.

Regardless, with so many opportunities for providers to reach prospects in a less expensive manner and with a market that clearly appears to be consolidating, I would expect that it will continue to be a challenge for ALM to retain exhibitors.  Over the past few years, the number of exhibitors has dropped and I wouldn’t be surprised to see that trend continue unless ALM gets creative in identifying new ways to attract potential exhibitors to the conference.

What about general industry trends?  Are there any notable trends that you’ve observed?

Last year, I noted a clear trend toward SaaS automation within eDiscovery and I think it’s clear that trend has not only continued, but expanded.  In addition to investment in some automation providers and the emergence of others like our company, CloudNine, we’ve seen several of the “big boys” (such as Ipro, Thomson Reuters and kCura) roll out their own cloud-based automation initiatives.  In the past year, we also saw organizations like Gartner acknowledge that cloud eDiscovery solutions are gaining momentum in the market due to their ease of use and competitive and straightforward pricing structures.  The move to the cloud for eDiscovery reflects a similar migration to the cloud within organizations for everything from Salesforce.com to Office 365.  In fact, Forbes.com recently published an article that cited a prediction that, by 2020, 92% of everything we do will be in the cloud.  So, it makes sense that eDiscovery solutions would reflect that trend.

Another trend that has been happening for a few years and is certainly accelerating is the move to the left of the EDRM model for discovery and analytics.  With estimates of data doubling in organizations every 1.2 years, organizations are certainly having to turn to technology to address the challenges associated with that explosion of data.  The need for discovery is no longer initiated just by trigger events such as litigation or investigations; organizations now have a perpetual need to perform discovery.  You’re seeing organizations beginning to focus on data discovery to explore patterns and trends within unstructured data, even at the point of data creation, to gather insight into the data they have.  Then, when those trigger events occur, organizations are progressing into more traditional legal discovery to identify, preserve, collect, process, analyze, review and produce key ESI to support legal or investigative activities.  I think you’ll see that trend toward an increased focus on data discovery continue to accelerate as a way for organizations to address the challenges associated with the explosion of data in their environments.
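
To put the doubling estimate in perspective, here is what it implies arithmetically. Data that doubles every 1.2 years grows by a factor of 2^(t/1.2) after t years. This small Python sketch takes the 1.2-year figure cited above as its only input. It is arithmetic on that estimate, not an independent projection.

```python
def growth_factor(years: float, doubling_time: float = 1.2) -> float:
    """How many times larger a data set becomes after `years`,
    assuming it doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


print(round(growth_factor(1), 2))  # 1.78 (about 78% growth per year)
print(round(growth_factor(5), 1))  # 18.0 (roughly 18 times as much data)
```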

One last trend that I’ll mention is the growing number of state bar associations that have adopted some sort of expectation or guidance for technology competence among their bar members.  I believe that there are 26 states now that have adopted some version of Comment 8 to ABA Model Rule 1.1, and Florida has become the first state to actually mandate technology CLE for its attorneys: three hours of technology CLE over a three-year period.  At CloudNine, we believe that educated clients make the best clients and we’ve tried to do our part for the past several years to help educate the legal profession with our blog and, this year, we are adding educational webcasts (with CLE certification in some states) to help educate lawyers.  While I think we still have a long way to go before the legal profession is generally knowledgeable about technology, I think the increased focus on technology competence along with the continued trend toward simplified discovery automation puts attorneys in a better position than ever to use technology to support their discovery needs.

What are you working on that you’d like our readers to know about?

In addition to the educational webcasts that we started conducting this year, CloudNine recently announced our latest accomplishment in simplified discovery automation: an integration with Relativity that gives Relativity users a client application to automate the upload, processing, and ingestion of ESI into Relativity, directly from their desktop.  Just as CloudNine users have been able to automate the upload, processing, and ingestion of ESI into CloudNine for several years now, the universe of more than 150,000 Relativity users will now be able to do the same.

We also have several other new features and capabilities in the works that provide simplified discovery automation to our clients, and I look forward to sharing more information on those soon.

We are also very active in the data discovery space that I referred to earlier, providing solutions and assistance to help clients address their data discovery needs.  We’re finding that organizations need to gain insight into their data long before litigation and other events trigger their discovery obligations, and CloudNine is at the forefront in helping organizations address those needs.

As I said during last year’s interview, we feel that CloudNine is the leader in simplifying discovery automation and our unique combination of Speed, Simplicity, Security and Services enables CloudNine to simplify discovery for our clients.  That continues to be our mission, as it has been throughout our more than 14 years of assisting our clients.

Thanks, Brad, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

Hashing Out the Idea of a Standard Hash Algorithm for Vendors: eDiscovery Best Practices

In a blog post earlier this month, Craig Ball discussed a question posed at the recent ILTACON conference by Beth Patterson, Chief Legal & Technology Services Officer for Allens: why can’t (or don’t) eDiscovery service providers standardize hash values to support identification and deduplication across products and collections?  Good question.  Let’s take a look.

In his post from his excellent Ball in Your Court blog (Cross-Matter & -Vendor Message ID), Craig noted that standardization would enable you to use work from one matter in another and flag emails already identified as privileged in one case so that they don’t slip through.  Wouldn’t that be great?

Unfortunately, according to Craig, the panelists appeared to respond to the question by characterizing it as “a big technical challenge.”

Craig then took a look at the issue, beginning with a recap of some “hash facts” to establish a baseline for what goes into computing hash values.  He then distinguished loose documents (easy, because as long as they are properly preserved, they should consistently generate the same hash value) from emails.  Consistent hash values are harder to construct for emails, because an email’s hash value depends on when it was exported, among other factors.  So, the same email exported at different times or from different email clients will have a different hash value; even though we see them as the same, the computer doesn’t.  Make sense?
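
Here is a small Python sketch that makes the loose-file versus email distinction concrete, using the standard hashlib module. The "X-Exported" header is a made-up stand-in for whatever hidden data an email client writes at export time. It is not any real client's format.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 digest of raw bytes as lowercase hex."""
    return hashlib.md5(data).hexdigest()


# A loose file: the same preserved bytes always give the same hash.
loose_doc = b"Quarterly report, final version.\n"
copy_of_doc = bytes(loose_doc)

# The "same" email exported twice. X-Exported is a hypothetical header
# standing in for hidden, export-specific data; the visible content matches.
body = b"Subject: Hello\r\n\r\nSee you Monday.\r\n"
export_1 = b"X-Exported: 2016-09-01T10:00:00\r\n" + body
export_2 = b"X-Exported: 2016-09-02T15:30:00\r\n" + body

print(md5_hex(loose_doc) == md5_hex(copy_of_doc))  # True
print(md5_hex(export_1) == md5_hex(export_2))      # False
```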

Craig also examined some approaches for generating standardized hash values for emails, compared the MD5 and SHA-1 hashing methods and debunked the idea that MD5 hash values aren’t unique enough to be “defensible”.  There are 340,282,366,920,938,463,463,374,607,431,768,211,456 unique MD5 hash values.  Unique enough for you?
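
That count is simply 2 to the power of 128, because an MD5 digest is 128 bits long. A few lines of Python confirm it, along with SHA-1's 160-bit length for comparison:

```python
import hashlib

md5_bits = hashlib.md5().digest_size * 8    # 16-byte digest -> 128 bits
sha1_bits = hashlib.sha1().digest_size * 8  # 20-byte digest -> 160 bits
unique_md5_values = 2 ** md5_bits

print(md5_bits, sha1_bits)       # 128 160
print(f"{unique_md5_values:,}")  # 340,282,366,920,938,463,463,374,607,431,768,211,456
```

This count bears on accidental collisions between unrelated files, which is the deduplication concern. Deliberately engineered MD5 collisions are a separate question.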

I asked Bill David, Chief Technical Officer at CloudNine and architect of the platform, about the use of MD5 for generating hash values.

“Of these (and other) HASH routines, we ultimately chose MD5 for a couple of reasons”, Bill said. “First, for all practical purposes, MD5 Hash is sufficient for identifying duplicate files in a given population. Second, it’s faster than the alternatives. And third, it is widely available. You can find the MD5 Hash routine in all major computer languages as well as in most relational databases. This allows us to utilize and generate HASH values from a client’s browser all the way down the line to the relational databases used in a review platform.”
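
Bill's point about availability is easy to check. A conforming MD5 implementation should return the same digest for the same input bytes wherever it runs: Python's hashlib, PostgreSQL's md5() function, or MySQL's MD5() function (the two database functions return hexadecimal strings). This sketch checks Python against two test vectors published in RFC 1321, the MD5 specification. One caveat applies: text must be encoded to identical bytes (for example, UTF-8) before hashing, or the digests will differ.

```python
import hashlib

# Test vectors published in RFC 1321, the MD5 specification.
RFC1321_VECTORS = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
}


def md5_hex(data: bytes) -> str:
    """MD5 digest of raw bytes as lowercase hex."""
    return hashlib.md5(data).hexdigest()


for message, expected in RFC1321_VECTORS.items():
    assert md5_hex(message) == expected, message
print("all RFC 1321 vectors match")
```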

As for the idea of eDiscovery vendors agreeing to use the same routine to generate the same hash value, Bill seemed to think it was very doable and advocated a concatenation approach:

“As is commonly known, emails throw us a monkey wrench. Every email has some hidden data that is unique to that file. And as a result, we have to pick certain sections of a given email to construct a “string” of data, which we can then “HASH” to generate a unique value. But the slightest change in the format of the data affects the resulting unique hash. Something as simple as a single extra space will result in a completely different hash value.”

“What we have to do is to take the different parts of an email, combine them all together and hash the result. At CloudNine, we pull these parts of an email and separate them with a single space.

  • SentDate (in the ISO format)
  • From
  • To
  • CC
  • BCC
  • Subject
  • Attachments (file names separated by semi-colons)
  • MsgText (text version)”

Bill, while noting that these are his initial thoughts after reading Craig’s article and might be subject to some revision, suggested a way to “code” it, in this case using the C# programming language:

“The combination of these fields gives us a unique fingerprint of an email. As an extra step in trying to normalize data, it’s wise to ‘trim’ up these fields (remove any leading or trailing spaces). So in code it would look like this:”

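// Requires: using System; using System.Globalization;
// args.file: the email being processed (its type is not shown in the original)
string hashString = String.Format("{0} {1} {2} {3} {4} {5} {6} {7}",
     args.file.SentDate.ToString("yyyy'-'MM'-'dd'T'HH':'mm':'ss", CultureInfo.InvariantCulture),   // ISO format, e.g. 2009-06-15T13:45:30
     args.file.From.Trim(),
     args.file.To.Trim(),
     args.file.CC.Trim(),
     args.file.BCC.Trim(),
     args.file.Subject.Trim(),
     args.file.Attachments.Trim(),      // file names separated by semicolons
     args.file.MsgText.Trim());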

“We now have a string to hash. The last step is to hash the string. Many MD5 hash routines return results that contain ‘dashes’. In one more step to normalize the results, let’s remove those dashes and force all of the characters to lower case.”

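// clsHash is a helper class not shown here (it is not part of the .NET Framework).
string hash = clsHash.GetHash(hashString, clsHash.HashType.MD5).Replace("-", "").ToLowerInvariant();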

“Based on my initial thoughts, that’s how you could standardize a hash value to use for deduping.”
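
For comparison, here is a minimal Python sketch of the same concatenation approach, using only the standard library. It is not CloudNine's code. The EmailFields container and its field names are hypothetical; a real platform would fill them from its own email parser. The UTF-8 encoding step is also an assumption. Vendors would have to agree on the byte encoding as well as the field order, date format and separator.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class EmailFields:
    """Hypothetical container for the fields Bill lists."""
    sent_date: datetime
    sender: str
    to: str
    cc: str
    bcc: str
    subject: str
    attachments: List[str] = field(default_factory=list)
    msg_text: str = ""  # text version of the body


def email_hash(e: EmailFields) -> str:
    """Trim each field, join with single spaces, and MD5 the result."""
    parts = [
        e.sent_date.strftime("%Y-%m-%dT%H:%M:%S"),  # ISO format, e.g. 2009-06-15T13:45:30
        e.sender.strip(),
        e.to.strip(),
        e.cc.strip(),
        e.bcc.strip(),
        e.subject.strip(),
        ";".join(name.strip() for name in e.attachments),  # semicolon-separated names
        e.msg_text.strip(),
    ]
    hash_string = " ".join(parts)
    # hexdigest() is already lowercase with no dashes, so no cleanup is needed.
    return hashlib.md5(hash_string.encode("utf-8")).hexdigest()
```

Any shared standard would also need to settle one design point. With a single space as the separator, field boundaries are not recoverable: To "a b" with CC "c" produces the same string as To "a" with CC "b c". A separator that cannot occur inside the fields, or a length prefix on each field, avoids that ambiguity.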

Sounds like standardization on a method for generating hash values could be relatively straightforward – if you can get all the vendors to agree.

So, what do you think?  Would you benefit from a standardized method for computing hash values across all eDiscovery platforms?  Please share any comments you might have or if you’d like to know more about a particular topic.

Here’s a New Twist to Text Overlays on Image-Only PDF Files That Can Be Even More Problematic: eDiscovery Best Practices

Remember when we discussed the issue of text overlays on image-only PDF files (typically represented as Bates numbers) and the problems they cause?  Well, we found a variation on the issue that is even more of a problem.

Here’s a recap of the issue we identified a couple of years ago.  The client was using our Discovery Client, which allows clients to upload their own native data for automated processing and loading into new or existing projects in our CloudNine platform.  The collection was purported to consist mostly of image-only PDF files, which is one way to create PDF files (click back to the old post for more info on both ways to do so).

Like many processing tools, such as LAW PreDiscovery®, CloudNine was programmed back then to handle PDF files by extracting the text if present or, if not, performing OCR on the files to capture text from the image.  Text from the file is always preferable to OCR text because it’s a lot more accurate, which is why OCR is typically performed only on PDF files lacking text.

After the client loaded their data, we did a spot quality control check (like we always do) and discovered that the text for several of the documents only consisted of Bates numbers.

Why?

Because the Bates numbers were added as text overlays to the pre-existing image-only PDF files.  When the processing software viewed the file, it found that there was extractable text, so it extracted that text instead of OCRing the PDF file.  In effect, adding the Bates numbers as text overlays meant the image-only PDF was no longer an image-only PDF.

As a result of this issue a couple of years ago, we added logic to the processing engine of CloudNine to perform OCR if there is minimal text per page (to account for the scenarios where there is only a Bates number).  Therefore, the content portion of the text would still be captured, so it would be available for indexing and searching.  Problem solved, right?
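
The minimal-text rule can be sketched in a few lines of Python, next to the original "OCR only if there is no text" rule. The 50-character threshold is a hypothetical value for illustration; the post doesn't say what CloudNine uses. The input is assumed to be each page's extracted text, already pulled out by whatever PDF library you use.

```python
import re

MIN_CHARS_PER_PAGE = 50  # hypothetical threshold; tune for your collection


def needs_ocr_naive(text: str) -> bool:
    """Original rule: OCR only when the page has no extractable text at all."""
    return not (text or "").strip()


def needs_ocr_min_text(text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Revised rule: OCR when the page has too little text to trust.

    Whitespace is ignored when counting, so a page whose only text layer
    is a Bates number such as "ABC0001234" falls under the threshold.
    """
    visible = re.sub(r"\s+", "", text or "")
    return len(visible) < min_chars


pages = [
    "ABC0001234",  # image-only page with a Bates overlay
    "This page has a full paragraph of real extracted text from the body. ABC0001235",
    "",            # no text layer at all
]
print([i for i, t in enumerate(pages) if needs_ocr_naive(t)])     # [2]
print([i for i, t in enumerate(pages) if needs_ocr_min_text(t)])  # [0, 2]
```

Under the naive rule, only the empty page is OCRed. The minimal-text rule also catches the page whose only text is a Bates number.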

For the most part, yes.  That is, until a couple of weeks ago, when we ran into the situation again on a few PDF files.  Again, these files only generated the Bates numbers during processing.  What made them different?

Ever hear of a watermark?  These documents were stamped DRAFT via a light gray watermark on the PDF file.  Then, they were Bates stamped with the Adobe Acrobat Bates Numbering functionality.

Evidently, because of the watermark, the document image and the text-overlaid Bates number were on separate layers of the PDF.  The processing tool failed to pick up the text because it essentially couldn’t find it.  Our production team ultimately had to re-generate the PDF files (by printing them back to PDF) and then OCR them.  That’s one reason why it’s good to have a team in place: to handle anomalies like that when they occur.
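
Part of a spot QC check like the one described above can be automated. This Python sketch flags documents whose extracted text is nothing but Bates numbers, which was the symptom in both cases. The Bates pattern is a hypothetical format: an uppercase prefix, an optional separator, and six or more digits. Adjust it to match the actual production. Documents with empty text are flagged too, which is usually worth a look anyway.

```python
import re

# Hypothetical Bates format, e.g. "ABC0001234" or "ABC-0001234".
BATES_RE = re.compile(r"\b[A-Z]{2,10}[-_ ]?\d{6,}\b")


def is_bates_only(extracted_text: str) -> bool:
    """True if nothing but Bates numbers (and whitespace) was extracted."""
    remainder = BATES_RE.sub("", extracted_text or "")
    return remainder.strip() == ""


def flag_for_review(docs: dict) -> list:
    """Return sorted IDs of documents whose text looks like Bates numbers only."""
    return sorted(doc_id for doc_id, text in docs.items() if is_bates_only(text))


docs = {
    "DOC1": "ABC0001234",
    "DOC2": "Draft agreement between the parties ABC0001235",
    "DOC3": "ABC0001236\nABC0001237",
}
print(flag_for_review(docs))  # ['DOC1', 'DOC3']
```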

As we noted a couple of years ago, if you haven’t applied Bates numbers to the files yet (or have a backup of the files from before they were applied, which is highly recommended) and they haven’t been produced, you should process the files before putting Bates numbers on the images to ensure that you capture the most text available.  And, if opposing counsel will be producing any image-only PDF files, you will want to request the text as well (along with a load file) so that you can maximize your ability to search their production.  Doing so will save you additional processing charges.

Of course, your first choice should be to receive native format productions whenever possible – here’s a link to an excellent guide on that subject.

So, what do you think?  Have you dealt with image-only PDF files with text overlaid Bates numbers?  Please share any comments you might have or if you’d like to know more about a particular topic.

Here is Where You Can Catch Last Week’s ACEDS Webinar: eDiscovery Trends

Our webinar panel discussion conducted by ACEDS last week was well attended, well reviewed and generated some interesting discussion (more on that soon).  Were you unable to attend last week’s webinar?  Good news: we have it for you here, on demand, whenever you want to check it out.

The webinar panel discussion, titled How Automation is Revolutionizing eDiscovery, was sponsored by CloudNine.  Our panel discussion provided an overview of eDiscovery automation technologies, and we took a hard look at the technology and definition of TAR and the potential limitations associated with both.  Mary Mack, Executive Director of ACEDS, moderated the discussion and I was one of the panelists, along with Bill Dimm, CEO of Hot Neuron, and Bill Speros, Evidence Consulting Attorney with Speros & Associates, LLC.

Thanks to our friends at ACEDS for presenting the webinar and to Bill Dimm and Bill Speros for participating in an interesting and thought-provoking discussion.  Hope you enjoy the presentation!

So, what do you think?  Do you think automation is revolutionizing eDiscovery?  As always, please share any comments you might have or if you’d like to know more about a particular topic.

Happy Anniversary to my wife (and the love of my life), Paige!  I’m very lucky to be married to such a wonderful woman!

The Cloud is a “Rush” Project’s Best Friend: eDiscovery Best Practices

Today is Friday.  While many of you can look forward to a long, enjoyable Memorial Day weekend, chances are that at least a few of you will be making weekend plans when, late in the day, you will receive a CD, DVD, hard drive or link to data on a server somewhere that needs to be reviewed over the weekend.  There goes your weekend!

Not only that, good luck connecting with your in-house litigation support person or a vendor for assistance late on a Friday – you may play a game of “phone tag” or wait for email responses for a bit.  Lit support people and vendors have weekend plans too.  Even if you do get in touch with them, you then have to fill out a form and arrange to get the data to them, which can be tricky.  It’s a lot of time, hassle and cost to get started – especially if you’re at a small law firm that doesn’t already have an eDiscovery software application to support processing and review of the data.

When consumers quickly need to find that special item to buy, or that new cool song to download, or need to stream the new season of Bloodline (available starting today on Netflix) for binge watching, they turn to the cloud.  More than ever, attorneys are turning to the cloud as well to help them get their “rush” project started immediately.  And, you don’t even have to own the software or interact with anyone to get started.

As an eDiscovery provider that offers a no-risk free trial, CloudNine (shameless plug warning!) sees at least one or two clients a week that give our software a try (many of them with “rush” projects just like this).  The trend toward automation and the cloud in the industry has not only made eDiscovery more affordable than ever, it has also made it easier than ever to get a “rush” project off and running.

If you find yourself in that situation later today, here are three easy steps to get started:

  1. Sign up for a free account here. You will receive an email with your credentials (including a temporary password) to get started.
  2. When you first log in, you’ll see a button to “Upload Data”. That will take you to a form to download the CloudNine Discovery client (a Windows-based client application that resides on your desktop) for uploading data for processing.  Download and install the client to upload data.
  3. Once the client is downloaded and installed, launch the client, log in with your newly created credentials and simply follow the wizard prompts to upload the desired data set and put it into the project of your choice (which you can create if it doesn’t already exist). It’s that easy!

We can’t get you out of working this weekend.  But, we can take the hassle out of getting started.  You’re welcome.  :o)

So, what do you think?  Have you been faced with any “rush” eDiscovery projects lately?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily will return on Tuesday.  This Memorial Day, we remember the people who gave their lives while serving in our armed forces.

Brad Jenkins of CloudNine: eDiscovery Trends

This is the first of the 2016 LegalTech New York (LTNY) Thought Leader Interview series.  eDiscovery Daily interviewed several thought leaders at LTNY this year to get their observations regarding trends at the show and generally within the eDiscovery industry.  Unlike previous years, some of the questions posed to each thought leader were tailored to their position in the industry, so we have dispensed with the standard questions we normally ask all thought leaders.

Today’s thought leader is Brad Jenkins of CloudNine™.  Brad has over 20 years of experience as an entrepreneur, as well as 15 years leading customer-focused companies in the litigation technology arena. Brad also has authored several articles on document management and litigation support issues, and has appeared as a speaker before national audiences on document management practices and solutions.  He’s also my boss!  :o)

What are your general observations about LTNY this year and how it compared to other LTNY shows that you have attended?

Again this year, LTNY seemed reasonably well attended.  Thankfully, we didn’t have the weather and travel issues that we had the past few years, so that probably helped boost attendance.  And, the Hilton Lobby Lounge was back this year, so that provided an additional location to meet, though most of our meetings were in our suite.  Though I was really busy and didn’t get much chance to attend sessions, I understand that they were very good as always.  I did notice a drop in the number of exhibitors again this year and the exhibit hall did seem to be less crowded.  One colleague of mine who exhibited indicated that the number of leads he received at the show dropped about 30 percent from last year, so that’s consistent with my own observations and those of my colleagues.

For me, LTNY has become as much about the meetings with colleagues and business partners as it is about the show itself.  CloudNine had meetings booked practically throughout the show with various people, including industry analysts, partners and potential partners, and clients and prospects.  Because it is the biggest show of the year, most people in the industry attend, so it’s an ideal opportunity to meet face to face and move business relationships along further.  Sometimes, there is just no substitute for in-person meetings to further business relationships and to communicate your message to other business colleagues.

What about general industry trends?  Are there any notable trends that you’ve observed?

One trend that I have noticed, as others certainly have, is the accelerated consolidation among providers in our industry and the growing investment by outside venture capital firms.  Just in the past couple of months, we have seen Huron Legal acquired by Consilio (which received a major investment from Shamrock Capital Advisors a few months before that), Millnet acquired by Advanced Discovery, Orange Legal acquired by Xact Data Discovery and Kiersted Systems acquired by OmniVere.  Rob Robinson does a terrific job of tracking mergers, acquisitions and investments in our industry and, according to his list, there have been eleven significant acquisitions and investments in just the past three months!

Another noticeable trend in the industry is the clear trend toward automation within eDiscovery.  You wrote about it earlier this year and, like you, I believe that the age of automation is here.  Some have dismissed the term “automation” as a marketing term, but I can’t think of a better term to describe the transformation of tasks that used to require a high degree of manual intervention and supervision to a point where little, if any, human involvement is necessary.  We’ve seen it for years through automation of review with technology-assisted review techniques such as clustering and predictive coding and we have begun to see use of some artificial intelligence techniques on the information governance side.  Now, we are seeing automation of the processing needed to get data into a review platform, with cloud-based providers (including CloudNine) automating that process.

Having been in the legal technology industry for many years, I have really seen an evolution of technology offerings in the marketplace.  At the beginning, I saw applications that were originally developed for other purposes being adapted for eDiscovery and those solutions were incomplete.  As the market developed, there started to be applications that were specifically designed for eDiscovery and those solutions were an improvement, but they were designed for isolated processes, such as collection or processing or review, with no automation of tasks.  The next generation of solutions was designed for eDiscovery and for task integration, but was still adapted for task automation rather than designed for it; some of those are the most popular solutions in the market today.  The new solutions, the “fourth generation” technology offerings, are designed not only for eDiscovery and task integration but for task automation as well.

Many people say that if you want to tell where an industry is heading, follow the money.  In the past several months, you’ve seen providers like Logikcull and Everlaw that emphasize automation receive significant capital investments and, just before LTNY, you saw Thomson Reuters announce a new platform where automated processing is a key component.  It’s clear that big money is being invested in the growing automation sector of the industry.  You can get on the bus, or you can get run over by the bus.  As a provider that has been committed to simplified eDiscovery automation for several years now, CloudNine is on the bus and we feel that we have an excellent “seat” on that bus and are well positioned to help usher eDiscovery into the automation age.

What are you working on that you’d like our readers to know about?

Well, since I was just talking about fourth generation technology solutions, it seems appropriate to discuss how CloudNine has reached its current point in that evolution.  About 3 1/2 years ago at CloudNine, we looked at our legacy platform, which had been in place since the early 2000s and was on version 14.  Our clients were happy with the platform overall, but we realized that if we were going to stay competitive as the market evolved, our legacy platform wasn’t going to be able to support those future needs.  So, we made the decision to start almost completely from scratch and re-develop our platform from the ground up, using the latest technology with an eye toward a truly simplified eDiscovery automation approach.  The platform that you see today via the user interface is just the tip of the iceberg of the overall solution – behind it is a series of workflows to accomplish various tasks.  For example, our Discovery Client application, which enables clients to upload and process data into our CloudNine review platform, alone contains 34 distinct workflows (our CTO and co-founder Bill David calls them “cascading buckets”, which enable the workflows to scale).  This modular approach of assembling re-usable workflows enables us to both scale and adapt to changing client needs and positions us well for the future.

We feel that CloudNine is the leader in simplifying eDiscovery automation.  We do this through what we call the 4 S’s: Speed, Simplicity, Security and Services.  Clients, even brand new clients, can be up and running in five minutes (Speed) because they can sign up for their own account and upload and process their own data.  We recently had a brand new client who signed up for an account, uploaded and processed 27 GB of Outlook PST files (which amounted to over 300,000 emails and attachments) and culled out nearly two-thirds of the collection via hash deduplication and irrelevant domain culling – all within 24 hours without ever having to speak to a CloudNine representative!  The ease of use (Simplicity) of the platform, through the wizard-based client application for uploading data and a browser-independent review module, enables our clients to get up to speed with an hour or less of training.

Our approach to Security is unique as well – we operate within a protected cloud, not a public cloud, so clients know that their data will be located on our servers in a Tier IV data center that is 5 minutes from our offices.  This data center hosts data for nine of the Fortune 20 corporations and was instrumental in our being selected over a year ago by a Fortune 150 corporation to host its data.  Finally, what makes us unique are the Services that we provide to support the software and automation – in addition to the software that we provide to help automate the eDiscovery process, we also provide managed services ranging from forensic collection to data conversion to technical advice and eDiscovery best practices to managed document review.  This enables our clients to rely on one provider for all of their service needs, as opposed to software-only providers that would have to outsource those services to a third party.

We believe that the combination of Speed, Simplicity, Security and Services enables CloudNine to provide the simplified eDiscovery automation approach that our clients want.  It’s an exciting time in our industry and CloudNine is excited to be at the forefront of its continued evolution, as we have been for the last 13 years!

Thanks, Brad, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

For a Positive Outlook to Discovering Emails, You Need a Closed Outlook: eDiscovery Best Practices

Does that statement seem confusing?  Let me explain.

Let’s call this a “tip of the day”.  As you may know, at CloudNine (shameless plug warning!), we have an automated processing capability for enabling clients to load and process their own data – they can use this capability to load their data into our review platform or they can even process data for loading into their own preferred review platform if they want.  So, we can still help you even if you already use Relativity or a number of other popular platforms.

Whichever platform they choose, most of our users are using the processing capability to process emails, usually from Outlook Personal Storage Table (PST) files.  Let’s face it, despite increased volumes of social media and other types of electronically stored information, emails are still predominant in eDiscovery.  And, for those users, we get one issue more than any other when it comes to processing those Outlook emails:

They still have Outlook open with the PST file opened when they attempt to upload that PST file or when they try to create a ZIP file containing the Outlook PST.

The resulting ZIP file (whether created by the user or by our client application, if the data is not already contained in an archive file) will almost invariably be corrupted or empty.  Either way, this results in a failure during processing of the loaded data, because that data is simply corrupt.

So, my tip of the day is this: Before attempting to create a ZIP (or RAR or other type of archive) of a PST file (or before you upload it to a platform like CloudNine for processing), make sure that Outlook is closed or at least that the PST file is closed within Outlook.  For a positive outlook to discovering emails, you need a closed Outlook.

Does that make sense now?  :o)
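Closing Outlook is the fix, but it can also help to check an archive before uploading it.  The following is a minimal sketch, not part of CloudNine’s client application, assuming Python 3 and its standard zipfile and hashlib modules; the function names are hypothetical.  It zips a single PST, rejects an empty source, runs the archive’s built-in CRC check and confirms that the archived bytes hash to the same SHA-256 value as the file on disk.  A mismatch or a read error suggests the file was in use while it was being archived.

```python
import hashlib
import os
import zipfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def zip_and_verify(source_path: str, zip_path: str) -> bool:
    """Hypothetical helper: archive one file and confirm the copy matches.

    Returns False if the source is empty or unreadable, if the archive fails
    its CRC check, or if the archived bytes differ from the file on disk.
    """
    arcname = os.path.basename(source_path)
    try:
        if os.path.getsize(source_path) == 0:
            return False  # an empty source can only yield an empty archive entry
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            archive.write(source_path, arcname=arcname)

        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the name of the first member with a bad CRC, or None.
            if archive.testzip() is not None:
                return False
            digest = hashlib.sha256()
            with archive.open(arcname) as member:
                for chunk in iter(lambda: member.read(1 << 20), b""):
                    digest.update(chunk)

        # A mismatch means the file on disk changed while or after it was archived.
        return digest.hexdigest() == sha256_of_file(source_path)
    except (OSError, zipfile.BadZipFile):
        # Missing file, a lock reported by the OS for another program, or an unreadable archive.
        return False
```

For example, if zip_and_verify(r"C:\Export\mailbox.pst", r"C:\Export\mailbox.zip") returns False (hypothetical paths), close Outlook, or at least the PST within Outlook, and try again.  A passing check only shows that the archived copy matches the file on disk; it cannot prove that the PST itself is internally consistent, so closing Outlook before archiving is still the real fix.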

So, what do you think?  Is email still the predominant source of discoverable ESI in your organization?  Please share any comments you might have or if you’d like to know more about a particular topic.

Mo’ Data, Mo’ Data, Mo’ Data from EDRM: eDiscovery Trends

It didn’t take long for EDRM to deliver on its promise of an advanced data set.  Back in August, EDRM announced the release of the first of its “Micro Datasets”, designed for eDiscovery data testing and process validation.  The first one was small; this new data set is MUCH bigger.

The initial August offering was a 136.9 MB zip file containing the latest versions of everything from Microsoft Office and Adobe Acrobat files to image files containing EDRM-specific work product files and data from public websites to uncommon formats, including .mbox email storage files and .gz archive files.  On Monday, EDRM announced the release of a new 5.7 GB Micro Dataset.  As before, this new EDRM dataset was assembled to meet the eDiscovery data testing and process validation needs of software and tool providers, litigation support organizations, law firms and educational organizations, and it is sourced from publicly available data and free from copyright restrictions.

Designed to support exception handling exercises and advanced testing, the files in the new dataset have various levels of corruption, and the dataset contains a duplicate set of files that are encrypted.  The file types in the set include:

  • A variety of .csv files
  • Websites and web pages
  • Adobe Acrobat files
  • Graphic files and photographs
  • Public census data
  • Microsoft Office files
  • Audio files
  • 4 email boxes with shared correspondence, threads and attachments
  • Multiple EnCase .e01 files containing data from a phone and another data source

This new EDRM Micro Dataset is available exclusively to EDRM members. Current EDRM members have been notified by email with instructions for file downloading (I just downloaded my copy yesterday and look forward to delving into it this week).  So, if you’re interested in joining EDRM, there has never been a better time!  Organizations and individuals interested in EDRM membership will find information at https://www.edrm.net/join/.

“The EDRM Dataset team has done outstanding work in advancing the industry with the development of advanced datasets that better reflect the types of data anomalies and challenges faced by e-discovery professionals today,” said George Socha, co-founder of EDRM. “EDRM members will benefit greatly from their work, in addition to the education, guidelines and latest in industry best practices provided to members.”

Five years after the EDRM Data Set team converted the Enron data set to Outlook (in November of 2010), we’re beginning to have some new dataset options.  We may actually someday see an eDiscovery product demo without Enron data!

So, what do you think?  Are you looking forward to checking out the new data set?  Please share any comments you might have or if you’d like to know more about a particular topic.

Defendant Compelled to Restore and Produce Emails from Backup Tapes: eDiscovery Case Law

In United States ex rel. Guardiola v. Renown Health, No. 3:12-cv-00295-LRH-VPC, (D. Nev. Aug. 25, 2015), Nevada Magistrate Judge Valerie P. Cooke concluded that the emails contained on backup tapes held by the defendants were not reasonably inaccessible due to undue burden or cost and that, even if the emails were reasonably inaccessible due to undue burden or undue cost, “good cause supports their discoverability”.  Also, after an analysis of the cost-shifting factors found that only one factor favored shifting the cost of producing the emails to the relator, Judge Cooke ordered the defendant to bear the cost of restoration and production.

Case Background

In this qui tam action under the False Claims Act, the relator filed a motion to compel production of email from the defendant for a “gap period” when the emails were stored on backup tapes, pursuant to the defendants’ email retention policy.  On the belief that the March 2011 tapes held the greatest number and scope of historical emails relevant to this litigation, the defendant had previously restored the March 2011 backup tapes via a third-party vendor and produced emails at a cost of over $100,000 (including attorney review and production).

The defendants objected, alleging that the emails were not reasonably accessible because of undue burden and cost, and stating that Renown’s IT department could not restore the gap-period emails in-house; therefore, it would have to outsource the restoration work at a cost of $136,000, for a total cost of at least $248,000 after adding data processing and contract attorney review.

Judge’s Ruling

Noting that “[u]nder Rule 26(b)(2)(B), it is Renown’s burden to show that the gap-period emails are not reasonably accessible due to undue burden”, Judge Cooke stated that “As a preliminary matter, the plain language of Rule 26(b)(2)(B) instructs that ‘undue burden,’ rather than the format of the ESI, is to guide the court’s analysis. Technological features of the storage media do enter the analysis, but only as they relate to the undue burden inquiry. Stated differently, undue burden is fact specific and no format is inaccessible per se.”

With that in mind, Judge Cooke concluded that “Renown has failed to show that the gap-period emails are not reasonably accessible because of undue burden. As described above, Renown has produced emails from the restored March 2011 backup tapes. In so doing, Renown has demonstrated that it is technologically feasible to restore and produce the gap-period emails… Accordingly, the court cannot fathom what burden accompanies third-party restoration. Renown has not stated that use of a vendor will nevertheless impose burdens – in terms of staff resources, delay of other critical IT projects, or inadequate attention to existing technology infrastructure.”

As for the defendants’ undue cost argument, Judge Cooke rejected “Renown’s argument that ‘cost’ under Rule 26(b)(2)(B) includes document review and storage”, determining that the “$136,000 figure for restoration is not an undue cost that renders the gap-period emails reasonably inaccessible”.

Next, Judge Cooke turned to the question of whether the relator had established good cause for the emails’ production by applying the seven-factor balancing test of the costs and potential benefits of the requested discovery under Rule 26(b)(2)(B).  Determining that “five of the relevant factors favor relator, while two are neutral”, Judge Cooke found that “relator has carried her step-two burden of demonstrating good cause”, so “even were the gap-period emails reasonably inaccessible due to undue burden or undue cost, good cause supports their discoverability.”

Finally, Judge Cooke examined cost shifting, using the seven-factor test from Zubulake.  Noting that “[t]he weightiest factors, relevance and availability of alternatives, balance powerfully against cost shifting”, Judge Cooke ruled that “the cost-shifting factors require that Renown bear the cost of restoration.”  Therefore, she granted the relator’s motion to compel and denied the defendant’s motion for cost shifting, ordering the relator and defendant to meet and confer to discuss a schedule for production of the gap-period emails.

So, what do you think?  Should the defendant have been ordered to restore the emails from backup tape?  Please share any comments you might have or if you’d like to know more about a particular topic.