Processing Archives

Be Afraid, Be Very Afraid – eDiscovery Horrors!

October 31, 2014

Today is Halloween. Every year at this time, because (after all) we’re an eDiscovery blog, we try to “scare” you with tales of eDiscovery horrors. This is our fifth year of doing so; let’s see how we do this year. Be afraid, be very afraid!

Did you know that overlaying Bates numbers on image-only Adobe PDF files causes the text of the image not to be captured by eDiscovery processing applications?

What about this?

Finding that the information was relevant and that the defendants “acted with a culpable state of mind” when they failed to preserve the data in its original form, New York Magistrate Judge Ronald L. Ellis granted the plaintiff’s motion for spoliation sanctions against the defendant, ordering the defendant to bear the cost of obtaining all the relevant data in question from a third party as well as paying for plaintiff attorney fees in filing the motion.

Or this?

It’s Friday at 5:00 and I need 15 gigabytes of data processed to review this weekend.

How about this?

Ultimately, it became clear that the defendant had not exported or preserved the data from salesforce.com and had re-used the plaintiffs’ accounts, spoliating the only information that could have addressed the defendant’s claim that the terminations were performance related (the defendant claimed it did not conduct performance reviews of its sales representatives). As a result, Judge Kemp stated that the “only realistic solution to this problem is to preclude Tellermate from using any evidence which would tend to show that the Browns were terminated for performance-related reasons”…

Or maybe this?

Could an “unconscionable” eDiscovery vendor actually charge nearly $190,000 to process 505 GB and host it for three months? Could another vendor charge over $800,000 to re-process and host data (that it had previously hosted) for approximately two months? Yes, in both cases (though, at least in the second case, the court disallowed over $700,000 of the billed costs).

Scary, huh? If the possibility of additional processing charges for your PDF files, sanctions because you didn’t preserve data in its original format or preserve it in your cloud-based system or inflated eDiscovery vendor charges scares you, then the folks at eDiscovery Daily will do our best to provide useful information and best practices to enable you to relax and sleep soundly, even on Halloween!

Then again, if it really is Friday at 5:00 and you need 15 gigabytes of data processed to review this weekend (inexpensively, no less), maybe you should check this out.

Of course, if you seriously want to get into the spirit of Halloween, click here. This will really terrify you! (Rest in Peace, Robin)

What do you think? Is there a particular eDiscovery issue that scares you? Please share your comments and let us know if you’d like more information on a particular topic.

Happy Halloween!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Those Pesky Email Signatures and Disclaimers – eDiscovery Best Practices

October 24, 2014

Are email signatures and disclaimers causing more trouble than they’re worth? According to one author, perhaps they are.

Earlier this week, Jeff Bennion wrote an interesting post on the Above the Law blog (‘Please Consider the Environment Before Printing’ Email Signatures Are Hurting the Environment) where he noted that, about 5 years ago, people started putting ‘Please consider the environment before printing this e-mail’ in their email signature (along with a webdings font character of a tree).

Bennion states that this is “the Kony 2012 of the environmental battles – it’s a noble war, but a pointless battle” and that the printing of emails is only a tiny fraction of the paper that lawyers waste. Instead, he notes, “the ‘please consider the environment’ email signature is more like one of those ‘I voted’ stickers — both serve no purpose other than proclaiming your self-righteousness for performing a civic duty”.

In fact, per a Time magazine article, the internet accounts for a good deal of the pollution in the world. In a 2011 article, cleantechnica.com reported that there were about 500,000 data centers in the world and each used 10 megawatts of energy a month. That’s a lot more than 1.21 gigawatts. Great Scott!

When comparing Word files containing data that might go into an email with the same data that also includes the email signature, Bennion observes that the one with the email signature contains 0.3 KB more data than the one without the signature. He extrapolates that out to 27,000 GB of extra useless data being added to internet storage servers every day (10 million GB per year) over all business emails, while acknowledging that not all 90 billion business emails include the signature. “The point is that it is a pointless gesture that, as a whole, does more harm than good”, Bennion states.
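The extrapolation is easy to check. The short sketch below redoes the arithmetic, assuming decimal units (1 GB = 1,000,000 KB) and assuming the 90 billion figure is a daily count, which the "every day" phrasing implies; the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions: decimal units, and 90 billion business emails per day.
EXTRA_KB_PER_EMAIL = 0.3            # extra data per email carrying the signature
BUSINESS_EMAILS_PER_DAY = 90e9      # 90 billion
KB_PER_GB = 1_000_000               # decimal: 1 GB = 1,000,000 KB


def extra_gb_per_day(extra_kb=EXTRA_KB_PER_EMAIL, emails=BUSINESS_EMAILS_PER_DAY):
    """Extra storage per day, in GB, if every email carried the signature."""
    return extra_kb * emails / KB_PER_GB


def extra_gb_per_year(days=365):
    """Extra storage per year, in GB, at the same daily rate."""
    return extra_gb_per_day() * days


print(f"{extra_gb_per_day():,.0f} GB per day")
print(f"{extra_gb_per_year():,.0f} GB per year")
```

Under those assumptions the daily figure works out to 27,000 GB and the yearly figure to a little under 10 million GB, which matches the rounded numbers in the post.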

And, the same holds true for those confidential and privileged email disclaimers at the bottom of emails, which he observes “take up about 10-20 times more wasted space than the ‘please stop printing your emails’ disclaimer” – “roughly the environmental equivalent of clubbing 3 baby seals a month”. Some interesting takes.

These email signatures and disclaimers also affect eDiscovery costs, both in terms of extra data to process and also host. They can also lead to false hits when searching text and affect conceptual clustering or predictive coding of documents (which are based on text content of the documents) unless steps are taken to remove those from indices and ignore the text when performing those processes. All of which can lead to extra work and extra cost.
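As a rough illustration of the "steps taken to remove those from indices" mentioned above, here is a minimal sketch that strips known boilerplate from an email body before it is indexed. The patterns and the `strip_boilerplate` name are hypothetical; a real project would build its pattern list from the signatures and disclaimers actually found in the collection, and this is not a description of how any particular review tool does it.

```python
import re

# Hypothetical boilerplate patterns, for illustration only.
BOILERPLATE_PATTERNS = [
    # The "green" signature line.
    re.compile(
        r"please consider the environment before printing this e-?mail\.?",
        re.IGNORECASE,
    ),
    # A confidentiality disclaimer: from its opening words to the next blank
    # line, or to the end of the message if there is no blank line.
    re.compile(
        r"this e-?mail (?:and any attachments )?(?:is|are) confidential"
        r".*?(?:\n\s*\n|\Z)",
        re.IGNORECASE | re.DOTALL,
    ),
]


def strip_boilerplate(body: str) -> str:
    """Return the email body with known signature/disclaimer text removed."""
    cleaned = body
    for pattern in BOILERPLATE_PATTERNS:
        cleaned = pattern.sub("", cleaned)
    return cleaned.strip()


sample = (
    "Attached is the draft contract.\n\n"
    "Please consider the environment before printing this e-mail.\n\n"
    "This email is confidential and may be privileged. If you are not the "
    "intended recipient, delete it."
)
print(strip_boilerplate(sample))
```

With the boilerplate gone, a search for "confidential" or "privileged" no longer hits this message just because of its footer.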

So, what do you think? Do you use “please stop printing your emails” signatures and confidential and privileged email disclaimers? Please share any comments you might have or if you’d like to know more about a particular topic.

Text Overlays on Image-Only PDF Files Can Be Problematic – eDiscovery Best Practices

October 23, 2014

Recently, we at CloudNine Discovery received a set of Adobe PDF files from a client that raised an issue regarding the handling of those files for searching and reviewing purposes. The issue serves as a cautionary tale for those working with image-only PDFs in their document collection. Here’s a recap of the issue.

The client was using OnDemand Discovery®, which is our new Client Side add-on to OnDemand® that allows clients to upload their own native data for automated processing and loading into new or existing projects. The collection was purported to consist mostly of image-only PDF files. PDF files are created in two ways:

By saving or printing from applications to a PDF file: Many applications, such as Microsoft Office applications like Word, Excel and PowerPoint, provide the ability to save the document or spreadsheet that you’ve created to a PDF file, which is common when you want to “publish” the document. If the application you’re using doesn’t provide that option, you can print the document to PDF using any of several PDF printer drivers available (some of which are free). These PDFs that are created usually include the text of the file from which the PDF was created.
By scanning or otherwise creating an image to a PDF file: Typically, this occurs either by scanning hard copy documents to PDF or through some sort of receipt in an image-only PDF form (such as through fax software). These PDFs that are created are images and do not include the text of the document from which they came.

Like many processing tools, such as LAW PreDiscovery®, OnDemand Discovery is programmed to handle PDF files by extracting the text if present or, if not, performing OCR on the files to capture text from the image. Text from the file is always preferable to OCR text because it’s a lot more accurate, so this is why OCR is typically only performed on the PDF files lacking text.

After the client loaded their data, we did a spot Quality Control check (like we always do) and discovered that the text for several of the documents only consisted of Bates numbers.

Why?

Because the Bates numbers were added as text overlays to the pre-existing image-only PDF files. When the processing software viewed the file, it found that there was extractable text, so it extracted that text instead of OCRing the PDF file. In effect, adding the Bates numbers as text overlays to the image-only PDF rendered it as no longer an image-only PDF. Therefore, the content portion of the text wasn’t captured, so it wasn’t available for indexing and searching. These documents were essentially rendered non-searchable even after processing.
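One way to catch this during quality control is to test whether the extracted text is nothing but Bates numbers and, if so, fall back to OCR anyway. The sketch below is a hypothetical check, not how OnDemand Discovery or LAW PreDiscovery works; the Bates pattern (an uppercase prefix, an optional separator, and a run of digits) is an assumption, since real numbering formats vary.

```python
import re

# Assumed Bates format: 2-8 uppercase letters, an optional separator, then
# 4-10 digits (e.g. "ABC0000123" or "ABC-000123"). Real formats vary.
BATES_PATTERN = re.compile(r"\b[A-Z]{2,8}[-_ ]?\d{4,10}\b")


def text_is_only_bates(extracted_text: str) -> bool:
    """Return True if the text is non-empty but contains only Bates numbers."""
    if not extracted_text.strip():
        return False  # no text at all: the ordinary image-only case
    remainder = BATES_PATTERN.sub("", extracted_text)
    return remainder.strip() == ""


def choose_text_source(extracted_text: str) -> str:
    """Decide whether to keep the extracted text or run OCR instead."""
    if not extracted_text.strip() or text_is_only_bates(extracted_text):
        return "ocr"
    return "extracted"


print(choose_text_source("ABC0000123\nABC0000124\n"))
print(choose_text_source("Dear Bob, the report is attached. ABC0000123"))
```

A check like this only flags the suspicious files; whether OCR then runs on the original images or on a converted copy is a separate processing decision.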

How did this happen? Likely through Adobe Acrobat’s Bates Numbering functionality, which is available on later versions of Acrobat (version 8 and higher). It does exactly that – applies a text overlay Bates number to each page of the document. Once that happens, eDiscovery processing software applications will not perform OCR on the image-only PDF.

What can you do about it? If you haven’t applied Bates numbers on the files yet (or have a backup of the files before they were applied – highly recommended) and they haven’t been produced, you should process the files before putting Bates numbers on the images to ensure that you capture the most text available. And, if opposing counsel will be producing any image-only PDF files, you will want to request the text as well (along with a load file) so that you can maximize your ability to search their production (of course, your first choice should be to receive native format productions whenever possible – here’s a link to an excellent guide on that subject).

If the Bates numbers are already applied and you don’t have a backup of the files without the Bates numbers (oops!), you’re faced with additional processing charges to convert them to TIFF and perform OCR of the text AND the Bates number, a totally unnecessary charge if you plan ahead.

So, what do you think? Have you dealt with image-only PDF files with text overlaid Bates numbers? Please share any comments you might have or if you’d like to know more about a particular topic.

How Mature is Your Organization in Handling eDiscovery? – eDiscovery Best Practices

October 7, 2014

A new self-assessment resource from EDRM helps you answer that question.

A few days ago, EDRM announced the release of the EDRM eDiscovery Maturity Self-Assessment Test (eMSAT-1), the “first self-assessment resource to help organizations measure their eDiscovery maturity” (according to their press release linked here).

As stated in the press release, eMSAT-1 is a downloadable Excel workbook containing 25 worksheets (actually 27 worksheets when you count the Summary sheet and the List sheet of valid choices at the end) organized into seven sections covering various aspects of the e-discovery process. Complete the worksheets and the assessment results are displayed in summary form at the beginning of the spreadsheet. eMSAT-1 is the first of several resources and tools being developed by the EDRM Metrics group, led by Clark and Dera Nevin, with assistance from a diverse collection of industry professionals, as part of an ambitious Maturity Model project.

The seven sections covered by the workbook are:

General Information Governance: Contains ten questions to answer regarding your organization’s handling of information governance.
Data Identification, Preservation & Collection: Contains five questions to answer regarding your organization’s handling of these “left side” phases.
Data Processing & Hosting: Contains three questions to answer regarding your organization’s handling of processing, early data assessment and hosting.
Data Review & Analysis: Contains two questions to answer regarding your organization’s handling of search and review.
Data Production: Contains two questions to answer regarding your organization’s handling of production and protecting privileged information.
Personnel & Support: Contains two questions to answer regarding your organization’s hiring, training and procurement processes.
Project Conclusion: Contains one question to answer regarding your organization’s processes for managing data once a matter has concluded.

Each question is a separate sheet, with five answers ranked from 1 to 5 to reflect your organization’s maturity in that area (with descriptions to associate with each level of maturity). Each question defaults to a value of 1. The five answers are:

1: No Process, Reactive
2: Fragmented Process
3: Standardized Process, Not Enforced
4: Standardized Process, Enforced
5: Actively Managed Process, Proactive

Once you answer all the questions, the Summary sheet shows your overall average, as well as your average for each section. It’s an easy workbook to use, with input areas defined by cells in yellow. The whole workbook is editable, so perhaps the next edition could lock down the calculated-only cells. Nonetheless, the workbook is intuitive and provides a nice exercise for an organization to grade its level of eDiscovery maturity.
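For readers who want to see the scoring logic outside of Excel, here is a small sketch of the summary calculation. The section names and question counts come from the list above; the assumption that the overall score is a plain average across all 25 questions (rather than an average of section averages) is ours, and the `summarize` function is hypothetical rather than a copy of the workbook's formulas.

```python
from statistics import mean

# Section names and question counts as listed in the post (25 questions total).
SECTION_QUESTION_COUNTS = {
    "General Information Governance": 10,
    "Data Identification, Preservation & Collection": 5,
    "Data Processing & Hosting": 3,
    "Data Review & Analysis": 2,
    "Data Production": 2,
    "Personnel & Support": 2,
    "Project Conclusion": 1,
}


def summarize(answers: dict[str, list[int]]):
    """Return (per-section averages, overall average) for maturity answers 1-5."""
    for section, expected in SECTION_QUESTION_COUNTS.items():
        scores = answers[section]
        if len(scores) != expected:
            raise ValueError(f"{section}: expected {expected} answers")
        if any(not 1 <= score <= 5 for score in scores):
            raise ValueError(f"{section}: answers must be between 1 and 5")
    section_averages = {s: mean(answers[s]) for s in SECTION_QUESTION_COUNTS}
    # Assumption: overall score is the mean of all 25 individual answers.
    all_scores = [x for s in SECTION_QUESTION_COUNTS for x in answers[s]]
    return section_averages, mean(all_scores)


# Start from the workbook's default (every answer is 1), then change one section.
answers = {s: [1] * n for s, n in SECTION_QUESTION_COUNTS.items()}
answers["Data Review & Analysis"] = [3, 4]  # made-up sample answers
by_section, overall = summarize(answers)
print(by_section["Data Review & Analysis"], overall)
```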

You can download a copy of the eMSAT-1 Excel workbook from here, as well as get more information on how to use it (the page also describes how to provide feedback to make the next iterations even better).

The EDRM Maturity Model Self-Assessment Test is the fourth release in recent months by the EDRM Metrics team. In June 2013, the new Metrics Model was released, in November 2013 a supporting glossary of terms for the Metrics Model was published and in November 2013 the EDRM Budget Calculators project kicked off (with four calculators covered by us here, here, here and here). They’ve been busy.

So, what do you think? How mature is your organization in handling eDiscovery? Please share any comments you might have or if you’d like to know more about a particular topic.

Good Processing Requires a Sound Process – Best of eDiscovery Daily

October 3, 2014

Home at last! Today, we are recovering from our trip, after arriving back home one day late and without our luggage. Satan, thy name is Lufthansa! Anyway, for these past two weeks except for Jane Gennarelli’s Throwback Thursday series, we have been re-publishing some of our more popular and frequently referenced posts. Today’s post is a topic that comes up often with our clients. Enjoy! New posts next week!

As we discussed Wednesday, working with electronic files in a review tool is NOT just simply a matter of loading the files and getting started. Electronic files are diverse and can represent a whole collection of issues to address in order to process them for loading. To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production. As a result, we are aware of some of the information that must be provided by the client to ensure that the resulting processed data meets their needs and have created an EDD processing spec sheet to gather that information before processing. Examples of information we collect from our clients:

Do you need de-duplication? If so, should it be performed at the case or the custodian level?
Should Outlook emails be extracted in MSG or HTM format?
What time zone should we use for email extraction? Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT). If you don’t think that matters, consider this example.
Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text? If we don’t OCR those files, these could be responsive files that are missed during searching.
If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
Should the collection be culled based on a responsive date range?
Should the collection be culled based on key terms?
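To show why the first question on that list matters, here is a minimal sketch of de-duplication at the case level versus the custodian level. It assumes duplicates are identified by a SHA-256 hash of the file content; the `Doc` class and `dedupe` function are hypothetical, and real processing tools often hash emails on selected metadata fields instead.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    custodian: str
    name: str
    content: bytes


def dedupe(docs, scope="case"):
    """Keep the first copy of each document within the chosen scope.

    scope="case":      one copy across the whole collection.
    scope="custodian": one copy per custodian, so the same file can
                       survive once for each custodian who held it.
    """
    if scope not in ("case", "custodian"):
        raise ValueError("scope must be 'case' or 'custodian'")
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        key = digest if scope == "case" else (doc.custodian, digest)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


docs = [
    Doc("alice", "memo.docx", b"quarterly numbers"),
    Doc("bob", "memo-copy.docx", b"quarterly numbers"),
    Doc("alice", "memo-v2.docx", b"quarterly numbers"),
    Doc("bob", "notes.txt", b"meeting notes"),
]
print(len(dedupe(docs, "case")), len(dedupe(docs, "custodian")))
```

Case-level de-duplication leaves fewer documents to review, while custodian-level de-duplication preserves the fact that each custodian had a copy.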

Those are some general examples for native processing. If the client requests creation of image files (many still do, despite the well documented advantages of native files), there are a number of additional questions we ask regarding the image processing. Some examples:

Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non text-searchable PDF?
Should color images be created when appropriate?
Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?
Should we create images of Excel files? If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
Should we endorse the images? If so, how?

Those are just some examples. Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use. Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data. Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.

So, what do you think? How do you handle processing of electronic files? Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.

The Files are Already Electronic, How Hard Can They Be to Load? – Best of eDiscovery Daily

October 1, 2014

Come fly with me! Today we are winding our way back home from Paris, by way of Frankfurt. For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts. Today’s post is a topic that relates to a question that I get asked often. Enjoy!

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started. Unfortunately, it’s not that simple!

Back when most discovery was paper based, the usefulness of the documents was understandably limited. Documents were paper and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately) and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.). It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is. They don’t just represent one problem, they can represent a whole collection of problems. For example:

Image only electronic files such as TIFF or image-only PDF files may be electronic, but they still have no searchable text. They still require OCR to generate searchable text to enable them to be effectively searched. It’s important to account for image-only files when self-collecting as keyword searches will miss these files.
Outlook emails are typically stored in a “container” file like an EDB (Exchange Database), OST (Outlook Offline Storage Table) or PST (Outlook Personal Storage Table). To work with the emails individually, they typically require processing to break them out into individual MSG files (Outlook message files). That processing is also necessary to break out the attachments from the emails so that they can be reviewed or categorized individually, if required. And, if the emails are stored in Lotus Notes, there is no equivalent single message format, so those emails generally require conversion to HTML format during processing.
Databases are large, structured collections of data, but they don’t relate easily to a document format, so they require some analysis to determine if, and in what form, they should be produced.
In almost every collection, there are some files that cannot be processed or searched. Corrupt files, password protected files and other types of exception files are frequent components of your ESI collection and it can become very expensive to make these files searchable or reviewable.
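The examples above amount to a triage decision for each file. The sketch below illustrates that idea in a very simplified form, keyed on file extension plus two flags that a real tool would determine by inspecting the file; the `triage` function, its flags and the action names are all hypothetical.

```python
from pathlib import PurePath

# Email container formats named in the post.
CONTAINER_EXTS = {".pst", ".ost", ".edb"}
# Formats that are always images (PDFs are checked separately).
IMAGE_EXTS = {".tif", ".tiff"}


def triage(filename: str, has_text: bool = True, readable: bool = True) -> str:
    """Pick a processing action for one file.

    has_text and readable stand in for checks a real tool would perform
    (text extraction, corruption and password detection).
    """
    ext = PurePath(filename).suffix.lower()
    if not readable:
        return "exception"          # corrupt or password protected: log it
    if ext in CONTAINER_EXTS:
        return "extract_messages"   # break out individual emails and attachments
    if ext in IMAGE_EXTS or (ext == ".pdf" and not has_text):
        return "ocr"                # image-only: no searchable text yet
    return "index_as_is"


for name, kwargs in [
    ("mail.PST", {}),
    ("scan.pdf", {"has_text": False}),
    ("memo.docx", {}),
    ("locked.xlsx", {"readable": False}),
]:
    print(name, "->", triage(name, **kwargs))
```

Databases are deliberately left out of this sketch, since deciding if and how to produce them takes analysis rather than a lookup.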

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward. Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom. They’re way more useful than paper documents. So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available. But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think? Have you encountered difficulties or challenges when processing electronic files? Please share any comments you might have or if you’d like to know more about a particular topic.

Our 1,000th Post! – eDiscovery Milestones

September 3, 2014

When we launched nearly four years ago on September 20, 2010, our goal was to be a daily resource for eDiscovery news and analysis. Now, after doing so each business day (except for one), I’m happy to announce that today is our 1,000th post on eDiscovery Daily!

We’ve covered the gamut in eDiscovery, from case law to industry trends to best practices. Here are some of the categories that we’ve covered and the number of posts (to date) for each:

Case Law (326 posts), including those dealing with Sanctions (151)
Searching (238)
Proportionality (140)
Law Firm Departments (115)
Project Management (102)
Outsourcing (97)
Social Media (95)
Federal Discovery Rules (68)
SaaS Based Technologies (65)
State Discovery Rules (35)

We’ve also covered every phase of the EDRM (177) life cycle.

Every post we have published is still available on the site for your reference, which has made eDiscovery Daily into quite a knowledgebase! We’re quite proud of that.

Comparing our first three months of existence to now, we have seen traffic on our site grow an amazing 474%! Our subscriber base has more than tripled in the last three years! We want to take this time to thank you, our readers and subscribers, for making that happen. Thanks for making the eDiscoveryDaily blog a regular resource for your eDiscovery news and analysis! We really appreciate the support!

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Ride the Lightning, Litigation Support Guru, Complex Discovery, Bryan University, The Electronic Discovery Reading Room, Litigation Support Today, Alltop, ABA Journal, Litigation Support Blog.com, InfoGovernance Engagement Area, EDD Blog Online, eDiscovery Journal, e-Discovery Team ® and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!). We really appreciate it!

I also want to extend a special thanks to Jane Gennarelli, who has provided some serial topics, ranging from project management to coordinating review teams to what litigation support and discovery used to be like back in the 80’s (to which some of us “old timers” can relate). Her contributions are always well received and appreciated by the readers – and also especially by me, since I get a day off!

We always end each post with a request: “Please share any comments you might have or if you’d like to know more about a particular topic.” And, we mean it. We want to cover the topics you want to hear about, so please let us know.

Tomorrow, we’ll be back with a new, original post. In the meantime, feel free to click on any of the links above and peruse some of our 999 previous posts. Now is your chance to catch up! 😉

It’s Friday at 5 and I Need Data Processed to Review this Weekend – eDiscovery Humor

August 22, 2014

We’ve referenced Ralph Losey’s excellent e-Discovery Team® blog several times before on this blog – it’s a great read and you won’t find a blog that gets more in depth than his does (he has also been gracious enough to participate in our thought leader interview series for the last three years). And, as Ralph has demonstrated before, he has a sense of humor when it comes to electronic discovery.

In his latest post, Are You The Lit Support Tech?, Ralph takes a humorous look at “what it is like on a Friday afternoon in the Litigation Support Departments of most law firms”. Or so it seems sometimes. Like before, Ralph used XtraNormal to make the video. XtraNormal enables you to make an animated movie by selecting your animated “actors”, typing or recording your dialogue, and selecting a background. The “actors” sound a bit robotic if you type the dialogue, but that just adds to the humor as the pronunciations and inflections are rather humorous.

Anyway, the video involves a law firm partner coming to the lit support tech on a Friday afternoon and asking for help to process data for ten custodians so that he can review over the weekend as the production is due Monday. “When did you receive the request?”, asks the tech. “30 days ago”, says the partner, “Why?”. “No reason”, says the tech.

The video continues with the partner telling the tech not to worry “it’s only 15 gigabytes…not such a big number”. When the tech says that it’s over a million pages and he will have to process it and load it into their review platform, the partner says “I don’t have time for all the processing and stuff, just print it out and load it in my car.”

OK, so part of the humor is that it’s a bit farfetched (hopefully). Ralph notes that because he no longer has to supervise a litigation support department (because Jackson Lewis outsources all of its nonlegal electronic data discovery work), his “Friday afternoons are much nicer”.

It’s the vendor that has to deal with these last minute requests. At CloudNine Discovery, we can relate to the lit support tech who receives 15 gigabytes (or even more) on a Friday afternoon to process for weekend review – we get those types of requests more often than you think and our staff often works late Fridays to get the client’s data ready. It goes with the territory. So, we don’t make big plans on Friday night so that you can enjoy yours!

So, what do you think? Have you had to deal with last minute eDiscovery requests? If so, how do you handle them? Please share any comments you might have or if you’d like to know more about a particular topic.

If Your Documents Are Not Logical, Discovery Won’t Be Either – eDiscovery Best Practices

May 13, 2014

Scanning may no longer be cool, but it’s still necessary. Electronic discovery still typically includes a paper component. When it comes to paper, how documents are identified is critical to how useful they will be. Here’s an example.

Your client collects hard copy documents from various custodians related to the case and organizes them into folders. In one of the folders is a one page fax cover sheet attached to a two page letter, as well as an unrelated report and four different contracts, each 15-20 pages. The entire folder is scanned as a single document, as either a TIFF or PDF file.

Only the letter is retrieved in a search as responsive to the case. But, because it is contained within a “document” of 70 to 80 other pages, you wind up reviewing 70 to 80 unrelated pages that you would not otherwise have to review. It complicates production, as well – how do you produce partial “documents”? Also, if the non-responsive report and contracts have duplicates in the collection, you can’t effectively de-dupe them out of the review population because they’re combined with other documents.

It happens more often than you think. It also can happen – sometimes quite often – with the scanned documents that the other side produces to you. So, how do you get the documents into a more logical and usable organization?

Logical Document Determination (or LDD) is a process that some eDiscovery providers offer (including – shameless plug warning! – CloudNine Discovery). In this process, each image page in a scanned document set is reviewed and the “logical document breaks” (i.e., each page that starts a new document) are identified. Then, the documents are re-assembled, based on those logical document breaks.
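To make the re-assembly step concrete, here is a minimal sketch in Python. It assumes the reviewer's output is simply a set of page identifiers marked as document starts. The function name `unitize` and the page IDs are hypothetical, and this is not a description of any provider's actual tooling.

```python
from typing import List, Set


def unitize(page_ids: List[str], break_pages: Set[str]) -> List[List[str]]:
    """Group scanned pages into logical documents.

    page_ids    -- pages in scan order (hypothetical identifiers)
    break_pages -- pages a reviewer marked as starting a new document
    """
    documents: List[List[str]] = []
    for page in page_ids:
        # The first page always opens a document, even if it was not marked.
        if page in break_pages or not documents:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


# Hypothetical six-page scan: a cover sheet, a two-page letter, a three-page report.
pages = ["P001", "P002", "P003", "P004", "P005", "P006"]
breaks = {"P001", "P002", "P004"}
docs = unitize(pages, breaks)
```

Here `docs` holds three page groups instead of one six-page “document”, so each group can be OCRed, indexed and produced on its own.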

Once the documents are logically organized, other processes – like Optical Character Recognition (OCR) and clustering (including near duplicate identification) – can then be performed at the appropriate document level, and the smaller, more precise, unitized documents can be indexed for searching. Instead of reviewing a 70-80 page “document” composed of several logical documents, your search will retrieve the two page letter that is actually responsive, making your review and production processes more efficient.
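The effect on de-duplication can be illustrated with a small sketch. It uses exact-match hashing of normalized text, which is a simpler technique than the near duplicate identification mentioned above, and the document IDs and text are made up for illustration.

```python
import hashlib
from typing import Dict, List


def find_duplicates(doc_texts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each content hash to the document IDs that share it."""
    groups: Dict[str, List[str]] = {}
    for doc_id, text in doc_texts.items():
        # Collapse whitespace and lowercase so trivial spacing or case
        # differences do not change the hash.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(doc_id)
    # Keep only hashes shared by more than one document.
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


# Hypothetical collection: the folder scanned as one file vs. unitized.
combined = {"FOLDER-1": "Dear Sir ... Contract A terms", "DOC-9": "Contract A terms"}
unitized = {
    "FOLDER-1a": "Dear Sir ...",
    "FOLDER-1b": "Contract A terms",
    "DOC-9": "Contract  A terms",
}
dupes_before = find_duplicates(combined)
dupes_after = find_duplicates(unitized)
```

With the folder scanned as a single file, no duplicate is found. Once the contract is its own logical document, it matches the standalone copy.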

LDD is typically priced on a per page basis of pages reviewed for logical document breaks – prices can vary depending on the volume of pages to be reviewed and where the work is being performed (there are providers in the US and overseas). While it’s a manual process, it’s well worth it if your collection of imaged documents is poorly defined.

So, what do you think? Have you ever received a collection of poorly organized image files? If so, did you use Logical Document Determination to organize them properly? Please share any comments you might have or if you’d like to know more about a particular topic.

Tom O’Connor of Gulf Coast Legal Technology Center – eDiscovery Trends

March 14, 2014

This is the ninth of the 2014 LegalTech New York (LTNY) Thought Leader Interview series. eDiscoveryDaily interviewed several thought leaders after LTNY this year (don’t get us started) and generally asked each of them the following questions:

What significant eDiscovery trends did you see at LTNY this year and what do you see for 2014?
With new amendments to discovery provisions of the Federal Rules of Civil Procedure now in the comment phase, do you see those being approved this year and what do you see as the impact of those Rules changes?
It seems that, despite numerous resources in the industry, most attorneys still don’t know a lot about eDiscovery. Do you agree with that and, if so, what do you think can be done to improve the situation?
What are you working on that you’d like our readers to know about?

Today’s thought leader is Tom O’Connor. Tom is a nationally known consultant, speaker and writer in the area of computerized litigation support systems. A frequent lecturer on the subject of legal technology, Tom has been on the faculty of numerous national CLE providers and has taught college level courses on legal technology. Tom’s involvement with large cases led him to become familiar with dozens of various software applications for litigation support and he has both designed databases and trained legal staffs in their use on many of those cases. This work has involved both public and private law firms of all sizes across the nation. Tom is the Director of the Gulf Coast Legal Technology Center in New Orleans.

What significant eDiscovery trends did you see at LTNY this year and what do you see for 2014?

In my opinion, LegalTech has become a real car show. There are just too many vendors on the show floor, all saying they do the same thing. Someone at the show tallied it up and determined that 38% of the exhibitors were eDiscovery vendors. And, that’s just the dedicated eDiscovery vendors – there are other companies like Lexis, who do other things, but half of their booth was focused on eDiscovery. The show has sections of the booths down one long hall with sales people standing in front of each section and it’s like “running the gauntlet” when you walk by them. It’s a bit overwhelming.

Having said that, a lot of people were still getting stuff done, but they were doing so in the suites either at the hotel or across the street. I saw a lot of good B-to-B activities off the sales floor and I think you can get more done with the leads that you get if you can get them off the sales floor in a more sane environment. At the same time, if you’re not at the show, people question you. They’ll say “hey, what happened to the wombat company?” So, being at the show still helps, at least with name recognition.

One trend that has been going on for a while is that “everybody under the sun” is doing eDiscovery or says that they’re doing eDiscovery. The phenomenal growth of the number of eDiscovery vendors of all sizes surprises me. We see headlines about providers getting bought out and some companies acquiring other companies, but it seems like every time one gets acquired, two more take its place. That surprised me as I expected to see more stratification, but did not. Not that buyouts aren’t occurring, but there’s just so much growth in the space that the number of players is not shrinking.

Another trend that I noticed is the entry of companies like IBM and Xerox into the eDiscovery space. It puzzled me until I walked around the show, took a good look at their products and realized that the trend is to get more throughput in processing. Our data sets are getting so big. A terabyte is just not that unusual anymore. Two to five terabytes is becoming typical in large cases. 500 GB to 1 terabyte is becoming more common, even in a small case. Being able to process 5 to 10 GB an hour isn’t cutting it anymore and I saw more pressure on vendors to process up to a terabyte (or even more) per day. So, it makes sense that companies like IBM and Xerox are going to get into the big data space for corporate clients because they’re already there and they have the horsepower. So, I see the industry focused on different ways to speed up ingestion and processing of data.
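(Editor’s aside: the gap between those throughput figures is easy to quantify. The sketch below assumes a decimal terabyte of 1,000 GB and round-the-clock processing. Both are assumptions made for illustration, not figures from the interview.)

```python
GB_PER_TB = 1000  # assumption: decimal terabyte


def days_to_process(total_gb: float, gb_per_hour: float, hours_per_day: float = 24.0) -> float:
    """Days needed to process total_gb at a steady hourly rate."""
    return total_gb / (gb_per_hour * hours_per_day)


# One terabyte at the older rates of 5 and 10 GB per hour.
slow_days = days_to_process(GB_PER_TB, 5)
fast_days = days_to_process(GB_PER_TB, 10)

# Hourly rate needed to finish one terabyte in a single 24-hour day.
required_gb_per_hour = GB_PER_TB / 24

print(f"{slow_days:.1f} days at 5 GB/hour, {fast_days:.1f} days at 10 GB/hour")
print(f"{required_gb_per_hour:.1f} GB/hour needed for a terabyte per day")
```

Under these assumptions, a terabyte takes roughly four to eight days at 5 to 10 GB an hour, while a terabyte per day calls for more than 40 GB an hour.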

That has been accompanied by another trend: pricing pressures. Providers are starting to offer deals like $20 per GB all in with hosting, processing, review, unlimited users, etc. At the other end of the spectrum from companies like IBM and Xerox are small technology companies, coming not from legal but from a very high-end technology background, looking to apply their technology skills in the eDiscovery space and offering really discounted prices. I’ve seen a lot of that and we started to see it last year, with providers starting to offer project pricing and getting away from a per GB pricing model. I think we’re going to see more and more of that as the year goes along. I hesitate to use the word “commoditized” because I don’t think it is. It’s not like scanning – every eDiscovery job is different with the types of files you have and what you want to accomplish. But, there will certainly be a big push to lower the pricing from what we’ve been seeing for the last 1-3 years and I think you’re going to see some pretty dramatic price cuts with pressure from new players coming into the market and increased competition.

With new amendments to discovery provisions of the Federal Rules of Civil Procedure now in the comment phase, do you see those being approved this year and what do you see as the impact of those Rules changes?

I’ve been astonished that, after the first wave of comments last fall, there have been few or no public comments or even discussion in the media about the rules changes. The public comment period closes tomorrow (Tom was interviewed on February 14) and you know the saying “March comes in like a lion and goes out like a lamb”? That seems to be how it is with the end of the comment period. I think I saw one article mentioning the fact that the comments were closing this week. It has been a surprising non-issue to me.

For that reason, I think the rules changes will go through. I don’t think there has been a concerted effort to speak out against them. As I understand it, the rules still won’t be enacted until 2016 because they still have to go back to the committee and through Congress and through the Supreme Court. It’s a really lengthy period which allows for intervention at a number of different steps. But, I haven’t seen any concerted effort mounted to talk against them, though Judge Scheindlin has been quite adamant in her comments. My personal feeling is that we didn’t need the new rules. I think they benefit the corporate defense world and change some standards. Craig Ball pointed out in a column last year that they don’t even address the issue of metadata, which is problematic. I don’t think we needed the rules changes, quite frankly. And, I wrote a column about that last year. In a world where I hear commentators and judges say that 90% of the attorneys that appear in front of them still don’t understand ESI or how things work, clearly if they don’t understand the current rules, why do we need rules changes? Let’s get people up to speed on what they’re supposed to be doing now before we worry about fine tuning it. I understand the motivation behind getting them enacted from the people who are pushing for them, why they wanted them and I suspect they will pretty much go through as written.

It seems that, despite numerous resources in the industry, most attorneys still don’t know a lot about eDiscovery. Do you agree with that and, if so, what do you think can be done to improve the situation?

I absolutely agree with that. I think the obvious remedy is to educate them where lawyers get educated, which is in law schools and I think the law schools have been negligent, if not grossly negligent, in addressing that issue. Browning Marean and I went around to the different law schools to try to get them to sponsor a clinic or educational program in this area eight or nine years ago and were rebuffed. Even to this day, though there are some individuals that are teaching classes at individual law schools, with the exception of a new program at Northeastern, there has been no curriculum devoted to technology as part of the regular law school curriculum.

Even the programs that have sprung up: the wonderful job that Craig Ball and Judge Facciola do at Georgetown Law School is sponsored by their CLE department, not the law school itself. Michael Arkfeld has a great program that he does for three days down at the Sandra Day O’Connor law school at Arizona State University (covered on the blog here). But, it’s a three day program, not a course, not a curriculum. It’s not a focus in the curriculum of the actual law school itself. We’ve had “grass roots” efforts spring up with Craig’s and Michael’s efforts, what Ralph Losey and his son Adam have been doing, as well as a number of people at the local level with CLE programs. But, the fact is that lawyers get educated in law schools and if you really want to solve this, you make it part of the curriculum at law schools.

There has always been an attitude on the part of law schools. As Browning and I were told by the dean of a top flight law school several years ago, “we train architects, not carpenters”. I myself was referred to, face-to-face, by a group of law professors as a “tradesman”. They said “Gee, Tom, this proposal is a great idea, but why would we trust the education of our students to a tradesman like you?” There’s this sort of disdainful academic outlook on anything that involves the hands-on use of computers and that’s got to change. Judge Rosenthal said that “we have to change the paradigm” on how we handle things. Lawyers and judges alike have to look at things differently and all of us need to adjust how we look at the world today. Because it’s not just a legal issue, it’s a social issue. Society has changed how it manufactures, creates and stores information/data/documents. Other professional areas have caught onto that and legal education has really lagged behind.

I mentioned the eDiscovery Institute at Georgetown Law School, which happens every June. But, they cap the attendees at about 60. Do the math: there are about a million lawyers in the country and if you’re only going to educate 60 per year, you’ll never get there. I also think that bar associations could be much more forthright in education in this area and requiring it. Judicial pressure is having the best results – judges are requiring some sort of certification of competence in this area. I know of several Federal judges who require the parties to state for the record that they’re qualified to address eDiscovery. Some of the pilot projects that have sprung up, like the one at the University of Chicago, are going to require a self-certifying affidavit of competence (assuming they pass) stating that you’re qualified to talk about these issues. Judges are expecting lawyers, regardless of how they learn it, to know what they’re talking about with regard to technology and not to waste the court’s time.

What are you working on that you’d like our readers to know about?

I just recently published a new guide on Technolawyer, titled LitigationWorld Quick Start Guide to Mastering Ediscovery (and covered on this blog here). There are a lot of beginner’s guides to eDiscovery, but this one doesn’t really focus on eDiscovery, it focuses on technology, answering questions like: How do computers work? What are bits, bytes, RAM, what’s a gigabyte, what’s a terabyte, etc.

I literally had a discussion about an hour ago with a client for whom we have a big case going on in Federal court and there’s a large production, over a terabyte being processed by our opponents in the case right now. I asked the client how much paper he thought that was and he had no idea. The next time we start arguing cost in front of the judge, I’m going to bring in a chart that says a gigabyte is X number of pages of paper so that it has some meaning to them. So, I think it’s really important to explain these basic concepts, and we in the technology world often forget how little many lawyers know about technology. So the guide is designed to talk about how electronic media stores data and how that data is retrieved, and to explain some of the common terms and phrases used in the physical construction and workings of a computer. Before you even start talking about eDiscovery, you need to have an understanding of how computers work and how they find data and where data can reside. We throw around terms like “slack space” and “metadata” casually without realizing that not everyone understands those terms. This guide is meant to address that knowledge gap.

I’m continuing some of my case work, of course. Lastly, I recently joined a company called Cavo, which is bringing a new eDiscovery product to market that I’m excited about. Busy as always! And, of course, there are always good things going on in New Orleans!

Thanks, Tom, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!
