
Four Inefficiency Traps to Avoid in Your Legal Document Review Process

For every time-saving, cost-cutting efficiency available in legal document review, there is an equal number of challenges and pitfalls that can consume your productivity and budget. For LSPs and law firms, a thorough and effective legal review depends on more than just data size and solution speed. Navigating the review process successfully means knowing where you can expedite and streamline your project, and how to avoid costly mistakes.

Read on to learn about four potential inefficiency traps you can avoid in your next legal review to save time and money:

  1. Document format and storage
  2. Inefficient upload speeds
  3. Duplicate data
  4. Single-user access to documents

Trap 1: Document Format and Storage

One of the first challenges to overcome is determining the best method to consolidate and convert collected data into searchable content. From digital emails and websites to printed letters and handwritten notes, different document formats are collected and stored across several disparate systems. You may have some stored in Outlook while others are kept in a binder on your desk.

A legal document review system digitizes and stores every document in one place, allowing you to search and review all documents at the same time. This enables you to apply a search strategy to locate relevant documents quickly and efficiently. It also provides you with the flexibility to view data in different, organized layouts to easily see document attributes such as:

  • Source
  • Type
  • Origin date
  • Author
  • Recipients

With all case documents organized and stored on a single platform, LSPs and legal departments can locate and produce responsive content, ranging from scanned documents to email to spreadsheets and more.
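As a purely illustrative sketch (not CloudNine's actual API; the record fields and values below are hypothetical), consolidating documents into one searchable structure lets a single query span every format at once:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical consolidated document record carrying the attributes above.
@dataclass
class Doc:
    source: str          # e.g. "Outlook", "Scanner"
    doc_type: str        # e.g. "email", "letter"
    origin_date: date
    author: str
    recipients: list

docs = [
    Doc("Outlook", "email", date(2021, 3, 1), "alice@example.com", ["bob@example.com"]),
    Doc("Scanner", "letter", date(2020, 11, 15), "Carol Smith", []),
    Doc("Outlook", "email", date(2021, 5, 20), "bob@example.com", ["alice@example.com"]),
]

# One query spans scanned letters and emails alike:
emails_2021 = [d for d in docs
               if d.doc_type == "email" and d.origin_date.year == 2021]
print(len(emails_2021))  # 2
```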

Trap 2: Inefficient Upload Speeds

How quickly you can initiate your review project is determined by the speed of your initial data import. This phase can make or break the rest of your project timeline, making it essential to start off strong.

Staying on schedule and within the time budgeted for your project equally affects the efficiency of your whole team. The faster your team can scan and import documents, the faster you can start your document review.

While we can’t control your connection speed, we can manage the resources and technology on our platform to ensure the application performance is optimized for maximum efficiency.

Time equals money; having an eDiscovery tool capable of moving as quickly as you do allows you to work on the next step of your case faster.

Trap 3: Duplicate Data

Nearly 30% of email data is duplicated, which directly impacts hosting costs if it is not removed prior to promotion for review. Duplicate data occurs when the same file originates from multiple sources, often in different formats.

For example, if you have two people engaged in an email exchange and both become custodians in a legal case, both sets of emails are collected for discovery. Now you have the same exchange from both people and you have to determine which set of emails you’re going to use. The complexity of duplicate data increases when the matter is shared across email distribution groups.

By cutting out duplicative documents, you save storage space and reduce the chance that two copies of the same document will be reviewed differently.

To prevent duplicate data from costing time and money, you need an eDiscovery tool to:

  • Centralize your data in one place
  • Eliminate duplicative data
  • Track how data is being reviewed in real-time
  • Prevent conflicting tags by different reviewers

Trap 4: Single-User Access to Documents

Remote document review should be an easy and convenient option for you and your staff. However, documents still need to be digitized and uploaded to a shared system. This can be problematic for a number of reasons:

  • You don’t have anyone in the office to upload documents.
  • You don’t have the infrastructure in place to share working documents across multiple users at the same time.
  • You don’t have the ability to review, redact, and produce documents electronically without affecting the originals.

This forces organizations to spend time and money building new infrastructure. Alternatively, they can use a private cloud-hosted system like CloudNine Review.

 

How CloudNine Review Helps You Avoid Inefficiency Traps

CloudNine Review is a safe, robust, and cost-effective solution that simplifies the eDiscovery review process and keeps you more productive.

We offer a single repository for all your discovery documents. Whether they are electronic or paper documents, you can load them into CloudNine Review to make them searchable. This allows you to access all the data at the same time, giving you a consistent search strategy.

By utilizing a search strategy, you create an efficient way to review your data without wasting time:

  • Showing search-term history
  • Filtering out previously reviewed documents
  • Setting up preview sets

With incredibly fast upload speeds on our end, setup is simple and straightforward. If your connection speed is slow on your end, we can help you identify the source of the problem while offering alternative ways to upload heavy data loads.

To prevent duplicate data from slowing down your legal document review process, our processing engine detects duplicates and suppresses them before the data is advanced for review. Plus, all documents are hashed during the import process, so you can set up automation to identify and review specific documents from the searchable and reviewable records.
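As a general illustration of how content hashing enables duplicate suppression (a minimal sketch; CloudNine's actual engine, hash algorithm, and file names are not shown here):

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Identical bytes yield an identical digest, no matter which
    # custodian or source system the copy was collected from.
    return hashlib.sha256(data).hexdigest()

def dedupe(files: dict) -> dict:
    """Keep the first copy of each unique document; suppress the rest."""
    seen, unique = set(), {}
    for path, data in files.items():
        digest = content_hash(data)
        if digest not in seen:
            seen.add(digest)
            unique[path] = data
    return unique

collected = {
    "custodian_a/msg001.eml": b"Re: merger terms ...",
    "custodian_b/msg417.eml": b"Re: merger terms ...",  # same exchange, other custodian
    "custodian_b/memo.docx":  b"Quarterly memo ...",
}
print(len(dedupe(collected)))  # 2
```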

Hosted on a private cloud, CloudNine Review is a web-accessible legal document review platform providing secure access to every approved member of your team. Every document is locked down so nothing can be deleted, altered, or sent to anyone without authorization. Even metadata like the author and timestamps is protected.

To protect sensitive or confidential data from being exposed, CloudNine Review will redact images of documents. Redacted files are copied and saved as a single-layer file so the redaction bars can’t be removed by outside parties.

 

CloudNine Review is designed to help your eDiscovery services be more efficient and productive.

To avoid these inefficiency traps, click the banner below to request a free demo and see how CloudNine Review can help you today.

What Happened in Vegas? It’s No Secret – Read the Buzz

With many reasons to celebrate, CloudNine is still enjoying the excitement of our time in Vegas last week. Visiting with valued customers and meeting new contacts are always fortunate opportunities. As an enhancement to those already fruitful conversations, we were thrilled to announce our expanded capabilities through the acquisition of ESI Analyst. Adding modern communication formats such as mobile, chat, social media, and more to CloudNine's powerful and proven applications fills the market need to manage eDiscovery more effectively, with both modern and traditional data types, on a single platform.

The buzz of our expansion reached news feeds across multiple social media channels. A few highlights include:


 

Demonstrations of CloudNine's new technology are being scheduled now. Click to request a time to speak, and a member of our team will be in touch to schedule.

 

Ready to try it out for yourself?  Request a free demo and see how CloudNine can help you.

Focusing on speed, security, simplicity, and services, CloudNine is dedicated to empowering our law firm and LSP clients with proven eDiscovery software solutions for litigation, investigations, and audits. 

Release Preview LAW 7.6

As data volumes have grown, so fortunately has computer processing power. CloudNine LAW and Explore 7.6 will take advantage of this power boost to amplify your speed to review and production. The import technology behind both LAW's Turbo Import and Explore uses a computer's multiple processing cores more efficiently, making processing faster.

The application updates will include over 200 enhancements to build upon the already strong import speeds of LAW and Explore. The most notable improvement in LAW will be the introduction of Turbo Imaging, which can be used in parallel with existing imaging. Turbo leverages near-native imaging technology to create static images of native files without relying on the native application. This saves time during imaging because the files don't have to be opened, printed to image, and then closed. The module can also take advantage of multiple processing cores, if available. With imaging speeds up to eight times faster than the traditional imaging license, Turbo Imaging generates production-ready images quickly and easily.
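The gain from spreading imaging across cores can be sketched conceptually (illustrative only; this is not LAW's implementation, and a real CPU-bound imaging module would typically use separate processes rather than threads):

```python
from concurrent.futures import ThreadPoolExecutor
import os

def render_image(native_file: str) -> str:
    # Stand-in for near-native rendering: produce a static image name
    # without opening, printing from, and closing the native application.
    return native_file.rsplit(".", 1)[0] + ".tiff"

def turbo_image(files: list) -> list:
    # Fan the batch out across available cores instead of imaging
    # one file at a time.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(render_image, files))

batch = ["contract.docx", "ledger.xlsx", "deck.pptx"]
print(turbo_image(batch))  # ['contract.tiff', 'ledger.tiff', 'deck.tiff']
```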

In Explore, users will benefit from improved reporting capabilities. This builds on Explore's email threading, near-duplicate detection, advanced filtering, and integration with Relativity to provide a first-class early case assessment experience, allowing clients to reduce the volume of data promoted for review by as much as 70%. The import and ECA processes are all done without creating additional copies of the data until the client is ready to commit an export of potentially responsive content for further review. This saves clients from storing copies of data they don't need, reducing the cost and risk associated with managing multiple identical files.

To learn more about these and other enhancements coming in CloudNine LAW and Explore 7.6, please reach out to your account manager, or email us at info@cloudnine.com to schedule an overview and demonstration.  We’re excited to give you a preview of these updates and plan a wider release in the week of September 13th, 2021.

 


Answer the Unknown Challenges of eDiscovery Review

When it comes to document review in electronic discovery, choosing a solution can be a daunting task. To make an informed decision, you need to know what challenges await you and how to overcome them. 

The three biggest challenges to address when looking for a document review platform are:

  • Security
  • Volume 
  • Cost

By better understanding these challenges and their impact on your operations and bottom line, you’ll be in a better position to choose the best eDiscovery software solution for your business.

Is My eDiscovery Data Safe and Protected?

The benefits of storing eDiscovery data in the cloud are many, but digital security and remote access top the list of concerns among legal teams and legal service providers (LSPs) for these reasons:

  • Lack of control over their data
  • Concerns about handing data over to a third party
  • Worries about data integrity

Using a provider dependent on public cloud solutions like Azure or AWS means you can't tell your client exactly where your data is, as it can be stored across multiple cloud locations.

Another topic, more relevant since the COVID-19 pandemic, is security for remote document review. Creating additional access points can raise concerns that your network is vulnerable to data breaches.

Can My eDiscovery Software Handle the Volume and Variety of Records?

Investigations and litigation can create a lot of digital records, often soaring into the terabytes. This is compounded by the variety of digital records being used as evidence in legal proceedings, including:

  • Emails
  • Instant messages
  • Digital images
  • Videos
  • Audio files
  • Text messages
  • Social media posts
  • Websites

This volume and variety of data can have a detrimental impact on your operation if you don’t have the digital space to process or store it all. 

The more documents you have, the more infrastructure you need to support them. This can compromise your software performance, which could affect your ability to navigate, search, report, export, or produce required information in a timely manner.

If that happens, new eDiscovery projects could be delayed or canceled outright while you complete your current project. Delays could also lead to missed deadlines, which could cause your clients to be sanctioned or fined by the courts.

Are eDiscovery Tools Cost-Effective?

Simply put, some eDiscovery software can be expensive. Many tools include a lot of features and functionality you may not even need, but you are charged generously for them.

When looking for an eDiscovery solution, it is important to take the following into consideration in order to get the most cost-effective solution:

  • Is the pricing flexible, allowing you to pay only for what you need?
  • How much archived data is stored on the cloud vs. on-prem? Often a hybrid solution will provide you with the most cost savings. 
  • What support is included within your contract? In order to reap the most cost-saving benefits, it’s important to have your team well-versed in the eDiscovery tool.

The eDiscovery Solution – CloudNine Review

Secure, powerful, and cost-effective, CloudNine Review simplifies eDiscovery, enabling you to upload documents quickly and begin reviewing data within minutes.

CloudNine Review utilizes a private cloud environment so you know where your data is at all times, giving you better audit controls and the assurance your data is intact. To protect client data, CloudNine cybersecurity safeguards are always up-to-date and constantly monitored.

Access controls are another safeguard protecting your data on the CloudNine platform, allowing only authorized staff to review specific documents:

  • User-based permissions
  • Project-based rights
  • Document-level rights

By utilizing our private cloud platform, you have access to all of your data anywhere, anytime, without having to log into your network VPN. 

Also, CloudNine Review carries greater bandwidth so you’re able to get your data much faster. As an example, the first day for a new project can look like this:

  • Registered online and began uploading data
  • Uploaded 27 GB of PST email files
  • Processed 300,000 documents (emails and attachments)
  • Reduced the document set by 61% with deduplication and irrelevant-domain filtering

Imagine being able to accomplish all of that in just the first 24 hours. 

The best part is you pay only for what you use. CloudNine's transparent pricing model includes multiple pricing methodologies so you're never caught off-guard.

Beginning with a predictable upfront cost and low storage fees, CloudNine pricing models are designed to keep your costs down while providing the essential services you need.  To learn about our flexible pricing options, click here to speak to one of our eDiscovery experts.

  • All-in Model – one ‘all-in’ price with no hourly rates for self-service
  • Flex Model – low monthly storage costs for long-running litigation

Plus, if your case ends or becomes dormant, you pay only for what you use. Older, dormant case data can be archived and saved at your own location using CloudNine Concordance, helping you keep costs low and your data safely archived for the future.


Export Review Documents Using Self-Serve Productions

#DidYouKnow: You can export your own documents for production from CloudNine Review, with or without support from the CloudNine Client Services team.

 

Document production sizes can range from one document to tens or even hundreds of thousands of documents.

Some platforms require legal teams to work with their project managers to coordinate productions, regardless of size or complexity. This can lead to delays and risk as instructions are handed off to teams unfamiliar with the case.

 

Self-service productions provide complete project control with 24×7 access to export case documents independently.

  • Control your data to ensure important, relevant documents are not missed.
  • Control your project cost by paying only for what you need, and nothing you don't.

CloudNine Review empowers clients to export documents themselves, including native files and emails, searchable text, and static images of documents containing confidential language, unique identifiers, and redactions.

Ready to Get Started with Self-Service Productions?  

To start a self-service production in CloudNine Review, select the Tools menu in the upper right corner, then click the Self-Service Production option.

Next, select whether you would like to produce images, native files, text, and metadata, or any combination of these.

If your documents do not yet have static images, they can be created during this process.  Annotations and redactions can be permanently applied to images, and there are several Excel imaging options, including using slip sheets instead of generating images.

The fielded metadata associated with the production records can be exported in a variety of formats including the common .DAT file, a .CSV file, or XML.

Users running a production can select and deselect fields to include in the production metadata file.
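For illustration, a Concordance-style .DAT load file conventionally uses ASCII 20 (\x14) as the field delimiter and ASCII 254 (þ) as the text qualifier, while a .CSV uses standard comma separation. The field names and values below are hypothetical, not CloudNine's schema:

```python
import csv, io

fields = ["BegBates", "Custodian", "Subject"]  # user-selected metadata fields
records = [
    {"BegBates": "PROD000001", "Custodian": "Alice", "Subject": "Merger terms"},
    {"BegBates": "PROD000002", "Custodian": "Bob", "Subject": "Quarterly memo"},
]

def write_dat(records, fields):
    # Conventional Concordance delimiters: \x14 between fields, \xfe around them.
    qualifier, delim = "\xfe", "\x14"
    lines = [delim.join(qualifier + f + qualifier for f in fields)]
    for rec in records:
        lines.append(delim.join(qualifier + rec[f] + qualifier for f in fields))
    return "\n".join(lines) + "\n"

def write_csv(records, fields):
    # The same records in plain .CSV via the standard library.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(write_dat(records, fields).count("\x14"))  # 6  (2 delimiters x 3 lines)
```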

View Productions As They Progress in the User-facing Dashboard:   

This screen also allows users to clean up old productions and exports they no longer need to access.

Stay informed of software updates and browse training documentation in the CloudNine Review Knowledgebase (login required).

To learn more about CloudNine Review and Self-Service Productions, click the button below to schedule a demo!

Here’s a Terrific Listing of eDiscovery Workstream Processes and Tasks: eDiscovery Best Practices

Let’s face it – workflows and workstreams in eDiscovery are as varied as the organizations that conduct eDiscovery itself.  Every organization seems to do it a little bit differently, with a different combination of tasks, methodologies, and software solutions than anyone else.  But could a lot of organizations improve their eDiscovery workstreams?  Sure.  Here’s a resource (that you probably already know well) which could help them do just that.

Rob Robinson’s post yesterday on his terrific Complex Discovery site is titled The Workstream of eDiscovery: Considering Processes and Tasks and it provides a very comprehensive list of tasks for eDiscovery processes throughout the life cycle.  As Rob notes:

“From the trigger point for audits, investigations, and litigation to the conclusion of cases and matters with the defensible disposition of data, there are countless ways data discovery and legal discovery professionals approach and administer the discipline of eDiscovery.  Based on an aggregation of research from leading eDiscovery educators, developers, and providers, the following eDiscovery Processes and Tasks listing may be helpful as a planning tool for guiding business and technology discussions and decisions related to the conduct of eDiscovery projects. The processes and tasks highlighted in this listing are not all-inclusive and represent only one of the myriads of approaches to eDiscovery.”

Duly noted.  Nonetheless, the list of processes and tasks is comprehensive.  Here is the number of tasks for each process:

  • Initiation (8 tasks)
  • Legal Hold (11 tasks)
  • Collection (8 tasks)
  • Ingestion (17 tasks)
  • Processing (6 tasks)
  • Analytics (11 tasks)
  • Predictive Coding (6 tasks)*
  • Review (17 tasks)
  • Production/Export (6 tasks)
  • Data Disposition (6 tasks)

That’s 96 total tasks!  But, that’s not all.  There are separate lists of tasks for each method of predictive coding, as well.  Some of the tasks are common to all methods, while others are unique to each method:

  • TAR 1.0 – Simple Active Learning (12 tasks)
  • TAR 1.0 – Simple Passive Learning (9 tasks)
  • TAR 2.0 – Continuous Active Learning (7 tasks)
  • TAR 3.0 – Cluster-Centric CAL (8 tasks)

The complete list of processes and tasks can be found here.  While every organization has a different approach to eDiscovery, many have room for improvement, especially when it comes to exercising due diligence during each process.  Rob provides a comprehensive list of tasks within eDiscovery processes that could help organizations identify steps they could be missing in their processes.

So, what do you think?  How many steps do you have in your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Special Master Declines to Order Defendant to Use TAR, Rules on Other Search Protocol Disputes: eDiscovery Case Law

In the case In re Mercedes-Benz Emissions Litig., No. 2:16-cv-881 (KM) (ESK) (D.N.J. Jan. 9, 2020), Special Master Dennis Cavanaugh (U.S.D.J., Ret.) issued an order and opinion stating that he would not compel defendants to use technology assisted review (TAR), and instead adopted the search term protocol negotiated by the parties, with three areas of dispute resolved by his ruling.

Case Background

In this emissions test class action involving an automobile manufacturer, the plaintiffs proposed that the defendants use predictive coding/TAR, asserting that TAR yields significantly better results than either traditional human “eyes on” review of the full data set or the use of search terms.  The plaintiffs also argued that if the Court were to decline to compel the defendants to adopt TAR, the Court should enter its proposed Search Term Protocol.

The defendants argued that there is no authority for imposing TAR on an objecting party and that this case presented a number of unique issues that would make developing an appropriate and effective seed set challenging, such as language and translation issues, unique acronyms and identifiers, redacted documents, and technical documents. As a result, they contended that they should be permitted to utilize their preferred custodian-and-search term approach.

Judge’s Ruling

Citing Rio Tinto Plc v. Vale S.A., Special Master Cavanaugh quoted from that case in stating: “While ‘the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it’…, no court has ordered a party to engage in TAR over the objection of that party. The few courts that have considered this issue have all declined to compel predictive coding.”  Citing Hyles v. New York City (another case ruling by now retired New York Magistrate Judge Andrew J. Peck), Special Master Cavanaugh stated: “Despite the fact that it is widely recognized that ‘TAR is cheaper, more efficient and superior to keyword searching’…, courts also recognize that responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for producing their own electronically stored information.”

As a result, Special Master Cavanaugh ruled: “While the Special Master believes TAR would likely be a more cost effective and efficient methodology for identifying responsive documents, Defendants may evaluate and decide for themselves the appropriate technology for producing their ESI. Therefore, the Special Master will not order Defendants to utilize TAR at this time. However, Defendants are cautioned that the Special Master will not look favorably on any future arguments related to burden of discovery requests, specifically cost and proportionality, when Defendants have chosen to utilize the custodian-and-search term approach despite wide acceptance that TAR is cheaper, more efficient and superior to keyword searching. Additionally, the denial of Plaintiffs’ request to compel Defendants to utilize TAR is without prejudice to revisiting this issue if Plaintiffs contend that Defendants’ actual production is deficient.”

Special Master Cavanaugh also ruled on areas of dispute regarding the proposed Search Term Protocol, as follows:

  • Validation: Special Master Cavanaugh noted that “the parties have been able to reach agreement on the terms of Defendants’ validation process, [but] the parties are at an impasse regarding the level of validation of Plaintiffs’ search term results”, observing that “Plaintiffs’ proposal does not articulate how it will perform appropriate sampling and quality control measures to achieve the appropriate level of validation.” As a result, Special Master Cavanaugh, while encouraging the parties to work together to develop a reasonable procedure for the validation of Plaintiffs’ search terms, ruled: “As no articulable alternative process has been proposed by Plaintiffs, the Special Master will adopt Defendants’ protocol to the extent that it will require the parties, at Defendants’ request, to meet and confer concerning the application of validation procedures described in paragraph 12(a) to Plaintiffs, if the parties are unable to agree to a procedure.”
  • Known Responsive Documents & Discrete Collections: The defendants objected to the plaintiffs’ protocol to require the production of all documents and ESI “known” to be responsive as “vague, exceedingly burdensome, and provides no clear standard for the court to administer or the parties to apply”. The defendants also objected to the plaintiffs’ request for “folders or collections of information that are known to contain documents likely to be responsive to a discovery request” as “overly broad and flouts the requirement that discovery be proportional to the needs of the case.”  Noting that “Defendants already agreed to produce materials that are known to be responsive at the November status conference”, Special Master Cavanaugh decided to “modify the Search Term Protocol to require production of materials that are ‘reasonably known’ to be responsive.”  He also decided to require the parties to collect folders or collections of information “to the extent it is reasonably known to the producing party”, also requiring “the parties to meet and confer if a party believes a discrete document folder or collection of information that is relevant to a claim or defense is too voluminous to make review of each document proportional to the needs of the case.”

So, what do you think?  Should a decision not to use TAR negatively impact a party’s ability to make burden of discovery arguments?  Please share any comments you might have or let us know if you’d like to know more about a particular topic.

Related to this topic, Rob Robinson’s Complex Discovery site published its Predictive Coding Technologies and Protocols Spring 2020 Survey results last week, which (as always) provides results on most often used primary predictive coding platforms and technologies, as well as most-often used TAR protocols and areas where TAR is most used (among other results).  You can check it out at the link directly above.

Case opinion link courtesy of eDiscovery Assistant.


Fall 2019 Predictive Coding Technologies and Protocols Survey Results: eDiscovery Trends

So many topics, so little time!  Rob Robinson published the latest Predictive Coding Technologies and Protocols Survey on his excellent ComplexDiscovery site last week, but this is the first chance I’ve had to cover it.  The results are in, and here are some of the findings from the largest response group for this survey yet.

As Rob notes in the results post here, the third Predictive Coding Technologies and Protocols Survey was initiated on August 23 and concluded on September 5 with individuals invited to participate directly by ComplexDiscovery and indirectly by industry website, blog, and newsletter mentions – including a big assist from the Association of Certified E-Discovery Specialists (ACEDS).  It’s a non-scientific survey designed to help provide a general understanding of the use of predictive coding technologies and protocols from data discovery and legal discovery professionals within the eDiscovery ecosystem.  The survey was designed to provide a general understanding of predictive coding technologies and protocols and had two primary educational objectives:

  • To provide a consolidated listing of potential predictive coding technology and protocol definitions. While not all-inclusive or comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it appears suitable for use in educational efforts.
  • To ask eDiscovery ecosystem professionals about their usage and preferences of predictive coding platforms, technologies, and protocols.

There were 100 total respondents in the survey (a nice, round number!).  Here are some of the more notable results:

  • 39 percent of responders were from law firms, 37 percent of responders were from software or services provider organizations, and the remaining 24 percent of responders were either part of a consultancy (12 percent), a corporation (6 percent), the government (3 percent), or another type of entity (3 percent).
  • 86 percent of responders shared that they did have a specific primary platform for predictive coding versus 14 percent who indicated they did not.
  • There were 31 different platforms noted as primary predictive coding platforms by responders, nine of which received more than one vote; together those nine accounted for more than three-quarters of responses (76 percent).
  • Active Learning was the most used predictive coding technology, with 86 percent reporting that they use it in their predictive coding efforts.
  • Just over half (51 percent) of responders reported using only one predictive coding technology in their predictive coding efforts.
  • Continuous Active Learning (CAL) was (by far) the most used predictive coding protocol, with 82 percent reporting that they use it in their predictive coding efforts.
  • Maybe the most interesting stat: 91 percent of responders reported using technology-assisted review in more than one area of data and legal discovery. So, the uses of TAR are certainly expanding!

Rob has reported several other results and provided graphs for additional details.  To check out all of the results, click here.  Want to compare to the previous two surveys?  They’re here and here. :o)

So, what do you think?  Do any of the results surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © FremantleMedia North America, Inc.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

The Number of Pages (Documents) in Each Gigabyte Can Vary Widely: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discuss whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on July 31, 2012 – when eDiscovery Daily wasn’t even two years old yet.  It’s “so old (how old is it?)”, it references a blog post from the now defunct Applied Discovery blog.  We even did an updated look at this topic with more file types about four years later.  Also, since we are more focused on documents than pages for most of the EDRM life cycle (documents being the metric by which we evaluate processing through review), it’s documents per GB that tends to be considered more these days.

So, why is this important?  Not only for estimation purposes for review, but also for considering processing throughput.  If you have two 40 GB (or so) PST container files and one file has twice the number of documents as the other, the one with more documents will take considerably longer to process.  It’s getting to the point where document per hour throughput is becoming more important than GB per hour, as the latter can vary widely depending on the number of documents per GB.  Today, we’re seeing processing throughput speeds as high as 1 million documents per hour with solutions like (shameless plug warning!) our CloudNine Explore platform.  This is why Early Data Assessment tools have become more important: they can provide that document count quickly, leading to more accurate estimates.  Regardless, the exercise below illustrates just how widely the number of pages (or documents) can vary within a single GB.  Enjoy!
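To make the docs-per-GB point concrete, here is a rough Python sketch. The document densities and the 1 million documents per hour figure are assumptions used only to illustrate the arithmetic:

```python
# Hedged sketch: why documents per hour can matter more than GB per hour.
# Document densities and the 1,000,000 docs/hour rate are assumptions,
# used only to illustrate the arithmetic.

DOCS_PER_HOUR = 1_000_000  # assumed processing throughput

def processing_hours(doc_count: int) -> float:
    """Hours to process a collection at the assumed throughput."""
    return doc_count / DOCS_PER_HOUR

# Two hypothetical 40 GB PST files with different document densities:
pst_a = 40 * 10_000  # 40 GB at 10,000 docs/GB = 400,000 documents
pst_b = 40 * 20_000  # 40 GB at 20,000 docs/GB = 800,000 documents

print(processing_hours(pst_a))  # 0.4 hours
print(processing_hours(pst_b))  # 0.8 hours – same GB size, twice the time
```

Same gigabyte count, double the processing time – which is why a quick document count is worth more than a data size for scheduling purposes.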

A long time ago, we talked about how the average number of pages in each gigabyte is approximately 50,000 to 75,000 pages and that each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages (or documents) per gigabyte can vary?  The “how many pages” question came up a lot back then and I’ve seen a variety of answers.  The aforementioned Applied Discovery blog post provided some perspective in 2012 based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page (or document) counts with any degree of precision is somewhat difficult.
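Still, the blended-GB idea can be sketched in a few lines of Python, applying the per-file-type averages quoted above to a hypothetical mix of file types (the mix percentages below are invented purely for illustration):

```python
# Ballpark page-count estimate for a blended gigabyte, using the
# per-file-type averages quoted from the Applied Discovery post.
PAGES_PER_GB = {
    "email": 100_099,
    "word": 64_782,
    "text": 677_963,
    "image": 15_477,
    "powerpoint": 17_552,
}

def estimate_pages(mix: dict) -> int:
    """Weighted average of pages per GB for a given file-type mix.
    `mix` maps file type -> fraction of the gigabyte (should sum to 1)."""
    return round(sum(PAGES_PER_GB[t] * frac for t, frac in mix.items()))

# Hypothetical collection: 60% email, 20% Word, 10% images, 10% PowerPoint
sample_mix = {"email": 0.6, "word": 0.2, "image": 0.1, "powerpoint": 0.1}
print(estimate_pages(sample_mix))  # roughly 76,000 pages for this mix
```

Change the mix and the estimate swings dramatically – which is exactly why data size alone is a shaky basis for review estimates.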

In fact, the same exact content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I decided to conduct a little (admittedly unscientific) study using our one-page blog post (also from July 2012) about the Apple/Samsung litigation (the first of many as it turned out, as that litigation dragged on for years).  I decided to put the content from that page into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.
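
For what it’s worth, the “times larger” multiples above can be reproduced from the raw KB figures with a couple of lines of Python:

```python
# Reproducing the "times larger than the text file" multiples from the
# KB sizes listed above (same content, different formats).
sizes_kb = {"TXT": 10, "HTML": 36, "XLSX": 128, "DOCX": 162, "PDF": 211, "MSG": 221}

baseline = sizes_kb["TXT"]
ratios = {fmt: kb / baseline for fmt, kb in sizes_kb.items()}
print(ratios)  # e.g., HTML is 3.6x and MSG is 22.1x the size of the text file
```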

The Outlook example back then was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than yesterday’s blog post (which also included the side text on the page, as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the same exact content, will be a different size in different applications.  At best, data size will enable you to provide a “ballpark” estimate for processing and review; to provide a more definitive estimate, you need a document count.  Early data assessment has become key to better estimates of scope and delivery time frame than ever before.

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.


The March Toward Technology Competence (and Possibly Predictive Coding Adoption) Continues: eDiscovery Best Practices

I know, because it’s “March”, right?  :o)  Anyway, it’s about time is all I can say.  My home state of Texas has finally added its name to the list of states that have adopted the ethical duty of technology competence for lawyers, becoming the 36th state to do so.  And, we have a new predictive coding survey to check out.

As discussed on Bob Ambrogi’s LawSites blog, just last week (February 26), the Supreme Court of Texas entered an order amending Paragraph 8 of Rule 1.01 of the Texas Disciplinary Rules of Professional Conduct. The amended comment now reads (emphasis added):

Maintaining Competence

  1. Because of the vital role of lawyers in the legal process, each lawyer should strive to become and remain proficient and competent in the practice of law, including the benefits and risks associated with relevant technology. To maintain the requisite knowledge and skill of a competent practitioner, a lawyer should engage in continuing study and education. If a system of peer review has been established, the lawyer should consider making use of it in appropriate circumstances. Isolated instances of faulty conduct or decision should be identified for purposes of additional study or instruction.

The new phrase in italics above mirrors the one adopted in 2012 by the American Bar Association in amending the Model Rules of Professional Conduct to make clear that lawyers have a duty to be competent not only in the law and its practice, but also in technology.  Hard to believe it’s been seven years already!  Now, we’re up to 36 states that have formally adopted this duty of technology competence.  Just 14 to go!

Also, this weekend, Rob Robinson published the results of the Predictive Coding Technologies and Protocols Spring 2019 Survey on his excellent ComplexDiscovery blog.  Like the first version of the survey he conducted back in September last year, the “non-scientific” survey was designed to help provide a general understanding of the use of predictive coding technologies, protocols, and workflows by data discovery and legal discovery professionals within the eDiscovery ecosystem.  This survey had 40 respondents, up from 31 the last time.

I won’t steal Rob’s thunder, but here are a couple of notable stats:

  • Just over 62% of responders (62.5%, to be exact) use more than one predictive coding technology in their predictive coding efforts: That’s considerably higher than I would have guessed;
  • Continuous Active Learning (CAL) was the most used predictive coding protocol with 80% of responders reporting that they use it in their predictive coding efforts: I would have expected that CAL was the leader, but not as dominant as these stats show; and
  • 95% of responders use technology-assisted review in more than one area of data and legal discovery: Which seems a good sign to me that practitioners aren’t just limiting it to identification of relevant documents in review anymore.

Rob’s findings, including several charts, can be found here.

So, what do you think?  Which state will be next to adopt an ethical duty of technology competence for lawyers?  Please share any comments you might have or if you’d like to know more about a particular topic.
