Collection

Rule Change Could Expand the Government’s Ability to Access ESI in Criminal Investigations: eDiscovery Trends

A rule modification adopted by the United States Supreme Court that significantly changes the way in which the government can obtain search warrants to access computer systems and electronically stored information (ESI) of suspected hackers could go into effect on December 1.

On April 28, the Supreme Court submitted the amendments to the Federal Rules of Criminal Procedure that it adopted pursuant to Section 2072 of Title 28, United States Code.  One of those proposed rule changes, to Federal Rule of Criminal Procedure 41, would provide that “a magistrate judge with authority in any district where activities related to a crime may have occurred has authority to issue a warrant to use remote access to search electronic storage media and to seize or copy electronically stored information located within or outside that district if:”

  • “the district where the media or information is located has been concealed through technological means; or”
  • “in an investigation of a violation of 18 U.S.C. § 1030(a)(5), the media are protected computers that have been damaged without authorization and are located in five or more districts.”

Currently, the government can only obtain a warrant to access ESI from a magistrate in the district where the computer with the stored information is physically located.

As reported in JD Supra Business Advisor (Come Back With a Warrant: Proposed Rule Change Expands the Government’s Ability to Access Electronically Stored Information in Criminal Investigations, written by Thomas Kurland and Peter Nelson), proponents of the rule change say it is necessary to allow the government to respond quickly to cyber-attacks of unknown origin – particularly malicious “botnets” – which are becoming increasingly common as hackers become ever more sophisticated.

However, others say the rule change will significantly expand the government’s power to search computers without their owners’ consent – regardless of whether those computers belong to criminals or even to the victims of a crime.  One US senator, Ron Wyden of Oregon, has called for Congress to reject the rule changes, indicating that they “will massively expand the government’s hacking and surveillance powers” and “will have significant consequences for Americans’ privacy”.  He has also indicated a “plan to introduce legislation to reverse these amendments shortly, and to request details on the opaque process for the authorization and use of hacking techniques by the government”.

So, what do you think?  Will Congress reverse these amendments?  Should they?  Please share any comments you might have or if you’d like to know more about a particular topic.

Just a reminder that I will be moderating a panel at The Masters Conference Windy City Cybersecurity, Social Media and eDiscovery event tomorrow (we covered it here) as part of a full day of educational sessions covering a wide range of topics.  CloudNine will be sponsoring that session, titled Faster, Cheaper, Better: How Automation is Revolutionizing eDiscovery, at 4:15.  Click here to register for the conference.  If you’re a non-vendor, the cost is only $100 to attend for the full day!

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

At Litigation Time, the Cost of Data Storage May Not Be As Low As You Think: eDiscovery Best Practices

One of my favorite all-time graphics that we’ve posted on the blog (from one of our very first posts) is this ad from the early 1980s for a 10 MB disk drive – for $3,398!  That’s MB (megabytes), not GB (gigabytes) or TB (terabytes).  These days, data storage costs mere pennies per GB, which is a big reason why the total amount of data being captured and stored by industry doubles every 1.2 years.  But, at litigation time, all that data can cost you – big.

When I checked on prices for external hard drives back in 2010 (not network drives, which are still more expensive), prices for a 2 TB external drive at Best Buy were as low as $140 (roughly 7 cents per GB).  Now, they’re as low as $81.99 (roughly 4.1 cents per GB).  And, these days, you can go bigger – a 5 TB drive for as low as $129.99 (roughly 2.6 cents per GB).  I promise that I don’t have a side job at Best Buy and am not trying to sell you hard drives (even from the back of a van).

No wonder organizations are storing more and more data and managing Big Data in organizations has become such a challenge!

Because organizations are storing so much data (and in more diverse places than ever before), information governance within those organizations has become vitally important in keeping that data as manageable as possible.  And, when litigation or regulatory requests hit, the ability to quickly search and cull potentially responsive data is more important than ever.

Back in 2010, I illustrated how each additional GB that has to be reviewed can cost as much as $16,650 (even with fairly inexpensive contract reviewers).  And, that doesn’t even take into consideration the costs to identify, preserve, collect, and produce each additional GB.  Of course, that was before Da Silva Moore and several other cases that ushered in the era of technology assisted review (even though cases that don’t use it still outnumber those that do).  Regardless, that statistic illustrates how the cost of data storage may not be as low as you think at litigation time – each GB could cost hundreds or even thousands of dollars to manage (even in the era of eDiscovery automation and falling prices for eDiscovery software and services).

Converting the early 1980s ad above to a per-GB price, that works out to roughly $340,000 per GB!  But, if you go all the way back to 1950, the cost of a 5 MB drive from IBM was $50,000, which equates to about $10 million per GB!  Check out this interactive chart of hard drive prices from 1950-2010, courtesy of That Data Dude (yes, that really is the name of the site), where you can click on different years and see how the price per GB has dropped over the years.  It’s way cool!
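
If you like to check the math, here’s a minimal Python sketch (using only the figures cited above, and assuming decimal units where 1 GB = 1,000 MB and 1 TB = 1,000 GB) that converts each of those prices to a cost per GB:

    # Cost-per-GB calculations for the storage prices cited above.
    drives = [
        ("1950 IBM 5 MB drive", 50_000.00, 5 / 1000),        # capacity in GB
        ("Early 1980s 10 MB drive", 3_398.00, 10 / 1000),
        ("2010 2 TB external drive", 140.00, 2_000),
        ("Current 2 TB external drive", 81.99, 2_000),
        ("Current 5 TB external drive", 129.99, 5_000),
    ]

    for name, price, capacity_gb in drives:
        print(f"{name}: ${price / capacity_gb:,.2f} per GB")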

So, what do you think?  Do you track GB metrics for your cases?  Please share any comments you might have or if you’d like to know more about a particular topic.

Even a “Luddite” Can Learn the Ins and Outs of Data Backups with this Guide: eDiscovery Best Practices

You have to love an instructional guide that begins with a picture of Milton Waddams (the sad sack employee obsessing over his red stapler in the movie Office Space) and ends with a nice consolidated list of ten practice tips for backups in discovery.

Leave it to Craig Ball to provide that and more in the Luddite Lawyer’s Guide to Backup Systems, which Craig introduces in his Ball in Your Court blog here.  As Craig notes in his blog, this guide is an update of a primer that he wrote back in 2009 for the Georgetown E-Discovery Institute.  He has updated it to reflect the state of the art in backup techniques and media and also added some “nifty” new stuff and graphics to illustrate concepts such as the difference between a differential and an incremental backup.  Craig even puts a “Jargon Watch” on the first page to list the terms he will define during the course of the guide.
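
To make that distinction concrete (this is my own rough illustration, not something taken from Craig’s guide): an incremental backup copies only what has changed since the most recent backup of any kind, while a differential backup copies everything that has changed since the last full backup.  A minimal Python sketch, with hypothetical paths and timestamps:

    import os
    import time

    def changed_since(root, reference_time):
        """Return files under root modified after the reference timestamp."""
        changed = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) > reference_time:
                    changed.append(path)
        return changed

    # Hypothetical reference points for illustration only.
    last_full_backup = time.time() - 7 * 24 * 3600   # last full backup: a week ago
    last_any_backup = time.time() - 1 * 24 * 3600    # most recent backup of any kind: yesterday

    differential_set = changed_since("/data", last_full_backup)  # since the last FULL backup
    incremental_set = changed_since("/data", last_any_backup)    # since the last backup of ANY kind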

Within this 20-page guide, Craig covers topics such as the Good and Bad of Backups, the differences between Duplication, Replication and Backup, the Major Elements of Backup Systems and the types of Backup Media and the characteristics of each.  Craig illustrates how restoration from tape (despite popular opinion to the contrary) could actually be the most cost-effective way of recovering ESI in a case.  And, Craig discusses the emergence of the use of the Cloud for backups (which should come as no surprise to many of you).  He concludes with his Ten Practice Tips for Backups in Civil Discovery, which is a concise, one-page reference guide to keep handy when considering backups as part of your information governance and discovery processes.

Whether you’re a Luddite lawyer or one who is more apt to embrace technology, this guide is sure to provide an essential understanding of how backups are created and managed and how they can be used during the discovery process.  Backups may be the Milton Waddams of the eDiscovery world, but they’re still important – remember that, at the end of the movie, Milton was the one relaxing on the beach with all of the money.  :o)

So, what do you think?  How do backups affect your eDiscovery process?   Please share any comments you might have or if you’d like to know more about a particular topic.

Image © Twentieth Century Fox

Craig Ball’s “Alexa-lent” Example of How the Internet of Things is Affecting Our Lives: eDiscovery Trends

I probably shouldn’t be writing about this as it will give my wife Paige another reason to say that we should get one of these.  Nonetheless, Craig Ball’s latest blog post illustrates how much data can be, and is being, captured these days in our everyday life.  Now, if we could just get to that data when we need it for legal purposes.

In Craig’s blog, Ball in Your Court, his latest post (“Alexa. Preserve ESI.”) discusses how many cool things the Amazon Echo (with its “Alexa” voice command service) can do.  Sounding like he has gotten a little too up close and personal with the device, Craig notes that:

“Alexa streams music, and news updates.  Checks the weather and traffic.  Orders pizzas and Ubers.  Keeps up with the grocery and to do lists.  Tells jokes.  Turns on the lights.  Adjusts the temperature.  Answers questions.  Does math. Wakes me up.  Reminds me of appointments.  She also orders stuff from Amazon (big surprise there).”

Sounds pretty good.  Hopefully, my wife has stopped reading by this point.

Have you ever seen the movie Minority Report where Tom Cruise walks into his apartment and issues voice commands to turn on the lights and music?  Those days are here.

Anyway, Craig notes that, using the Alexa app on his phone or computer, he can view a list of every interaction since Alexa first came into his life, and listen to each recording of the instruction, including background sounds (even when his friends add heroin and bunny slippers to his shopping list).  Craig notes that “Never in the course of human history have we had so much precise, probative and objective evidence about human thinking and behavior.”

However, as he also notes, “what they don’t do is make it easy to preserve and collect their digital archives when a legal duty arises.  Too many apps and social networking sites fail to offer a reasonable means by which to lock down or retrieve the extensive, detailed records they hold.”  Most of them only provide an item-by-item (or screenshot by screenshot) mechanism for sifting through the data.

To paraphrase a Seinfeld analogy, they know how to take the reservation, they just don’t know how to hold the reservation (OK, it’s not completely relevant, but it’s funny).

In a call to action, Craig says that both “the user communities and the legal community need to speak out on this.  Users need an effective, self-directed means to preserve and collect their own data when legal and regulatory duties require it.”  I agree.  Some, like Google and Twitter, provide excellent mechanisms for getting to the data, but most don’t.

As Wooderson says in the movie Dazed and Confused, “it’d be a lot cooler if you did”.

So, what do you think?  Will the “Internet of Things” age eventually include a self-export feature?  Please share any comments you might have or if you’d like to know more about a particular topic.

Plaintiff’s Failure to Demonstrate Allegations Leads to Summary Judgment for Defendant: eDiscovery Case Law

In Malibu Media, LLC v. Doe, Case No. 13-6312 (N.D. Ill., Feb. 8, 2016), in a case of dueling summary judgment motions, Illinois Magistrate Judge Geraldine Soat Brown denied the plaintiff’s motion for summary judgment, but granted the defendant’s summary judgment motion in its entirety, concluding that the plaintiff had not presented sufficient evidence to prove its allegations that the defendant illegally downloaded its movies.

Case Background

The plaintiff alleged that the defendant, identified through his Internet Protocol (“IP”) address, downloaded its copyrighted works – specifically, twenty-four adult movies – from the plaintiff’s site using BitTorrent.  In this matter, the defendant was allowed to proceed anonymously as “John Doe.”  With regard to the identification of the defendant via the IP address, the defendant claimed that, during the time in question, he had many guests at his house, and any number of people could have downloaded from his IP address.

In a forensic examination of the defendant’s hard drives from his computer, the plaintiff’s expert did not find any evidence that the plaintiff’s copyrighted works, or the BitTorrent software, had been on the defendant’s computer.  However, he did find evidence that one external storage device and one internal hard drive that were capable of storing files downloaded via BitTorrent had been connected to the defendant’s computer, but they had not been produced by the defendant.  He also found several virtual machines on one of the defendant’s hard drives, but not the program “VMWare” he believed was used to create them.

The defendant retained his own expert to conduct a forensic examination of his hard drives.  The defendant’s expert also concluded that there was no evidence that the plaintiff’s copyrighted works, or the BitTorrent software, had been on the defendant’s computer.  With regard to the two devices identified by the plaintiff’s expert, the defendant’s expert determined that they were last used in 2012 (which was before the infringement period and before the date the plaintiff says the works at issue were created) and that the virtual machines were last used no more recently than September 2010, which was the expiration timeframe for the one-year student license for VMWare that the defendant would have received as a graduate student.  The defendant also moved to strike declarations from the plaintiff’s experts regarding the forensic and IP evidence, as the plaintiff never served any Rule 26(a)(2) disclosure – in response, the plaintiff characterized them as “lay witnesses — not experts”.

The plaintiff and defendant filed cross-motions for summary judgment in the case.

Judge’s Ruling

Stating that “[u]nlike other cases, Malibu has no evidence that any of its works were ever on Doe’s computer or storage device”, Judge Brown denied the plaintiff’s summary judgment motion, as follows:

“Considering all of Malibu’s evidence, including the Fieser, Patzer, and Paige declarations Doe has moved to strike, in the light most favorable to Doe, Malibu’s summary judgment motion must be denied. Even if those contested declarations are considered, Malibu has not eliminated all material questions of fact about whether there was actionable infringement and, if so, whether Doe was the infringer.”

With regard to the defendant’s motion to strike declarations from plaintiff’s experts, Judge Brown granted the motion pursuant to Fed. R. Civ. P. 26(a)(2) and 37(c)(1).  As a result, Judge Brown ruled “[w]ithout the evidence of Fieser’s and Patzer’s declarations, there is no evidence linking Doe or even his IP address to Malibu’s works. Paige’s evidence, which depends entirely on the finding of IPP using Excipio’s system, does not contain any evidence based on his personal knowledge that Doe copied or distributed any of Malibu’s works. Doe’s motion for summary judgment is, accordingly, granted.”

So, what do you think?  Should the defendant’s summary judgment motion have been granted?  Please share any comments you might have or if you’d like to know more about a particular topic.

Here are links to two previous cases we have covered regarding this plaintiff.

Date Searching for Emails and Loose Files Can Be Tricky: eDiscovery Best Practices

I recently had a client that was searching for emails and loose files based on a relevant date range.  However, because of the way the data was collected and the way the search was performed, identifying the correct set of responsive emails and loose files within the relevant date range proved to be challenging.  Let’s take a look at the challenges this client faced.

Background

The files were collected by client personnel, with the emails placed into PST files and the loose files from local and network stores placed into folders organized by custodian.  The data was then uploaded and processed using CloudNine’s Discovery Client software for automated data processing and placed into the CloudNine Review platform.

Issue #1

Before retaining us for this project, the client had placed copies of loose files into the custodian folders via “drag and drop”.  If you have been reading our blog for a long time, you may recall that when you “drag and drop” files to copy them to a new location, the created date will reflect the date the file was copied, not the date the original file was created (click here for our earlier post on whether a file can be modified before it is created).

As a result, the created date could not be used to reliably determine whether a file fell within the relevant date range.  Since it was too late to redo the collection, we had to turn to the modified date to identify potentially responsive loose files within the relevant date range.
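
If you want to see the “drag and drop” effect for yourself, here’s a minimal Python sketch (with hypothetical paths, and not part of our actual workflow) that compares the two timestamps on a copied file; note that os.path.getctime returns the creation time on Windows but the metadata change time on most Unix systems:

    import os
    import shutil
    from datetime import datetime

    original = r"C:\CustodianData\contract.docx"   # hypothetical source file
    copy = r"C:\Collected\contract.docx"           # hypothetical copy destination

    shutil.copy2(original, copy)   # like drag and drop, preserves the modified time

    for label, path in [("Original", original), ("Copy", copy)]:
        created = datetime.fromtimestamp(os.path.getctime(path))    # creation time on Windows
        modified = datetime.fromtimestamp(os.path.getmtime(path))
        print(f"{label}: created {created}, modified {modified}")

    # The copy's created date reflects when the copy was made, while its
    # modified date still reflects when the original file was last edited.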

Issue #2

With the data processed and loaded, the client proceeded to perform a search to identify documents to be reviewed for responsiveness: documents containing the agreed-upon responsive terms for which either the sent date or the modified date was from 1/1/2014 to the present.  Make sense?  In theory, yes.  However, the result set included numerous emails that were sent before 1/1/2014.  Why?

One of the first steps that Discovery Client (and most other eDiscovery processing applications) performs is “flattening” of the data, which includes extracting individual files out of archive files.  Archive files include ZIP and RAR compressed files, but another file type that is considered an archive file is the Outlook PST.  Processing applications extract individual messages from the PST, usually as individual MSG (individual Outlook message) or HTML files.  So, if a PST contains 10,000 messages (not including attachments), the processing software creates 10,000 new individual MSG or HTML files.  And, each of those newly created files has a created date and modified date equal to the date it was extracted during processing – not the date the original message was sent.

See the problem?  All of the emails were included in the result set, regardless of when they were sent, because the modified date was within the relevant date range.  Our client assumed the modified date would be blank for the emails.

Solution

To resolve this issue, we helped the client restructure the search by grouping the terms within CloudNine so that the sent date range parameter was applied to emails only and the modified date range parameter was applied to loose files only.  This gave the client the proper result, with only emails sent or loose files modified within the relevant date range.  The lesson learned here is to use the appropriate metadata fields when searching specific types of files, as other fields can yield erroneous results.
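
Conceptually, the restructured search looked something like the sketch below – a simplified Python illustration rather than actual CloudNine query syntax – where the sent date test applies only to emails and the modified date test applies only to loose files:

    from datetime import date

    RANGE_START = date(2014, 1, 1)
    RANGE_END = date.today()

    def in_date_scope(doc):
        """Apply the appropriate date field based on document type."""
        if doc["type"] == "email":
            return doc["sent_date"] is not None and RANGE_START <= doc["sent_date"] <= RANGE_END
        return doc["modified_date"] is not None and RANGE_START <= doc["modified_date"] <= RANGE_END

    def hits_terms(doc, terms):
        text = doc.get("text", "").lower()
        return any(term.lower() in text for term in terms)

    def responsive_candidates(docs, terms):
        return [d for d in docs if hits_terms(d, terms) and in_date_scope(d)]

    # An old email no longer slips in just because its extracted message file
    # carries a recent modified date.
    docs = [
        {"type": "email", "sent_date": date(2012, 5, 1), "modified_date": date(2016, 3, 2),
         "text": "contract dispute"},
        {"type": "loose", "sent_date": None, "modified_date": date(2014, 6, 9),
         "text": "contract dispute"},
    ]
    print(responsive_candidates(docs, ["contract"]))   # only the loose file qualifies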

So, what do you think?  Have you ever experienced search issues when searching across emails and loose files at once?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Has Gone to the Dogs: eDiscovery Trends

If I had known that yesterday was National Dog Day, I would have posted this then, instead of today, but it’s a great story any day.

As reported by ABA Journal, Discover Magazine and NBC News, there is a new type of forensic collection device being used in criminal forensic investigations.  His name is Bear and he’s a black Labrador.

This 2-year-old rescue dog played a key role in the arrest of former Subway pitchman Jared Fogle on child-porn charges, finding a thumb drive that humans had failed to find during a search of Fogle’s Indiana house in July, several weeks before he agreed to plead guilty to having X-rated images of minors and paying to have sex with teenage girls.

According to the Discover article, Bear also helped officers locate 16 smartphones, 10 flash drives and six laptops during an 11-hour search last month of Fogle’s home.  His training relies on the work of chemist Jack Hubball, who tested flash drives, circuit boards and other electronic components and found a chemical that is common to all of them.  Hubball previously identified the accelerants (e.g., gasoline) dogs sniff out to identify arson, and also helped train dogs to find narcotics and bombs.

According to the NBC article, Bear has taken part in four other investigations, including this week’s arrest of Olympics gymnastics coach Marvin Sharp. And he’s just been sold to the Seattle Police Department for $9,500 (basically the cost of the training) to help investigate Internet crimes.  The NBC article includes a video of Bear in action, with Bear’s “dog whisperer” Todd Jordan providing a demonstration of his abilities.

After helping with the Fogle investigation, Bear’s trainer says he’s received some 30 inquiries from police who want to buy their own electronics-sniffing dog.  I can see why.  Labradors not only have particular sniffing skills, they make great pets, too!  And, although I have so far been unable to train our black Labrador Brooke to keep from jumping on guests to our house, we still love her and are glad we were able to rescue her last year.  Here’s a picture of her, with her favorite Kong ball:

In the future, criminal forensic investigators may show up at a suspect’s residence with a subpoena, a copy of Forensic Toolkit (FTK) and their trusty lab.  As in Labrador.

So, what do you think?  Do you have a unique ESI collection story?   Please share any comments you might have or if you’d like to know more about a particular topic.

Pitfalls Associated with Self-Collection of Data by Custodians: eDiscovery Best Practices

In a prior article, we covered the Burd v. Ford Motor Co. case where the court granted the plaintiff’s motion for a deposition of a Rule 30(b)(6) witness on the defendant’s search and collection methodology involving self-collection of responsive documents by custodians based on search instructions provided by counsel.  In light of that case and a recent client experience of mine, I thought it would be appropriate to revisit this topic that we addressed a couple of years ago.

I’ve worked with a number of attorneys who have turned over the collection of potentially responsive files to the individual custodians of those files, or to someone in the organization responsible for collecting those files (typically, an IT person).  Self-collection by custodians, unless managed closely, can be a wildly inconsistent process (at best).  In some cases, those attorneys have instructed those individuals to perform various searches to turn “self-collection” into “self-culling”.  Self-culling can cause at least two issues:

  1. You have to go back to the custodians and repeat the process if additional search terms are identified.
  2. Potentially responsive image-only files will be missed with self-culling.

It’s not uncommon for additional searches to be required over the course of a case, even when search terms are agreed to by the parties up front (search terms are frequently renegotiated), so the self-culling process has to be repeated when new or modified terms are identified.

It’s also common to have a number of image-only files within any collection, especially if the custodians frequently scan executed documents or use fax software to receive documents from other parties.  Image-only PDF or TIFF files can make up as much as 20% of the collection.  When custodians are asked to perform “self-culling” by running their own searches of their data, these files will typically be missed, because they contain no searchable text.
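
One way to gauge your exposure before (or instead of) custodian self-culling is to flag files that have little or no extractable text.  Here’s a minimal sketch along those lines, assuming the third-party pypdf package is available and treating common image formats and text-less PDFs as OCR candidates:

    import os
    from pypdf import PdfReader   # third-party: pip install pypdf

    IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}

    def needs_ocr(path):
        """Flag likely image-only files that keyword searches would miss."""
        ext = os.path.splitext(path)[1].lower()
        if ext in IMAGE_EXTENSIONS:
            return True
        if ext == ".pdf":
            try:
                reader = PdfReader(path)
                text = "".join((page.extract_text() or "") for page in reader.pages)
                return not text.strip()   # no text layer suggests a scanned, image-only PDF
            except Exception:
                return True               # unreadable PDFs also warrant a closer look
        return False

    root = r"C:\Collections\CustodianFiles"   # hypothetical collection folder
    ocr_candidates = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
        if needs_ocr(os.path.join(dirpath, name))
    ]
    print(f"{len(ocr_candidates)} files likely need OCR before keyword culling")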

For these reasons, I usually advise against self-culling by custodians in litigation.  I also typically don’t recommend that the organization’s internal IT department perform self-culling either, unless they have the capability to process that data to identify image-only files and perform Optical Character Recognition (OCR) on them to capture text.  If your IT department doesn’t have the capabilities and experience to do so (which includes a well-documented process and chain of custody), it’s generally best to collect all potentially responsive files from the custodians and turn them over to a qualified eDiscovery provider to perform the culling.  Most qualified eDiscovery providers, including (shameless plug warning!) CloudNine™, perform OCR as needed to include image-only files in the resulting potentially responsive document set before culling.  With the full data set available, there is also no need to go back to the custodians to perform additional searches to collect additional data (unless, of course, the case requires supplemental productions).


Most organizations that have their custodians perform self-collection of files for eDiscovery probably don’t expect that they will have to explain that process to the court.  Ford sure didn’t.  If your organization plans to have its custodians self-collect, you’d better be prepared to explain that process, which includes discussing your approach for handling image-only files.

So, what do you think?  Do you self-collect data for discovery purposes?  If so, how do you account for image-only files?  Please share any comments you might have or if you’d like to know more about a particular topic.

Quality Control, Making Sure the Numbers Add Up: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

Friday, we wrote about tracking file counts from collection to production, the concept of expanded file counts, and the categorization of files during processing.  Today, let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We discussed the typical categories of excluded files after processing – obviously, what’s not excluded is available for searching and review.  Even if your approach includes technology assisted review (TAR) as part of your methodology, it’s still likely that you will want to do some culling out of files that are clearly non-responsive.

Documents may be classified in a number of ways during review, but the most common classifications are responsive, non-responsive, and privileged.  Privileged documents are also often classified as responsive or non-responsive, so that only the responsive documents that are privileged need to be identified on a privilege log.  Responsive documents that are not privileged are then produced to opposing counsel.

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a fairly simple eMail-based example.  We conduct a targeted collection of a PST file from each of seven custodians in a given case.  The relevant time period for the case is January 1, 2013 through December 31, 2014.  Other than date range, we plan to do no other filtering of files during processing.  Identified duplicates will not be reviewed or produced.  We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged.  Here’s what this collection might look like:

  • Collected Files: After expansion and processing, 7 PST files expand to 101,852 eMails and attachments.
  • Filtered Files: Filtering eMails outside of the relevant date range eliminates 23,564 files.
  • Remaining Files after Filtering: After filtering, there are 78,288 files to be processed.
  • NIST/System Files: eMail collections typically don’t have NIST or system files, so we’ll assume zero (0) files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
  • Exception Files: Let’s assume that a little less than 1% of the collection (912) is exception files like password protected, corrupted or empty files.
  • Duplicate Files: It’s fairly common for approximately 30% or more of the collection to include duplicates, so we’ll assume 24,215 files here.
  • Remaining Files after Processing: We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
  • Files Culled During Searching: If we assume that we are able to cull out 67% (approximately 2/3) of the remaining collection as clearly non-responsive, that eliminates another 35,618 files.
  • Remaining Files for Review: After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
  • Files Tagged as Non-Responsive: If approximately 40% of the remaining documents are tagged as non-responsive during review, that would be 7,017 files tagged as such.
  • Remaining Files Tagged as Responsive: After QC to ensure that all documents are either tagged as responsive or non-responsive, this leaves 10,526 documents as responsive.
  • Responsive Files Tagged as Privileged: If roughly 8% of the responsive documents are determined to be privileged during review, that would be 842 privileged documents.
  • Produced Files: After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them.  The key is to note the final disposition counts above – Filtered, NIST/System, Exception, Duplicate, Culled, Non-Responsive, Privileged and Produced files.  Excluding the interim counts (the Collected Files entry and the various “Remaining Files” entries), those disposition counts represent the different categories for the file collection – each file should wind up in exactly one of these totals.  What happens if you add those counts together?  You should get 101,852 – the number of collected files after expanding the PST files.  As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery.  That’s the way it should be.  If not, investigation is required to determine where files were missed.
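
Here’s a minimal Python sketch of that reconciliation check, using the sample counts from the walkthrough above; each excluded or final disposition category is added up and compared to the expanded collected total:

    collected_total = 101_852   # expanded count from the seven PST files

    disposition = {
        "Filtered (outside date range)": 23_564,
        "NIST/System": 0,
        "Exceptions": 912,
        "Duplicates": 24_215,
        "Culled as clearly non-responsive": 35_618,
        "Tagged non-responsive in review": 7_017,
        "Responsive but privileged": 842,
        "Produced": 9_684,
    }

    accounted_for = sum(disposition.values())
    print(f"Accounted for {accounted_for:,} of {collected_total:,} collected files")
    if accounted_for != collected_total:
        print(f"Investigate: {collected_total - accounted_for:,} files unaccounted for")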

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.

Quality Control By The Numbers: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

A while back, we wrote about Quality Assurance (QA) and Quality Control (QC) in the eDiscovery process.  Both are important in improving the quality of work product and making the eDiscovery process more defensible overall.  With regard to QC, one overall QC mechanism is tracking document counts through the discovery process, especially from collection to production, to identify how every collected file was handled and why each non-produced document was not produced.

Expanded File Counts

The count of files as collected is not the same as the expanded file count.  There are certain container file types, like Outlook PST files and ZIP archives, that exist essentially to store a collection of other files.  So, the count that is important to track is the “expanded” file count after processing, which includes all of the files contained within the container files.  In a simple scenario where you collect Outlook PST files from seven custodians, the actual number of documents (emails and attachments) within those PST files could be in the tens of thousands.  That’s the starting count that matters if your goal is to account for every document or file in the discovery process.
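
As a simple illustration of the difference (not a substitute for real processing software), the sketch below counts the “expanded” contents of ZIP archives alongside ordinary files using only the Python standard library; counting the messages inside PST files would require a specialized library or a processing tool:

    import os
    import zipfile

    def expanded_file_count(root):
        """Count files under root, expanding ZIP archives to include their contents."""
        total = 0
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if zipfile.is_zipfile(path):
                    with zipfile.ZipFile(path) as archive:
                        # Count each archived file, skipping directory entries.
                        total += sum(1 for info in archive.infolist() if not info.is_dir())
                else:
                    total += 1
        return total

    print(expanded_file_count(r"C:\Collections\CustodianA"))   # hypothetical collection folder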

Categorization of Files During Processing

Of course, not every document gets reviewed or even included in the search process.  During processing, files are categorized, with some categories set aside and excluded from review.  Here are the typical categories of excluded files in most collections:

  • Filtered Files: Some files may be collected, and then filtered during processing. A common filter for the file collection is the relevant date range of the case.  If you’re collecting custodians’ source PST files, those may include messages outside the relevant date range; if so, those messages may need to be filtered out of the review set.  Files may also be filtered based on type of file or other reasons for exclusion.
  • NIST and System Files: Many file collections also contain system files, like executable (EXE) or Dynamic Link Library (DLL) files, that are part of the software on a computer and do not contain client data, so those are typically excluded from the review set. NIST files are included on the National Institute of Standards and Technology list of files that are known to have no evidentiary value, so any files in the collection matching those on the list are “De-NISTed”.
  • Exception Files: These are files that cannot be processed or indexed, for whatever reason. For example, they may be password-protected or corrupted.  Just because these files cannot be processed doesn’t mean they can be ignored; depending on your agreement with opposing counsel, you may need to at least provide a list of them on an exception log to prove they were addressed, if not attempt to repair them or make them accessible (BTW, it’s good to establish that agreement for disposition of exception files up front).
  • Duplicate Files: During processing, files that are exact duplicates may be put aside to avoid redundant review (and potential inconsistencies). Exact duplicates are typically identified based on the HASH value, which is a digital fingerprint generated based on the content and format of the file – if two files have the same HASH value, they have the same exact content and format (a minimal hashing sketch follows this list).  Emails (and their attachments) may be identified as duplicates based on key metadata fields instead, so that an attachment cannot be “de-duped” out of the collection by a standalone copy of the same file.
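
To make the hash concept concrete, here’s a minimal sketch (illustrative only; processing tools commonly use MD5 or SHA-1 at scale) that groups loose files by a SHA-256 digest of their contents, so that files with identical content end up in the same group:

    import hashlib
    import os
    from collections import defaultdict

    def file_hash(path, chunk_size=1024 * 1024):
        """Compute a SHA-256 digest of the file's contents."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def duplicate_groups(root):
        """Group files by hash; any group with more than one file holds exact duplicates."""
        groups = defaultdict(list)
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                groups[file_hash(path)].append(path)
        return {h: paths for h, paths in groups.items() if len(paths) > 1}

    for digest, paths in duplicate_groups(r"C:\Collections\LooseFiles").items():   # hypothetical path
        print(digest[:12], "->", paths)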

All of these categories of excluded files can reduce the set of files to actually be searched and reviewed.  On Monday, we’ll walk through an example of a file set from collection to production to illustrate how each file is accounted for during the discovery process.

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.
