Electronic Discovery

Should Contract Review Attorneys Receive Overtime Pay?: eDiscovery Trends

Whether they should or not, maybe they can – if they’re found NOT to be practicing law, according to a ruling from the Second U.S. Circuit Court of Appeals.

According to a story in The Posse List (Contract attorney lawsuit against Skadden Arps can proceed, appeals court says; case could enable temporary lawyers hired for routine document review to earn extra wages), the Second U.S. Circuit Court of Appeals vacated the judgment of the district court and remanded the matter for further proceedings, ruling that a lawsuit demanding overtime pay from law firm Skadden, Arps and legal staffing agency Tower Legal Solutions can proceed.

The plaintiff, David Lola, on behalf of himself and all others similarly situated, filed the case as a Fair Labor Standards Act collective action against Skadden, Arps and Tower Legal Staffing.  He alleged that, beginning in April 2012, he worked for the defendants for fifteen months in North Carolina, working 45 to 55 hours per week and was paid $25 per hour. He conducted document review for Skadden in connection with a multi-district litigation pending in the United States District Court for the Northern District of Ohio. Lola is an attorney licensed to practice law in California, but he is not admitted to practice law in either North Carolina or the Northern District of Ohio.

According to the ruling issued by the appellate court, “Lola alleged that his work was closely supervised by the Defendants, and his entire responsibility . . . consisted of (a) looking at documents to see what search terms, if any, appeared in the documents, (b) marking those documents into the categories predetermined by Defendants, and (c) at times drawing black boxes to redact portions of certain documents based on specific protocols that Defendants provided.”  Lola also alleged that Defendants provided him with the documents he reviewed, the search terms he was to use in connection with those documents, and the procedures he was to follow if the search terms appeared.

The defendants moved to dismiss the complaint, arguing that Lola was exempt from FLSA’s overtime rules because he was a licensed attorney engaged in the practice of law. The district court granted the motion, finding (1) state, not federal, standards applied in determining whether an attorney was practicing law under FLSA; (2) North Carolina had the greatest interest in the outcome of the litigation, thus North Carolina’s law should apply; and (3) Lola was engaged in the practice of law as defined by North Carolina law, and was therefore an exempt employee under FLSA.

While the appellate court agreed with the first two points, it disagreed with the third.  In vacating the judgment of the district court and remanding the matter for further proceedings, the appellate court stated in its ruling:

“The gravamen of Lola’s complaint is that he performed document review under such tight constraints that he exercised no legal judgment whatsoever—he alleges that he used criteria developed by others to simply sort documents into different categories. Accepting those allegations as true, as we must on a motion to dismiss, we find that Lola adequately alleged in his complaint that he failed to exercise any legal judgment in performing his duties for Defendants. A fair reading of the complaint in the light most favorable to Lola is that he provided services that a machine could have provided.”

A link to the appeals court ruling, also available in the article in The Posse List, can be found here.

So, what do you think?  Are document reviewers practicing law?  If not, should they be entitled to overtime pay?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

“Da Silva Moore Revisited” Will Be Visited by a Newly Appointed Special Master: eDiscovery Case Law

In Rio Tinto Plc v. Vale S.A., 14 Civ. 3042 (RMB)(AJP) (S.D.N.Y. Jul. 15, 2015), New York Magistrate Judge Andrew J. Peck, at the request of the defendant, entered an Order appointing Maura Grossman as a special master in this case to assist with issues concerning Technology-Assisted Review (TAR).

Back in March (as covered here on this blog), Judge Peck approved the proposed protocol for technology assisted review (TAR) presented by the parties, titling his opinion “Predictive Coding a.k.a. Computer Assisted Review a.k.a. Technology Assisted Review (TAR) — Da Silva Moore Revisited”.  Alas, as some unresolved issues remained regarding the parties’ TAR-based productions, Judge Peck decided to prepare the order appointing Grossman as special master for the case.  Grossman, of course, is a recognized TAR expert, who (along with Gordon Cormack) wrote Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review and also the Grossman-Cormack Glossary of Technology Assisted Review (covered on our blog here).

While noting that it has “no objection to Ms. Grossman’s qualifications”, the plaintiff raised several objections to the appointment, including:

  • The defendant should have agreed much earlier to appointment of a special master: Judge Peck’s response was that “The Court certainly agrees, but as the saying goes, better late than never. There still are issues regarding the parties’ TAR-based productions (including an unresolved issue raised at the most recent conference) about which Ms. Grossman’s expertise will be helpful to the parties and to the Court.”
  • The plaintiff stated a “fear that [Ms. Grossman’s] appointment today will only cause the parties to revisit, rehash, and reargue settled issues”: Judge Peck stated that “the Court will not allow that to happen. As I have stated before, the standard for TAR is not perfection (nor of using the best practices that Ms. Grossman might use in her own firm’s work), but rather what is reasonable and proportional under the circumstances. The same standard will be applied by the special master.”
  • One of the defendant’s lawyers had three conversations with Ms. Grossman about TAR issues: Judge Peck noted that the plaintiff did not explain how one contact in connection with The Sedona Conference “should or does prevent Ms. Grossman from serving as special master”, and noted that, as to the other two, the plaintiff “does not suggest that Ms. Grossman did anything improper in responding to counsel’s question, and Ms. Grossman has made clear that she sees no reason why she cannot serve as a neutral special master”, agreeing with that statement.

Judge Peck did agree with the plaintiff on allocation of the special master’s fees, stating that the defendant’s “propsal [sic] is inconsistent with this Court’s stated requirement in this case that whoever agreed to appointment of a special master would have to agree to pay, subject to the Court reallocating costs if warranted”.

So, what do you think?  Was the appointment of a special master (albeit an eminently qualified one) appropriate at this stage of the case?  Please share any comments you might have or if you’d like to know more about a particular topic.


EDRM Participant Profiles: eDiscovery Trends

When EDRM announced eDiscovery Daily as an Education partner back in March (we covered it here), EDRM agreed to publish our daily posts on the EDRM site and it has been great to publish our content via the leading standards organization for the eDiscovery market!  However, another part of our agreement was for eDiscovery Daily to provide exclusive content to EDRM, including articles sharing real-life examples of organizations using EDRM resources in their own eDiscovery workflows.  Now, our first participant profile is available on the EDRM site and we’re looking for other organizations to share their EDRM experiences!

These profiles are designed to illustrate how participants and their organizations contribute to the success of EDRM as well as how those organizations use EDRM resources in their own businesses.

Our first EDRM Participant Profile Interview is with Seth Magaw. Seth currently serves Ricoh Americas Corporation as Director of eDiscovery Client Services within Ricoh Legal. He is responsible for the development and implementation of service delivery for Ricoh’s electronic discovery hosting services and enhancing the organization’s overall standing in the litigation support industry.  During Seth’s ten years at Ricoh, he has handled many eDiscovery projects, including large forensic collections, ESI and hosting projects. Prior to his current role, Seth has also served Ricoh Legal as Regional Digital Support Project Manager and Digital Sales Analyst.

Ricoh is a global technology and services company and has been a powerful partner to the legal community for more than two decades, earning the trust of clients through experience, expertise and long-term relationships.

In my interview with Seth, he provided some excellent examples of Ricoh’s participation and contributions to EDRM resources and also discussed several of the instances where Ricoh has applied EDRM models and standards within its organization.  Hopefully, the interview with Seth (as well as additional interviews with other EDRM participants to come) will help educate eDiscovery professionals as to how they can use EDRM resources within their own organizations.

The link to Seth’s interview on the EDRM site is here.  I hope you will check it out.

If you are a participant of EDRM and would like to be profiled (or would like to recommend a current EDRM participant to be profiled), please contact George Socha (george@edrm.net), Tom Gelbmann (tom@edrm.net) or me (daustin@cloudnine.com) to arrange a profile interview with me to be published on the EDRM site.  We would love for you to share your experiences with EDRM and its resources!

So, what do you think?  Are you an EDRM member and want your organization to be profiled?  Please share any comments you might have or if you’d like to know more about a particular topic.


“Quality is Job 1” at Ford, Except When it Comes to Self-Collection of Documents: eDiscovery Case Law

In Burd v. Ford Motor Co., Case No. 3:13-cv-20976 (S.D. W. Va. July 8, 2015), West Virginia Magistrate Judge Cheryl A. Eifert granted the plaintiff’s motion for a deposition of a Rule 30(b)(6) witness on the defendant’s search and collection methodology, but did not rule on the issue of whether the defendant had a reasonable collection process or adequate production, denying the plaintiff’s motion as “premature” on that request.

Case Background

In these cases involving alleged events of sudden unintended acceleration in certain Ford vehicles, the plaintiffs, in December 2014, requested regularly scheduled discovery conferences in an effort to expedite what they anticipated would be voluminous discovery.

At the February 10, 2015 conference, the plaintiffs raised concerns regarding the reasonableness of the searches being performed by the defendant in its effort to respond to plaintiffs’ requests for documents.  While conceding that it had not produced e-mail in certain instances, because it did not understand that the request sought e-mail communications, the defendant did indicate that it had conducted a “sweep” of the emails of ten to twenty key custodians.  That “sweep” was described as a “self-selection” process being conducted by the individual employees, who had each been given information about the plaintiffs’ claims, as well as suggested search terms.  However, excerpts of deposition transcripts of defendant’s witnesses provided by the plaintiff revealed that some of those key employees performed limited searches or no searches at all.

Also, the Court ordered the parties to meet, confer, and agree on search terms.  The defendant objected to sharing its search terms, contending that the plaintiff sought improper “discovery on discovery,” and deemed the plaintiff’s request “overly burdensome” given that each custodian developed their own search terms after discussing the case with counsel.

Judge’s Ruling

Noting that the defendant’s “generic objections to ‘discovery on discovery’ and ‘non-merits’ discovery are outmoded and unpersuasive”, Judge Eifert stated, as follows:

“Here, there have been repeated concerns voiced by Plaintiffs regarding the thoroughness of Ford’s document search, retrieval, and production. Although Ford deflects these concerns with frequent complaints of overly broad and burdensome requests, it has failed to supply any detailed information to support its position. Indeed, Ford has resisted sharing any specific facts regarding its collection of relevant and responsive materials. At the same time that Ford acknowledges the existence of variations in the search terms and processes used by its custodians, along with limitations in some of the searches, it refuses to expressly state the nature of the variations and limitations, instead asserting work product protection. Ford has cloaked the circumstances surrounding its document search and retrieval in secrecy, leading to skepticism about the thoroughness and accuracy of that process. This practice violates ‘the principles of an open, transparent discovery process.’”

Judge Eifert also rejected the defendant’s claim of work product protection regarding the search terms, stating that “[u]ndoubtedly, the search terms used by the custodians and the names of the custodians that ran searches can be disclosed without revealing the substance of discussions with counsel.”  As a result, Judge Eifert granted the plaintiff’s motion for a deposition of a Rule 30(b)(6) witness on the defendant’s search and collection methodology, but did not rule on the issue of whether the defendant had a reasonable collection process or adequate production, denying the plaintiff’s motion as premature on that request.

So, what do you think?  Was the order for a deposition of a Rule 30(b)(6) witness the next appropriate step?  Please share any comments you might have or if you’d like to know more about a particular topic.


Life is Short, But Can Seem Long if You’re a Cheater About to Be Exposed in the Ashley Madison Hack: eDiscovery Trends

One of the most discussed topics at LegalTech® New York 2015 (LTNY) earlier this year was cybersecurity.  We’ve started covering some of the trends related to security breaches with posts here, here and here and even my hometown baseball team, the Houston Astros, was recently hacked by a competitor.  The latest victims of cyber hacking – the purported 37 million subscribers of the online cheating site AshleyMadison.com – may find little sympathy in their plight.

According to Brian Krebs in Krebs on Security, an authoritative Web site that monitors hacking worldwide, large caches of data  have been stolen from the site and some has been posted online by an individual or group that claims to have completely compromised the company’s user databases, financial records and other proprietary information.  The breach was confirmed in a statement from Toronto-based Avid Life Media Inc. (ALM*), which owns AshleyMadison as well as related hookup sites Cougar Life and Established Men. ALM stated that “We apologize for this unprovoked and criminal intrusion into our customers’ information” and also claimed that “At this time, we have been able to secure our sites, and close the unauthorized access points.”

That’s probably little comfort to the subscribers who have had their personal information compromised.

The hackers, who identify themselves as The Impact Team, are threatening to expose all customer records (including “profiles with all the customers’ secret sexual fantasies, nude pictures, and conversations and matching credit card transactions, real names and addresses, and employee documents and emails”) unless ALM takes AshleyMadison and Established Men offline “permanently in all forms.”

As stated in the article in Krebs on Security, “In a long manifesto posted alongside the stolen ALM data, The Impact Team said it decided to publish the information in response to alleged lies ALM told its customers about a service that allows members to completely erase their profile information for a $19 fee.

“According to the hackers, although the ‘full delete’ feature that Ashley Madison advertises promises ‘removal of site usage history and personally identifiable information from the site,’ users’ purchase details — including real name and address — aren’t actually scrubbed.”  On Monday, ALM said it would offer all users the ability to fully delete their personal information from the site and waive the fee (presumably fully).

Ashley Madison’s slogan is “Life is short.  Have an affair.®”  For those that have chosen to do so, life may start to seem very long, at least for a while.

So, what do you think?  Is there anything that can be done to stem the tide of data breaches throughout the world?  Please share any comments you might have or if you’d like to know more about a particular topic.

* Not to be confused with American Lawyer Media, which goes by the same acronym.  🙂


Court Denies Plaintiff’s Request for Spoliation Sanctions, as Most Documents Destroyed Before Duty to Preserve: eDiscovery Case Law

In Giuliani v. Springfield Township, et al., Civil Action No. 10-7518 (E.D.Penn. June 9, 2015), Pennsylvania District Judge Thomas N. O’Neill, Jr. denied the plaintiffs’ motion for spoliation sanctions, finding that the duty to preserve began when the case was filed and finding that “plaintiffs have not shown that defendants had any ill motive or bad intent in failing to retain the documents which plaintiffs seek”.

Case Background

In this harassment and discrimination case, the plaintiff owned land within the defendant’s township and alleged that the defendant’s zoning decisions violated the plaintiff’s civil rights. In June 2009, the defendant withdrew its opposition to the plaintiffs’ application for use of the property and its Zoning Hearing Board granted the plaintiffs’ zoning appeal, ending the zoning dispute.  The plaintiff then filed this complaint against the defendant in January 2011.

The plaintiffs contended that the defendants’ production had been deficient because defendants “provided a miniscule number [of emails] in response to Plaintiffs’ [discovery] request[s] – just 24 emails spanning a seventeen-year period of near-constant controversy.”  In response, the defendants noted that, during the time period relevant to this case, they did not generate large volumes of email and also cited their document retention policy, which stated that “e-mail messages and attachments that do not meet the definition of records and are not subject to litigation and other legal proceedings should be deleted immediately after they are read”.

The defendants also did not preserve data relating to the case until it was filed in 2011, believing that all of the outstanding issues related to the plaintiffs’ land development applications had finally been resolved when the zoning dispute ended in 2009.  The plaintiffs disputed that interpretation of when the duty to preserve arose and also pointed out instances where the defendants failed to instruct key custodians to preserve data related to the case.

Judge’s Ruling

With regard to the beginning of the duty to preserve by the defendants, Judge O’Neill stated that “Plaintiffs’ arguments are not sufficient to meet their burden to show that defendants’ duty to preserve files related to other properties, emails or planning commission board minutes was triggered at any time prior to the commencement of this action. They have not set forth any reason why I should disbelieve ‘the Township’s assertion that it had absolutely no reason to anticipate litigation until it was served with the Complaint on January 7, 2011,’…and that in June 2009, ‘with the property being leased in its entirety to one tenant, the Township . . . believed that all disputes with the Giulianis had come to an end.’”

As for alleged preservation failures after the duty to preserve commenced, Judge O’Neill determined that “Plaintiffs have not met their burden to establish that defendants actually suppressed the evidence they seek. At most, defendants lost or deleted the evidence plaintiffs seek as the result of mere inadvertent negligence. Plaintiffs have not set forth any proof that defendants in fact failed to preserve emails, documents relating to other properties or Planning Commission Board Minutes at any time after January 7, 2011…Further plaintiffs have not shown that defendants had any ill motive or bad intent in failing to retain the documents which plaintiffs seek.”  As a result, Judge O’Neill denied the plaintiffs’ motion for spoliation sanctions.

So, what do you think?  Should the duty to preserve have been applied earlier?  Please share any comments you might have or if you’d like to know more about a particular topic.


Quality Control, Making Sure the Numbers Add Up: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

Friday, we wrote about tracking file counts from collection to production, the concept of expanded file counts, and the categorization of files during processing.  Today, let’s walk through a scenario to show how the files collected are accounted for during the discovery process.

Tracking the Counts after Processing

We discussed the typical categories of excluded files after processing – obviously, what’s not excluded is available for searching and review.  Even if your approach includes technology assisted review (TAR) as part of your methodology, it’s still likely that you will want to do some culling out of files that are clearly non-responsive.

Documents may be classified in a number of ways during review, but the most common way is to classify them as responsive, non-responsive, or privileged.  Privileged documents are also often classified as responsive or non-responsive, so that only the responsive documents that are privileged need be identified on a privilege log.  Responsive documents that are not privileged are then produced to opposing counsel.
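That routing logic can be sketched in a few lines. This is a minimal illustration with hypothetical field names, not any particular review platform’s API; it simply assumes each reviewed document carries boolean “responsive” and “privileged” tags:

```python
# Sketch of the review routing logic: only responsive documents leave review,
# and responsive + privileged documents go on the privilege log instead of
# being produced. Field names here are hypothetical.
def route_documents(reviewed_docs):
    """Split reviewed documents into a production set and a privilege log."""
    production_set = []
    privilege_log = []
    for doc in reviewed_docs:
        if not doc["responsive"]:
            continue  # non-responsive: neither produced nor logged
        if doc["privileged"]:
            privilege_log.append(doc)   # responsive + privileged -> privilege log
        else:
            production_set.append(doc)  # responsive, not privileged -> produce
    return production_set, privilege_log
```

The point of the sketch is that the three tags are exhaustive and mutually exclusive at the end of review, which is what makes the count reconciliation described later in this post possible.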

Example of File Count Tracking

So, now that we’ve discussed the various categories for tracking files from collection to production, let’s walk through a fairly simple eMail based example.  We conduct a fairly targeted collection of a PST file from each of seven custodians in a given case.  The relevant time period for the case is January 1, 2013 through December 31, 2014.  Other than date range, we plan to do no other filtering of files during processing.  Identified duplicates will not be reviewed or produced.  We’re going to provide an exception log to opposing counsel for any file that cannot be processed and a privilege log for any responsive files that are privileged.  Here’s what this collection might look like:

  • Collected Files: After expansion and processing, 7 PST files expand to 101,852 eMails and attachments.
  • Filtered Files: Filtering eMails outside of the relevant date range eliminates 23,564 files.
  • Remaining Files after Filtering: After filtering, there are 78,288 files to be processed.
  • NIST/System Files: eMail collections typically don’t have NIST or system files, so we’ll assume zero (0) files here. Collections with loose electronic documents from hard drives typically contain some NIST and system files.
  • Exception Files: Let’s assume that a little less than 1% of the collection (912) is exception files like password protected, corrupted or empty files.
  • Duplicate Files: It’s fairly common for approximately 30% or more of the collection to include duplicates, so we’ll assume 24,215 files here.
  • Remaining Files after Processing: We have 53,161 files left after subtracting NIST/System, Exception and Duplicate files from the total files after filtering.
  • Files Culled During Searching: If we assume that we are able to cull out 67% (approximately 2/3 of the collection) as clearly non-responsive, we are able to cull out 35,618.
  • Remaining Files for Review: After culling, we have 17,543 files that will actually require review (whether manual or via a TAR approach).
  • Files Tagged as Non-Responsive: If approximately 40% of the document collection is tagged as non-responsive, that would be 7,017 files tagged as such.
  • Remaining Files Tagged as Responsive: After QC to ensure that all documents are either tagged as responsive or non-responsive, this leaves 10,526 documents as responsive.
  • Responsive Files Tagged as Privileged: If roughly 8% of the responsive documents are determined to be privileged during review, that would be 842 privileged documents.
  • Produced Files: After subtracting the privileged files, we’re left with 9,684 responsive, non-privileged files to be produced to opposing counsel.

The percentages I used for estimating the counts at each stage are just examples, so don’t get too hung up on them.  The key is the set of category counts: Filtered, NIST/System, Exception, Duplicate, Culled During Searching, Tagged as Non-Responsive, Tagged as Privileged, and Produced.  Excluding the interim subtotals, these categories represent the final disposition of the file collection – each file should wind up in exactly one of them.  What happens if you add those category counts together?  You should get 101,852 – the number of collected files after expanding the PST files.  As a result, every one of the collected files is accounted for and none “slips through the cracks” during discovery.  That’s the way it should be.  If not, investigation is required to determine where files were missed.
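That reconciliation is just arithmetic, so it is easy to automate.  Here is a minimal sketch using the example counts from this post (the category names are just labels for this illustration):

```python
# Reconciliation check: every collected file should fall into exactly one
# terminal category, so the category counts must sum back to the expanded
# collection total. Counts are from the worked example above.
collected = 101852

categories = {
    "filtered (outside date range)": 23564,
    "nist/system": 0,
    "exception (on exception log)": 912,
    "duplicate": 24215,
    "culled as non-responsive (searching)": 35618,
    "tagged non-responsive (review)": 7017,
    "responsive, privileged (priv log)": 842,
    "produced": 9684,
}

accounted = sum(categories.values())
assert accounted == collected, (
    f"{collected - accounted} files unaccounted for -- investigate"
)
print(f"All {collected:,} collected files accounted for.")  # prints 101,852
```

If the assertion fires, the difference tells you how many files slipped through the cracks, which is exactly the signal that an investigation is required.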

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.


Quality Control By The Numbers: eDiscovery Best Practices

Having touched on this topic a few years ago, a recent client experience spurred me to revisit it.

A while back, we wrote about Quality Assurance (QA) and Quality Control (QC) in the eDiscovery process.  Both are important in improving the quality of work product and making the eDiscovery process more defensible overall.  With regard to QC, an overall QC mechanism is tracking of document counts through the discovery process, especially from collection to production, to identify how every collected file was handled and why each non-produced document was not produced.

Expanded File Counts

Scanned counts of files collected are not the same as expanded file counts.  There are certain container file types, like Outlook PST files and ZIP archives, that exist essentially to store a collection of other files.  So, the count that is important to track is the “expanded” file count after processing, which includes all of the files contained within the container files.  In a simple scenario where you collect Outlook PST files from seven custodians, the actual number of documents (emails and attachments) within those PST files could be in the tens of thousands.  That’s the starting count that matters if your goal is to account for every document or file in the discovery process.
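The difference between scanned and expanded counts can be illustrated with a short sketch.  ZIP archives stand in for containers generally here, since Python’s standard library can read them directly; PST files would require a dedicated parser (e.g., libpff), but the accounting principle is identical:

```python
import zipfile

def expanded_count(container_path):
    """Count the files stored inside a ZIP container (directory entries excluded).

    ZIP stands in for any container format here; the number that matters for
    accounting is the count of documents *inside* the containers.
    """
    with zipfile.ZipFile(container_path) as zf:
        return sum(1 for info in zf.infolist() if not info.is_dir())

def collection_totals(container_paths):
    """Return (scanned, expanded): one container per custodian in, the
    post-expansion document count out."""
    scanned = len(container_paths)   # what a naive file scan reports
    expanded = sum(expanded_count(p) for p in container_paths)
    return scanned, expanded
```

For seven custodial containers, `scanned` would be 7, while `expanded` is the starting count you actually need to track through discovery.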

Categorization of Files During Processing

Of course, not every document gets reviewed or even included in the search process.  During processing, files are usually categorized, with some categories of files usually being set aside and excluded from review.  Here are some typical categories of excluded files in most collections:

  • Filtered Files: Some files may be collected, and then filtered during processing. A common filter for the file collection is the relevant date range of the case.  If you’re collecting custodians’ source PST files, those may include messages outside the relevant date range; if so, those messages may need to be filtered out of the review set.  Files may also be filtered based on type of file or other reasons for exclusion.
  • NIST and System Files: Many file collections also contain system files, like executable files (EXEs) or Dynamic Link Libraries (DLLs), that are part of the software on a computer and do not contain client data, so those are typically excluded from the review set. NIST files are included on the National Institute of Standards and Technology list of files that are known to have no evidentiary value, so any files in the collection matching those on the list are “De-NISTed”.
  • Exception Files: These are files that cannot be processed or indexed, for whatever reason. For example, they may be password-protected or corrupted.  Just because these files cannot be processed doesn’t mean they can be ignored; depending on your agreement with opposing counsel, you may need to at least provide a list of them on an exception log to prove they were addressed, if not attempt to repair them or make them accessible (BTW, it’s good to establish that agreement for disposition of exception files up front).
  • Duplicate Files: During processing, files that are exact duplicates may be set aside to avoid redundant review (and potential inconsistencies). Exact duplicates are typically identified based on the hash value, which is a digital fingerprint generated from the content and format of the file – if two files have the same hash value, they have exactly the same content and format.  Emails (and their attachments) may also be identified as duplicates based on key metadata fields, so that an attachment cannot be “de-duped” out of the collection by a standalone copy of the same file.
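The hash-based de-duplication described above can be sketched in a few lines.  This is a simplified illustration, not any particular processing tool’s implementation: it uses SHA-256 (MD5 and SHA-1 are also common in eDiscovery tools), and the function names and in-memory file representation are assumptions for the example:

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Digital fingerprint of a file's content; identical bytes yield an identical hash."""
    return hashlib.sha256(data).hexdigest()

def dedupe(files):
    """Keep the first copy of each unique file; later exact duplicates are set aside.

    `files` is a list of (name, content-bytes) pairs.  Returns the unique
    set to be reviewed and the duplicate set that was excluded.
    """
    seen, unique, duplicates = set(), [], []
    for name, data in files:
        h = file_hash(data)
        if h in seen:
            duplicates.append(name)   # exact duplicate of an earlier file
        else:
            seen.add(h)
            unique.append(name)
    return unique, duplicates
```

Note that both the unique and duplicate lists matter for accounting: every excluded duplicate should still be traceable to the copy that was reviewed.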

All of these categories of excluded files reduce the set of files to actually be searched and reviewed.  On Monday, we’ll walk through an example of a file set from collection to production to show how each file is accounted for during the discovery process.

So, what do you think?  Do you have a plan for accounting for all collected files during discovery?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

This Study Discusses the Benefits of Including Metadata in Machine Learning for TAR: eDiscovery Trends

A month ago, we discussed the Discovery of Electronically Stored Information (DESI) workshop and the papers describing research or practice presented at the workshop, and we covered one of those papers a couple of weeks later.  Today, let’s cover another paper from the workshop.

The Role of Metadata in Machine Learning for Technology Assisted Review (by Amanda Jones, Marzieh Bazrafshan, Fernando Delgado, Tania Lihatsh and Tamara Schuyler) studies the role of metadata in machine learning for technology assisted review (TAR), particularly with respect to the algorithm development process.

Let’s face it, we all generally agree that metadata is a critical component of ESI for eDiscovery.  But, opinions are mixed as to its value in the TAR process.  For example, the Grossman-Cormack Glossary of Technology Assisted Review (which we covered here in 2012) includes metadata as one of the “typical” identified features of a document that are used as input to a machine learning algorithm.  However, a couple of eDiscovery software vendors have produced documentation stating that “machine learning systems typically rely upon extracted text only and that experts engaged in providing document assessments for training should, therefore, avoid considering metadata values in making responsiveness calls”.

So, the authors decided to conduct a study to establish the potential benefit of incorporating metadata into TAR algorithm development processes, as well as to evaluate the benefits of using extended metadata and of using the field origins of that metadata.  Extended metadata fields included Primary Custodian, Record Type, Attachment Name, Bates Start, Company/Organization, Native File Size, Parent Date and Family Count, to name a few.  They evaluated three distinct data sets (one drawn from Topic 301 of the TREC 2010 Interactive Task, the other two proprietary business data sets) and generated a random sample of 4,500 individual documents for each (split into a 3,000 document Control Set and a 1,500 document Training Set).

The metric they used throughout to compare model performance is Area Under the Receiver Operating Characteristic Curve (AUROC). Say what?  According to the report, the metric indicates the probability that a given model will assign a higher ranking to a randomly selected responsive document than a randomly selected non-responsive document.
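That probabilistic interpretation of AUROC can be computed directly by comparing every responsive/non-responsive pair of model scores (ties counted as half).  This is a generic sketch of the metric itself, not the study’s code; the function name and inputs are assumptions for the example:

```python
def auroc(responsive_scores, nonresponsive_scores):
    """Probability that a randomly chosen responsive document is ranked
    above a randomly chosen non-responsive one (ties count as 0.5)."""
    wins = 0.0
    for r in responsive_scores:
        for n in nonresponsive_scores:
            if r > n:
                wins += 1.0       # responsive doc ranked higher
            elif r == n:
                wins += 0.5       # tie: credit half
    return wins / (len(responsive_scores) * len(nonresponsive_scores))
```

A model that always ranks responsive documents above non-responsive ones scores 1.0; a model no better than chance scores 0.5.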

As indicated by the graphic above, their findings were that incorporating metadata as an integral component of machine learning processes for TAR improved results (based on the AUROC metric).  Notably, models incorporating Extended metadata significantly outperformed models based on body text alone in each condition for every data set.  While there’s still a lot to learn about the use of metadata in modeling for TAR, it’s an interesting study and a good start to the discussion.

A copy of the twelve-page study (including Bibliography and Appendix) is available here.  There is also a link to the PowerPoint presentation file from the workshop, which is a condensed way to look at the study, if desired.

So, what do you think?  Do you agree with the report’s findings?  Please share any comments you might have or if you’d like to know more about a particular topic.


Similar Spoliation Case, Somewhat Different Outcome: eDiscovery Case Law

Remember the Malibu Media, LLC v. Tashiro case that we covered a couple of weeks ago, which involved spoliation sanctions against a couple accused of downloading its copyrighted adult movies via a BitTorrent client?  Here’s a similar case with the same plaintiff and similar spoliation claims, but with a somewhat different outcome (at least for now).

In Malibu Media, LLC v. Michael Harrison, Case No. 12-cv-1117 (S.D. Ind. June 8, 2015), Indiana District Judge William T. Lawrence denied the plaintiff’s motion for summary judgment and upheld the magistrate judge’s ruling, which found that an adverse inference instruction for destruction of a hard drive containing potentially responsive data was not warranted, ruling that “it will be for a jury to decide” whether such a sanction is appropriate.

Case Background

The plaintiff alleged that the defendant installed a BitTorrent Client onto his computer and then went to a torrent site to upload and download its copyrighted Work, specifically, six adult films (or portions thereof).  As in the Tashiro case, the plaintiff used a German company to identify certain IP addresses that were being used to distribute the plaintiff’s copyrighted movies, and the defendant was eventually identified by Comcast as the subscriber assigned to this particular IP address.

After the lawsuit was filed, in January 2013, the defendant’s hard drive on his custom-built gaming computer crashed and he took it to an electronics recycling company to have it “melted”. He then replaced the gaming computer’s hard drive. In addition to his gaming computer, the defendant also had another laptop. During discovery, that laptop and the new hard drive were examined by forensic experts; while the laptop revealed extensive BitTorrent use, it did not contain any of the plaintiff’s movies or files, and the new hard drive did not reveal any evidence of BitTorrent use.  Nonetheless, because of the destroyed hard drive, the plaintiff filed a motion for sanctions for the Intentional Destruction of Material Evidence, as well as a motion for summary judgment.

In an evidentiary hearing in December 2014, the magistrate judge recommended that the motion for sanctions be denied, concluding that the defendant “did not destroy the hard drive in bad faith”, that “[h]ad [Harrison] truly wished to hide adverse information, the Court finds it unlikely that [Harrison] would have waited nearly five months to destroy such information”, and noted that he found the defendant’s testimony to be credible.  The plaintiff filed an objection to that report and recommendation, arguing that “bad faith should be inferred from the undisputed evidence.”

Judge’s Ruling

Regarding both the summary judgment motion and the motion for sanctions, Judge Lawrence stated the following:

“The Court agrees with Magistrate Judge Dinsmore that default judgment was not warranted in this case. That said, Magistrate Judge Dinsmore found an adverse inference not to be warranted because he found Harrison’s testimony to be credible. While the Court does not necessarily disagree with Magistrate Judge Dinsmore—in that it is certainly possible a jury would find Harrison’s testimony to be credible—ultimately, the Court believes this is an issue best left for a jury to decide. Malibu Media has presented sufficient evidence to the contrary, and in light of the fact that Malibu Media’s motion for summary judgment was denied on the same grounds, the Court believes leaving the issue of spoliation to the jury to be the best approach. Accordingly, at trial the Court will instruct the jury that if it finds that Harrison destroyed the gaming computer’s hard drive in bad faith, it can assume that the evidence on the gaming computer’s hard drive would have been unfavorable to Harrison.”

So, what do you think?  Should this case have been handled the same way the Malibu Media, LLC v. Tashiro case was handled?  Please share any comments you might have or if you’d like to know more about a particular topic.
