Electronic Discovery

Wednesday’s ILTA Sessions – eDiscovery Trends

Usually I write these blog posts early and schedule them to post in the middle of the night.  However, this is Vegas and it is the middle of the night, so I don’t have to schedule the post.  Viva Las Vegas!  🙂

As noted Monday and yesterday, the International Legal Technology Association (ILTA) annual educational conference of 2013 is happening this week and eDiscoveryDaily is here to report about the latest eDiscovery trends being discussed at the show.  There’s still time to check out the show if you’re in the Las Vegas area with a number of sessions available and over 180(!) exhibitors providing information on their products and services.  Here are today’s sessions in the main conference tracks related to litigation support and eDiscovery.

11:00 AM – 12:30 PM:

Hot Topics in E-Discovery

Description: In this open forum discussion, you’ll get to participate in a topic-based dialogue with peers and colleagues on the hottest issues and trends in e-discovery. ILTA members can vote on and influence the list of topics in the weeks leading up to the conference. While loosely moderated, audience participation and questions are encouraged and will drive this session!

Speakers are: David Cowen – The Cowen Group; Steven L. Clark – Lathrop & Gage LLP.

1:30 PM – 2:30 PM:

Technology and Better Project Management

Description: Keeping up with the ever-changing technology around collecting, processing and reviewing data poses a huge challenge to many case teams. The technology is clearly part of the process, but the question remains: Is the process driven by the technology or by principles of project management? Those working to form a litigation support department or those struggling to set up a project plan will get answers here, as we focus on the process of using technology in different ways to develop a straightforward project plan that the case team can use and rely on from case to case.

Speakers are: Cindy MacBean – Watt, Tieder, Hoffar & Fitzgerald; Gordon Moffat – Baker Donelson Bearman Caldwell & Berkowitz; Duane Lites – Jackson Walker L.L.P.; Chad Papenfuss – Kirkland & Ellis LLP.

3:30 PM – 4:30 PM:

A Numbers Game: The Value of E-Discovery Metrics

Description: Einstein once said, “Not everything that counts can be counted, and not everything that can be counted counts.” Come find out which e-discovery metrics really count. Corporate counsel want more certainty surrounding the time and cost of e-discovery, and our panelists will share their experiences implementing e-discovery metrics and lessons learned. Join us as we explore what to measure, how to collect data and what key metrics have added value for clients.

Speakers are: Florinda Baldridge – Norton Rose Fulbright; Browning E. Marean – DLA Piper; Beth Patterson – Allens; William W. Belt – Deloitte.

For a complete listing of all sessions at the conference, click here.

So, what do you think?  Are you planning to attend ILTA this year?  You’re running out of time!  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Tuesday’s ILTA Sessions – eDiscovery Trends

As noted yesterday, the International Legal Technology Association (ILTA) annual educational conference of 2013 is happening this week and eDiscoveryDaily is here to report about the latest eDiscovery trends being discussed at the show.  There’s still time to check out the show if you’re in the Las Vegas area with a number of sessions available and over 180(!) exhibitors providing information on their products and services.  Here are today’s sessions in the main conference tracks related to litigation support and eDiscovery.

8:30 AM – 10:00 AM & 11:00 AM – 12:30 PM (2 part session):

Technology-Assisted Review: A Hands-On Case Study

Description: It’s clear corporations and law firms are increasing their use of computer-assisted review/predictive coding.  That’s why you should join us for a hands-on walk through of computer-assisted review from start to finish, looking at all aspects of the work flow.  We’ll include presentations and exercises that teach attendees about reviewer preparation, training sets, statistical sampling, and validation, making this a can’t-miss session for those who are predictive-coding challenged.

Speakers are: Candi Smith – Winston & Strawn LLP; Andrea Garlanger – Relativity by kCura; Constantine Pappas – Relativity by kCura.

11:00 AM – 12:30 PM:

Predictive Coding Technologies for Information Management Purposes…Could It Be?

Description: Predictive coding is THE buzz word on the streets. This buzz has been focused on new technology capabilities to support e-discovery-related tasks, but we’ll challenge attendees to think outside the box and look at the problems that can be solved by leveraging these technologies for information governance. Attendees will be presented with a different journey through the EDRM model –– this time, starting from the left. Our panelists will present thought-provoking suggestions for innovation, balanced with case studies of firms successfully leveraging these technologies. Take a journey from the practical to the “if I had a magic wand,” and leave with cutting-edge information!

Speakers are: Rudy Moliere – Morgan, Lewis & Bockius, L.L.P.; Bennett Borden – Drinker Biddle & Reath LLP; Kathleen Jimenez – Orrick, Herrington & Sutcliffe LLP.

1:30 PM – 2:30 PM:

E-Discovery Pricing Predictability: An Ongoing Debate

Description: Attend a candid discussion about the world of fixed-fee billing, as it works for some and not for others. Some in the corporate world think there is value in vendor RFP competition, while others believe the improved consistency and aggregation of wholesale purchasing power is more advantageous. A panel of peers will debate the two views. You’ll also hear how to arrive at reasonable pricing assumptions and considerations, when fixed fees are advantageous for everyone involved, and how to negotiate fixed fees.

Speakers are: Eric Lieber – Toyota Motor Sales; Kathryn Goetz – Qualcomm; Jennifer Hamilton – Deere & Company; Gene Eames – Pfizer Inc; Rose Jones – King & Spalding LLP.

3:30 PM – 4:30 PM:

Get Invited to Discovery-Management Meet-and-Confer Meetings with No Regrets

Description: Ming the Merciless once said: “Pathetic earthlings. Hurling your bodies out into the void [of knowledge about meet-and-confers], without the slightest inkling of who or what is out here. If you had known anything about the true nature of the universe [of discovery management issues and dangers], anything at all, you would’ve hidden from it in terror.” Two industry experts with an aggregate of over 50 years of experience will share critical mistakes they witnessed during meet-and-confer meetings: terrified, hurtling bodies – and a mess they had to clean up.

Speakers are: Thomas Morrissey – Purdue Pharma L.P.; J. William Speros – Speros and Associates, LLC.

For a complete listing of all sessions at the conference, click here.

So, what do you think?  Are you planning to attend ILTA this year?  Please share any comments you might have or if you’d like to know more about a particular topic.


Welcome to ILTA 2013! – eDiscovery Trends

As we previewed on Friday, the International Legal Technology Association (ILTA) annual educational conference of 2013 kicked off yesterday with several networking events, and begins in earnest today with the first day of sessions.  eDiscoveryDaily is here to report about the latest eDiscovery trends being discussed at the show.  Over the next four days, we will provide a description each day of some of the sessions related to eDiscovery to give you a sense of the topics being covered.

If you’re in the Las Vegas area, come check out the show – there are a number of sessions available and over 180(!) exhibitors providing information on their products and services.  As for the conference, there is plenty to talk about as well.  Sessions in the main conference tracks related to litigation support and eDiscovery include:

11:00 AM – 12:00 PM:

If I Were in Your Shoes…Strengthening Partner Relationships

Description: Our panel of litigation support managers from both corporate law departments and law firms will discuss improving working relationships among law firms, their corporate clients and vendors. We’ll take a closer look at the client’s point of view and how firms can deliver superior services. Let’s examine how we can all communicate more effectively and strengthen the relationships between firms and clients.

Speakers are: Scott M. Cohen – Winston & Strawn LLP; Eric Lieber – Toyota Motor Sales; Vanessa Lozzi – Flagstar Bank; Andre Guilbeau – Kiersted Systems.

What Litigation Support Professionals Need To Know About Information Governance

Description: That drive of client-provided ESI you just transferred to the network could contain sensitive information that is a treasure trove for hackers and a huge risk for law firms. What makes firms potentially easy targets, and how can you respond? Litigation support professionals who receive, store, process and transfer data are often the most important line of defense in protecting the client and the firm. Come learn about potential risks, mitigation strategies and how litigation support professionals are partnering with their information governance peers to help mitigate risks.

Speakers are: Rudy Moliere – Morgan, Lewis & Bockius, L.L.P.; William Hamilton – Quarles & Brady LLP; Dera Jardine Nevin – TD Bank.

1:00 PM – 2:00 PM:

eDiscovery Features of Exchange 2013 and SharePoint 2013

Description: Get the inside scoop from Microsoft about how the new eDiscovery Center works to aid in content collection, legal holds, etc. in both Exchange 2013 and SharePoint 2013.

Speaker is: Paul Branson – Microsoft Corporation.

4:00 PM – 5:00 PM:

So You’ve Done a Few Predictive Coding Projects…

Description: Though the newness of technology-assisted review (TAR) is still present, this panel of experts has many projects under its combined belts. This advanced conversation about the use of TAR will demystify for some and provide good insight for all on the practical use of various workflow strategies, the practical development of workflow and protocols, and tips on implementation.

Speakers are: Greg R. Chan – Bingham McCutchen LLP; Brian Evans – Norton Rose Fulbright; Paige Hunt Wojcik – Perkins Coie; Rachel Rubenson – Barclays Bank PLC.

For a complete listing of all sessions at the conference, click here.

So, what do you think?  Are you planning to attend ILTA this year?  Please share any comments you might have or if you’d like to know more about a particular topic.


ILTA: A Catalyst to Legal Technology Education – eDiscovery Trends

For over three decades, the International Legal Technology Association has led the way in sharing knowledge and experience for those faced with challenges in their firms and legal departments.  As part of that effort, they conduct an educational conference each year to provide information to legal technology professionals.  That conference (ILTA 2013: The Catalyst) is next week in Las Vegas at Caesars Palace.  Here’s a preview.

As ILTA states in its overview for the conference: A catalyst can be defined as something or someone that causes a reaction or activity between two or more things to create something new.  Via its four-day educational conference with over 200 peer-developed educational sessions, plenty of networking opportunities and more than 200 exhibiting vendors, ILTA is betting (because it’s in Vegas, get it?) that the conference will be a catalyst for its attendees.

I’ve been attending this conference since last century, when it was known as “Lawnet”.  In my experience, it has always been an informative and well attended show.

eDiscovery Daily will be there, reporting from the show to provide information about sessions and general trends observed in those sessions and within the exhibit hall.  There are at least 14 sessions related to eDiscovery and litigation support topics, so there will be plenty to discuss!

If you haven’t registered to attend, but wish to do so, you can register here.

Word of warning: Caesars Palace is not pager friendly, so be prepared to adjust accordingly.  🙂

So, what do you think?  Do you plan to attend ILTA this year?   Please share any comments you might have or if you’d like to know more about a particular topic.


Default Judgment Sanction Upheld on Appeal – eDiscovery Case Law

In Stooksbury v. Ross, Nos. 12-5739/12-6042/12-6230, No. 13a0575n.06 (6th Cir. June 13, 2013), the Sixth Circuit upheld the entry of default judgment as a sanction against defendants that repeatedly failed to comply with discovery obligations, including producing a “document dump” of tens of thousands of pages of nonresponsive information that prejudiced the plaintiff.

At trial in this RICO action, the court found the defendants engaged in “contumacious conduct” and intentionally delayed discovery. Although the defendants had provided a document “dump” of 40,000 pages of documents in response to document requests, the information was not responsive to the requests, lacked important financial information, was not Bates stamped, and prejudiced the plaintiff. The plaintiff asked the court to sanction the defendants, and the magistrate judge recommended default judgment in favor of the plaintiff. The judge found that the defendants had a “‘total lack of forthrightness’” in refusing to comply and explain their noncompliance; this conduct “‘amount[ed] to bad faith and a willful decision not to cooperate in discovery.’” The district court subsequently adopted the magistrate judge’s findings but awarded costs and fees instead of a default judgment. The court also afforded the defendants 10 more days to comply with the discovery order, warning them that noncompliance could result in further sanctions.

Despite the additional time and warning, the defendants still failed to provide responsive discovery: “[T]hey included boilerplate objections and failed to provide basic accounting documents or Bates stamp references for the earlier document dump.” As a result, when the plaintiff renewed his motion for a default judgment, the court granted it. The district court relied on four findings: “(1) the defendants intentionally failed to comply with the discovery orders, (2) they failed to heed the court’s warning, (3) the plaintiff suffered prejudice as a result of their noncompliance, and (4) less drastic sanctions would not be effective.” The defendants later objected, but the court refused to reconsider its ruling, finding there was no evidence that the defendants’ actions stemmed from excusable neglect.

The defendants appealed this decision. The Sixth Circuit reviewed the district court’s four findings for an abuse of discretion and approved the lower court’s decision for the following reasons:

  • First, the defendants met the standard for “willful conduct and bad faith” because they “lacked forthrightness, failed to directly respond to the Court’s inquiries about the discovery matters, offered no explanation for their lack of compliance, and demonstrated ‘bad faith and a willful decision not to cooperate in discovery.’”
  • Second, the plaintiff was prejudiced because the dispute had continued for more than a year, despite judicial intervention and two continuances. Further, the “discovery abuses imposed excessive costs on Plaintiff, who had to sort through the document dump, and undermined Plaintiff’s proof on the issue of liability.”
  • Third, the defendants were fairly warned about the possibility of sanctions, including a default judgment, by the magistrate judge and district court.
  • Fourth, the court first issued a less severe sanction and warned the defendants of the possibility of the default if they did not meet their discovery obligations. Nevertheless, the defendants “forced the district court’s hand in ordering the default judgment.”

Accordingly, the court ruled there was no abuse of discretion.

So, what do you think?  Was the default judgment sanction warranted?   Please share any comments you might have or if you’d like to know more about a particular topic.

Case Summary Source: Applied Discovery (free subscription required).  For eDiscovery news and best practices, check out the Applied Discovery Blog here.


EDRM Wants You! – eDiscovery Trends

A lot is happening in the Electronic Discovery Reference Model (EDRM) group lately and this blog has reported several accomplishments in just the last few months.  With so much going on, you would think they don’t need any help to get things done, but, in fact, EDRM wants your help.

In their latest press release, EDRM has announced its fall campaign for new members. As the press release states, EDRM is offering memberships to individuals and organizations that wish to contribute to the overall improvement of the electronic discovery process by participating in the development and delivery of guidelines, standards, and new resources to the electronic discovery industry.

Since its inception in 2005, EDRM has comprised more than 260 member organizations representing every aspect of eDiscovery and information governance. Attorneys, IT professionals, litigation and eDiscovery directors, and others from corporations, law firms, government, consulting firms, software companies, and service providers are welcome to join EDRM. Members select projects in which to participate based on their individual areas of interest.

The objective of the EDRM Membership Drive is to expand the array of talent and expertise to continue development of practical resources from EDRM by broadening membership from all areas of the electronic discovery industry: providers of software and services, corporations, law firms, educational institutions, and individuals.

Having been a member for most of the 8+ years since EDRM was founded, I can personally say that participating in EDRM is rewarding, not only from a standpoint of helping to shape the direction of the industry, but also in terms of the ability to network with other industry professionals.  It appears that despite the fact that more than half the attendees at this year’s annual meeting were first time attendees, EDRM is still looking for more new members.

Information about EDRM memberships is available here. EDRM will also be hosting a series of webinars in the coming weeks to provide information about the organization and current opportunities for participation to individuals and organizations interested in learning more or considering a new membership.

Since the annual meeting back in May, several EDRM projects (Metrics, Jobs, Data Set and the new Native Files project) have already announced new deliverables and/or requested feedback.  With so much going on and the Mid-Year meeting coming in October (9th through 11th), now is a great time to get involved.

So, what do you think?  Are you a member of EDRM or another organization focused on eDiscovery best practices?   Please share any comments you might have or if you’d like to know more about a particular topic.


Uninformed Attorneys Are Not in Kansas Anymore – eDiscovery Trends

Well, at least, they have additional resources to become better informed…

“Since March 2012, the U.S. District Court for the District of Kansas has been involved in an intense effort to find ways to ensure that civil litigation actually is handled in the “just, speedy, and inexpensive” manner contemplated by Rule 1 of the Federal Rules of Civil Procedure.”  That quote is from a Rule 1 Task Force Update, issued by the U.S. District Court in Kansas regarding efforts to create newly released guidelines for electronic data discovery.

This Rule 1 project was “spearheaded” by the court’s Bench-Bar Committee of three lawyers and two federal judges, working “in close consultation with two nationally recognized experts on the federal rules and a diverse assemblage of experienced and respected trial lawyers from throughout Kansas”.  There were six working groups formed to make recommendations in the following areas (nearly all of the recommendations were approved by the Bench-Bar Committee and in turn by the court):

  1. Overall civil case management;
  2. Discovery involving electronically stored information (ESI);
  3. Traditional non-ESI discovery;
  4. Dispositive-motion practice;
  5. Trial scheduling and procedures; and
  6. Professionalism and sanctions.

The guidelines promote limiting the scope of eDiscovery and resolving discovery disputes without judicial intervention, stating “The failure of counsel or the parties in litigation to cooperate in facilitating and reasonably limiting discovery requests and responses increases litigation costs and contributes to the risk of sanctions.”  The guidelines also recommend native productions (over spending time or money to convert documents to PDF or TIFF format), production of documents with non-privileged metadata intact and appointment of an eDiscovery liaison who is both familiar with the party’s ESI systems and capabilities and knowledgeable about eDiscovery, to facilitate the process and participate in dispute resolution.

The Rule 1 Task Force documents include:

  • Initial Order Regarding Planning and Scheduling: Two page model order with fill-in-the-blank sections for customized info;
  • Rule 26(f) Report of Parties’ Planning Conference: Ten page model filing of a sample Rule 26(f) report;
  • Scheduling Order: Twelve page model scheduling order;
  • Pretrial Order: A one page Pretrial Order form, followed by a seven page pretrial order (8 pages total);
  • Guidelines for Cases Involving Electronically Stored Information (ESI): A ten page set of guidelines, followed by a two page appendix, containing a reprint of a 2008 article by Craig Ball (Ask and Answer the Right Questions in EDD) with 50(!) questions to ask your opponent;
  • Guidelines for Agreed Protective Orders (with pre-approved form order): Four pages of guidelines, followed by a one page instruction on use of the model form protective order and the thirteen page model order;
  • Summary Judgment Guidelines: Two page list of summary judgment guidelines;
  • Proposed Technical Amendments to Local Rules: Seven pages of local rules with amendments (including strikeouts of words and sentences) as appropriate.

Oddly enough, the task force documents are in imaged, but text-enabled (with OCR) PDF form.  It would be great if they could provide a fully electronic PDF form for the documents, even better if they could provide form enabled versions of the model orders.  Just sayin’.

So, what do you think?  How do the Kansas guidelines compare to those for your state?   Please share any comments you might have or if you’d like to know more about a particular topic.


Can You Figure Out How I Wrote this Blog Post? – eDiscovery Trends

I have to be honest, this blog post contains quite a bit of content from one of the early posts from this blog.  However, there is something different about this version of the content – it looks a bit unusual.  Can you figure out how I wrote it?  See if you can figure it out before you get to the bottom.  I promise I haven’t lost my mind.

Types of exceptions file

It’s important to note that efforts to quote fix quote these files will often change the files parentheses and the meta data associated with them parentheses, so it’s important to establish with opposing counsel what measures to address the exceptions are acceptable. Some files may not be recoverable and you need to agree up front how far to go to attempt to recover them.

  • Corrupted files colon files can become corrupted 4 a variety of reasons, from application failures 2 system crashes to computer viruses. I recently had a case where 40 percent of the collection what’s contained in to corrupt Outlook PST file dash fortunately, we were able to repair those files and recover the messages. If you have read Lee accessible backups of the files, try to restore them from backup. If not, you will need to try using a repair utility. Outlook comes with a utility called scan PST. Exe that scans and repairs PST and OST file, and there are utilities parenthesis including freeware utilities parenthesis available via the web foremost file types. If all else fails, you can hire a-data recovery expert, but that can get very expensive.
  • Password protected files colon most collections usually contain at least some password protected files. Files can require a password to enable them to be edited, or even just to view them. As the most popular publication format, PDF files are often password protected from editing, but they can still be feud 2 support review parenthesis though some search engines May fail to index them parenthesis. If a file is password protected, you can try to obtain the password from the custodian providing the file dash if the custodian is unavailable or unable to remember the password, you can try a password cracking application, which will run through a series of character combinations to attempt to find the password. Be patient, it takes time, and doesn’t always succeed.
  • Unsupported file types corn in most collections, there are some unusual file types that art supported by the review application, such as file for legacy or specialized applications parenthesis E. G. AutoCAD for engineering drawing parenthesis. You may not even initially no what type of files they are semi colon if not, you can find out based on file extension by looking the file extension up in file ext. If your review application can’t read the file, it also can’t index the files for searching or display them 4 review. If those file maybe responses 2 discovery requests, review them with the natives application to determine they’re relevancy.
  • No dash text file colon files with no searchable text aren’t really exceptions dash they have to be accounted for, but they won’t be retrieved in searches, so it’s important to make sure they don’t quote slip through the cracks unquote. It’s common to perform optical character recognition parenthesis Boosie are parenthesis on Tiff files and image only PDF files, because they are common document 4 minutes. Other types of no text files, such as pictures in JTAG or PNG format, are usually not oser, unless there is an expectation that they will have significant text.

Did you figure it out?  I “dictated” the above content using speech-to-text on my phone, a Samsung Galaxy 3.  I duplicated the formatting from the earlier post, but left the text the way that the phone “heard” it.  Some of the choices it made were interesting: it understands “period” and “comma” as punctuation, but not “colon”, “quote” or “parenthesis”.  Words like “viewed” became “feud”, “readily” became “read Lee” and “OCR” became “Boosie are”.  It also often either dropped or added an “s” to words that I spoke.

These days, more ESI is discoverable from sources that are non-formalized, including texts and “tweets”.  Acronyms and abbreviations (and frequent misspellings of words) are common in these data sources (whether typed or through bad dictation), which makes searching them for responsive information very challenging.  You need to get creative when searching these sources and use mechanisms such as conceptual clustering to group similar documents together, as well as stemming and fuzzy searching to find variations and misspellings of words.
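
To make the idea of fuzzy searching a little more concrete, here is a minimal sketch in Python.  It uses the standard library’s difflib module as a stand-in for whatever matching your search or review platform actually performs; the function name fuzzy_hits, the sample tokens and the 0.75 cutoff are all assumptions made up for illustration.

```python
import difflib

def fuzzy_hits(term, tokens, cutoff=0.75):
    """Return the tokens whose spelling is close enough to the search term.

    difflib scores each candidate between 0 and 1; anything scoring
    below the cutoff is dropped.
    """
    if not tokens:
        return []
    return difflib.get_close_matches(term, tokens, n=len(tokens), cutoff=cutoff)

# Hypothetical words pulled from a set of text messages
tokens = ["recieve", "received", "recvd", "review", "meeting"]

# Picks up the misspelling "recieve" and the variation "received",
# but not the abbreviation "recvd" at this cutoff
hits = fuzzy_hits("receive", tokens)
```

Note that spelling-based matching like this only goes so far: a dictation error such as “feud” for “viewed” sounds similar but is spelled very differently, so it would likely need a lower cutoff, a phonetic approach or conceptual clustering to find.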

Want to see the original version of the post?  Here it is.

So, what do you think?  How do you handle informal communications, like texts and “tweets”, in your searching of ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.


A Technical Explanation of Near-Dupes – eDiscovery Tutorial

Bill Dimm provides a comprehensive and interesting description of near-dupes and the algorithms used to identify them in his Clustify blog (What is a near-dupe, really?).  If you want to understand the “three reasonable, but different, ways of defining the near-dupe similarity between two documents”, bring your brain and check it out.

As we discussed last month, just because information volume in most organizations doubles every 18-24 months doesn’t mean that it’s all original.  When reviewers are reviewing the same data again and again, it’s unnecessarily expensive and prone to mistakes.

As Bill notes in his post, “Near-duplicates are documents that are nearly, but not exactly, the same.  They could be different revisions of a memo where a few typos were fixed or a few sentences were added.  They could be an original email and a reply that quotes the original and adds a few sentences.  They could be a Microsoft Word document and a printout of the same document that was scanned and OCRed with a few words not matching due to OCR errors.”  I also classify examples such as a Word document published to an Adobe PDF file (where the content is the same, but the file format is different, so the hash value will be different) as near-duplicates because they won’t be de-duped with an MD5 or SHA-1 hash algorithm at the file level.  You need an algorithm that looks for similarity in the document content.
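To illustrate why file-level hashing misses these documents, here is a small Python sketch using the standard `hashlib` module.  The sample text and the `dedupe_by_hash` helper are hypothetical, made up for this example; the point is only that changing a single character produces a completely different hash value.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def dedupe_by_hash(docs):
    """Keep the first document name seen for each distinct MD5 digest."""
    seen = {}
    for name, data in docs:
        seen.setdefault(md5_hex(data), name)
    return list(seen.values())


original = b"Please review the attached memo before Friday."
exact_copy = b"Please review the attached memo before Friday."
revised = b"Please review the attached memo before Friday!"  # one character changed

docs = [("original", original), ("exact_copy", exact_copy), ("revised", revised)]
print(dedupe_by_hash(docs))
```

The exact copy is removed, but the revised version survives de-duplication even though a reviewer would consider it essentially the same document.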

Identifying near-duplicates that contain almost the same information reduces redundant review and saves costs.  A recent client of mine had over 800,000 emails belonging to near-duplicate groupings that would have been impossible to identify without an effective algorithm to group them together.

Bill’s blog post goes on to discuss different methods for measuring similarity using mechanisms like a Jaccard index and a MinHash algorithm which counts shingles (don’t worry, they’re neither painful nor scaly).  Understanding how your near-dupe software works is important.  As Bill notes, “If misunderstandings about how the algorithm works cause the similarity values generated by the software to be higher than you expected when you chose the similarity threshold, you risk tagging near-dupes of non-responsive documents incorrectly (grouped documents are not as similar as you expected).  If the similarity values are lower than you expected when you chose the threshold, you risk failing to group some highly similar documents together, which leads to less efficient review (extra groups to review).”  His post is an excellent primer to developing that understanding.
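As a rough illustration of one of those measures, here is a short Python sketch of a Jaccard index computed over word shingles.  It is a sketch under my own assumptions (three-word shingles, lowercase text split on whitespace), not the algorithm used by Clustify or any other product; MinHash is a technique for estimating this same value without comparing every shingle.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}  # short text: treat the whole thing as one shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Jaccard index: shared shingles divided by total distinct shingles."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    return len(set_a & set_b) / len(set_a | set_b)


memo_v1 = "the quick brown fox jumps"
memo_v2 = "the quick brown fox leaps"
print(jaccard(memo_v1, memo_v2))
```

Changing one word out of five drops the similarity to 0.5 here, because that word appears in one of the three shingles on each side.  That is exactly the kind of behavior you need to understand before you pick a similarity threshold.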

So, what do you think?  Do you have a plan for handling near-duplicates in your collection?   Please share any comments you might have or if you’d like to know more about a particular topic.


I Removed a Virus, Did I Just Violate My Discovery Agreement? – eDiscovery Best Practices

As we discussed last month, working with electronic files in a review tool is NOT simply a matter of loading the files and getting started.  Electronic files are diverse and can present a whole collection of issues that must be addressed before loading, so processing them effectively requires a sound process.  But, what if the evidentiary files you collect from your custodians contain viruses or other malware?

It’s common to refer to all types of malware as “viruses”, but a computer virus is only one type of malware.  Malware includes computer viruses, worms, trojan horses, spyware, dishonest adware, scareware, crimeware, most rootkits, and other malicious and unwanted software or programs.  A report from 2008 stated that more malicious code and other unwanted programs were being created than legitimate software applications.  If you’ve ever had to attempt to remove files from an infected computer, you’ve seen just how prolific different types of malware can be.

Having worked with a lot of clients who don’t understand why it can take time to get ESI processed and loaded into their review platform, I’ve had to spend some time educating those clients as to the various processes required (including those we discussed last month).  Before any of those processes can happen, you must first scan the files for viruses and other malware that may be infecting those files.  If malware is found in any files, one of two things must happen:

  • Attempt to remove the malware with virus protection software, or
  • Isolate and log the infected files as exceptions (which you will also have to do if the virus protection software fails to remove the malware).

So, let’s get started, right?  Not so fast.

While it may seem logical that the malware should always be removed, doing so is technically altering the file.  It’s important to address how malware should be handled as part of the Rule 26(f) “meet and confer” conference, so neither party can be accused of spoliating data when removing malware from potentially discoverable files.  If both sides agree that malware removal is acceptable, there still needs to be a provision to handle files for which malware removal attempts fail (i.e., exception logs).  Regardless, the malware needs to be addressed so that it doesn’t affect the entire collection.
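The decision logic described above can be sketched in a few lines of Python.  Everything here is hypothetical: the `triage` function, the `is_infected` and `try_clean` callables (stand-ins for whatever virus protection software you use) and the `removal_agreed` flag (representing what the parties agreed to at the meet and confer) are names I made up for illustration, not part of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class TriageResult:
    clean: list = field(default_factory=list)       # no malware found
    cleaned: list = field(default_factory=list)     # malware removed successfully
    exceptions: list = field(default_factory=list)  # (filename, reason) log entries


def triage(files, is_infected, try_clean, removal_agreed):
    """Sort files into clean, cleaned and exception lists before processing."""
    result = TriageResult()
    for name in files:
        if not is_infected(name):
            result.clean.append(name)
        elif not removal_agreed:
            # Removing malware alters the file, so isolate it instead.
            result.exceptions.append((name, "infected; removal not agreed"))
        elif try_clean(name):
            result.cleaned.append(name)
        else:
            result.exceptions.append((name, "infected; removal failed"))
    return result


# Hypothetical scan results standing in for real virus protection software.
infected = {"b.doc", "c.xls"}
cleanable = {"b.doc"}
result = triage(
    ["a.pdf", "b.doc", "c.xls"],
    is_infected=lambda name: name in infected,
    try_clean=lambda name: name in cleanable,
    removal_agreed=True,
)
print(result.exceptions)
```

Either way, every infected file ends up cleaned or logged as an exception, so nothing infected reaches the rest of the collection unaccounted for.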

By the way, malware can hit anybody, as I learned (the hard way) a couple of years ago.

So, what do you think?  How do you handle malware in your negotiations with opposing counsel and in your ESI collections?   Please share any comments you might have or if you’d like to know more about a particular topic.
