Review

How Big is Your ESI Collection, Really? – eDiscovery Best Practices

When I was at ILTA last week, this topic came up in a discussion with a colleague, so I thought it would be good to revisit here.

After identifying custodians relevant to the case and collecting files from each, you have roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose electronic files.  You identify a vendor to process the files for loading into a review tool, so that you can conduct review and produce the files to opposing counsel.  After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!  Are they trying to overbill you?

Yes and no.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.  For example, while Outlook emails can be stored in different file formats, they are typically collected from each custodian and saved in a personal storage (.PST) file format, a container file that expands when its contents are extracted.  The scanned size of the PST file is simply its size as stored on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space?  The PST file is like one of those vacuum bags – it often stores the emails and attachments in a compressed format to save space.  There are other types of archive container files that compress their contents – .ZIP and .RAR files are two examples.  These files are used not only to compress files for storage on hard drives, but also to compact or group a set of files for transmission, often via email.  With email comprising a major portion of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.

When PST, ZIP, RAR or other compressed file formats are processed for loading into a review tool, they are expanded to their full size.  That expanded size can be 1.5 to 2 times the scanned size (or more).  And, that’s what some vendors will bill processing on – the expanded size.  In those cases, you won’t know what processing will cost up front, because the expanded size can’t be determined until processing is complete.
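To see the difference on your own data, here’s a minimal Python sketch (the archive.zip file name is a hypothetical stand-in for any container file in your collection) that compares a ZIP file’s scanned size on disk to the expanded size of its contents:

  import os
  import zipfile

  path = "archive.zip"  # hypothetical container file for illustration

  scanned_size = os.path.getsize(path)  # size of the file as stored on disk
  with zipfile.ZipFile(path) as archive:
      # Sum the uncompressed size of every file stored in the container
      expanded_size = sum(info.file_size for info in archive.infolist())

  print(f"Scanned size:    {scanned_size:,} bytes")
  print(f"Expanded size:   {expanded_size:,} bytes")
  print(f"Expansion ratio: {expanded_size / scanned_size:.1f}x")

Run against a typical collection, ratios of 1.5x to 2x are common – which is exactly the gap between scanned-size and expanded-size billing.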

It’s important to be prepared for that and know your options when processing that data.  Make sure your vendor selection criteria include questions about how processing is billed – on the scanned size or the expanded size.  Some vendors (like the company I work for, CloudNine Discovery) do bill based on the scanned size of the collection for processing, so shop around to make sure you’re getting the best deal from your vendor.

So, what do you think?  Have you ever been surprised by processing costs of your ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.

A Technical Explanation of Near-Dupes – eDiscovery Tutorial

Bill Dimm provides a comprehensive and interesting description of near-dupes and the algorithms used to identify them in his Clustify blog (What is a near-dupe, really?).  If you want to understand the “three reasonable, but different, ways of defining the near-dupe similarity between two documents”, bring your brain and check it out.

As we discussed last month, just because information volume in most organizations doubles every 18-24 months doesn’t mean that it’s all original.  When reviewers review the same data again and again, the process becomes unnecessarily expensive and error-prone.

As Bill notes in his post, “Near-duplicates are documents that are nearly, but not exactly, the same.  They could be different revisions of a memo where a few typos were fixed or a few sentences were added.  They could be an original email and a reply that quotes the original and adds a few sentences.  They could be a Microsoft Word document and a printout of the same document that was scanned and OCRed with a few words not matching due to OCR errors.”  I also classify examples such as a Word document published to an Adobe PDF file (where the content is the same, but the file format is different, so the hash value will be different) as near-duplicates because they won’t be de-duped with an MD5 or SHA-1 hash algorithm at the file level.  You need an algorithm that looks for similarity in the document content.

Identifying near-duplicates that contain almost the same information reduces redundant review and saves costs.  A recent client of mine had over 800,000 emails belonging to near-duplicate groupings that would have been impossible to identify without an effective algorithm to group them together.

Bill’s blog post goes on to discuss different methods for measuring similarity using mechanisms like the Jaccard index and the MinHash algorithm, which estimates similarity over sets of “shingles” – overlapping word sequences (don’t worry, they’re neither painful nor scaly).  Understanding how your near-dupe software works is important.  As Bill notes, “If misunderstandings about how the algorithm works cause the similarity values generated by the software to be higher than you expected when you chose the similarity threshold, you risk tagging near-dupes of non-responsive documents incorrectly (grouped documents are not as similar as you expected).  If the similarity values are lower than you expected when you chose the threshold, you risk failing to group some highly similar documents together, which leads to less efficient review (extra groups to review).”  His post is an excellent primer to developing that understanding.
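To make those concepts concrete, here’s a toy Python sketch – not Bill’s implementation or any vendor’s, just an illustration of the underlying idea – that computes the exact Jaccard similarity between two documents’ shingle sets and a MinHash estimate of the same value:

  import hashlib
  import re

  def shingles(text, k=3):
      """Break text into overlapping k-word sequences ("shingles")."""
      words = re.findall(r"\w+", text.lower())
      return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

  def jaccard(a, b):
      """Exact Jaccard index: shared shingles / total distinct shingles."""
      return len(a & b) / len(a | b)

  def minhash_signature(shingle_set, num_hashes=64):
      """For each of num_hashes salted hash functions, keep the minimum
      hash value across all shingles in the document."""
      return [min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                  for s in shingle_set)
              for seed in range(num_hashes)]

  def minhash_estimate(sig_a, sig_b):
      """The fraction of matching signature positions estimates Jaccard."""
      return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

  doc1 = "The quarterly report shows revenue grew five percent over last year"
  doc2 = "The quarterly report shows revenue grew five percent over last quarter"
  s1, s2 = shingles(doc1), shingles(doc2)
  print(f"Exact Jaccard:    {jaccard(s1, s2):.2f}")  # 0.80
  print(f"MinHash estimate: {minhash_estimate(minhash_signature(s1), minhash_signature(s2)):.2f}")

The estimate converges on the exact value as the number of hash functions grows; real near-dupe engines use signatures like these because comparing short signatures is far cheaper than comparing every pair of full documents.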

So, what do you think?  Do you have a plan for handling near-duplicates in your collection?   Please share any comments you might have or if you’d like to know more about a particular topic.

Data May Be Doubling Every Couple of Years, But How Much of it is Original? – eDiscovery Best Practices

According to the Compliance, Governance and Oversight Council (CGOC), information volume in most organizations doubles every 18-24 months. However, just because it doubles doesn’t mean that it’s all original. Like a bad cover band singing Free Bird, the rendition may be unique, but the content is the same. The key is limiting review to unique content.

When reviewers are reviewing the same files again and again, it not only drives up costs unnecessarily, but it could also lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, we all know the importance of identifying exact duplicates (files with exactly the same content in the same file format), which can be found through MD5 and SHA-1 hash values and removed from the review population, saving considerable review costs.
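Here’s a minimal Python sketch of that file-level approach (the file names and contents are hypothetical stand-ins for a collection): identical bytes always produce an identical digest, so grouping by hash surfaces the exact duplicates. Pass "sha1" instead of "md5" for a SHA-1 digest.

  import hashlib

  def content_hash(data, algorithm="md5"):
      """Hash file contents; identical bytes yield an identical digest."""
      return hashlib.new(algorithm, data).hexdigest()

  # In-memory stand-ins for collected files (hypothetical names and contents)
  files = {
      "memo_v1.docx":   b"Q3 budget memo ...",
      "memo_copy.docx": b"Q3 budget memo ...",   # byte-identical copy
      "budget.xlsx":    b"spreadsheet bytes ...",
  }

  groups = {}
  for name, data in files.items():
      groups.setdefault(content_hash(data), []).append(name)

  for digest, names in groups.items():
      if len(names) > 1:
          print(f"Exact duplicates ({digest[:8]}...): {names}")

Note what this approach cannot catch: a Word document re-published as a PDF hashes differently even though the content matches – which is where the near-duplicate techniques described below come in.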

Identifying near duplicates that contain the same (or almost the same) information (such as a Word document published to an Adobe PDF file where the content is the same, but the file format is different, so the hash value will be different) also reduces redundant review and saves costs.

Then, there is message thread analysis. Many email messages are part of a larger discussion, sometimes just between two parties, other times among several. Reviewing each email in the thread individually means reviewing much of the same information over and over again. Pulling those messages together and enabling them to be reviewed as an entire discussion eliminates that redundant review. That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about the latest misstep by Anthony Weiner).
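A headers-only sketch of the idea in Python (real thread analysis also compares quoted text to detect side conversations; the messages below – a root email, two replies, and an unrelated message – are invented for illustration):

  from collections import defaultdict

  def thread_root(msg):
      """Walk back to the root of the conversation using reply headers;
      the References header lists the thread's root message first."""
      return msg["references"][0] if msg["references"] else msg["message_id"]

  # Hypothetical messages for illustration
  messages = [
      {"message_id": "<a1>", "references": []},
      {"message_id": "<a2>", "references": ["<a1>"]},
      {"message_id": "<a3>", "references": ["<a1>", "<a2>"]},
      {"message_id": "<b1>", "references": []},
  ]

  threads = defaultdict(list)
  for msg in messages:
      threads[thread_root(msg)].append(msg["message_id"])

  for root, members in threads.items():
      print(root, "->", members)  # review each thread once, not each message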

Clustering is a process which pulls similar documents together based on content so that the duplicative information can be identified more quickly and eliminated to reduce redundancy. With clustering, you can minimize review of duplicative information within documents and emails, saving time and cost and ensuring consistency in the review. As a result, even if the data in your organization doubles every couple of years, the cost of your review shouldn’t.
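As a rough illustration of how clustering pulls similar content together – a greedy, bag-of-words sketch in Python, not how any particular review tool actually implements it:

  import math
  import re
  from collections import Counter

  def vectorize(text):
      """Represent a document as word counts (a crude bag of words)."""
      return Counter(re.findall(r"\w+", text.lower()))

  def cosine(a, b):
      """Cosine similarity between two word-count vectors."""
      dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
      norm = (math.sqrt(sum(v * v for v in a.values()))
              * math.sqrt(sum(v * v for v in b.values())))
      return dot / norm if norm else 0.0

  def cluster(docs, threshold=0.8):
      """Greedy single-pass clustering: put each document into the first
      cluster whose seed is similar enough, else start a new cluster."""
      clusters = []  # list of (seed_vector, [doc indices])
      for i, doc in enumerate(docs):
          vec = vectorize(doc)
          for seed, members in clusters:
              if cosine(seed, vec) >= threshold:
                  members.append(i)
                  break
          else:
              clusters.append((vec, [i]))
      return [members for _, members in clusters]

  docs = [
      "Please review the attached quarterly budget figures.",
      "Please review the attached quarterly budget figures before Friday.",
      "Reminder: the office closes early on Friday.",
  ]
  print(cluster(docs))  # [[0, 1], [2]]

Production-grade clustering uses richer representations and smarter algorithms, but the effect is the same: near-identical documents land in the same bucket so they can be reviewed (or eliminated) together.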

So, what do you think? Does your review tool support clustering technology to pull similar content together for review? Please share any comments you might have or if you’d like to know more about a particular topic.

Good Processing Requires a Sound Process – eDiscovery Best Practices

As we discussed yesterday, working with electronic files in a review tool is NOT simply a matter of loading the files and getting started.  Electronic files are diverse and can present a whole collection of issues to address in order to process them for loading.  To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production.  As a result, we are aware of some of the information that must be provided by the client to ensure that the resulting processed data meets their needs and have created an EDD processing spec sheet to gather that information before processing.  Examples of information we collect from our clients:

  • Do you need de-duplication?  If so, should it be performed at the case or the custodian level?
  • Should Outlook emails be extracted in MSG or HTM format?
  • What time zone should we use for email extraction?  Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT).  If you don’t think that matters, consider the example sketched after this list.
  • Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text?  If we don’t OCR those files, responsive documents could be missed during searching.
  • If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
  • Should the collection be culled based on a responsive date range?
  • Should the collection be culled based on key terms?
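Here’s why the time zone question matters, in a short Python sketch (the date range is hypothetical): the same email can fall inside or outside a responsive date range depending on the zone used for extraction.

  from datetime import datetime, timedelta, timezone

  # An email sent at 2:00 AM GMT on June 1...
  sent_gmt = datetime(2013, 6, 1, 2, 0, tzinfo=timezone.utc)

  # ...was sent at 9:00 PM on May 31 in the custodian's local zone
  # (US Central during daylight saving time, GMT-5)
  central = timezone(timedelta(hours=-5))
  sent_local = sent_gmt.astimezone(central)

  range_start = datetime(2013, 6, 1).date()  # hypothetical responsive range
  print(sent_gmt.date() >= range_start)      # True  -- included if culled in GMT
  print(sent_local.date() >= range_start)    # False -- excluded if culled in local time

Cull on the wrong zone and emails near the boundary of the date range silently fall in or out of the collection.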

Those are some general examples for native processing.  If the client requests creation of image files (many still do, despite the well-documented advantages of native files), there are a number of additional questions we ask regarding the image processing.  Some examples:

  • Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non-text-searchable PDF?
  • Should color images be created when appropriate?
  • Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?
  • Should we create images of Excel files?  If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
  • Should we endorse the images?  If so, how?
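Pulling those decisions together, a processing spec can be captured as structured data.  Here’s a hypothetical, much-abbreviated sketch in Python – the field names are invented for illustration and are not our actual spec sheet:

  # Hypothetical, abbreviated processing spec; field names invented for illustration
  processing_spec = {
      "native": {
          "dedupe": "custodian",           # or "case", or None
          "email_format": "MSG",           # or "HTM"
          "time_zone": "America/Chicago",  # or "GMT"
          "ocr_image_only_files": True,
          "password_protected": "crack",   # or "log_as_exception"
          "date_range": ("2013-01-01", "2013-06-30"),
          "key_terms": ["budget", "forecast"],
      },
      "imaging": {
          "format": "single_page_tiff",    # or multi-page TIFF, (non-)searchable PDF
          "color_when_appropriate": True,
          "placeholder_for_unsupported_or_corrupt": True,
          "image_excel": {
              "orientation": "landscape",  # or "portrait"
              "scaling": "fit_to_page",    # or "auto_size_columns"
              "print_gridlines": False,
              "print_hidden_rows_columns_sheets": False,
          },
          "endorsement": "BATES {number}",
      },
  }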

Those are just some examples.  Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use.  Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data.  Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.

So, what do you think?  How do you handle processing of electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.

The Files are Already Electronic, How Hard Can They Be to Load? – eDiscovery Best Practices

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started.  Unfortunately, it’s not that simple!

Back when most discovery was paper based, the usefulness of the documents was understandably limited.  Documents were paper and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately) and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.).  It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is.  They don’t just represent one problem, they can represent a whole collection of problems.  For example:

  • Many files are stored in compressed “container” formats (such as PST, ZIP and RAR files) that must be expanded before their contents can be reviewed.
  • Some files are password-protected or encrypted and must be cracked or logged as exceptions.
  • Image-only files have no corresponding text and aren’t searchable until OCR is performed.
  • Emails collected from custodians in different time zones must be normalized to a single zone.
  • Some files may be corrupt and can’t be processed without repair (or a placeholder).

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward.  Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom.  They’re way more useful than paper documents.  So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available.  But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think?  Have you encountered difficulties or challenges when processing electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.

Plaintiffs Take the Supreme Step in Da Silva Moore – eDiscovery Case Law

As mentioned in Law Technology News (‘Da Silva Moore’ Goes to Washington), attorneys representing lead plaintiff Monique Da Silva Moore and five other employees have filed a petition for certiorari with the Supreme Court arguing that New York Magistrate Judge Andrew Peck, who approved an eDiscovery protocol agreed to by the parties that included predictive coding technology, should have recused himself given his previous public statements expressing strong support of predictive coding.

Da Silva Moore and her co-plaintiffs argued in the petition that the Second Circuit Court of Appeals was too deferential to Peck when denying the plaintiffs’ petition to recuse him, asking the Supreme Court to order the Second Circuit to use the less deferential “de novo” standard.  As noted in the LTN article:

“The employees also cited a circuit split in how appellate courts reviewed judicial recusals, pointing out that the Seventh Circuit reviews disqualification motions de novo. Besides resolving the circuit split, the employees asked the Supreme Court to find that the Second Circuit’s standard was incorrect under the law. Citing federal statute governing judicial recusals, the employees claimed that the law required motions for disqualification to be reviewed objectively and that a deferential standard flew in the face of statutory intent. “Rather than dispelling the appearance of a self-serving judiciary, deferential review exacerbates the appearance of impropriety that arises from judges deciding their own cases and thus undermines the purposes of [the statute],” wrote the employees in their cert petition.”

This battle over predictive coding and Judge Peck’s participation has continued for 15 months.  For a recap of the events during that time, click here.

So, what do you think?  Is this a “hail mary” for the plaintiffs and will it succeed?  Please share any comments you might have or if you’d like to know more about a particular topic.

“Not Me”, The Fallibility of Human Review – eDiscovery Best Practices

When I talk with attorneys about using technology to assist with review (whether via techniques such as predictive coding or merely advanced searching and culling mechanisms), most of them still seem to question whether these techniques can measure up to good, old-fashioned human attorney review.  Despite several studies that question the accuracy of human review, many attorneys still feel that their review capability is as good or better than technical approaches.  Here is perhaps the best explanation I’ve seen yet why that may not be the case.

In Craig Ball’s latest blog post on his Ball in Your Court blog (The ‘Not Me’ Factor), Craig provides a terrific explanation as to why predictive coding is “every bit as good (and actually much, much better) at dealing with the overwhelming majority of documents that don’t require careful judgment—the very ones where keyword search and human reviewers fail miserably.”

“It turns out that well-designed and –trained software also has little difficulty distinguishing the obviously relevant from the obviously irrelevant.  And, again, there are many, many more of these clear cut cases in a collection than ones requiring judgment calls.

So, for the vast majority of documents in a collection, the machines are every bit as capable as human reviewers.  A tie.  But giving the extra point to humans as better at the judgment call documents, HUMANS WIN!  Yeah!  GO HUMANS!   Except….

Except, the machines work much faster and much cheaper than humans, and it turns out that there really is something humans do much, much better than machines:  they screw up.

The biggest problem with human reviewers isn’t that they can’t tell the difference between relevant and irrelevant documents; it’s that they often don’t.  Human reviewers make inexplicable choices and transient, unwarranted assumptions.  Their minds wander.  Brains go on autopilot.  They lose their place.  They check the wrong box.  There are many ways for human reviewers to err and just one way to perform correctly.

The incidence of error and inconsistent assessments among human reviewers is mind boggling.  It’s unbelievable.  And therein lays the problem: it’s unbelievable.    People I talk to about reviewer error might accept that some nameless, faceless contract reviewer blows the call with regularity, but they can’t accept that potential in themselves.  ‘Not me,’ they think, ‘If I were doing the review, I’d be as good as or better than the machines.’  It’s the ‘Not Me’ Factor.”

While Craig acknowledges that “there is some cause to believe that the best trained reviewers on the best managed review teams get very close to the performance of technology-assisted review”, he notes that they “can only achieve the same result by reviewing all of the documents in the collection, instead of the 2%-5% of the collection needed to be reviewed using predictive coding”.  He asks “[i]f human review isn’t better (and it appears to generally be far worse) and predictive coding costs much less and takes less time, where’s the rational argument for human review?”

Good question.  Having worked with some large review teams with experienced and proficient document reviewers at an eDiscovery provider that employed a follow-up QC check of reviewed documents, I can still recall how often those well-trained reviewers were surprised at some of the classification mistakes they made.  And, I worked on one project with over a hundred reviewers working several months, so you can imagine how expensive that was.

BTW, Craig is no stranger to this blog – in addition to several of his articles we’ve referenced, we’ve also conducted thought leader interviews with him at LegalTech New York the past three years.  Here’s a link if you want to check those out.

So, what do you think?  Do you think human review is better than technology assisted review?  If so, why?  Please share any comments you might have or if you’d like to know more about a particular topic.

Motion to Compel Dismissed after Defendant Agrees to Conditional Meet and Confer – eDiscovery Case Law

In Gordon v. Kaleida Health, No. 08-CV-378S(F) (W.D.N.Y. May 21, 2013), New York Magistrate Judge Leslie G. Foschio dismissed (without prejudice) the plaintiffs’ motion to compel the defendants to meet and confer to establish an agreed protocol for implementing the use of predictive coding software, after the defendants stated that they were prepared to meet and confer with the plaintiffs and their non-disqualified ESI consultants regarding the defendants’ predictive coding process.

For over a year, the parties unsuccessfully attempted to agree on how to achieve a cost-effective review of the defendants’ 200,000 to 300,000 emails using a keyword search methodology.  Eventually, in June 2012, the court expressed dissatisfaction with the parties’ lack of progress toward resolving the issues and pointed to the availability of predictive coding, citing its approval in Da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Feb. 24, 2012) (much more on that case here).

In a September 2012 email, after informing the plaintiffs that they intended to use predictive coding, the defendants objected to the plaintiffs’ ESI consultants participating in discussions with the defendants relating to the use of predictive coding and establishing a protocol.  Later that month, despite the plaintiffs’ requests for discussion of numerous search issues to ensure a successful predictive coding outcome, the defendants sent their ESI protocol to the plaintiffs and indicated they would also send a list of their email custodians to the plaintiffs.  In October 2012, the plaintiffs objected to the defendants’ proposed ESI protocol and filed the motion to compel, also citing Da Silva Moore and noting several technical issues “which should be discussed with the assistance of Plaintiffs’ ESI consultants and cooperatively resolved by the parties”.

Complaining that the defendants refused to discuss issues other than the defendants’ custodians, the plaintiffs claimed that “the defendants’ position excludes Plaintiffs’ access to important information regarding Defendants’ selection of so-called ‘seed set documents’ which are used to ‘train the computer’ in the predictive coding search method.”  The defendants responded, indicating they had no objection to a meet and confer with the plaintiffs and their consultants, except for those consultants that were the subject of the defendants’ motion to disqualify (because they had previously provided services to the defendants in the case).  With regard to sharing seed set document information, the defendants stated that “courts do not order parties in ESI discovery disputes to agree to specific protocols to facilitate a computer-based review of ESI based on the general rule that ESI production is within the ‘sound discretion’ of the producing party” and noted that the defendants in Da Silva Moore weren’t required to provide the plaintiffs with their seed set documents, but volunteered to do so.

Because the defendants stated that “they are prepared to meet and confer with Plaintiffs and Plaintiffs’ ESI consultants, who are not disqualified”, Judge Foschio ruled that “it is not necessary for the court to further address the merits of Plaintiffs’ motion at this time” and dismissed the motion without prejudice.  It will be interesting to see if the parties can ultimately agree on sharing the protocol or if the question regarding sharing information about seed set documents will come back before the court.

So, what do you think?  Should producing parties be required to share information regarding selection of seed set documents?  Please share any comments you might have or if you’d like to know more about a particular topic.

Important Considerations when Negotiating Search Terms with Opposing Counsel – eDiscovery Best Practices

Negotiating search terms with opposing counsel has become a commonplace way to agree on the scope of discovery.  However, when you negotiate terms with the other side, you could be agreeing to produce more than you think.  Craig Ball’s latest article in Law Technology News discusses the issues and tries to answer the question: Are Keywords Just Filters?

Many attorneys still consider attorney eyes-on linear review as the final step to decide relevance of the document collection, but Craig notes that “requesting parties frequently believe that by agreeing to the use of a set of keywords as a proxy for attorney review, those agreed searches serve as a de facto request for production and define responsiveness per se, requiring production if not privileged.”

While producing parties may object to keyword search as a proxy for attorney review, Craig notes that “there’s sufficient ambiguity surrounding the issue to prompt prudent counsel to address the point explicitly when negotiating keyword search protocols and drafting memorializing agreements.”

Craig states what more and more people have come to accept, “Objective culling, keyword search, and emerging technologies such as predictive coding make clear that the idealized view of counsel as ultimate arbiter of relevance is mostly myth.”  We discussed a study regarding the reliability of review attorneys in a post here.  “Consequently, as more parties forge detailed agreements establishing objective evidentiary identifiers such as dates, sources, custodians, circulation, data types, and lexical content, litigants and courts grow impatient with the cost and time required for attorney review and reluctant to give it deference.”

Craig’s article discusses the issue in greater depth and even provides a couple of examples of agreed upon language – one where keyword search would be considered as a filter for attorney review, the other where it would be considered as a replacement for review.  His advice to producing parties: “In effect, requesting parties regard an agreement to use queries as an agreement to treat those queries as requests for production. Producing parties who reject this thinking would nevertheless be wise to plan for opponents (and judges) who embrace it.”

It’s a terrific article and I don’t want to steal all his thunder, so click here to check it out.

BTW, Craig is no stranger to this blog – in addition to several of his articles we’ve referenced, we’ve also conducted thought leader interviews with him at LegalTech New York the past three years.  Here’s a link if you want to check those out.

So, what do you think?  Do you negotiate search terms with opposing counsel?  If so, do you use the terms as a filter or a proxy for attorney review?  Please share any comments you might have or if you’d like to know more about a particular topic.

200,000 Visits on eDiscovery Daily! – eDiscovery Milestones

While we may be “just a bit behind” Google in popularity (900 million visits per month), we’re proud to announce that yesterday eDiscoveryDaily reached the 200,000 visit milestone!  It took us a little over 21 months to reach 100,000 visits and just over 11 months to get to 200,000 (don’t tell my boss, he’ll expect 300,000 in 5 1/2 months).  When we reach key milestones, we like to take a look back at some of the recent stories we’ve covered, so here are some recent eDiscovery items of interest.

EDRM Data Set “Controversy”: Including last Friday, we have covered the discussion related to the presence of personally-identifiable information (PII) data (including social security numbers, credit card numbers, dates of birth, home addresses and phone numbers) within the Electronic Discovery Reference Model (EDRM) Enron Data Set and the “controversy” regarding the effort to clean it up (additional posts here and here).

Minnesota Implements Changes to eDiscovery Rules: States continue to be busy with changes to eDiscovery rules. One such state is Minnesota, which has amended its rules to emphasize proportionality, collaboration, and informality in the discovery process.

Changes to Federal eDiscovery Rules Could Be Coming Within a Year: Another major set of amendments to the discovery provisions of the Federal Rules of Civil Procedure is getting closer and could be adopted within the year.  The United States Courts’ Advisory Committee on Civil Rules voted in April to send a slate of proposed amendments up the rulemaking chain, to its Standing Committee on Rules of Practice and Procedure, with a recommendation that the proposals be approved for publication and public comment later this year.

I Tell Ya, Information Governance Gets No Respect: A new report from 451 Research has indicated that “although lawyers are bullish about the prospects of information governance to reduce litigation risks, executives, and staff of small and midsize businesses, are bearish and ‘may not be placing a high priority’ on the legal and regulatory needs for litigation or government investigation.”

Is it Time to Ditch the Per Hour Model for Document Review?: Some of the recent stories involving alleged overbilling by law firms for legal work – much of it for document review – raise the question of whether it’s time to ditch the per hour model for document review in favor of a per document rate.

Fulbright’s Litigation Trends Survey Shows Increased Litigation, Mobile Device Collection: According to Fulbright’s 9th Annual Litigation Trends Survey released last month, companies in the United States and United Kingdom continue to deal with, and spend more on, litigation.  From an eDiscovery standpoint, the survey showed an increase in requirements to preserve and collect data from employee mobile devices, a high reliance on self-preservation to fulfill preservation obligations and a decent percentage of organizations using technology assisted review.

We also covered Craig Ball’s Eight Tips to Quash the Cost of E-Discovery (here and here) and interviewed Adam Losey, the editor of IT-Lex.org (here and here).

Jane Gennarelli has continued her terrific series on Litigation 101 for eDiscovery Tech Professionals – 32 posts so far, here is the latest.

We’ve also had 15 posts about case law, just in the last 2 months (and 214 overall!).  Here is a link to our case law posts.

On behalf of everyone at CloudNine Discovery who has worked on the blog over the last 32+ months, thanks to all of you who read the blog every day!  In addition, thanks to the other publications that have picked up and either linked to or republished our posts!  We really appreciate the support!  Now, on to 300,000!

And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.