
eDiscovery Best Practices: The Number of Pages in Each Gigabyte Can Vary Widely

 

A while back, we talked about how the average gigabyte contains approximately 50,000 to 75,000 pages and how each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages per gigabyte can vary?

The “how many pages” question comes up a lot and I’ve seen a variety of answers.  Michael Recker of Applied Discovery posted an article to their blog last week titled Just How Big Is a Gigabyte?, which provides some perspective based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, a gigabyte of data rarely consists of just one type of file.  Many emails include attachments, which can be in any number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page counts with any degree of precision is somewhat difficult.

In fact, the exact same content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I conducted a little (admittedly unscientific) study using yesterday’s one-page blog post about the Apple/Samsung litigation: I put the content from that page into several different file formats to show how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.

The Outlook example was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than yesterday’s blog post (which also included the side text on the page, as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the exact same content, will be a different size in different applications.  As a result, to estimate the number of pages in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.
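For planning purposes, the per-type averages quoted above can be turned into a rough collection-level estimate by weighting them by each file type's share of the collection.  Here's a minimal sketch of that arithmetic (the collection breakdown is a hypothetical example, and the averages are simply the Applied Discovery figures quoted above):

```python
# Rough pages-per-gigabyte estimate, weighted by the makeup of the collection.
# The per-type averages are the Applied Discovery figures quoted above;
# the collection breakdown below is a hypothetical example.

PAGES_PER_GB = {
    "email": 100_099,
    "word": 64_782,
    "text": 677_963,
    "image": 15_477,
    "powerpoint": 17_552,
}

def estimate_pages(gb_by_type):
    """Estimate the page count of a collection described as {file_type: gigabytes}."""
    return sum(PAGES_PER_GB[file_type] * gigabytes
               for file_type, gigabytes in gb_by_type.items())

# Example: a 10 GB collection that is mostly email, with some attachments
collection = {"email": 6.0, "word": 2.0, "powerpoint": 1.0, "image": 1.0}
print(f"Estimated pages: {estimate_pages(collection):,.0f}")
```

Even so, as the little experiment above shows, the same content can vary in size by a factor of 20 or more depending on the file format, so any such estimate should be treated as a rough planning number rather than a precise page count.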

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Review Attorneys, Are You Smarter than a High Schooler?

 

Review attorneys are taking a beating these days.  A great deal of attention is being focused on technology assisted review, and the latest study noting its cost-effectiveness (when compared to manual review) was just released this month.  There is also the very detailed and well known white paper study written by Maura Grossman and Gordon Cormack (Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review), which notes not only the cost-effectiveness of technology assisted review but also that it was actually more accurate.

The latest study, from information scientist William Webber (and discussed in this Law Technology News article by Ralph Losey) seems to indicate that trained reviewers don’t provide any better review accuracy than a pair of high schoolers that he selected with “no legal training, and no prior e-discovery experience, aside from assessing a few dozen documents for a different TREC topic as part of a trial experiment”.  In fact, the two high schoolers did better!  He also notes that “[t]hey worked independently and without supervision or correction, though one would be correct to describe them as careful and motivated.”  His conclusion?

“The conclusion that can be reached, though, is that our assessors were able to achieve reliability (with or without detailed assessment guidelines) that is competitive with that of the professional reviewers — and also competitive with that of a commercial e-discovery vendor.”

Webber also cites two other studies with similar results and notes “All of this raises the question that is posed in the subject of this post: if (some) high school students are as reliable as (some) legally-trained, professional e-discovery reviewers, then is legal training a practical (as opposed to legal) requirement for reliable first-pass review for responsiveness? Or are care and general reading skills the more important factors?”
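Webber frames the question in terms of "reliability", essentially how consistently assessors agree on responsiveness calls.  His paper uses its own agreement measures, but as a rough illustration of the kind of calculation involved, here is a minimal sketch computing Cohen's kappa (a standard inter-rater agreement statistic) between two reviewers' calls; the calls themselves are hypothetical:

```python
# Cohen's kappa between two reviewers' responsiveness calls.
# The calls below are hypothetical; this only illustrates how
# inter-reviewer agreement is commonly quantified.

def cohens_kappa(calls_a, calls_b):
    assert len(calls_a) == len(calls_b)
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Expected agreement if both reviewers coded documents at random
    # with their own observed rates of "responsive" calls.
    p_a = sum(calls_a) / n
    p_b = sum(calls_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# 1 = responsive, 0 = non-responsive (hypothetical calls on ten documents)
reviewer_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(f"Cohen's kappa: {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```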

I have a couple of observations about the study.  Keep in mind, I’m not an attorney (and don’t play one on TV), but I have worked with review teams on several projects and have observed how the review process is conducted in real-world settings, so I do have some practical basis for my thoughts:

  • Two high schoolers is not a significant sample size: I’ve worked on several projects where some reviewers are really productive and others are unproductive to the point of being useless.  It’s difficult to draw a valid conclusion on the basis of two non-legal reviewers in his study and four non-legal reviewers in one of the studies that Webber cites.
  • Review is typically an iterative process: In my experience, most legal reviews start with detailed instructions and training provided to the reviewers, followed up with regular (daily, if not more frequent) changes to those instructions to reflect information gathered during the review process.  Instructions are refined as the review progresses and more is learned about the document collection.  Since Webber noted that “[t]hey worked independently and without supervision or correction”, it doesn’t appear that his review test was conducted in this manner.  This makes it less of a real world scenario, in my opinion.

I also think some reviews especially benefit from a first pass review by legally trained reviewers (for example, a reviewer who understands intellectual property laws is going to understand potential IP issues better than someone who hasn’t had the training in IP law).  Nonetheless, these studies are bound to further “fan the flames” of the debate regarding the effectiveness of manual attorney review.

So, what do you think?  Do you think his study is valid?  Or do you have other concerns about the conclusions he has drawn?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: Need to Catch Up on Trends Over the Last Six Weeks? Take a Time Capsule.

 

I try to set aside some time over the weekend to catch up on my reading and keep abreast of developments in the industry, and although that’s sometimes easier said than done, I stumbled across an interesting compilation of legal technology information from my friend Christy Burke and her team at Burke & Company.  On Friday, Burke & Company released The Legal Technology Observer (LTO) Time Capsule on Legal IT Professionals.  LTO was a six-week concentrated collection of essays, articles, surveys and blog posts providing expert practical knowledge about legal technology, eDiscovery, and social media for legal professionals.

The content has been formatted into a PDF version and is available for free download here.  As noted in their press release, Burke & Company's bloggers, including Christy, Melissa DiMercurio, Ada Spahija and Taylor Gould, as well as many distinguished guest contributors, set out to examine the trends, topics and perspectives that are driving today's legal technology world over six weeks, from June 6 to July 12. They did so with the help of many of the industry's most respected experts, and LTO acquired more than 21,000 readers in just six weeks.  Nice job!

The LTO Time Capsule covers a wide range of topics related to legal technology.  Several topics have an impact on eDiscovery, some of which featured thought leaders previously interviewed on this blog (links to our previous interviews with them below), including:

  • The EDRM Speaks My Language: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Learning to Speak EDRM: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Predictive Coding: Dozens of Names, No Definition, Lots of Controversy: Written by – Sharon D. Nelson, Esq. and John W. Simek.
  • Social Media 101 for Law Firms – Don’t Get Left Behind: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Kerry Scott Boll of JustEngage.
  • Results of Social Media 101 Snap-Poll: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC.
  • Getting up to Speed with eDiscovery: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Browning Marean, Senior Counsel at DLA Piper, San Diego.
  • LTO Interviews Craig Ball to Examine the Power of Computer Forensics: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Expert Craig Ball, Trial Lawyer and Certified Computer Forensic Examiner.
  • LTO Asks Bob Ambrogi How a Lawyer Can Become a Legal Technology Expert: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Bob Ambrogi, Practicing Lawyer, Writer and Media Consultant.
  • LTO Interviews Jeff Brandt about the Mysterious Cloud Computing Craze: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Jeff Brandt, Editor of PinHawk Law Technology Daily Digest.
  • Legal Technology Observer eDiscovery in America – A Legend in the Making: Written by – Christy Burke, President of Burke and Company LLC; Featuring – Barry Murphy, Analyst with the eDJ Group and Contributor to eDiscoveryJournal.com.
  • IT-Lex and the Sedona Conference® Provide Real Help to Learn eDiscovery and Technology Law: Written by – Christy Burke, President of Burke and Company LLC.

These are just some of the topics, particularly those that have an impact on eDiscovery.  To check out the entire list of articles, click here to download the report.

So, what do you think?  Do you need a quick resource to catch up on your reading?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Case Law: Judge Scheindlin Says “No” to Self-Collection, “Yes” to Predictive Coding

 

When most people think of the horrors of Friday the 13th, they think of Jason Voorhees.  When US Immigration and Customs Enforcement thinks of Friday the 13th horrors, do they think of Judge Shira Scheindlin?

As noted in Law Technology News (Judge Scheindlin Issues Strong Opinion on Custodian Self-Collection, written by Ralph Losey, a previous thought leader interviewee on this blog), New York District Judge Scheindlin issued a decision last Friday (July 13) addressing the adequacy of searching and self-collection by government entity custodians in response to Freedom of Information Act (FOIA) requests.  As Losey notes, this is her fifth decision in National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., including one that was later withdrawn.

Regarding the defendant’s question as to “why custodians could not be trusted to run effective searches of their own files, a skill that most office workers employ on a daily basis” (i.e., self-collect), Judge Scheindlin responded as follows:

“There are two answers to defendants' question. First, custodians cannot 'be trusted to run effective searches,' without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that 'contain reasonable specificity of detail rather than merely conclusory statements.' Defendants' counsel recognize that, for over twenty years, courts have required that these affidavits 'set [ ] forth the search terms and the type of search performed.' But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants' question has emerged from scholarship and caselaw only in recent years: most custodians cannot be 'trusted' to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: 'Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.' There is increasingly strong evidence that '[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.' As Judge Andrew Peck — one of this Court's experts in e-discovery — recently put it: 'In too many cases, however, the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish' … keyword searches usually are not very effective.'”

Regarding search best practices and predictive coding, Judge Scheindlin noted:

“There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere. There is a 'need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or keywords to be used to produce emails or other electronically stored information.' And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.”

“Through iterative learning, these methods (known as 'computer-assisted' or 'predictive' coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies' unsupported assertions that their lay custodians have designed and conducted a reasonable search.”
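The opinion doesn't prescribe a particular tool, but the "machine learning tools" it refers to generally amount to a text classifier trained on attorney-coded seed documents and then used to rank the rest of the collection by likely responsiveness.  Here's a minimal sketch of that idea, assuming scikit-learn and a handful of hypothetical documents and coding calls:

```python
# Minimal predictive-coding-style sketch: train a classifier on a few
# attorney-coded seed documents, then rank unreviewed documents by
# predicted probability of responsiveness. Documents and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "quarterly pricing agreement with the distributor",
    "lunch plans for friday",
    "draft contract amendment on pricing terms",
    "fantasy football league reminder",
]
seed_labels = [1, 0, 1, 0]  # 1 = responsive, 0 = non-responsive (coded by reviewers)

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

unreviewed = [
    "updated pricing schedule attached for review",
    "parking garage will be closed next week",
]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```

In practice the process is iterative, as the opinion notes: reviewers code additional documents, the model is retrained, and the cycle repeats until the team is satisfied the review is reasonably complete.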

Losey notes that “A classic analogy is that self-collection is equivalent to the fox guarding the hen house. With her latest opinion, Schiendlin [sic] includes the FBI and other agencies as foxes not to be trusted when it comes to searching their own email.”

So, what do you think?  Will this become another landmark decision by Judge Scheindlin?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective

 

As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays).  The overview of the 2011 TREC Legal Track can be found here.

The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.” 

However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”
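For context, recall here means the fraction of the truly responsive documents (as defined by a gold standard) that a review effort actually found, while precision is the fraction of flagged documents that are truly responsive.  Here's a minimal sketch of that calculation using hypothetical document IDs (TREC's actual methodology relies on sampling and estimation rather than a simple set comparison):

```python
# Recall and precision against a gold standard of responsive documents.
# The document IDs are hypothetical.

gold_standard = {"doc1", "doc4", "doc7", "doc9"}        # truly responsive
retrieved = {"doc1", "doc2", "doc4", "doc9", "doc11"}   # flagged by the review effort

true_positives = gold_standard & retrieved
recall = len(true_positives) / len(gold_standard)       # found 3 of 4 -> 0.75
precision = len(true_positives) / len(retrieved)        # 3 of 5 flagged -> 0.60

print(f"Recall: {recall:.2f}, Precision: {precision:.2f}")
```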

Other notable tidbits from the study and article:

  • Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
  • Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
  • The document collection used was derived from the EDRM Enron Data Set;
  • The learning task had three distinct topics, each representing a distinct request for production.  A total of 16,999 documents – about 5,600 per topic – were selected to form the “gold standard” against which participants’ results were compared;
  • OpenText had the top number of documents reviewed compared to recall percentage in the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
  • One of the participants has been barred from future participation in TREC – “It is inappropriate – and forbidden by the TREC participation agreement – to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”.  According to the LTN article, the barred participant was Recommind.

For more information, check out the links to the article and the study above.  TREC previously announced that there would be no 2012 study and is targeting obtaining a new data set for 2013.

So, what do you think?  Are you surprised by the results or are they expected?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Best Practices: Quality Assurance vs. Quality Control and Why Both Are Important in eDiscovery

 

People tend to use the terms Quality Assurance (QA) and Quality Control (QC) interchangeably, and it’s a pet peeve of mine.  It’s like using the word “irregardless” – which isn’t really a word.  The fact is that QA and QC are different mechanisms for ensuring quality in…anything.  Products, processes and projects (as well as things that don’t begin with “pro”) can all benefit from quality-ensuring mechanisms, and those related to electronic discovery can particularly benefit.

First, let’s define terms

Quality Assurance (QA) can be defined as planned and systematic activities and mechanisms implemented so that quality requirements for a product or service will be fulfilled.

Quality Control (QC) can be defined as one or more processes to review the quality of all factors involved in that product or service.

Now, let’s apply the terms to an example in eDiscovery

CloudNine Discovery’s flagship product is OnDemand®, which is an online eDiscovery review application.  It’s easy to use and the leader in self-service, online eDiscovery review (sorry, I’m the marketing director, I can’t help myself).

OnDemand has a team of developers, who use a variety of Quality Assurance mechanisms to ensure the quality of the application.  They include (but are not limited to):

  • Requirements meetings with stakeholders to ensure that all required functionality for each component is clearly defined;
  • Development team “huddles” to discuss progress and to learn from each other’s good development ideas;
  • Back end database and search engine that establish rules for data and searching that data (so, for example, the valid values for whether or not a document is responsive are “True” and “False” and not “Purple”); and
  • Code management software to keep versions of development code to ensure the developers don’t overwrite each other’s work.

Quality Control mechanisms for OnDemand include:

  • Test plan creation to identify all functional areas of the application that need to be tested;
  • Rigorous testing of all functionality within each software release by a team of software testers;
  • Issue tracking software to log all problems found in testing, allowing each issue to be assigned to the responsible developer, tracked through to completion, and re-tested to confirm it has been adequately addressed;
  • Beta testing by selected clients interested in using the latest new features and willing to provide feedback as to how well those features work and how well they meet their needs.

These QA and QC mechanisms help ensure that OnDemand works correctly and that it provides the functionality required by our clients.  And, we continue to work to make those mechanisms even more effective.

QA & QC mechanisms aren’t just limited to eDiscovery software.  Take the process of conducting attorney review to determine responsiveness and privilege.  QA mechanisms include instructions and background information provided to reviewers up front to get them up to speed on the review process, periodic “huddles” for additional instructions and discussion amongst reviewers to share best practices, assignment of “batches” so that each document is reviewed by one, and only one, reviewer and validation rules to ensure that entries are recorded correctly.  QC mechanisms include a second review (usually by a review supervisor or senior attorney) to ensure that documents are being categorized correctly and metrics reports to ensure that the review team can meet deadlines while still conducting a thorough review.  QA & QC mechanisms can also be applied to preservation, collection, searching and production (among other eDiscovery activities) and they are critical to enabling discovery obligations to be met.
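To make the "validation rules" idea concrete, here's a minimal sketch of the kind of check a review platform or a QC script might run over coded entries; the field names and allowed values are hypothetical examples, not OnDemand's actual configuration:

```python
# Minimal validation-rule sketch for review coding entries.
# Field names and allowed values are hypothetical examples.

ALLOWED_VALUES = {
    "responsive": {"True", "False"},
    "privileged": {"True", "False"},
    "confidentiality": {"Public", "Confidential", "Highly Confidential"},
}

def validate_entry(entry):
    """Return a list of validation problems for a single coded document."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = entry.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in allowed:
            problems.append(f"invalid value for {field}: {value!r}")
    return problems

# "Purple" is not a valid responsiveness call, so this entry fails validation
entry = {"responsive": "Purple", "privileged": "False"}
print(validate_entry(entry))
```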

So, what do you think?  What QA & QC mechanisms do you use in your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: eDiscovery Work is Growing in Law Firms and Corporations

 

There was an article in Law Technology News last Friday (Survey Shows Surge in E-Discovery Work at Law Firms and Corporations, written by Monica Bay) that discussed the findings of a survey released by The Cowen Group, indicating that eDiscovery work in law firms and corporations is growing considerably.  Eighty-eight law firm and corporate law department professionals responded to the survey.

Some of the key findings:

  • 70 percent of law firm respondents reported an increase in workload for their litigation support and eDiscovery departments (compared to 42 percent in the second quarter of 2009);
  • 77 percent of corporate law department respondents reported an increase in workload for their litigation support and eDiscovery departments;
  • 60 percent of respondents anticipate increasing their internal capabilities for eDiscovery;
  • 55 percent of corporate and 62 percent of firm respondents said they "anticipate outsourcing a significant amount of eDiscovery to third-party providers” (some organizations expect to both increase internal capabilities and outsource);
  • 50 percent of the firms believe they will increase technology spending in the next three months (compared to 31 percent of firms in 2010);
  • 43 percent of firms plan to add people to their litigation support and eDiscovery staff in the next 3 months, compared to 32 percent in 2011;
  • Noting that “corporate legal departments are under increasing pressure to ‘do more with less in-house to keep external costs down’”, only 12 percent of corporate respondents anticipate increasing headcount and 30 percent will increase their technology spend in the next six months;
  • In the past year, 49 percent of law firms and 23 percent of corporations have used Technology Assisted Review/Predictive Coding technology through a third party service provider – an additional 38 percent have considered using it;
  • As for TAR/Predictive Coding in-house, 30 percent of firms have an in-house tool, and an additional 35 percent are considering making the investment.

As managing partner David Cowen notes, “Cases such as Da Silva Moore, Kleen, and Global Aerospace, which have hit our collective consciousness in the past three months, affect the investments in technology that both law firms and corporations are making.”  He concludes the Executive Summary of the report with this advice: “Educate yourself on the latest evolving industry trends, invest in relationships, and be an active participant in helping your executives, your department, and your clients ‘do more with less’.”

So, what do you think?  Do any of those numbers and trends surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: The Da Silva Moore Case Has Class (Certification, That Is)

 

As noted in an article written by Mark Hamblett in Law Technology News, Judge Andrew Carter of the U.S. District Court for the Southern District of New York has granted conditional class certification in the Da Silva Moore v. Publicis Groupe & MSL Group case.

In this case, women employees of the advertising conglomerate Publicis Groupe and its U.S. subsidiary, MSL, have accused their employer of company-wide discrimination, pregnancy discrimination, and a practice of keeping women at entry-level positions with few opportunities for promotion.

Judge Carter concluded that “Plaintiffs have met their burden by making a modest factual showing to demonstrate that they and potential plaintiffs together were victims of a common policy or plan that violated the law. They submit sufficient information that because of a common pay scale, they were paid wages lower than the wages paid to men for the performance of substantially equal work. The information also reveals that Plaintiffs had similar responsibilities as other professionals with the same title. Defendants may disagree with Plaintiffs' contentions, but the Court cannot hold Plaintiffs to a higher standard simply because it is an EPA action rather an action brought under the FLSA.”

“Courts have conditionally certified classes where the plaintiffs have different job functions,” Judge Carter noted, indicating that “[p]laintiffs have to make a mere showing that they are similarly situated to themselves and the potential opt-in members and Plaintiffs here have accomplished their goal.”

This is just the latest development in this test case for the use of computer-assisted coding to search electronic documents for responsive discovery. On February 24, Magistrate Judge Andrew J. Peck of the U.S. District Court for the Southern District of New York issued an opinion approving the use of computer-assisted review of electronically stored information (“ESI”), making this likely the first case to formally accept its use.  However, on March 13, District Court Judge Andrew L. Carter, Jr. granted plaintiffs’ request to submit additional briefing on their February 22 objections to the ruling.  In that briefing (filed on March 26), the plaintiffs claimed that the protocol approved for predictive coding “risks failing to capture a staggering 65% of the relevant documents in this case” and questioned Judge Peck’s relationship with defense counsel and with the selected vendor for the case, Recommind.

Then, on April 5, Judge Peck issued an order in response to Plaintiffs’ letter requesting his recusal, directing plaintiffs to indicate whether they would file a formal motion for recusal or ask the Court to consider the letter as the motion.  On April 13, (Friday the 13th, that is), the plaintiffs did just that, by formally requesting the recusal of Judge Peck (the defendants issued a response in opposition on April 30).  But, on April 25, Judge Carter issued an opinion and order in the case, upholding Judge Peck’s opinion approving computer-assisted review.

Not done, the plaintiffs filed an objection on May 9 to Judge Peck's rejection of their request to stay discovery pending the resolution of outstanding motions and objections (including the recusal motion, which has yet to be ruled on).  Then, on May 14, Judge Peck issued a stay, stopping defendant MSLGroup's production of electronically stored information.  Finally, on June 15, Judge Peck, in a 56 page opinion and order, denied the plaintiffs’ motion for recusal.

So, what do you think?  What will happen in this case next?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Best Practices: Types Of Metadata and How They Impact Discovery

 

If an electronic document is a “house” for information, then metadata could be considered the “deed” to that house. There is far more to explaining a house than simply the number of stories and the color of trim. It is the data that isn’t apparent to the naked eye that tells the rest of the story. For a house, the deed spells out the name of the buyer, the financier, and the closing date, among heaps of other information that form the basis of the property. For an electronic document, it’s not just the content or formatting that holds the key to understanding it. Metadata, which is data about the document, contains information such as the user who created it, the creation date, the edit history, and the file type. Metadata often tells the rest of the story about the document and, therefore, is often a key focus of eDiscovery, such as in cases like this one we recently covered here.

There are many different types of metadata, and it is important to understand each with regard to requesting that metadata from opposing counsel’s productions and being prepared to produce it in your own productions.  Examples include:

  • Application Metadata: This is the data created by an application, such as Microsoft® Word, that pertains to the ESI (“Electronically Stored Information”) being addressed. It is embedded in the file and moves with it when copied, though copying may alter the application metadata.
  • Document Metadata: These are properties about a document that may not be viewable within the application that created it, but can often be seen through a “Properties” view (for example, Word tracks the author name and total editing time).
  • Email Metadata: Data about the email.  Sometimes, this metadata may not be immediately apparent within the email application that created it (e.g., date and time received). The amount of email metadata available varies depending on the email system utilized.  For example, Outlook has a metadata field that links messages in a thread together which can facilitate review – not all email applications have this data.
  • Embedded Metadata: This metadata is usually hidden; however, it can be a vitally important part of the ESI. Examples of embedded metadata are edit history or notes in a presentation file. These may only be viewable in the original, native file since it is not always extracted during processing and conversion for eDiscovery.
  • File System Metadata: Data generated by the file system, such as Windows, to track key statistics about the file (e.g., name, size, location, etc.) which is usually stored externally from the file itself.
  • User-Added Metadata: Data created by a user while working with, reviewing, or copying a file (such as notes or tracked changes).
  • Vendor-Added Metadata: Data created and maintained by an eDiscovery vendor during processing of the native document.  Don’t be alarmed; it’s impossible to work with some file types without generating some metadata.  For example, you can’t review and produce individual emails within a custodian’s Outlook PST file without extracting them as separate emails (either in Outlook MSG format or converted to an image format, such as TIFF or PDF).

Some metadata, such as user-added tracked changes or notes, could be work product that may affect whether a document is responsive or contains privileged information, so it’s important to consider that metadata during review, especially when producing in native format.
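As a concrete illustration of the difference between file system metadata and the metadata stored inside a document, here's a minimal sketch using only the Python standard library (the file path is hypothetical); it works because a .docx file is really a ZIP archive whose docProps/core.xml holds properties such as the author and creation date:

```python
# File system metadata vs. document metadata for the same file.
# The path is hypothetical; requires only the Python standard library.
import os
import zipfile
from datetime import datetime

path = "example.docx"  # hypothetical document

# File system metadata: tracked by the operating system, stored outside the file
st = os.stat(path)
print(f"Size: {st.st_size} bytes")
print(f"Last modified: {datetime.fromtimestamp(st.st_mtime)}")

# Document metadata: stored inside the file itself (a .docx is a ZIP archive,
# with properties such as author and creation date in docProps/core.xml)
with zipfile.ZipFile(path) as docx:
    core_props = docx.read("docProps/core.xml").decode("utf-8")
    print(core_props[:300])  # raw XML; includes dc:creator, dcterms:created, etc.
```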

So, what do you think? Have you been involved in cases where metadata was specifically requested as part of discovery? Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: First Pass Review – Domain Categorization of Your Opponent’s Data

 

Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet).  So chances are, you haven’t seen these posts yet!  Enjoy!

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass®, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production.  One way to analyze that data is through “fuzzy” searching to find misspellings or OCR errors in an opponent’s produced ESI.

Domain Categorization

Another type of analysis is the use of domain categorization.  Email is generally the biggest component of most ESI collections and each participant in an email communication belongs to a domain associated with the email server that manages their email.

FirstPass supports domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection.  Domain categorization provides several benefits when reviewing your opponent’s ESI:

  • Non-Responsive Produced ESI: Domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains can be “group-tagged” as non-responsive.  If a significant percentage of files are identified as non-responsive, that may be a sign that your opponent is trying to “bury you with paper” (albeit electronic).
  • Inadvertent Disclosures: If there are any emails associated with outside counsel’s domain, they could be inadvertent disclosures of attorney work product or attorney-client privileged communications.  If so, you can then address those according to the agreed-upon process for handling inadvertent disclosures and clawback of same.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly.
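At its core, domain categorization boils down to tallying the domains that appear in the messages of a collection.  Here's a minimal sketch of that idea using the Python standard library (the folder layout is hypothetical, and a tool like FirstPass obviously does far more than this):

```python
# Count sender domains across a folder of .eml files (hypothetical layout).
# This is the core idea behind domain categorization: tally which email
# domains appear in the collection so whole domains can be triaged at once.
import glob
from collections import Counter
from email import policy
from email.parser import BytesParser
from email.utils import parseaddr

domain_counts = Counter()
for eml_path in glob.glob("production/*.eml"):  # hypothetical folder of produced emails
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    _, sender = parseaddr(msg.get("From", ""))
    if "@" in sender:
        domain_counts[sender.split("@", 1)[1].lower()] += 1

for domain, count in domain_counts.most_common():
    print(f"{count:6d}  {domain}")
```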

In summary, there are several ways to use first pass review tools, like FirstPass, for reviewing your opponent’s ESI production, including: email analytics, synonym searching, fuzzy searching and domain categorization.  First pass review isn’t just for your own production; it’s also an effective process to quickly evaluate your opponent’s production.

So, what do you think?  Have you used first pass review tools to assess an opponent’s produced ESI?  Please share any comments you might have or if you’d like to know more about a particular topic.
