Analysis

eDiscovery Best Practices: The Number of Pages in Each Gigabyte Can Vary Widely

 

A while back, we talked about how the average number of pages in each gigabyte is approximately 50,000 to 75,000 pages and that each gigabyte effectively culled out can save $18,750 in review costs.  But, did you know just how widely the number of pages per gigabyte can vary?

The “how many pages” question comes up a lot and I’ve seen a variety of answers.  Michael Recker of Applied Discovery posted an article to their blog last week titled Just How Big Is a Gigabyte?, which provides some perspective based on the types of files contained within the gigabyte, as follows:

“For example, e-mail files typically average 100,099 pages per gigabyte, while Microsoft Word files typically average 64,782 pages per gigabyte. Text files, on average, consist of a whopping 677,963 pages per gigabyte. At the opposite end of the spectrum, the average gigabyte of images contains 15,477 pages; the average gigabyte of PowerPoint slides typically includes 17,552 pages.”

Of course, each GB of data is rarely just one type of file.  Many emails include attachments, which can be in any of a number of different file formats.  Collections of files from hard drives may include Word, Excel, PowerPoint, Adobe PDF and other file formats.  So, estimating page counts with any degree of precision is somewhat difficult.

In fact, the same exact content ported into different applications can be a different size in each file, due to the overhead required by each application.  To illustrate this, I decided to conduct a little (admittedly unscientific) study using yesterday’s one page blog post about the Apple/Samsung litigation.  I decided to put the content from that page into several different file formats to illustrate how much the size can vary, even when the content is essentially the same.  Here are the results:

  • Text File Format (TXT): Created by performing a “Save As” on the web page for the blog post to text – 10 KB;
  • HyperText Markup Language (HTML): Created by performing a “Save As” on the web page for the blog post to HTML – 36 KB, over 3.5 times larger than the text file;
  • Microsoft Excel 2010 Format (XLSX): Created by copying the contents of the blog post and pasting it into a blank Excel workbook – 128 KB, nearly 13 times larger than the text file;
  • Microsoft Word 2010 Format (DOCX): Created by copying the contents of the blog post and pasting it into a blank Word document – 162 KB, over 16 times larger than the text file;
  • Adobe PDF Format (PDF): Created by printing the blog post to PDF file using the CutePDF printer driver – 211 KB, over 21 times larger than the text file;
  • Microsoft Outlook 2010 Message Format (MSG): Created by copying the contents of the blog post and pasting it into a blank Outlook message, then sending that message to myself, then saving the message out to my hard drive – 221 KB, over 22 times larger than the text file.
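
For anyone who wants to check the multiples quoted in the list above, here is a small Python sketch that recomputes each format's size relative to the text file. The sizes are the ones I reported; the variable names are just for illustration.

```python
# File sizes (in KB) reported in the list above.
sizes_kb = {"TXT": 10, "HTML": 36, "XLSX": 128, "DOCX": 162, "PDF": 211, "MSG": 221}

# Express every format as a multiple of the plain-text baseline.
baseline = sizes_kb["TXT"]
ratios = {fmt: size / baseline for fmt, size in sizes_kb.items()}

for fmt, ratio in ratios.items():
    print(f"{fmt}: {sizes_kb[fmt]} KB, {ratio:.1f}x the text file")
```

The same loop works for any set of files you want to compare; only the dictionary of sizes changes.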

The Outlook example was probably the least representative of a typical email – most emails don’t have several embedded graphics in them (with the exception of signature logos) – and most are typically much shorter than yesterday’s blog post (which also included the side text on the page as I copied that too).  Still, the example hopefully illustrates that a “page”, even with the same exact content, will be different sizes in different applications.  As a result, to estimate the number of pages in a collection with any degree of accuracy, it’s important to understand not only the size of the data collection, but also its makeup.
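
To show how much the makeup matters, here is a rough Python sketch that estimates page counts from the per-file-type averages quoted earlier from the Applied Discovery article. The 10 GB collection and its split by file type are assumptions made up for illustration, and the function name is hypothetical.

```python
# Average pages per gigabyte by file type, as quoted from the Applied Discovery article.
PAGES_PER_GB = {
    "email": 100_099,
    "word": 64_782,
    "text": 677_963,
    "image": 15_477,
    "powerpoint": 17_552,
}

def estimate_pages(gb_by_type):
    """Estimate total pages from the number of gigabytes of each file type."""
    return sum(PAGES_PER_GB[file_type] * gb for file_type, gb in gb_by_type.items())

# Hypothetical 10 GB collection; this split is an assumption, not real case data.
collection = {"email": 6.0, "word": 2.0, "powerpoint": 1.0, "image": 1.0}
total_pages = round(estimate_pages(collection))
print(f"Estimated pages: {total_pages:,}")

# The same 10 GB made up entirely of one file type gives very different answers.
all_email = round(estimate_pages({"email": 10.0}))
all_images = round(estimate_pages({"image": 10.0}))
print(f"All email: {all_email:,}; all images: {all_images:,}")
```

With this made-up mix the estimate comes to 763,187 pages, while the same 10 GB would be about a million pages if it were all email and roughly 155,000 if it were all images.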

So, what do you think?  Was this example useful or highly flawed?  Or both?  Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Trends: Need to Catch Up on Trends Over the Last Six Weeks? Take a Time Capsule.

 

I try to set aside some time over the weekend to catch up on my reading and keep abreast of developments in the industry.  Although that’s sometimes easier said than done, I stumbled across an interesting compilation of legal technology information from my friend Christy Burke and her team at Burke & Company.  On Friday, Burke & Company released The Legal Technology Observer (LTO) Time Capsule on Legal IT Professionals.  LTO was a six-week concentrated collection of essays, articles, surveys and blog posts providing expert practical knowledge about legal technology, eDiscovery, and social media for legal professionals.

The content has been formatted into a PDF version and is available for free download here.  As noted in their press release, Burke & Company's bloggers, including Christy, Melissa DiMercurio, Ada Spahija and Taylor Gould, as well as many distinguished guest contributors, set out to examine the trends, topics and perspectives that are driving today's legal technology world for six weeks from June 6 to July 12.  They did so with the help of many of the industry's most respected experts and LTO acquired more than 21,000 readers in just six weeks.  Nice job!

The LTO Time Capsule covers a wide range of topics related to legal technology.  Several of those topics have an impact on eDiscovery, and some feature thought leaders previously interviewed on this blog (links to our previous interviews with them below), including:

  • The EDRM Speaks My Language: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Learning to Speak EDRM: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Experts George Socha and Tom Gelbmann.
  • Predictive Coding: Dozens of Names, No Definition, Lots of Controversy: Written by – Sharon D. Nelson, Esq. and John W. Simek.
  • Social Media 101 for Law Firms – Don’t Get Left Behind: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC; Featuring – Kerry Scott Boll of JustEngage.
  • Results of Social Media 101 Snap-Poll: Written by – Ada Spahija, Communications Specialist at Burke and Company LLC.
  • Getting up to Speed with eDiscovery: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Browning Marean, Senior Counsel at DLA Piper, San Diego.
  • LTO Interviews Craig Ball to Examine the Power of Computer Forensics: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Expert Craig Ball, Trial Lawyer and Certified Computer Forensic Examiner.
  • LTO Asks Bob Ambrogi How a Lawyer Can Become a Legal Technology Expert: Written by – Melissa DiMercurio, Account Executive at Burke and Company LLC; Featuring – Bob Ambrogi, Practicing Lawyer, Writer and Media Consultant.
  • LTO Interviews Jeff Brandt about the Mysterious Cloud Computing Craze: Written by – Taylor Gould, Communications Intern at Burke and Company LLC; Featuring – Jeff Brandt, Editor of PinHawk Law Technology Daily Digest.
  • Legal Technology Observer eDiscovery in America – A Legend in the Making: Written by – Christy Burke, President of Burke and Company LLC; Featuring – Barry Murphy, Analyst with the eDJ Group and Contributor to eDiscoveryJournal.com.
  • IT-Lex and the Sedona Conference® Provide Real Help to Learn eDiscovery and Technology Law: Written by – Christy Burke, President of Burke and Company LLC.

These are just some of the topics, particularly those that have an impact on eDiscovery.  To check out the entire list of articles, click here to download the report.

So, what do you think?  Do you need a quick resource to catch up on your reading?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Case Law: Judge Scheindlin Says “No” to Self-Collection, “Yes” to Predictive Coding

 

When most people think of the horrors of Friday the 13th, they think of Jason Voorhees.  When U.S. Immigration and Customs Enforcement thinks of Friday the 13th horrors, does it think of Judge Shira Scheindlin?

As noted in Law Technology News (Judge Scheindlin Issues Strong Opinion on Custodian Self-Collection, written by Ralph Losey, a previous thought leader interviewee on this blog), New York District Judge Scheindlin issued a decision last Friday (July 13) addressing the adequacy of searching and self-collection by government entity custodians in response to Freedom of Information Act (FOIA) requests.  As Losey notes, this is her fifth decision in National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., including one that was later withdrawn.

Regarding the defendant’s question as to “why custodians could not be trusted to run effective searches of their own files, a skill that most office workers employ on a daily basis” (i.e., self-collect), Judge Scheindlin responded as follows:

“There are two answers to defendants' question. First, custodians cannot 'be trusted to run effective searches,' without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that 'contain reasonable specificity of detail rather than merely conclusory statements.' Defendants' counsel recognize that, for over twenty years, courts have required that these affidavits 'set [ ] forth the search terms and the type of search performed.' But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants' question has emerged from scholarship and caselaw only in recent years: most custodians cannot be 'trusted' to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: 'Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.' There is increasingly strong evidence that '[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.' As Judge Andrew Peck — one of this Court's experts in e-discovery — recently put it: 'In too many cases, however, the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish' … keyword searches usually are not very effective.'”

Regarding search best practices and predictive coding, Judge Scheindlin noted:

“There are emerging best practices for dealing with these shortcomings and they are explained in detail elsewhere. There is a 'need for careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or keywords to be used to produce emails or other electronically stored information.' And beyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents.”

“Through iterative learning, these methods (known as 'computer-assisted' or 'predictive' coding) allow humans to teach computers what documents are and are not responsive to a particular FOIA or discovery request and they can significantly increase the effectiveness and efficiency of searches. In short, a review of the literature makes it abundantly clear that a court cannot simply trust the defendant agencies' unsupported assertions that their lay custodians have designed and conducted a reasonable search.”

Losey notes that “A classic analogy is that self-collection is equivalent to the fox guarding the hen house. With her latest opinion, Schiendlin [sic] includes the FBI and other agencies as foxes not to be trusted when it comes to searching their own email.”

So, what do you think?  Will this become another landmark decision by Judge Scheindlin?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective

 

As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government-sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays).  The overview of the 2011 TREC Legal Track can be found here.

The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.” 

However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”

Other notable tidbits from the study and article:

  • Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
  • Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
  • The document collection used was derived from the EDRM Enron Data Set;
  • The learning task had three distinct topics, each representing a distinct request for production.  A total of 16,999 documents was selected – about 5,600 per topic – to form the “gold standard” for comparing the document collection;
  • OpenText had the top number of documents reviewed compared to recall percentage in the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
  • One of the participants has been barred from future participation in TREC – “It is inappropriate – and forbidden by the TREC participation agreement – to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”.  According to the LTN article, the barred participant was Recommind.
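
For readers less familiar with the terms used above, here is a toy Python sketch of what “ranking the corpus by probability of responsiveness” and “recall” at a given number of documents reviewed mean. It is not the TREC evaluation methodology; the document IDs, scores, and responsive set below are all made up for illustration.

```python
def recall_at_depth(ranked_doc_ids, responsive_ids, depth):
    """Fraction of all responsive documents found in the top `depth` ranked documents."""
    reviewed = set(ranked_doc_ids[:depth])
    return len(reviewed & responsive_ids) / len(responsive_ids)

# Hypothetical probability-of-responsiveness estimates for an eight-document corpus.
scores = {"d1": 0.95, "d2": 0.10, "d3": 0.80, "d4": 0.40,
          "d5": 0.05, "d6": 0.70, "d7": 0.20, "d8": 0.60}

# Hypothetical "gold standard": the documents that really are responsive.
responsive = {"d1", "d3", "d4", "d6"}

# Rank the whole corpus from most to least likely responsive.
ranking = sorted(scores, key=scores.get, reverse=True)

for depth in (2, 4, 6):
    print(f"Reviewed top {depth}: recall = {recall_at_depth(ranking, responsive, depth):.2f}")
```

In this toy example, reviewing the top two documents finds half of the responsive ones, the top four finds three quarters, and the top six finds all of them, which is the sense in which a good ranking lets reviewers look at only a fraction of the collection.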

For more information, check out the links to the article and the study above.  TREC previously announced that there would be no 2012 study and is aiming to obtain a new data set for 2013.

So, what do you think?  Are you surprised by the results or are they expected?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: eDiscovery Work is Growing in Law Firms and Corporations

 

There was an article in Law Technology News last Friday (Survey Shows Surge in E-Discovery Work at Law Firms and Corporations, written by Monica Bay) that discussed the findings of a survey released by The Cowen Group, indicating that eDiscovery work in law firms and corporations is growing considerably.  Eighty-eight law firm and corporate law department professionals responded to the survey.

Some of the key findings:

  • 70 percent of law firm respondents reported an increase in workload for their litigation support and eDiscovery departments (compared to 42 percent in the second quarter of 2009);
  • 77 percent of corporate law department respondents reported an increase in workload for their litigation support and eDiscovery departments;
  • 60 percent of respondents anticipate increasing their internal capabilities for eDiscovery;
  • 55 percent of corporate and 62 percent of firm respondents said they “anticipate outsourcing a significant amount of eDiscovery to third-party providers” (some organizations expect to both increase internal capabilities and outsource);
  • 50 percent of the firms believe they will increase technology spending in the next three months (compared to 31 percent of firms in 2010);
  • 43 percent of firms plan to add people to their litigation support and eDiscovery staff in the next three months, compared to 32 percent in 2011;
  • Noting that “corporate legal departments are under increasing pressure to ‘do more with less in-house to keep external costs down’”, only 12 percent of corporate respondents anticipate increasing headcount and 30 percent will increase their technology spend in the next six months;
  • In the past year, 49 percent of law firms and 23 percent of corporations have used Technology Assisted Review/Predictive Coding technology through a third party service provider – an additional 38 percent have considered using it;
  • As for TAR/Predictive Coding in-house, 30 percent of firms have an in-house tool, and an additional 35 percent are considering making the investment.

As managing partner David Cowen notes, “Cases such as Da Silva Moore, Kleen, and Global Aerospace, which have hit our collective consciousness in the past three months, affect the investments in technology that both law firms and corporations are making.”  He concludes the Executive Summary of the report with this advice: “Educate yourself on the latest evolving industry trends, invest in relationships, and be an active participant in helping your executives, your department, and your clients ‘do more with less’.”

So, what do you think?  Do any of those numbers and trends surprise you?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: The Da Silva Moore Case Has Class (Certification, That Is)

 

As noted in an article written by Mark Hamblett in Law Technology News, Judge Andrew Carter of the U.S. District Court for the Southern District of New York has granted conditional class certification in the Da Silva Moore v. Publicis Groupe & MSL Group case.

In this case, women employees of the advertising conglomerate Publicis Groupe and its U.S. subsidiary, MSL, have accused their employer of company-wide discrimination, pregnancy discrimination, and a practice of keeping women at entry-level positions with few opportunities for promotion.

Judge Carter concluded that “Plaintiffs have met their burden by making a modest factual showing to demonstrate that they and potential plaintiffs together were victims of a common policy or plan that violated the law. They submit sufficient information that because of a common pay scale, they were paid wages lower than the wages paid to men for the performance of substantially equal work. The information also reveals that Plaintiffs had similar responsibilities as other professionals with the same title. Defendants may disagree with Plaintiffs' contentions, but the Court cannot hold Plaintiffs to a higher standard simply because it is an EPA action rather an action brought under the FLSA.”

“Courts have conditionally certified classes where the plaintiffs have different job functions,” Judge Carter noted, indicating that “[p]laintiffs have to make a mere showing that they are similarly situated to themselves and the potential opt-in members and Plaintiffs here have accomplished their goal.”

This is just the latest development in this test case for the use of computer-assisted coding to search electronic documents for responsive discovery.  On February 24, Magistrate Judge Andrew J. Peck of the U.S. District Court for the Southern District of New York issued an opinion approving the use of computer-assisted review of electronically stored information (“ESI”), likely making this the first case to accept it.  However, on March 13, District Court Judge Andrew L. Carter, Jr. granted plaintiffs’ request to submit additional briefing on their February 22 objections to the ruling.  In that briefing (filed on March 26), the plaintiffs claimed that the protocol approved for predictive coding “risks failing to capture a staggering 65% of the relevant documents in this case” and questioned Judge Peck’s relationship with defense counsel and with the selected vendor for the case, Recommind.

Then, on April 5, Judge Peck issued an order in response to Plaintiffs’ letter requesting his recusal, directing plaintiffs to indicate whether they would file a formal motion for recusal or ask the Court to consider the letter as the motion.  On April 13, (Friday the 13th, that is), the plaintiffs did just that, by formally requesting the recusal of Judge Peck (the defendants issued a response in opposition on April 30).  But, on April 25, Judge Carter issued an opinion and order in the case, upholding Judge Peck’s opinion approving computer-assisted review.

Not done, the plaintiffs filed an objection on May 9 to Judge Peck's rejection of their request to stay discovery pending the resolution of outstanding motions and objections (including the recusal motion, which had yet to be ruled on).  Then, on May 14, Judge Peck issued a stay, stopping defendant MSLGroup's production of electronically stored information.  Finally, on June 15, Judge Peck, in a 56-page opinion and order, denied the plaintiffs’ motion for recusal.

So, what do you think?  What will happen in this case next?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: First Pass Review – Domain Categorization of Your Opponent’s Data

 

Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet).  So chances are, you haven’t seen these posts yet!  Enjoy!

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass®, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production.  One way to analyze that data is through “fuzzy” searching to find misspellings or OCR errors in an opponent’s produced ESI.

Domain Categorization

Another type of analysis is the use of domain categorization.  Email is generally the biggest component of most ESI collections and each participant in an email communication belongs to a domain associated with the email server that manages their email.

FirstPass supports domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection.  Domain categorization provides several benefits when reviewing your opponent’s ESI:

  • Non-Responsive Produced ESI: Domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains can be “group-tagged” as non-responsive.  If a significant percentage of files are identified as non-responsive, that may be a sign that your opponent is trying to “bury you with paper” (albeit electronic).
  • Inadvertent Disclosures: If there are any emails associated with outside counsel’s domain, they could be inadvertent disclosures of attorney work product or attorney-client privileged communications.  If so, you can then address those according to the agreed-upon process for handling inadvertent disclosures and clawback of same.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly.
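
To make the idea concrete, here is a minimal Python sketch of domain categorization: it counts how many messages each email domain appears in. This is not how FirstPass does it internally; the message structure, addresses, and domains are assumptions invented for illustration.

```python
from collections import Counter

def count_domains(messages):
    """Count, for each domain, the number of messages in which it appears."""
    counts = Counter()
    for msg in messages:
        participants = [msg["from"], *msg["to"]]
        # Use a set so a domain is counted once per message, however many
        # participants from that domain are on it.
        domains = {addr.rpartition("@")[2].lower() for addr in participants if "@" in addr}
        counts.update(domains)
    return counts

# Hypothetical produced emails (senders and recipients are made up).
messages = [
    {"from": "ann@acme-widgets.example", "to": ["bob@acme-widgets.example"]},
    {"from": "bob@acme-widgets.example", "to": ["counsel@lawfirm.example", "ann@acme-widgets.example"]},
    {"from": "deals@newsletter.example", "to": ["ann@acme-widgets.example"]},
]

domain_counts = count_domains(messages)
for domain, count in domain_counts.most_common():
    print(f"{domain}: {count} message(s)")
```

In this made-up set, the newsletter domain is a candidate for group-tagging as non-responsive, and the law firm domain is one to check for potential inadvertent disclosures.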

In summary, there are several ways to use first pass review tools, like FirstPass, for reviewing your opponent’s ESI production, including: email analytics, synonym searching, fuzzy searching and domain categorization.  First pass review isn’t just for your own production; it’s also an effective process to quickly evaluate your opponent’s production.

So, what do you think?  Have you used first pass review tools to assess an opponent’s produced ESI?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: First Pass Review – Fuzzy Searching Your Opponent’s Data

 

Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet).  So chances are, you haven’t seen these posts yet!  Enjoy!

Tuesday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass®, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production.  One way to analyze that data is through synonym searching to find variations of your search terms to increase the possibility of finding the terminology used by your opponents.

Fuzzy Searching

Another type of analysis is the use of fuzzy searching.  Attorneys know what terms they’re looking for, but those terms may not always be spelled correctly.  Also, opposing counsel may produce a number of image-only files that require Optical Character Recognition (OCR), which is usually not 100% accurate.

FirstPass supports “fuzzy” searching, which is a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off).  FirstPass will display all of the words – in the collection – close to the word you’re looking for, so if you’re looking for the term “petroleum”, you can find variations such as “peroleum”, “petoleum” or even “petroleom” – misspellings or OCR errors that could be relevant.  Then, simply select the variations you wish to include in the search.  Fuzzy searching is the best way to broaden your search to include potential misspellings and OCR errors and FirstPass provides a terrific capability to select those variations to review additional potential “hits” in your collection.
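
For the curious, here is a rough Python sketch of one common way to do this kind of matching: Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one word into another. I'm not claiming this is the algorithm FirstPass uses; the function names and the word list are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete a character from a
                               current[j - 1] + 1,       # insert a character into a
                               previous[j - 1] + cost))  # substitute (or match)
        previous = current
    return previous[-1]

def fuzzy_variants(term, vocabulary, max_distance=2):
    """Words in the collection within `max_distance` edits of `term`, excluding the term itself."""
    term = term.lower()
    return sorted(
        word for word in {w.lower() for w in vocabulary}
        if word != term and edit_distance(term, word) <= max_distance
    )

# Hypothetical list of distinct words found in a produced collection.
vocabulary = ["petroleum", "peroleum", "petoleum", "petroleom", "pipeline", "patrol", "refinery"]
variants = fuzzy_variants("petroleum", vocabulary)
print(variants)
```

With this word list, the search for “petroleum” returns “peroleum”, “petoleum” and “petroleom” (each one character off) and ignores unrelated words such as “pipeline”.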

Tomorrow, I’ll talk about the use of domain categorization to quickly identify potential inadvertent disclosures and weed out non-responsive files produced by your opponent, based on the domain of the communicators.  Hasta la vista, baby! 🙂

In the meantime, what do you think?  Have you used fuzzy searching to find misspellings or OCR errors in an opponent’s produced ESI?  Please share any comments you might have or if you’d like to know more about a particular topic.


eDiscovery Trends: First Pass Review – Synonym Searching Your Opponent’s Data

 

Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet).  So chances are, you haven’t seen these posts yet!  Enjoy!

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass®, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production.  One way to analyze that data is through email analytics to see the communication patterns graphically to identify key parties for deposition purposes and look for potential production omissions.

Synonym Searching

Another type of analysis is the use of synonym searching.  Attorneys understand the key terminology their client uses, but they often don’t know the terminology their client’s opposition uses because they haven’t interviewed the opposition’s custodians.  In a product defect case, the opposition may refer to admitted design or construction “mistakes” in their product or process as “flaws”, “errors”, “goofs” or even “flubs”.  With FirstPass, you can enter your search term into the synonym searching section of the application and it will provide a list of synonyms (with hit counts of each, if selected).  Then, you can simply select the synonyms you wish to include in the search.  As a result, FirstPass identifies synonyms of your search terms to broaden the scope and catch key “hits” that could be the “smoking gun” in the case.
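To make the “list of synonyms with hit counts” idea concrete, here is a minimal Python sketch.  It is an illustration only, not FirstPass’s implementation: the synonym table, the function name and the sample documents are all hypothetical, and a real tool would draw its synonyms from a full thesaurus and use proper stemming rather than the simple plural check used here.

```python
import re
from collections import Counter

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"mistake": ["flaw", "error", "goof", "flub"]}


def synonym_hit_counts(term, documents, synonyms=SYNONYMS):
    """Count how many documents contain the term or each of its synonyms
    (singular or simple "-s" plural)."""
    candidates = [term] + synonyms.get(term, [])
    counts = Counter()
    for text in documents:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for cand in candidates:
            if cand in words or cand + "s" in words:
                counts[cand] += 1
    # Report every candidate, including those with zero hits.
    return {cand: counts[cand] for cand in candidates}


# Hypothetical documents from an opponent's production.
docs = [
    "We found a design flaw in the valve.",
    "That was a goof on our part; the flaw is known.",
    "No errors were reported this quarter.",
]
hits = synonym_hit_counts("mistake", docs)
for word, count in hits.items():
    print(word, count)
```

Seeing the counts side by side is what lets you decide which synonyms are worth adding to the search.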

Thursday, I’ll talk about the use of fuzzy searching to find misspellings that may be commonly used by your opponent or errors resulting from Optical Character Recognition (OCR) of any image-only files that they produce.  Stay tuned!  🙂

In the meantime, what do you think?  Have you used synonym searching to identify variations on terms in an opponent’s produced ESI?  Please share any comments you might have or if you’d like to know more about a particular topic.


Happy Independence Day from all of us at eDiscovery Daily and CloudNine Discovery!

eDiscovery Trends: First Pass Review – of Your Opponent’s Data

 

Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet).  So chances are, you haven’t seen these posts yet!  Enjoy!

In the past few years, applications that support Early Case Assessment (ECA) (or Early Data Assessment, as many prefer to call it) and First Pass Review (FPR) of ESI have become widely popular in eDiscovery.  Attorneys increasingly see the benefit of using these FPR tools to analyze and cull their own ESI before conducting attorney review and producing relevant files.  But nobody seems to talk about what these tools can do with an opponent’s produced ESI.

Fewer Resources to Understand Data Produced to You

In eDiscovery, attorneys typically develop a reasonably in-depth understanding of their own collection.  They know who the custodians are, have a chance to interview those custodians and develop a good knowledge of their client’s standard operating procedures and terminology to effectively retrieve responsive ESI.  However, that same knowledge isn’t present when reviewing an opponent’s data.  Unless they are deposed, the opposition’s custodians aren’t interviewed, and where the data originated is often unclear.  The only source of information is the data itself, which requires in-depth analysis.  An FPR application like FirstPass®, powered by Venio FPR™, can make a significant difference in conducting that analysis – provided that you request a native production from your opponent, which is vital to being able to perform that in-depth analysis.

Email Analytics

The ability to see the communication patterns graphically – to identify the parties involved, with whom they communicated and how frequently – is a significant benefit to understanding the data received.  FirstPass provides email analytics to understand the parties involved and potentially identify other key opponent individuals to depose in the case.  Dedupe capabilities enable quick comparison against your production to confirm if the opposition has possibly withheld key emails between opposing parties.  FirstPass also provides an email timeline to enable you to determine whether any gaps exist in the opponent’s production.
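As a rough sketch of the two ideas above (who communicated with whom and how often, and where the timeline has gaps), here is a short Python example.  It is not FirstPass’s implementation; the field names, addresses, dates and the 14-day gap threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical metadata extracted from an opponent's produced emails.
emails = [
    {"from": "alice@acme.com", "to": ["bob@acme.com"], "sent": date(2012, 3, 1)},
    {"from": "alice@acme.com", "to": ["bob@acme.com"], "sent": date(2012, 3, 3)},
    {"from": "bob@acme.com", "to": ["carol@acme.com"], "sent": date(2012, 3, 5)},
    {"from": "alice@acme.com", "to": ["bob@acme.com"], "sent": date(2012, 4, 20)},
]


def pair_counts(messages):
    """Count messages for each (sender, recipient) pair."""
    counts = Counter()
    for m in messages:
        for recipient in m["to"]:
            counts[(m["from"], recipient)] += 1
    return counts


def timeline_gaps(messages, min_gap_days=14):
    """Return (earlier, later) date pairs where no email was produced
    for at least min_gap_days (an arbitrary threshold)."""
    dates = sorted({m["sent"] for m in messages})
    return [(a, b) for a, b in zip(dates, dates[1:])
            if (b - a).days >= min_gap_days]


pairs = pair_counts(emails)
gaps = timeline_gaps(emails)
print(pairs.most_common())
print(gaps)
```

A gap in the timeline does not prove anything was withheld, but it tells you where to look and what to ask opposing counsel about.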

Message Threading

The ability to view message threads for emails (which Microsoft Outlook® tracks) can also be a useful tool, as it enables you to see the entire thread “tree” of a conversation, including any side discussions that break off from the original discussion.  Because Outlook tracks those message threads, any missing emails are identified with placeholders.  Those could be emails your opponent has withheld, so the ability to identify them quickly and address them with opposing counsel (or with the court, if necessary) is key to evaluating the completeness of the production.
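The placeholder idea can be sketched in a few lines of Python.  This example does not use Outlook’s own thread tracking; as a simplifying assumption, it relies on the generic “in reply to” relationship between messages, with hypothetical message IDs and field names.  Any message that is replied to but never appears in the production is flagged as missing.

```python
def missing_parents(messages):
    """Map each referenced-but-unproduced message ID to the IDs of the
    produced messages that reply to it."""
    produced = {m["id"] for m in messages}
    missing = {}
    for m in messages:
        parent = m.get("in_reply_to")
        if parent and parent not in produced:
            missing.setdefault(parent, []).append(m["id"])
    return missing


# Hypothetical production: m3 is replied to twice but was never produced.
production = [
    {"id": "m1", "in_reply_to": None},
    {"id": "m2", "in_reply_to": "m1"},
    {"id": "m4", "in_reply_to": "m3"},
    {"id": "m5", "in_reply_to": "m3"},
]
placeholders = missing_parents(production)
for parent, replies in placeholders.items():
    print(f"missing {parent}, referenced by {replies}")
```

Each entry in the result corresponds to a placeholder in the thread tree: a message you know existed because others reply to it, but which you did not receive.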

Tomorrow, I’ll talk about the use of synonym searching to find variations of your search terms that may be common terminology of your opponent.  Same bat time, same bat channel! 🙂

In the meantime, what do you think?  Have you used email analytics to analyze an opponent’s produced ESI?  Please share any comments you might have or if you’d like to know more about a particular topic.
