Analysis

eDiscovery Best Practices: EDRM Data Set for Great Test Data

 

In its almost six years of existence, the Electronic Discovery Reference Model (EDRM) Project has implemented a number of mechanisms to standardize the practice of eDiscovery.  Having worked on the EDRM Metrics project for the past four years, I have seen some of those mechanisms implemented firsthand.

One of the most significant recent accomplishments by EDRM is the EDRM Data Set.  Anyone who works with eDiscovery applications and processes understands the importance of being able to test those applications in as many ways as possible using realistic data that will illustrate expected results.  Test data is extremely useful in crafting a defensible discovery approach, because it enables you to determine the expected results within those applications and processes before using them with your organization’s live data.  It can also help you identify potential anomalies (those never occur, right?) up front so that you can proactively develop an approach to address those anomalies before encountering them in your own data.

Using public domain data from Enron Corporation (originating from the Federal Energy Regulatory Commission Enron Investigation), the EDRM Data Set Project provides industry-standard, reference data sets of electronically stored information (ESI) to test those eDiscovery applications and processes.  In 2009, the EDRM Data Set project released its first version of the Enron Data Set, comprised of Enron e-mail messages and attachments within Outlook PST files, organized in 32 zipped files.

This past November, the EDRM Data Set project launched Version 2 of the EDRM Enron Email Data Set.  Straight from the press release announcing the launch, here are some of the improvements in the newest version:

  • Larger Data Set: Contains 1,227,255 emails with 493,384 attachments (included in the emails) covering 151 custodians;
  • Rich Metadata: Includes threading information, tracking IDs, and general Internet headers;
  • Multiple Email Formats: Provision of both full and de-duplicated email in PST, MIME and EDRM XML, which allows organizations to test and compare results across formats.

The Text REtrieval Conference (TREC) Legal Track project provided input for this version of the data set, which, as noted previously on this blog, has used the EDRM data set for its research.  Kudos to John Wang, Project Lead for the EDRM Data Set Project and Product Manager at ZL Technologies, Inc., and the rest of the Data Set team for such an extensive test set collection!

So, what do you think?  Do you use the EDRM Data Set for testing your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client where there were several searches conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, track versions of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
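To make the idea concrete, here is a minimal Python sketch of a proximity test.  It is only an illustration, not how any particular review tool implements the feature: the function names (tokenize, within) are made up for this example, and it assumes that “within 5 words” means the two word positions differ by at most five, which individual products may define differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b appear within max_distance
    word positions of each other, in either order."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


# Hypothetical sample documents, written for this illustration only.
lease = "The lessee retains oil drilling rights on the tract."
report = ("Annual report on oil production and refining capacity across "
          "all regions of the company. Copyright 2010. All rights reserved.")

print(within(lease, "oil", "rights"))   # True: the terms are 2 words apart
print(within(report, "oil", "rights"))  # False: "rights" is only in the copyright notice
```

Both sample documents would be returned by an “oil” AND “rights” search, but only the first survives the proximity test.  Because the tokenizer drops punctuation, the same function also treats “Austin, Doug” and “Doug Austin” as the two terms sitting next to each other.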

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: 2011 Predictions — By The Numbers

 

Comedian Nick Bakay always ends his Tale of the Tape skits, where he compares everything from Married vs. Single to Divas vs. Hot Dogs, with the phrase “It's all so simple when you break things down scientifically.”

The late December/early January time frame is always when various people in eDiscovery make their annual predictions as to what trends to expect in the coming year.  We’ll have some of our own in the next few days (hey, the longer we wait, the more likely we are to be right!).  However, before stating those predictions, I thought we would take a look at other predictions and see if we can spot some common trends among those.  So, I “googled” for 2011 eDiscovery predictions and organized the predictions into common themes.  I found serious predictions here, here, here, here and here.  Oh, also here and here.

A couple of quick comments: 1) I had NO IDEA how many times predictions are re-posted by other sites, so it took some work to isolate each unique set of predictions.  I even found two sets of predictions from ZL Technologies, one with twelve predictions and another with seven, so I had to pick one set and I chose the one with seven (sorry, eWEEK!).  If I have failed to accurately attribute the original source for a set of predictions, please feel free to comment.  2) This is probably not an exhaustive list of predictions (I have other duties in my “day job”, so I couldn’t search forever), so I apologize if I’ve left anybody’s published predictions out.  Again, feel free to comment if you’re aware of other predictions.

Here are some of the common themes:

  • Cloud and SaaS Computing: Six out of seven “prognosticators” indicated that adoption of Software as a Service (SaaS) “cloud” solutions will continue to increase, which will become increasingly relevant in eDiscovery.  No surprise here, given last year’s IDC forecast for SaaS growth and many articles addressing the subject, including a few posts right here on this blog.
  • Collaboration/Integration: Six out of seven “augurs” also had predictions related to various themes associated with collaboration (more collaboration tools, greater legal/IT coordination, etc.) and integration (greater focus by software vendors on data exchange with other systems, etc.).  Two people specifically noted an expectation of greater eDiscovery integration within organization governance, risk management and compliance (GRC) processes.
  • In-House Discovery: Five “pundits” forecasted eDiscovery functions and software will continue to be brought in-house, especially on the “left-side of the EDRM model” (Information Management).
  • Diverse Data Sources: Three “soothsayers” presaged that sources of data will continue to be more diverse, which shouldn’t be a surprise to anyone, given the popularity of gadgets and the rise of social media.
  • Social Media: Speaking of social media, three “prophets” (yes, I’ve been consulting my thesaurus!) expect social media to continue to be a big area to be addressed for eDiscovery.
  • End to End Discovery: Three “psychics” also predicted that there will continue to be more single-source end-to-end eDiscovery offerings in the marketplace.

The “others receiving votes” category (two predicting each of these) included maturing and acceptance of automated review (including predictive coding), early case assessment moving toward the Information Management stage, consolidation within the eDiscovery industry, more focus on proportionality, maturing of global eDiscovery and predictive/disruptive pricing.

Predictive/disruptive pricing (via the respective blogs of Kriss Wilson of Superior Document Services and Charles Skamser of eDiscovery Solutions Group) is a particularly intriguing prediction to me because data volumes are continuing to grow at an astronomical rate, and greater volumes lead to greater costs.  Creativity will be key in how companies deal with the larger volumes effectively, and pressures will become greater for providers (even, dare I say, review attorneys) to price their services more creatively.

Another interesting prediction (via ZL Technologies) is that “Discovery of Databases and other Structured Data will Increase”, which is something I’ve expected to see for some time.  I hope this is finally the year for that.

Finally, I said that I found serious predictions and analyzed them; however, there are a couple of not-so-serious sets of predictions here and here.  My favorite prediction is from The Posse List, as follows: “LegalTech…renames itself “EDiscoveryTech” after Law.com survey reveals that of the 422 vendors present, 419 do e-discovery, and the other 3 are Hyundai HotWheels, Speedway Racers and Convert-A-Van who thought they were at the Javits Auto Show.”

So, what do you think?  Care to offer your own “hunches” from your crystal ball?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Predictive Coding Strategy and Survey Results

Yesterday, we introduced the Virtual LegalTech online educational session Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding” and defined predictive coding while also noting the two “learning” methods that most predictive coding mechanisms use to predict document classifications.  To get background information regarding the session, including information about the speakers (Jason Baron, Maura Grossman and Bennett Borden), click here.

The session also focused on strategies for using predictive coding and results of the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.  Strategies discussed by Bennett Borden include:

  • Understand the technology used by a particular provider:  Not only will supervised and active learning mechanisms often yield different results, but there are differing technologies within each of these learning mechanisms.
  • Understand the state of the law regarding predictive coding technology: So far, there is no case law available regarding use of this technology and, while it may eventually be the future of document review, that has yet to be established.
  • Obtain buy-in by the requesting party to use predictive coding technology: It’s much easier when the requesting party has agreed to your proposed approach and that agreement is included in an order of the court which covers the approach and also includes a FRE 502 “clawback” agreement and order.  To have a chance to obtain that buy-in and agreement, you’ll need a diligent approach that includes “tiering” of the collection by probable responsiveness and appropriate sampling of each tier level.

Maura Grossman then described the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.  The team took the EDRM Enron Version 2 Dataset of 1.3 million public domain files and deduped it down to 685,000+ unique files and 5.5 GB of uncompressed data.  The team also identified eight different hypothetical eDiscovery requests for the test.

Participating predictive coding technologies were then given a “seed set” of roughly 1,000 documents that had previously been identified by TREC as responsive or non-responsive to each of the requests. Using this information, participants were required to rank the documents in the larger collection from most likely to least likely to be responsive, and estimate the likelihood of responsiveness as a probability for each document.  The study ranked the participants on recall rate accuracy based on 30% of the collection retrieved (200,000 files) and also on the predicted recall to determine a prediction accuracy.
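For readers who want to see what a recall measurement at a cutoff looks like, here is a minimal Python sketch.  It is not the TREC scoring code: the function names, the tiny ten-document ranking and the probabilities are all hypothetical, and the second function reflects only one plausible way a tool could estimate its own recall from the probabilities it assigns.

```python
def recall_at_cutoff(ranked_doc_ids, responsive_ids, cutoff_fraction=0.30):
    """Share of the truly responsive documents that fall in the top
    cutoff_fraction of the ranking (most likely responsive first)."""
    cutoff = round(len(ranked_doc_ids) * cutoff_fraction)
    retrieved = set(ranked_doc_ids[:cutoff])
    responsive = set(responsive_ids)
    if not responsive:
        return 0.0
    return len(retrieved & responsive) / len(responsive)


def estimated_recall_at_cutoff(ranked_probabilities, cutoff_fraction=0.30):
    """Recall the tool itself would predict: expected responsive documents
    above the cutoff divided by expected responsive documents overall.
    (An assumption for illustration, not the study's stated formula.)"""
    cutoff = round(len(ranked_probabilities) * cutoff_fraction)
    total = sum(ranked_probabilities)
    if total == 0:
        return 0.0
    return sum(ranked_probabilities[:cutoff]) / total


# Hypothetical ranking of ten documents, most likely responsive first.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
truly_responsive = {"d1", "d2", "d5", "d9"}
probabilities = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05]

actual = recall_at_cutoff(ranking, truly_responsive)
predicted = estimated_recall_at_cutoff(probabilities)
print(actual, predicted)
```

In this made-up example, the top 30% of the ranking contains two of the four responsive documents, so actual recall is 50%, while the tool’s own probabilities imply roughly 60%.  The gap between the two numbers is the kind of difference a prediction accuracy measure is meant to capture.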

The results?  Actual recall rates for all eight discovery requests ranged widely among the tools from 85.1% actual recall down to 38.2% (on individual requests, the range was even wider – as much as 82% different between the high and the low).  The prediction accuracy rates for the tools also ranged somewhat widely, from a high of 95% to a low of 42%.

Based on this study, it is clear that these technologies can differ significantly on how effective and efficient they are at correctly ranking and categorizing remaining documents in the collection based on the exemplar “seed set” of documents.  So, it’s always important to conduct sampling of both machine coded and human coded documents for quality control in any project, with or without predictive coding (we sometimes forget that human coded documents can just as often be incorrectly coded!).
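As a rough illustration of that kind of quality control sampling, here is a minimal Python sketch.  Everything in it is hypothetical (the function name, the document IDs and the stand-in QC reviewer); it simply draws a reproducible random sample of coded documents and reports how often a second review disagrees with the code originally assigned, whether that code came from a machine or a human.

```python
import random


def sampled_error_rate(assigned_codes, qc_review, sample_size, seed=0):
    """Estimate how often assigned codes disagree with a QC re-review.

    assigned_codes: dict of doc id -> code given by the tool or first reviewer
    qc_review: callable returning the QC reviewer's code for a doc id
    """
    doc_ids = sorted(assigned_codes)
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(doc_ids, min(sample_size, len(doc_ids)))
    if not sample:
        return 0.0
    errors = sum(1 for doc_id in sample
                 if assigned_codes[doc_id] != qc_review(doc_id))
    return errors / len(sample)


# Hypothetical data: 100 documents, all coded "responsive" on first pass.
assigned = {doc_id: "responsive" for doc_id in range(100)}


def qc_reviewer(doc_id):
    """Stand-in for a second reviewer who overturns every tenth document."""
    return "non-responsive" if doc_id % 10 == 0 else "responsive"


print(sampled_error_rate(assigned, qc_reviewer, sample_size=25))
```

A smaller sample gives a noisier estimate, so in practice the sample size would be chosen to match the confidence level the team needs to defend.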

For more about the TREC 2010 Legal Track study, click here.  As noted yesterday, you can also check out a replay of the session or download the slides for the presentation at the Virtual LegalTech site.

Full Disclosure: Trial Solutions provides predictive coding services using Hot Neuron LLC’s Clustify™, which categorizes documents by looking for similar documents in the exemplar set that satisfy user-specified criteria, such as a minimum conceptual similarity or near-duplicate percentage.

So, what do you think?  Have you used predictive coding on a case?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: What the Heck is “Predictive Coding”?

 

Yesterday, ALM hosted another Virtual LegalTech “live” day online.  Every quarter, the Virtual LegalTech site has a “live” day with educational sessions from 9 AM to 5 PM ET, most of which provide CLE credit in certain states (New York, California, Florida, and Illinois).

One of yesterday’s sessions was Frontiers of E-Discovery: What Lawyers Need to Know About “Predictive Coding”.  The speakers for this session were:

Jason Baron: Director of Litigation for the National Archives and Records Administration, a founding co-coordinator of the National Institute of Standards and Technology’s Text Retrieval Conference (“TREC”) legal track and co-chair and editor-in-chief for various working groups for The Sedona Conference®;

Maura Grossman: Counsel at Wachtell, Lipton, Rosen & Katz, co-chair of the eDiscovery Working Group advising the New York State Unified Court System and coordinator of the 2010 TREC legal track; and

Bennett Borden: co-chair of the e-Discovery and Information Governance Section at Williams Mullen and member of Working Group I of The Sedona Conference on Electronic Document Retention and Production, as well as the Cloud Computing Drafting Group.

This highly qualified panel discussed a number of topics related to predictive coding, including practical applications of predictive coding technologies and results of the TREC 2010 Legal Track Learning Task on the effectiveness of “Predictive Coding” technologies.

Before discussing the strategies for using predictive coding technologies and the results of the TREC study, it’s important to understand what predictive coding is.  The panel gave the best descriptive definition that I’ve seen yet for predictive coding, as follows:

“The use of machine learning technologies to categorize an entire collection of documents as responsive or non-responsive, based on human review of only a subset of the document collection. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.”

The panel used an analogy for predictive coding by relating it to spam filters that review and classify email and learn, based on previous classifications, which emails can be considered “spam”.  Just as no spam filter perfectly classifies all emails as spam or legitimate, predictive coding does not perfectly identify all relevant documents.  However, these technologies can “learn” to identify most of the relevant documents based on one of two “learning” methods:

  • Supervised Learning: a human chooses a set of “exemplar” documents that feed the system and enable it to rank the remaining documents in the collection based on their similarity to the exemplars (e.g., “more like this”);
  • Active Learning: the system chooses the exemplars on which human reviewers make relevancy determinations, then the system learns from those classifications to apply to the remaining documents in the collection.
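To give a feel for the “more like this” idea behind supervised learning, here is a minimal Python sketch.  It is a toy, not what any predictive coding product does: it assumes a simple bag-of-words cosine similarity, and the function names and sample documents are invented for this illustration.  It covers only the supervised case, where a human has already picked the exemplars.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count of lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_by_similarity(exemplars, collection):
    """Rank document ids in collection (dict of id -> text) from most to
    least similar to the closest human-chosen exemplar."""
    exemplar_vectors = [vectorize(text) for text in exemplars]
    scores = {
        doc_id: max((cosine(vectorize(text), ev) for ev in exemplar_vectors),
                    default=0.0)
        for doc_id, text in collection.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical exemplar and collection.
exemplars = ["oil drilling rights lease agreement"]
collection = {
    "a": "lease agreement for oil drilling rights",
    "b": "lunch menu for the holiday party",
    "c": "oil prices rose",
}
print(rank_by_similarity(exemplars, collection))  # ['a', 'c', 'b']
```

The resulting ranking is what would then be “cut” into categories such as potentially responsive or not.  An active learning system would differ mainly in that the system, rather than the human, picks which documents get reviewed next.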

Tomorrow, I “predict” we will get into the strategies and the results of the TREC study.  You can check out a replay of the session at the Virtual LegalTech site.  You’ll need to register – it’s free – then log in and go to the CLE Center Auditorium upon entering the site (which is up all year, not just on “live” days).  Scroll down until you see this session and then click on “Attend Now” to view the replay presentation.  You can also go to the Resource Center at the site and download the slides for the presentation.

So, what do you think?  Do you have experience with predictive coding?  Please share any comments you might have or if you’d like to know more about a particular topic.

Thought Leader Q&A: Brad Jenkins of Trial Solutions

 

Tell me about your company and the products you represent. Trial Solutions is an electronic discovery software and services company in Houston, Texas that assists corporations and law firms in the collection, processing and review of electronic data. Trial Solutions developed OnDemand™, formerly known as ImageDepot™, an online e-discovery review application which is currently used by over fifty of the top 250 law firms including seven of the top ten.  Trial Solutions also offers FirstPass™, an early case assessment and first-pass review application.  Both applications are offered as a software-as-a-service (SaaS), where Trial Solutions licenses the applications to customers for use and provides access via the Internet. Trial Solutions provides litigation support services in over 90 metropolitan areas throughout the United States and Canada.

What do you see as emerging trends for eDiscovery SaaS solutions?  I believe that one emerging trend that you’ll see is simplified pricing.  Pricing for many eDiscovery SaaS solutions is too complex and difficult for clients to understand.  Many providers base pricing on a combination of collection size and number of users (among other factors), which is confusing and penalizes organizations for adding users into a case.  I believe that organizations will expect simpler pricing models from providers with the ability to add an unlimited number of users to each case.

Another trend I expect to see is provision of more self-service capabilities giving legal teams greater control over managing their own databases and cases.  Organizations need the ability to administer their own databases, add users and maintain their rights without having to rely on the hosting provider to provide these services.  A major self-service capability is the ability to load your own data on your schedule without having to pay load fees to the hosting provider.

Why do you think that more eDiscovery SaaS solutions don’t provide a free self loading capability?  I don’t know.  Many SaaS solutions outside of eDiscovery enable you to upload your own data to use and share via the Web.  Facebook and YouTube enable you to upload and share pictures and videos, Google Docs is designed for sharing and maintaining business documents, and even SalesForce.com allows you to upload contacts via a comma-separated values (CSV) file.  So, loading your own data is not a new concept for SaaS solutions.  OnDemand™ is about to roll out a new SelfLoader™ module to enable clients to load their own data, for free.  With SelfLoader, clients can load their own images, OCR text files, native files and metadata to an existing OnDemand database using an industry-standard load file (IPRO’s .lfp or Concordance’s .opt) format.

Are there any other trends that you see in the industry?  One clear trend is the rising popularity in first pass review/early case assessment (or, early data assessment, as some prefer) solutions like FirstPass as corporate data proliferates at an amazing pace.  According to International Data Corporation (IDC), the amount of digital information created, captured and replicated in the world as of 2006 was 161 exabytes or 161 billion gigabytes and that is expected to rise more than six-fold by 2010 (to 988 exabytes)!  That’s enough data for a stack of books from the sun to Pluto and back again!  With more data than ever to review, attorneys will have to turn to applications to enable them to quickly cull the data to a manageable level for review – it will simply be impossible to review the entire collection in a cost-efficient and timely manner.  It will also be important for there to be a seamless transition from first pass review for culling collections to attorney linear review for final determination of relevancy and privilege and Trial Solutions provides a fully integrated approach with FirstPass and OnDemand.

About Brad Jenkins
Brad Jenkins, President and CEO of Trial Solutions, has over 20 years of experience leading customer focused companies in the litigation support arena. Brad has authored many articles on litigation support issues, and has spoken before national audiences on document management practices and solutions.

Thought Leader Q&A: Chris Jurkiewicz of Venio Systems

 

Tell me about your company and the products you represent.  Venio Systems is an Electronic Discovery software solution provider specializing in early case assessment and first pass review.  Our product, Venio FPR™, allows forensic units, attorneys and litigation support teams to process, analyze, search, report, interact with and export responsive data for linear review or production.

What do you consider to be the reason for the enormous growth of early case assessment/first pass review tools in the industry?  I believe much of the growth we’ve seen in the past few years can be attributed to many factors, of which the primary one is the exponential growth of data within an organization.  The inexpensive cost of data storage available to an organization is making it easier for them to keep unnecessary data on their systems.  Companies who practice litigation and/or work with litigative data are seeking out quick and cost effective methods of funneling the necessary data from all the unnecessary data stored in these vast systems thereby making early case assessment/first pass review tools not only appealing but necessary.

Are there other areas where first pass review tools can be useful during eDiscovery?  Clients have found creative ways to use first pass review/ECA technology; recently a client utilized it to analyze a production received from opposing counsel.  They were able to determine that the email information produced was not complete.  They were then able to force the opposing counsel to fill in the missing email gaps.

There have been several key cases related to search defensibility in the past couple of years.  How will those decisions affect organizations’ approach to ESI searching?  More organizations will have to adopt a defensible process for searching and use tools that support that process.  Venio’s software has many key features focused on search defensibility including: Search List Analysis, Wild Card Variation searching, Search Audit Reporting and Fuzzy Searching.  All searches run in Venio FPR™ are audited by user, date and time, terms, scope, and frequency.  By using these tools, clients have been able to find additional responsive files that would be otherwise missed and easily document their search approach and refinement.

How do you think the explosion of data and technology will affect the review process in the future?  I believe that technology will continue to evolve and provide innovative tools to allow for more efficient reviews of ESI.  In the past few years the industry has already seen several new technologies released such as near deduping, concept searching and clustering which have significantly improved the speed of the review.  Legal teams will have to continue to make greater utilization of these technologies to provide efficient and cost-effective review as their clients will demand it.

About Chris Jurkiewicz
Chris graduated in 2000 with a Bachelor of Science in Computer Information Systems at Marymount University in Arlington, Virginia.  He began working for On-Site Sourcing while still an intern at Marymount and became the youngest Director on On-Site’s management team within three years as the Director of their Electronic Data Discovery Division.  In 2009, Chris co-founded Venio Systems to fill a void in Early Case Assessment (ECA) technology with Venio FPR™ to provide law firms, corporations and government entities the ability to gain a comprehensive picture of their data set at the front-end, thereby saving precious time and money on the back-end.  Chris is an industry recognized expert in the field of eDiscovery, having spoken on several eDiscovery panels and served as an eDiscovery expert witness.

Reporting from the EDRM Mid-Year Meeting

 

Launched in May 2005, the Electronic Discovery Reference Model (EDRM) Project was created to address the lack of standards and guidelines in the electronic discovery market.  Now, in its sixth year of operation, EDRM has become the gold standard for…well…standards in eDiscovery.  Most references to the eDiscovery industry these days refer to the EDRM model as a representation of the eDiscovery life cycle.

At the first meeting in May 2005, there were 35 attendees, according to Tom Gelbmann of Gelbmann & Associates, co-founder of EDRM along with George Socha of Socha Consulting LLC.  Check out the preliminary first draft of the EDRM diagram – it has evolved a bit!  Most participants were eDiscovery providers and, according to Gelbmann, they asked “Do you really expect us all to work together?”  The answer was “yes”, and the question hasn’t been asked again.  Today, there are over 300 members from 81 participating organizations including eDiscovery providers, law firms and corporations (as well as some individual participants).

This week, the EDRM Mid-Year meeting is taking place in St. Paul, MN.  Twice a year, in May and October, eDiscovery professionals who are EDRM members meet to continue the process of working together on various standards projects.  EDRM has eight currently active projects, as follows:

  • Data Set: provides industry-standard, reference data sets of electronically stored information (ESI) and software files that can be used to test various aspects of eDiscovery software and services,
  • Evergreen: ensures that EDRM remains current, practical and relevant and educates about how to make effective use of the Model,
  • Information Management Reference Model (IMRM): provides a common, practical, flexible framework to help organizations develop and implement effective and actionable information management programs,
  • Jobs: develops a framework for evaluating pre-discovery and discovery personnel needs or issues,
  • Metrics: provides an effective means of measuring the time, money and volumes associated with eDiscovery activities,
  • Model Code of Conduct: evaluates and defines acceptable boundaries of ethical business practices within the eDiscovery service industry,
  • Search: provides a framework for defining and managing various aspects of Search as applied to eDiscovery workflow,
  • XML: provides a standard format for e-discovery data exchange between parties and systems, reducing the time and risk involved with data exchange.

This is my fourth year participating in the EDRM Metrics project and it has been exciting to see several accomplishments made by the group, including creation of a code schema for measuring activities across the EDRM phases, glossary definitions of those codes and tools to track early data assessment, collection and review activities.  Today, we made significant progress in developing survey questions designed to gather and provide typical metrics experienced by eDiscovery legal teams in today’s environment.

So, what do you think?  Has EDRM impacted how you manage eDiscovery?  If so, how?  Please share any comments you might have or if you’d like to know more about a particular topic.

Announcing eDiscovery Thought Leader Q&A Series!

 

eDiscovery Daily is excited to announce a new blog series of Q&A interviews with various eDiscovery thought leaders.  Over the next three weeks, we will publish interviews conducted with six individuals with unique and informative perspectives on various eDiscovery topics.  Mark your calendars for these industry experts!

Christine Musil is Director of Marketing for Informative Graphics Corporation, a viewing, annotation and content management software company based in Arizona.  Christine will be discussing issues associated with native redaction and redaction of Adobe PDF files.  Her interview will be published this Thursday, October 14.

Jim McGann is Vice President of Information Discovery for Index Engines.  Jim has extensive experience with eDiscovery and Information Management.  Jim will be discussing issues associated with tape backup and retrieval.  His interview will be published this Friday, October 15.

Alon Israely is a Senior Advisor in BIA’s Advisory Services group and currently oversees BIA’s product development for its core technology products.  Alon will be discussing best practices associated with “left side of the EDRM model” processes such as preservation and collection.  His interview will be published next Thursday, October 21.

Chris Jurkiewicz is Co-Founder of Venio Systems, which provides Venio FPR™, a tool that allows legal teams to analyze data and perform early case assessment and first pass review on data sets of any size.  Chris will be discussing current trends associated with early case assessment and first pass review tools.  His interview will be published next Friday, October 22.

Kirke Snyder is Owner of Legal Information Consultants, a consulting firm specializing in eDiscovery Process Audits to help organizations lower the risk and cost of e-discovery.  Kirke will be discussing best practices associated with records and information management.  His interview will be published on Monday, October 25.

Brad Jenkins is President and CEO for Trial Solutions, which is an electronic discovery software and services company that assists litigators in the collection, processing and review of electronic information.  Brad will be discussing trends associated with SaaS eDiscovery solutions.  His interview will be published on Tuesday, October 26.

We thank all of our guests for participating!

So, what do you think?  Is there someone you would like to see interviewed for the blog?  Are you an industry expert with some information to share from your “soapbox”?  If so, please share any comments or contact me at daustin@trialsolutions.net.  We’re looking to assemble our next group of interviews now!

eDiscovery Case Study: Term List Searching for Deadline Emergencies!

 

A few weeks ago, I was preparing to conduct a Friday morning training session to show a client how to use FirstPass™, powered by Venio FPR™, to conduct a first pass review of their data when I received a call from that client.  “We thought we were going to have a month to review this data, but because of a judge’s ruling in the case, we now have to start depo prep for two key custodians on Monday for depositions now scheduled next week,” said Megan Moore, an attorney with Steele Sturm, PLLC, in Houston.  “We have to complete our review of their files this weekend.”

So, what do you do when you have to conduct both a first pass and final review of the data in a weekend?

We determined that Steele Sturm had to complete first pass review that Friday so that we could prepare the potentially responsive files for attorney review starting Saturday morning.  Steele Sturm identified a list of responsive search terms, and Trial Solutions worked with the attorneys to add variations of those terms (such as proximity searches and synonyms) and finalize the list to apply to the data.  Because FirstPass provides the ability to import and search an entire term list at once, we were able to identify potentially responsive files in a simple, two-step process.  “Using FirstPass, Trial Solutions helped us cull out 75% of the collection as non-responsive, enabling our review team to focus review on the remaining 25%,” said Moore.
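For readers who want to see the general idea behind term list culling, here is a minimal sketch in Python.  It is not how FirstPass works internally; the function names, document IDs and sample text are all hypothetical, and it only handles plain whole-word and phrase matching (no proximity searches or synonym expansion).

```python
import re
from collections import Counter


def compile_terms(terms):
    """Compile each term into a case-insensitive, whole-word pattern."""
    return {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }


def cull(documents, terms):
    """Apply the whole term list to every document in one pass.

    documents: dict mapping a document ID to its extracted text.
    Returns (responsive_ids, non_responsive_ids, hits_per_term), where
    hits_per_term counts how many documents each term matched.
    """
    patterns = compile_terms(terms)
    responsive, non_responsive = [], []
    hits_per_term = Counter()
    for doc_id, text in documents.items():
        matched = [term for term, pattern in patterns.items() if pattern.search(text)]
        if matched:
            responsive.append(doc_id)
            hits_per_term.update(matched)
        else:
            non_responsive.append(doc_id)
    return responsive, non_responsive, hits_per_term


# Hypothetical sample data, for illustration only.
documents = {
    "DOC-001": "Please review the pipeline contract before Friday.",
    "DOC-002": "Lunch on Thursday?",
    "DOC-003": "The CONTRACT amendment covers the storage lease.",
    "DOC-004": "Fantasy football standings attached.",
}
terms = ["contract", "storage lease"]

responsive, non_responsive, hits_per_term = cull(documents, terms)
print("Potentially responsive:", responsive)
print("Culled as non-responsive:", non_responsive)
print("Documents hit per term:", dict(hits_per_term))
```

The per-term hit counts are a design choice worth keeping in any version of this: a term that hits an unexpectedly large share of the collection is a candidate for being overbroad and may deserve a second look before attorney review begins.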

Once the potentially responsive files were identified, they were imported into OnDemand™, powered by ImageDepot™, for linear attorney review.  During review, the attorneys found that some of the terms used to identify potentially responsive files were overbroad, so additional searches were performed in OnDemand to “group tag” those files as non-responsive.  “Trial Solutions provided training and support throughout the weekend to enable our review team to quickly ‘tag’ each file using OnDemand as to responsiveness and privilege to enable us to meet our deadline,” said Moore.

So, what do you think?  Do you have any “emergency” war stories to share?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.