
eDiscovery Trends: EDRM Metrics Privilege Survey


As a member of the EDRM Metrics Project for the past four years, I have seen several accomplishments by the group to provide an effective means of measuring the time, money and volumes associated with eDiscovery activities, including:

  • Code Set: An extensive code set of activities to be tracked from Identification through Presentation, as well as Project Management.
  • Case Study: A hypothetical case study that illustrates at each phase why metrics should be collected, what needs to be measured, how metrics are acquired and where they’re recorded, and how the metrics can be used.
  • Cube: A simple graphical model which illustrates the EDRM phases, aspects to be tracked (e.g., custodians, systems, media, QA, activities, etc.) and the metrics to be applied (i.e., items, cost, volume, time).

The EDRM Metrics project has also been heavily involved in proposing a standard set of eDiscovery activity codes for the ABA’s Uniform Task Based Management System (UTBMS) series of codes used to classify the legal services performed by a law firm in an electronic invoice submission.

Now, we need your help for an information gathering exercise.

We are currently conducting a Metrics Privilege survey to get a sense, across the industry, of typical volumes and percentages of privileged documents within a collection.  It’s a simple, seven-question survey that strives to gather information regarding your experiences with privileged documents (whether you work for a law firm, corporation, provider or some other organization).

If you have a minute (which is literally all the time it will take), please take the survey and pass it along to your colleagues as well.  The more respondents who participate, the more representative the results will be of the current eDiscovery community.  To take the survey, go to edrm.net or click here.  EDRM will publish the results in the near future.

So, what do you think?  What are your typical metrics with regard to privileged documents?  Please share any comments you might have or if you’d like to know more about a particular topic.

Managing an eDiscovery Contract Review Team: Identify a Project Manager


Yesterday, we talked about applying topic codes to the documents to identify helpful or harmful documents.  Today, we will talk about identifying a project manager for the review.

A good, experienced project manager is critical to the success of your review project.  In fact, the project manager is the most important part of the equation.  The project manager will be responsible for:

  • Creating a schedule and a budget
  • Determining the right staff size
  • Lining up all the resources that you’ll need like computer equipment, software, and supplies
  • Preparing training materials
  • Coordinating training of the review team
  • Serving as a liaison with the service providers who are processing the data, loading data into the review tool, and making the review tool available to the review team
  • Monitoring status of the project and reporting to the litigation team
  • Identifying potential problems with schedule and budget and developing resolutions
  • Ensuring that questions are resolved quickly and that lines of communication between the review team and decision makers are open
  • Supervising workflow and quality control work

Choose someone with project management experience who is also experienced in litigation, technology, electronic discovery, working with vendors, and working with attorneys.  Identify the project manager early on and get him or her involved in the project planning steps.

What do you look for in a project manager?  Please share any comments you have and let us know if you’d like to know more about an eDiscovery topic.

Managing an eDiscovery Contract Review Team: Applying Topic Codes in the Document Review


So far we’ve covered drafting criteria for responsiveness and for privilege.  You may, however, be asking the review team to do more than that in the document review.  You might, for example, ask them to apply topic codes to the documents or to identify helpful or harmful documents.  At this point in the case, you will be better off keeping this very simple.  There are several reasons for this:

  • Chances are that you’re on a tight schedule.  An in-depth analysis of the collection at this point may cause you to miss production deadlines.
  • If you ask people to focus on too many things in the review, you increase the likelihood of errors and inconsistencies, especially if the team is inexperienced with the case, the client and the documents.
  • You’re still in the early stages of the case.  As it evolves you’ll identify new facts, issues and witnesses that will be important.  This will not be your only effort to match documents with issues, facts and witnesses.

It may be reasonable, however, to ask the team to do some very basic categorization of the documents around topics.  Let me give you an example.  Let’s say you are handling a pharmaceutical case involving a drug product that is alleged to have significant adverse reactions.  You know that you’ll be interested in documents that discuss testing of the product, marketing, manufacturing, and so on.  You could ask the team to apply those general types of topics to the documents.  You could also identify a few examples of text that will be helpful and text that will be harmful, and create corresponding topic codes (using our pharmaceutical case illustration, you might have a topic code for “Death of a patient”).   A very simple set of topic codes shouldn’t slow down the review, and this effort will provide some search hooks into the collection once the review is complete.

Once you’ve developed a simple, workable topic list, write clear, objective definitions for each topic, and find documents in the collection that serve as examples of each.  Include those definitions and examples in the criteria.
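
For illustration only, here is a minimal sketch (in Python) of what a simple topic-code criteria structure might look like, using the pharmaceutical scenario above.  The codes, definitions and example document IDs are hypothetical placeholders, not drawn from any actual criteria:

```python
# Hypothetical topic-code criteria for the pharmaceutical example above.
# The codes, definitions and example document IDs are illustrative placeholders.
topic_criteria = {
    "TESTING": {
        "definition": "Documents discussing clinical or laboratory testing of the drug product.",
        "example_docs": ["DOC-000123", "DOC-004567"],
    },
    "MARKETING": {
        "definition": "Documents discussing promotion, advertising or sale of the drug product.",
        "example_docs": ["DOC-002345"],
    },
    "DEATH_OF_PATIENT": {
        "definition": "Documents referencing the death of a patient who used the drug product.",
        "example_docs": ["DOC-007890"],
    },
}

# The same definitions and examples can be included in the written review criteria
# and referenced during reviewer training.
for code, entry in topic_criteria.items():
    examples = ", ".join(entry["example_docs"])
    print(f"{code}: {entry['definition']} (examples: {examples})")
```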

Have you applied topic codes to a collection in an initial review?  How do you approach it and how well does it work?  Please share any comments you have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Searching: For Defensible Searching, Be a "STARR"


Defensible searching has become a priority in eDiscovery as parties in several cases have experienced significant consequences (including sanctions) for not implementing a defensible search strategy in responding to discovery requests.

Probably the most famous case where search approach has been an issue is Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008), where Judge Paul Grimm noted that the “only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents” and found that privilege on 165 inadvertently produced documents was waived, in part, because of the inadequacy of the search approach.

A defensible search strategy is part using an effective tool (with advanced search capabilities such as “fuzzy”, wildcard, synonym and proximity searching) and part using an effective approach to test and verify search results.

I have an acronym that I use to reflect the defensible search process.  I call it “STARR” – as in “STAR” with an extra “R” or Green Bay Packer football legend Bart Starr (sorry, Bears fans!).  For each search that you need to conduct, here’s how it goes:

  • Search: Construct the best search you can to maximize recall and precision for the desired result.  An effective tool gives you more options for constructing a more effective search, which should help in maximizing recall and precision.  For example, as noted on this blog a few days ago, a proximity search can, under the right circumstances, provide a more precise search result without sacrificing recall.
  • Test: Once you’ve conducted the search, it’s important to test two datasets to determine the effectiveness of the search:
    • Result Set: Test the result set by randomly selecting an appropriate sample percentage of the files and reviewing them to determine their responsiveness to the intent of the search.  The appropriate percentage depends on the size of the result set – the smaller the set, the higher the percentage that should be reviewed.
    • Files Not Retrieved: While testing the result set is important, it is also important to randomly select an appropriate sample percentage of the files that were not retrieved by the search and review those as well, to see whether any responsive files were missed.
  • Analyze: Analyze the results of the random sample testing of both the result set and the files not retrieved to determine how effective the search was in retrieving mostly responsive files and whether any responsive files were missed.
  • Revise: If the search retrieved a low percentage of responsive files and retrieved a high percentage of non-responsive files, then precision of the search may need to be improved.  If the files not retrieved contained any responsive files, then recall of the search may need to be improved.  Evaluate the results and see what, if any, revisions can be made to the search to improve precision and/or recall.
  • Repeat: Once you’ve identified revisions you can make to your search, repeat the process.  Search, Test, Analyze and (if necessary) Revise the search again until the precision and recall of the search is maximized to the extent possible.

While you can’t guarantee that you will retrieve all of the responsive files or eliminate all of the non-responsive ones, a defensible approach to get as close as you can to that goal will minimize the number of files for review, potentially saving considerable costs and making you a “STARR” in the courtroom when defending your search approach.
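
To make the Test and Analyze steps more concrete, here is a minimal sketch in Python of how you might estimate precision from a random sample of the result set and estimate the rate of responsive files among the files not retrieved.  The sample sizes, file IDs and the toy is_responsive rule below are hypothetical stand-ins; in a real review, the responsiveness calls come from human reviewers and the sample sizes come from your sampling plan:

```python
import random

def is_responsive(file_id):
    # Toy stand-in for a human reviewer's responsiveness call on a sampled file.
    # In a real review, this judgment comes from a person, not a function.
    return file_id.endswith("R")

def sample_for_review(files, sample_size):
    # Randomly select files for manual review; the sample size comes from your sampling plan.
    return random.sample(files, min(sample_size, len(files)))

# Toy inputs: file IDs that hit on the search versus those that did not (illustrative only).
retrieved = [f"DOC-{i:05d}{'R' if i % 3 else 'N'}" for i in range(1000)]
not_retrieved = [f"DOC-{i:05d}{'R' if i % 50 == 0 else 'N'}" for i in range(1000, 5000)]

# Test: review a random sample from each set.
retrieved_sample = sample_for_review(retrieved, 200)
missed_sample = sample_for_review(not_retrieved, 200)

# Analyze: estimated precision of the result set, and the rate of responsive files
# among the files not retrieved (sometimes called "elusion").
est_precision = sum(is_responsive(f) for f in retrieved_sample) / len(retrieved_sample)
est_elusion = sum(is_responsive(f) for f in missed_sample) / len(missed_sample)

print(f"Estimated precision of the result set: {est_precision:.1%}")
print(f"Estimated responsive rate among files not retrieved: {est_elusion:.1%}")
# Revise and Repeat: if precision is low or elusion is high, refine the search and re-test.
```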

So, what do you think?  Are you a “STARR” when it comes to defensible searching?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: EDRM Data Set for Great Test Data


In its almost six years of existence, the Electronic Discovery Reference Model (EDRM) Project has implemented a number of mechanisms to standardize the practice of eDiscovery.  Having worked on the EDRM Metrics project for the past four years, I have seen some of those mechanisms implemented firsthand.

One of the most significant recent accomplishments by EDRM is the EDRM Data Set.  Anyone who works with eDiscovery applications and processes understands the importance of being able to test those applications in as many ways as possible using realistic data that will illustrate expected results.  Test data is extremely useful in crafting a defensible discovery approach, by enabling you to determine the expected results within those applications and processes before using them with your organization’s live data.  It can also help you identify potential anomalies (those never occur, right?) up front, so that you can proactively develop an approach to address them before encountering them in your own data.

Using public domain data from Enron Corporation (originating from the Federal Energy Regulatory Commission Enron Investigation), the EDRM Data Set Project provides industry-standard, reference data sets of electronically stored information (ESI) to test those eDiscovery applications and processes.  In 2009, the EDRM Data Set project released its first version of the Enron Data Set, comprised of Enron e-mail messages and attachments within Outlook PST files, organized in 32 zipped files.

This past November, the EDRM Data Set project launched Version 2 of the EDRM Enron Email Data Set.  Straight from the press release announcing the launch, here are some of the improvements in the newest version:

  • Larger Data Set: Contains 1,227,255 emails with 493,384 attachments (included in the emails) covering 151 custodians;
  • Rich Metadata: Includes threading information, tracking IDs, and general Internet headers;
  • Multiple Email Formats: Provision of both full and de-duplicated email in PST, MIME and EDRM XML, which allows organizations to test and compare results across formats.
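
As a simple illustration of working with the MIME distribution mentioned in the last bullet, here is a minimal sketch that parses messages with Python’s standard email library and de-duplicates them on Message-ID.  The local directory path and the one-file-per-message .eml layout are assumptions for the example, not a description of how the data set is actually packaged:

```python
import email
from email import policy
from pathlib import Path

# Hypothetical local path to an unzipped portion of the data set in MIME format.
data_dir = Path("edrm-enron-v2/mime")

seen_ids = set()
unique_messages = []

# Assumes one RFC 822 message per .eml file under the directory (an assumption for
# this example, not a statement about how the data set is packaged).
for eml_path in data_dir.rglob("*.eml"):
    with open(eml_path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=policy.default)
    msg_id = (msg.get("Message-ID") or "").strip()
    if msg_id and msg_id in seen_ids:  # de-duplicate on Message-ID
        continue
    seen_ids.add(msg_id)
    unique_messages.append({
        "message_id": msg_id,
        "from": msg.get("From"),
        "to": msg.get("To"),
        "subject": msg.get("Subject"),
        "date": msg.get("Date"),
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    })

print(f"Parsed {len(unique_messages)} unique messages from {data_dir}")
```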

The Text REtrieval Conference (TREC) Legal Track project provided input for this version of the data set, which, as noted previously on this blog, has used the EDRM data set for its research.  Kudos to John Wang, Project Lead for the EDRM Data Set Project and Product Manager at ZL Technologies, Inc., and the rest of the Data Set team for such an extensive test set collection!

So, what do you think?  Do you use the EDRM Data Set for testing your eDiscovery processes?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

Recently, I assisted a large corporate client on a matter in which several searches were conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, track versions of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches are often lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised where necessary to arrive at a result set that maximized both recall and precision.
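
To make the second step concrete, here is a stripped-down sketch of the idea behind a proximity check (a hypothetical illustration in Python, not the search syntax of FirstPass, Venio or any other particular tool).  It treats two terms as a hit only if they occur within a specified number of words of each other, in either order:

```python
import re

def within_proximity(text, term_a, term_b, max_distance=5):
    """Return True if term_a and term_b occur within max_distance words of each other,
    in either order (a simplified stand-in for a "w/5"-style proximity operator)."""
    words = re.findall(r"[a-z0-9#']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)

# The oil company example from above: the proximity check keeps the likely relevant
# document and drops one where "oil" and "rights" have nothing to do with each other.
likely_relevant = "The lease conveys rights to drill for oil on the property."
false_positive = ("Annual report of the company for fiscal year 2010. Copyright 2010. "
                  "All Rights Reserved. This report summarizes exploration, refining and "
                  "marketing activity, including crude oil production for the year.")

print(within_proximity(likely_relevant, "oil", "rights"))   # True
print(within_proximity(false_positive, "oil", "rights"))    # False
```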

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Sanctions Down in 2010 — at least thru December 1

Recently, this blog cited a Duke Law Journal study that indicated that eDiscovery sanctions were at an all-time high through 2009.  Then, a couple of weeks ago, I saw a story from Williams Mullen recapping the 2010 year in eDiscovery.  It provides a very thorough recap including 2010 trends in sanctions (identifying several cases where sanctions were at issue), advances made during the year in cooperation and proportionality, challenges associated with privacy concerns in foreign jurisdictions and trends in litigation dealing with social media.  It’s a very comprehensive summary of the year in eDiscovery.

One noteworthy finding is that, according to the report, sanctions were sought and awarded in fewer cases in 2010.  Some notable stats from the report:

  • There were 208 eDiscovery opinions in 2009 versus 209 through December 1, 2010;
  • Out of 209 cases with eDiscovery opinions in 2010, sanctions were sought in 79 of them (38%) and awarded in 49 (62% of those cases, and 23% of all eDiscovery cases).
  • Compare that with 2009 when sanctions were sought in 42% of eDiscovery cases and were awarded in 70% of the cases in which they were requested (30% of all eDiscovery cases).
  • While overall requests for sanctions decreased, motions to compel more than doubled in 2010, being filed in 43% of all e-discovery cases, compared to 20% in 2009.
  • Costs and fees were by far the most common sanction, being awarded in 60% of the cases involving sanctions.
  • However, each type of sanction declined: costs and fees (from 33 to 29 total sanctions), adverse inference (13 to 7), terminating (10 to 7), additional discovery (10 to 6) and preclusion (5 to 3).
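
As a quick sanity check of the 2010 percentages cited above, using only the raw numbers reported (209 opinions, sanctions sought in 79 and awarded in 49), the arithmetic works out as follows:

```python
# Quick check of the 2010 figures cited from the report.
opinions_2010 = 209
sanctions_sought = 79
sanctions_awarded = 49

print(f"Sought in {sanctions_sought / opinions_2010:.0%} of eDiscovery cases")         # ~38%
print(f"Awarded in {sanctions_awarded / sanctions_sought:.0%} of cases where sought")   # ~62%
print(f"Awarded in {sanctions_awarded / opinions_2010:.0%} of all eDiscovery cases")    # ~23%
```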

The date of this report was December 17, and the report noted a total of 209 eDiscovery cases as of December 1, 2010.  So, final tallies for the year were not yet tabulated.  It will be interesting to see if the trend in decline of sanctions held true once the entire year is considered.

So, what do you think?  Is this a significant indication that more organizations are getting a handle on their eDiscovery obligations – or just a “blip on the radar”?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: 2011 Predictions — By The Numbers


Comedian Nick Bakay always ends his Tale of the Tape skits, where he compares everything from Married vs. Single to Divas vs. Hot Dogs, with the phrase “It's all so simple when you break things down scientifically.”

The late December/early January time frame is always when various people in eDiscovery make their annual predictions as to what trends to expect in the coming year.  We’ll have some of our own in the next few days (hey, the longer we wait, the more likely we are to be right!).  However, before stating those predictions, I thought we would take a look at other predictions and see if we could spot some common trends among them.  So, I “googled” for 2011 eDiscovery predictions and organized the predictions I found into common themes.  I found serious predictions here, here, here, here and here.  Oh, also here and here.

A couple of quick comments: 1) I had NO IDEA how often predictions are re-posted by other sites, so it took some work to isolate each unique set of predictions.  I even found two sets of predictions from ZL Technologies, one with twelve predictions and another with seven, so I had to pick one set and I chose the one with seven (sorry, eWEEK!).  If I have failed to accurately attribute the original source for a set of predictions, please feel free to comment.  2) This is probably not an exhaustive list of predictions (I have other duties in my “day job”, so I couldn’t search forever), so I apologize if I’ve left anybody’s published predictions out.  Again, feel free to comment if you’re aware of other predictions.

Here are some of the common themes:

  • Cloud and SaaS Computing: Six out of seven “prognosticators” indicated that adoption of Software as a Service (SaaS) “cloud” solutions will continue to increase, which will become increasingly relevant in eDiscovery.  No surprise here, given last year’s IDC forecast for SaaS growth and many articles addressing the subject, including a few posts right here on this blog.
  • Collaboration/Integration: Six out of seven “augurs” also had predictions related to various themes associated with collaboration (more collaboration tools, greater legal/IT coordination, etc.) and integration (greater focus by software vendors on data exchange with other systems, etc.).  Two people specifically noted an expectation of greater eDiscovery integration within organization governance, risk management and compliance (GRC) processes.
  • In-House Discovery: Five “pundits” forecasted eDiscovery functions and software will continue to be brought in-house, especially on the “left-side of the EDRM model” (Information Management).
  • Diverse Data Sources: Three “soothsayers” presaged that sources of data will continue to be more diverse, which shouldn’t be a surprise to anyone, given the popularity of gadgets and the rise of social media.
  • Social Media: Speaking of social media, three “prophets” (yes, I’ve been consulting my thesaurus!) expect social media to continue to be a big area to be addressed for eDiscovery.
  • End to End Discovery: Three “psychics” also predicted that there will continue to be more single-source end-to-end eDiscovery offerings in the marketplace.

The “others receiving votes” category (two predicting each of these) included maturing and acceptance of automated review (including predictive coding), early case assessment moving toward the Information Management stage, consolidation within the eDiscovery industry, more focus on proportionality, maturing of global eDiscovery and predictive/disruptive pricing.

Predictive/disruptive pricing (via the respective blogs of Kriss Wilson of Superior Document Services and Charles Skamser of eDiscovery Solutions Group) is a particularly intriguing prediction to me because data volumes continue to grow at an astronomical rate, and greater volumes lead to greater costs.  Creativity will be key in how companies deal with the larger volumes effectively, and pressures will become greater for providers (even, dare I say, review attorneys) to price their services more creatively.

Another interesting prediction (via ZL Technologies) is that “Discovery of Databases and other Structured Data will Increase”, which is something I’ve expected to see for some time.  I hope this is finally the year for that.

Finally, I said that I found serious predictions and analyzed them; however, there are a couple of not-so-serious sets of predictions here and here.  My favorite prediction is from The Posse List, as follows: “LegalTech…renames itself “EDiscoveryTech” after Law.com survey reveals that of the 422 vendors present, 419 do e-discovery, and the other 3 are Hyundai HotWheels, Speedway Racers and Convert-A-Van who thought they were at the Javits Auto Show.”

So, what do you think?  Care to offer your own “hunches” from your crystal ball?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Project Management: Effectively Manage your Time


Of all the project management techniques and activities we’ve discussed in the past weeks in this blog series, this is the one that gives many people the most trouble.   There is no set of rules I can list that’s going to work well for everyone.  I can, however, give you some tips to consider that may improve your time management skills: 

  • Be organized.  Use tools like calendars, to-do lists, email alarms and project management software to keep on top of all of the balls you need to keep in the air.
  • When possible, follow a routine and work from a plan.  Start each day with your door closed for 15 minutes to plan your day.  Set reasonable goals for the day and include time to respond to emails, return phone calls, review status reports, and to deal with the inevitable, unexpected situations that arise.
  • Delegate whenever you can.  For everything you have to do, determine whether it can be delegated, to whom, and whether that person can take it on.  If you delegate a task, define it well, give clear instructions, get agreement, make due dates clear, and define authority levels (let the person to whom you are delegating know what they can make decisions on and what they need to come to you with).
  • Keep track of what you are doing.  I always maintain a project diary where I document my activity.
  • Effectively facilitate meetings.  Don’t let meetings for which you are responsible run over the scheduled time.  Prepare an agenda and distribute it.  Start the meeting on time.  Up front, state the purpose of the meeting and describe the goals.  Don’t let the discussion get off track.
  • Use standard materials and templates, such as project planning meeting agendas and reports, questionnaires to collect case information, technology surveys, requests for proposals, and status reports.

Managing your time effectively is critical, and it will set a good example for your staff.  When I feel overwhelmed, I find that stepping back, prioritizing tasks and adjusting my to-do list helps.  Always keep the big picture in mind when you are caught up in chaos, and don’t sweat the small stuff.

What do you think?  Do you have good tips for managing your time?  Please share any comments you might have or tell us if you’d like to know more about a topic.

eDiscovery Trends: Social Media in Litigation

Yesterday, we introduced the Virtual LegalTech online educational session Facing the Legal Dangers of Social Media and discussed what factors a social media governance policy should address.  To get background information regarding the session, including information about the speakers (Harry Valetk, Daniel Goldman and Michael Lackey), click here.

The session also addressed social media in litigation, discussing several considerations about social media, including whether it’s discoverable, how it’s being used in litigation, how to request it, how to preserve it, and how to produce it.  Between wall postings, status updates, personal photos, etc., there’s a lot of content out there and it’s just as discoverable as any other source of ESI – depending on its relevance to the case and the burden to collect, review and produce it.  Privacy settings may also be a factor in the discoverability of this information, as at least one case, Crispin v. Christian Audigier, Inc. (C.D. Cal. May 26, 2010), held that private email messaging on Facebook, MySpace and Media Temple was protected as private.

So, how is social media content being used in litigation?  Here are some examples:

  • Show Physical Health: A person claiming to be sick or injured at work who has photos on their Facebook profile showing them participating in strenuous recreation activities;
  • Discrimination and Harassment: Statements made online that can be considered discriminatory or harassing, or evidence that the person “likes” certain groups with “hate” agendas;
  • False Product Claims: Statements online about a product that are not true or verifiable;
  • Verify or Refute Alibis: Social media content (photos, location tracking, etc.) can verify or refute alibis provided by suspects in criminal cases;
  • Pre-Sentencing Reports: Social media content can support or refute claims of remorse – in one case, the convicted defendant was sentenced more harshly because of statements made online that refuted his statements of remorse in the courtroom;
  • Info Gathering: With so much information available online, you can gather information about opposing parties, witnesses, attorneys, judges, or even jurors.  In some cases, attorneys have paid firms to ensure that positive information will bubble to the top when jurors “Google” those attorneys.  And, in Ohio, at least, judges may not only have Facebook friends, but those friends can include attorneys appearing before them (interesting…).

If possible, request the social media content directly from your opponent, as the third-party provider will probably fight having to provide the content, usually citing the Stored Communications Act.  As noted previously on this blog, Facebook and Twitter have guidelines for requesting data – through subpoena and law enforcement agencies.

Social media content is generally stored by third-party Software as a Service (SaaS) providers (Facebook and Twitter are examples of SaaS providers), so it’s important to proactively address several key eDiscovery issues so that you are prepared to preserve and produce the data for litigation purposes, just as you would with any SaaS provider.

So, what do you think?  Has your organization been involved in litigation where social media content was requested?  Please share any comments you might have or if you’d like to know more about a particular topic.