
eDiscovery Rewind: Eleven for 11-11-11

 

Since today is one of only 12 days this century on which the month, day and year are the same two-digit number (not to mention the biggest day for “craps” players to hit Las Vegas since July 7, 2007!), it seems an appropriate time to look back at some of our recent topics.  So, in case you missed them, here are eleven recent posts covering topics that will hopefully make eDiscovery less of a “gamble” for you!

eDiscovery Best Practices: Testing Your Search Using Sampling: On April 1, we talked about how to determine an appropriate sample size to test your search results as well as the items NOT retrieved by the search, using a site that provides a sample size calculator. On April 4, we talked about how to make sure the sample set is randomly selected. In this post, we’ll walk through an example of how you can test and refine a search using sampling.
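For the curious, the arithmetic behind a typical sample size calculator can be sketched in a few lines. This is a generic illustration of the standard formula for estimating a proportion, with a finite-population correction, and not the specific calculator discussed in the earlier post; the function name and numbers are made up for the example:

```python
import math

def sample_size(population, confidence_z=1.96, margin=0.05, p=0.5):
    """Estimate how many items to sample from a review population.

    Uses the standard formula for a proportion (z^2 * p * (1-p) / e^2)
    with a finite-population correction. confidence_z=1.96 corresponds
    to a 95% confidence level; margin is the acceptable error rate;
    p=0.5 is the most conservative assumption about the true proportion.
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)  # finite-population correction
    return math.ceil(n)

# Items to sample from a 100,000-document population at 95% confidence,
# with a +/-5% margin of error
print(sample_size(100000))  # 383
```

Notice that at a 95% confidence level and a 5% margin of error, the required sample stays in the few hundreds even as the population grows, which is a big part of why sampling is such an economical way to test searches.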

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think: Here’s a sample scenario: You identify custodians relevant to the case and collect files from each. Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” are collected in total from the custodians. You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel. After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!! What happened?!?

eDiscovery Trends: Why Predictive Coding is a Hot Topic: Last month, we considered a recent article by Judge Andrew Peck, United States magistrate judge for the Southern District of New York, about the use of predictive coding in litigation. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

eDiscovery Best Practices: Does Anybody Really Know What Time It Is?: Does anybody really know what time it is? Does anybody really care? OK, it’s an old song by Chicago (back then, they were known as the Chicago Transit Authority). But, the question of what time it really is has a significant effect on how eDiscovery is handled.
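To see why the time zone question matters, consider that a single email has one instant of transmission but a different local date depending on the zone used to display it. A quick sketch, using a hypothetical date and offset:

```python
from datetime import datetime, timezone, timedelta

# An email sent at 02:30 UTC on January 2 ...
sent_utc = datetime(2011, 1, 2, 2, 30, tzinfo=timezone.utc)

# ... was still January 1 for a custodian in US Central time (UTC-6)
central = timezone(timedelta(hours=-6))
print(sent_utc.astimezone(central))  # 2011-01-01 20:30:00-06:00
```

A date-range filter applied during collection or review could include or exclude that message depending on which time zone the processing tool normalizes to, which is exactly the kind of inconsistency the post explores.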

eDiscovery Best Practices: Message Thread Review Saves Costs and Improves Consistency: Insanity is doing the same thing over and over again and expecting a different result. But, in ESI review, it can be even worse when you get a different result. Most email messages are part of a larger discussion, which could be just between two parties, or include a number of parties in the discussion. To review each email in the discussion thread would result in much of the same information being reviewed over and over again. Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.

eDiscovery Best Practices: When Collecting, Image is Not Always Everything: There was a commercial in the early 1990s for Canon cameras in which tennis player Andre Agassi uttered the quote that would haunt him for most of his early career – “Image is everything.” When it comes to eDiscovery preservation and collection, there are times when “Image is everything”, as in a forensic “image” of the media is necessary to preserve all potentially responsive ESI. However, forensic imaging of media is usually not necessary for discovery purposes.

eDiscovery Trends: If You Use Auto-Delete, Know When to Turn It Off: Federal Rule of Civil Procedure 37(f), adopted in 2006, is known as the “safe harbor” rule. While it’s not always clear to what extent “safe harbor” protection extends, one case from a few years ago, Disability Rights Council of Greater Washington v. Washington Metrop. Trans. Auth., D.D.C. June 2007, seemed to indicate where it does NOT extend – auto-deletion of emails.

eDiscovery Best Practices: Checking for Malware is the First Step to eDiscovery Processing: A little over a month ago, I noted that we hadn’t missed a (business) day yet in publishing a post for the blog. That streak almost came to an end back in May. As I often do in the early mornings before getting ready for work, I spent some time searching for articles to read and identifying potential blog topics and found a link on a site related to “New Federal Rules”. Curious, I clicked on it and…up popped a pop-up window from our virus checking software (AVG Anti-Virus, or so I thought) that the site had found a file containing a “trojan horse” program. The odd thing about the pop-up window is that there was no “Fix” button to fix the trojan horse. So, I chose the best available option to move it to the vault. Then, all hell broke loose.

eDiscovery Trends: An Insufficient Password Will Thwart Even The Most Secure Site: Several months ago, we talked about how most litigators have come to accept that Software-as-a-Service (SaaS) systems are secure. However, according to a recent study by the Ponemon Institute, the chance of any business being hacked in the next 12 months is a “statistical certainty”. No matter how secure a system is, whether it’s local to your office or stored in the “cloud”, an insufficient password that can be easily guessed can allow hackers to get in and steal your data.

eDiscovery Trends: Social Media Lessons Learned Through Football: The NFL Football season began back in September with the kick-off game pitting the last two Super Bowl winners – the New Orleans Saints and the Green Bay Packers – against each other to start the season. An incident associated with my team – the Houston Texans – recently illustrated the issues associated with employees’ use of social media sites, which are being faced by every organization these days and can have eDiscovery impact as social media content has been ruled discoverable in many cases across the country.

eDiscovery Strategy: "Command" Model of eDiscovery Must Make Way for Collaboration: In her article "E-Discovery 'Command' Culture Must Collapse" (via Law Technology News), Monica Bay discusses the old “command” style of eDiscovery, with a senior partner leading his “troops” like General George Patton – a model that summit speakers agree is "doomed to failure" – and reports on the findings put forward by judges and litigators that the time has come for true collaboration.

So, what do you think?  Did you learn something from one of these topics?  If so, which one?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscoveryDaily would like to thank all veterans and the men and women serving in our armed forces for the sacrifices you make for our country.  Thanks to all of you and your families and have a happy and safe Veterans Day!

eDiscovery Best Practices: Cluster Documents for More Effective Review

 

With document review estimated to account for up to 80% of the total cost of the eDiscovery process and the amount of data in the world growing at an exponential rate, it’s no wonder that many firms are turning to technology to make the review process more efficient.  Whether using sophisticated searching capabilities of early case assessment (ECA) tools such as FirstPass®, powered by Venio FPR™, to filter collections more effectively or predictive coding techniques (as discussed in these two recent blog posts) to make the coding process more efficient, technology is playing an important role in saving review costs.  And, of course, review tools that manage the review process (like OnDemand®) make review more efficient simply by delivering documents efficiently and tracking review progress.

How the documents are organized for review can also make a big difference in the efficiency of review, not only saving costs, but also improving accuracy by assigning similar documents to the same reviewer.  This process of organizing documents with similar content into “clusters” (also known as “concepts”) helps each reviewer make quicker review decisions (if a single reviewer looks at one document to determine responsiveness and the next few documents are duplicates or mere variations of that first document, he or she can quickly “tag” most of those variations in the same manner or identify the duplicates).  It also promotes consistency by enabling the same reviewer to review all similar documents in a cluster (for example, you don’t get one reviewer marking a document as privileged while another reviewer fails to mark a copy of that same document as such, leading to inconsistencies and potential inadvertent disclosures).  Reviewers are human and do make mistakes.

Clustering software such as Hot Neuron’s Clustify™ examines the text in your documents, determines which documents are related to each other, and groups them into clusters.  Clustering organizes the documents according to the structure that arises naturally, without preconceptions or query terms.  It labels each cluster with a set of keywords, providing a quick overview of the cluster.  It also identifies a “representative document” that can be used as a proxy for the cluster.
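Clustify’s actual algorithms are proprietary, but the underlying idea of grouping documents by textual similarity to a representative document can be sketched in a few lines. This is a hypothetical, simplified illustration (bag-of-words cosine similarity with a greedy, single-pass grouping), not Clustify’s method; the sample documents and the 0.5 threshold are made up for the example:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering: each document joins the first cluster
    whose representative (first) document it resembles closely enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of (index, word-count vector)
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        for c in clusters:
            if cosine(vec, c[0][1]) >= threshold:
                c.append((i, vec))
                break
        else:
            clusters.append([(i, vec)])
    return [[i for i, _ in c] for c in clusters]

docs = [
    "draft contract for widget supply agreement",
    "revised draft contract for widget supply agreement v2",
    "lunch menu for friday",
]
print(cluster(docs))  # the two contract drafts group together: [[0, 1], [2]]
```

A production tool would use far more sophisticated text analysis, but even this toy version shows how structure "arises naturally" from the documents themselves, with no query terms supplied.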

Examples of document types that can be organized into clusters:

  • Email Message Threads: Each message in the thread contains the conversation up to that point, so the ability to group those messages into a cluster enables the reviewer to quickly identify the email(s) containing the entire conversation, categorize those and possibly dismiss the rest as duplicative (if so instructed).
  • Document Versions: As “drafts” of documents are prepared, the content of each draft is similar to the previous version, so a review decision made on one version could be quickly applied to the rest of the versions.
  • Routine Reports: Sometimes, periodic reports are generated that may or may not be responsive – grouping those reports together in a cluster can enable a single reviewer to make that determination and quickly apply it to all documents in the cluster.
  • Published Documents: Have you ever published a file to Adobe PDF format?  Many of you have.  What you end up with is an exact copy of the original file (from Word, Excel or other application) in content, but different in format – hence, these documents won’t be identified as “dupes” based on a HASH value.  Clustering puts those documents together in a group so that the dupes can still be quickly identified and addressed.
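The HASH point in that last bullet is easy to demonstrate: a hash value such as MD5 fingerprints a file’s exact bytes, so the same words wrapped in a different file format produce a completely different hash. A simplified sketch (the byte strings here are made-up stand-ins, not real application or PDF output):

```python
import hashlib

# The same sentence "published" in two formats: identical content to a
# human reader, but different bytes on disk, so the hashes don't match.
as_text = b"Quarterly results were strong."
as_pdf_like = b"%PDF-1.4 (stand-in wrapper) Quarterly results were strong."

print(hashlib.md5(as_text).hexdigest())
print(hashlib.md5(as_pdf_like).hexdigest())
print(hashlib.md5(as_text).hexdigest() ==
      hashlib.md5(as_pdf_like).hexdigest())  # False
```

Hash-based deduplication can only catch byte-for-byte copies; clustering catches these "content dupes" that hashing misses.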

Within the parameters of a review tool which manages the review process and delivers documents quickly and effectively for review, organizing documents into clusters can speed decision making during review, saving considerable time and review costs.

So, what do you think?  Have you used software to organize documents into clusters or concepts for more effective review?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: I work for CloudNine Discovery, which provides SaaS-based eDiscovery review applications FirstPass® (for early case assessment) and OnDemand® (for linear review and production).  CloudNine Discovery has an alliance with Hot Neuron and uses Clustify™ software to provide conceptual clustering and near-duplicate identification services for its clients.

eDiscovery Trends: Why Predictive Coding is a Hot Topic

 

Yesterday, we considered a recent article by Judge Andrew Peck, United States magistrate judge for the Southern District of New York, about the use of predictive coding in litigation. The piece has prompted a lot of discussion in the profession. While most of the analysis centered on how much lawyers can rely on predictive coding technology in litigation, there were some deeper musings as well.

We all know the reasons why predictive coding is considered such a panacea, but it is easy to forget why it is needed and why the legal industry is still grappling with eDiscovery issues after so many years. Jason Baron, Director of Litigation at the U.S. National Archives and Records Administration, recently won the 2011 Emmett Leahy Award for excellence in records and information management. He took the opportunity to step back and consider why exactly the problem won’t go away. He believes that technology can help solve our problems, if applied intelligently. “We lawyer types remain stuck in a paradigm that too often relies on people and not automated technologies,” he said.

But he also warns that electronically stored data may soon overwhelm the profession. By now, readers of this blog are familiar with the dire and mind-boggling predictions about the volume of discoverable electronic data being created every day. Litigators are obviously concerned that new types of information and growing volumes of data will swamp the courts, but the problem could affect all aspects of modern life. “At the start of the second decade of the 21st century, we need to recognize that the time is now to prevent what I have termed the coming digital dark ages,” Baron said. “The ongoing and exponentially increasing explosion of information means that over the next several decades the world will be seeing records and information growth orders of magnitude greater than anything seen by humankind to date. We all need better ways to search through this information.”

As one of the leaders of the TREC Legal Track, a research experiment into searching large volumes of data more effectively, Baron has an intimate understanding of the challenges ahead, and he has serious concerns. “The paradox of our age is information overload followed by future inability to access anything of importance. We cannot let that future happen,” he said, talking to a roomful of records management experts and litigators. “We all need to be smarter in preventing this future dystopia.”

eDiscovery blogger Ralph Losey linked to both Judge Peck’s article and Jason’s speech, and expanded on those thoughts. Losey prefers to believe, as he wrote in a post called The Dawn of a Golden Age of Justice, that lawyers will not only survive, but thrive despite the explosion in information. “We must fight fire with fire by harnessing the new (Artificial Intelligence) capacities of computers,” he says. “If we boost our own intelligence and abilities with algorithmic agents we will be able to find the evidence we need in the trillions of documents implicated by even average disputes.”

So, what do you think? Will Artificial Intelligence in the hands of truth-seeking lawyers save us from information overload, or has the glut of electronic information already swamped the world? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: A Green Light for Predictive Coding?

 

There are a handful of judges whose pronouncements on anything eDiscovery-related are bound to get legal technologists talking. Judge Andrew Peck, United States magistrate judge for the Southern District of New York is one of them. His recent article, Search, Forward, published in Law Technology News, is one of the few judicial pronouncements on the use of predictive coding and has sparked a lively debate.

To date there is no reported case tackling the use of advanced computer-assisted search technology (“predictive coding” in the current vernacular) despite growing hype. Many litigators are hoping that judges will soon weigh in and give the profession some real guidance on the use of predictive coding in litigation. Peck says it will likely be a long time before a definitive statement comes from the bench, but in the meantime his article provides perhaps the best insight into at least one judge’s thinking.

Judge Peck is probably best known in eDiscovery circles for the March 19, 2009 decision, William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 136 (S.D.N.Y. 2009) (Peck, M.J.). In it, he called for "careful thought, quality control, testing and cooperation with opposing counsel in designing search terms or 'keywords' to be used to produce emails or other electronically stored information".

Peck notes that lawyers are not eager to take the results of computer review before a judge and face possible rejection. However, he says those fears are misplaced: admissibility is defined by the content of a document, not by how it was found. Peck also relies heavily on research we have discussed on this blog, including the TREC Legal Track, to argue that advanced search technology can provide defensible search methods.

While he stops short of green lighting the use of such technology, he does encourage lawyers in this direction. “Until there is a judicial opinion approving (or even critiquing) the use of predictive coding, counsel will just have to rely on this article as a sign of judicial approval,” he writes. “In my opinion, computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases in our e-discovery world.”

Silicon Valley consultant Mark Michels agrees with Peck’s article, writing in Law Technology News that “the key to (predictive coding’s) defensibility is upfront preparation to ensure that the applied tools and techniques are subject to thoughtful quality control during the review process.”

But other commenters are quick to point out the limitations of predictive coding. Ralph Losey expands on Peck’s argument, describing specific and defensible deployment of predictive coding (or Artificial Intelligence, as Losey terms it). He says predictive coding can speed up the process, but that the failure rate is still too high. “The state of technology and law today still requires eyeballs on all ESI before it goes out the door and into the hands of the enemy,” he writes. “The negative consequences of disclosure of secrets, especially attorney-client privilege and work product privilege secrets, is simply too high.”

Judge Peck’s article is just one sign that thoughtful, technology-assisted review can be defensibly deployed in litigation. Tomorrow, we will review some darker musings on the likelihood that predictive coding will save eDiscovery from the exploding universe of discoverable data.

So, what do you think? Is predictive coding ready for prime time?  Can lawyers confidently take results from new search technology before a judge without fear of rejection? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Searching: A Great Example of Why Search Results Need to Be Tested

 

In my efforts to stay abreast of current developments in eDiscovery (and also to identify great blog post ideas!), I subscribe to and read a number of different sources for information.  That includes some of the “web crawling” services that identify articles, press releases and other publications such as the Pinhawk Law Technology Daily Digest, which is one of my favorite resources and always has interesting stories to read.  I also have a Google Alert set up to deliver stories on “e-Discovery” via a daily email.

So, I got a chuckle out of one of the stories that both sources (and probably others, as well) highlighted last week:

A+E, Discovery get ready to roll out

The story is about two of the biggest players in global TV, A+E Networks and Discovery Networks, rolling out their channels into India and Latin America, respectively.  The article proceeds to discuss the challenges of rolling out these channels into markets with various requirements and several languages and dialects included in those markets.

This story has nothing to do with eDiscovery.

Why did it wind up in the list of eDiscovery stories returned by these two services?  Because the story title “A+E, Discovery get ready to roll out” retrieved a hit on “e-Discovery”.  Many search engines ignore punctuation when searching, so a search for “e-Discovery” effectively becomes a search for the phrase “e Discovery” (keep in mind that searches are also usually case insensitive).  So, a document titled “A+E, Discovery get ready to roll out” can be viewed by a search engine as “a e discovery get ready to roll out”, causing the document to be considered a “hit” for “e discovery”.
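A rough sketch of the tokenization described above shows exactly how the false hit happens. The tokenizer here is a simplified stand-in for whatever the actual search engines use; it just lowercases and treats every run of punctuation or whitespace as a word break:

```python
import re

def tokenize(text):
    """Lowercase and treat punctuation as whitespace, as many search engines do."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()

headline = "A+E, Discovery get ready to roll out"
query = "e-Discovery"

tokens = tokenize(headline)  # ['a', 'e', 'discovery', 'get', ...]
phrase = tokenize(query)     # ['e', 'discovery']

# Naive phrase match: does the query token sequence occur in the headline?
hit = any(tokens[i:i + len(phrase)] == phrase
          for i in range(len(tokens) - len(phrase) + 1))
print(hit)  # True
```

The “A+E” in the headline supplies the “e” token, the next word supplies “discovery”, and the phrase match succeeds even though the story has nothing to do with eDiscovery.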

This is just one example of why searches can retrieve unexpected results.  And, it’s why a defensible search process (such as the “STARR” approach outlined here) that involves testing and refining searches is vital to maximizing your recall and precision.

BTW, this can happen to any search engine, so it’s not a reflection on either Pinhawk or Google.  Both are excellent resources that can occasionally retrieve non-relevant results, just like any other “web crawling” service.

So, what do you think?  Did you see this story crop up in the eDiscovery listings?  Have you encountered similar examples of search anomalies?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: Court Says Lack of eDiscovery Rules for Criminal Cases is a Crime

A New York district court recently ordered the United States Government to reproduce thousands of pages of electronic discovery materials in a criminal case involving the distribution of cocaine.

In United States v. Briggs, No. 10CR184S, 2011 WL 4017886 (W.D.N.Y. Sept. 8, 2011), the Government produced thousands of pages of electronic documents and a number of audio recordings, none of which were text searchable. The court ultimately decided that the onus of producing searchable materials for eDiscovery fell on the Government itself.

  • Defendants requested that the Government reproduce the discovery materials in a searchable format, but the Government refused, stating that it had used a program “routinely used” in criminal cases and would not bear the storage burden or cost of reproducing the documents.
  • The defense argued that the volume of production was virtually impossible to navigate without the ability to sort or search the documents, and that the materials presented for discovery lacked some relevant information. The court later made the comparison that a paper equivalent to this discovery situation “would be if the Government took photographs of thousands of pages… put them in boxes, and invited inspection by defense counsel.”
  • In light of the absence of a rule or standard for discovery of electronic materials in criminal cases such as this one, the court referred to other criminal cases in which the same issues were discussed, including United States v. Warshak, 631 F.3d 266 (6th Cir. 2010) and United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008). Both of these cases dealt at some point with similar debates over document format and extensive discovery production, with different findings of whether the producing party was required to produce in the requested format.
  • The court decided that, in light of the absence of a clear standard, the Government was the party “better able to bear the burden of organizing these records for over twenty defendants in a manner useful to all” and ordered the Government to produce the files in searchable PDF or native format.
  • Finally, the court expressed its hope that the Advisory Committee on Criminal Rules would soon establish rules addressing the production of ESI in criminal cases.

So, what do you think? Was the court fair to put the onus of searchable text production on the Government? Should there be similar rules governing eDiscovery issues in the Federal Rules of Criminal Procedure as there are in the Federal Rules of Civil Procedure? Please share any comments you might have or if you’d like to know more about a particular topic.

Our First Birthday! eDiscovery Daily is One Year Old Today!

 

Break out the birthday cake and the noisemakers!  eDiscovery Daily is now a year old!  One year ago today, we launched this blog with the ambitious goal of providing eDiscovery news and analysis every business day.  And, we haven’t missed a day yet!  Knock on wood!

Since we last reported, during our “sixmonthiversary”, we’ve almost doubled viewership (again!) since those first six months, and have increased our subscriber base over 2 1/2 times over that span!  Clearly, there is no shortage of topics to write about regarding eDiscovery and we appreciate your continued interest and support!

We also want to thank the blogs and publications that have linked to our posts and raised our public awareness, including Pinhawk, Litigation Support Blog.com, The Electronic Discovery Reading Room, Litigation Support Technology & News, eDiscovery News, InfoGovernance Engagement Area, Ride the Lightning, ABA Journal, ABC's of E-Discovery, Above the Law, EDD: Issues, Law, and Solutions, Law.com and any other publication that has picked up at least one of our posts for reference (sorry if I missed any!).

Finally, a quick “thanks” to all who contributed to the blog in the past year, including Jane Gennarelli, Jason Krause and Brad Jenkins (my boss, got to thank him, right?), as well as Melissa Rudy for assisting with several of the posts.

For those of you who have not been following eDiscovery Daily all year (which is most of you), here are some topics and posts you may have missed.  Feel free to check them out!

Case Law:

eDiscovery Daily has published 50 posts related to eDiscovery case decisions and activities over the past year!  Victor Stanley v. Creative Pipe, commonly referred to simply as the “Victor Stanley” case, was followed throughout the year, including our very first post, as well as here, here and here.  More recently, the eDiscovery malpractice case involving McDermott Will & Emery has captured considerable interest, with recent posts here, here and yesterday’s post here.

Also among the case law posts is Crispin v. Christian Audigier Inc., which seems to reflect growing interest in discoverability of social media data, as this post was the most viewed post of the year on our blog!

Project Management:

Project management in eDiscovery is a popular topic and Jane Gennarelli provided a couple of series of posts to address best practices in this very important area.  The eDiscovery Project Management series was published over the October, November and December months of 2010, while the Managing an eDiscovery Contract Review Team series ran over January, February and into March.

Thought Leaders:

eDiscovery Daily was able to sit down with numerous industry thought leaders, including George Socha, Craig Ball, Tom O’Connor, Tom Gelbmann, Jack Halprin, Deidre Paknad, Jeffrey Brandt, Alon Israely, Jim McGann and Christine Musil to get their “takes” on the state of the industry and where it’s headed.  Thanks to all of those individuals who agreed to speak with us this past year!  We will continue to bring you more perspectives throughout the industry in the coming year.

Search Best Practices:

There were several posts on search best practices, including don’t get “wild” with wildcards, these posts on how to look for misspellings, a case study for using term lists, these posts on handling exception files and this post on the benefits of proximity searching.  We also talked about the “STARR” approach for defensible searching and published this three part series on best practices for sampling and revising searches.

Cloud Computing:

As cloud computing has become a major organizational driving force (overall and as part of eDiscovery), we have addressed several topics related to it, including the importance to be able to load your own data, benefits of software-as-a-service (SaaS) solutions for eDiscovery, the truth about security of SaaS and cloud-based systems, the Forrester and Gartner forecasts for tremendous growth in cloud computing, and even Craig Ball’s thoughts on the benefits of cloud computing for eDiscovery.

And many more posts over the past year on various other topics that are too numerous to mention…

Finally, it’s important to mention that we have yet to archive any old posts, so every post we have ever published is still currently available on this site! (I can see the Information Governance buffs cringing at that statement!)  I believe that we are in the process of building an impressive knowledge base of information spanning all sorts of eDiscovery topics as well as the entire EDRM life cycle.  If there’s an eDiscovery topic you wish to research, chances are that it’s been discussed here at some point.  So, feel free to make eDiscovery Daily one of your first stops for your eDiscovery information needs!

So, what do you think? Do you have any topics that you would like to see covered in more depth? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Trends: Lawyers Versus Machines – Who’s “Winning”?

 

As discussed on this blog, mainstream publications including The New York Times and Forbes have noticed the rise of search technology in discovery, particularly predictive coding. The New York Times article, Armies of Expensive Lawyers, Replaced by Cheaper Software, inspired a lot of consternation in the legal community by proposing that technology was replacing human lawyers. Among the first to reply, Ralph Losey wrote a blog post New York Times Discovers eDiscovery, But Gets the Jobs Report Wrong, arguing that “the supposed job-chilling impact of these new technologies on the legal profession was off the mark. In fact, the contrary is true.”

However, the Times article does point to a real trend – clients demanding that their outside counsel and litigation support teams use technology to work more efficiently. “Just because the ‘paper of record’ says something doesn’t make it so, of course. But it does mean that every GC and Litigation DGC/AGC in America (and likely Canada) now has this trend on their radar,” litigation project management guru Steven Levy wrote on the blog Lexican.

The obvious problem with the New York Times article is that search and review is an iterative process and demands human intervention to make the machines involved function properly.  However, the missing piece of the discussion today is exactly what the relation between human reviewers and computers should be. There is a nascent movement to investigate this topic, finding the line where machine-led review ends and where human intervention is necessary.

Recent research by some of the leaders of the TREC Legal Track research project has begun to explore the interaction between human and machine review. Maura Grossman, a litigator with Wachtell, Lipton, Rosen & Katz and one of the TREC coordinators, and Gordon Cormack, a computer scientist and fellow TREC-er, wrote the research paper Technology Assisted Review in eDiscovery Can be More Effective and Efficient Than Manual Review. As the title indicates, human review cannot match the accuracy of technology-assisted review. However, the paper points out the need for a roadmap detailing the ideal interaction between human lawyers and machine review in litigation. “A technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege.”

What may be endangered is the existing review process, as it has traditionally been practiced, not human attorneys. Bennett Borden, an attorney with Williams Mullen, argues that linear review processes cannot produce the same results as the skillful use of technology. He has some interesting asides about the ways lawyers can do things computer searches cannot. For example, human reviewers are able to intuitively “come upon a scent” of relevant documents that machines missed. He says that reviewers are not only able to effectively pursue information by following leads initiated by a computer, but they actually enjoyed the process more than straight-ahead manual review.

Clearly, more research is needed in this area, but if lawyers are going to defend their role in litigation, defining exactly where they fit into the discovery process is an important question. What do you think?  Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Standards: How Does an Industry Get Them?

 

As discussed yesterday, there is a nascent, but growing, movement pushing for industry standards in eDiscovery. Many litigators may chafe at that, fearing that standards and industry benchmarks impose checklists or management processes that tell them how to do their job. But industry standards, when implemented well, not only establish a common standard of care, but also provide a point of comparison that can help drive buying decisions.

It’s probably understandable that many of the calls for standards today focus on the search process. Judge Shira Scheindlin wrote in Pension Committee of the University of Montreal Pension Plan v. Banc of America Securities, LLC that a party’s “failure to assess the accuracy and validity of selected search terms” was tantamount to negligence.  As mentioned yesterday, the Text REtrieval Conference (TREC) Legal Track has been benchmarking different search strategies, even finding ways to optimize the search process. The ultimate goal is to provide baseline standards and guidelines that allow parties to determine whether they are searching electronically stored information successfully in litigation.

Within these technical discussions, an emerging thread is a call for ethical standards and codes of conduct. Jason Baron, the National Archives’ Director of Litigation and one of the coordinators of the TREC Legal Track, organized the SIRE workshop that concluded last week, which focused on information retrieval issues in large data sets. Yet even Baron, who has long worked on optimizing search technology, recognizes the need for standards of care and ethics in eDiscovery to manage the human element. In a paper released earlier this year, he noted, “While there are no reported cases discussing the matter of ‘keyword search ethics,’ it is only a matter of time before courts are faced with deciding difficult issues regarding the duty of responding parties and their counsel to make adequate disclosures.”

The leading provider of industry standards is the Electronic Discovery Reference Model (EDRM), which has a number of projects and efforts underway to create common frameworks and standards for managing eDiscovery. Many of the EDRM’s ongoing projects are aimed at creating frameworks rather than standards. In addition to the EDRM Framework familiar to many eDiscovery professionals, the group has launched the EDRM Model Code of Conduct Project to issue aspirational eDiscovery ethics guidelines and is working on a model Search Project.

But the biggest piece of the discussion is how to create benchmarks and standards for repeatable, defensible, and consistent business processes across the entire eDiscovery lifecycle. There are currently no quality standards specific to eDiscovery, but there are several models that could be adapted. For example, the ISO 9000 family of quality management standards defines industry-specific quality requirements and could be tailored to eDiscovery. The Capability Maturity Model Integration (CMMI) in software engineering follows a similar model but, unlike ISO, does not require annual updates for certification.

This is still a nascent movement, characterized more by workshops and panel discussions than by actual standards efforts. Recent events include:

  • EDRM 2011-2012 Kickoff Meeting, St. Paul, MN, May 11-12
  • ICAIL 2011 DESI IV Workshop, Pittsburgh, PA, June 6
  • SIRE workshop at the Special Interest Group on Information Retrieval (SIGIR) 2011 conference, July 28
  • TREC Legal Track, Gaithersburg, MD, November

There seems to be a growing consensus that industry standards are not just useful, but likely necessary in eDiscovery. The Sedona Commentary on Achieving Quality in eDiscovery Principle 3 says, “Implementing a well thought out e-discovery process should seek to enhance the overall quality of the production in the form of: (a) reducing the time from request to response; (b) reducing cost; and (c) improving the accuracy and completeness of responses to requests.”

The question now seems to be: what types of standards need to be in place, and who is going to craft them? So, what do you think?  Please share any comments you might have or if you'd like to know more about a particular topic.

Editor's Note: Welcome Jason Krause as a guest author to eDiscovery Daily blog!  Jason is a freelance writer in Madison, Wisconsin. He has written about technology and the law for more than a dozen years, and has been writing about EDD issues since the first Zubulake decisions. Jason began his career in Silicon Valley, writing about technology for The Industry Standard, and later served as the technology reporter for the ABA Journal. He can be reached at jasonkrause@hotmail.com.

eDiscovery Trends: Cloud Covered by Ball

 

What is the cloud, why is it becoming so popular and why is it important to eDiscovery? These are the questions being addressed—and very ably answered—in the recent article Cloud Cover (via Law Technology News) by computer forensics and eDiscovery expert Craig Ball, a previous thought leader interviewee on this blog.

Ball believes that the fears about cloud data security are easily dismissed when considering that “neither local storage nor on-premises data centers have proved immune to failure and breach”. And as far as the cloud's importance to the law and to eDiscovery, he says, "the cloud is re-inventing electronic data discovery in marvelous new ways while most lawyers are still grappling with the old."

What kinds of marvelous new ways, and what do they mean for the future of eDiscovery?

What is the Cloud?

First, we have to understand just what the cloud is. The cloud is more than just the Internet, although it's that, too. In fact, what we call "the cloud" is made up of three on-demand services:

  • Software as a Service (SaaS) covers web-based software that performs tasks you once carried out on your computer's own hard drive, without requiring you to perform your own backups or updates. If you check your email virtually on Hotmail or Gmail or run a Google calendar, you're using SaaS.
  • Platform as a Service (PaaS) happens when companies or individuals rent virtual machines (VMs) to test software applications or to run processes that take up too much hard drive space to run on real machines.
  • Infrastructure as a Service (IaaS) encompasses the use and configuration of virtual machines or hard drive space in whatever manner you need to store, sort, or operate your electronic information.

These three models combine to make up the cloud, a virtual space where electronic storage and processing is faster, easier and more affordable.

How the Cloud Will Change eDiscovery

One reason that processing is faster is through distributed processing, which Ball calls “going wide”.  Here’s his analogy:

“Remember that scene in The Matrix where Neo and Trinity arm themselves from gun racks that appear out of nowhere? That's what it's like to go wide in the cloud. Cloud computing makes it possible to conjure up hundreds of virtual machines and make short work of complex computing tasks. Need a supercomputer-like array of VMs for a day? No problem. When the grunt work's done, those VMs pop like soap bubbles, and usage fees cease. There's no capital expenditure, no amortization, no idle capacity. Want to try the latest concept search tool? There's nothing to buy! Just throw the tool up on a VM and point it at the data.”

Because the cloud is entirely virtual, operating on servers whose locations are unknown and mostly irrelevant, it throws the rules for eDiscovery right out the metaphorical window.

Ball also believes that everything changes once discoverable information goes into the cloud. "Bringing ESI beneath one big tent narrows the gap between retention policy and practice and fosters compatible forms of ESI across web-enabled applications".

"Moving ESI to the cloud," Ball adds, "also spells an end to computer forensics." Where there are no hard drives, there can be no artifacts of deleted information—so, deleted really means deleted.

What's more, “[c]loud computing makes collection unnecessary”. Whereas discovery traditionally requires that information be collected to guarantee its preservation, placing a hold on ESI located in the cloud will safely keep users from destroying it. And because cloud computing allows for faster processing than can be accomplished on a regular hard drive, the search for discovery documents will move to where they're located: in the cloud. Not only will this approach be easier, it will also save money.

Ball concludes his analysis with the statement, "That e-discovery will live primarily in the cloud isn't a question of whether but when."

So, what do you think? Is cloud computing the future of eDiscovery? Is that future already here? Please share any comments you might have or if you'd like to know more about a particular topic.