eDiscoveryDaily

eDiscovery Best Practices: Cost of Data Storage is Declining – Or Is It?

Recently, I was gathering information on the cost of data storage and ran across this ad from the early 1980s for a 10 MB disk drive – for $3,398! That’s MB (megabytes), not GB (gigabytes) or TB (terabytes). What a deal!

Even in 2000, storage costs were around $20 per GB, so an 8 GB drive would cost about $160.

Today, 1 TB is available for $100 or less. HP has a 2 TB external drive available at Best Buy for $140 (prices subject to change of course). That’s 7 cents per GB. Network storage drives are more expensive, but still available for around $100 per TB.

At these prices, it’s natural for online, accessible data in corporations to rise exponentially. It’s great to have more and more data readily available to you, until you are hit with litigation or regulatory requests. Then, you potentially have to go through all that data for discovery to determine what to preserve, collect, process, analyze, review and produce.

Here is what each additional GB can cost to review (based on typical industry averages):

  • 1 GB = 20,000 documents (can vary widely, depending on file formats)
  • Review attorneys typically average 60 documents reviewed per hour (for simple relevancy determinations)
  • That equals an average of 333 review hours per GB (20,000 / 60)
  • If you’re using contract reviewers at $50 per hour – each extra GB just cost you $16,650 to review (333×50)

That’s expensive storage! And, that doesn’t even take into consideration the costs to identify, preserve, collect, and process each additional GB.
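If you want to plug in your own numbers, here is a minimal Python sketch of the arithmetic above. The function name and parameters are my own, for illustration only; the defaults are the industry averages quoted in this post. Note that without rounding down to 333 hours first, the result is about $16,667 per GB rather than $16,650.

```python
def review_cost_per_gb(docs_per_gb=20_000, docs_per_hour=60, hourly_rate=50.0):
    """Estimate attorney review hours and cost for one GB of data.

    Defaults are the rough industry averages quoted in the post;
    real collections can vary widely.
    """
    hours = docs_per_gb / docs_per_hour
    cost = hours * hourly_rate
    return hours, cost


hours, cost = review_cost_per_gb()
print(f"{hours:.0f} review hours, ${cost:,.0f} per GB")
```

Changing any one input (say, a faster review rate after culling) shows quickly how sensitive the per-GB cost is to that assumption.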

Managing Storage Costs Effectively

One way to manage those costs is to limit the data retained in the first place through an effective records management program that calls for regular destruction of data not subject to a litigation hold. If you’re eliminating expired data on a regular basis, there is less data to go through the EDRM discovery “funnel” to production.

Sophisticated collection tools or first pass review tools (like FirstPass™, powered by Venio FPR™) can also help cull data before attorney review, which is the most expensive component of eDiscovery, and so reduce those costs.

So, what do you think? Do you track GB metrics for your eDiscovery cases? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

First Pass Review: Domain Categorization of Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through “fuzzy” searching to find misspellings or OCR errors in an opponent’s produced ESI.

Domain Categorization

Another type of analysis is the use of domain categorization. Email is generally the biggest component of most ESI collections and each participant in an email communication belongs to a domain associated with the email server that manages their email.

FirstPass supports domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection. Domain categorization provides several benefits when reviewing your opponent’s ESI:

  • Non-Responsive Produced ESI: Domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains can be “group-tagged” as non-responsive. If a significant percentage of files are identified as non-responsive, that may be a sign that your opponent is trying to “bury you with paper” (albeit electronic).
  • Inadvertent Disclosures: If there are any emails associated with outside counsel’s domain, they could be inadvertent disclosures of attorney work product or attorney-client privileged communications. If so, you can then address those according to the agreed-upon process for handling inadvertent disclosures and clawback of same.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly.
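To make the idea concrete, here is a rough Python sketch of how a domain list with counts could be built from message participants. This is not how FirstPass does it internally (I am not describing its implementation); the function name and the sample addresses are hypothetical.

```python
from collections import Counter


def domain_counts(messages):
    """Count how many messages involve each email domain.

    `messages` is a list of participant lists (sender plus recipients).
    A domain is counted once per message, however many participants share it.
    """
    counts = Counter()
    for participants in messages:
        domains = {
            addr.rsplit("@", 1)[1].lower()
            for addr in participants
            if "@" in addr
        }
        counts.update(domains)
    return counts


# Hypothetical sample data, for illustration only.
messages = [
    ["alice@acme.example", "bob@subcontractor.example"],
    ["alice@acme.example", "counsel@lawfirm.example"],
    ["carol@acme.example", "newsletter@deals.example"],
]

for domain, count in domain_counts(messages).most_common():
    print(domain, count)
```

With a list like this in hand, a reviewer could pick out an obviously non-responsive domain, or outside counsel's domain, and pull every message associated with it for group tagging.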

In summary, there are several ways to use first pass review tools, like FirstPass, for reviewing your opponent’s ESI production, including: email analytics, synonym searching, fuzzy searching and domain categorization. First pass review isn’t just for your own production; it’s also an effective process to quickly evaluate your opponent’s production.

So, what do you think? Have you used first pass review tools to assess an opponent’s produced ESI? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

First Pass Review: Fuzzy Searching Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through synonym searching to find variations of your search terms to increase the possibility of finding the terminology used by your opponents.

Fuzzy Searching

Another type of analysis is the use of fuzzy searching. Attorneys know what terms they’re looking for, but those terms may not always be spelled correctly. Also, opposing counsel may produce a number of image-only files that require Optical Character Recognition (OCR), which is usually not 100% accurate.

FirstPass supports “fuzzy” searching, a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). FirstPass will display all of the words in the collection that are close to the word you’re looking for, so if you’re looking for the term “petroleum”, you can find variations such as “peroleum”, “petoleum” or even “petroleom”, all misspellings or OCR errors that could be relevant. Then, simply select the variations you wish to include in the search. Fuzzy searching is the best way to broaden your search to include potential misspellings and OCR errors, and FirstPass provides a terrific capability to select those variations to review additional potential “hits” in your collection.
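For the curious, “one or two characters off” is commonly measured with edit (Levenshtein) distance. The Python sketch below shows that general technique; it is an assumption on my part that this resembles what any given tool does, and the function names and the small vocabulary are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_variants(term, vocabulary, max_distance=2):
    """Return words in the vocabulary within max_distance edits of term."""
    term = term.lower()
    return sorted(
        word for word in vocabulary
        if word.lower() != term
        and edit_distance(term, word.lower()) <= max_distance
    )


# Hypothetical vocabulary standing in for the words found in a collection.
vocabulary = {"petroleum", "peroleum", "petoleum", "petroleom", "pipeline", "contract"}
print(fuzzy_variants("petroleum", vocabulary))
# ['peroleum', 'petoleum', 'petroleom']
```

The reviewer would still choose which variations to include, since a word that is close in spelling is not always a misspelling of the term.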

Tomorrow, I’ll talk about the use of domain categorization to quickly identify potential inadvertent disclosures and weed out non-responsive files produced by your opponent, based on the domain of the communicators. Hasta la vista, baby!  🙂

In the meantime, what do you think? Have you used fuzzy searching to find misspellings or OCR errors in an opponent’s produced ESI? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

First Pass Review: Synonym Searching Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through email analytics to see the communication patterns graphically to identify key parties for deposition purposes and look for potential production omissions.

Synonym Searching

Another type of analysis is the use of synonym searching. Attorneys understand the key terminology their client uses, but they often don’t know the terminology their client’s opposition uses because they haven’t interviewed the opposition’s custodians. In a product defect case, the opposition may refer to admitted design or construction “mistakes” in their product or process as “flaws”, “errors”, “goofs” or even “flubs”. With FirstPass, you can enter your search term into the synonym searching section of the application and it will provide a list of synonyms (with hit counts of each, if selected). Then, you can simply select the synonyms you wish to include in the search. As a result, FirstPass identifies synonyms of your search terms to broaden the scope and catch key “hits” that could be the “smoking gun” in the case.
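As a rough illustration of synonyms with hit counts, here is a small Python sketch. The synonym map is a tiny hand-built stand-in (a real tool would rely on a full thesaurus), and the function name and sample documents are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, hand-built synonym map, using the examples from this post.
SYNONYMS = {"mistake": ["flaw", "error", "goof", "flub"]}


def synonym_hits(term, documents, synonyms=SYNONYMS):
    """Return a hit count for the term and each of its synonyms.

    Counts the singular form and the simple "-s" plural together.
    """
    candidates = [term] + synonyms.get(term, [])
    words = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z]+", doc.lower())
    )
    return {c: words[c] + words[c + "s"] for c in candidates}


# Hypothetical sample documents, for illustration only.
documents = [
    "We found a design flaw in the bracket.",
    "Another goof on the assembly line; two errors logged.",
    "No mistakes were reported this week.",
]
print(synonym_hits("mistake", documents))
```

Seeing the counts next to each synonym makes it easier to decide which ones are worth adding to the search.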

Tomorrow, I’ll talk about the use of fuzzy searching to find misspellings that may be commonly used by your opponent or errors resulting from Optical Character Recognition (OCR) of any image-only files that they produce. Stay tuned! 🙂

In the meantime, what do you think? Have you used synonym searching to identify variations on terms in an opponent’s produced ESI? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

First Pass Review: Of Your Opponent’s Data

In the past few years, applications that support Early Case Assessment (ECA) (or Early Data Assessment, as I prefer to call it) and First Pass Review (FPR) of ESI have become widely popular in eDiscovery, as the benefits of using them to analyze and cull your own ESI before conducting attorney review and producing relevant files have become obvious. But nobody seems to talk about what these tools can do with an opponent’s produced ESI.

Fewer Resources to Understand Data Produced to You

In eDiscovery, attorneys typically develop a reasonably in-depth understanding of their own collection. They know who the custodians are, have a chance to interview those custodians and develop a good knowledge of the standard operating procedures and terminology of their client to effectively retrieve responsive ESI. However, that same knowledge isn’t present when reviewing an opponent’s data. Unless they are deposed, the opposition’s custodians aren’t interviewed, and where the data originated is often unclear. The only source of information is the data itself, which requires in-depth analysis. An FPR application like FirstPass™, powered by Venio FPR™, can make a significant difference in conducting that analysis, provided that you request a native production from your opponent, which is vital to being able to perform an in-depth analysis.

Email Analytics

The ability to see the communication patterns graphically (to identify the parties involved, with whom they communicated and how frequently) is a significant benefit to understanding the data received. FirstPass provides email analytics to understand the parties involved and potentially identify other key opponent individuals to depose in the case. Dedupe capabilities enable quick comparison against your own production to check whether the opposition may have withheld key emails between opposing parties. FirstPass also provides an email timeline to enable you to determine whether any gaps exist in the opponent’s production.
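The timeline idea can be sketched in a few lines of Python: sort the sent dates and flag any stretch with no email that is longer than some threshold. The function name, the 14-day threshold and the sample dates are all assumptions for illustration, not a description of how FirstPass works.

```python
from datetime import date, timedelta


def find_gaps(sent_dates, max_gap_days=14):
    """Return (earlier, later) date pairs where consecutive emails
    are more than max_gap_days apart."""
    ordered = sorted(set(sent_dates))
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > timedelta(days=max_gap_days)
    ]


# Hypothetical sent dates from a produced mailbox, for illustration only.
sent_dates = [
    date(2010, 3, 1),
    date(2010, 3, 3),
    date(2010, 3, 8),
    date(2010, 5, 10),
    date(2010, 5, 12),
]
for earlier, later in find_gaps(sent_dates):
    print(f"No email between {earlier} and {later}")
```

A gap is only a prompt for questions, of course; a custodian may simply have been on leave during that period.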

Tomorrow, I’ll talk about the use of synonym searching to find variations of your search terms that may be common terminology of your opponent. Same bat time, same bat channel! 🙂

In the meantime, what do you think? Have you used email analytics to analyze an opponent’s produced ESI? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Social Tech eDiscovery: Twitter Guidelines for Law Enforcement

Tuesday, I provided information regarding Facebook’s Law Enforcement page with information about serving civil subpoenas. Facebook provides quite a bit of useful information regarding serving subpoenas, including the address for registered agent (to process requests), information required to identify users, fee for processing, turnaround time, and fee to expedite responses. Facebook is very informative with regard to how subpoenas are handled in terms of cost and time to process.

So, it makes sense to look at other popular social media sites to see how they are handling this issue. Twitter is probably right behind Facebook in terms of popularity in the social media world and they have a “Guidelines for Law Enforcement” page to address requests for non-public information for Twitter users.

As the Twitter policy notes, most Twitter profile information is public, so anyone can see it. A Twitter profile contains a profile image, background image, as well as the status updates, which, of course, they call “tweets”. In addition, the user has the option to fill out location, a URL, and a short “bio” section about themselves for display on their public profile. Non-public information includes “log data” such as IP address, browser type, the referring domain, pages visited, search terms and interactions with advertisements (as noted in their Privacy Policy page).

Twitter doesn’t provide any cost information regarding processing subpoena requests, nor do they address standard turnaround times or fees to expedite processing. Their policy is to notify users of requests for their information prior to disclosure unless they are prohibited from doing so by statute or court order and they do require the URL of the Twitter profile in question to process any subpoena requests. They do provide email, fax and physical address contact information to address user information requests. FYI, only email from law enforcement domains will be accepted via the email address. Preservation requests must be signed with a valid return email address, and sent on law enforcement letterhead. Non-law enforcement requests should be sent through regular support methods (via their main support page).

So, what do you think? Have you ever needed to file a subpoena on Twitter? Please share, or let us know if you’d like to know more about a particular topic.

eDiscovery Searching 101: Sites for Common Misspellings

Yesterday, we talked about the importance of including misspellings when searching for relevant ESI, which broadens the search to retrieve potentially responsive files that might otherwise be missed. We also talked about using “fuzzy searching” (with a product like FirstPass™, powered by Venio FPR™, that supports this capability) to identify variations within the collection as potential misspellings. Another way to identify misspellings is to use a resource that tracks the most typical misspellings for common words.

Examples of Sites

At Dumbtionary.com, you can check words against a list of over 10,000 misspelled words. Simply type the correct word into the search box with a “plus” before it (e.g., “+management”) to get the common misspellings for that word. You can also search for misspelled names and places.

Wikipedia has a list of common misspellings as well. It breaks the list down by starting letter, as well as variations on 0-9 (e.g., “3pm” or “3 pm”). You can go to the starting letter you want to search, then do a “find” on the page (by pressing Ctrl+F) and type in the string to search.

Wrongspelled.com and Spellgood.net are two other examples of sites for searching for common misspellings. Not all sites have the same misspellings, so it’s good to check multiple sites to compile a comprehensive list. Each site lets you search for your terms and identify common misspellings for each, enabling you to broaden your search to include those variations. Most of these sites are updated regularly with new common misspellings.
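If you collect misspellings from more than one site, merging them into a single de-duplicated list is straightforward. Here is a minimal Python sketch; the function name is my own and the two sample lists are made-up stand-ins for what different sites might return, not actual results from any of the sites above.

```python
def build_search_terms(term, *misspelling_lists):
    """Merge misspellings from several sources into one de-duplicated
    list, with the correctly spelled term first."""
    variants = set()
    for source in misspelling_lists:
        variants.update(w.strip().lower() for w in source if w.strip())
    variants.discard(term.lower())
    return [term.lower()] + sorted(variants)


# Hypothetical lists standing in for two different sites' results.
site_a = ["managment", "mangement"]
site_b = ["Mangement", "managemant"]

terms = build_search_terms("management", site_a, site_b)
print(" OR ".join(terms))
# management OR managemant OR managment OR mangement
```

The exact query syntax for combining the terms depends on your search tool; the "OR" join here is only one common convention.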

Using fuzzy searching or sites with typical misspellings for your terms is one method of ensuring a more diligent eDiscovery search process by retrieving additional “hits” that might otherwise be missed. Over the weeks to come, we’ll talk about others.

In the meantime, what do you think? Are you aware of other sites to find common misspellings? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

eDiscovery Searching 101: It's a Mistake to Ignore the Mistakes

How many times have you received an email sent to “All Employees” like this? “I am pleased to announce that Joe Smith has been promoted to the position of Operations Manger.”

Do you cringe when you see an email like that? I do. I cringe even more when the email comes from me, which happens more often than I’d like to admit.

Of course, we all make mistakes. And, forgetting that fact can be costly when searching for, or requesting, relevant documents in eDiscovery. For example, if you’re searching for e-mails that relate to management decisions, can you be certain that “management” is spelled perfectly throughout the collection? Unlikely. It could be spelled “managment” or “mangement” and you would miss those potentially critical emails without an effective plan to look for them.

Finding Misspellings Using Fuzzy Searching

How do you find them if you don’t know how they might be misspelled? Use a search tool like FirstPass™, powered by Venio FPR™, that supports “fuzzy” searching, a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). FirstPass will display all of the words in the collection that are close to the word you’re looking for, so if you’re looking for someone named “Brian”, you can find variations such as “Bryan” or even “brain” that could be relevant. Then, simply select the variations you wish to include in the search. Fuzzy searching is the best way to broaden your search to include potential misspellings, and FirstPass provides a terrific capability to select possible misspellings to review additional potential “hits” in your collection.

The most popular TV series all use “cliffhangers” to keep the audience hooked, so tomorrow, I’ll talk about sites available to identify common misspellings for terms as another way to broaden searches to include mistakes. 🙂

In the meantime, what do you think? Do you have any real-world examples of how fuzzy searching has aided in eDiscovery search and retrieval? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Social Tech eDiscovery: Facebook Subpoena Policy

As President and CEO of Trial Solutions, I’ve noted and embraced the explosion in use of social technology over the past few years (Trial Solutions has a Facebook, Twitter and LinkedIn page, and this blog, with more to come soon). According to new statistics from Nielsen, social network sites now account for 22.7% of time spent on the web, a 43% jump in one year (by contrast, email only accounts for 8.3%). With that explosion in social tech use, companies have had to address social media as another form of media to collect for eDiscovery. It seems there’s a new article or blog post online every week on the subject and there is a social media webinar at Virtual Legal Tech this Thursday.

As probably the most popular social media site, Facebook is one of the most likely sites for relevant ESI. There are already a number of stories online about people who have lost their jobs due to Facebook postings, such as these. There is even a Facebook group to post stories about Facebook firings. Oh, the irony!

Naturally, cases related to Facebook eDiscovery issues have become more prevalent. One case, EEOC v. Simply Storage Management, resulted in a May ruling that “SNS (social networking site) content is not shielded from discovery simply because it is ‘locked’ or ‘private’”. So, request away!

If the employee resists or no longer has access to responsive content (or you need to request from their online friends through “Wall” posts), you may have to request content directly from Facebook through a subpoena. Facebook has a Law Enforcement page with information about serving civil subpoenas, including:

  • Address for Registered Agent (to process requests)
  • Information Required to Identify Users – Facebook user ID (“UID”) or email address
  • Fee for Processing ($500, plus an additional $100 if you want a notarized declaration)
  • Turnaround Time (minimum of 30 days)
  • Fee to Expedite Responses ($200)

Obviously, fees are subject to change, so check the page for the latest before serving your subpoena.

So, what do you think? Have you ever needed to file a subpoena on Facebook? Aware of other case law related to Facebook eDiscovery? Please share, or let us know if you’d like to know more about a particular topic.

Case Law: Spoliate Evidence and Go to Jail?!?

One of the most well-known cases in eDiscovery is Victor Stanley (VSI) v. Creative Pipe (CPI): Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 U.S. Dist. LEXIS 42025 (D. Md. May 29, 2008). It is a prime example of what NOT to do when conducting a search for relevant ESI in litigation. The failure to test the search methodology resulted in the inadvertent disclosure of 185 privileged documents and the waiver of privilege for those documents. If you’re not familiar with this case, Google it and you’ll find plenty of sites/articles that discuss its significance.

If that was a blow to Creative Pipe and their president, Mark Pappas, the order issued on September 9th for that same case (now widely referenced as “Victor Stanley II”) makes the May 2008 order pale in comparison.

Judge Grimm found that “Defendants…deleted, destroyed, and otherwise failed to preserve evidence; and repeatedly misrepresented the completeness of their discovery production to opposing counsel and the Court.” As a result, he ordered “that Pappas’s pervasive and willful violation of serial Court orders to preserve and produce ESI evidence be treated as contempt of court, and that he be imprisoned for a period not to exceed two years, unless and until he pays to Plaintiff the attorney’s fees and costs that will be awarded to Plaintiff as the prevailing party pursuant to Fed. R. Civ. P. 37(b)(2)(C).”

Ouch!

Clearly, Judge Grimm felt that Pappas’ and CPI’s behavior in this case over four years represented intentional destruction of evidence and he ruled accordingly on plaintiff’s motion regarding same. Perhaps his view of their actions can be summarized by footnote 19 in the order:

“CPI named one of its product lines the “Fuvista” line. Pappas admitted during discovery that “Fuvista” stood for “F**k you Victor Stanley,” (Pappas Dep. 22:20-24, Pl.’s Mot. Ex. 5, ECF No. 341-5), demonstrating that Pappas’s wit transcended sophomoric pranks such as logging into VSI’s web site as “Fred Bass” and extended to inventing insulting acronyms to name his competing products. When disclosed, the meaning of this acronym removes any doubt about his motive and intent. No doubt Pappas regarded this as hilarious at the time. It is less likely that he still does.”

So, what do you think? Is this the start of a trend of prison sentences for evidence spoliation? Or, is this an extreme example of clear intentional evidence destruction? Please share any comments you might have (including examples of other cases where sanctions included jail time), or let us know if you’d like to know more about a particular topic.

More to come on this case in the future…