Processing Archives

Working Successfully with eDiscovery and Litigation Support Service Providers: Paper is Still Important, Part 2

August 1, 2011

Friday, we talked about the information you should include in a request for proposal for processing a paper discovery collection. Today we’ll review some questions you should ask of a service provider to help you to select the provider that’s the best fit for your case.

Of course, you’ll ask for pricing information and references, and whether the vendor can meet your schedule requirements. In addition, here are questions to ask and information to request:

Describe the qualifications of project management staff: What is the average tenure in the industry? At the organization? What education and prior work experience is required?
Describe the qualifications of project staff: What is the average tenure in the industry? At the organization? Describe the training given to new processing staff.
Describe the workflow process for the required services, including information on the flow of documents and data through the process.
What technology is used for the services that are required?
Describe quality control procedures and policies, including how errors are fixed and how feedback on work is funneled back to the staff.
Describe the level of quality control that is done. For example: percentage of the data checked, and whether that percentage applies to total characters, data fields or documents.
Describe the data entry system that you use, including a field-by-field description of any validation that occurs during data entry. Is double-key entry being conducted?
Describe post-processing automated validation that occurs.
If any portion or all of the work will be subcontracted to another service provider, identify that provider (including geographic location of the facility where the work will be done), and provide responses to each information-point above for each sub-contractor.
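To see why it matters whether a quality control percentage applies to characters, data fields or documents, here is a minimal Python sketch. The records, field names and the choice of which document was checked are all made up for illustration; the point is only that the same checked sample can yield different percentages depending on the basis used.

```python
# Hypothetical coded records: each document is a dict of field name -> keyed value.
documents = [
    {"author": "Smith, J.", "date": "2005-06-30", "title": "Memo re: lease"},
    {"author": "Doe, A.", "date": "2005-07-01", "title": "Letter"},
    {"author": "Lee, K.", "date": "2005-07-02", "title": "Quarterly operations report, draft 3"},
    {"author": "Roe, P.", "date": "2005-07-03", "title": "Fax"},
]


def qc_coverage(docs, checked_indexes):
    """Return the percentage of the work that QC checked, on three different bases."""
    checked = [docs[i] for i in checked_indexes]

    def field_count(items):
        return sum(len(doc) for doc in items)

    def char_count(items):
        return sum(len(value) for doc in items for value in doc.values())

    return {
        "documents": 100 * len(checked) / len(docs),
        "fields": 100 * field_count(checked) / field_count(docs),
        "characters": 100 * char_count(checked) / char_count(docs),
    }


# QC checked only the second document (index 1), which happens to be a short one.
coverage = qc_coverage(documents, checked_indexes=[1])
for basis, percent in coverage.items():
    print(f"{basis}: {percent:.1f}% checked")
```

Because the checked document is shorter than average, the character-based percentage comes out lower than the document-based one, which is why it helps to ask the vendor which basis their figure uses.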

The responses to these questions and information requests should give you the information you need to choose a vendor that’s a good fit for your project. This means Friday and today, this blog is officially renamed to pDiscovery Daily!

What questions do you ask and what information do you request in an RFP for paper processing? Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Paper is Still Important

July 29, 2011

For several years now, the focus of our discovery efforts has been handling ESI. Paper, however, hasn’t gone away yet. And it probably won’t any time soon. People still have at least small collections of paper that need to be handled.

What’s the best way to handle paper? Convert it ASAP and blend it into the rest of the collection so attorneys can do a comprehensive review of the entire universe of potentially responsive documents. That means scanning, coding, and OCR processing to enable the paper to be reviewed and searched.

Here’s information the vendor will need to give you accurate cost and schedule information for handling the paper portion of your collection:

A description of the services that you will require (for example, establishing document boundaries, establishing document relationships, document reassembly, periphery coding, in-text coding, scanning, OCR).
The approximate number of pages and documents in the collection.
A description of the condition of the paper and characteristics (are the pages photocopies or originals? Staples and paper clips? Oversized and undersized pages? Are there sticky notes?). Include special instructions, where warranted (for example, “Sticky notes are to be removed, scanned separately and placed before the documents to which they are attached”).
Whether paper will be shipped/delivered to the vendor or whether on-site work will be required. If on-site work is required, the locations at which the paper will be available.
The date on which the pages will be available to the service provider, and a schedule for collections that will be made available in increments.
A description of the types of documents in the collection (for example, correspondence, contracts, form documents, reports, and so on).
If coding is required, a list of the fields to be captured with descriptions and format requirements for each field.
If coding is required, a description of levels of treatment to be applied, if any have been established.
If coding is required, a description of any data standardization you will require, and lists of valid entries for fields with a controlled vocabulary.
A description of the deliverables you will require (image file formats, load file formats, single-page or multi-page text files, and so on).
The date by which the project must be completed.

Armed with this information, a good vendor should be able to provide accurate cost and schedule information for processing your paper collection. On Monday, we’ll cover RFP questions for the vendors to answer regarding their paper processing services. This means today and Monday, this blog is officially renamed to pDiscovery Daily!

What type of information do you provide to a vendor in an RFP for processing paper? Please share any comments you might have and let us know if you’d like to know more about an RFP topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Preparing an eDiscovery Processing RFP, Part 2

July 20, 2011

Yesterday, we talked about the information you should include in a request for proposal for eDiscovery processing. Today we’ll review some questions you should ask of a service provider to help you to select the one that’s the best fit for your case.

Of course, you’ll ask for pricing information and if the vendor can meet your schedule requirements. In addition, here are questions to ask and information to request:

To ensure that you understand the vendor’s pricing model and to avoid unexpected costs, ask the vendor to provide an estimate of total costs for the project, based on the information you’ve provided about the collection.
Ask the vendor to confirm that they can meet all of the requirements you’ve outlined in the information section of the RFP.
Ask what file types are handled, and what the standard protocol/recommendation is for handling other file types.
Ask the vendor how exception files, such as corrupted or password protected files, are handled.
Ask the vendor to describe its approach to processing, including discussion of de-duplication, handling attachments, handling email threads, culling/filtering, and handling metadata.
Ask what languages are supported.
Ask the vendor to describe its auditing and tracking procedures.
Ask the vendor to describe the quality assurance (measures to prevent errors) and quality control (measures to confirm that results are correct) mechanisms associated with their processing.
Ask the vendor to describe what information, input and participation is required from you.

The responses to these questions and information requests should give you the information you need to choose a vendor that’s a good fit for your project.

What questions do you ask and what information do you request in an RFP for eDiscovery processing? Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

Working Successfully with eDiscovery and Litigation Support Service Providers: Preparing an eDiscovery Processing RFP

July 19, 2011

Last week, we covered preparing an RFP for eDiscovery Collection and Forensics. This week’s RFP discussion will focus on processing eDiscovery, and today we’ll cover the information you should provide to a vendor regarding your collection and your requirements. Remember, the more thorough you are, the better the vendor will be able to gauge the scope and complexity of your project.

Here’s information the vendor will need to give you accurate cost and schedule information:

An estimate of the volume. That is, the number of gigabytes or terabytes of data to be processed.
A description of the data files you expect will be found in the collection (for example, Word documents, Excel documents, PST files, and so on).
A description of the deliverable you’ll be providing to the service provider (the media on which the data will be provided, whether you’ll be uploading data to the service provider’s server) and a schedule for data delivery.
Will de-duplication be required, and if so, by case or by custodian?
What filtering will be required? Let the service provider know if you’ll be providing keywords, date ranges, and other criteria for filtering.
Are any files password protected, and if so, how should the vendor handle those? Should they try to crack the passwords?
If you are requiring images, are endorsements required? If so, what endorsements? Bates numbers? Text, such as confidential or other stamps?
Describe the deliverables you will require from the service provider, including data file formats, image file formats (single-page TIFF, multi-page TIFF, PDF), searchable text, load file fields, etc. Let the service provider know the target review tool you expect to use.
The date by which the work must be completed, and whether there will be processing priorities and interim deadlines.
Describe your expectations regarding the need for the service provider to testify.
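The “by case or by custodian” de-duplication question in the list above can change how many files reach review. Here is a minimal Python sketch of the difference. It assumes duplicates are identified by a SHA-256 hash of the file content, which is an assumption for illustration only (processing tools may hash differently, especially for email), and the custodians and files are made up.

```python
import hashlib

# Hypothetical collection: (custodian, file name, file content).
collection = [
    ("alice", "budget.xlsx", b"Q3 budget figures"),
    ("alice", "budget-copy.xlsx", b"Q3 budget figures"),
    ("bob", "budget.xlsx", b"Q3 budget figures"),
    ("bob", "notes.docx", b"Meeting notes"),
]


def deduplicate(files, scope="case"):
    """Keep the first copy of each file, judged by a hash of its content.

    scope="case": one copy is kept across the whole collection.
    scope="custodian": one copy is kept per custodian.
    """
    if scope not in ("case", "custodian"):
        raise ValueError("scope must be 'case' or 'custodian'")
    seen = set()
    kept = []
    for custodian, name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        # The custodian becomes part of the key only for custodian-level de-duplication.
        key = digest if scope == "case" else (custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


by_case = deduplicate(collection, scope="case")
by_custodian = deduplicate(collection, scope="custodian")
print(len(by_case), "files kept by case;", len(by_custodian), "kept by custodian")
```

De-duplicating by case keeps fewer files, but it no longer shows that Bob also held a copy of the budget file, so it is worth asking the vendor how the other custodians are tracked.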

Armed with this information, a good vendor should be able to provide accurate cost and schedule information for processing your collection. In the next post, we’ll cover RFP questions for processing and conversion services.

What type of information do you provide to a vendor in an RFP for processing eDiscovery? Please share any comments you might have and let us know if you’d like to know more about an eDiscovery topic.

eDiscovery Trends: More On the Recommind Patent Controversy

June 20, 2011

Perhaps the most controversial story discussed in the eDiscovery community in quite some time is the patent for Predictive Coding recently announced by Recommind in a June 8 press release entitled Recommind Patents Predictive Coding. I haven’t seen this much backlash against a company or individual since last summer, when LeBron James decided to leave the Cleveland Cavaliers for the Miami Heat (and then joined his teammates in a championship-like celebration before the season). How did that turn out? 😉

Since that announcement, there have been several articles and blog posts about it, including:

This one, from Monica Bay of Law Technology News, asking the question: “Is Recommind Blowing Smoke?”, where she discusses the buzz over Recommind’s announcement;
This one, from Evan Koblentz (also of Law Technology News), entitled “Recommind Intends to Flex Predictive Coding Muscles”, which includes responses from Catalyst and Valora Technologies;
This one, also from Evan Koblentz, a blog post from EDD Update, where Recommind General Counsel and Vice President Craig Carpenter acknowledges that Recommind failed to obtain a trademark for the term Predictive Coding (though Recommind is still using the ™ symbol on the term Predictive Coding on this page);
Three blog posts in four days from Sharon D. Nelson of Ride the Lightning blog, which debate the enforceability of the patent and include a response from OrcaTec, noting that Recommind’s implied threat of litigation is “nothing more than an attempt to bully the market place”.

There are several other articles and blog posts regarding the topic, but if I listed them all, I’d have no room left for anything new! Sorry that I couldn’t include them all.

I reached out to Bill Dimm, founder of Hot Neuron LLC, makers of Clustify, which clusters documents in groups for effective, expedited review and asked him his thoughts about the Recommind press release and patent. Here are his comments:

“Recommind’s press release would have been accurately titled ‘Recommind Patents a Method for Predictive Coding,’ but it went with the much more provocative title ‘Recommind Patents Predictive Coding,’ implying that its patent covers every conceivable way of doing predictive coding. The only way I can see that being accurate is if you DEFINE predictive coding to be exactly the procedure outlined in claim 1 of Recommind’s patent. Of course, ‘predictive coding’ is a relatively new term, so the definition is up for debate. The patent itself says:

‘Predictive coding refers to the capability to use a small set of coded documents (or partially coded documents) to predict document coding of a corpus.’ That sure sounds like it allows for a lot of possibilities beyond the procedure in claim 1 of the patent. The press release goes on to say: ‘ONLY [emphasis is mine] Recommind’s patented, iterative, computer-assisted approach can ‘bend the cost curve’ of document review.’ Really? So, Recommind has the ONLY product in the industry that works? A few of us disagree. Even clustering, which Recommind claims does not qualify as predictive coding will bend the cost curve because the efficiency boost it provides increases with the size of the document set.

Moving on from the press release to the patent itself, I would recommend reading claim 1 if you are interested in such things. It is the most general method that the USPTO allowed Recommind to claim — the other claims are all dependent claims that describe more specific embodiments of claim 1, presumably so that Recommind would have a leg left to stand on if prior art was found to invalidate claim 1. Claim 1 describes a procedure for predictive coding that involves quite a few steps. It is my understanding (I am NOT a lawyer) that the patent is irrelevant for any predictive coding procedure that does not include every single one of the steps listed in claim 1. Since claim 1 includes things like identification cycles, rolling loads, and random sampling, it seems unlikely that existing products would accidentally infringe on the patent.

As far as Clustify is concerned, Recommind’s patent is irrelevant since our procedure for predictive coding is different. In fact, I explained in a presentation at a recent conference why random sampling is a very inefficient approach (something that has been known for decades in other fields), so I wouldn’t even be tempted to follow Recommind’s procedure.”

So, what do you think? Will the Recommind predictive coding patent allow them to rule predictive coding? Or only their specific approach? Will LeBron James ever win a championship? Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: Hot Neuron is a partner of Trial Solutions, which has used their product, Clustify, in various client projects.

eDiscovery Best Practices: Avoiding eDiscovery Nightmares: 10 Ways CEOs Can Sleep Easier

June 16, 2011

I found this article in the CIO Central blog on Forbes.com from Robert D. Brownstone – it’s a good summary of issues for organizations to consider so that they can avoid major eDiscovery nightmares. The author counts down his top ten list David Letterman style (clever!) to provide a nice, easy-to-follow summary of the issues. Here’s a recap, with my ‘two cents’ on each item:

10. Less is more: The U.S. Supreme Court ruled unanimously in 2005 in the Arthur Andersen case that a “retention” policy is actually a destruction policy. It’s important to routinely dispose of old data that is no longer needed to have less data subject to discovery and just as important to know where that data resides. My two cents: A data map is a great way to keep track of where the data resides.

9. Sing Kumbaya: They may speak different languages, but you need to find a way to bridge the communication gap between Legal and IT to develop an effective litigation-preparedness program. My two cents: Require cross-training so that each department can understand the terms and concepts important to the other. And, don’t forget the records management folks!

8. Preserve or Perish: Assign the litigation hold protocol to one key person, either a lawyer or a C-level executive to decide when a litigation hold must be issued. Ensure an adequate process and memorialize steps taken – and not taken. My two cents: Memorialize is underlined because an organization that has a defined process and the documentation to back it up is much more likely to be given leeway in the courts than a company that doesn’t document its decisions.

7. Build the Three-Legged Stool: A successful eDiscovery approach involves knowledgeable people, great technology, and up-to-date written protocols. My two cents: Up-to-date written protocols are the first thing to slide when people get busy – don’t let it happen.

6. Preserve, Protect, Defend: Your techs need the knowledge to avoid altering metadata, maintain chain-of-custody information and limit access to a working copy for processing and review. My two cents: A good review platform will assist greatly in all three areas.

5. Natives Need Not Make You Restless: Consider exchanging files to be produced in their original/“native” formats to avoid huge out-of-pocket costs of converting thousands of files to image format. My two cents: Be sure to address how redactions will be handled as some parties prefer to image those while others prefer to agree to alter the natives to obscure that information.

4. Get M.A.D.? Then Get Even: Apply the Mutually Assured Destruction (M.A.D.) principle to agree with the other side to take off the table costly volumes of data, such as digital voicemails and back-up data created down the road. My two cents: That’s assuming, of course, you have the same levels of data. If one party has a lot more data than the other party, there may be no incentive for that party to agree to concessions.

3. Cooperate to Cull Aggressively and to Preserve Clawback Rights: Setting expectations regarding culling efforts and reaching a clawback agreement with opposing counsel enables each side to cull more aggressively to reduce eDiscovery costs. My two cents: Some parties will agree on search terms up front while others will feel that gives away case strategy, so the level of cooperation may vary from case to case.

2. QA/QC: Employ Quality Assurance (QA) tests throughout review to ensure a high accuracy rate, then perform Quality Control (QC) testing before the data goes out the door, building time in the schedule for that QC testing. Also, consider involving a search-methodology expert. My two cents: I cannot stress that last point enough – the ability to illustrate how you got from the large collection set to the smaller production set will be imperative to responding to any objections you may encounter to the produced set.

1. Never Drop Your Laptop Bag and Run: Dig in, learn as much as you can and start building repeatable, efficient approaches. My two cents: It’s the duty of your attorneys and providers to demonstrate competency in eDiscovery best practices. How will you know whether they have or not unless you develop that competency yourself?

So, what do you think? Are there other ways for CEOs to avoid eDiscovery nightmares? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Competency Ethics – It’s Not Just About the Law Anymore

June 10, 2011

A few months ago at LegalTech New York, I conducted a thought leader interview with Tom O’Connor of Gulf Coast Legal Technology Center, who didn’t exactly mince words when talking about the trend for attorneys to “finally tak[e] technology seriously”. As he noted, “lawyers are finally trying to take some time to try to get up to speed – whining and screaming pitifully all the way about how it’s not fair, and the sanctions are too high and there’s too much data. Get a life, get a grip. Use the tools that are out there that have been given to you for years.”

Strong words, indeed. The American Bar Association (ABA) Model Rules of Professional Conduct (Model Rules) require that an attorney possess and demonstrate a certain requisite level of knowledge in order to be considered competent to handle a given matter. Specifically, Model Rule 1.1 states that, “[a] lawyer shall provide competent representation to a client. Competent representation requires the legal knowledge, skill, thoroughness, and preparation reasonably necessary for the representation.”

Preparation not only means understanding a specific area of the law (for example, antitrust or patent law, both highly specialized.). It also means having the technical knowledge and skills necessary to serve the client in the area of discovery.

The ethical responsibilities of counsel these days include competently directing and managing the identification, preservation, collection, processing, analysis, review and production of electronically stored information (ESI) required to be produced pursuant to lawful discovery requests. If counsel does not have that level of competency in a particular area, he or she is obligated to either acquire the knowledge or skill necessary to support those needs, or include someone else who does have the requisite skills as part of the representation.

Not too long ago, I met with an attorney and discussed how they handled preservation obligations with their clients. The attorney indicated that he expected his clients to self-manage their own preservation and collection. When I asked him why he didn’t try to get more involved to make sure it was being handled properly, he said, “I don’t want to alarm them. They might decide they need a bigger firm.”

Recent case law is full of cases where counsel didn’t fully understand their eDiscovery obligations, and got themselves and their clients “burned” in the process. If your organization gets involved in litigation, make sure to include eDiscovery competence among the factors you consider when determining counsel qualifications to represent you.

So, what do you think? Is your counsel eDiscovery savvy? If not, do they use a provider that is? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Does Anybody Really Know What Time It Is?

May 9, 2011

Does anybody really know what time it is? Does anybody really care?

OK, it’s an old song by Chicago (back then, they were known as the Chicago Transit Authority). But, the question of what time it really is has a significant effect on how eDiscovery is handled.

Time Zone: In many litigation cases, one of the issues that should be discussed and agreed upon is the time zone to apply to the produced files. Why is it a big deal? Let’s look at one example:

A multinational corporation has offices from coast to coast and potentially responsive emails are routinely sent between East Coast and West Coast offices. If an email is sent from a party in the West Coast office at 10 PM on June 30, 2005 and is received by a party in the East Coast office at 1 AM on July 1, 2005, and the relevant date range is from July 1, 2005 through December 31, 2006, then the choice of time zones will determine whether or not that email falls within the relevant date range. The time zone is based on the workstation setting, so the sender and recipient could actually be in the same office when the email is sent (if someone is traveling).

Usually the choice is to either use a standard time zone for all files in the litigation – such as Greenwich Mean Time (GMT) or the time zone where the producing party is located – or to use the time zone associated with each custodian, which means that the time zone used will depend on where the data came from. It’s important to determine the handling of time zones up front in cases where multiple time zones are involved to avoid potential disputes down the line.
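The email example above can be worked through in a few lines of Python. This is only a sketch: the fixed UTC offsets are an assumption (daylight time was in effect in June 2005, so Pacific is taken as UTC-7 and Eastern as UTC-4), and the `in_range` helper is a made-up name, not a function from any review tool.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets are an assumption for this example (daylight time in June 2005).
PACIFIC = timezone(timedelta(hours=-7), "PDT")
EASTERN = timezone(timedelta(hours=-4), "EDT")
GMT = timezone.utc

RANGE_START = date(2005, 7, 1)
RANGE_END = date(2006, 12, 31)

# The email from the example: sent at 10 PM on June 30, 2005, West Coast time.
sent = datetime(2005, 6, 30, 22, 0, tzinfo=PACIFIC)


def in_range(moment, zone):
    """Convert the timestamp to the agreed time zone, then test its calendar date."""
    local_date = moment.astimezone(zone).date()
    return RANGE_START <= local_date <= RANGE_END


for zone in (PACIFIC, EASTERN, GMT):
    local = sent.astimezone(zone)
    print(zone.tzname(None), local.strftime("%Y-%m-%d %H:%M"), in_range(sent, zone))
```

The same moment in time lands on June 30 in Pacific time and on July 1 in Eastern time and GMT, so the email is outside the range under one convention and inside it under the other two.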

Which Date to Use?: Each email and efile has one or more date and time stamps associated with it. Emails have date/time sent, as well as date/time received. Efiles have creation date/time, last modified date/time and even last printed date/time. Efile creation dates do not necessarily reflect when a file was actually created; they indicate when a file came to exist on a particular storage medium, such as a hard drive. So, creation dates can reflect when a user or computer process created a file. However, they can also reflect the date and time that a file was copied to the storage medium – as a result, the creation date can be later than the last modified date. It’s common to use date sent for Sent Items emails and date received for Inbox emails and to use last modified date for efiles. But, there are exceptions, so again it’s important to agree up front as to which date to use.
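The common convention described above can be sketched as a small Python function. The dictionary keys (`kind`, `folder`, `date_sent` and so on) are hypothetical names chosen for illustration, not fields from any particular load file format, and treating every non-Sent folder like the Inbox is an assumption.

```python
from datetime import datetime


def review_date(item):
    """Pick the date to filter on: date sent for Sent Items emails,
    date received for other emails, last modified date for efiles."""
    if item["kind"] == "email":
        if item["folder"] == "Sent Items":
            return item["date_sent"]
        # Assumption: any folder other than Sent Items is treated like the Inbox.
        return item["date_received"]
    return item["last_modified"]


# An efile that was copied to a new drive: its creation date on that drive
# is later than its last modified date.
copied_file = {
    "kind": "efile",
    "created": datetime(2011, 5, 9, 9, 0),
    "last_modified": datetime(2010, 3, 1, 14, 30),
}

inbox_email = {
    "kind": "email",
    "folder": "Inbox",
    "date_sent": datetime(2005, 6, 30, 22, 0),
    "date_received": datetime(2005, 7, 1, 1, 0),
}

print(review_date(copied_file))
print(review_date(inbox_email))
```

Filtering the copied file on its creation date would place it more than a year later than the last time anyone actually changed it, which is the kind of exception worth agreeing on up front.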

So, what do you think? Have you had any date disputes in your eDiscovery projects? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Checking for Malware is the First Step to eDiscovery Processing

May 2, 2011

A little over a month ago, I noted that we hadn’t missed a (business) day yet in publishing a post for the blog. That streak almost came to an end last week.

As I often do in the early mornings before getting ready for work, I spent some time searching for articles to read and identifying potential blog topics and found a link on a site related to “New Federal Rules”. Curious, I clicked on it and…up popped a window from our virus checking software (AVG Anti-Virus, or so I thought) reporting that a file containing a “trojan horse” program had been found on the site.

The odd thing about the pop-up window is that there was no “Fix” button to fix the trojan horse. There were only choices to “Ignore” the virus or “Move it to the Vault”. So, I chose the best available option to move it to the vault.

Then, all hell broke loose.

I received error messages that my hard drive had corrupted, that my RAM was maxed – you name it.

Turns out the trojan horse had provided a “rogue” pop-up window, designed to look like AVG Anti-Virus, to dupe me into activating the program by clicking on a button. If you studied the Trojan War in school, you know that’s why they call it a “trojan horse” – it fools you into letting it into your system.

While it’s common to refer to all types of malware as “viruses”, a computer virus is only one type of malware. Malware includes computer viruses, worms, trojan horses, spyware, dishonest adware, scareware, crimeware, most rootkits, and other malicious and unwanted software or programs. A report from Symantec published in 2008 suggested that “the release rate of malicious code and other unwanted programs may be exceeding that of legitimate software applications”.

I’ve worked with a lot of clients who don’t understand why it can take time to get ESI processed and loaded into their review platform. Depending on the types of files, several steps can be required to get the files ready to review, including “unarchiving” of container files, OCR (of image only files) and, of course, indexing of the files for searchability (among other possible steps). But, the first step is to scan the files for viruses and other malware that may be infecting the files. If malware is found in any files, the files have to be identified. Then, those files will either be isolated and logged as exceptions or the virus software will attempt to remove the malware. While it may seem logical that the malware should always be removed, doing so is technically altering the file, so counsel need to agree that malware removal is acceptable. Either way, the malware needs to be addressed so that it doesn’t affect the entire collection.
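The malware step described above can be sketched as a simple triage routine in Python. Everything here is illustrative: `is_infected` is a hypothetical stand-in for whatever scanner the processing vendor actually uses, and `removal_agreed` represents counsel having agreed that malware removal is acceptable.

```python
def triage(files, is_infected, removal_agreed=False):
    """Split a collection into files ready for processing, files queued for
    malware removal, and files logged as exceptions.

    `is_infected` is a hypothetical callable standing in for a real malware scanner.
    """
    ready, to_clean, exceptions = [], [], []
    for name in files:
        if not is_infected(name):
            ready.append(name)
        elif removal_agreed:
            # Removal alters the file, so it only happens when counsel has agreed.
            to_clean.append(name)
        else:
            # Otherwise the file is isolated and logged rather than processed.
            exceptions.append({"file": name, "reason": "malware detected"})
    return ready, to_clean, exceptions


# Made-up collection in which one file is flagged by the stand-in scanner.
flagged = {"invoice.exe"}
collection = ["memo.docx", "invoice.exe", "budget.xlsx"]

ready, to_clean, exceptions = triage(collection, lambda name: name in flagged)
print("ready:", ready)
print("exceptions:", exceptions)
```

Either way, the flagged file never flows into the rest of the processing steps unexamined, which is the point of doing this check first.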

As for me, as soon as the infection was evident, I turned my laptop off and turned it over to our support department at Trial Solutions. By the end of the day, I had it back, good as new! Thanks, Tony Cullather!

So, what do you think? How do you handle malware in your collections? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: 4 Steps to Effective eDiscovery With Software Analytics

April 29, 2011

I read an interesting article from Texas Lawyer via Law.com entitled “4 Steps to Effective E-Discovery With Software Analytics” that has some interesting takes on project management principles related to eDiscovery and I’ve interjected some of my thoughts into the analysis below. A copy of the full article is located here. The steps are as follows:

1. With the vendor, negotiate clear terms that serve the project’s key objectives. The article notes the importance of tying each collection and review milestone (e.g., collecting and imaging data; filtering data by file type; removing duplicates; processing data for review in a specific review platform; processing data to allow for optical character recognition (OCR) searching; and converting data into a tag image file format (TIFF) for final production to opposing counsel) to contract terms with the vendor.

The specific milestones will vary – for example, conversion to TIFF may not be necessary if the parties agree to a native production – so it’s important to know the size and complexity of the project, and choose only an experienced eDiscovery vendor who can handle the variations.

2. Collect and process data. Forensically sound data collection and culling of obviously unresponsive files (such as system files) to drastically decrease the overall review costs are key services that a vendor provides in this area. As we’ve noted many times on this blog, effective culling can save considerable review costs – each gigabyte (GB) culled can save $16-$18K in attorney review costs.
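To put the per-gigabyte figure above in concrete terms, here is a quick back-of-the-envelope calculation in Python. The collection sizes (100 GB culled down to 60 GB) are made-up numbers for illustration; only the $16K to $18K range comes from the figure quoted above.

```python
LOW_PER_GB = 16_000   # low end of the per-gigabyte review savings quoted above
HIGH_PER_GB = 18_000  # high end of the same range


def review_savings(gb_before, gb_after):
    """Return the (low, high) estimated review savings from culling a collection."""
    culled = gb_before - gb_after
    if culled < 0:
        raise ValueError("the culled collection cannot be larger than the original")
    return culled * LOW_PER_GB, culled * HIGH_PER_GB


# Hypothetical project: 100 GB collected, 60 GB left after culling.
low, high = review_savings(gb_before=100, gb_after=60)
print(f"Estimated savings: ${low:,} to ${high:,}")
```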

The article notes that a hidden cost is the OCR process of translating extracted text into a searchable form and that it’s an optimal negotiation point with the vendor. This may have been true when most collections were paper based, but as most collections today are electronic based, the percentage of documents requiring OCR is considerably less than it used to be. However, be prepared for some native files to be “image only”, such as TIFFs and scanned PDFs – those will require OCR to be effectively searched.

3. Select a data and document review platform. Factors such as ease of use, robustness, and reliability of analytic tools, support staff accessibility to fix software bugs quickly, monthly user and hosting fees, and software training and support fees should be considered when selecting a document review platform.

The article notes that a hidden cost is selecting a platform with which the firm’s litigation support staff has no experience as follow-up consultation with the vendor could be costly. This can be true, though a good vendor training program and an intuitive interface can minimize or even eliminate this component.

The article also notes that to take advantage of the vendor’s more modern technology “[a] viable option is to use a vendor’s review platform that fits the needs of the current data set and then transfer the data to the in-house system”. I’m not sure why the need exists to transfer the data back – there are a number of vendors that provide a cost-effective solution appropriate for the duration of the case.

4. Designate clear areas of responsibility. By doing so, you minimize or eliminate inefficiencies in the project and the article mentions the RACI matrix to determine who is responsible (individuals responsible for performing each task, such as review or litigation support), accountable (the attorney in charge of discovery), consulted (the lead attorney on the case), and informed (the client).

Managing these areas of responsibility effectively is probably the biggest key to project success and the article does a nice job of providing a handy reference model (the RACI matrix) for defining responsibility within the project.
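For anyone who wants to track a RACI matrix alongside the project, it can be kept as a simple lookup table. The Python sketch below uses the role assignments described above; the task names and the `missing_roles` check are hypothetical additions for illustration.

```python
ROLES = ("responsible", "accountable", "consulted", "informed")

# Role assignments follow the description above; the task names are hypothetical.
raci = {
    "document review": {
        "responsible": "review team",
        "accountable": "attorney in charge of discovery",
        "consulted": "lead attorney on the case",
        "informed": "client",
    },
    "load file preparation": {
        "responsible": "litigation support",
        "accountable": "attorney in charge of discovery",
        "consulted": "lead attorney on the case",
        "informed": "client",
    },
}


def missing_roles(matrix):
    """Return {task: [roles with nobody assigned]} for every incomplete task."""
    gaps = {}
    for task, assignments in matrix.items():
        missing = [role for role in ROLES if not assignments.get(role)]
        if missing:
            gaps[task] = missing
    return gaps


gaps = missing_roles(raci)
print("Tasks with unassigned roles:", gaps)
```

A check like this makes it obvious when a task has been added to the plan without anyone being named as accountable for it.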

So, what do you think? Do you have any specific thoughts about this article? Please share any comments you might have or if you’d like to know more about a particular topic.