Review

eDiscovery Trends: Cloud Covered by Ball


What is the cloud, why is it becoming so popular and why is it important to eDiscovery? These are the questions being addressed—and very ably answered—in the recent article Cloud Cover (via Law Technology News) by computer forensics and eDiscovery expert Craig Ball, a previous thought leader interviewee on this blog.

Ball believes that the fears about cloud data security are easily dismissed when considering that “neither local storage nor on-premises data centers have proved immune to failure and breach”. And as far as the cloud's importance to the law and to eDiscovery, he says, "the cloud is re-inventing electronic data discovery in marvelous new ways while most lawyers are still grappling with the old."

What kinds of marvelous new ways, and what do they mean for the future of eDiscovery?

What is the Cloud?

First, we have to understand just what the cloud is.  The cloud is more than just the Internet, although it's that, too. In fact, what we call "the cloud" is made up of three on-demand services:

  • Software as a Service (SaaS) covers web-based software that performs tasks you once carried out on your computer's own hard drive, without requiring you to perform your own backups or updates. If you check your email on the web through Hotmail or Gmail or keep a Google calendar, you're using SaaS.
  • Platform as a Service (PaaS) happens when companies or individuals rent virtual machines (VMs) to test software applications or to run processes that take up too much hard drive space to run on real machines.
  • Infrastructure as a Service (IaaS) encompasses the use and configuration of virtual machines or hard drive space in whatever manner you need to store, sort, or operate your electronic information.

These three models combine to make up the cloud, a virtual space where electronic storage and processing is faster, easier and more affordable.

How the Cloud Will Change eDiscovery

One reason processing is faster is distributed processing, which Ball calls “going wide”.  Here’s his analogy:

“Remember that scene in The Matrix where Neo and Trinity arm themselves from gun racks that appear out of nowhere? That's what it's like to go wide in the cloud. Cloud computing makes it possible to conjure up hundreds of virtual machines and make short work of complex computing tasks. Need a supercomputer-like array of VMs for a day? No problem. When the grunt work's done, those VMs pop like soap bubbles, and usage fees cease. There's no capital expenditure, no amortization, no idle capacity. Want to try the latest concept search tool? There's nothing to buy! Just throw the tool up on a VM and point it at the data.”

Because the cloud is entirely virtual, operating on servers whose locations are unknown and mostly irrelevant, it throws the rules for eDiscovery right out the metaphorical window.

Ball also believes that everything changes once discoverable information goes into the cloud. "Bringing ESI beneath one big tent narrows the gap between retention policy and practice and fosters compatible forms of ESI across web-enabled applications".

"Moving ESI to the cloud," Ball adds, "also spells an end to computer forensics." Where there are no hard drives, there can be no artifacts of deleted information—so, deleted really means deleted.

What's more, “[c]loud computing makes collection unnecessary”. Where discovery requires that information be collected to guarantee its preservation, putting a hold on ESI located in the cloud will safely keep any users from destroying it. And because cloud computing allows for faster processing than can be accomplished on a regular hard drive, the search for discovery documents will move to where they're located, in the cloud. Not only will this approach be easier, it will also save money.

Ball concludes his analysis with the statement, "That e-discovery will live primarily in the cloud isn't a question of whether but when."

So, what do you think? Is cloud computing the future of eDiscovery? Is that future already here? Please share any comments you might have or if you'd like to know more about a particular topic.

eDiscovery Best Practices: Legal Project Management is the Same as Project Management


I found this article (Holy semantics Batman! There is no such thing as ‘legal project management’) by Jeffrey Brandt, a previous thought leader interviewee of eDiscovery Daily, on the Legal IT Professionals site; it provides a good look at legal project management.  I like this article for two reasons:

  • References to the Old Batman TV Series: Like the author, I watched every episode of the show back in the day, so I had to appreciate the analogy of putting the prefix “Bat” on everything (e.g., “Batcave”, “Batmobile”, “Shark Repellent Bat Spray”, etc.) to adding “legal” to “project management”.  It also gave me the opportunity to re-link to one of our very first posts, which has a link at the bottom to a snippet from the old Batman series that always makes me laugh.
  • Clarification as to the Differentiation of ‘Legal Project Management’: According to the author, there is no differentiation.

The author notes that “The underpinnings and basic tenets of project management are 1) accomplishing a defined goal or set of goals; 2) working within a specific time line; and 3) working within a set of defined resources (most often personnel and cost). That can be applied to literally anything.”

True.  While I don’t necessarily believe that an experienced project manager can just “waltz” into managing legal-related projects with no knowledge of the legal industry and what the issues are, the best practices of project management are the same, regardless of the type of project being managed.

For example, I manage rollout coordination for our review platform, OnDemand®.  In a past life, I used to develop software, but now I’m too far removed from the process to write web code, implement server configurations or fully understand all of the differences between the different versions of SQL Server.  My primary focus in the rollout management role is to coordinate communication between the developers, testers and support staff to make sure we stay on schedule for each software release and get as many of the proposed features ready for rollout as possible.  Every time I try to get too much into the details of development, I get in trouble.  Just ask the development staff!  😉

So, what do you think?  Is there a difference between ‘legal project management’ and ‘project management’?   How much legal industry experience do you need to have to manage legal-related projects?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: I work for Trial Solutions, which provides SaaS-based eDiscovery review applications FirstPass® (for first pass review) and OnDemand® (for linear review and production).

eDiscovery Trends: More On the Recommind Patent Controversy


Perhaps the most controversial story discussed in the eDiscovery community in quite some time is the patent for Predictive Coding that Recommind recently announced via a press release entitled Recommind Patents Predictive Coding, issued on June 8.  I haven’t seen this much backlash against a company or individual since last summer, when LeBron James decided to leave the Cleveland Cavaliers for the Miami Heat (and he and his new teammates conducted a championship-like celebration before the season even started).  How did that turn out?  😉

Since that announcement, there have been several articles and blog posts about it, including:

  • This one, from Monica Bay of Law Technology News, asking the question: “Is Recommind Blowing Smoke?”, which discussed the buzz over Recommind’s announcement;
  • This one, from Evan Koblentz (also of Law Technology News), entitled “Recommind Intends to Flex Predictive Coding Muscles”, which includes responses from Catalyst and Valora Technologies;
  • This one, also from Evan Koblentz, a blog post from EDD Update, where Recommind General Counsel and Vice President Craig Carpenter acknowledges that Recommind failed to obtain a trademark for the term Predictive Coding (though Recommind is still using the ™ symbol on the term Predictive Coding on this page);
  • Three blog posts in four days from Sharon D. Nelson of Ride the Lightning blog, which debate the enforceability of the patent and include a response from OrcaTec, noting that Recommind’s implied threat of litigation is “nothing more than an attempt to bully the market place”.

There are several other articles and blog posts regarding the topic, but if I listed them all, I’d have no room left for anything new!  Sorry that I couldn’t include them all.

I reached out to Bill Dimm, founder of Hot Neuron LLC, makers of Clustify, which clusters documents into groups for effective, expedited review, and asked him for his thoughts about the Recommind press release and patent.  Here are his comments:

"Recommind's press release would have been accurately titled 'Recommind Patents a Method for Predictive Coding,' but it went with the much more provocative title 'Recommind Patents Predictive Coding,' implying  that its patent covers every conceivable way of doing predictive coding.  The only way I can see that being accurate is if you DEFINE predictive coding to be exactly the procedure outlined in claim 1 of Recommind's patent.  Of course, 'predictive coding' is a relatively new term, so the definition is up for debate.  The patent itself says:

'Predictive coding refers to the capability to use a small set of coded documents (or partially coded documents) to predict document coding of a corpus.' That sure sounds like it allows for a lot of possibilities beyond the procedure in claim 1 of the patent.  The press release goes on to say: 'ONLY [emphasis is mine] Recommind's patented, iterative, computer-assisted approach can 'bend the cost curve' of document review.'  Really?  So, Recommind has the ONLY product in the industry that works?  A few of us disagree.  Even clustering, which Recommind claims does not qualify as predictive coding will bend the cost curve because the efficiency boost it provides increases with the size of the document set.

Moving on from the press release to the patent itself, I would recommend reading claim 1 if you are interested in such things.  It is the most general method that the USPTO allowed Recommind to claim –  the other claims are all dependent claims that describe more specific embodiments of claim 1, presumably so that Recommind would have a leg left to stand on if prior art was found to invalidate claim 1.  Claim 1 describes a procedure for predictive coding that involves quite a few steps.  It is my understanding (I am NOT a lawyer) that the patent is irrelevant for any predictive coding procedure that does not include every single one of the steps listed in claim 1.  Since claim 1 includes things like identification cycles, rolling loads, and random sampling, it seems unlikely that existing products would accidentally infringe on the patent.

As far as Clustify is concerned, Recommind's patent is irrelevant since our procedure for predictive coding is different.  In fact, I explained in a presentation at a recent conference why random sampling is a very inefficient approach (something that has been known for decades in other fields), so I wouldn't even be tempted to follow Recommind's procedure."

So, what do you think?  Will the Recommind predictive coding patent allow them to rule predictive coding?  Or only their specific approach?  Will LeBron James ever win a championship?  Please share any comments you might have or if you’d like to know more about a particular topic.

Full disclosure: Hot Neuron is a partner of Trial Solutions, which has used their product, Clustify, in various client projects.

eDiscovery Best Practices: Avoiding eDiscovery Nightmares: 10 Ways CEOs Can Sleep Easier


I found this article in the CIO Central blog on Forbes.com from Robert D. Brownstone – it’s a good summary of issues for organizations to consider so that they can avoid major eDiscovery nightmares.  The author counts down his top ten list David Letterman style (clever!) to provide a nice, easy-to-follow summary of the issues.  Here’s a recap, with my ‘two cents’ on each item:

10. Less is more: The U.S. Supreme Court ruled unanimously in 2005 in the Arthur Andersen case that a “retention” policy is actually a destruction policy.  It’s important to routinely dispose of old data that is no longer needed, so there is less data subject to discovery, and just as important to know where the remaining data resides.  My two cents: A data map is a great way to keep track of where the data resides.

9. Sing Kumbaya: They may speak different languages, but you need to find a way to bridge the communication gap between Legal and IT to develop an effective litigation-preparedness program.  My two cents: Require cross-training so that each department can understand the terms and concepts important to the other.  And, don’t forget the records management folks!

8. Preserve or Perish: Assign the litigation hold protocol to one key person, either a lawyer or a C-level executive, to decide when a litigation hold must be issued.  Ensure an adequate process and memorialize steps taken – and not taken.  My two cents: Memorializing those steps matters because an organization that has a defined process and the documentation to back it up is much more likely to be given leeway in the courts than a company that doesn’t document its decisions.

7. Build the Three-Legged Stool: A successful eDiscovery approach involves knowledgeable people, great technology, and up-to-date written protocols.  My two cents: Up-to-date written protocols are the first thing to slide when people get busy – don’t let it happen.

6. Preserve, Protect, Defend: Your techs need the knowledge to avoid altering metadata, maintain chain-of-custody information and limit access to a working copy for processing and review.  My two cents: A good review platform will assist greatly in all three areas.

5. Natives Need Not Make You Restless: Consider exchanging files to be produced in their original/”native” formats to avoid huge out-of-pocket costs of converting thousands of files to image format.  My two cents: Be sure to address how redactions will be handled, as some parties prefer to image redacted documents while others agree to alter the natives to obscure that information.

4. Get M.A.D.?  Then Get Even: Apply the Mutually Assured Destruction (M.A.D.) principle and agree with the other side to take costly volumes of data, such as digital voicemails and back-up data created down the road, off the table.  My two cents: That’s assuming, of course, you have the same levels of data.  If one party has a lot more data than the other party, there may be no incentive for that party to agree to concessions.

3. Cooperate to Cull Aggressively and to Preserve Clawback Rights: Setting expectations regarding culling efforts and reaching a clawback agreement with opposing counsel enables each side to cull more aggressively to reduce eDiscovery costs.  My two cents: Some parties will agree on search terms up front while others will feel that gives away case strategy, so the level of cooperation may vary from case to case.

2. QA/QC: Employ Quality Assurance (QA) tests throughout review to ensure a high accuracy rate, then perform Quality Control (QC) testing before the data goes out the door, building time in the schedule for that QC testing.  Also, consider involving a search-methodology expert.  My two cents: I cannot stress that last point enough – the ability to illustrate how you got from the large collection set to the smaller production set will be imperative to responding to any objections you may encounter to the produced set.

1. Never Drop Your Laptop Bag and Run: Dig in, learn as much as you can and start building repeatable, efficient approaches.  My two cents: It’s the duty of your attorneys and providers to demonstrate competency in eDiscovery best practices.  How will you know whether they have or not unless you develop that competency yourself?

So, what do you think?  Are there other ways for CEOs to avoid eDiscovery nightmares?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Message Thread Review Saves Costs and Improves Consistency


Insanity is doing the same thing over and over again and expecting a different result.  But, in ESI review, it can be even worse when you get a different result.

One of the biggest challenges when reviewing ESI is identifying duplicates so that your reviewers aren’t reviewing the same files again and again.  Not only does that drive up costs unnecessarily, but it could lead to problems if the same file is categorized differently by different reviewers (for example, inadvertent production of a duplicate of a privileged file if it is not correctly categorized).

Of course, there are a number of ways to identify duplicates.  Exact duplicates (that contain the exact same content in the same file format) can be identified through hash values, which are a digital fingerprint of the content of the file.  MD5 and SHA-1 are the most popular hashing algorithms, which can identify exact duplicates of a file, so that they can be removed from the review population.  Since many of the same emails are emailed to multiple parties and the same files are stored on different drives, deduplication through hashing can save considerable review costs.
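For the technically curious, here’s what hash-based deduplication looks like conceptually – a minimal Python sketch, not any particular vendor’s implementation (real eDiscovery tools typically hash individual emails and attachments, often on normalized metadata fields, rather than whole files on disk):

```python
import hashlib
from pathlib import Path

def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Compute a hash ('digital fingerprint') of a file's content."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(folder):
    """Group files by content hash; any group with two or more files is a duplicate set."""
    groups = {}
    for path in Path(folder).rglob("*"):
        if path.is_file():
            groups.setdefault(file_hash(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    # "collection" is a hypothetical folder of collected ESI
    for digest, paths in find_duplicates("collection").items():
        print(digest, [str(p) for p in paths])
```

Two files with identical content always produce the same digest, which is why only one copy needs to be reviewed; change even a single byte and the digest changes completely.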

Sometimes, files are not exact duplicates but contain the same (or almost the same) information.  One example is a Word document published to an Adobe PDF file – the content is the same, but the file format is different, so the hash value will be different.  Near-deduplication can be used to identify files where most or all of the content matches so they can be verified as duplicates and eliminated from review.

Then, there is message thread analysis.  Of course, most email messages are part of a larger discussion, which could be just between two parties or include a number of parties.  To review each email in the discussion thread would result in much of the same information being reviewed over and over again.  Instead, message thread analysis pulls those messages together and enables them to be reviewed as an entire discussion.  That includes any side conversations within the discussion that may or may not be related to the original topic (e.g., a side discussion about lunch plans or whether anyone saw American Idol last night).
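The mechanics differ by tool (Outlook, for instance, embeds conversation information in each message), but conceptually threading is just grouping related messages and ordering them in time.  Here is a rough, hypothetical Python sketch that groups a list of messages by normalized subject – real products rely on stronger signals, such as message IDs, In-Reply-To headers or Outlook’s conversation index:

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    """Strip 'RE:'/'FW:' prefixes so replies and forwards group with the original message."""
    return re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE).strip().lower()

def group_into_threads(messages):
    """Group messages into threads by normalized subject and sort each thread by date."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg)
    for thread in threads.values():
        thread.sort(key=lambda m: m["date"])
    return threads

# Hypothetical sample data
messages = [
    {"subject": "Budget proposal", "date": "2011-06-01T09:00", "sender": "alice"},
    {"subject": "RE: Budget proposal", "date": "2011-06-01T10:15", "sender": "bob"},
    {"subject": "FW: Budget proposal", "date": "2011-06-02T08:30", "sender": "carol"},
]
for subject, thread in group_into_threads(messages).items():
    print(subject, "->", [m["sender"] for m in thread])
```

Once messages are grouped this way, a reviewer can focus on the last message in each branch of the conversation rather than reading every individual email.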

FirstPass®, powered by Venio FPR™, is one example of an application that provides a mechanism for message thread analysis of Outlook emails that pulls the entire thread into one conversation for review as one big “tree”.  The “tree” representation gives you the ability to see all of the conversations within the discussion and focus your review on the last emails in each conversation to see what is said without having to review each email.  Side conversations are “branches” of the tree and FirstPass enables you to tag individual messages, specific branches or the entire tree as responsive, non-responsive, privileged or some other designation.  Also, because of the way that Outlook tracks emails in the thread, FirstPass identifies messages that are missing from the collection with a red X, enabling you to investigate and determine if additional collection is needed and avoiding potential spoliation claims.

With message thread analysis, you can minimize review of duplicative information within emails, saving time and cost and ensuring consistency in the review.

So, what do you think?  Does your review tool support message thread analysis?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: Competency Ethics – It’s Not Just About the Law Anymore


A few months ago at LegalTech New York, I conducted a thought leader interview with Tom O’Connor of Gulf Coast Legal Technology Center, who didn’t exactly mince words when talking about the trend for attorneys to “finally tak[e] technology seriously”.  As he noted, “lawyers are finally trying to take some time to try to get up to speed – whining and screaming pitifully all the way about how it’s not fair, and the sanctions are too high and there’s too much data.  Get a life, get a grip.  Use the tools that are out there that have been given to you for years.” 

Strong words, indeed.  The American Bar Association (ABA) Model Rules of Professional Conduct (Model Rules) require that an attorney possess and demonstrate a certain requisite level of knowledge in order to be considered competent to handle a given matter.  Specifically, Model Rule 1.1 states that, "[a] lawyer shall provide competent representation to a client. Competent representation requires the legal knowledge, skill, thoroughness, and preparation reasonably necessary for the representation."

Preparation not only means understanding a specific area of the law (for example, antitrust or patent law, both highly specialized).  It also means having the technical knowledge and skills necessary to serve the client in the area of discovery.

The ethical responsibilities of counsel these days include competently directing and managing the identification, preservation, collection, processing, analysis, review and production of electronically stored information (ESI) required to be produced pursuant to lawful discovery requests.  If counsel does not have that level of competency in a particular area, he or she is obligated to either acquire the knowledge or skill necessary to support those needs, or include someone else who does have the requisite skills as part of the representation.

Not too long ago, I met with an attorney and discussed how they handled preservation obligations with their clients.  The attorney indicated that he expected his clients to self-manage their own preservation and collection.  When I asked him why he didn’t try to get more involved to make sure it was being handled properly, he said, “I don’t want to alarm them.  They might decide they need a bigger firm.”

Recent case law is full of cases where counsel didn’t fully understand their eDiscovery obligations, and got themselves and their clients “burned” in the process.  If your organization gets involved in litigation, make sure to include eDiscovery competence among the factors you consider when determining counsel qualifications to represent you.

So, what do you think?  Is your counsel eDiscovery savvy?  If not, do they use a provider that is?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Case Law: Defendant Can’t Be Plaintiff’s Friend on Facebook

In Piccolo v. Paterson, Bucks County, Pa., Common Pleas Court Judge Albert J. Cepparulo denied the motion from the defendant requesting access to the photos of plaintiff Sara Piccolo posted in her Facebook account.

Piccolo filed an action against the defendants after being injured in a one-car accident while a passenger in a car driven by defendant Lindsay Paterson. According to the defense motion, filed by attorneys at Moore & Riemenschneider, Piccolo testified she had a Facebook account and was asked at deposition if the defense counsel could send a “neutral friend request” to Piccolo so that he could review the Facebook postings Piccolo testified she made every day.  Piccolo’s attorney, Benjamin G. Lipman, ultimately denied the request, responding that the “materiality and importance of the evidence … is outweighed by the annoyance, embarrassment, oppression and burden to which it exposes” the plaintiff.

The defense argued that access to Piccolo’s Facebook page would provide necessary and relevant information related to the claims by Piccolo and cited a case, McMillen v. Hummingbird Speedway, Inc. (previously summarized by eDiscoveryDaily here), in which the court ordered the plaintiff to provide his username and password to the defendant’s attorney. The plaintiff’s attorney argued that the defense had only asked for the pictures Piccolo posted on Facebook and that they had already been provided with “as complete a photographic record of the pre-accident and post-accident condition” of Piccolo.

As a result of the accident in May 2007, Piccolo suffered lacerations to her lip and chin when hit in the face with an airbag. She had 95 stitches to her face and then surgery to repair her scarring six months later. With permanent scars on her face, Piccolo allowed the insurer in 2008 to take photographs of her face and gave the defense 20 photos of her face from the week following the accident and five photos from the months just before the accident.

In Piccolo’s response to the defense motion, Lipman argued that defense counsel had only asked at Piccolo’s deposition about the pictures she posted on Facebook, not any textual postings. He said that the defendant had already been provided “as complete a photographic record of the pre-accident and post-accident condition” of Piccolo as she “could reasonably have a right to expect in this case.”

Judge Cepparulo agreed, ruling with the plaintiff and denying the defense access to Piccolo’s Facebook page in a one-paragraph order.

So, what do you think?  Did the judge make the correct call or should he have issued a ruling consistent with McMillen?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Best Practices: 4 Steps to Effective eDiscovery With Software Analytics


I read an article from Texas Lawyer via Law.com entitled “4 Steps to Effective E-Discovery With Software Analytics” that has some interesting takes on project management principles related to eDiscovery, and I’ve interjected some of my thoughts into the analysis below.  A copy of the full article is located here.  The steps are as follows:

1. With the vendor, negotiate clear terms that serve the project's key objectives.  The article notes the importance of tying each collection and review milestone (e.g., collecting and imaging data; filtering data by file type; removing duplicates; processing data for review in a specific review platform; processing data to allow for optical character recognition (OCR) searching; and converting data into a tag image file format (TIFF) for final production to opposing counsel) to contract terms with the vendor.

The specific milestones will vary – for example, conversion to TIFF may not be necessary if the parties agree to a native production – so it’s important to know the size and complexity of the project, and choose only an experienced eDiscovery vendor who can handle the variations.

2. Collect and process data.  Forensically sound data collection and culling of obviously unresponsive files (such as system files) to drastically decrease the overall review costs are key services that a vendor provides in this area.  As we’ve noted many times on this blog, effective culling can save considerable review costs – each gigabyte (GB) culled can save $16-$18K in attorney review costs.
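To see what that rule of thumb means in practice, here’s the back-of-the-envelope arithmetic (the per-GB figure is the estimate cited above; the collection size and culling percentage are purely hypothetical):

```python
# Illustrative review cost savings from culling (all inputs hypothetical except the per-GB estimate)
collected_gb = 100                  # size of the processed collection
culled_fraction = 0.60              # assume 60% removed as system files, duplicates, etc.
savings_per_gb = (16_000, 18_000)   # $16K-$18K of attorney review cost avoided per GB culled

culled_gb = collected_gb * culled_fraction
low, high = (culled_gb * rate for rate in savings_per_gb)
print(f"Culling {culled_gb:.0f} GB could avoid roughly ${low:,.0f} to ${high:,.0f} in review costs")
```

Even with conservative assumptions, the avoided review cost dwarfs typical processing fees, which is why effective culling is emphasized so often on this blog.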

The article notes that a hidden cost is the OCR process of translating extracted text into a searchable form and that it’s an optimal negotiation point with the vendor.  This may have been true when most collections were paper based, but as most collections today are electronic, the percentage of documents requiring OCR is considerably smaller than it used to be.  However, it is important to be aware that some native files will be “image only”, such as TIFFs and scanned PDFs – those will require OCR to be effectively searched.

3. Select a data and document review platform.  Factors such as ease of use, robustness, and reliability of analytic tools, support staff accessibility to fix software bugs quickly, monthly user and hosting fees, and software training and support fees should be considered when selecting a document review platform.

The article notes that a hidden cost is selecting a platform with which the firm’s litigation support staff has no experience as follow-up consultation with the vendor could be costly.  This can be true, though a good vendor training program and an intuitive interface can minimize or even eliminate this component.

The article also notes that to take advantage of the vendor’s more modern technology “[a] viable option is to use a vendor's review platform that fits the needs of the current data set and then transfer the data to the in-house system”.  I’m not sure why the need exists to transfer the data back – there are a number of vendors that provide a cost-effective solution appropriate for the duration of the case.

4. Designate clear areas of responsibility.  By doing so, you minimize or eliminate inefficiencies in the project and the article mentions the RACI matrix to determine who is responsible (individuals responsible for performing each task, such as review or litigation support), accountable (the attorney in charge of discovery), consulted (the lead attorney on the case), and informed (the client).

Managing these areas of responsibility effectively is probably the biggest key to project success and the article does a nice job of providing a handy reference model (the RACI matrix) for defining responsibility within the project.

So, what do you think?  Do you have any specific thoughts about this article?   Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Trends: Thought Leader Interview with Jeffrey Brandt, Editor of Pinhawk Law Technology Daily Digest


As eDiscovery Daily has done in the past, we have periodically interviewed various thought leaders in eDiscovery and legal technology to provide insight as to trends in the industry for our readers to consider.  Recently, I was able to interview Jeffrey Brandt, Editor of the Pinhawk Law Technology Daily Digest and columnist for Legal IT Professionals.

With an educational background in computer science and mathematics from the University of Pittsburgh, Jeff has over twenty-four years of experience in the field of legal automation, working with various organizations in the United States, Canada, and the United Kingdom.  As a technology and management consultant to hundreds of law firms and corporate law departments, he has worked on information management projects including: long range strategic planning, workflow management and reengineering, knowledge management, IT structure and personnel requirements and budgeting. Working as a CIO at several large law firms, Jeff has helped bring oversight, coordination and change management to initiatives including: knowledge management, library & research services, eDiscovery, records management, technology and more. Most recently, he served as the Chief Information and Knowledge Officer with an AMLaw 100 law firm based out of Washington, DC.

Jeff has also been asked to serve on numerous advisory councils and CIO advisory boards for key vendors in the legal space, advising them on issues of client service and future product direction.  He is a long time member (and former board member) of the International Legal Technology Association (ILTA) and has taught CLE classes on topics ranging from litigation support to ethics and technology.

What do you consider to be the current significant trends in eDiscovery in 2011 and beyond on which people in the industry are, or should be, focused?

I would say that the biggest two are the project management component and, for lack of a better term, automated or artificial intelligence.

The whole concept and the complexities of what it takes to manage a case today are more challenging than ever, including issues like the number of sources, the amount of data in the sources, the format in which you’re producing, where can the data go and who can see it.  I remember the days when people used to take a couple of bankers boxes, put them in their car and go home and work on the documents.  You simply cannot do that today – the amount of information today is just insane.

As for artificial intelligence, as was discussed in the (Pinhawk) digest recently, you’re seeing the emergence of predictive coding and using computers to cull through the massive amounts of information so that a human can take the final pass.  I think more and more we’re going to see people relying on those types of technologies – some because they embrace it, others because there is no other way to humanly do it.

I think if there’s any third trend it would probably be where do we go next to get the data?  In terms of social media, mining Facebook and Twitter and all the other various sources for additional data as part of the discovery process has become a challenge.

You recently became editor of the Pinhawk Law Technology Daily Digest.  Tell me about that and about your plans for the digest.

Well, I think there are several things going forward.  My role is to keep up the good work that Curt Meltzer, the founding editor, started and fill the “big shoes” that Curt left behind.  My goal is to expand the sources of information from which Pinhawk draws.  There are about 400 sources today and I think by the time my sources (and possibly a few others) are added in, there will be over 500.  We’ve also talked about going to our readership and asking them “what are your go-to and must read sources?” to include those sources as well.  We’ll also be looking to incorporate social media tools to hopefully make the experience much more comprehensive and easier to participate in for the Pinhawk digest reader.

And, what should we be looking for in your column in Legal IT Professionals?

Well, I like to dabble in multiple areas – in the small consulting practice that I have, I do a little bit of everything.  I’ve recently done some very interesting work in communities of practice, using social media tools, focusing them inward in law firms to provide the forum for lawyers to open up, share and mentor to others.  I like KM (Knowledge Management) and related topics and we had a recent post in Pinhawk talking about the future of the law firm.  To me, those types of discussions are fascinating.

You take the extremes and you’ve got the “law factory”, you take the high-end and you’ve got the “bet the farm” law firm.  How technology plays a role in whatever culture, whatever focus a law firm puts itself on is interesting.  And then you watch and see some of the rumblings and inklings of what can be done in places like Australia, where you have third-party investment of law firms and the United Kingdom, where they are about to get third-party investment.  There was a recent article about third-party ownership of law firms in North Carolina.  You look at examples like that and you say “is the model of partnership alive?”  When you get into “big law”, are they really partnerships?  Where are they in the spectrum of a thousand sole practitioners operating under one letterhead to a firm of a thousand lawyers?  That’s where I think that communities of practice and social media tools are going to help lawyers know more about their own partners and own firms. 

It’s sad that in some firms the lawyers on the north side of the building don’t even know the lawyers on the south side of the building, let alone the people on the eighth floor vs. the tenth floor.  It’s a changing landscape.  When I got into legal and was first a CIO at Porter, Wright, Morris & Arthur, a firm of 250 lawyers in Columbus, Ohio, it was the 83rd largest law firm in the US – an AMLAW 100 firm.  Today, does a firm that size even make it into the AMLAW 250?

In my column at Legal IT Professionals, you’ll see more about KM and change management.  Another part of my practice is mentoring IT executives in how to deal with business problems related to the business of law and I think that might be my next post – free advice to the aspiring CIO.

This might sound odd coming from a technologist, but…it’s not really about the technology.  From a broad standpoint, you can be successful with most software tools.  A law firm isn’t going to be made or broken whether it chose OpenText or iManage as a document management tool or chose a specific litigation support tool.  It is more about the people, the education and the process than it is the actual tool.  Yes, there are some horrible tools that you should avoid, but, all things being equal, it’s really more the other pieces of the equation that determine your success.

Thanks, Jeff, for participating in the interview!

And to the readers, as always, please share any comments you might have or if you’d like to know more about a particular topic!

eDiscovery Best Practices: Your ESI Collection May Be Larger Than You Think


Here’s a sample scenario: You identify custodians relevant to the case and collect files from each.  Roughly 100 gigabytes (GB) of Microsoft Outlook email PST files and loose “efiles” is collected in total from the custodians.  You identify a vendor to process the files to load into a review tool, so that you can perform first pass review and, eventually, linear review and produce the files to opposing counsel.  After processing, the vendor sends you a bill – and they’ve charged you to process over 200 GB!!  What happened?!?

Did the vendor accidentally “double-bill” you?  That would be great – but no.  There’s a much more logical explanation and, unfortunately, you may wind up paying a lot more to process these files than you expected.

Many of the files in most ESI collections are stored in what are known as “archive” or “container” files.  For example, as noted above, Outlook emails are typically saved for each custodian in a personal storage (.PST) file format, which is an expanding container file. For most custodians, all of their email (and the corresponding attachments, if present) resides in a few PST files.  The scanned size for the PST file is the size of the file on disk.

Did you ever see one of those vacuum bags that you store clothes in and then suck all the air out so that the clothes won’t take as much space?  The PST file is like one of those vacuum bags – it typically stores the emails and attachments in a compressed format to save space.  When the emails and attachments are processed into a review tool, they are expanded into their normal size.  This expanded size can be 1.5 to 2 times larger than the scanned size (or more).  And, that’s what many vendors will bill on – the expanded size.

There are other types of archive container files that compress the contents – .zip and .rar files are two examples of compressed container files.  These files are used not only to compress files for storage on hard drives, but also to compact or group a set of files when transmitting them, usually in – you guessed it – email.  With email comprising a majority of most ESI collections and the popularity of other archive container files for compressing file collections, the expanded size of your collection may be considerably larger than it appears when stored on disk.  It’s important to be prepared for that and know your options when processing that data, so you can effectively anticipate those processing costs.
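Since many vendors bill on the expanded size, it’s worth estimating it before the invoice arrives.  Here’s a quick sketch using the 1.5x to 2x range mentioned above (the actual multiplier varies with the mix of PSTs, zip/rar files and loose documents in your collection):

```python
def estimate_expanded_size(scanned_gb, low_factor=1.5, high_factor=2.0):
    """Estimate the post-processing ('expanded') size of a container-heavy collection."""
    return scanned_gb * low_factor, scanned_gb * high_factor

# The scenario above: roughly 100 GB of PSTs and loose files as scanned on disk
low, high = estimate_expanded_size(100)
print(f"100 GB on disk could expand to roughly {low:.0f}-{high:.0f} GB for processing (and billing)")
```

Running a quick estimate like this against your own collection profile makes the vendor’s 200 GB invoice in the scenario above much less of a surprise.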

So, what do you think?  Have you ever been surprised by processing costs of your ESI?   Please share any comments you might have or if you’d like to know more about a particular topic.