Production

eDiscovery Trends: Myth of SaaS Insecurity Finally Busted

Eleven years ago, when I first began talking to attorneys about hosting document collections online to manage the review and production process for discovery, the typical response that I got was “I would never consider putting my client’s documents online – it’s just not secure”.  Let’s face it – lawyers are not exactly early adopters of technology… 😉

These days, few folks seem to have that concern anymore when it comes to putting sensitive data and documents online.  Many people bank online, buy items from Amazon and other “etailers”, share pictures and other personal information on Facebook, and so on.  As for business data, Salesforce.com has become the top customer relationship management (CRM) application and many business users share documents with colleagues via Google Docs, to name just two examples.

What do all of these applications have in common?  They are Software as a Service (SaaS) applications, delivering data and functionality via an online application.  As noted previously on this blog, a new IDC study forecasts the SaaS market to reach $40.5 billion by 2014, an annual growth rate of 25.3%.  Also by 2014, about 34% of all new business software purchases will be via SaaS applications, according to IDC.

SaaS review applications have also become increasingly popular in eDiscovery, with several eDiscovery SaaS applications available that provide benefits including: no software to install, intuitive browser-based interfaces, and the ability to share the collection with your client, experts and co-counsel without distributing anything more than a login.

As for security concerns, most litigators have come to accept that these systems are secure.  But, do they realize just how secure they are?

As an example, at Trial Solutions, the servers hosting data for our OnDemand® and FirstPass™ (powered by Venio FPR™) platforms are housed in a Tier 4 data center in Houston (which is where our headquarters is).  The security at this data center is military grade: 24 x 7 x 365 onsite security guards (I feel sorry for the folks who have to work this Saturday!), video surveillance, biometric and card key security required just to get into the building.  Not to mention a building that features concrete bollards, steel lined walls, bulletproof glass, and barbed wire fencing.  And, if you’re even able to get into the building, you then have to find the right server (in the right locked room) and break into the server security.  It’s like the movie Mission Impossible where Tom Cruise has to break into the CIA, except for the laser beams over the air vent (anyone who watches movies knows those can be easily thwarted by putting mirrors over them).  To replicate that level of security infrastructure would be cost prohibitive for even most large companies.

From the outside, SaaS applications secure data with login authentication and Secure Sockets Layer (SSL) encryption.  SSL encryption is like taking a piece of paper with text on it, scrambling the letters, tearing the paper into many pieces and throwing the scraps into the wind.  To intercept a communication (one request to the server), you have to capture all of its packets, unscramble each packet individually and then reassemble them in the correct order.
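
For the curious, here’s what establishing one of those encrypted sessions looks like from the client side.  This is a minimal sketch in modern Python using the standard library’s ssl module; the host name is just a placeholder:

```python
import socket
import ssl

# Minimal sketch: open an SSL/TLS-encrypted connection to a server and
# report the negotiated protocol and cipher. "example.com" is a placeholder.
context = ssl.create_default_context()

with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls_sock:
        print("Protocol:", tls_sock.version())  # e.g., 'TLSv1.3'
        print("Cipher:", tls_sock.cipher())     # (cipher name, protocol, key bits)
```

Every request and response between a reviewer’s browser and the hosting server travels through a channel negotiated like this one.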

Conversely, data in a desktop review application could be one stolen laptop away from being compromised.  No wonder nobody talks about security concerns with SaaS applications anymore.

So, what do you think?  How secure is your document collection?  Please share any comments you might have or if you’d like to know more about a particular topic.

Happy Holidays from all of us at Trial Solutions and eDiscovery Daily!

eDiscoveryJournal Webinar: More on Native Format Production and Redaction

As noted yesterday, eDiscoveryJournal conducted a webinar last Friday with some notable eDiscovery industry thought leaders – George Socha, Craig Ball and Tom O’Connor – on issues associated with native format production and redaction.  The discussion was moderated by Greg Buckles, co-founder of eDiscoveryJournal, who has over 20 years of experience in discovery and consulting.

What follows are more highlights of the discussion, based on my observations and notes from the webinar.  If anyone who attended the webinar feels that there are any inaccuracies in this account, please feel free to submit a comment to this post and I will be happy to address it.

More highlights of the discussion:

  • Redaction – Is it Possible, Practical, Acceptable?: George said it’s certainly possible and practical, but the biggest problem he sees is that redaction is often done without agreement between parties as to how it will be done.  Tom noted that the knee-jerk reaction for most of his clients is “no” – to do it effectively, you need to know your capabilities and what information you’re trying to change.  Craig indicated that it’s not only possible and practical, but often desirable; however, when removing information such as columns from databases or spreadsheets, you need to know the data dependencies and the possibility of “breaking” the file by removing that data.  Craig also remarked that certain file types (such as Microsoft Office files) are now stored in XML format, making it easier to redact them natively without breaking functionality.
  • How to Authenticate Redacted Files Based on HASH Value?:  Craig said you don’t – redaction changes the file.  Although Craig indicated that some research has been done on “near-HASH” values, George noted that there is currently no such thing and that the HASH value changes completely with a change as small as one character (see the quick demonstration after this list).  Tom noted that it’s “tall weeds” when discussing HASH values with clients to authenticate files, as many don’t fully understand the issues – it’s a “where angels fear to tread” concern.
  • Biggest Piece of Advice Regarding Redaction?: Craig said that redaction of native files is hard – So what?  Is the percentage of files requiring redaction so great that it needs to drive the process?  If it’s a small percentage, you can always simply TIFF the files requiring redaction and redact the TIFFs.  George indicated that one of the first things he advises clients to do is to work with the other side on how to handle redactions and if they won’t work with you, go to the judge to address it.  Tom indicated that he asks the client questions to find out what issues are associated with the redaction, such as what the client wants to accomplish, percentage of redaction expected, etc. and then provides advice based on those answers.
  • Redaction for Confidentiality (e.g., personal information, trade secrets, etc.): George noted that while it’s not a big issue in many cases, in some cases it’s a huge issue.  There are currently 48 states with at least some laws regarding safeguarding personal information, and there are also efforts underway to do so at a national level.  We’re a long way from coming up with an effective way to address this issue.  Craig said that sometimes there are ways to address it programmatically – in one case where he served as special master, his client had a number of spreadsheets with columns of confidential data and they were able to identify a way to handle those programmatically.  Tom has worked on cases where redaction of social security numbers through search and replace was necessary, but there was a discussion and agreement with opposing counsel before proceeding.
  • How to Guarantee that Redaction Actually Deletes the Data and Doesn’t Just Obscure It?: Tom said he had a situation in a criminal case where they received police reports from the Federal government with information on protected witnesses, which they gave back.  There is no “cookie-cutter” approach: you have to understand the data, know what’s possible and provide diligent QC.  Craig indicated that he conducts searches for the redacted data to confirm it has been deleted.  Greg noted that you have to make sure the search tool will reach all of the redacted areas of the file.  George said too often people simply fail to check the results – providers often say they can’t afford to perform the QC, but law firms often don’t do it either, so it falls through the cracks.  Tom recommends that his law firm clients take responsibility for that check, as they are responsible for the production.  As part of QC, it’s important to have a different set of eyes and even different QC/search tools to confirm successful redaction.
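
As a quick demonstration of George’s point about HASH values (referenced in the list above), here’s a small Python example using MD5, one of the hash algorithms commonly used to authenticate files in eDiscovery.  Changing a single character produces a completely different value:

```python
import hashlib

original = "The quick brown fox jumps over the lazy dog"
altered  = "The quick brown fox jumps over the lazy cog"  # one character changed

print(hashlib.md5(original.encode()).hexdigest())
# 9e107d9d372bb6826bd81d3542a419d6
print(hashlib.md5(altered.encode()).hexdigest())
# 1055d3e698d289f2af8663725127bd4b
```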

Thanks to eDiscoveryJournal for a very informative webinar!

So, what do you think?  Do you have any other questions about native format production and redaction?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscoveryJournal Webinar: Debate on Native Format Production and Redaction

eDiscoveryJournal conducted a webinar last Friday with some notable eDiscovery industry thought leaders regarding issues associated with native format production and redaction.  The panel included George Socha of Socha Consulting, LLC and co-founder of EDRM; Craig Ball of Craig D. Ball, P.C., author of numerous articles on eDiscovery and computer forensics; and Tom O’Connor, a nationally known consultant, speaker and writer in the area of computerized litigation support systems.  All three panelists are nationally recognized speakers and experts on eDiscovery topics.  The panel discussion was moderated by Greg Buckles, co-founder of eDiscoveryJournal, who is also a recognized expert with over 20 years of experience in discovery and consulting.

I wrote an article a few years ago on review and production of native files, so this is a subject of particular interest to me.  What follows are highlights of the discussion, based on my observations and notes from the webinar.  If anyone who attended the webinar feels that there are any inaccuracies in this account, please feel free to submit a comment to this post and I will be happy to address it.

Having said that, here are the highlights:

  • Definition of Native Files: George noted that the technical definition of native files is “in the format as used during the normal course of business”, but in the application of that concept, there is no real consensus.  Tom, who has worked on a number of multi-party cases, has found consensus difficult as parties have different interpretations as to what defines native files.  Craig noted that it’s less about format than about ensuring a “level of information parity” so that both sides have the opportunity to access the same information for those files.
  • “Near-Native” Files: George noted that there is a “quasi-native” or “near-native” format, which is still a native format, even if it isn’t the original form.  If you have a huge SQL database but only produce a relevant subset of it in a smaller SQL database, that would be an example of a “near-native” format.  Individual Outlook MSG files are another example; as Craig noted, they are smaller components of the original Outlook mailbox container in which individual message metadata is preserved.
  • Position of Producing Native Files: Craig noted that the position is often to produce in a less usable format (such as TIFF images) because of attorneys’ fear that the opposition will be able to get more information out of the native files than they did.  George noted that you can expect expert fees to double or even quadruple when experts must work with image files as opposed to native files.
  • Negotiation and Production of Metadata: Tom noted that there is a lack of understanding by attorneys as to how metadata differs for each file format.  Craig noted that there is certain “dog tag” metadata, such as file name, path, last modified date and time, custodian name and hash value, that serves as a “driver’s license” for files, whereas the rest of the more esoteric metadata completes the “DNA” for each file (a short sketch of gathering these “dog tag” fields appears after this list).  George noted that the EDRM XML project is working towards facilitating standard transfer of file metadata between parties.
  • Advice on Meet and Confer Preparation: When asked by Greg what factor is most important when preparing for meet and confer, Craig said it depends partly on whether you’re the primary producing or requesting party in the case.  Some people prefer “dumbed down” images, so it’s important to know what format you can handle, the issues in the case and cost considerations, of course.  George noted that there is little or no attention on how the files are going to be used later in the case at depositions and trial and that it’s important to think about how you plan to use the files in presentation and work backward.  Tom noted it’s really important to understand your collection as completely as possible and ask questions such as: What do you have?  How much?  What formats?  Where does it reside?  Tom indicated that he’s astonished how difficult it is for many of his clients to answer these questions.
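
As promised in the “dog tag” item above, here’s a minimal Python sketch of gathering those fields for a single file.  The function name is illustrative, and the custodian is passed in because, in practice, it is recorded during collection rather than derived from the file itself:

```python
import hashlib
from datetime import datetime
from pathlib import Path

def dog_tag_metadata(file_path, custodian):
    """Gather the 'dog tag' fields for one file: name, path,
    last modified date/time, custodian name and hash value."""
    path = Path(file_path)
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return {
        "file_name": path.name,
        "path": str(path.parent),
        "last_modified": modified.isoformat(),
        "custodian": custodian,
        "md5_hash": hashlib.md5(path.read_bytes()).hexdigest(),
    }

# Hypothetical usage:
# print(dog_tag_metadata("collections/smith/report.docx", "J. Smith"))
```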

Want to know more?  Tune in tomorrow for the second half of the webinar!  And, as always, please share any comments you might have or if you’d like to know more about a particular topic.

Reporting from the EDRM Mid-Year Meeting

Launched in May 2005, the Electronic Discovery Reference Model (EDRM) Project was created to address the lack of standards and guidelines in the electronic discovery market.  Now, in its sixth year of operation, EDRM has become the gold standard for…well…standards in eDiscovery.  Most references to the eDiscovery industry these days refer to the EDRM model as a representation of the eDiscovery life cycle.

At the first meeting in May 2005, there were 35 attendees, according to Tom Gelbmann of Gelbmann & Associates, co-founder of EDRM along with George Socha of Socha Consulting LLC.  Check out the preliminary first draft of the EDRM diagram – it has evolved a bit!  Most participants were eDiscovery providers and, according to Gelbmann, they asked “Do you really expect us all to work together?”  The answer was “yes”, and the question hasn’t been asked again.  Today, there are over 300 members from 81 participating organizations including eDiscovery providers, law firms and corporations (as well as some individual participants).

This week, the EDRM Mid-Year meeting is taking place in St. Paul, MN.  Twice a year, in May and October, eDiscovery professionals who are EDRM members meet to continue the process of working together on various standards projects.  EDRM has eight currently active projects, as follows:

  • Data Set: provides industry-standard, reference data sets of electronically stored information (ESI) and software files that can be used to test various aspects of eDiscovery software and services,
  • Evergreen: ensures that EDRM remains current, practical and relevant and educates about how to make effective use of the Model,
  • Information Management Reference Model (IMRM): provides a common, practical, flexible framework to help organizations develop and implement effective and actionable information management programs,
  • Jobs: develops a framework for evaluating pre-discovery and discovery personnel needs or issues,
  • Metrics: provides an effective means of measuring the time, money and volumes associated with eDiscovery activities,
  • Model Code of Conduct: evaluates and defines acceptable boundaries of ethical business practices within the eDiscovery service industry,
  • Search: provides a framework for defining and managing various aspects of Search as applied to eDiscovery workflow,
  • XML: provides a standard format for e-discovery data exchange between parties and systems, reducing the time and risk involved with data exchange.

This is my fourth year participating in the EDRM Metrics project and it has been exciting to see several accomplishments made by the group, including creation of a code schema for measuring activities across the EDRM phases, glossary definitions of those codes and tools to track early data assessment, collection and review activities.  Today, we made significant progress in developing survey questions designed to gather and provide typical metrics experienced by eDiscovery legal teams in today’s environment.

So, what do you think?  Has EDRM impacted how you manage eDiscovery?  If so, how?  Please share any comments you might have or if you’d like to know more about a particular topic.

Thought Leader Q&A: Christine Musil of Informative Graphics Corporation

Tell me about your company and the products you represent.  Informative Graphics Corp. (IGC) is a leading developer of commercial software to view, collaborate on, redact and publish documents. Our products are used by corporations, law firms and government agencies around the world to access and safely share content without altering the original document.

What are some examples of how electronic redaction has been relevant in eDiscovery lately?  Redaction is walking the line between being responsive and protecting privilege and privacy.  A great recent example of a redaction mistake with pretty broad implications involves the lawyers for former Illinois governor Rod Blagojevich requesting a subpoena of President Obama.  The court filing included areas that had been improperly redacted by Blagojevich’s lawyers.  While nothing new or shocking was revealed, this snafu put his reputation up for public inspection and opinion once again.

What are some of the pitfalls in redacting PDFs?  The big pitfall is not understanding what a redaction is and why it is important to do it correctly. People continue to make the mistake of using a drawing tool to cover text and then publishing the document to PDF. The drawing shape visually blocks the text, but someone can use the Text tool in Acrobat to highlight the text and paste it into Notepad.  Using a true electronic redaction tool like Redact-It and being properly trained to use it is essential. 
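
Christine’s point is easy to verify yourself: if a PDF was “redacted” by drawing a shape over the text, the text is still in the file’s content stream, and any text-extraction tool will recover it.  Here’s a minimal Python sketch using the third-party pypdf library (the file name is hypothetical):

```python
from pypdf import PdfReader  # third-party package: pip install pypdf

# "improperly_redacted.pdf" is a hypothetical file where a drawing shape
# merely covers the sensitive text. Plain text extraction still recovers it.
reader = PdfReader("improperly_redacted.pdf")
for page in reader.pages:
    print(page.extract_text())  # the "hidden" text appears in the output
```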

Is there such a thing as native redaction?  This is such a hot topic that I recently wrote a white paper on the subject titled “The Reality of Native Format Production and Redaction.”  The answer is: it depends on whom you ask.  From a realistic perspective, no, there is no such thing as native redaction.  There is no tool that supports multiple formats and gives you back the document in the same format as the original.  Even if there were such a tool, this seems dangerous and ripe for abuse (what else might “accidentally” get changed while they are at it?).

You recently joined EDRM’s XML section. What are you currently working on in that endeavor, to the extent you can talk about, and why do you think XML is an important part of the EDRM?  The EDRM XML project is all about creating a single, universal format for eDiscovery. The organization’s goal is really to eliminate issues around the multitude of formats in the world and streamline review and production. Imagine never again receiving a CD full of flat TIFF files with separate text files! This whole issue of how users control and see document content is at the core of what IGC does, which makes this project a great fit for IGC’s expertise.  

About Christine Musil

Christine Musil is Director of Marketing for Informative Graphics Corporation, a viewing, annotation and content management software company based in Arizona. Informative Graphics makes several products including Redact-It, an electronic redaction solution used by law firms, corporate legal departments, government agencies and a variety of other professional service companies.

Announcing eDiscovery Thought Leader Q&A Series!

eDiscovery Daily is excited to announce a new blog series of Q&A interviews with various eDiscovery thought leaders.  Over the next three weeks, we will publish interviews conducted with six individuals with unique and informative perspectives on various eDiscovery topics.  Mark your calendars for these industry experts!

Christine Musil is Director of Marketing for Informative Graphics Corporation, a viewing, annotation and content management software company based in Arizona.  Christine will be discussing issues associated with native redaction and redaction of Adobe PDF files.  Her interview will be published this Thursday, October 14.

Jim McGann is Vice President of Information Discovery for Index Engines.  Jim has extensive experience with eDiscovery and Information Management.  Jim will be discussing issues associated with tape backup and retrieval.  His interview will be published this Friday, October 15.

Alon Israely is a Senior Advisor in BIA’s Advisory Services group and currently oversees BIA’s product development for its core technology products.  Alon will be discussing best practices associated with “left side of the EDRM model” processes such as preservation and collection.  His interview will be published next Thursday, October 21.

Chris Jurkiewicz is Co-Founder of Venio Systems, which provides Venio FPR™ allowing legal teams to analyze data, provide an early case assessment and a first pass review of any size data set.  Chris will be discussing current trends associated with early case assessment and first pass review tools.  His interview will be published next Friday, October 22.

Kirke Snyder is Owner of Legal Information Consultants, a consulting firm specializing in eDiscovery Process Audits to help organizations lower the risk and cost of e-discovery.  Kirke will be discussing best practices associated with records and information management.  His interview will be published on Monday, October 25.

Brad Jenkins is President and CEO for Trial Solutions, which is an electronic discovery software and services company that assists litigators in the collection, processing and review of electronic information.  Brad will be discussing trends associated with SaaS eDiscovery solutions.  His interview will be published on Tuesday, October 26.

We thank all of our guests for participating!

So, what do you think?  Is there someone you would like to see interviewed for the blog?  Are you an industry expert with some information to share from your “soapbox”?  If so, please share any comments or contact me at daustin@trialsolutions.net.  We’re looking to assemble our next group of interviews now!

First Pass Review: Domain Categorization of Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through “fuzzy” searching to find misspellings or OCR errors in an opponent’s produced ESI.

Domain Categorization

Another type of analysis is the use of domain categorization. Email is generally the biggest component of most ESI collections and each participant in an email communication belongs to a domain associated with the email server that manages their email.

FirstPass supports domain categorization by providing a list of domains associated with the ESI collection being reviewed, with a count for each domain that appears in emails in the collection. Domain categorization provides several benefits when reviewing your opponent’s ESI:

  • Non-Responsive Produced ESI: Domains in the list that are obviously non-responsive to the case can be quickly identified and all messages associated with those domains can be “group-tagged” as non-responsive. If a significant percentage of files are identified as non-responsive, that may be a sign that your opponent is trying to “bury you with paper” (albeit electronic).
  • Inadvertent Disclosures: If there are any emails associated with outside counsel’s domain, they could be inadvertent disclosures of attorney work product or attorney-client privileged communications. If so, you can then address those according to the agreed-upon process for handling inadvertent disclosures and clawback of same.
  • Issue Identification: Messages associated with certain parties might be related to specific issues (e.g., an alleged design flaw of a specific subcontractor’s product), so domain categorization can isolate those messages more quickly.
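
FirstPass’s internals aren’t public, but the idea behind domain categorization is straightforward.  Here’s a minimal Python sketch of tallying the domains that appear in a set of email addresses (the addresses shown are hypothetical):

```python
import re
from collections import Counter

def domain_counts(addresses):
    """Tally the domain portion of each email address."""
    counts = Counter()
    for address in addresses:
        match = re.search(r"@([\w.-]+)$", address.strip().lower())
        if match:
            counts[match.group(1)] += 1
    return counts

addresses = ["jane@acme.com", "bob@acme.com", "counsel@lawfirm.com"]
for domain, count in domain_counts(addresses).most_common():
    print(f"{domain}: {count}")
# acme.com: 2
# lawfirm.com: 1
```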

In summary, there are several ways to use first pass review tools, like FirstPass, for reviewing your opponent’s ESI production, including: email analytics, synonym searching, fuzzy searching and domain categorization. First pass review isn’t just for your own production; it’s also an effective process to quickly evaluate your opponent’s production.

So, what do you think? Have you used first pass review tools to assess an opponent’s produced ESI? Please share any comments you might have or if you’d like to know more about a particular topic.

First Pass Review: Fuzzy Searching Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through synonym searching to find variations of your search terms to increase the possibility of finding the terminology used by your opponents.

Fuzzy Searching

Another type of analysis is the use of fuzzy searching. Attorneys know what terms they’re looking for, but those terms aren’t always spelled correctly in the documents. Also, opposing counsel may produce a number of image-only files that require Optical Character Recognition (OCR), which is usually not 100% accurate.

FirstPass supports “fuzzy” searching, a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). FirstPass will display all of the words in the collection that are close to the word you’re looking for, so if you’re looking for the term “petroleum”, you can find variations such as “peroleum”, “petoleum” or even “petroleom” – misspellings or OCR errors that could be relevant. Then, simply select the variations you wish to include in the search. Fuzzy searching is the best way to broaden your search to include potential misspellings and OCR errors, and FirstPass provides a terrific capability to select those variations to review additional potential “hits” in your collection.
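
To see the idea behind fuzzy searching in miniature, here’s a Python sketch using the standard library’s difflib module.  Its similarity cutoff is a rough stand-in for “one or two characters off”, and the word list is hypothetical:

```python
import difflib

# Words indexed from the collection (hypothetical), including OCR errors.
collection_words = ["petroleum", "peroleum", "petoleum", "petroleom", "pipeline"]

# Find words similar to the search term; cutoff is a similarity ratio (0-1).
variations = difflib.get_close_matches("petroleum", collection_words,
                                       n=10, cutoff=0.8)
print(variations)
# ['petroleum', 'peroleum', 'petoleum', 'petroleom']  (closest first)
```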

Tomorrow, I’ll talk about the use of domain categorization to quickly identify potential inadvertent disclosures and weed out non-responsive files produced by your opponent, based on the domain of the communicators. Hasta la vista, baby!  🙂

In the meantime, what do you think? Have you used fuzzy searching to find misspellings or OCR errors in an opponent’s produced ESI? Please share any comments you might have or if you’d like to know more about a particular topic.

First Pass Review: Synonym Searching Your Opponent’s Data

Yesterday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass™, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through email analytics to see the communication patterns graphically to identify key parties for deposition purposes and look for potential production omissions.

Synonym Searching

Another type of analysis is the use of synonym searching. Attorneys understand the key terminology their client uses, but they often don’t know the terminology their client’s opposition uses because they haven’t interviewed the opposition’s custodians. In a product defect case, the opposition may refer to admitted design or construction “mistakes” in their product or process as “flaws”, “errors”, “goofs” or even “flubs”. With FirstPass, you can enter your search term into the synonym searching section of the application and it will provide a list of synonyms (with hit counts for each, if selected). Then, you can simply select the synonyms you wish to include in the search. As a result, FirstPass identifies synonyms of your search terms to broaden the scope and catch key “hits” that could be the “smoking gun” in the case.
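
FirstPass presumably draws on a built-in thesaurus; here’s a minimal Python sketch of the concept, with a hand-rolled synonym map (illustrative only) standing in for that thesaurus:

```python
# A tiny, illustrative synonym map; a real tool would use a full thesaurus.
SYNONYMS = {
    "flaw": ["flaw", "mistake", "error", "defect", "goof", "flub"],
}

def expand_terms(term):
    """Return the search term plus any known synonyms."""
    return SYNONYMS.get(term.lower(), [term])

print(expand_terms("flaw"))
# ['flaw', 'mistake', 'error', 'defect', 'goof', 'flub']
```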

Tomorrow, I’ll talk about the use of fuzzy searching to find misspellings that may be commonly used by your opponent or errors resulting from Optical Character Recognition (OCR) of any image-only files that they produce. Stay tuned! 🙂

In the meantime, what do you think? Have you used synonym searching to identify variations on terms in an opponent’s produced ESI? Please share any comments you might have or if you’d like to know more about a particular topic.

First Pass Review: Of Your Opponent’s Data

In the past few years, applications that support Early Case Assessment (ECA) (or Early Data Assessment, as I prefer to call it) and First Pass Review (FPR) of ESI have become widely popular in eDiscovery, as the benefits of using these tools to analyze and cull a party’s own ESI before conducting attorney review and producing relevant files have become increasingly clear. But nobody seems to talk about what these tools can do with an opponent’s produced ESI.

Fewer Resources to Understand Data Produced to You

In eDiscovery, attorneys typically develop a reasonably in-depth understanding of their own collection. They know who the custodians are, have a chance to interview those custodians, and develop a good knowledge of their client’s standard operating procedures and terminology to effectively retrieve responsive ESI. That same knowledge isn’t present when reviewing an opponent’s data: unless they are deposed, the opposition’s custodians aren’t interviewed, and where the data originated is often unclear. The only source of information is the data itself, which requires in-depth analysis. An FPR application like FirstPass™, powered by Venio FPR™, can make a significant difference in conducting that analysis – provided that you request a native production from your opponent, which is vital to performing an in-depth analysis.

Email Analytics

The ability to see communication patterns graphically – to identify the parties involved, with whom they communicated and how frequently – is a significant benefit to understanding the data received. FirstPass provides email analytics to understand the parties involved and potentially identify other key opponent individuals to depose in the case. Dedupe capabilities enable quick comparison against your own production to determine whether the opposition may have withheld key emails between the opposing parties. FirstPass also provides an email timeline to help you determine whether any gaps exist in the opponent’s production.
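
The graphics themselves are FirstPass’s territory, but the data behind email analytics boils down to counting who emailed whom, and when.  Here’s a minimal Python sketch (the messages are hypothetical; a real tool would parse senders, recipients and dates from the produced messages’ headers):

```python
from collections import Counter

# Hypothetical (sender, recipient, date) tuples parsed from produced emails.
messages = [
    ("ceo@opponent.com", "cfo@opponent.com", "2010-06-01"),
    ("ceo@opponent.com", "cfo@opponent.com", "2010-06-03"),
    ("cfo@opponent.com", "engineer@opponent.com", "2010-06-04"),
]

# Tally communication pairs to surface the most frequent correspondents.
pair_counts = Counter((s, r) for s, r, _ in messages)
for (sender, recipient), count in pair_counts.most_common():
    print(f"{sender} -> {recipient}: {count} message(s)")
```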

Tomorrow, I’ll talk about the use of synonym searching to find variations of your search terms that may be common terminology of your opponent. Same bat time, same bat channel! 🙂

In the meantime, what do you think? Have you used email analytics to analyze an opponent’s produced ESI? Please share any comments you might have or if you’d like to know more about a particular topic.