Electronic Discovery

Good Processing Requires a Sound Process – Best of eDiscovery Daily

Home at last!  Today, we are recovering from our trip, after arriving back home one day late and without our luggage.  Satan, thy name is Lufthansa!  Anyway, for these past two weeks except for Jane Gennarelli’s Throwback Thursday series, we have been re-publishing some of our more popular and frequently referenced posts.  Today’s post is a topic that comes up often with our clients.  Enjoy!  New posts next week!

As we discussed Wednesday, working with electronic files in a review tool is NOT just simply a matter of loading the files and getting started.  Electronic files are diverse and can represent a whole collection of issues to address in order to process them for loading.  To address those issues effectively, processing requires a sound process.

eDiscovery providers like (shameless plug warning!) CloudNine Discovery process electronic files regularly to enable their clients to work with those files during review and production.  As a result, we are aware of some of the information that the client must provide to ensure that the processed data meets their needs, and we have created an EDD processing spec sheet to gather that information before processing.  Examples of information we collect from our clients:

  • Do you need de-duplication?  If so, should it be performed at the case or the custodian level?
  • Should Outlook emails be extracted in MSG or HTM format?
  • What time zone should we use for email extraction?  Typically, it’s the local time zone of the client or Greenwich Mean Time (GMT).  If you don’t think that matters, consider this example.
  • Should we perform Optical Character Recognition (OCR) for image-only files that don’t have corresponding text?  If we don’t OCR those files, these could be responsive files that are missed during searching.
  • If any password-protected files are encountered, should we attempt to crack those passwords or log them as exception files?
  • Should the collection be culled based on a responsive date range?
  • Should the collection be culled based on key terms?

Those are some general examples for native processing.  If the client requests creation of image files (many still do, despite the well documented advantages of native files), there are a number of additional questions we ask regarding the image processing.  Some examples:

  • Generate as single-page TIFF, multi-page TIFF, text-searchable PDF or non text-searchable PDF?
  • Should color images be created when appropriate?
  • Should we generate placeholder images for unsupported or corrupt files that cannot be repaired?
  • Should we create images of Excel files?  If so, we proceed to ask a series of questions about formatting preferences, including orientation (portrait or landscape), scaling options (auto-size columns or fit to page), printing gridlines, printing hidden rows/columns/sheets, etc.
  • Should we endorse the images?  If so, how?

Those are just some examples.  Questions about print format options for Excel, Word and PowerPoint take up almost a full page by themselves – there are a lot of formatting options for those files and we identify default parameters that we typically use.  Don’t get me started.

We also ask questions about load file generation (if the data is not being loaded into our own review tool, OnDemand®), including what load file format is preferred and parameters associated with the desired load file format.

This isn’t a comprehensive list of questions we ask, just a sample to illustrate how many decisions must be made to effectively process electronic data.  Processing data is not just a matter of feeding native electronic files into the processing tool and generating results; it requires a sound process to ensure that the resulting output will meet the needs of the case.
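
To make one of those decisions concrete, here is a minimal sketch of how case-level and custodian-level de-duplication differ.  It is an illustration only: the function name, the sample files and the choice of SHA-256 over raw file content are assumptions for the example, not a description of how any particular processing tool works (real tools often hash selected email fields rather than raw bytes).

```python
import hashlib

def dedupe(files, scope="case"):
    """Split (custodian, name, content_bytes) tuples into kept and dropped lists.

    scope="case":      keep one copy of each unique content across the whole case.
    scope="custodian": keep one copy of each unique content per custodian.
    """
    seen = set()
    kept, dropped = [], []
    for custodian, name, content in files:
        # Assumption: identical content means identical SHA-256 digest of the bytes.
        digest = hashlib.sha256(content).hexdigest()
        key = digest if scope == "case" else (custodian, digest)
        if key in seen:
            dropped.append((custodian, name))
        else:
            seen.add(key)
            kept.append((custodian, name))
    return kept, dropped

# Hypothetical sample collection: the same spreadsheet held by two custodians.
files = [
    ("smith", "budget.xlsx", b"Q3 budget"),
    ("smith", "budget_copy.xlsx", b"Q3 budget"),
    ("jones", "budget.xlsx", b"Q3 budget"),
    ("jones", "memo.docx", b"Draft memo"),
]

case_kept, case_dropped = dedupe(files, scope="case")
cust_kept, cust_dropped = dedupe(files, scope="custodian")
print(len(case_kept))  # 2
print(len(cust_kept))  # 3
```

Case-level de-duplication leaves fewer files to review, while custodian-level de-duplication preserves the fact that both custodians had a copy, which is why the question goes on the spec sheet.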

So, what do you think?  How do you handle processing of electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.

P.S. – No hamsters were harmed in the making of this blog post.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

eDiscovery Blog Throwback Thursdays – How things evolved, Part 2

So far in this blog series, we’ve taken a look at the ‘litigation support culture’ circa 1980, and we’ve covered how databases were built and used.  We’ve come a long way since then, and in last week’s blog, we started discussing how things have evolved.  In the next posts, we’ll continue discussing how things evolved, but first, if you missed the earlier posts in this series, they can be found here, here, here, here, here, here, here, here, here, and here.

Last week, I described the use of microfilm and microfiche to store document collections.  As most of you know, the next step in the evolution process was a move to storing documents as images.

This was a huge step in the world of litigation support, and honestly it was long overdue when it finally became adopted as a standard.  Like so many advancements, it was ‘looked at’ and ‘talked about’ for years before it became the norm.  One of the most significant hurdles was simply cost:  while the cost to scan documents to create images wasn’t much different from the cost to photocopy or film, image viewing technology was expensive.  Firms did not already have this technology, and corporate clients were not willing to bear the cost.  Eventually, however, it caught on.  By the late 1980s, more and more litigation teams were building databases with images.

There were other changes happening that helped this along – a couple of which meant using images only made sense:

  1. The use of computers in general was becoming more widespread.  Computers were no longer only used by large companies.  Small and mid-sized companies were using them.  PCs were introduced to the world so large main-frame computers and mini computers were not the only option. Desktop computers were becoming widespread.
  2. Because the use of computers was growing, more and more commercial software products were available, including commercial litigation support products.  Two of the first popular commercial products were Inmagic and BRS Search.

Because of these changes, technology use in law firms grew.  Law firms were buying computers for use by attorneys and paralegals.  Law firms started hiring IT staff.  Law firms started hiring litigation support professionals and buying litigation support software.  In short, law firms were developing internal resources to build and maintain databases.  They were creating an infrastructure that could support the use of images.

Including images in litigation support databases caused another shift in the way databases were used:  because the documents themselves were immediately available in a database, databases were being used more and more often directly by attorneys.  They were no longer a ‘back-office’ function.  For many years, it was common for law firms to have ‘walk-up’ litigation support stations, but these ‘walk-up’ stations were often used by attorneys, and eventually it became normal to see a computer on every desk in a law firm.

Tune in next week and we’ll continue discussion of how the litigation world circa 1980 evolved and got to where it is today.

Please let us know if there are eDiscovery topics you’d like to see us cover in eDiscoveryDaily.

The Files are Already Electronic, How Hard Can They Be to Load? – Best of eDiscovery Daily

Come fly with me!  Today we are winding our way back home from Paris, by way of Frankfurt.  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is a topic that relates to a question that I get asked often.  Enjoy!

Since hard copy discovery became electronic discovery, I’ve worked with a number of clients who expect that working with electronic files in a review tool is simply a matter of loading the files and getting started.  Unfortunately, it’s not that simple!

Back when most discovery was paper based, the usefulness of the documents was understandably limited.  Documents were paper and they all required conversion to image to be viewed electronically, optical character recognition (OCR) to capture their text (though not 100% accurately) and coding (i.e., data entry) to capture key data elements (e.g., author, recipient, subject, document date, document type, names mentioned, etc.).  It was a problem, but it was a consistent problem – all documents needed the same treatment to make them searchable and usable electronically.

Though electronic files are already electronic, that doesn’t mean that they’re ready for review as is.  They don’t just represent one problem, they can represent a whole collection of problems.  For example:

These are just a few examples of why working with electronic files for review isn’t necessarily straightforward.  Of course, when processed correctly, electronic files include considerable metadata that provides useful information about how and when the files were created and used, and by whom.  They’re way more useful than paper documents.  So, it’s still preferable to work with electronic files instead of hard copy files whenever they are available.  But, despite what you might think, that doesn’t make them ready to review as is.

So, what do you think?  Have you encountered difficulties or challenges when processing electronic files?  Please share any comments you might have or if you’d like to know more about a particular topic.

When Preparing Production Sets, Quality is Job 1 – Best of eDiscovery Daily

OK, I admit I stole that line from an old Ford commercial😉

France Strikes Back!  Today, we’re heading back to Paris for one final evening before heading home (assuming the Air France pilots let us).  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is a best practice topic for preparing production sets.  Enjoy!

Yesterday, we talked about addressing parameters of production up front to ensure that those requirements make sense and avoid foreseeable production problems well before the production step.  Today, we will talk about quality control (QC) mechanisms to make sure that the production is complete and accurate.

Quality Control Checks

There are a number of checks that can and should be performed on the production set, prior to producing it to the requesting party.  Here are some examples:

  • File Counts: The most obvious check you can perform is to ensure that the count of files matches the count of documents or pages you have identified to be produced.  However, depending on the production, there may be multiple file counts to check:
    • Image Files: If you have agreed with opposing counsel to produce images for all documents, then there will be a count of images to confirm.  If you’re producing multi-page image files (typically, PDF or TIFF), the count of images should match the count of documents being produced.  If you’re producing single-page image files (usually TIFF), then the count should match the number of pages being produced.
    • Text Files: When producing image files, you may also be producing searchable text files.  Again, the count should match either the documents (multi-page text files) or pages (single-page text files) with one possible exception.  If a document or page has no searchable text, are you still producing an empty file for those?  If not, you will need to be aware of how many of those instances there are and adjust the count accordingly to verify for QC purposes.
    • Native Files: Native files (if produced) are typically at the document level, so you would want to confirm that one exists for each document being produced.
    • Subset Counts: If the documents are being produced in a certain organized manner (e.g., a folder for each custodian), it’s a good idea to identify subset counts at those levels and verify those counts as well.  Not only does this provide an extra level of count verification, but it helps to find the problem more quickly if the overall count is off.
    • Verify Counts on Final Production Media: If you’re verifying counts of the production set before copying it to the media (which is common when burning files to CD or DVD), you will need to verify those counts again after copying to ensure that all files made it to the final media.
    • Sampling of Results: Unless the production is relatively small, it may be impractical to open every last file to be produced to confirm that it is correct.  If so, employ accepted statistical sampling procedures (such as those described here and here for searching) to identify an appropriate sample size and randomly select that sample to open and confirm that the correct files were selected, HASH values of produced native files match the original source versions of those files, images are clear and text files contain the correct text.
    • Redacted Files: If any redacted files are being produced, each of these (not just a sample subset) should be reviewed to confirm that redactions of privileged or confidential information made it to the produced file.  Many review platforms overlay redactions which have to be burned into the images at production time, so it’s easy for mistakes in the process to cause those redactions to be left out or burned in at the wrong location.  Very Important! – You also need to confirm that the redacted text has been removed from any text files that have been produced.
    • Inclusion of Logs: Depending on agreed upon parameters, the production may include log files such as:
      • Production Log: Listing of all files being produced, with an agreed upon list of metadata fields to identify those files.
      • Privilege Log: Listing of responsive files not being produced because of privilege (and possibly confidentiality as well).  This listing often identifies the privilege being asserted for each file in the privilege log.
      • Exception Log: Listing of files that could not be produced because of a problem with the file.  Examples of types of exception files are included here.
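
The count and hash checks above can be sketched in a few lines.  This is a minimal illustration under stated assumptions: the document records, field names and function names are hypothetical, text files are assumed to be document-level, and SHA-256 stands in for whatever hash algorithm the parties have actually agreed to use.

```python
import hashlib

def expected_file_counts(docs, single_page_images=True, skip_empty_text=False):
    """Compute the file counts a production set should contain.

    docs: list of dicts with 'pages' (int) and 'has_text' (bool).
    """
    total_pages = sum(d["pages"] for d in docs)
    # Single-page images: one file per page.  Multi-page images: one file per document.
    images = total_pages if single_page_images else len(docs)
    # Document-level text files; optionally no file for documents without text.
    text = sum(1 for d in docs if d["has_text"] or not skip_empty_text)
    natives = len(docs)  # one native file per document
    return {"images": images, "text": text, "natives": natives}

def count_mismatches(expected, actual):
    """Return {file_type: (expected, actual)} for every count that differs."""
    return {k: (v, actual.get(k, 0)) for k, v in expected.items() if v != actual.get(k, 0)}

def hash_matches(source_bytes, produced_bytes):
    """True if the produced native is byte-for-byte identical to the source."""
    return hashlib.sha256(source_bytes).digest() == hashlib.sha256(produced_bytes).digest()

# Hypothetical three-document production; the second document has no searchable text.
docs = [
    {"pages": 3, "has_text": True},
    {"pages": 1, "has_text": False},
    {"pages": 5, "has_text": True},
]
expected = expected_file_counts(docs, single_page_images=True, skip_empty_text=True)
actual = {"images": 9, "text": 2, "natives": 2}  # counts taken from the production media
problems = count_mismatches(expected, actual)
print(expected)  # {'images': 9, 'text': 2, 'natives': 3}
print(problems)  # {'natives': (3, 2)}
```

Running the same comparison per custodian folder, and again after copying to the final media, narrows down where a missing file went.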

Each production will have different parameters, so the QC requirements will differ.  These are examples, not a comprehensive list of all potential QC checks to perform.

So, what do you think?  Can you think of other appropriate QC checks to perform on production sets?  If so, please share them, along with any comments you might have, or let us know if you’d like to know more about a particular topic.

Production is the “Ringo” of the eDiscovery Phases – Best of eDiscovery Daily

God Save the Queen!  Today is our last full day in London and we’re planning to visit Westminster Abbey, which is where all of England’s kings and queens are crowned.  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is a topic where people can frequently make mistakes, causing production delays and costly rework.  Enjoy!

Most of the “press” associated with eDiscovery ranges from the “left side of the EDRM model” (i.e., Information Management, Identification, Preservation, Collection) through the stages to prepare materials for production (i.e., Processing, Review and Analysis).  All of those phases lead to one inevitable stage in eDiscovery: Production.  Yet, few people talk about the actual production step.  If Preservation, Collection and Review are the “John”, “Paul” and “George” of the eDiscovery process, Production is “Ringo”.

It’s the final crucial step in the process, and if it’s not handled correctly, all of the due diligence spent in the earlier phases could mean nothing.  So, it’s important to plan for production up front and to apply a number of quality control (QC) checks to the actual production set to ensure that the production process goes as smoothly as possible.

Planning for Production Up Front

When discussing the production requirements with opposing counsel, it’s important to ensure that those requirements make sense, not only from a legal standpoint, but from a technical standpoint as well.  Involve support and IT personnel in the process of deciding those parameters as they will be the people who have to meet them.  Issues to be addressed include, but are not limited to:

  • Format of production (e.g., paper, images or native files);
  • Organization of files (e.g., organized by custodian, legal issue, etc.);
  • Numbering scheme (e.g., Bates labels for images, sequential file names for native files);
  • Handling of confidential and privileged documents, including log requirements and stamps to be applied;
  • Handling of redactions;
  • Format and content of production log;
  • Production media (e.g., CD, DVD, portable hard drive, FTP, etc.).

I was involved in a case a couple of years ago where opposing counsel was requesting an unusual production format where the names of the files would be the subject line of the emails being produced (for example, “Re: Completed Contract, dated 12/01/2011”).  Two issues with that approach: 1) The proposed format only addressed emails, and 2) Windows file names don’t support certain characters, such as colons (:) or slashes (/).  I provided that feedback to the attorneys so that they could address with opposing counsel and hopefully agree on a revised format that made more sense.  So, let the tech folks confirm the feasibility of the production parameters.
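
A quick feasibility check like the one below would have flagged that subject line.  It is a sketch, not a recommended naming scheme: the function names are hypothetical, and the underscore substitution is only an illustration, since any character substitution would itself need to be agreed with opposing counsel (and two emails with the same subject would still collide).

```python
# Characters that Windows does not allow in file names (control characters are also invalid).
WINDOWS_INVALID = '<>:"/\\|?*'

def invalid_filename_chars(name):
    """Return the sorted set of characters in name that Windows file names cannot contain."""
    return sorted({ch for ch in name if ch in WINDOWS_INVALID or ord(ch) < 32})

def sanitize(name, replacement="_"):
    """Replace every invalid character with the replacement character (illustrative only)."""
    return "".join(
        replacement if (ch in WINDOWS_INVALID or ord(ch) < 32) else ch for ch in name
    )

subject = "Re: Completed Contract, dated 12/01/2011"
print(invalid_filename_chars(subject))  # ['/', ':']
print(sanitize(subject))                # Re_ Completed Contract, dated 12_01_2011
```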

The workflow throughout the eDiscovery process should also keep in mind the end goal of meeting the agreed upon production requirements.  For example, if you’re producing native files with metadata, you may need to take appropriate steps to keep the metadata intact during the collection and review process so that the metadata is not inadvertently changed. For some file types, metadata is changed merely by opening the file, so it may be necessary to collect the files in a forensically sound manner and conduct review using copies of the files to keep the originals intact.

Tomorrow, we will talk about preparing the production set and performing QC checks to ensure that the ESI being produced to the requesting party is complete and accurate.

So, what do you think?  Have you had issues with production planning in your cases?  Please share any comments you might have or if you’d like to know more about a particular topic.

Proximity, Not Absence, Makes the Heart Grow Fonder – Best of eDiscovery Daily

God Save the Queen!  Today is our first full day in London and we’re planning to visit The Tower of London, which is only about a thousand years old.  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is a topic that has come up often as I work with clients and that I have referenced frequently over the years.  Enjoy!

Recently, I assisted a large corporate client where there were several searches conducted across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

DMS systems are great for what they are intended to do – provide a storage archive for documents generated within the organization, track versions of those documents and enable individuals to locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but it can sometimes be like using a sledgehammer to hammer a thumbtack into the wall – advanced features to increase the precision of those searches may often be lacking.

Let’s say in an oil company you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”, etc.).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” appear, but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS systems may retrieve every published and copyrighted document in the systems mentioning the word “oil”.  Why?  Because almost every published and copyrighted document will have the phrase “All Rights Reserved” in the document.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
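
To see the difference between an AND search and a proximity search, here is a minimal sketch.  The function names and sample documents are made up for illustration, and "within 5 words" is interpreted here as a difference of at most 5 word positions; individual review tools may tokenize and count distance differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def and_search(text, term_a, term_b):
    """True if both terms appear anywhere in the text."""
    words = set(tokenize(text))
    return term_a in words and term_b in words

def within(text, term_a, term_b, distance):
    """True if term_a and term_b appear within `distance` words of each other, in either order."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Hypothetical documents: one about oil rights, one that merely carries a copyright notice.
responsive = "The lease grants rights to drill for oil on the northern tract."
boilerplate = (
    "Annual report on oil production volumes for the year. "
    "Prepared by the finance group for internal distribution only. "
    "Copyright 2011. All rights reserved."
)

print(and_search(responsive, "oil", "rights"), within(responsive, "oil", "rights", 5))    # True True
print(and_search(boilerplate, "oil", "rights"), within(boilerplate, "oil", "rights", 5))  # True False
```

Both documents satisfy the AND search, but only the first survives the proximity search, and it does so even though it uses a phrasing ("rights to drill for oil") that a phrase search would have missed.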

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS system, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with an eDiscovery review application that has more precise search alternatives (at CloudNine Discovery, we use OnDemand®), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised the search where necessary to retrieve a result set that maximized both recall and precision.

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” may not retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Throwback Thursdays – How Things Evolved

So far in this blog series, we’ve taken a look at the ‘litigation support culture’ circa 1980, and we’ve covered how databases were built and used.  We’ve come a long way since then, and in the next posts I’m going to talk a bit about how things evolved.  But first, if you missed the earlier posts in this series, they can be found here, here, here, here, here, here, here, here and here.

Litigators who used databases circa 1980 – for the most part – recognized a significant improvement in efficiencies.  As technology and approaches evolved over time, more efficiencies were realized.

One of the first big changes in how we worked was the use of microfilm.  Paper documents were still photocopied and coded, but microfilm became the preferred mechanism for storing and retrieving documents.  While the technology had been around for quite a long time, the litigation projects I worked on used paper repositories up until the early 1980s, which is when microfilm started to become the standard. This approach offered multiple advantages, the most significant being:

  1. It dramatically reduced the amount of space required to store a document collection.  The documents for a large case could be stored in a box or two rather than in a room or two. This also meant that it was reasonable to have multiple copies of a document collection stored in offices convenient for the litigation team, rather than a single, central repository of documents.
  2. Attorneys still used central repositories to handle large document pulls, but with microfilm it was faster and easier to retrieve those documents; turnaround time was much better.
  3. It preserved the integrity of the document collection.  Once a collection was filmed, pages wouldn’t be lost, shuffled, or damaged.

So, what is microfilm and how does it work?  Micro-reproductions of document pages are stored on reels of film.  Here’s a picture:

 

Those reels are labeled with the inclusive document number range.  Now — when doing a document pull – instead of locating a box and pulling a document to photocopy, you would locate a reel, thread it on a microfilm reader (see picture above), scroll to the correct frame, and hit a print button.

This approach evolved even further, and we started using microfiche.  The principle was the same, but the film was stored on cards instead of reels:

 

The cards were stored in sleeves labeled with the inclusive document numbers, and the cards were inserted into a microfiche reader.

Let me point out that microfilm and microfiche are still in use today in many libraries around the country.  Most libraries are no longer ‘filming’ new documents (they’re using imaging technology), but many still have historic collections of newspaper and magazine articles stored on microfilm or microfiche.

Tune in next week — we’ll continue discussion of how the litigation world circa 1980 evolved and got to where it is today.

Please let us know if there are eDiscovery topics you’d like to see us cover in eDiscoveryDaily.

Does Size Matter? – Best of eDiscovery Daily

Vive la France!  Today is our third full day in Paris and we’re planning to have lunch at the Eiffel Tower, which is really large.  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is one that has generated a lot of discussion over the years.  Enjoy!

I admit it, with a title like “Does Size Matter?”, I’m looking for a few extra page views.  😉

I frequently get asked how big an ESI collection needs to be to benefit from eDiscovery technology.  In a recent case with one of my clients, the client had a fairly small collection – only about 4 GB.  But, when a judge ruled that they had to start conducting depositions in a week, they needed to review that data in a weekend.  Without culling the data and using OnDemand® to manage the linear review, they would not have been able to make that deadline.  So, they clearly benefited from the use of eDiscovery technology in that case.

But, if you’re not facing a tight deadline, how large does your collection need to be for the use of eDiscovery technology to provide benefits?

I recently conducted a webinar regarding the benefits of First Pass Review – aka Early Case Assessment (ECA), or a more accurate term (as George Socha points out regularly), Early Data Assessment.  One of the topics discussed in that webinar was the cost of review for each gigabyte (GB).  Extrapolated from an analysis conducted by Anne Kershaw a few years ago (and published in the Gartner report E-Discovery: Project Planning and Budgeting 2008-2011), here is a breakdown:

Estimated Cost to Review All Documents in a GB:

  • Pages per GB:                   75,000
  • Pages per Document:        4
  • Documents Per GB:           18,750
  • Review Rate:                    50 documents per hour
  • Total Review Hours:          375
  • Reviewer Billing Rate:       $50 per hour

Total Cost to Review Each GB:      $18,750

Notes: The number of pages per GB can vary widely.  Estimates tend to range from 50,000 to 100,000 pages per GB, so 75,000 pages (18,750 documents) seems an appropriate average.  50 documents reviewed per hour is considered to be a fast review rate and $50 per hour is considered to be a bargain price.  eDiscovery Daily provided an earlier estimate of $16,650 per GB based on assumptions of 20,000 documents per GB and 60 documents reviewed per hour – the assumptions may change somewhat, but, either way, the cost for attorney review of each GB could be expected to range from at least $16,000 to $18,000, possibly more.

Advanced culling and searching capabilities of tools like OnDemand can enable you to cull out 70-80% of most collections as clearly non-responsive without having to conduct attorney review on those files.  If you have merely a 2 GB collection and assume the lowest review cost above of $16,000 per GB, the use of an ECA tool to cull out 70% of the collection can save $22,400 in attorney review costs.  Is that worth it?
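The savings figure is just as easy to check for your own collection.  Here is a minimal sketch, assuming the culled percentage applies evenly to the review cost; `culling_savings` is a hypothetical helper written for this example, not part of any product.

```python
def culling_savings(collection_gb, cost_per_gb, cull_percent):
    """Return (full_review_cost, amount_saved, remaining_cost) in dollars."""
    full_cost = collection_gb * cost_per_gb
    saved = full_cost * cull_percent / 100
    return full_cost, saved, full_cost - saved


# The example above: a 2 GB collection at $16,000 per GB with 70% culled out.
full_cost, saved, remaining = culling_savings(2, 16_000, 70)
print(f"Full review: ${full_cost:,.0f}  Saved: ${saved:,.0f}  Remaining: ${remaining:,.0f}")
```

With those inputs, the full review would cost $32,000, of which $22,400 is saved and $9,600 remains.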

So, what do you think?  Do you use eDiscovery technology for only the really large cases or ALL cases?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Don’t Get “Wild” with Wildcards – Best of eDiscovery Daily

Vive la France!  Today is our second full day in Paris and we’re planning to visit Versailles, which Marie Antoinette loved so much, she lost her head.  For the next two weeks except for Jane Gennarelli’s Throwback Thursday series, we will be re-publishing some of our more popular and frequently referenced posts.  Today’s post is one that we published on our very first day and have referenced frequently over the years.  Enjoy!

Several months ago, I provided search strategy assistance to a client that had already agreed upon several searches with opposing counsel.  One search related to mining activities, so the attorney decided to use a wildcard of “min*” to retrieve variations like “mine”, “mines” and “mining”.

That one search retrieved over 300,000 files with hits.

Why?  Because there are 269 words in the English language that begin with the letters “min”.  Words like “mink”, “mind”, “mint” and “minion” were all being retrieved in this search for files related to “mining”.  We ultimately had to go back to opposing counsel and negotiate a revised search that was more appropriate.

How do you ensure that you’re retrieving all variations of your search term?

Stem Searches

One way to capture the variations is with stem searching.  Applications that support stem searching let you enter the root word (e.g., mine) and will locate that word and its variations.  Stem searching provides the ability to find all variations of a word without having to use wildcards.
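To give a feel for the idea, here is a deliberately crude sketch of stem matching in Python.  It is a toy, not how any particular review tool works: real applications use proper linguistic stemmers (the Porter stemmer is a well-known example), and the suffix list, `crude_stem` and `stem_search` are all invented for this illustration.

```python
SUFFIXES = ("ing", "es", "ed", "e", "s")  # toy list, checked in this order


def crude_stem(word):
    """Strip one common suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_search(root, vocabulary):
    """Return the vocabulary words that share a crude stem with root."""
    target = crude_stem(root)
    return [word for word in vocabulary if crude_stem(word) == target]


vocabulary = ["mine", "mines", "mining", "mined", "mink", "mind", "mint", "minion"]
print(stem_search("mine", vocabulary))
```

Searching on “mine” returns mine, mines, mining and mined, while mink, mind, mint and minion are left out because they do not reduce to the same stem.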

Other Methods

If your application doesn’t support stem searches, Morewords.com shows a list of words that begin with your search string (e.g., to get all 269 words beginning with “min”, go here – simply substitute any characters for “min” to see the words that start with those characters).  Choose the variations you want and incorporate them into the search instead of the wildcard – i.e., use “(mine or mines or mining)” instead of “min*” to retrieve a more relevant result set.

Some applications let you select the wildcard variations you wish to use.  OnDemand® enables you to type in the wildcard string, display all the words – in your collection – that begin with that string, and select the variations on which to search.  As a result, you can avoid all of the non-relevant variations and limit the search to the relevant hits.
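The general approach of listing the wildcard variations that actually occur in your collection and then searching only on the ones you pick can be sketched as follows.  This is an illustration under simple assumptions (plain-text documents, a trailing “*” wildcard only); it is not OnDemand’s implementation, and `expand_wildcard`, `build_or_query` and the sample documents are hypothetical.

```python
import re


def expand_wildcard(pattern, documents):
    """Return the distinct words in documents that match a trailing-* wildcard."""
    prefix = pattern.rstrip("*").lower()
    words = set()
    for text in documents:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word.startswith(prefix):
                words.add(word)
    return sorted(words)


def build_or_query(terms):
    """Join the selected terms into a single OR search string."""
    return "(" + " or ".join(terms) + ")"


documents = [
    "The mining permit covers two mines.",
    "Keep in mind the mink farm.",
    "Each mine has a mint condition ledger.",
]

candidates = expand_wildcard("min*", documents)
print(candidates)  # every "min" word found in the collection

wanted = {"mine", "mines", "mining"}  # the variations a reviewer would select
print(build_or_query([word for word in candidates if word in wanted]))
```

In this small sample the wildcard also pulls in mind, mink and mint; selecting only the relevant variations produces the search “(mine or mines or mining)”.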

So, what do you think?  Have you ever been “burned” by wildcard searching?  Do you have any other suggested methods for effectively handling them?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

eDiscovery Daily is Four Years Old!

Believe it or not, it was four years ago this past Saturday that we launched the eDiscovery Daily blog!

When we launched on September 20, 2010, our goal was to be a daily resource for eDiscovery news and analysis.  Now, we’ve done so for four years.  We’ve lasted as long as many presidential administrations (and probably worked WAY more days).  We hit over 300,000 visits to the site in May and, earlier this month, published our 1,000th post!  And, every post we have published is still available on the site for your reference, which has made eDiscovery Daily into quite a knowledgebase!  We’re quite proud of that.

Comparing our first three months of existence to now, we have seen traffic on our site grow an amazing 474%!  Our subscriber base has more than tripled in the last three years!  And, as always, we have you to thank for that!  Thanks for making the eDiscoveryDaily blog a regular resource for your eDiscovery news and analysis!  We really appreciate the support!

As many of you know by now, we like to take a look back every six months at some of the important stories and topics during that time.  So, here are some posts over the last six months you may have missed.  Enjoy!

After 2,354 Public Comments, One Major Change to the Proposed Federal Rules: After lots of controversy about Rule 37(e), two subcommittees made significant changes to the rule.  It was amended again later.

Definition of “Electronic Storage” Considered in Invasion of Privacy Lawsuit: There goes the Stored Communications Act again.

Daughter’s Facebook Post Voids $80,000 Settlement: Moral of the story – don’t publicly gloat if your dad has a non-disclosure agreement as part of the settlement.

New California Proposed Opinion Requires eDiscovery Competence: Shouldn’t every state have one of these?

How to “Alert” Yourself to Interesting eDiscovery News and Announcements: Here’s where I get some of my topic ideas.

The Mergers and Acquisitions Keep on Coming: Thanks to Rob Robinson and my boss, Brad Jenkins, you’ll be able to tell the players with or without a scorecard.

Surprisingly Few States Have an Ethics Opinion Regarding Lawyer Cloud Usage: Go figure.

Everything You Wanted to Know about Forms of Production, Don’t Be Afraid to Ask: Leave it to Craig Ball to provide an extremely useful guide.

The Pitfalls of Self-Culling and Image Files: Why having custodians do their own self-collection and culling may be a bad idea.

Want to Craft Better Searches? Use a Dictionary: Without it, you can retrieve too many non-responsive documents or, even worse, miss some important ones.

Are eDiscovery Vendor Fees “Unconscionable”?: This is why it pays to compare rates.

Failure to Preserve Cloud-Based Data Results in Severe Sanction for Defendant: You are still responsible for your Salesforce.com database, even if you don’t store it.

Production from a Provider’s Point of View:  What does an eDiscovery provider need to know when producing your data?  Get answers here and here.

It’s Friday at 5 and I Need Data Processed to Review this Weekend: It might be funnier if it didn’t actually happen sometimes.

When Reviewing and Producing Documents, Don’t Forget the “Mother and Child Reunion”: It’s easy to leave out “family” members of responsive files if you’re not careful.

Unfortunately, we also lost two amazing eDiscovery pioneers in the past few months: Richard G. Braman and Browning Marean.

This is just a sampling of topics that we’ve covered.  Hope you enjoyed them!

For the next two weeks (other than Jane’s Thursday posts), we will be re-publishing a few of our popular posts from the past as I will be on my honeymoon!  While blog editors don’t need as much vacation as US presidents do, we still need a break every once in a while.  🙂
