eDiscovery Daily Blog

eDiscovery Searching: Types of Exception Files

On Friday, we discussed handling exception files through agreement with opposing counsel (typically at the meet and confer) to manage costs and avoid potential spoliation claims.  A typical ESI collection can contain several different types of exception files, and it’s important to know how each type can be recovered.

Types of Exception Files

Efforts to “fix” these files will often change the files themselves (and the metadata associated with them), so it’s important to establish with opposing counsel which measures for addressing the exceptions are acceptable.  Some files may not be recoverable, and you need to agree up front on how far to go in attempting to recover them.

  • Corrupted Files: Files can become corrupted for a variety of reasons, from application failures to system crashes to computer viruses.  I recently had a case where 40% of the collection was contained in two corrupt Outlook PST files – fortunately, we were able to repair those files and recover the messages.  If you have readily accessible backups of the files, try to restore them from backup first.  If not, you will need to try a repair utility.  Outlook comes with a utility called SCANPST.EXE (the Inbox Repair Tool) that scans and repairs PST and OST files, and there are utilities (including freeware utilities) available via the web for most file types.  If all else fails, you can hire a data recovery expert, but that can get very expensive.
  • Password Protected Files: Most collections contain at least some password protected files.  Files can require a password to enable them to be edited, or even just to view them.  As the most popular publication format, PDF files are often password protected from editing, but they can still be viewed to support review (though some search engines may fail to index them).  If a file is password protected, you can try to obtain the password from the custodian who provided the file – if the custodian is unavailable or unable to remember the password, you can try a password cracking application, which will run through a series of character combinations to attempt to find the password.  Be patient: cracking takes time and doesn’t always succeed.
  • Unsupported File Types: In most collections, there are some unusual file types that aren’t supported by the review application, such as files for legacy or specialized applications (e.g., AutoCAD for engineering drawings).  You may not even initially know what type of files they are; if not, you can look up the file extension in FILExt.  If your review application can’t read the files, it also can’t index them for searching or display them for review.  If those files may be responsive to discovery requests, review them with the native application to determine their relevance.
  • No-Text Files: Files with no searchable text aren’t really exceptions, but they still have to be accounted for – they won’t be retrieved in searches, so it’s important to make sure they don’t “slip through the cracks”.  It’s common to perform Optical Character Recognition (OCR) on TIFF files and image-only PDF files, because they are common document formats.  Other types of no-text files, such as pictures in JPEG or PNG format, are usually not OCRed unless there is an expectation that they will contain significant text.
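
For readers who like to see the idea in code, here is a minimal Python sketch of how a first-pass triage of the categories above might look.  It is only an illustration under stated assumptions, not how any particular review application works: the function name triage, the category labels and the short list of "supported" extensions are all hypothetical, and the text passed in is assumed to come from whatever extraction step you already run.  It checks the first bytes of each file against a few well-known file signatures, so a mismatch suggests either corruption or a mislabeled extension.  It does not attempt to detect password protection.

```python
from pathlib import Path

# A few well-known file signatures ("magic bytes"), keyed by extension.
# Hypothetical stand-in for the formats a review application supports.
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".tif": [b"II*\x00", b"MM\x00*"],   # little-endian and big-endian TIFF
    ".pst": [b"!BDN"],                  # Outlook PST/OST
    ".docx": [b"PK\x03\x04"],           # ZIP-based Office formats
}


def triage(path, extracted_text=""):
    """Return a rough exception category for one collected file.

    extracted_text is whatever text an earlier extraction step produced
    (assumed to be supplied by the caller; empty if none was found).
    """
    path = Path(path)
    ext = path.suffix.lower()

    # Extension not in the supported list: needs native review.
    if ext not in SIGNATURES:
        return "unsupported"

    # Compare the start of the file with the signature(s) for its type.
    with path.open("rb") as fh:
        header = fh.read(16)
    if not any(header.startswith(sig) for sig in SIGNATURES[ext]):
        return "header mismatch"   # possibly corrupted or mislabeled

    # Readable file with nothing to search: candidate for OCR.
    if not extracted_text.strip():
        return "no text"

    return "ok"
```

Calling triage on each file in a collection and tallying the results gives a quick count of how many files fall into each category, which is the kind of information you would want in hand before discussing acceptable recovery measures with opposing counsel.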

It’s important for review applications to be able to identify exception files, so that you know they won’t be retrieved in searches without additional processing.  FirstPass™, powered by Venio FPR™, is one example of an application that will flag those files during processing and enable you to search for those exceptions, so you can determine how to handle them.

So, what do you think?  Have you encountered other types of exceptions?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.