eDiscovery Daily Blog

eDiscovery Trends: When you DE-NIST, A Lot May Be Missed


eDiscovery Daily has referenced several articles in the past by Craig Ball, including this one and this one, and also conducted a thought leader interview with him at LegalTech New York earlier this year.  Craig regularly has great observations about eDiscovery trends that are not talked about in other forums, so I try to “keep tabs” on his articles and share some of those useful insights on this blog.

Last week on his blog, “Ball in your court”, Craig discussed shortcomings associated with “DE-NISTing”, which is the process of removing from review those files that are standard components of the computer’s operating system and off-the-shelf software applications, such as Microsoft Office.  There’s generally no need to review these files, as they are system files and would not normally contain the user’s work product.  These files are identified by computing their hash values, which uniquely identify their content, and matching them against a list of known files maintained by the National Software Reference Library, a project of the National Institute of Standards and Technology (NIST, hence the term “DE-NISTing” for removing these files from the review set).
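To make the hash-matching step concrete, here is a minimal sketch in Python. It assumes the known-file list has already been loaded into a set of uppercase SHA-1 hex strings. The choice of SHA-1 and the function names `sha1_of_file` and `de_nist` are illustrative assumptions, not a description of how any particular review platform works.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def de_nist(paths, known_hashes):
    """Split paths into (kept, removed) based on membership in known_hashes.

    known_hashes is assumed to be a set of uppercase SHA-1 hex strings
    taken from a known-file list (hypothetical input for this sketch).
    """
    kept, removed = [], []
    for path in paths:
        if sha1_of_file(path) in known_hashes:
            removed.append(path)  # content matches a known file
        else:
            kept.append(path)     # unknown content stays in the review set
    return kept, removed
```

Because the match is on content rather than file name, a file only drops out of the review set if its exact hash appears on the list, which is why a list that lags behind current software releases leaves so many files behind.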

While the NIST list is updated four times per year, Craig noted that a number of these system files were not being removed during the “DE-NISTing” process on workstations using Windows 7 and the latest release of Microsoft Office.  So, Craig ran a test by performing a “pristine install” of Windows 7 on a “sterile” hard drive; the install consisted of 47,690 files.  Of those, only 7,277 were removed during “DE-NISTing”, meaning that about 85% of the files (40,413) were not removed and could be left in the review set unless removed by other means.
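The arithmetic behind that percentage is easy to check. The short Python snippet below uses only the two counts reported from Craig's test; the variable names are just for illustration.

```python
total_files = 47_690   # files in the pristine Windows 7 install
removed = 7_277        # files removed by DE-NISTing

remaining = total_files - removed
share_remaining = remaining / total_files

print(f"{remaining:,} files remain ({share_remaining:.1%})")
# prints "40,413 files remain (84.7%)"
```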

Why were so many files missed?  Evidently, the NIST list does not yet include Windows 7 files, despite the fact that there are more than 350 million workstations that run Windows 7.  It doesn’t yet include Microsoft Office 2010 files either.  So, the NIST list is not as up to date as it could be.

As a result, several service providers supplement the NIST list with other files, but as Craig notes, it’s important to be able to trace and defend the supplemented list if required and not try to pass it off as the official NIST list (which Craig likens to selling a “Prada knockoff”).

Supplementing the NIST list by removing system files such as EXE and DLL files is a clearly documentable method to reduce the number of files in the review set.  This method doesn’t depend on hash values and, assuming that these file types are not responsive (which is usually the case), can be an effective method for eliminating files from review.
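As a rough illustration of that file-type approach, the Python sketch below splits a list of file names by extension. The extension set and the function name `filter_by_extension` are assumptions made for this example; the actual list of excluded types would be whatever the parties can document and defend.

```python
from pathlib import PurePath

# Illustrative set of system file types; adjust to the agreed exclusion list.
SYSTEM_EXTENSIONS = {".exe", ".dll"}


def filter_by_extension(paths, excluded=SYSTEM_EXTENSIONS):
    """Split paths into (kept, removed) by case-insensitive file extension."""
    kept, removed = [], []
    for path in paths:
        if PurePath(path).suffix.lower() in excluded:
            removed.append(path)  # excluded file type
        else:
            kept.append(path)     # stays in the review set
    return kept, removed


kept, removed = filter_by_extension(
    ["report.docx", "KERNEL32.DLL", "setup.exe", "notes.txt"]
)
print(kept)     # ['report.docx', 'notes.txt']
print(removed)  # ['KERNEL32.DLL', 'setup.exe']
```

Returning the removed files alongside the kept ones makes it simple to keep a log of exactly what was excluded and why, which supports the kind of traceability Craig recommends.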

So, what do you think? Do you depend on the NIST list to remove files from review sets?  Do you use any supplemental methods for further reducing these sets?  Please share any comments you might have, or let us know if you'd like to know more about a particular topic.