eDiscovery Daily Blog
eDiscovery Trends: First Pass Review – Fuzzy Searching Your Opponent’s Data
Even those of us at eDiscoveryDaily have to take an occasional vacation; however, instead of “going dark” for the week, we thought we would republish a post series from the early days of the blog (when we didn’t have many readers yet). So chances are, you haven’t seen these posts yet! Enjoy!
Tuesday, we talked about the use of First Pass Review (FPR) applications (such as FirstPass®, powered by Venio FPR™) to not only conduct first pass review of your own collection, but also to analyze your opponent’s ESI production. One way to analyze that data is through synonym searching to find variations of your search terms to increase the possibility of finding the terminology used by your opponents.
Fuzzy Searching
Another type of analysis is the use of fuzzy searching. Attorneys know what terms they’re looking for, but those terms may not always be spelled correctly in the documents. Also, opposing counsel may produce a number of image-only files that require Optical Character Recognition (OCR), which is usually not 100% accurate.
FirstPass supports “fuzzy” searching, which is a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). FirstPass will display all of the words in the collection that are close to the word you’re looking for, so if you’re looking for the term “petroleum”, you can find variations such as “peroleum”, “petoleum” or even “petroleom”: misspellings or OCR errors that could be relevant. Then, simply select the variations you wish to include in the search. Fuzzy searching is an effective way to broaden your search to include potential misspellings and OCR errors, and FirstPass provides a terrific capability to select those variations so you can review additional potential “hits” in your collection.
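For readers curious about what “one or two characters off” can mean in practice, here is a minimal sketch in Python. It is not how FirstPass is implemented (the post doesn’t describe that); it simply assumes closeness is measured by Levenshtein edit distance, a common choice for fuzzy matching. The function names and the sample word list are hypothetical, made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_variants(term, vocabulary, max_distance=2):
    """Return (word, distance) pairs for words in the vocabulary that are
    within max_distance edits of term, excluding exact matches."""
    term = term.lower()
    hits = []
    for word in {w.lower() for w in vocabulary}:
        distance = edit_distance(term, word)
        if 0 < distance <= max_distance:
            hits.append((word, distance))
    # Closest variants first, then alphabetical
    return sorted(hits, key=lambda pair: (pair[1], pair[0]))


# Hypothetical list of distinct words found in a produced collection
collection_words = ["petroleum", "peroleum", "petoleum", "petroleom",
                    "patrol", "contract"]

variants = fuzzy_variants("petroleum", collection_words)
for word, distance in variants:
    print(word, distance)
```

With this sample list, the three variations mentioned above each come back at a distance of one edit, while unrelated words such as “patrol” and “contract” are left out. As in the workflow described above, a reviewer would still choose which of the returned variations to add to the search.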
Tomorrow, I’ll talk about the use of domain categorization to quickly identify potential inadvertent disclosures and weed out non-responsive files produced by your opponent, based on the domain of the communicators. Hasta la vista, baby! :)
In the meantime, what do you think? Have you used fuzzy searching to find misspellings or OCR errors in an opponent’s produced ESI? Please share any comments you might have or if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.