What is Document Clustering?
How documents are organized for review can make a big difference in review efficiency: assigning similar documents to the same reviewer saves costs and improves accuracy. Clustering software examines the text in your documents, determines which documents are related to each other, and groups them into clusters. Clustering makes it easy to explore and categorize “big data” sets of documents, bringing efficiency to the electronic discovery process.
Clustering does the electronic equivalent of putting your documents into labeled boxes so that things only end up in the same box if they belong together. This allows you to explore and manage your documents by browsing through a relatively small set of boxes (clusters) instead of digging through the much bigger data set of documents directly.
Clustering organizes the documents according to the structure that arises naturally from their content, without preconceptions or query terms. It labels each cluster with a set of keywords, providing a quick overview of the cluster. It also identifies a “representative document” that can be used as a proxy for the cluster.
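The grouping, keyword labels, and representative document described above can be illustrated with a minimal sketch. It is not CloudNine's algorithm, which this page does not describe. It assumes a simple greedy approach based on word counts and cosine similarity, and the function names, stopword list, and threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a production one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in", "on", "is", "we"}

def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering: a document joins the first cluster whose
    summed word counts are similar enough, otherwise it starts a new cluster."""
    vectors = [Counter(tokenize(d)) for d in docs]
    clusters = []  # each cluster: {"members": [doc indexes], "centroid": Counter}
    for i, vec in enumerate(vectors):
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["members"].append(i)
                c["centroid"] += vec
                break
        else:
            clusters.append({"members": [i], "centroid": Counter(vec)})
    for c in clusters:
        # Label: the most frequent words in the cluster.
        c["keywords"] = [w for w, _ in c["centroid"].most_common(3)]
        # Representative: the member closest to the cluster's summed word counts.
        c["representative"] = max(c["members"], key=lambda i: cosine(vectors[i], c["centroid"]))
    return clusters

docs = [
    "Pricing proposal for the acquisition of the widget supplier",
    "Revised pricing proposal for the widget supplier acquisition",
    "Lunch menu for the office party on Friday",
    "Office party lunch menu, updated for Friday",
]
clusters = cluster_documents(docs)
for c in clusters:
    print(c["keywords"], c["members"], "representative:", c["representative"])
```

With these four sample documents, the two proposal drafts fall into one cluster and the two lunch menus into another. Production clustering tools typically use more robust term weighting and clustering methods than this sketch.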
More Efficient Review
- Apply tags to a single document, a cluster of documents, or a group of clusters with a single mouse click, greatly reducing labor during the review process.
- Review similar documents together to reveal relationships and context.
- Identify near-duplicates and review them as a group or individually.
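One common way to detect near-duplicates is to compare overlapping word sequences ("shingles") between documents. The sketch below assumes that technique and a hypothetical similarity threshold. It is an illustration only, not a description of how CloudNine identifies near-duplicates.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(docs, threshold=0.5):
    """Return index pairs whose shingle overlap meets the (assumed) threshold."""
    sets = [shingles(d) for d in docs]
    return [(i, j)
            for i in range(len(sets))
            for j in range(i + 1, len(sets))
            if jaccard(sets[i], sets[j]) >= threshold]

drafts = [
    "The buyer agrees to pay the purchase price within thirty days of closing",
    "The buyer agrees to pay the purchase price within sixty days of closing",
    "Please send me the quarterly sales figures by Monday",
]
pairs = near_duplicates(drafts)
print(pairs)
```

The first two drafts differ by a single word, so they are reported as a pair that a reviewer could handle together. The third document shares no three-word sequence with either of them.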
Understand Your Document Set
- Quickly identify major topics and sub-topics.
- Prioritize documents so you can analyze the most promising items first, and begin planning your case right away.
Reduce Risk of Errors
- Improve consistency and reduce errors because similar documents are processed together.
- Find evidence that may be missed by a search query, e.g. due to synonyms.
Define the hierarchy of tags that you want to apply to your documents, anything from “evidence of price manipulation” to “privileged” or “work product.” Then apply the tags to an individual document, all documents in a cluster, or all documents in all clusters labeled with a specific set of keywords. With document clustering, you can tag hundreds of documents with just a few mouse clicks. For each cluster, such as a thread of emails or a set of revisions to an acquisition proposal, you decide whether to treat it as a single entity or to handle its items individually. Duplicate and near-duplicate documents appear in the same cluster, so you can decide whether to treat them as a unit.
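The three tagging scopes described above (a single document, a whole cluster, or every cluster labeled with given keywords) can be sketched as follows. The cluster data, document ids, tag paths, and function names are all hypothetical and do not reflect CloudNine's actual data model.

```python
from collections import defaultdict

# Hypothetical data: cluster id -> label keywords and member document ids.
clusters = {
    "c1": {"keywords": {"pricing", "competitor"}, "docs": ["d1", "d2", "d3"]},
    "c2": {"keywords": {"acquisition", "proposal"}, "docs": ["d4", "d5"]},
    "c3": {"keywords": {"pricing", "invoice"}, "docs": ["d6"]},
}

tags = defaultdict(set)  # document id -> set of tag paths

def tag_document(doc_id, tag):
    """Apply a tag to one document."""
    tags[doc_id].add(tag)

def tag_cluster(cluster_id, tag):
    """Apply a tag to every document in one cluster."""
    for doc_id in clusters[cluster_id]["docs"]:
        tag_document(doc_id, tag)

def tag_clusters_with_keywords(keywords, tag):
    """Tag every cluster whose label contains all of the given keywords."""
    for cluster_id, cluster in clusters.items():
        if set(keywords) <= cluster["keywords"]:
            tag_cluster(cluster_id, tag)

# Tag paths use "/" to express the hierarchy (an assumed convention).
tag_document("d4", "privilege/privileged")
tag_cluster("c2", "privilege/work product")
tag_clusters_with_keywords({"pricing"}, "issues/evidence of price manipulation")

for doc_id in sorted(tags):
    print(doc_id, sorted(tags[doc_id]))
```

Here a single call tags all four documents in the two "pricing" clusters, while the document tagged individually keeps both its own tag and the one applied to its cluster.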
Clustering provides significant benefits even if you decide to click into each cluster to review and tag the documents one by one.
- First, you can start with the clusters having the most promising keywords, helping you to find the most important evidence early, which gives you more time to think about your strategy for the case.
- Second, you will tag the documents more consistently because you will be working on sets of similar documents instead of jumping from one topic to another.
- Finally, having similar items grouped together can provide additional insight, because you can compare different versions of a document or see the replies to an email.
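Starting with the most promising clusters amounts to ranking clusters by how well their keyword labels match the terms you care about. The sketch below assumes a simple scoring rule (number of matching keywords, with larger clusters winning ties). The sample clusters and the `prioritize` function are hypothetical.

```python
def prioritize(clusters, terms_of_interest):
    """Order clusters so those whose labels share the most terms come first;
    ties go to the larger cluster."""
    terms = set(terms_of_interest)
    return sorted(
        clusters,
        key=lambda c: (len(terms & set(c["keywords"])), len(c["docs"])),
        reverse=True,
    )

# Hypothetical clusters with keyword labels and member document ids.
sample_clusters = [
    {"id": "c1", "keywords": ["lunch", "menu", "party"], "docs": ["d1", "d2", "d3", "d4"]},
    {"id": "c2", "keywords": ["pricing", "competitor", "margin"], "docs": ["d5", "d6"]},
    {"id": "c3", "keywords": ["pricing", "invoice", "overdue"], "docs": ["d7", "d8", "d9"]},
]

review_order = [c["id"] for c in prioritize(sample_clusters, {"pricing", "competitor"})]
print(review_order)
```

The cluster matching both terms is reviewed first, the one matching a single term second, and the unrelated cluster last.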
Search Is Not Enough
Search engines are great tools when you know what you are looking for, but they aren’t very good for viewing the structure of your big data set or discovering things that you aren’t looking for explicitly. You can miss important evidence if you don’t construct your search query carefully (e.g. failing to account for synonyms). Integrated with your search engine, clustering allows you to see the overall structure of your document set and browse as deep into it as you want. When you find an important piece of evidence, other documents in the same cluster are probably important too, so clustering can help you find evidence that doesn’t match your search query exactly.
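The idea that documents in the same cluster as a search hit are probably relevant too can be sketched as a two-step lookup: run the keyword search, then pull in every document that shares a cluster with a hit. The sample documents, the cluster assignments, and both function names are assumptions made for illustration.

```python
def keyword_search(docs, term):
    """Return ids of documents whose text contains the term (case-insensitive)."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in text.lower()}

def expand_with_clusters(hits, cluster_of):
    """Add every document that shares a cluster with at least one hit."""
    hit_clusters = {cluster_of[d] for d in hits}
    return {d for d, c in cluster_of.items() if c in hit_clusters}

docs = {
    "d1": "We should raise the price before the announcement",
    "d2": "Increase the rate ahead of the press release",
    "d3": "Parking garage closed on Friday",
}
# Assumed output of an earlier clustering step: document id -> cluster id.
cluster_of = {"d1": "c1", "d2": "c1", "d3": "c2"}

hits = keyword_search(docs, "price")
expanded = expand_with_clusters(hits, cluster_of)
missed_by_search = expanded - hits
print(sorted(hits), sorted(missed_by_search))
```

The query "price" matches only the first document, but the second one, which uses "rate" instead, surfaces because it sits in the same cluster.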
Real World Examples
- Look for responsive documents missed during manual review. One option is to use clustering to find documents marked as non-responsive that are similar to responsive ones and re-review them to see if there are errors in the tagging.
- Look for responsive documents missed by keyword search. Another option is to take a case where a search engine query was used to choose documents for production and use clustering to find similar documents (that didn’t match the query) and review them to see if responsive documents had been missed.
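A quality-control pass like the first example can be sketched as a check over existing review tags: flag documents tagged non-responsive that sit in clusters where most documents were tagged responsive. The tag values, the `min_share` cutoff, and the sample data are hypothetical.

```python
from collections import defaultdict

def rereview_candidates(cluster_of, review_tags, min_share=0.5):
    """Return non-responsive documents from clusters where at least
    min_share of the documents were tagged responsive."""
    by_cluster = defaultdict(list)
    for doc_id, cluster_id in cluster_of.items():
        by_cluster[cluster_id].append(doc_id)
    candidates = []
    for members in by_cluster.values():
        responsive = [d for d in members if review_tags.get(d) == "responsive"]
        if len(responsive) / len(members) >= min_share:
            candidates += [d for d in members if review_tags.get(d) == "non-responsive"]
    return sorted(candidates)

# Assumed inputs: cluster assignments and tags from a completed manual review.
cluster_of = {"d1": "c1", "d2": "c1", "d3": "c1", "d4": "c2", "d5": "c2"}
review_tags = {
    "d1": "responsive",
    "d2": "responsive",
    "d3": "non-responsive",
    "d4": "non-responsive",
    "d5": "non-responsive",
}

to_rereview = rereview_candidates(cluster_of, review_tags)
print(to_rereview)
```

Only the non-responsive document that shares a cluster with two responsive ones is flagged for a second look. The cluster with no responsive documents is left alone.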
CLOUDNINE SOFTWARE FEATURES
- Free Trial
- Full Self-Service
- Upload Your Own Data
- Extract Metadata
- Extract/Index Text
- Extract Email Attachments
- Near-Native Rendering (HTML)
- Early Data Assessment
- Create Load Files
- Customizable Data Views
- Data Filtering and Analytics
- Annotate and Redact (TIFF/PDF)
- User Defined Fields and Ordering
- Group Tagging
- Unicode Support
- Simultaneous Index Field and Full-Text Searching
- Fuzzy, Synonym, and Proximity Searching
- Hit Highlighting on Text, HTML and Image (PDF) Tabs
- Save Function for Both User and Global Search Definitions
- Batch Export of Documents
- Online, Real-time, Custom and Printable Doc Audit Reports
- Free Help and Support
- Add Own Users and Maintain Access Rights
- User-based Security Profiles (Down to Field Level)
- Protected Cloud (Dedicated Servers/Tier IV Data Center)
- Full Portfolio of Professional Services
WHAT CLIENTS ARE SAYING ABOUT CLOUDNINE
This software is easy to use and allows us to upload and download documents as they become ready, saving us both time and money.
Great E-Discovery Company!
We have worked with CloudNine on several cases, and have been quite happy with the service that we have received. Their support staff is responsive, friendly and helpful in assisting me with providing the technical information that my clients need. Their review platform, CloudNine, is robust, easy to use and ideal for a firm like ours where we don’t have to buy and support the software.
Great database for both small and voluminous litigation matters.
User friendly software and great customer service!
Reviewed, analyzed and tagged 100,000+ documents as part of class action lawsuit.
Overall great experience. Software is very user-friendly. Search filters were very helpful. Customer service was always quick to diagnose and solve any issues I ran into or answer any questions I had. I would definitely use this product and CloudNine in the future.
Amazing People, Great Software
I contacted CloudNine with a last-minute request for help with the hands-on portion of an e-discovery class I taught at the law school with Donna Chesteen. The staff at CloudNine were amazing. They set up a custom data set for the students to search against, and spent time teaching Donna and me the basics of their system, and had us up and running in less than a day. The students were able to quickly learn the search tools and produce meaningful results for purposes of the class with minimal effort. We also demoed other cloud-based software for the class, but the students clearly preferred the CloudNine system. CloudNine is straightforward enough to figure out on your own, but powerful enough to handle the most complicated searches. Having seen it in action in the classroom, I am making the switch to CloudNine in my private practice.