What is Document Clustering?
How documents are organized for review can make a big difference in review efficiency, not only saving costs but also improving accuracy by assigning similar documents to the same reviewer. Clustering software examines the text in your documents, determines which documents are related to each other, and groups them into clusters. Clustering makes it easy to explore and categorize “big data” sets of documents, bringing efficiency to the electronic discovery process.
Clustering does the electronic equivalent of putting your documents into labeled boxes so that things only end up in the same box if they belong together. This allows you to explore and manage your documents by browsing through a relatively small set of boxes (clusters) instead of digging through the much bigger data set of documents directly.
It organizes the documents according to the structure that arises naturally, without preconceptions or query terms. It labels each cluster with a set of keywords, providing a quick overview of the cluster. It also identifies a “representative document” that can be used as a proxy for the cluster.
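The three steps described above (group related documents, label each cluster with keywords, pick a representative document) can be illustrated with a minimal sketch. This is not how any particular product implements clustering: the bag-of-words vectors, the cosine similarity measure, the greedy single-pass grouping, the `threshold` value, the stopword list, and all function names are assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is"}

def vectorize(text):
    """Turn text into a bag-of-words term-count vector, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single pass: each document joins the most similar existing
    cluster if the similarity reaches the threshold, else starts a new one."""
    vectors = [vectorize(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [i], "centroid": Counter(vec)})
        else:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts act as the centroid
    for c in clusters:
        # Label: the most frequent terms across the cluster.
        c["keywords"] = [t for t, _ in c["centroid"].most_common(3)]
        # Representative: the member closest to the cluster centroid.
        c["representative"] = max(
            c["members"], key=lambda i: cosine(vectors[i], c["centroid"])
        )
    return clusters

docs = [
    "Price increase agreed with competitor on widgets",
    "Competitor agreed to widgets price increase next quarter",
    "Lunch menu for Friday office party",
    "Office party lunch menu updated",
]
clusters = cluster(docs)
for c in clusters:
    print(c["keywords"], c["members"], "representative:", c["representative"])
```

Note that no query terms are supplied anywhere: the groups come only from the overlap in vocabulary between documents, which is the sense in which the structure "arises naturally."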
More Efficient Review
- Apply tags to a single document, a cluster of documents, or a group of clusters with a single mouse click, greatly reducing labor during the review process.
- Review similar documents together to reveal relationships and context.
- Identify near-duplicates and review them as a group or individually.
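One common way to identify near-duplicates is to compare overlapping word sequences ("shingles") between documents. The sketch below is an illustration under assumptions, not a description of any product's method: the shingle size `k`, the Jaccard similarity measure, the `threshold` value, and the function names are all hypothetical choices.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Share of shingles that two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicate_pairs(docs, threshold=0.5):
    """Return index pairs of documents whose shingle overlap reaches the threshold."""
    sets = [shingles(d) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if jaccard(sets[i], sets[j]) >= threshold
    ]

drafts = [
    "The board approved the acquisition proposal on Monday",
    "The board approved the acquisition proposal on Tuesday",
    "Quarterly sales figures are attached for review",
]
pairs = near_duplicate_pairs(drafts)
print(pairs)
```

Comparing every pair is fine for a small example but grows quadratically with the number of documents, so larger collections would need a more scalable approach.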
Understand Your Document Set
- Quickly identify major topics and sub-topics.
- Prioritize documents so you can analyze the most promising items first, and begin planning your case right away.
Reduce Risk of Errors
- Improve consistency and reduce errors because similar documents are processed together.
- Find evidence that may be missed by a search query, e.g. due to synonyms.
Define the hierarchy of tags that you want to apply to your documents, anything from “evidence of price manipulation” to “privileged” or “work product.” Then apply the tags to an individual document, all documents in a cluster, or all documents in all clusters labeled with a specific set of keywords. With document clustering, you can tag hundreds of documents with just a few mouse clicks. You decide whether a cluster containing a thread of emails or a set of revisions to an acquisition proposal should be treated as a single entity, or whether the items within the cluster should be handled individually. Duplicate or near-duplicate documents will appear in the same cluster, so you can decide whether or not to treat them as a unit.
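The three tagging scopes described above (one document, one cluster, or every cluster labeled with given keywords) can be sketched as a small data structure. Everything here is hypothetical: the `TagStore` class, its method names, the parent tags "responsive" and "withheld", and the sample clusters are invented for illustration. Only the three leaf tag names come from the text.

```python
from collections import defaultdict

# Hypothetical tag hierarchy: parent tag -> child tags.
TAG_HIERARCHY = {
    "responsive": ["evidence of price manipulation"],
    "withheld": ["privileged", "work product"],
}

def valid_tags(hierarchy):
    """Collect every parent and child tag name defined in the hierarchy."""
    tags = set(hierarchy)
    for children in hierarchy.values():
        tags.update(children)
    return tags

class TagStore:
    def __init__(self, clusters, hierarchy):
        # clusters: cluster id -> {"keywords": set of str, "docs": list of doc ids}
        self.clusters = clusters
        self.allowed = valid_tags(hierarchy)
        self.tags = defaultdict(set)  # doc id -> set of applied tags

    def tag_document(self, doc_id, tag):
        """Apply a tag to a single document."""
        if tag not in self.allowed:
            raise ValueError(f"unknown tag: {tag}")
        self.tags[doc_id].add(tag)

    def tag_cluster(self, cluster_id, tag):
        """Apply a tag to every document in one cluster."""
        for doc_id in self.clusters[cluster_id]["docs"]:
            self.tag_document(doc_id, tag)

    def tag_by_keywords(self, keywords, tag):
        """Apply a tag to all clusters whose label contains every given keyword."""
        wanted = set(keywords)
        for cluster_id, info in self.clusters.items():
            if wanted <= info["keywords"]:
                self.tag_cluster(cluster_id, tag)

sample_clusters = {
    1: {"keywords": {"price", "competitor"}, "docs": ["d1", "d2"]},
    2: {"keywords": {"counsel", "advice"}, "docs": ["d3"]},
    3: {"keywords": {"price", "invoice"}, "docs": ["d4"]},
}
store = TagStore(sample_clusters, TAG_HIERARCHY)
store.tag_by_keywords(["price"], "evidence of price manipulation")
store.tag_cluster(2, "privileged")
store.tag_document("d4", "work product")
```

Routing the cluster and keyword operations through `tag_document` keeps validation in one place, so a tag that is not in the hierarchy is rejected regardless of which scope was used.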
Clustering provides significant benefits even if you decide to click into each cluster to review and tag the documents one by one.
- First, you can start with the clusters having the most promising keywords, helping you to find the most important evidence early, which gives you more time to think about your strategy for the case.
- Second, you will tag the documents more consistently because you will be working on sets of similar documents instead of jumping from one topic to another.
- Finally, having similar items grouped together can provide additional insight, because you can compare different versions of a document or see the replies to an email.
Search Is Not Enough
Search engines are great tools when you know what you are looking for, but they aren’t very good for viewing the structure of your big data set or discovering things that you aren’t looking for explicitly. You can miss important evidence if you don’t construct your search query carefully (e.g. failing to account for synonyms). Integrated with your search engine, clustering allows you to see the overall structure of your document set and browse as deep into it as you want. When you find an important piece of evidence, other documents in the same cluster are probably important too, so clustering can help you find evidence that doesn’t match your search query exactly.
Real World Examples
- Look for responsive documents missed during manual review. One option is to use clustering to find documents marked as non-responsive that are similar to responsive ones and re-review them to see if there are errors in the tagging.
- Look for responsive documents missed by keyword search. When a search engine query was used to choose documents for production, use clustering to find similar documents that didn’t match the query, then review them to see whether any responsive documents were missed.
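Both examples above come down to the same operation: start from a set of known documents (those tagged responsive, or those that matched the query) and pull the other documents that share a cluster with them. A minimal sketch follows, assuming cluster assignments are already available as a mapping from document ID to cluster ID. The function name, the mapping, and the document IDs are hypothetical.

```python
def review_candidates(cluster_of, seeds):
    """Return documents that share a cluster with any seed document,
    excluding the seeds themselves."""
    seeds = set(seeds)
    seed_clusters = {cluster_of[d] for d in seeds}
    return sorted(
        d for d, c in cluster_of.items() if c in seed_clusters and d not in seeds
    )

# Hypothetical cluster assignments: document id -> cluster id.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}

# Example 1: "a" was tagged responsive, so its cluster-mates are worth re-reviewing.
missed_in_manual_review = review_candidates(cluster_of, {"a"})

# Example 2: "c" matched the search query, so check similar non-matching documents.
missed_by_query = review_candidates(cluster_of, {"c"})

print(missed_in_manual_review, missed_by_query)
```

The result is a list of candidates for human review, not a determination of responsiveness.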

