Near Duplicates Report
The Near Duplication algorithm operates only on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.
Near Duplicates (ND) summary card
The Near Duplicates summary card shows the headline figure (33.06% Document Reduction in this example) and other summary information for the Near Duplicate algorithm set run.
Algorithm Name and Set ID |
The algorithm name and the unique, system-generated Set ID for this set's algorithm run. Use this ID to search for the new Near Duplicate fields after they have been overlaid into Relativity. |
% DOCUMENT REDUCTION |
The percent reduction in documents when non-pivot documents are removed, so that only pivot documents remain. The first document in a near-duplicate group is called the pivot; it is the point of reference for the other documents in the group. If you review only the pivot from each group, you will see a reduction of this much. We do not recommend removing the non-pivot documents; instead, we recommend using the near-duplicate groupings to expedite review. |
Pivots |
The number of documents identified as the first document (the pivot) in an ND group |
Non Pivots |
The number of documents analyzed for ND that are not pivots |
Ignored/Exceptions |
The number of documents that were excluded from this run or that produced exceptions. Exclusions result from the large/small document settings or from excluding emails from ND runs. |
Total |
The total number of documents the algorithm was run over |
AVERAGE GROUP SIZE |
The average number of documents in an ND group |
SIMILARITY THRESHOLD |
The percent similarity threshold that was set on the Create New Set page |
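The figures on the summary card are related by simple arithmetic, which can be sketched as follows. This is an illustration only, not the product's implementation: the `NearDupSummary` class and its field names are hypothetical, and the formulas assume that reduction is measured against analyzed documents (pivots plus non-pivots), that the total adds ignored documents and exceptions to the analyzed count, and that each group has exactly one pivot. The counts in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class NearDupSummary:
    """Hypothetical container for the counts shown on the summary card."""
    pivots: int
    non_pivots: int
    ignored_or_exceptions: int

    @property
    def analyzed(self) -> int:
        # Documents that were actually compared for near duplication.
        return self.pivots + self.non_pivots

    @property
    def total(self) -> int:
        # Assumption: total = analyzed + ignored/exceptions.
        return self.analyzed + self.ignored_or_exceptions

    @property
    def percent_reduction(self) -> float:
        # Assumption: share of analyzed documents that are non-pivots,
        # i.e. the documents you would skip by reviewing pivots only.
        if self.analyzed == 0:
            return 0.0
        return 100.0 * self.non_pivots / self.analyzed

    @property
    def average_group_size(self) -> float:
        # Assumption: one group per pivot, so groups == pivots.
        if self.pivots == 0:
            return 0.0
        return self.analyzed / self.pivots


# Made-up counts, for illustration only.
summary = NearDupSummary(pivots=600, non_pivots=300, ignored_or_exceptions=100)
print(summary.total)                        # 1000
print(round(summary.percent_reduction, 2))  # 33.33
print(summary.average_group_size)           # 1.5
```

If the report computes reduction against the total rather than the analyzed count, only the denominator in `percent_reduction` would change.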
Near Duplicates (ND) details
Use the carousel controls to advance to the Near Duplicates Details report.
Near Duplicates Details Example
Near Duplicates Details
Scope Summary |
Analyzed: emails, attachments, and loose files compared; Ignored/Exceptions: large documents ignored, small documents ignored, and exceptions; Total Documents: the total number of documents |
Reduction * |
Graphic showing reduction due to identifying and grouping near-duplicates |
Near Duplicate Results |
Pivots: the total number of near-duplicate pivots identified, followed by a breakdown into emails, attachments, and loose files; Non-pivots: the total number of near-duplicate non-pivots identified, followed by a breakdown into emails, attachments, and loose files; Ignored/Exceptions: documents ignored or excepted; Total Documents: the total number of documents analyzed |
Group Count Summary |
Shows the distribution of groups by group size. Documents in a large group can be reviewed together to expedite review. |
Average group size |
The average number of documents in an ND group |
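The Group Count Summary and the average group size can be reproduced from exported data. The sketch below assumes that each document carries a group identifier; it does not show the report's own code. The function names and the sample group IDs are hypothetical.

```python
from collections import Counter


def group_size_distribution(group_ids):
    """Return {group size: number of groups of that size}, sorted by size.

    `group_ids` holds one (hypothetical) ND group identifier per document.
    """
    docs_per_group = Counter(group_ids)               # group ID -> document count
    size_counts = Counter(docs_per_group.values())    # size -> number of groups
    return dict(sorted(size_counts.items()))


def average_group_size(group_ids):
    """Return the mean number of documents per group (0.0 if there are none)."""
    docs_per_group = Counter(group_ids)
    if not docs_per_group:
        return 0.0
    return sum(docs_per_group.values()) / len(docs_per_group)


# Made-up group assignments for eight documents.
ids = ["G1", "G1", "G1", "G2", "G2", "G3", "G4", "G4"]
print(group_size_distribution(ids))  # {1: 1, 2: 2, 3: 1}
print(average_group_size(ids))       # 2.0
```

Sorting the distribution by size makes the largest groups, which are the best candidates for reviewing together, easy to find at the end of the output.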