Near Duplicates Report

The Near Duplication algorithm operates only on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.

Near Duplicates (ND) summary card

The Near Duplicates summary card shows the headline figure (33.06% Document Reduction in this example) and other summary information for the Near Duplicate algorithm set run.

Algorithm Name and Set ID	The algorithm name and the system-generated, unique code for this set's algorithm run. Use this number to search for new Near Duplicate fields after they have been overlaid into Relativity.
% DOCUMENT REDUCTION	The percent reduction achieved when non-pivot documents are removed, so that only pivot documents remain. The first document in a near-duplicate group is called the pivot; it is the point of reference for the other documents in the group. If you review only the pivot from each group, you will see this much reduction. We do not recommend removing the non-pivot documents; instead, use the near-duplicate groupings to expedite review.
Pivots	The number of pivot documents, that is, the first document identified in each ND group
Non Pivots	The number of documents analyzed for ND that aren't pivots
Ignored/Exceptions	The number of documents excluded from this run or recorded as exceptions. Exclusions result from the large/small document settings or from excluding emails from ND runs.
Total	The total number of documents in the run
AVERAGE GROUP SIZE	The average number of documents in an ND group
SIMILARITY THRESHOLD	The percent similarity threshold that was set on the Create New Set page
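
As a rough illustration of how the figures on the card relate to one another, the following Python sketch derives a reduction percentage and an average group size from pivot and non-pivot counts. The function names, the sample counts, and the choice of denominator (pivots plus non-pivots, leaving out ignored documents and exceptions) are assumptions made for illustration; the product may calculate these values differently.

```python
def document_reduction(pivots: int, non_pivots: int) -> float:
    """Percent of analyzed documents that drop out if only pivots are reviewed.

    Assumption: the denominator is pivots + non-pivots, so ignored
    documents and exceptions are left out of the calculation.
    """
    analyzed = pivots + non_pivots
    if analyzed == 0:
        return 0.0
    return 100.0 * non_pivots / analyzed


def average_group_size(pivots: int, non_pivots: int) -> float:
    """Average documents per group.

    Assumption: every group has exactly one pivot, and a document with
    no near duplicates counts as a group of one.
    """
    if pivots == 0:
        return 0.0
    return (pivots + non_pivots) / pivots


# Hypothetical counts, not taken from any real report.
pivots, non_pivots = 750, 250
print(f"{document_reduction(pivots, non_pivots):.2f}% document reduction")
print(f"{average_group_size(pivots, non_pivots):.2f} average group size")
```

With these hypothetical counts, reviewing only the 750 pivots would skip 250 of the 1,000 analyzed documents, which is a 25% reduction.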

Near Duplicates (ND) details

Use the carousel controls to advance to the Near Duplicates Details report.

Near Duplicates Details

Scope Summary	Analyzed: emails, attachments, and loose files compared. Ignored/Exceptions: large documents ignored, small documents ignored, and exceptions. Total Documents: the total number of documents.
reduction *	Graphic showing reduction due to identifying and grouping near-duplicates
Near Duplicate Results	Pivots: the total near-duplicate pivots identified, then a breakdown by emails, attachments, and loose files. Non-pivots: the total near-duplicate non-pivots identified, then a breakdown by emails, attachments, and loose files. Ignored/Exceptions: documents ignored or recorded as exceptions. Total Documents: the total number of documents analyzed.
Group Count Summary	Shows the distribution of groups by group size. Reviewing the documents in a large group together can expedite review.
Average group size	The average number of documents in an ND group
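
To make the Group Count Summary and Average group size rows more concrete, the sketch below builds a size distribution from a mapping of documents to near-duplicate groups. The document IDs, group IDs, and function name are hypothetical, and the actual report may bucket sizes into ranges rather than list each size separately.

```python
from collections import Counter


def group_size_distribution(doc_to_group: dict[str, str]) -> dict[int, int]:
    """Map each group size to the number of groups that have that size."""
    group_sizes = Counter(doc_to_group.values())   # group ID -> document count
    distribution = Counter(group_sizes.values())   # group size -> group count
    return dict(sorted(distribution.items()))


# Hypothetical document-to-group assignments, for illustration only.
doc_to_group = {
    "DOC-001": "G1", "DOC-002": "G1", "DOC-003": "G1",
    "DOC-004": "G2", "DOC-005": "G2",
    "DOC-006": "G3", "DOC-007": "G3",
}

distribution = group_size_distribution(doc_to_group)
average = len(doc_to_group) / len(set(doc_to_group.values()))

print(distribution)        # one group of three documents, two groups of two
print(round(average, 2))   # documents divided by groups
```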

Last updated on August 21, 2023
