Near Duplicates Report


The Near Duplication algorithm operates only on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.

Near Duplicates (ND) summary card

The Near Duplicates summary card shows the headline figure (for example, 33.06% Document Reduction) and other summary information for the Near Duplicate algorithm set run.

Algorithm Name and Set ID

The algorithm name and the system-generated, unique code for this set's algorithm run. Use this number to search for new Near Duplicate fields after they have been overlaid into Relativity.

% DOCUMENT REDUCTION

The percent reduction in documents to review when non-pivot documents are removed, leaving only the pivots. The first document in a near-duplicate group is called the pivot; it is the point of reference for the other documents in the group. If you review only the pivot from each group, your review set shrinks by this percentage. We do not recommend removing the non-pivot documents; instead, use the near-duplicate groupings to expedite review.
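As a rough illustration of the arithmetic, the sketch below assumes the reduction is the share of analyzed documents (pivots plus non-pivots) that are non-pivots. The exact denominator the report uses is not stated in this article, and the function name and sample counts are hypothetical.

```python
def document_reduction(pivots: int, non_pivots: int) -> float:
    """Percent of analyzed documents that drop out if only pivots are reviewed.

    Assumption: the denominator is pivots + non-pivots, so ignored documents
    and exceptions are left out. The report may define it differently.
    """
    analyzed = pivots + non_pivots
    if analyzed == 0:
        return 0.0  # nothing analyzed, so nothing to reduce
    return 100.0 * non_pivots / analyzed


# Hypothetical counts, not taken from a real run.
print(f"{document_reduction(pivots=600, non_pivots=400):.2f}% Document Reduction")
```

With these made-up counts, 400 of 1,000 analyzed documents are non-pivots, so the sketch reports a 40.00% reduction.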

Pivots

The number of pivot documents, that is, the documents identified first in each ND group.

Non Pivots

The number of documents analyzed for ND that are not pivots.

Ignored/Exceptions

The number of documents excluded from this run or flagged as exceptions. Exclusions result from the large/small document settings or from excluding emails from ND runs.

Total

The total number of documents the algorithm was run over.

AVERAGE GROUP SIZE

The average number of documents in an ND group.
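To make the average concrete, here is a minimal sketch. It assumes each analyzed document carries one group identifier; the field values and function name are hypothetical, and whether single-document groups count toward the average in the actual report is not stated here.

```python
from collections import Counter


def average_group_size(group_ids: list[str]) -> float:
    """Mean number of documents per near-duplicate group.

    group_ids holds one (hypothetical) group identifier per analyzed document.
    """
    sizes = Counter(group_ids)  # group ID -> number of documents in that group
    if not sizes:
        return 0.0
    return sum(sizes.values()) / len(sizes)


# Hypothetical run: three groups containing 3, 2, and 1 documents.
docs = ["G1", "G1", "G1", "G2", "G2", "G3"]
print(average_group_size(docs))
```

Six documents spread over three groups give an average group size of 2.0.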

SIMILARITY THRESHOLD

The percent similarity threshold that was set on the Create New Set page.

Near Duplicates (ND) details

Use the carousel controls to advance to the Near Duplicates Details report.

Near Duplicates Details Example

Near Duplicates Details

Scope Summary

Analyzed—emails, attachments, and loose files compared

Ignored/Exceptions—Large documents ignored, Small documents ignored, and Exceptions

Total Documents—The total number of documents

Reduction

Graphic showing reduction due to identifying and grouping near-duplicates

Near Duplicate Results

Pivots—the total near-duplicate pivots identified then a breakdown of emails, attachments and loose files

Non-pivots—the total near-duplicate non-pivots identified then a breakdown of emails, attachments and loose files

Ignored/Exceptions—documents ignored or excepted

Total Documents—total number of documents analyzed

Group Count Summary

Shows the distribution of near-duplicate groups by group size. Reviewing the documents in a large group together can expedite review.
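The distribution can be pictured as a count of groups at each size. The sketch below is an assumption about how such a summary could be tallied from per-document group identifiers, not a description of how the report is built; the identifiers and function name are hypothetical.

```python
from collections import Counter


def group_size_distribution(group_ids: list[str]) -> dict[int, int]:
    """Map each group size to the number of groups of that size."""
    sizes = Counter(group_ids)            # group ID -> documents in the group
    distribution = Counter(sizes.values())  # group size -> number of groups
    return dict(sorted(distribution.items()))


# Hypothetical identifiers: G1 has 3 documents, G2 and G4 have 2, G3 has 1.
docs = ["G1", "G1", "G1", "G2", "G2", "G3", "G4", "G4"]
for size, count in group_size_distribution(docs).items():
    print(f"groups of size {size}: {count}")
```

Sorting by size puts the largest groups at the end of the listing, which are the ones most worth reviewing together.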

Average group size

The average number of documents in an ND group.
