Text Duplicates Report


The Text Duplication algorithm only operates on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.

Text Duplicates (TD) summary

The Text Duplicates summary shows the "headline" figure (8.50% DOCUMENT REDUCTION) and other key information for this set's algorithm run.

Algorithm Name and Set ID

The algorithm name and the unique, system-generated code for this set's algorithm run. Use this code to search for the new Text Duplicate fields after they have been overlaid into Relativity.

% DOCUMENT REDUCTION

The percentage by which the review set shrinks when non-pivot documents are removed, leaving only pivot documents. The pivot is the first document identified in a text duplicate group; it's the point of reference for the other documents in the group. If you review only the pivots from a set, your review population shrinks by this percentage. Lighthouse doesn't recommend removing the non-pivot documents; instead, it recommends using the text duplicate groupings to expedite review.
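As a rough illustration, the headline figure can be computed from the report's counts. This sketch assumes the reduction is non-pivots divided by total documents; whether ignored documents and exceptions count toward the denominator is an assumption, and the function name and counts are hypothetical.

```python
def percent_document_reduction(non_pivots: int, total_documents: int) -> float:
    """Return the % of documents skipped when only pivots are reviewed.

    Assumption: reduction = non-pivots / total documents. The report's
    exact denominator may differ (for example, in how ignored documents count).
    """
    if total_documents <= 0:
        raise ValueError("total_documents must be positive")
    if not 0 <= non_pivots <= total_documents:
        raise ValueError("non_pivots must be between 0 and total_documents")
    return 100.0 * non_pivots / total_documents


# Hypothetical counts for illustration only, not taken from this report.
reduction = percent_document_reduction(non_pivots=120, total_documents=1500)
print(f"{reduction:.2f}% DOCUMENT REDUCTION")  # prints "8.00% DOCUMENT REDUCTION"
```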

Pivots   

The number of documents identified as the pivot (the first document) of a TD group.

Non Pivots

The number of analyzed documents that aren't pivots.
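A minimal sketch of how pivots and non-pivots relate, assuming exact matching on normalized extracted text. The actual TD matching rules aren't described in this report, so `normalize`, `text_duplicate_groups`, and the document IDs are hypothetical.

```python
def normalize(text: str) -> str:
    # Hypothetical normalization: collapse runs of whitespace and ignore case.
    return " ".join(text.split()).lower()


def text_duplicate_groups(docs):
    """Group (doc_id, extracted_text) pairs whose normalized text is identical.

    docs must be in processing order; the first document seen for each
    text becomes that group's pivot. Documents with unique text form
    groups of one and are therefore pivots too (an assumption).
    """
    groups = {}  # normalized text -> doc IDs; dicts keep insertion order
    for doc_id, text in docs:
        groups.setdefault(normalize(text), []).append(doc_id)
    return list(groups.values())


docs = [
    ("DOC-001", "Quarterly report  draft"),
    ("DOC-002", "quarterly report draft"),
    ("DOC-003", "Meeting notes"),
]
groups = text_duplicate_groups(docs)
pivots = [group[0] for group in groups]
non_pivots = [doc_id for group in groups for doc_id in group[1:]]
print(pivots)      # ['DOC-001', 'DOC-003']
print(non_pivots)  # ['DOC-002']
```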

Ignored/Exceptions

The number of documents that were excluded from this run or that produced exceptions. Documents are excluded because of the large/small document settings or because emails are excluded from ND runs.
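To illustrate how large/small settings can route documents into Ignored/Exceptions instead of TD analysis, here is a sketch that assumes the settings are character-count thresholds on the extracted text. The threshold values, their units, and the function name are hypothetical; exceptions (documents that fail processing) are not modeled.

```python
# Hypothetical thresholds; the real large/small settings and their units
# are configured per run and are not specified in this report.
MIN_CHARS = 20
MAX_CHARS = 5_000_000


def split_by_size(docs, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Split (doc_id, extracted_text) pairs into analyzed and ignored lists."""
    analyzed, ignored_small, ignored_large = [], [], []
    for doc_id, text in docs:
        if len(text) < min_chars:
            ignored_small.append(doc_id)
        elif len(text) > max_chars:
            ignored_large.append(doc_id)
        else:
            analyzed.append(doc_id)
    return analyzed, ignored_small, ignored_large


docs = [("DOC-010", "ok"), ("DOC-011", "A paragraph of ordinary extracted text.")]
analyzed, small, large = split_by_size(docs)
print(analyzed, small, large)  # ['DOC-011'] ['DOC-010'] []
```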

Total

The total number of documents included in the run.

AVERAGE GROUP SIZE

The average number of documents in a TD group. In this set, the average group size is 1.09 documents, or roughly one document per group.
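The average is the number of documents in TD groups divided by the number of groups. A minimal sketch, assuming single-document groups count as groups of one (an average close to 1 implies this, since groups of two or more alone could not average below 2); the function name and the distribution are hypothetical.

```python
from typing import Sequence


def average_group_size(group_sizes: Sequence[int]) -> float:
    """Mean number of documents per TD group (groups of one included)."""
    if not group_sizes:
        raise ValueError("need at least one group")
    return sum(group_sizes) / len(group_sizes)


# Hypothetical distribution: 91 single-document groups and 9 pairs.
sizes = [1] * 91 + [2] * 9
print(f"{average_group_size(sizes):.2f}")  # prints "1.09"
```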

Text Duplicates (TD) details

You can use the carousel controls to advance to the Text Duplicates Details report.

Algorithm Name and Code

The algorithm that was run and the system-generated code for this set's run. This example shows MATD-262.

SCOPE SUMMARY

Analyzed: the number of emails, attachments, and loose files compared

Ignored/Exceptions: large documents ignored, small documents ignored, and exceptions

Total Documents: the total number of documents

TEXT DUPLICATE RESULTS

Pivots: the total number of text duplicate pivots identified, followed by a breakdown into emails, attachments, and loose files

Non-pivots: the total number of text duplicate non-pivots identified, followed by a breakdown into emails, attachments, and loose files

Ignored/Exceptions: documents that were ignored or that produced exceptions

Total Documents: the total number of documents analyzed
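The per-type breakdown can be sketched as a tally over the groups. This assumes each group lists its pivot first and that every document carries one of the three types shown in the report; the IDs and the `results_breakdown` name are hypothetical.

```python
from collections import Counter


def results_breakdown(groups, doc_types):
    """Count pivots and non-pivots per document type.

    groups: lists of doc IDs, each list starting with its pivot.
    doc_types: maps doc ID -> "email", "attachment", or "loose file".
    """
    pivots = Counter(doc_types[group[0]] for group in groups)
    non_pivots = Counter(doc_types[doc_id] for group in groups for doc_id in group[1:])
    return pivots, non_pivots


groups = [["E1", "E2", "A1"], ["L1"], ["A2", "L2"]]
doc_types = {"E1": "email", "E2": "email", "A1": "attachment",
             "L1": "loose file", "A2": "attachment", "L2": "loose file"}
pivots, non_pivots = results_breakdown(groups, doc_types)
print(dict(pivots))      # {'email': 1, 'loose file': 1, 'attachment': 1}
print(dict(non_pivots))  # {'email': 1, 'attachment': 1, 'loose file': 1}
```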

Group Count Summary

Shows the distribution of groups by group size. Documents in large groups can be reviewed together to expedite review.
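A group-size distribution like this one can be produced by counting groups by their length. This is a sketch with hypothetical group data; sorting largest sizes first simply surfaces the groups that benefit most from being reviewed together.

```python
from collections import Counter


def group_count_summary(groups):
    """Map group size -> number of groups of that size, largest sizes first."""
    counts = Counter(len(group) for group in groups)
    return dict(sorted(counts.items(), reverse=True))


groups = [["A", "B", "C"], ["D"], ["E", "F"], ["G"]]
print(group_count_summary(groups))  # {3: 1, 2: 1, 1: 2}
```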

AVERAGE GROUP SIZE

The average number of documents in a TD group. In this set, the average group size is 1.09 documents, or roughly one document per group.
