Text Duplicates Report

The Text Duplication algorithm only operates on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.
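As an illustration of grouping on extracted text alone, here is a minimal sketch. It is not Lighthouse's implementation, which this article does not describe. The function name `group_text_duplicates`, the whitespace and case normalization, the SHA-256 hashing, and the sample document IDs are all assumptions made for the example. The first document seen with a given text is treated as the pivot.

```python
import hashlib


def group_text_duplicates(docs):
    """Group documents whose extracted text matches after normalization.

    docs: iterable of (doc_id, extracted_text) pairs in processing order.
    Returns a dict mapping each pivot ID to the IDs in its group (pivot first).
    Hypothetical helper for illustration only.
    """
    groups = {}
    pivot_by_hash = {}
    for doc_id, text in docs:
        # Assumed normalization: collapse whitespace and ignore case.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        # The first document seen with this text becomes the pivot.
        pivot = pivot_by_hash.setdefault(digest, doc_id)
        groups.setdefault(pivot, []).append(doc_id)
    return groups


# Hypothetical sample documents (extracted text only, no images or embedded objects).
docs = [
    ("DOC-1", "Quarterly report attached."),
    ("DOC-2", "quarterly  report attached."),
    ("DOC-3", "Please call me."),
]
groups = group_text_duplicates(docs)
```

In this sample, DOC-1 and DOC-2 fall into one group with DOC-1 as the pivot, and DOC-3 forms a group of its own.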

Text Duplicates (TD) summary

The Text Duplicates summary shows the headline figure (in this example, 8.50% DOCUMENT REDUCTION) and other key information for this run of the algorithm on the set.

Algorithm Name and Set ID	The algorithm name and the system-generated, unique code for this set's algorithm run. Use this code to search for the new Text Duplicate fields after they have been overlaid into Relativity.
% DOCUMENT REDUCTION	The percent reduction in documents when non-pivot documents are removed, leaving only the pivot documents. The pivot is the first document identified in a text-duplicate group; it is the point of reference for the other documents in the group. If you review only the pivot from each group, the number of documents to review drops by this percentage. Lighthouse doesn't recommend removing the non-pivot documents, but instead recommends using the duplicate groupings to expedite review.
Pivots	The number of pivot documents, that is, the first document identified in each TD group.
Non Pivots	The number of documents analyzed for TD that aren't pivots.
Ignored/Exceptions	The number of documents excluded from this run or flagged as exceptions. Exclusions result from the large/small document settings or from excluding emails from ND runs.
Total	The total number of documents in the run.
AVERAGE GROUP SIZE	The average number of documents in a TD group. In this set, the average size is roughly one document or, more exactly, 1.09.
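The summary figures can be related to one another with simple arithmetic. The sketch below is a hedged illustration, not the product's documented calculation: it assumes the reduction is the share of non-pivots among analyzed documents, that every group has exactly one pivot, and that ignored documents count toward the total only. The function name `td_summary` and the document counts are hypothetical, chosen to reproduce the 8.50% and 1.09 figures quoted above.

```python
def td_summary(pivots, non_pivots, ignored):
    """Derive summary figures from document counts (assumed formulas)."""
    analyzed = pivots + non_pivots
    return {
        # Share of analyzed documents that drop out if only pivots are reviewed.
        "reduction_pct": 100 * non_pivots / analyzed,
        # One pivot per group, so the group count equals the pivot count.
        "average_group_size": analyzed / pivots,
        # Assumed: the total covers analyzed plus ignored/exception documents.
        "total": analyzed + ignored,
    }


# Hypothetical counts, not taken from a real report.
summary = td_summary(pivots=915, non_pivots=85, ignored=20)
print(f"{summary['reduction_pct']:.2f}% DOCUMENT REDUCTION")
print(f"AVERAGE GROUP SIZE {summary['average_group_size']:.2f}")
```

Under these assumptions, an 8.50% reduction and an average group size of 1.09 describe the same data: most groups contain only the pivot.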

Text Duplicates (TD) details

You can use the carousel controls to advance to the Text Duplicates Details report.

Algorithm Name and Code	The algorithm that was run and the system-generated code for this group. This example shows MATD-262.
SCOPE SUMMARY	Analyzed: emails, attachments, and loose files compared. Ignored/Exceptions: large documents ignored, small documents ignored, and exceptions. Total Documents: the total number of documents.
TEXT DUPLICATE RESULTS	Pivots: the total text duplicate pivots identified, then a breakdown by emails, attachments, and loose files. Non-pivots: the total text duplicate non-pivots identified, then a breakdown by emails, attachments, and loose files. Ignored/Exceptions: documents ignored or excepted. Total Documents: the total number of documents analyzed.
Group Count Summary	Shows the distribution of groups by group size. Large groups can be reviewed together to expedite review.
AVERAGE GROUP SIZE	The average number of documents in a TD group. In this set, the average size is roughly one document or, more exactly, 1.09.
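A group count summary of this kind amounts to a tally of how many groups exist at each size. The following sketch shows one way to build such a tally; the `group_sizes` list is hypothetical sample data, and the report's actual size buckets may differ.

```python
from collections import Counter

# Hypothetical sizes of ten TD groups (documents per group, pivot included).
group_sizes = [1, 1, 1, 2, 1, 3, 1, 1, 2, 1]

# Number of groups at each size, e.g. how many groups hold exactly 2 documents.
distribution = Counter(group_sizes)

for size in sorted(distribution):
    print(f"groups of {size}: {distribution[size]}")
```

Sorting the tally by size makes it easy to spot the larger groups that are the best candidates for reviewing together.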
