Text Duplicates Report
The Text Duplication algorithm only operates on the extracted text of a document, excluding signature images, embedded objects, and other heavily duplicative content.
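To illustrate what grouping on extracted text alone can look like, here is a minimal sketch. It is not the product's actual algorithm: the `normalize` and `group_text_duplicates` helpers, the whitespace and case normalization, the exact-match hashing, and the document IDs are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def group_text_duplicates(docs: dict[str, str]) -> dict[str, list[str]]:
    """Group document IDs whose normalized extracted text is identical.

    Returns a mapping of pivot ID -> list of non-pivot IDs. In this sketch
    the pivot is simply the first document seen for each distinct text.
    """
    by_hash: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        by_hash[digest].append(doc_id)
    return {ids[0]: ids[1:] for ids in by_hash.values()}


# Hypothetical extracted text; images and embedded objects never enter the comparison.
docs = {
    "DOC-001": "Please see the attached report.",
    "DOC-002": "please see the   attached report.",
    "DOC-003": "Meeting moved to Tuesday.",
}
groups = group_text_duplicates(docs)
print(groups)
```

Because only the text is hashed, two documents that differ only in a signature image or an embedded object would fall into the same group under these assumptions.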
Text Duplicates (TD) summary
The Text Duplicates summary shows the "headline" figure (8.50% DOCUMENT REDUCTION in this example) and other key information for this set's algorithm run.
Algorithm Name and Set ID | The algorithm name and the system-generated, unique code for this set's algorithm run. Use this code to search for new Text Duplicate fields after they have been overlaid into Relativity. |
% DOCUMENT REDUCTION | The percent reduction when non-pivot documents are removed. Document reduction considers only pivot documents. The pivot is the first document in an email chain that combines to form a text-duplicate group. It's the point of reference for the other documents in the group. If you review only the pivots in a set, the number of documents to review drops by this percentage. Lighthouse doesn't recommend removing the non-pivot documents; instead, it recommends using the duplicate groupings to expedite review. |
Pivots | The number of documents identified first in their TD group. |
Non Pivots | The number of analyzed documents that aren't pivots. |
Ignored/Exceptions | The number of documents excluded from this run or flagged as exceptions. Exclusions result from the large/small document settings or from the exclusion of emails from ND runs. |
Total | The total number of documents the algorithm was run over. |
AVERAGE GROUP SIZE | The average number of documents in a TD group. In this set, the average size is roughly one document or, more exactly, 1.09. |
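The relationship between the summary figures can be sketched as follows. The formulas are an assumption, not a documented definition: reduction is taken as non-pivots divided by analyzed documents (ignored documents and exceptions left out), and each group is assumed to have exactly one pivot. The `td_summary` helper and the counts are hypothetical, chosen so the result matches the 8.50% and 1.09 shown in this example.

```python
def td_summary(pivots: int, non_pivots: int) -> tuple[float, float]:
    """Return (percent document reduction, average group size).

    Assumes analyzed = pivots + non_pivots and one pivot per TD group.
    """
    analyzed = pivots + non_pivots
    if pivots == 0:
        # No groups were formed, so neither figure is meaningful.
        return 0.0, 0.0
    reduction_pct = 100.0 * non_pivots / analyzed
    avg_group_size = analyzed / pivots
    return reduction_pct, avg_group_size


# Hypothetical counts, not taken from a real run.
reduction, avg_size = td_summary(pivots=915, non_pivots=85)
print(f"{reduction:.2f}% document reduction, average group size {avg_size:.2f}")
# prints "8.50% document reduction, average group size 1.09"
```

Under these assumptions the two figures move together: an average group size close to 1 means most documents are their own pivot, so the reduction is small.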
Text Duplicates (TD) details
You can use the carousel controls to advance to the Text Duplicates Details report.
Algorithm Name and Code | The algorithm that was run and the system-generated code for this group. This example shows MATD-262. |
SCOPE SUMMARY | Analyzed: emails, attachments, and loose files compared. Ignored/Exceptions: large documents ignored, small documents ignored, and exceptions. Total Documents: the total number of documents. |
TEXT DUPLICATE RESULTS | Pivots: the total text duplicate pivots identified, then a breakdown by emails, attachments, and loose files. Non-pivots: the total text duplicate non-pivots identified, then a breakdown by emails, attachments, and loose files. Ignored/Exceptions: documents ignored or excepted. Total Documents: the total number of documents analyzed. |
Group Count Summary | Shows the distribution of groups by group size. Large groups can be reviewed together to expedite review. |
AVERAGE GROUP SIZE | The average number of documents in a TD group. In this set, the average size is roughly one document or, more exactly, 1.09. |
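As a rough illustration of how a group count summary and the average group size relate, the sketch below tallies groups by size. The `group_size_distribution` helper and the list of sizes are hypothetical; the report's own bucketing may differ.

```python
from collections import Counter


def group_size_distribution(group_sizes: list[int]) -> dict[int, int]:
    """Map each group size to the number of groups of that size, smallest first."""
    return dict(sorted(Counter(group_sizes).items()))


# Hypothetical sizes of eight TD groups (pivot plus non-pivots in each).
sizes = [1, 1, 1, 2, 1, 3, 2, 1]
distribution = group_size_distribution(sizes)
average = sum(sizes) / len(sizes)

for size, count in distribution.items():
    print(f"groups of size {size}: {count}")
print(f"average group size: {average:.2f}")
```

The largest sizes in such a distribution point to the groups where reviewing duplicates together is likely to save the most time.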