Create New Set

On the Sets page, click the CREATE NEW SET button to run a new analytics set over a saved search. Creating and running a set adds it to the Sets list, creates interactive reports with clickable links to document groups, adds new Lighthouse Analytics fields in Relativity (if overlay is turned on, which it is by default), and enables Lighthouse's proprietary Email Thread Viewer, where you can clearly see email threads and conversations.

NOTE: If "Automatically Overlay All Results" was turned off, you must manually overlay the results on the Sets page after the set runs successfully. This overlays the set's algorithm report and field results.

WHO CAN PERFORM:

To create and manage analytics sets, users need the Create and manage sets permission. To toggle the overlay settings, users need the Overlay results permission. Overlay is turned on by default. Users without the Overlay results permission cannot edit this setting while creating a set, so they cannot turn it off.

Quick steps to run an MA Set

Use these steps to set up and run an algorithm over a saved search:

  1. On the Sets page, click the CREATE NEW SET button.
  2. On the Create New Set page, fill in the fields and select the algorithm(s) you wish to run from the ADD ALGORITHM drop-down list.
  3. Click the SAVE AND RUN button.

Set Fields

Field Name | Description
SET NAME | Give this set a name. Example: "Demo Threading Set."
DOCUMENT SET | The document set is a search you created. When you begin typing, a drop-down list of saved searches appears. The full pathname displays when you hover over a Document Set in the drop-down list.
ALGORITHM | Select an algorithm to run. An algorithm is a set of procedures used for data processing and analysis. When you click the ALGORITHM button, a drop-down menu shows the list of available algorithms grouped under a parent group, for example, Structured Analytics.
Automatically Overlay All Results | If toggled "on," the overlay step is automatically added to the queue when the set completes. Choose auto-overlay only when you already have confirmation or permission from the client to use the results. This can save time when a job finishes after hours.
NOTIFICATION RECIPIENT | Enter at least one email recipient by typing in email addresses. Your Relativity login email address is added by default.

Quality of Input Data for Document Set

Because Lighthouse Analytics algorithms are text-based, the quality of the output for each algorithm is dependent on the quality and the formatting of the extracted text and/or metadata used as input. Poor OCR quality, poor text and/or metadata formatting, and/or a lack of reliable metadata may result in lower-quality output.

Algorithm Descriptions

Email Threading (ET)

An "email thread" is a group of emails and attachments that all belong to the same "email chain" or "email conversation." The Email Threading algorithm groups documents together into email threads and identifies which documents in the threads contain unique content. Documents that do not contain unique content can be removed to form a reduced set. When Email Threading is selected, Name Normalization is automatically selected. See the Email Threading Settings article for details.
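The idea of "unique content" can be illustrated with a minimal Python sketch. This is not the Email Threading algorithm itself: the `unique_content_ids` function and the rule it applies (an email is non-unique when its full text appears inside another email in the same thread) are assumptions made purely for illustration, and the sample thread is invented.

```python
def unique_content_ids(thread: dict[str, str]) -> set[str]:
    """Return the ids of emails whose text is not fully contained in another email.

    Hypothetical rule for illustration only; the product's real logic is not
    described in this article.
    """
    unique = set()
    for doc_id, body in thread.items():
        contained = any(
            other_id != doc_id and body.strip() in other_body
            for other_id, other_body in thread.items()
        )
        if not contained:
            unique.add(doc_id)
    return unique


# Invented three-message thread: each reply carries the earlier text verbatim.
first = "Can we meet Tuesday?"
second = "Tuesday works.\n" + first
third = "Great, see you then.\n" + second

thread = {"E1": first, "E2": second, "E3": third}
reduced_set = unique_content_ids(thread)
```

In this toy thread only the last reply carries content that appears nowhere else, so the reduced set contains just that one email.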

Name Normalization and Entity Analysis (NN)

Lighthouse Analytics' Name Normalization and Entity Analysis algorithm extracts and parses names, email addresses, and email domains from all detected email message headers and then generates normalized fielded information. It generates People and Organization entities but does not tie these to specific places in documents. Name Normalization evaluates the relationship of different variations of how an individual appears in email headers, such as "Jane Smith," "Smith, Jane," "Janey Smith," and jsmith@lighthouseglobal.com, and automates the merging of these variations into one entity, "Jane Smith," and stores the normalized value in a Relativity object linked to each document. This example would also generate an Organization entity, "Lighthouse Global." When Email Threading is selected, Name Normalization is automatically selected, though Name Normalization may be selected and run without Email Threading. See the Name Normalization and Entity Analysis Settings article for details.
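To make the merging step concrete, here is a minimal Python sketch using the variants from the example above. It is not Lighthouse's algorithm: the `canonical_key` and `group_variants` helpers, and the crude "first initial plus last name" matching rule, are hypothetical and would over-merge in practice (for example, "John Smith" would land in the same group).

```python
from collections import defaultdict


def canonical_key(variant: str) -> tuple[str, str]:
    """Reduce one header variant to a (first initial, last name) key.

    Hypothetical matching rule, chosen only to keep the sketch short.
    """
    text = variant.strip().lower()
    if "@" in text:
        # Assumed address convention: first initial + last name, e.g. jsmith@...
        local = text.split("@", 1)[0]
        return (local[0], local[1:])
    if "," in text:
        # "Smith, Jane": last name comes first
        last, first = (part.strip() for part in text.split(",", 1))
    else:
        # "Jane Smith": first name comes first
        parts = text.split()
        first, last = parts[0], parts[-1]
    return (first[0], last)


def group_variants(variants: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group header variants that share the same canonical key."""
    groups = defaultdict(list)
    for variant in variants:
        groups[canonical_key(variant)].append(variant)
    return dict(groups)


variants = ["Jane Smith", "Smith, Jane", "Janey Smith", "jsmith@lighthouseglobal.com"]
groups = group_variants(variants)
```

All four variants fall into a single group, which is the behavior the article describes as merging them into one "Jane Smith" entity.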

Near Duplicates (ND)

Near duplicates are two documents whose text is similar within a set similarity threshold. The default threshold is 75% similarity. You can set it as low as 60% or as high as 100%. At 100%, a near duplicate is the same as a text duplicate, so there is generally no strong reason to run at 100%. See the Near and Text Duplicate Settings article for details.
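The effect of the threshold can be sketched in Python. The article does not say how similarity is measured, so this example substitutes a word-level `difflib.SequenceMatcher` ratio as a stand-in; the `similarity` and `is_near_duplicate` functions and the sample sentences are assumptions for illustration only. The 75% default and the 60% to 100% range come from the text above.

```python
from difflib import SequenceMatcher

DEFAULT_THRESHOLD = 0.75                   # default of 75% similarity
MIN_THRESHOLD, MAX_THRESHOLD = 0.60, 1.00  # allowed range of 60% to 100%


def similarity(text_a: str, text_b: str) -> float:
    """Stand-in similarity score in [0, 1], compared word by word.

    An assumption for this sketch, not the product's actual measure.
    """
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def is_near_duplicate(text_a: str, text_b: str,
                      threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Report whether two texts meet or exceed the similarity threshold."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between 0.60 and 1.00")
    return similarity(text_a, text_b) >= threshold


# Invented sample texts that differ by one word.
doc_a = "The quarterly report is attached for your review"
doc_b = "The quarterly report is attached for your approval"

at_default = is_near_duplicate(doc_a, doc_b)               # 75% threshold
at_maximum = is_near_duplicate(doc_a, doc_b, threshold=1.0)
```

With these sample texts the pair qualifies at the default threshold but not at 100%, where only identical text passes.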

Text Duplicates (TD)

A text duplicate is a document whose normalized text is exactly the same as another document's. The most obvious use case is a Word document printed to a PDF. Those two files will never hash as identical because they are completely different file formats, but because they contain identical text, they show up in the report as text duplicates. See the Near and Text Duplicate Settings article for details.
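The Word-to-PDF case can be sketched in Python by hashing the raw file bytes and the normalized text separately. The normalization rule used here (collapse whitespace, ignore case) is an assumption, since the article does not define it, and the byte strings and helper names (`normalize_text`, `text_hash`, `file_hash`) are made up for the example.

```python
import hashlib


def normalize_text(text: str) -> str:
    """Assumed normalization: collapse runs of whitespace and ignore case."""
    return " ".join(text.split()).lower()


def text_hash(extracted_text: str) -> str:
    """Hash of the normalized extracted text."""
    return hashlib.sha256(normalize_text(extracted_text).encode("utf-8")).hexdigest()


def file_hash(file_bytes: bytes) -> str:
    """Hash of the raw file content, which depends on the file format."""
    return hashlib.sha256(file_bytes).hexdigest()


# Made-up stand-ins for a Word file and its PDF printout.
word_bytes = b"PK\x03\x04 fake docx container"
pdf_bytes = b"%PDF-1.7 fake pdf container"
word_text = "Quarterly results\nRevenue grew this quarter."
pdf_text = "Quarterly results   Revenue grew this quarter.\n"

files_match = file_hash(word_bytes) == file_hash(pdf_bytes)
texts_match = text_hash(word_text) == text_hash(pdf_text)
```

The file hashes differ because the containers differ, while the text hashes agree once layout differences are normalized away.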

SAVE and SAVE AND RUN buttons

After entering the required fields and selecting algorithm(s), click SAVE AND RUN to run this set. If you choose SAVE, you are taken to the Sets list page. Here you will see your Set Name listed. From your new set's row menu, select Edit Set Name to make changes to your analytics set name.


Set Throughput Metrics

Because every data set is different, these are only estimates. Your performance will vary.

Analytics Algorithm Sets | 2 Agents | 4 Agents | 8 Agents
Relationship Analytics when using all algorithms | 150K documents per hour | 250K documents per hour | 350K documents per hour
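As a rough planning aid, the figures above can be turned into a run-time estimate. The sketch below is plain arithmetic on the table's numbers; the `estimated_hours` helper and the one-million-document set size are hypothetical, and the same caveat applies: these are only estimates.

```python
# Estimated throughput from the table above, in documents per hour.
DOCS_PER_HOUR = {2: 150_000, 4: 250_000, 8: 350_000}


def estimated_hours(document_count: int, agents: int) -> float:
    """Estimate run time in hours for a set of the given size."""
    if agents not in DOCS_PER_HOUR:
        raise ValueError("estimates are only listed for 2, 4, or 8 agents")
    return document_count / DOCS_PER_HOUR[agents]


# Hypothetical set of one million documents.
for agents in sorted(DOCS_PER_HOUR):
    hours = estimated_hours(1_000_000, agents)
    print(f"{agents} agents: about {hours:.1f} hours")
```

Note that the listed rates do not scale linearly with agent count, so doubling the agents does not halve the estimated time.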

Notification Emails

Users listed as notification recipients on the Create New Set page receive one of two types of email after a set runs: Success or Failure. Clicking the blue hyperlink in the notification takes you to the Relativity Documents page.

  • Algorithm: The name of the algorithm run over this set of documents.
  • Result: Shows the percent reduction by algorithm run.
  • Detail: Email Threading shows the number of Unique and Non-Unique emails. Name Normalization and Entity Analysis shows the number of Name Variants and Unique Individuals. Text and Near Duplicate Identification show the number of Pivots and Non-Pivots.
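The article does not state how the percent reduction in the Result line is calculated. One plausible reading, sketched below in Python, is the share of documents that could be set aside (for example, Non-Unique emails) out of all documents counted in the Detail line. Both the formula and the counts are assumptions for illustration, not the product's documented calculation.

```python
def percent_reduction(unique: int, non_unique: int) -> float:
    """Assumed formula: non-unique documents as a percentage of the total."""
    total = unique + non_unique
    if total == 0:
        return 0.0
    return 100.0 * non_unique / total


# Invented counts: 6,000 Unique and 4,000 Non-Unique emails.
reduction = percent_reduction(unique=6_000, non_unique=4_000)
print(f"Reduction: {reduction:.0f}%")
```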

Exceptions

When the STATUS in the notification is an Exception, view the run details from the Sets page list by clicking on the status. A good first step is to re-run the set. If a re-run fails, file a ticket with support and include the error text.
