
Run a Simulation

TL;DR
Open any registered object, click Run → Simulation, fill in the required fields in the job form, and submit. Simulations execute the object's logic on data so you can validate performance and produce evidence to attach to an approval request.

What is it?

A simulation executes the logic of a registered object (a data element, feature, model, or policy) on data. It is the primary way to test your work, validate performance, and generate the evidence required for approval.

Simulations can be run at any point. You do not need to wait for approval to test your work.

Launching a Simulation

  1. Navigate to the object you want to simulate in Data Vault, Feature Engineering, Model Studio, or a Policy module, and open its registry page.
  2. Click Run in the top right corner.
  3. Select Simulation from the dropdown.

    Figure: Run dropdown on a registry page with Simulation selected.

This opens the simulation job form, where you configure how the object will be executed.

Completing the Job Form

The form has several sections. Only required fields need to be completed; optional sections customize the run. Sections are independent, so you can fill them in any order.

Figure: Full simulation job form, a single scrollable page with collapsible sections.

Description (Optional)

Document the purpose of the simulation and customize the execution environment.

  • Add a short description of what you are testing.
  • Use Job Configs to override Spark and environment settings for this run (for example, executor memory, YARN queue, or a specific library version such as scikit-learn=0.24.2). Leave blank to use the default environment.
Figure: Description section with the notes field and Job Configs editor.
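The override behaviour can be pictured as a merge: whatever you enter in Job Configs replaces the matching default, and anything you leave out keeps its default value. The following is a minimal sketch under that assumption. `spark.executor.memory` and `spark.yarn.queue` are standard Spark property names, but the default values, the `library:scikit-learn` key, and the `merge_job_configs` helper are hypothetical and do not describe the editor's actual format.

```python
from typing import Dict

# Hypothetical defaults standing in for the platform's default environment.
DEFAULT_CONFIGS: Dict[str, str] = {
    "spark.executor.memory": "4g",  # standard Spark property name
    "spark.yarn.queue": "default",  # standard Spark-on-YARN property name
}


def merge_job_configs(defaults: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Return the effective settings for one run: overrides win, defaults fill the rest."""
    return {**defaults, **overrides}


# Per-run overrides. The "library:scikit-learn" key is a hypothetical way of
# expressing a library pin, not the editor's real syntax.
run_overrides = {
    "spark.executor.memory": "8g",
    "library:scikit-learn": "0.24.2",
}

effective = merge_job_configs(DEFAULT_CONFIGS, run_overrides)
print(effective["spark.executor.memory"])  # 8g
print(effective["spark.yarn.queue"])       # default
```

An empty override mapping returns the defaults unchanged, which corresponds to leaving Job Configs blank.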

Scheduling (Optional)

Control when and how often the simulation runs.

  • Leave Start Date and Time empty to run immediately.
  • Enable Recurrence to schedule recurring runs (daily, weekly, monthly, and so on). Each scheduled run creates a new simulation job; previous runs are preserved on the Jobs tab, so you can compare results across periods or track a metric over time.
Figure: Scheduling section with Start Date, Time, and Recurrence controls.
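A recurring schedule can be thought of as a generator of start times, each of which launches its own job. The following is a minimal sketch that assumes monthly runs keep the start day and fall back to the last day of shorter months. The platform's actual rules for time zones and month ends are not documented here, and `next_run_times` is a hypothetical helper.

```python
import calendar
from datetime import datetime, timedelta
from typing import List, Optional


def add_months(dt: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the target month's length."""
    month_index = dt.month - 1 + months
    year = dt.year + month_index // 12
    month = month_index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def next_run_times(start: Optional[datetime], recurrence: Optional[str],
                   count: int, now: datetime) -> List[datetime]:
    """Return the start time of each job the schedule would create.

    An empty start means "run immediately" (now). Without a recurrence there
    is a single run and `count` is ignored.
    """
    first = start or now
    if recurrence is None:
        return [first]
    runs = []
    for i in range(count):
        if recurrence == "daily":
            runs.append(first + timedelta(days=i))
        elif recurrence == "weekly":
            runs.append(first + timedelta(weeks=i))
        elif recurrence == "monthly":
            runs.append(add_months(first, i))  # offset from the first run, not chained
        else:
            raise ValueError(f"unsupported recurrence: {recurrence}")
    return runs


now = datetime(2024, 1, 15, 9, 0)
immediate = next_run_times(None, None, 3, now)  # one run, starting now
monthly = next_run_times(datetime(2024, 1, 31, 6, 0), "monthly", 3, now)
print([d.date().isoformat() for d in monthly])  # ['2024-01-31', '2024-02-29', '2024-03-31']
```

Each monthly date is computed from the first run rather than from the previous run. This prevents a February clamp from carrying over into March.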

Reports (Optional)

Select the reports to generate as part of the simulation, and provide any parameters they require.

  • Choose one or more Reports to run. Each selected report renders as its own tab in the results.
  • Under Parameters, expand each report group to view and set its inputs (for example, weight for a weighted-average report, or threshold for a flagging report).
  • For scheduled jobs, if the selected report tracks a metric over time, set upper and lower thresholds to flag values that breach those limits in the Tracking Report.
Figure: Reports section with a report selector and expandable Parameters groups.
Figure: Metric thresholds in the Reports section for scheduled tracking reports.
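The threshold check behind the Tracking Report comes down to comparing each run's metric against the bounds you set. The following is a minimal sketch that assumes inclusive bounds and allows either bound to be left unset. The metric values, the `flag_breaches` helper, and its output shape are hypothetical.

```python
from typing import List, Optional, Tuple


def flag_breaches(values: List[Tuple[str, float]],
                  lower: Optional[float] = None,
                  upper: Optional[float] = None) -> List[Tuple[str, float, str]]:
    """Return (run_date, value, reason) for every value outside [lower, upper]."""
    flagged = []
    for run_date, value in values:
        if lower is not None and value < lower:
            flagged.append((run_date, value, "below lower threshold"))
        elif upper is not None and value > upper:
            flagged.append((run_date, value, "above upper threshold"))
    return flagged


# Hypothetical approval rates from four weekly scheduled runs.
approval_rate = [("2024-03-04", 0.41), ("2024-03-11", 0.44),
                 ("2024-03-18", 0.52), ("2024-03-25", 0.37)]

for run_date, value, reason in flag_breaches(approval_rate, lower=0.38, upper=0.50):
    print(run_date, value, reason)
```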

Dependencies

If the object uses global variables, runtime parameters, or product configuration values, they appear here grouped under Current Object and Dependents (inputs required by dependent variables).

Provide values for each required input. The simulation cannot run until all required dependencies are filled.
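The rule that every required dependency must be filled before the run can be sketched as a pre-submit check that reports missing inputs by group. The group names mirror the form. The input names (`min_fico`, `as_of_date`, `product_code`) and the `missing_dependencies` helper are hypothetical.

```python
from typing import Dict, List


def missing_dependencies(required: Dict[str, List[str]],
                         provided: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each group to the required inputs that have no non-empty value."""
    missing = {}
    for group, names in required.items():
        gaps = [name for name in names if not provided.get(name, "").strip()]
        if gaps:
            missing[group] = gaps
    return missing


required = {
    "Current Object": ["min_fico", "as_of_date"],
    "Dependents": ["product_code"],
}
provided = {"min_fico": "680", "as_of_date": ""}

gaps = missing_dependencies(required, provided)
print(gaps)  # {'Current Object': ['as_of_date'], 'Dependents': ['product_code']}
can_run = not gaps  # the run is allowed only when nothing is missing
```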

Sampling

Sampling controls which records the simulation runs on. Use the toggle on the right of the section header to enable or disable sampling. There are two modes:

  • Custom: Define a population by size, date range, and rules from your data tables.
  • Prespecified Ids: Provide an explicit list of record IDs as a file or table; the simulation runs on exactly those records.


Custom

| Field | Description |
| --- | --- |
| Sample | Select Sample Size (fixed row count), Sample Ratio (percentage of the table), or Full (entire population). For Sample Size or Sample Ratio, enter the value. |
| Date (Optional) | Expand Show optional filter properties and choose a date column to filter the population by a time window, then enter the From and To values for the window. |
| Additional rules (Optional) | Click Add additional rules to add expression filters (for example, FICO >= 680). |
Figure: Sampling section in Custom mode with sample size, optional date filter, and additional rules.
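Custom mode combines three steps: an optional date window, optional expression rules, and a size rule. The following is a minimal sketch on in-memory records. It assumes that filters are applied before sampling, that the date window is inclusive at both ends, and that sampling is uniform without replacement. The platform's actual order of operations and sampling method are not stated here, and the field names are hypothetical.

```python
import random
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


def custom_sample(records: List[Record],
                  mode: str,
                  value: Optional[float] = None,
                  date_column: Optional[str] = None,
                  window: Optional[Tuple[date, date]] = None,
                  rules: Optional[List[Callable[[Record], bool]]] = None,
                  seed: int = 0) -> List[Record]:
    """Filter by date window and rules, then take a size, ratio, or full sample."""
    population = records
    if date_column and window:
        start, end = window
        population = [r for r in population if start <= r[date_column] <= end]
    for rule in rules or []:
        population = [r for r in population if rule(r)]

    if mode == "full":
        return population
    if mode == "size":  # fixed row count, capped at the population size
        k = min(int(value), len(population))
    elif mode == "ratio":  # ratio given as a fraction (0.5 = 50%)
        k = round(len(population) * value)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return random.Random(seed).sample(population, k)


# Hypothetical applicant records.
applicants = [
    {"id": i, "fico": 640 + 10 * i, "app_date": date(2024, 1 + i % 6, 1)}
    for i in range(12)
]
sample = custom_sample(
    applicants,
    mode="ratio", value=0.5,
    date_column="app_date", window=(date(2024, 1, 1), date(2024, 4, 30)),
    rules=[lambda r: r["fico"] >= 680],  # "FICO >= 680" as a Python predicate
)
print(sorted(r["id"] for r in sample))
```

Passing a fixed `seed` keeps the draw reproducible. This is useful when you want to compare two simulations on the same population.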

Prespecified Ids

| Field | Description |
| --- | --- |
| Table Location | A path on any supported source (Hive, S3, GCP, HDFS), or a CSV/Excel file uploaded from your local drive. |
| ID Column Name | The column containing the IDs. Defaults to the entity column of the object; override it if your file uses a different name. |
Figure: Sampling section in Prespecified Ids mode with Table Location and ID Column Name fields.
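Prespecified Ids mode is an exact lookup, not a random draw. The following is a minimal sketch using the standard `csv` module. It assumes the uploaded file has a header row and that IDs are compared as strings. `ENTITY_COLUMN`, `read_ids`, `select_by_ids`, and the column names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

ENTITY_COLUMN = "applicant_id"  # hypothetical entity column of the object


def read_ids(csv_text: str, id_column: Optional[str] = None) -> List[str]:
    """Read IDs from a CSV, defaulting to the object's entity column."""
    column = id_column or ENTITY_COLUMN
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader]


def select_by_ids(records: List[Dict[str, str]], ids: List[str]) -> List[Dict[str, str]]:
    """Keep exactly the records whose entity ID appears in the list."""
    wanted = set(ids)
    return [r for r in records if r[ENTITY_COLUMN] in wanted]


uploaded = "cust_id,note\nA102,\nA105,recheck\n"
ids = read_ids(uploaded, id_column="cust_id")  # file uses a different column name
records = [{"applicant_id": x} for x in ["A101", "A102", "A103", "A105"]]
print([r["applicant_id"] for r in select_by_ids(records, ids)])  # ['A102', 'A105']
```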

Data Sources

By default, the source is set to From data file, which reads from the tables registered in Data Vault. Use the radio button at the top of the section to switch sources.

From data file

Override the registered location for one or more input tables (for example, Applicant Table, Application Table).

  • For each table, select the source type (Hive, S3, GCP, HDFS) and enter the location, or upload a CSV/Excel file. Column names and types must match the registered schema.
  • Click Add Subsetting Criteria next to a table to filter rows before they are read (for example, application_date >= '2024-01-01' or region = 'US').

Tip

Subsetting criteria reduce how much data is read from the data lake and can significantly cut runtime on large tables.

Figure: From data file source with per-table location overrides and Add Subsetting Criteria.
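Subsetting criteria save time because the filter is evaluated at the source, so rows that fail it are never transferred. The following sketch uses an in-memory SQLite table as a stand-in for a data-lake table. On Spark, the comparable mechanism is predicate pushdown. The table, its columns, and the `read_table` helper are hypothetical, and the criterion is passed as a trusted SQL string purely for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple


def read_table(conn: sqlite3.Connection, table: str,
               criteria: Optional[str] = None) -> List[Tuple]:
    """Read a table, optionally pushing a subsetting criterion into the query.

    `table` and `criteria` are trusted strings here; never build SQL from
    untrusted input this way.
    """
    query = f"SELECT * FROM {table}"
    if criteria:
        query += f" WHERE {criteria}"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE application (id INTEGER, application_date TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO application VALUES (?, ?, ?)",
    [(1, "2023-11-02", "US"), (2, "2024-02-14", "US"), (3, "2024-03-01", "CA")],
)

all_rows = read_table(conn, "application")
subset = read_table(conn, "application", "application_date >= '2024-01-01'")
print(len(all_rows), len(subset))  # 3 2
```

ISO-formatted date strings compare correctly as text, which is why the date criterion works on a `TEXT` column in this stand-in.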

From existing job

Reuse the output of a previous job as the input for this one.

Note

From existing job requires sampling to be disabled. The output of a previous job is already a fixed population, so layering sampling on top would be ambiguous.

  • Select Object Type, Object, and then the specific Job to reuse.
  • The reused job is not limited to the same object type, as long as its output contains the inputs this object needs (for example, a Feature job can feed a Data Element simulation).
Figure: From existing job source with Object Type, Object, and Job selectors.
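Whether a previous job can feed this simulation depends on whether its output covers every input the object reads, regardless of the job's object type. The following is a minimal sketch of that compatibility check. It assumes a simple column-name match. The column names and the `missing_inputs` helper are hypothetical, and the platform may also check types or versions.

```python
from typing import Iterable, Set


def missing_inputs(job_output_columns: Iterable[str], required_inputs: Iterable[str]) -> Set[str]:
    """Return the required inputs absent from the job output (empty set = reusable)."""
    return set(required_inputs) - set(job_output_columns)


# Hypothetical output of a Feature job and inputs of a Data Element simulation.
feature_job_output = {"applicant_id", "utilization_ratio", "months_on_book"}
data_element_inputs = {"applicant_id", "utilization_ratio"}

gaps = missing_inputs(feature_job_output, data_element_inputs)
print("reusable" if not gaps else f"missing: {sorted(gaps)}")  # reusable
```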

Submitting and Monitoring the Job

Click Run Job at the bottom of the form. You are redirected to the Jobs tab, where the job moves through these states:

| State | Meaning |
| --- | --- |
| QUEUED | Waiting for resources. |
| COMPILING | The job is being compiled. |
| RUNNING | Execution is in progress. |
| COMPLETED | The job finished successfully. |
| PARTIALLY COMPLETED | Some outputs succeeded and others failed. Open the job to see which. |
| FAILED | Execution failed. |

Execution time depends on data size and system load. Use the Job Board icon to track queue position and progress. If the job fails, download the log file from the job details page to identify the issue.
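If you track jobs from a script instead of the Jobs tab, the states above map onto a simple polling loop that stops at the first terminal state. The following is a minimal sketch that assumes COMPLETED, PARTIALLY COMPLETED, and FAILED are the only terminal states. `get_state` stands in for whatever status lookup you have and is hypothetical, since no programmatic API is described here.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "PARTIALLY COMPLETED", "FAILED"}
KNOWN_STATES = {"QUEUED", "COMPILING", "RUNNING"} | TERMINAL_STATES


def wait_for_job(get_state: Callable[[], str], interval_s: float = 30.0,
                 max_polls: int = 240) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = get_state()
        if state not in KNOWN_STATES:
            raise ValueError(f"unexpected state: {state}")
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")


# Simulated status sequence in place of a real lookup.
states = iter(["QUEUED", "COMPILING", "RUNNING", "RUNNING", "PARTIALLY COMPLETED"])
final = wait_for_job(lambda: next(states), interval_s=0)
if final != "COMPLETED":
    print(f"Job ended as {final}; open the job and download the log file.")
```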

Reading the Results

Once the simulation completes, open it from the Jobs tab. Use the outputs to validate correctness, review performance, and identify issues before moving to approval.

| Tab | Description |
| --- | --- |
| Job Details | Summary of how the job was configured (sample, filters, data source, scheduling, object version). Useful for reproducing the run later. |
| Job Result | Row-level scored output, downloadable as CSV or Parquet, plus a runtime breakdown and execution logs. |
| Report Tabs | One tab per report selected in the Reports section. For a Model, these can include ROC/AUC, KS, lift, a confusion matrix, or any custom report you selected. For a Policy, they can include segment-level approval rates, risk distributions, or other custom reports you added. |
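To cross-check a model's KS report tab, you can recompute the statistic from the downloaded Job Result. The following is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic between the score distributions of positive and negative labels. The column names (`score`, `default_flag`) and the label convention are assumptions. The platform's report may bin scores or use a different convention, so small differences are possible.

```python
import csv
import io
from typing import List


def ks_statistic(scores: List[float], labels: List[int]) -> float:
    """Largest gap between the score CDFs of label-1 and label-0 records."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all records tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


# Hypothetical excerpt of a downloaded Job Result CSV.
job_result = "applicant_id,score,default_flag\nA1,0.10,0\nA2,0.40,0\nA3,0.35,1\nA4,0.80,1\n"
rows = list(csv.DictReader(io.StringIO(job_result)))
ks = ks_statistic([float(r["score"]) for r in rows], [int(r["default_flag"]) for r in rows])
print(ks)  # 0.5
```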

Comparing simulations

Compare two simulations side by side to analyze differences in results, metrics, and reports (for example, a candidate model against the production champion, or the same model across two date ranges).

Open the comparison view

  1. Open any completed simulation.
  2. Click Compare with other Simulation.
  3. Choose the job to compare. By default, the picker lists completed jobs from the same object and same version.
  4. Use the small switch arrow next to the picker to choose a job from a different version, or a different object of the same type.

Working in the comparison view

The original job stays on the left and the chosen job appears on the right. Every tab (Job Details, Job Result, and each Report tab) renders side by side, and the two panes scroll together so equivalent sections stay aligned.

From the comparison header you can:

  • Change the compared job using the dropdown on its side.
  • Swap left and right.
  • Exit the comparison from the kebab menu (⋮) to return to the single-job view.
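To capture the differences from the side-by-side view as numbers, for example to include in an approval request, a per-metric delta is usually enough. The following is a minimal sketch that assumes both jobs report metrics as name/value pairs. The metric names and values are hypothetical and do not come from a real comparison.

```python
from typing import Dict, Optional, Tuple

Row = Tuple[Optional[float], Optional[float], Optional[float]]


def compare_metrics(left: Dict[str, float], right: Dict[str, float]) -> Dict[str, Row]:
    """Return (left, right, right - left) per metric; None where a side lacks it."""
    table = {}
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        delta = b - a if a is not None and b is not None else None
        table[name] = (a, b, delta)
    return table


champion = {"auc": 0.74, "ks": 0.36}
challenger = {"auc": 0.77, "ks": 0.39, "gini": 0.54}

for name, (a, b, delta) in compare_metrics(champion, challenger).items():
    shown = f"{delta:+.3f}" if delta is not None else "n/a"
    print(f"{name:>5}: {a} -> {b} ({shown})")
```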

What's next