Run a Simulation
What is it?
A simulation executes the logic of a registered object (a data element, feature, model, or policy) on data. It is the primary way to test your work, validate performance, and generate the evidence required for approval.
Simulations can be run at any point. You do not need to wait for approval to test your work.
Launching a Simulation
- Navigate to the object you want to simulate in Data Vault, Feature Engineering, Model Studio, or a Policy module, and open its registry page.
- Click Run in the top right corner.
- Select Simulation from the dropdown.
Open any registered object and choose Run → Simulation to launch the job form.
This opens the simulation job form, where you configure how the object will be executed.
Completing the Job Form
The form has several sections. Only required fields need to be completed; optional sections customize the run. Sections are independent, so you can fill them in any order.
Description (Optional)
Document the purpose of the simulation and customize the execution environment.
- Add a short description of what you are testing.
- Use Job Configs to override Spark settings for this run (for example, executor memory, yarn queue, or a specific library version such as scikit-learn=0.24.2). Leave blank to use the default environment.
Scheduling (Optional)
Control when and how often the simulation runs.
- Leave Start Date and Time empty to run immediately.
- Enable Recurrence to schedule recurring runs (daily, weekly, monthly, and so on). Each scheduled run creates a new simulation job; previous runs are preserved on the Jobs tab, so you can compare results across periods or track a metric over time.
Reports (Optional)
Select the reports to generate as part of the simulation, and provide any parameters they require.
- Choose one or more Reports to run. Each selected report renders as its own tab in the results.
- Under Parameters, expand each report group to view its inputs (for example, weight for a weighted-average report, or threshold for a flagging report).
- For scheduled jobs, if the selected report tracks a metric over time, set upper and lower thresholds to flag values that breach those limits in the Tracking Report.
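The threshold check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's implementation: the `Threshold` class, the `flag_breaches` function, and the metric values are all hypothetical, and the sketch assumes a value is flagged only when it falls strictly outside the limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical container for the lower and upper limits of a tracked metric."""
    lower: Optional[float] = None
    upper: Optional[float] = None

    def breached(self, value: float) -> bool:
        # Assumption: a value equal to a limit is not a breach.
        if self.lower is not None and value < self.lower:
            return True
        if self.upper is not None and value > self.upper:
            return True
        return False


def flag_breaches(series, threshold):
    """Return the (period, value) pairs that fall outside the threshold."""
    return [(period, value) for period, value in series if threshold.breached(value)]


# Made-up metric values, one per scheduled run, for illustration only.
auc_by_run = [("2024-01", 0.81), ("2024-02", 0.79), ("2024-03", 0.72), ("2024-04", 0.93)]
flags = flag_breaches(auc_by_run, Threshold(lower=0.75, upper=0.90))
print(flags)
```

Leaving either limit as `None` in this sketch turns the check into a one-sided threshold.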
Dependencies
If the object uses global variables, runtime parameters, or product configuration values, they appear here grouped under Current Object and Dependents (inputs required by dependent variables).
Provide values for each required input. The simulation cannot run until all required dependencies are filled.
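The rule that every required dependency must be filled can be pictured as a simple pre-submission check. The sketch below is an assumption-laden illustration: `missing_inputs` and the input names are hypothetical, and the form itself may validate differently.

```python
def missing_inputs(required, provided):
    """Return the names of required inputs that have no usable value.

    Assumption: a missing key, None, or an empty string counts as unfilled.
    """
    return [name for name in required if provided.get(name) in (None, "")]


# Hypothetical dependency names, for illustration only.
required = ["as_of_date", "max_dti", "product_code"]
provided = {"as_of_date": "2024-06-30", "max_dti": 0.43, "product_code": ""}

unfilled = missing_inputs(required, provided)
can_run = not unfilled
print(unfilled, can_run)
```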
Sampling
Sampling controls which records the simulation runs on. Use the toggle on the right of the section header to enable or disable sampling. There are two modes:
- Custom: Define a population by size, date range, and rules from your data tables.
- Prespecified Ids: Provide an explicit list of record IDs as a file or table; the simulation runs on exactly those records.
Custom
| Field | Description |
|---|---|
| Sample | Select Sample Size (fixed row count), Sample Ratio (percentage of the table), or Full (entire population), then enter the value. |
| Date | (Optional) Expand Show optional filter properties and choose a date column to filter the population by a time window. When a date column is selected, enter the From and To values for the window. |
| Additional rules | (Optional) Click Add additional rules to add expression filters (for example, FICO >= 680). |
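
To make the three Custom fields concrete, here is a minimal Python sketch of how a sample could be drawn from a list of records. It rests on several assumptions that the text above does not confirm: filters are applied before sampling, the date window is inclusive, the ratio is expressed as a fraction between 0 and 1, and sampling is random without replacement. The function name `custom_sample` and the example rows are hypothetical.

```python
import random
from datetime import date


def custom_sample(rows, mode, value=None, date_col=None,
                  date_from=None, date_to=None, rules=(), seed=0):
    """Filter rows by date window and rules, then draw the sample."""
    population = list(rows)
    if date_col is not None:
        # Assumption: both ends of the window are inclusive.
        population = [r for r in population
                      if date_from <= r[date_col] <= date_to]
    for rule in rules:
        population = [r for r in population if rule(r)]

    if mode == "full":
        return population
    if mode == "size":
        k = min(int(value), len(population))
    elif mode == "ratio":
        k = round(len(population) * value)
    else:
        raise ValueError(f"unknown sample mode: {mode}")
    # A fixed seed keeps the illustration reproducible.
    return random.Random(seed).sample(population, k)


# Made-up records for illustration only.
rows = [{"id": i, "FICO": 600 + 10 * i, "application_date": date(2024, 1, 1 + i)}
        for i in range(20)]

sample = custom_sample(
    rows, mode="ratio", value=0.5,
    date_col="application_date",
    date_from=date(2024, 1, 5), date_to=date(2024, 1, 20),
    rules=[lambda r: r["FICO"] >= 680],
)
print(len(sample))
```

In this example 12 rows pass both filters, so a ratio of 0.5 yields 6 rows.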
Prespecified Ids
| Field | Description |
|---|---|
| Table Location | A path on any supported source (Hive, S3, GCP, HDFS), or a CSV/Excel file uploaded from your local drive. |
| ID Column Name | The column containing the IDs. Defaults to the entity column of the object; override if your file uses a different name. |
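
As a rough picture of the Prespecified Ids mode, the sketch below reads an ID column from CSV text and keeps exactly the matching records. The helper names, the `applicant_id` column, and the sample data are hypothetical, and IDs are compared as strings, which is an assumption.

```python
import csv
import io


def load_ids(csv_text, id_column):
    """Read the set of IDs from the named column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if id_column not in (reader.fieldnames or []):
        raise KeyError(f"ID column not found: {id_column}")
    return {row[id_column] for row in reader}


def select_records(records, ids, entity_column):
    """Keep only the records whose entity value is in the ID set."""
    return [r for r in records if str(r[entity_column]) in ids]


# Made-up file content and records for illustration only.
csv_text = "applicant_id\nA2\nA4\n"
records = [{"applicant_id": f"A{i}", "score": 500 + i} for i in range(1, 6)]

ids = load_ids(csv_text, id_column="applicant_id")
selected = select_records(records, ids, entity_column="applicant_id")
print([r["applicant_id"] for r in selected])
```

Passing a different `id_column` corresponds to overriding the default when the file uses another column name.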
Data Sources
By default, the source is set to From data file, which reads from the tables registered in Data Vault. Use the radio button at the top of the section to switch sources.
From data file
Override the registered location for one or more input tables (for example, Applicant Table, Application Table).
- For each table, select the source type (Hive, S3, GCP, HDFS) and enter the location, or upload a CSV/Excel file. Column names and types must match the registered schema.
- Click Add Subsetting Criteria next to a table to filter rows before they are read (for example, application_date >= '2024-01-01' or region = 'US').
Tip
Subsetting criteria reduce how much data is read from the data lake and can significantly cut runtime on large tables.
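Because an overridden table must match the registered schema, it can help to compare column names and types before submitting the job. The following is a minimal sketch of such a check, assuming schemas are available as simple name-to-type mappings; `schema_mismatches` and the example columns are hypothetical, and the sketch ignores extra columns because the text does not say whether they are allowed.

```python
def schema_mismatches(registered, supplied):
    """List columns that are missing or typed differently in the supplied table."""
    problems = []
    for column, expected in registered.items():
        if column not in supplied:
            problems.append(f"missing column: {column}")
        elif supplied[column] != expected:
            problems.append(
                f"type mismatch for {column}: expected {expected}, got {supplied[column]}"
            )
    return problems


# Hypothetical schemas for illustration only.
registered = {"applicant_id": "string", "application_date": "date", "region": "string"}
supplied = {"applicant_id": "string", "application_date": "string"}

problems = schema_mismatches(registered, supplied)
for line in problems:
    print(line)
```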
From existing job
Reuse the output of a previous job as the input for this one.
Note
From existing job requires sampling to be disabled. The output of a previous job is already a fixed population, so layering sampling on top would be ambiguous.
- Select Object Type, Object, and then the specific Job to reuse.
- The reused job is not limited to the same object type, as long as its output contains the inputs this object needs (for example, a Feature job can feed a Data Element simulation).
Submitting and Monitoring the Job
Click Run Job at the bottom of the form. You are redirected to the Jobs tab, where the job moves through these states:
| State | Meaning |
|---|---|
| QUEUED | Waiting for resources. |
| COMPILING | Job is being compiled. |
| RUNNING | Execution is in progress. |
| COMPLETED | Job finished successfully. |
| PARTIALLY COMPLETED | Some outputs succeeded; others failed. Open the job to see which. |
| FAILED | Execution failed. |
Execution time depends on data size and system load. Use the Job Board icon to track queue position and progress. If the job fails, download the log file from the job details page to identify the issue.
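The state table above can be read as a small state model in which COMPLETED, PARTIALLY COMPLETED, and FAILED are the end states. The sketch below encodes that reading in Python. This page describes monitoring through the Jobs tab only, so the `get_state` callable is purely hypothetical and stands in for whatever way you obtain the current state; treating exactly those three states as terminal is also an assumption.

```python
import time
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    COMPILING = "COMPILING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    PARTIALLY_COMPLETED = "PARTIALLY COMPLETED"
    FAILED = "FAILED"


# Assumption: a job in one of these states will not change state again.
TERMINAL_STATES = {JobState.COMPLETED, JobState.PARTIALLY_COMPLETED, JobState.FAILED}


def wait_for_job(get_state, poll_seconds=30.0, sleep=time.sleep):
    """Poll a hypothetical state source until the job reaches a terminal state."""
    while True:
        state = JobState(get_state())
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


# Simulated sequence of states; no real job is queried here.
observed = iter(["QUEUED", "COMPILING", "RUNNING", "RUNNING", "COMPLETED"])
final_state = wait_for_job(lambda: next(observed), sleep=lambda seconds: None)
print(final_state.value)
```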
Reading the Results
Once the simulation completes, open it from the Jobs tab. Use the outputs to validate correctness, review performance, and identify issues before moving to approval.
| Tab | Description |
|---|---|
| Job Details | Summary of how the job was configured (sample, filters, data source, scheduling, object version). Useful for reproducing the run later. |
| Job Result | Row-level scored output, downloadable as CSV or Parquet, plus runtime breakdown and execution logs. |
| Report Tabs | One tab per report selected in the Reports step. For a Model, this is where reports such as ROC/AUC, KS, lift, confusion matrix, or any custom report you selected appear. For a Policy, it is where segment-level approval rates, risk distributions, or other custom reports you added appear. |
Comparing simulations
Compare two simulations side by side to analyze differences in results, metrics, and reports (for example, a candidate model against the production champion, or the same model across two date ranges).
Open the comparison view
- Open any completed simulation.
- Click Compare with other Simulation.
- Choose the job to compare. By default, the picker lists completed jobs from the same object and same version.
- Use the small switch arrow next to the picker to choose a job from a different version, or a different object of the same type.
Working in the comparison view
The original job stays on the left and the chosen job appears on the right. Every tab (Job Details, Job Result, and each Report tab) renders side by side, and the two panes scroll together so equivalent sections stay aligned.
From the comparison header you can:
- Change the compared job using the dropdown on its side.
- Swap left and right.
- Exit the comparison from the kebab menu (⋮) to return to the single-job view.
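Outside the comparison view, the same kind of check can be done on metrics taken from two sets of downloaded results. The sketch below is a hypothetical illustration, not a platform feature: `metric_deltas` and the metric values are made up, and only metrics present in both runs are compared.

```python
def metric_deltas(left, right):
    """Return right minus left for every metric that both runs report."""
    shared = sorted(left.keys() & right.keys())
    return {name: round(right[name] - left[name], 6) for name in shared}


# Made-up metrics for illustration only.
champion = {"auc": 0.80, "ks": 0.40}
candidate = {"auc": 0.83, "ks": 0.38, "gini": 0.66}

deltas = metric_deltas(champion, candidate)
print(deltas)
```

A positive delta means the right-hand run scored higher on that metric; whether higher is better depends on the metric.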
What's next
- Attach the completed simulation as Supporting Analysis when you Send an Approval Request.
- For an in-production model or policy, Monitor your Model or Policy using scheduled simulations and metric thresholds.