How to Build a Custom Report
What is it?
A Report is a piece of reusable analytics that runs on a job's output. For example, a decile report that runs on every model simulation, or a fair-lending report that runs on every policy simulation. You define the calculation once and the platform runs it on every relevant job.
Each report has two layers:
- Report Definition (the Formula): Python that computes the statistics and returns a Pandas DataFrame. One definition per report.
- Figures: one or more visualizations of that DataFrame. A single report can show the same statistics as a table, a chart, a downloadable Excel, a Markdown summary, or a tracked metric over time.
You select the reports you want when you run a simulation, and each one renders as its own tab in the results.
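The two layers can be pictured as plain Python functions. This is a minimal sketch, not platform code: pandas stands in for the job output (on the platform, data is a PySpark DataFrame), and the column names segment and score are invented for illustration.

```python
import pandas as pd


def report_definition(data: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the Report Definition: compute the statistics once."""
    return data.groupby("segment", as_index=False).agg(
        rows=("score", "size"),
        mean_score=("score", "mean"),
    )


def table_figure(raw_output: pd.DataFrame) -> pd.DataFrame:
    """A figure that shows the statistics as a sorted table."""
    return raw_output.sort_values("mean_score", ascending=False).reset_index(drop=True)


def markdown_figure(raw_output: pd.DataFrame) -> str:
    """A second figure that renders the same statistics as Markdown."""
    top = raw_output.loc[raw_output["mean_score"].idxmax()]
    return f"Highest mean score: **{top['segment']}** ({top['mean_score']:.2f})"


# Hypothetical job output; the columns are assumptions made for this sketch.
job_output = pd.DataFrame(
    {"segment": ["A", "A", "B", "B"], "score": [0.2, 0.4, 0.7, 0.9]}
)

raw_output = report_definition(job_output)  # computed once
table = table_figure(raw_output)            # first view of the same statistics
summary = markdown_figure(raw_output)       # second view of the same statistics
```

Both figures read the same raw_output, so the statistics are computed only once regardless of how many views the report has.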
Before you start
- You need write access to Resources → Reports.
- Know which object type the report targets (Data Element, Feature, Model, Policy, Global Function). Each type exposes different columns in the job output, so a report for one type usually won't work for another.
- Run at least one simulation on the intended object type first. The job result is what you'll build the report against, and you can use it as a reference for what columns are available.
- If you plan to use shared helper functions (mean, variance, custom statistics), register them first in Resources → Global Functions.
How to Build a Report
Step 1: Open the Report registry
Go to Resources → Reports and click + Create. Give the report a clear name (for example, "Decile Table" or "Confusion Matrix at Configurable Threshold").
Step 2: Set the Attributes
The Attributes section tells the platform when this report should be available.
| Field | What to enter |
|---|---|
| Object Type (Required) | The type of job this report runs on (Model, Policy, Data Element, Feature, Global Function). Drives which columns are available in data. |
| Object Subtype | (When Object Type = Model) Restrict the report to specific algorithm types (Binary Classification, Regression, and so on). (When Object Type = Feature or Data Element) Restrict to specific feature types. Leave empty for "all". |
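As a mental model only, the availability rule described by these two fields might look like the sketch below. ReportAttributes and report_is_available are hypothetical names invented here; the platform's actual matching logic is not shown in this guide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReportAttributes:
    """Hypothetical container for the two Attributes fields."""
    object_type: str
    object_subtypes: List[str] = field(default_factory=list)  # empty means "all"


def report_is_available(
    attrs: ReportAttributes, job_object_type: str, job_subtype: Optional[str] = None
) -> bool:
    """Return True if a report with these attributes would be offered for the job."""
    if attrs.object_type != job_object_type:
        return False
    if not attrs.object_subtypes:  # left empty: no subtype restriction
        return True
    return job_subtype in attrs.object_subtypes


confusion_matrix = ReportAttributes("Model", ["Binary Classification"])
decile_table = ReportAttributes("Model")  # no subtype restriction
```

Under this reading, confusion_matrix is offered only for binary classification models, while decile_table is offered for every Model job.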
Step 3: Write the Report Definition (Formula)
Open the Formula section. Add any Resources (registered Global Functions you want to call inside the definition) and any Parameters the report should accept at runtime.
| Field | What to enter |
|---|---|
| Resources | (Optional) Multi-select of registered Global Functions. Each becomes callable in the definition by its alias. |
| Parameters | (Optional) Values the user supplies when running the job. Each parameter has a Name, Alias, Type (String, Number, Single Object), Is Mandatory, and a description. Common use: a configurable threshold for a confusion matrix. |
| Report Definition (Required) | Python that returns the calculated statistics as a Pandas DataFrame. See variables and example below. |
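To reason about how declared Parameters reach the definition, the sketch below models a parameter declaration and its resolution. ParameterSpec and resolve_scalar_parameters are hypothetical names for illustration; the platform performs this step itself, and its internals may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ParameterSpec:
    """Hypothetical mirror of the fields entered for one Parameter."""
    name: str
    alias: str
    type: str  # "String", "Number", or "Single Object"
    is_mandatory: bool = False
    description: str = ""


def resolve_scalar_parameters(
    specs: List[ParameterSpec], supplied: Dict[str, Any]
) -> Dict[str, Any]:
    """Map each String/Number alias to its supplied value, or None if optional and unset."""
    resolved: Dict[str, Any] = {}
    for spec in specs:
        if spec.type == "Single Object":
            continue  # these surface as a column in data, not as a scalar
        value = supplied.get(spec.alias)
        if value is None and spec.is_mandatory:
            raise ValueError(f"Mandatory parameter '{spec.alias}' was not supplied")
        resolved[spec.alias] = value
    return resolved


specs = [
    ParameterSpec("Threshold", "threshold", "Number"),
    ParameterSpec("Label", "label", "String", is_mandatory=True),
    ParameterSpec("Segment", "segment", "Single Object"),
]
resolved = resolve_scalar_parameters(specs, {"label": "Q1"})
```

Here the optional threshold resolves to None, which is why a definition should supply its own fallback value.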
Variables you can use in the definition:
- data (PySpark DataFrame): the job's result data, including every input and output column. The exact columns depend on the object type. For a Model job, data includes inputs, the dependent variable, and the model score; for a Data Element job it does not.
- entity: a Corridor Python object for the object the job ran on. For a Model report, entity.algorithm_type returns 'Binary Classification', 'Regression', and so on.
- job: a Corridor Python object for the job itself. job.job_type returns 'Simulation', 'Monitoring', and so on.
- Selected Global Functions: callable by their alias.
- Parameters: accessed by alias. String and Number parameters resolve to a single value (or None if optional and unset). Single Object parameters appear as a column in data with the alias as the column name.
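One use of entity and job is to guard a definition against object types it was not written for. The sketch below assumes stand-ins: SimpleNamespace objects replace the Corridor entity and job objects, pandas replaces the PySpark data, and only the attributes named above (algorithm_type, job_type) are used.

```python
from types import SimpleNamespace

import pandas as pd


def definition(data: pd.DataFrame, entity, job) -> pd.DataFrame:
    """Summarise scores by outcome, but only for binary classifiers."""
    if entity.algorithm_type != "Binary Classification":
        # Still return a DataFrame so every figure receives a valid raw_output.
        return pd.DataFrame(
            {"note": [f"Not applicable to {entity.algorithm_type} models"]}
        )
    summary = data.groupby("dependent", as_index=False).agg(
        rows=("output", "size"),
        mean_score=("output", "mean"),
    )
    summary["job_type"] = job.job_type  # record which kind of job produced this
    return summary


# Stand-in inputs; on the platform these are provided for you.
data = pd.DataFrame({"output": [0.2, 0.8, 0.6, 0.4], "dependent": [0, 1, 1, 0]})
job = SimpleNamespace(job_type="Simulation")

result = definition(data, SimpleNamespace(algorithm_type="Binary Classification"), job)
skipped = definition(data, SimpleNamespace(algorithm_type="Regression"), job)
```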
Example: confusion matrix from a model score and threshold
from pyspark.sql import functions as F
# 'threshold' is a Number parameter declared above; fall back to 0.5 if not provided
threshold = 0.5 if threshold is None else threshold
# the model score is in the 'output' column; the dependent variable is 'dependent'
data = data.withColumn("predicted", F.when(F.col("output") >= threshold, 1).otherwise(0))
crosstab = data.groupBy("predicted", "dependent").count().toPandas()
return crosstab
The returned DataFrame becomes available to every Figure as raw_output.
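To check the shape of raw_output before writing figures, the same logic can be prototyped locally without Spark. This is a sketch under assumptions: pandas replaces the PySpark data, and the sample rows are invented. The output and dependent column names follow the example above.

```python
import pandas as pd


def confusion_counts(data: pd.DataFrame, threshold=None) -> pd.DataFrame:
    """Pandas stand-in for the definition: one row per (predicted, dependent) pair."""
    threshold = 0.5 if threshold is None else threshold
    scored = data.assign(predicted=(data["output"] >= threshold).astype(int))
    return (
        scored.groupby(["predicted", "dependent"])
        .size()
        .reset_index(name="count")
    )


# Invented sample rows for local testing.
sample = pd.DataFrame(
    {
        "output": [0.1, 0.4, 0.6, 0.9, 0.7, 0.2],
        "dependent": [0, 1, 1, 1, 0, 0],
    }
)
raw_output = confusion_counts(sample)
```

The result is in long format, with columns predicted, dependent, and count. Only combinations that occur in the data produce a row, so figures should not assume all four cells are present.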
Note
The definition must return a Pandas DataFrame, not PySpark. A report is meant to produce a summary, not pass through a whole dataset.
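A decile table illustrates the idea of a summary: the output has one row per decile regardless of how many rows the job scored. The sketch below uses pandas throughout as a stand-in; on the platform the aggregation would run on the PySpark data before converting to Pandas. The function name and sample data are assumptions.

```python
import pandas as pd


def decile_table(scored: pd.DataFrame, score_col="output", target_col="dependent", bins=10):
    """Bucket rows into equal-sized score deciles and summarise each bucket."""
    # Ranking first breaks ties, so every decile gets the same number of rows.
    ranks = scored[score_col].rank(method="first")
    ranked = scored.assign(decile=pd.qcut(ranks, bins, labels=False) + 1)
    return ranked.groupby("decile", as_index=False).agg(
        rows=(target_col, "size"),
        events=(target_col, "sum"),
        min_score=(score_col, "min"),
        max_score=(score_col, "max"),
    )


# Invented data: 20 scored rows, with events concentrated in the higher scores.
scored = pd.DataFrame(
    {
        "output": [i / 20 for i in range(20)],
        "dependent": [1 if i >= 10 else 0 for i in range(20)],
    }
)
deciles = decile_table(scored)
```

Twenty input rows collapse to ten summary rows here, and the same ten rows would come back for twenty million.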
Step 4: Add Figures
Open the Figures section. Each figure is a separate visualization. Click Add Figure to add one; drag the grip handle to reorder.
For each figure, give it a name (this becomes the tab title within the report), pick its Output Type, and write its logic.
| Output Type | What it shows |
|---|---|
| Figure | A visualization. Can return a Pandas DataFrame (renders as a table), a Plotly figure (renders interactively), a formatted Markdown string, a Pandas ExcelWriter (downloadable Excel), or a ReportLab Canvas (PDF). |
| Metric | A single numeric value tracked over time. Returns a number. On recurring simulation jobs, the platform builds a tracking report showing the metric's trend, with optional upper and lower thresholds for alerting. |
Variables you can use in each figure:
- raw_output (Pandas DataFrame): whatever the Report Definition returned. Every figure sees the same raw_output.
- entity and job: same Corridor objects as in the definition.
- Selected Global Functions: each figure selects its own; selecting a function in the definition does not carry over.
- Parameters: accessible by alias. (Note: Single Object parameters are not accessible in figures, because they live on data and figures don't see data.)
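A Markdown figure can combine raw_output with a scalar parameter. The sketch below wraps the logic in a function so it runs standalone; in a figure the body would end with a return of the string. The raw_output rows are invented, and threshold is assumed to be a Number parameter.

```python
import pandas as pd


def markdown_summary(raw_output: pd.DataFrame, threshold=None) -> str:
    """Build a short Markdown summary from confusion-matrix counts."""
    threshold = 0.5 if threshold is None else threshold
    total = int(raw_output["count"].sum())
    # Correct predictions are the cells where predicted equals the actual label.
    correct_mask = raw_output["predicted"] == raw_output["dependent"]
    correct = int(raw_output.loc[correct_mask, "count"].sum())
    accuracy = correct / total if total else 0.0
    return "\n".join(
        [
            "### Confusion matrix summary",
            f"- Threshold: {threshold}",
            f"- Observations: {total}",
            f"- Accuracy: {accuracy:.1%}",
        ]
    )


# Invented counts in the long format returned by the definition.
raw_output = pd.DataFrame(
    {"predicted": [0, 0, 1, 1], "dependent": [0, 1, 0, 1], "count": [2, 1, 1, 2]}
)
text = markdown_summary(raw_output)
```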
Example: render the confusion matrix as a Plotly table
import plotly.graph_objects as go
import pandas as pd
# Pivot the long-format counts into a dependent x predicted grid
conf_matrix = pd.crosstab(
    raw_output["dependent"],
    raw_output["predicted"],
    values=raw_output["count"],
    aggfunc="sum",
).fillna(0).reset_index()
fig = go.Figure(data=[go.Table(
    header=dict(values=list(conf_matrix.columns)),
    cells=dict(values=[conf_matrix[c].tolist() for c in conf_matrix.columns]),
)])
return fig
Example: track FPR over time as a Metric
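# Actual negatives: every row where the dependent variable is 0
negatives = raw_output[raw_output["dependent"] == 0]
# False positives; .sum() yields 0 when that cell is absent instead of raising
fp = negatives.loc[negatives["predicted"] == 1, "count"].sum()
# FPR = FP / (FP + TN), that is, false positives over all actual negatives
total_negatives = negatives["count"].sum()
return fp / total_negatives if total_negatives else 0.0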
On recurring simulations, this metric appears under the Tracking tab with the value plotted across iterations.
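Because the counts only contain rows for combinations that actually occur, a Metric that looks up a single cell can come back empty on some iterations. A small helper keeps such metrics robust. This is a sketch: cell and rate are hypothetical helper names, the sample counts are invented, and returning 0.0 for an empty denominator is a choice made here rather than platform behavior.

```python
import pandas as pd


def cell(raw_output: pd.DataFrame, predicted: int, dependent: int) -> int:
    """Count for one confusion-matrix cell; 0 if the combination never occurred."""
    mask = (raw_output["predicted"] == predicted) & (raw_output["dependent"] == dependent)
    return int(raw_output.loc[mask, "count"].sum())


def rate(numerator: int, denominator: int) -> float:
    """Safe division for metrics; an empty denominator yields 0.0."""
    return numerator / denominator if denominator else 0.0


# Invented counts with no false positives, so the (1, 0) row is missing.
raw_output = pd.DataFrame(
    {"predicted": [0, 0, 1], "dependent": [0, 1, 1], "count": [5, 1, 4]}
)

tp, fn = cell(raw_output, 1, 1), cell(raw_output, 0, 1)
fp, tn = cell(raw_output, 1, 0), cell(raw_output, 0, 0)

fpr = rate(fp, fp + tn)  # false positive rate
tpr = rate(tp, tp + fn)  # true positive rate (recall)
```

Each Metric figure would return one of these values, so FPR and TPR can be tracked as separate metrics from the same raw_output.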
Step 5: Enable Custom Comparison (optional)
If a figure is meant to compare two jobs (for example, PSI between a base and a comparison sample, or a policy swapset), open the figure's options and turn on Enable custom comparison. The figure then exposes:
- base_raw_output, compared_raw_output
- base_entity, compared_entity
- base_job, compared_job
- Parameters by alias (from the base job)
Constraints: custom comparison is only available for Figure outputs (not Metric), and comparison figures always span the full report width.
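As an illustration of a comparison figure, the sketch below computes PSI from base_raw_output and compared_raw_output using the standard formula, the sum over buckets of (compared share - base share) × ln(compared share / base share). It assumes the Report Definition returned one row per score bucket with bucket and count columns; those column names, the sample counts, and the eps floor for empty buckets are assumptions for this sketch.

```python
import numpy as np
import pandas as pd


def psi(base_raw_output, compared_raw_output, bucket_col="bucket", count_col="count", eps=1e-6):
    """Population Stability Index between two bucketed count distributions."""
    base = base_raw_output.set_index(bucket_col)[count_col]
    comp = compared_raw_output.set_index(bucket_col)[count_col]
    # Align on the union of buckets so one side missing a bucket counts as zero.
    buckets = base.index.union(comp.index)
    base_share = base.reindex(buckets, fill_value=0) / base.sum()
    comp_share = comp.reindex(buckets, fill_value=0) / comp.sum()
    # Floor the shares to avoid log(0) and division by zero on empty buckets.
    base_share = base_share.clip(lower=eps)
    comp_share = comp_share.clip(lower=eps)
    return float(((comp_share - base_share) * np.log(comp_share / base_share)).sum())


# Invented bucket counts for a base job and a comparison job.
base_raw_output = pd.DataFrame({"bucket": ["low", "high"], "count": [50, 50]})
compared_raw_output = pd.DataFrame({"bucket": ["low", "high"], "count": [40, 60]})

psi_value = psi(base_raw_output, compared_raw_output)
psi_same = psi(base_raw_output, base_raw_output)
```

In a comparison figure, the value could then be placed in a one-row DataFrame or a Markdown string for display, since comparison is available only for Figure outputs.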
Step 6: Save
Click Save at the bottom of the form. The report appears in the registry list. Submit it for approval through the standard approval workflow when ready.
Running the Report
To run a report, select it in the Reports section of the job form when you Run a Simulation. Each selected report runs on the job's output and renders as its own tab in the results.
To run a report on every job by default, add it to your Default Reports in settings. Default reports are preselected on new jobs for the matching object type.
Common Patterns
- One report, many figures. Compute statistics once, render as table + chart + Markdown summary. Each figure is a tab within the report.
- Parameters for thresholds. Don't hard-code values like threshold = 0.5. Declare a Parameter so the report can be reused with different thresholds in different simulations.
- Metrics for monitoring. For anything you'd want a recurring alert on (KS dropping, error rate rising), add a Metric figure and set upper/lower thresholds on the Tracking report from the simulation form.
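When choosing upper and lower thresholds for a tracked Metric, it can help to replay past values against candidate limits. The sketch below is only a reasoning aid with a hypothetical function name and invented history. The platform evaluates thresholds itself, and its exact comparison rules (for example, inclusive or exclusive bounds) are not specified here.

```python
from typing import List, Optional, Tuple


def find_breaches(
    history: List[float], lower: Optional[float] = None, upper: Optional[float] = None
) -> List[Tuple[int, float, str]]:
    """Return (iteration, value, reason) for each value outside the limits."""
    breaches = []
    for iteration, value in enumerate(history, start=1):
        if lower is not None and value < lower:
            breaches.append((iteration, value, "below lower"))
        elif upper is not None and value > upper:
            breaches.append((iteration, value, "above upper"))
    return breaches


# Invented FPR values from five recurring runs.
fpr_history = [0.08, 0.09, 0.12, 0.10, 0.15]
breaches = find_breaches(fpr_history, upper=0.10)
```

With an upper limit of 0.10, runs 3 and 5 would have breached, while run 4 sits exactly on the limit and does not under this strict comparison.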
What's next
- Run a Simulation with this report selected to see it in action.
- Set up Alerts on a tracked Metric to get notified when it crosses a threshold.