How to Build a Custom Report

TL;DR
In Resources → Reports, create a report by picking the target object type, writing a Report Definition (Python that returns a Pandas DataFrame), then adding one or more Figures to visualize it (table, plot, Excel download, or a tracked metric). Select the report when you run a job and it renders as its own tab in the results.

What is it?

A Report is a reusable piece of analytics that runs on a job's output: for example, a decile report that runs on every model simulation, or a fair-lending report that runs on every policy simulation. You define the calculation once, and the platform runs it on each job where the report is selected.

Each report has two layers:

  • Report Definition (the Formula): Python that computes the statistics and returns a Pandas DataFrame. One definition per report.
  • Figures: one or more visualizations of that DataFrame. A single report can show the same statistics as a table, a chart, a downloadable Excel file, a Markdown summary, or a metric tracked over time.

You select the reports you want when you run a simulation, and each one renders as its own tab in the results.

Before you start

  • You need write access to Resources → Reports.
  • Know which object type the report targets (Data Element, Feature, Model, Policy, Global Function). Each type exposes different columns in the job output, so a report for one type usually won't work for another.
  • Run at least one simulation on the intended object type first. The job result is what you'll build the report against, and you can use it as a reference for what columns are available.
  • If you plan to use shared helper functions (mean, variance, custom statistics), register them first in Resources → Global Functions.

How to Build a Report

Step 1: Open the Report registry

Go to Resources → Reports and click + Create. Give the report a clear name (for example, "Decile Table" or "Confusion Matrix at Configurable Threshold").

Step 2: Set the Attributes

The Attributes section tells the platform when this report should be available.

Field | What to enter
Object Type (Required) | The type of job this report runs on (Model, Policy, Data Element, Feature, Global Function). Drives which columns are available in data.
Object Subtype | When Object Type = Model: restrict the report to specific algorithm types (Binary Classification, Regression, and so on). When Object Type = Feature or Data Element: restrict to specific feature types. Leave empty for "all".

Step 3: Write the Report Definition (Formula)

Open the Formula section. Add any Resources (registered Global Functions you want to call inside the definition) and any Parameters the report should accept at runtime.

Field | What to enter
Resources (Optional) | Multi-select of registered Global Functions. Each becomes callable in the definition by its alias.
Parameters (Optional) | Values the user supplies when running the job. Each parameter has a Name, Alias, Type (String, Number, Single Object), Is Mandatory, and a description. Common use: a configurable threshold for a confusion matrix.
Report Definition (Required) | Python that returns the calculated statistics as a Pandas DataFrame. See variables and example below.

Variables you can use in the definition:

  • data (PySpark DataFrame): the job's result data, including every input and output column. The exact columns depend on the object type. For a Model job, data includes inputs, the dependent variable, and the model score; a Data Element job's data does not include those model-specific columns.
  • entity: a Corridor Python object for the object the job ran on. For a Model report, entity.algorithm_type returns 'Binary Classification', 'Regression', and so on.
  • job: a Corridor Python object for the job itself. job.job_type returns 'Simulation', 'Monitoring', and so on.
  • Selected Global Functions: callable by their alias.
  • Parameters: accessed by alias. String and Number parameters resolve to a single value (or None if optional and unset). Single Object parameters appear as a column in data with the alias as the column name.
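
The variables above can be exercised outside the platform with stand-ins. The sketch below is hypothetical: SimpleNamespace objects stand in for the Corridor entity and job objects, resolve_threshold and describe_run are helper names invented for illustration, and only the algorithm_type and job_type attributes mentioned above are assumed to exist.

```python
from types import SimpleNamespace


def resolve_threshold(threshold, default=0.5):
    """Return a usable threshold from an optional Number parameter."""
    # An optional Number parameter that was left unset arrives as None.
    value = default if threshold is None else float(threshold)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {value}")
    return value


def describe_run(entity, job, threshold=None):
    """Decide what a confusion-matrix definition would do for this run."""
    if entity.algorithm_type != "Binary Classification":
        # A confusion matrix only makes sense for a binary target.
        return {"supported": False, "reason": entity.algorithm_type}
    return {
        "supported": True,
        "job_type": job.job_type,
        "threshold": resolve_threshold(threshold),
    }


# Stand-ins for the platform-provided objects (assumption, local testing only).
entity = SimpleNamespace(algorithm_type="Binary Classification")
job = SimpleNamespace(job_type="Simulation")
result = describe_run(entity, job)
```

Validating a parameter at the top of the definition, as resolve_threshold does here, tends to surface a bad input as a clear error rather than as a confusing result table.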

Example: confusion matrix from a model score and threshold

from pyspark.sql import functions as F

# 'threshold' is a Number parameter declared above; fall back to 0.5 if not provided
threshold = 0.5 if threshold is None else threshold

# the model score is in the 'output' column; the dependent variable is 'dependent'
data = data.withColumn("predicted", F.when(F.col("output") >= threshold, 1).otherwise(0))

crosstab = data.groupBy("predicted", "dependent").count().toPandas()
return crosstab

The returned DataFrame becomes available to every Figure as raw_output.

Note

The definition must return a Pandas DataFrame, not a PySpark DataFrame. A report is meant to produce a summary, not pass through a whole dataset, so aggregate in PySpark first and call toPandas() only on the small result.

Step 4: Add Figures

Open the Figures section. Each figure is a separate visualization. Click Add Figure to add one; drag the grip handle to reorder.

For each figure, give it a name (this becomes the tab title within the report), pick its Output Type, and write its logic.

Output Type | What it shows
Figure | A visualization. Can return a Pandas DataFrame (renders as a table), a Plotly figure (renders interactively), a formatted Markdown string, a Pandas ExcelWriter (downloadable Excel), or a ReportLab Canvas (PDF).
Metric | A single numeric value tracked over time. Returns a number. On recurring simulation jobs, the platform builds a tracking report showing the metric's trend, with optional upper and lower thresholds for alerting.

Variables you can use in each figure:

  • raw_output (Pandas DataFrame): whatever the Report Definition returned. Every figure sees the same raw_output.
  • entity and job: same Corridor objects as in the definition.
  • Selected Global Functions: each figure selects its own; selecting a function in the definition does not carry over.
  • Parameters: accessible by alias. (Note: Single Object parameters are not accessible in figures, because they live on data and figures don't see data.)
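
A Figure that returns a Markdown string could be sketched as follows. This is a minimal illustration, not the platform's own code: markdown_summary is a hypothetical helper name, raw_output is assumed to hold the predicted, dependent, and count columns produced by the confusion-matrix definition, and the counts are hand-made for the example.

```python
import pandas as pd


def markdown_summary(raw_output: pd.DataFrame) -> str:
    """Format confusion-matrix counts as a short Markdown summary."""
    total = int(raw_output["count"].sum())
    # Rows where the prediction matches the actual outcome.
    agree = raw_output["predicted"] == raw_output["dependent"]
    correct = int(raw_output.loc[agree, "count"].sum())
    accuracy = correct / total if total else float("nan")
    lines = [
        "### Confusion matrix summary",
        "",
        f"- Rows scored: {total}",
        f"- Correctly classified: {correct}",
        f"- Accuracy: {accuracy:.1%}",
    ]
    return "\n".join(lines)


# Hand-made counts for illustration only (not real results).
raw_output = pd.DataFrame(
    {"predicted": [0, 0, 1, 1], "dependent": [0, 1, 0, 1], "count": [50, 10, 5, 35]}
)
summary = markdown_summary(raw_output)
```

Inside a figure, the body would presumably end with return markdown_summary(raw_output), with raw_output supplied by the platform instead of built by hand.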

Example: render the confusion matrix as a Plotly table

import plotly.graph_objects as go
import pandas as pd

conf_matrix = pd.crosstab(
    raw_output["dependent"],
    raw_output["predicted"],
    values=raw_output["count"],
    aggfunc="sum",
).fillna(0).reset_index()

fig = go.Figure(data=[go.Table(
    header=dict(values=list(conf_matrix.columns)),
    cells=dict(values=[conf_matrix[c].tolist() for c in conf_matrix.columns]),
)])
return fig

Example: track FPR over time as a Metric

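# FPR = FP / (FP + TN): false positives over all actual negatives
negatives = raw_output[raw_output["dependent"] == 0]
# .sum() yields 0 when the (predicted=1, dependent=0) cell is missing, instead of raising
fp = negatives.loc[negatives["predicted"] == 1, "count"].sum()
actual_negatives = negatives["count"].sum()
# returning 0.0 when there are no actual negatives is a choice; adjust as needed
return float(fp / actual_negatives) if actual_negatives else 0.0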

On recurring simulations, this metric appears under the Tracking tab with the value plotted across iterations.
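
Metric logic can be sanity-checked locally against a hand-built raw_output before it goes into a figure. The sketch below computes a second metric, the true positive rate, under a few assumptions: cell_count and true_positive_rate are hypothetical helper names, the counts are invented, and returning 0.0 when there are no actual positives is a choice made for this example. Because a Spark groupBy(...).count() only emits rows for combinations that occur, one confusion-matrix cell is deliberately left out to show that the lookup tolerates it.

```python
import pandas as pd


def cell_count(raw_output: pd.DataFrame, predicted: int, dependent: int) -> float:
    """Sum the counts for one (predicted, dependent) cell; 0.0 if the cell is absent."""
    mask = (raw_output["predicted"] == predicted) & (raw_output["dependent"] == dependent)
    return float(raw_output.loc[mask, "count"].sum())


def true_positive_rate(raw_output: pd.DataFrame) -> float:
    """TPR (recall) = TP / (TP + FN)."""
    tp = cell_count(raw_output, predicted=1, dependent=1)
    fn = cell_count(raw_output, predicted=0, dependent=1)
    positives = tp + fn
    # Assumption: report 0.0 when there are no actual positives.
    return tp / positives if positives else 0.0


# Hand-made counts; the (predicted=1, dependent=0) cell is deliberately missing.
raw_output = pd.DataFrame(
    {"predicted": [0, 0, 1], "dependent": [0, 1, 1], "count": [60, 10, 30]}
)
tpr = true_positive_rate(raw_output)
```

Here tpr is 30 / (30 + 10), and cell_count returns 0.0 for the missing cell instead of raising.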

Step 5: Enable Custom Comparison (optional)

If a figure is meant to compare two jobs (for example, PSI between a base and a comparison sample, or a policy swapset), open the figure's options and turn on Enable custom comparison. The figure then exposes:

  • base_raw_output, compared_raw_output
  • base_entity, compared_entity
  • base_job, compared_job
  • Parameters by alias (from the base job)

Constraints: custom comparison is only available for Figure outputs (not Metric), and comparison figures always span the full report width.
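
As an illustration of a comparison figure, the sketch below computes a population stability index (PSI) table from the two raw outputs. It rests on assumptions not stated above: the Report Definition is assumed to return one row per score bucket with hypothetical columns bucket and count, psi_table is an invented helper name, and the small eps floor for empty buckets is one common convention among several. The sample counts are made up.

```python
import numpy as np
import pandas as pd


def psi_table(base_raw_output, compared_raw_output, eps=1e-6):
    """Per-bucket PSI contributions plus a TOTAL row."""
    merged = base_raw_output.merge(
        compared_raw_output, on="bucket", how="outer", suffixes=("_base", "_compared")
    ).fillna(0)
    # Convert counts to shares; eps avoids log(0) for buckets empty on one side.
    base_share = (merged["count_base"] / merged["count_base"].sum()).clip(lower=eps)
    comp_share = (merged["count_compared"] / merged["count_compared"].sum()).clip(lower=eps)
    # PSI contribution per bucket: (compared - base) * ln(compared / base)
    merged["psi"] = (comp_share - base_share) * np.log(comp_share / base_share)
    total = pd.DataFrame([{"bucket": "TOTAL", "psi": merged["psi"].sum()}])
    return pd.concat([merged[["bucket", "psi"]], total], ignore_index=True)


# Hand-made bucket counts for illustration only.
base_raw_output = pd.DataFrame({"bucket": ["high", "low"], "count": [50, 50]})
compared_raw_output = pd.DataFrame({"bucket": ["high", "low"], "count": [25, 75]})
comparison = psi_table(base_raw_output, compared_raw_output)
```

The function returns a Pandas DataFrame, so as a Figure output it would render as a table; a comparison figure would end with return psi_table(base_raw_output, compared_raw_output).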

Step 6: Save

Click Save at the bottom of the form. The report appears in the registry list. Submit it for approval through the standard approval workflow when ready.

Running the Report

To run a report, select it in the Reports section of the job form when you Run a Simulation. Each selected report runs on the job's output and renders as its own tab in the results.

To run a report on every job by default, add it to your Default Reports in settings. Default reports are preselected on new jobs for the matching object type.

Common Patterns

  • One report, many figures. Compute statistics once, render as table + chart + Markdown summary. Each figure is a tab within the report.
  • Parameters for thresholds. Don't hard-code values like threshold = 0.5. Declare a Parameter so the report can be reused with different thresholds in different simulations.
  • Metrics for monitoring. For anything you'd want a recurring alert on (KS dropping, error rate rising), add a Metric figure and set upper/lower thresholds on the Tracking report from the simulation form.

What's next