Run a Portfolio Analysis

TL;DR

A Portfolio Analysis pulls one dataset across a slice of your portfolio and gives you a place to report on it. Pick your Input Tables, write PySpark Data Pull Logic that returns a single DataFrame, and click Data Pull. The result lands in Result Data, where you can download it or add your own tabs of Python reports on top.

What is it?

A Portfolio Analysis answers questions like “what would our approval rate look like across the auto book if we ran the new policy?” by building the dataset that answers it.

You select the registered tables you want to read from and write PySpark that returns one final DataFrame. Inside that logic you can call registered Policies and Features through the create_data helper, so the pull is not just a query: it runs your platform objects across the population and hands back their outputs as columns.

The result is a dataset, not a verdict. Where a Simulation reports population-level metrics for one object, a Portfolio Analysis assembles whatever you need across many objects and tables, then leaves the analysis to you.

Three things make one up:

Input Tables - the registered DataTables your logic reads from.
Data Pull Logic - the PySpark that returns the final DataFrame.
Result Data and your own report tabs - where you read, export, and explore the output.

Before you start

You need write access to Portfolio Analysis. Without it the form loads read-only and the action buttons are hidden.

The DataTables you want to pull from must be registered in Data Vault → Table Registry. Any Policies or Features you plan to run inside the logic must be registered too, though they do not have to be approved.

If you want shared helpers available in the logic, register them under Resources → Global Functions first.

You should be comfortable writing PySpark. The data pull logic is code, not a form.

How to Run a Portfolio Analysis

Open the create form

Open Portfolio Analysis from the home page. It sits under Prospecting, Underwriting, Customer Management, and Loan Investment.
Click + New Portfolio Analysis at the top right.

The analysis opens with a generated name in the title bar. Click it to rename. The name is required, so if you clear it the form will not save.

Fill in the Data Pull

Field	What to enter
Input Tables (required)	The registered DataTables you want to read from. Each one becomes a PySpark DataFrame in your logic, named by its alias (for example, `application_table`).
Global Functions	Registered Global Functions to make available. Each becomes callable by its alias inside the logic.
Data Pull Logic	The PySpark that produces the dataset. Must return a single DataFrame.

Only Input Tables and the name are enforced. The logic is not marked required, so an empty pull will submit and come back with nothing useful. Treat it as required in practice.
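
As a rough picture of what a minimal pull might look like, the sketch below selects a few application columns and left-joins a second table. It is wrapped in a function only so it can be run outside the platform; in the editor the table aliases are already in scope and the body ends with a bare return. The join key on credit_summary_table is an assumption for illustration.

```python
def data_pull_logic(application_table, credit_summary_table):
    """Return one DataFrame: selected application columns plus credit summary columns.

    Both arguments stand in for the Input Table aliases the platform provides.
    Assumes (hypothetically) that credit_summary_table also has an application_id column.
    """
    # Keep only the columns the analysis needs.
    apps = application_table.select("application_id", "fico", "application_date")

    # Left join so applications without a credit summary row are still returned.
    return apps.join(credit_summary_table, on="application_id", how="left")
```

Only DataFrame methods are used, so the same body should paste into the Data Pull Logic with the function wrapper removed.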

An Examples link sits next to the Data Pull Logic label and opens a sample in a side panel.

Write the Data Pull Logic

Inside the logic you have every selected input table as a DataFrame by its alias, every selected global function by its alias, and the create_data helper for running registered objects:

create_data(*aliases, data={...}, runtime_params={...}, ignore_errors=False)

Pass aliases as separate arguments or as a single list; both work. Each alias is a string naming a Policy, Feature, Data Element, or Model, and a string resolves to the object’s latest approved version. data maps table aliases to the DataFrames you want the object to run on.
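
One way to picture the "separate arguments or a single list" behavior is the small normalizer below. It is an illustration only, not the platform's code, and normalize_aliases is a hypothetical name.

```python
def normalize_aliases(*aliases):
    """Accept aliases either as separate arguments or as one list and return a flat list."""
    # A single list or tuple argument is unpacked; anything else is taken as given.
    if len(aliases) == 1 and isinstance(aliases[0], (list, tuple)):
        return list(aliases[0])
    return list(aliases)
```

Under this reading, create_data("A", "B", data=...) and create_data(["A", "B"], data=...) would name the same two objects.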

Two rules the helper enforces, both of which raise if you break them:

A Policy cannot be mixed with Features, Data Elements, or Models in one call.
Only one Policy per call.

So a multi-policy pull means one create_data call per policy, combined afterwards. For Features and Models, the row-key column comes back as id so the output joins cleanly.
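
The two rules can be pictured as a pre-flight check on the aliases of a single call. The sketch below is a stand-in for that idea, not the helper's actual implementation: check_create_data_aliases and the kinds lookup (alias to object type) are hypothetical, and the real helper may raise a different exception type.

```python
def check_create_data_aliases(aliases, kinds):
    """Raise ValueError if one call's aliases break either rule.

    `kinds` is a hypothetical mapping from alias to one of
    "policy", "feature", "data_element", or "model".
    """
    policies = [a for a in aliases if kinds[a] == "policy"]
    others = [a for a in aliases if kinds[a] != "policy"]

    # Rule: only one Policy per call.
    if len(policies) > 1:
        raise ValueError(f"only one Policy per call, got {policies}")

    # Rule: a Policy cannot be mixed with Features, Data Elements, or Models.
    if policies and others:
        raise ValueError(f"Policy {policies[0]!r} cannot be mixed with {others}")
```

A call that names two Features and a Model passes both checks; a call that names two Policies, or a Policy plus a Feature, does not.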

Example: run two product-specific policies and combine the results

from pyspark.sql import functions as F

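# Filter applications by date. Use an ISO yyyy-MM-dd literal so Spark can
# compare it as a date (assumes application_date is a date or timestamp column).
application_table = application_table[
    application_table["application_date"] >= "2026-01-01"
]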

# Indirect new auto
new_apps = application_table[application_table["product_type"] == "indirect_new_auto"]
new_out = create_data(
    ["Indirect New Auto"],
    data={"application_table": new_apps, "credit_summary_table": credit_summary_table},
).withColumn("product_type", F.lit("indirect_new_auto"))

# Indirect used auto
used_apps = application_table[application_table["product_type"] == "indirect_used_auto"]
used_out = create_data(
    ["Indirect Used Auto"],
    data={"application_table": used_apps, "credit_summary_table": credit_summary_table},
).withColumn("product_type", F.lit("indirect_used_auto"))

# Combine and join back to the application table
policy_outputs = used_out.unionByName(new_out)
applications = application_table[["application_id", "fico", "application_date"]]

return applications.join(
    policy_outputs,
    applications["application_id"] == policy_outputs["id"],
    "inner",
)
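
With more than two policies, the one-call-per-policy pattern can be folded into a loop. The sketch below is one possible shape, not a platform helper: run_policies is a hypothetical name, create_data is passed in as an argument so the function can be exercised outside the platform, and each policy is assumed to read its applications from an input aliased application_table.

```python
from functools import reduce


def run_policies(create_data, apps_by_policy, shared_tables):
    """Call create_data once per policy and union the outputs by column name.

    apps_by_policy: policy alias -> the already-filtered applications DataFrame.
    shared_tables:  table alias -> DataFrame passed unchanged to every policy.
    """
    outputs = []
    for policy_alias, apps in apps_by_policy.items():
        # One Policy per call, as the helper requires.
        out = create_data(
            [policy_alias],
            data={"application_table": apps, **shared_tables},
        )
        outputs.append(out)

    if not outputs:
        raise ValueError("at least one policy is required")

    # unionByName matches columns by name, so the outputs need the same columns.
    return reduce(lambda left, right: left.unionByName(right), outputs)
```

Inside the logic this might be called as run_policies(create_data, {"Indirect New Auto": new_apps, "Indirect Used Auto": used_apps}, {"credit_summary_table": credit_summary_table}), with the product_type column added to each output beforehand if it is needed.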

Run the pull

Click Data Pull at the bottom of the form. The button reads Pulling Data… while the job runs.

Which buttons you get depends on how you arrived:

Button	What it does
Data Pull	On a new analysis, creates it and runs. On an existing one, replaces the current result with a fresh pull.
Rerun as New	Forks into a separate analysis, leaving the original untouched. Offered only when you came in through Rerun.
Replace Existing	Overwrites the analysis you started from. Offered alongside Rerun as New.

Clicking Rerun in the title bar is what brings up Rerun as New and Replace Existing.

Read the results

The finished analysis opens on Result Data: your dataset as a sortable, filterable grid, with the run logs beneath it. Use Download as CSV or Download as Excel for the grid itself, or Export Excel in the title bar to export the analysis as a whole.

To explore further, click + on the tab strip. A New Data Explore dialog asks for a name, and Create adds the tab. Each new tab arrives with a Sample Report already in it returning the first ten rows.

Inside a tab, Create Report (or Add Report once one exists) opens a report: Python you run interactively against the Result Data and save into the tab with a title and optional description. A tab can hold many reports, so use them to slice the data, pivot it, or keep commentary next to saved figures.

Iterating

Re-pull the data and your saved reports stay, but they do not re-run. Each one is flagged stale with a banner:

The underlying data has been re-pulled since this report was last run. Edit the report and refresh it against the latest data.

Open the report and run it again to clear this. Reports copied into a forked analysis behave the same way, waiting on the new pull before they can run.
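
The stale flag can be thought of as a timestamp comparison between a report's last run and the latest pull. The sketch below is a hypothetical model of that idea, not the platform's data structures: SavedReport, last_run_at, and is_stale are invented names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SavedReport:
    title: str
    last_run_at: datetime  # when the report last ran against Result Data


def is_stale(report: SavedReport, last_pull_at: datetime) -> bool:
    """A report is stale if the data was re-pulled after the report last ran."""
    return report.last_run_at < last_pull_at


def rerun(report: SavedReport, now: datetime) -> None:
    """Running the report again records a new run time, which clears the flag."""
    report.last_run_at = now
```

In this picture a report copied into a forked analysis stays stale until the fork's own pull has finished and the report is run once against it.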

What’s Next

Run a Simulation when you want population-level metrics on a single Policy or Model rather than a cross-portfolio dataset.
Run a What-If Analysis to test rule changes against a baseline instead of assembling data.
Register Reports to make your analytics reusable across every job.
