
How to Run a Portfolio Analysis

TL;DR
In Portfolio Analysis, select input tables and global functions, write PySpark Data Pull Logic that runs policies or features and returns one final DataFrame, then click Data Pull. Review the pulled data in Result Data and build custom Data Explore tabs for ad-hoc reports on top of it.

What is it?

Portfolio Analysis runs your policies and features across a slice of your portfolio (a product type, a segment, a date range) and assembles the results into a single dataset for reporting and analytics. It's the tool for questions like:

  • "What does the approval rate look like across our auto loan portfolio if we run the new policy?"
  • "Combine the credit-tier feature with two regional policies and join everything back to the application table."
  • "Build a dataset I can hand to a stakeholder as an Excel file."

Three things make a portfolio analysis:

  • Input Tables — the registered DataTables you pull from.
  • Data Pull Logic — PySpark code that runs registered objects (policies, features) and returns one final DataFrame.
  • Result Data and Data Explore tabs — where you read, filter, and visualize the output.

Before you start

  • You need write access to Portfolio Analysis.
  • The DataTables you want to use must be registered in Data Vault → Table Registry.
  • The policies or features you intend to run inside the data pull must already be registered (they don't have to be approved).
  • If you plan to use shared helpers in the logic (for example, a mean calculation or custom transformations), register them in Resources → Global Functions first.
  • You should be comfortable writing PySpark. The data pull logic is Python, not a form.
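
As an illustration of the kind of shared helper worth registering, here is a minimal sketch of a mean function that skips missing values. It assumes a Global Function can be an ordinary Python function; the registration format is not described here, and the name safe_mean is hypothetical.

```python
import math
from typing import Iterable, Optional


def safe_mean(values: Iterable[Optional[float]]) -> Optional[float]:
    """Mean of the non-missing values, or None when nothing is left to average.

    Hypothetical helper: None and NaN entries are skipped rather than
    propagated, so one missing score does not blank out the whole result.
    """
    kept = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not kept:
        return None
    return sum(kept) / len(kept)
```

Returning None for an empty input, rather than raising, keeps the helper safe to call on segments that happen to have no usable values.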

How to Run an Analysis

Step 1: Open Portfolio Analysis

Navigate to Portfolio Analysis in the app and click + Create (or open an existing analysis to edit and rerun it).

Step 2: Configure the Data Pull

Field | What to enter
Input Tables (Required) | Multi-select. The registered DataTables you'll read from. Each table becomes a PySpark DataFrame in the logic, accessible by its alias (for example, application_table, credit_summary_table).
Global Functions (Optional) | Multi-select of registered Global Functions. Each becomes callable by its alias inside the data pull logic.
Data Pull Logic (Required) | The PySpark code that produces the final dataset. Must return a single PySpark DataFrame. See the available variables and the example in Step 3.

Step 3: Write the Data Pull Logic

In the logic, you have:

  • Every selected input table as a PySpark DataFrame, by its alias.
  • Every selected global function, callable by alias.
  • create_data(['object_alias'], data={...}): a helper that runs registered policies or features. The result is a DataFrame containing the object's output columns (for policies: block, segment, rule outputs, and the final decision column).

Example: run two product-specific policies and combine the results

from pyspark.sql import functions as F

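# Keep applications dated on or after 2026-01-01.
# Assumes application_date is a date-typed column; an ISO literal compares
# correctly, whereas a string like "01/01/2026" would not.
application_table = application_table.filter(
    F.col("application_date") >= F.to_date(F.lit("2026-01-01"))
)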

# Indirect new auto
new_apps = application_table[application_table["product_type"] == "indirect_new_auto"]
new_out = create_data(
    ["Indirect New Auto"],
    data={"application_table": new_apps, "credit_summary_table": credit_summary_table},
).withColumn("product_type", F.lit("indirect_new_auto"))

# Indirect used auto
used_apps = application_table[application_table["product_type"] == "indirect_used_auto"]
used_out = create_data(
    ["Indirect Used Auto"],
    data={"application_table": used_apps, "credit_summary_table": credit_summary_table},
).withColumn("product_type", F.lit("indirect_used_auto"))

# Combine and join back to the application table
policy_outputs = used_out.unionByName(new_out)
applications = application_table[["application_id", "fico", "application_date"]]

return applications.join(
    policy_outputs,
    applications["application_id"] == policy_outputs["id"],
    "inner",
)
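
One caveat when combining outputs: unionByName matches columns by name and raises an error if the two DataFrames do not have the same set of columns. If your policies emit different rule or segment columns, passing allowMissingColumns=True (available in Spark 3.1 and later) fills the gaps with nulls instead. A minimal sketch, where the helper name union_policy_outputs is hypothetical:

```python
from functools import reduce


def union_policy_outputs(frames):
    """Stack policy outputs by column name.

    Columns that a frame lacks are filled with nulls, so policies with
    different output columns can still be combined into one DataFrame.
    Requires Spark 3.1+ for the allowMissingColumns argument.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame to union")
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        frames,
    )
```

In the example above this would replace the direct call, for instance policy_outputs = union_policy_outputs([used_out, new_out]), and it extends to more than two policies without chaining.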

Tip

Filter your inputs before you call create_data. Running a policy on the full table when you only need one product type wastes runtime on every iteration.

Step 4: Run the Data Pull

Click Data Pull at the bottom of the form. The button label and behavior depend on context:

Situation | Button
New analysis | Data Pull (creates the analysis and runs it)
Existing analysis, editing | Data Pull (replaces the existing result with a fresh pull)
Forking from an existing analysis | Rerun as New (creates a new analysis) or Replace Existing (overwrites the source)

The button shows Pulling Data… while the job runs.

Reading the Results

Once the pull completes, the analysis opens with two kinds of tabs.

Result Data

The full output of your data pull logic, rendered as a sortable, filterable grid. This is where you confirm the data looks right and download it (CSV/Excel) if a stakeholder asked for the dataset.

Data Explore tabs

Custom workspaces built on top of the Result Data. Each tab can hold one or more reports, where each report is Python logic you run interactively to explore the data, then save into the tab.

To add an exploration:

  1. From the tabs row, create a new Data Explore tab and give it a name.
  2. Click Create Report. Write Python that returns a value or visualization based on the Result Data.
  3. Save the report into the tab. It renders inline with its title and (optional) description.

A tab can hold many reports. Use them to slice the data ("approval rates by region"), pivot it ("offers by credit tier"), or attach commentary alongside saved figures.
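
As a sketch of what a report such as "approval rates by region" might compute, the function below groups rows and returns the approved share per group. How the Result Data is exposed inside a report is not specified here, so this assumes the rows are available as dictionaries (with PySpark, something like [r.asDict() for r in df.collect()] on a small result). The column names region and decision and the value "approve" are hypothetical.

```python
from collections import defaultdict


def approval_rate_by(rows, group_col, decision_col="decision", approved_value="approve"):
    """Return {group value: share of rows whose decision equals approved_value}.

    rows is any iterable of dict-like records; column names and the
    approved value are illustrative assumptions, not platform defaults.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for row in rows:
        key = row[group_col]
        totals[key] += 1
        if row[decision_col] == approved_value:
            approved[key] += 1
    return {key: approved[key] / totals[key] for key in totals}
```

For example, approval_rate_by(rows, "region") yields one rate per region, which a report could return directly or feed into a chart.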

Note

Data Explore reports are scoped to this Portfolio Analysis. For reusable analytics that run automatically on every job (across many simulations), use Build a Custom Report instead.

Iterating

Portfolio Analysis is built for iteration. Adjust the data pull logic (or input tables) and:

  • Data Pull: replaces the existing result with a fresh pull. Saved Data Explore tabs stay; their reports re-execute against the new result.
  • Rerun as New: forks into a new analysis, preserving the original. Useful for what-if branches.
  • Replace Existing: overwrites the source analysis you forked from.

What's next

  • Build a Custom Report if you want the analytics in your Data Explore tabs to be reusable across every simulation, not just this analysis.
  • Run a Simulation on a single policy or model when you want population-level performance metrics rather than a cross-portfolio dataset.