
Register a Feature

TL;DR
Register a Feature to package custom transformation logic on top of Data Elements (or other Features) so it can be reused, versioned, and governed across Models, Policies, and downstream Features.

What is it?

A Feature lets you apply custom logic to your data to create new variables tailored to your business needs. Once registered, these curated Features can be reused across Models, Policies, and other downstream Features.

For example, if fico is registered as a numerical Data Element holding the FICO score for each application, you can register a risk_tier Feature that buckets applicants into tiers:

risk_tier = None
if fico is None:
    risk_tier = 'Non-Scorable'
elif fico >= 750:
    risk_tier = 'A'
elif fico >= 680:
    risk_tier = 'B'
elif fico >= 600:
    risk_tier = 'C'
else:
    risk_tier = 'D'

return risk_tier

A Feature sits one layer above Data Elements in the registry hierarchy:

flowchart LR
    DE["Data Elements<br/>fico, dti, ..."]:::de --> F["Feature<br/>risk_tier"]:::feature
    F --> M["Models"]:::model
    F --> P["Policies"]:::policy
    F --> F2["Other Features"]:::feature

    classDef de      fill:#f3f0ff,stroke:#9b7fe8,color:#3b1f8c,font-weight:600
    classDef feature fill:#f0faf4,stroke:#34a85a,color:#144d28
    classDef model   fill:#fffbea,stroke:#d4a017,color:#6b4c00
    classDef policy  fill:#fff4ec,stroke:#d4622a,color:#7a2e0e

Benefits of a registered Feature

1. Experiment and refine custom logic

Iterate on your transformation logic and use Simulation and Comparison jobs to identify the most effective version for your business.

2. Built-in governance

  • Change management and version control — every edit is tracked.
  • Lineage — see exactly which Data Elements, Features, or Models feed into this Feature, and where it is consumed downstream.
  • Permissible purpose — tag the Feature so it can only be used in approved contexts (for example, restrict PII-derived features from marketing use).

3. A standardized, well-tested library

Build up a curated catalog of business Features that any user can reuse, instead of reimplementing the same logic across notebooks and projects.

Two Types of Features

A Feature can be either Simple (a row-by-row transformation) or Aggregated (a value computed by rolling up rows from a lower entity level).

| | Simple | Aggregated |
|---|---|---|
| Computes from | One row at the Feature's entity level | Multiple rows at a lower entity level, rolled up to this Feature's entity |
| Use when | The output depends only on inputs that already live at the same entity | You need to summarize across child rows (for example, accounts under a customer) |
| Example | risk_tier from an Application's fico | total_number_of_accounts for each Customer |

You choose which kind to register in the Attributes section: tick the Is Aggregated checkbox for an Aggregated Feature, or leave it unticked for a Simple Feature.

Simple vs Aggregated

  • Simple Feature — transforms inputs that already sit at the same entity level.
  • Aggregated Feature — rolls up rows from a lower entity level to the Feature's entity (for example, Account → Customer).
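
The distinction can be illustrated outside the platform with plain Python. The sketch below is only an analogy: the function names and sample rows are hypothetical, and nothing in it calls a platform API. It reuses the risk_tier logic shown earlier and the total_number_of_accounts example from the table.

```python
# Illustrative only: hypothetical helper names and sample data, no platform API.

def risk_tier(fico):
    """Simple Feature: one Application-level input in, one value out."""
    if fico is None:
        return 'Non-Scorable'
    if fico >= 750:
        return 'A'
    if fico >= 680:
        return 'B'
    if fico >= 600:
        return 'C'
    return 'D'


def total_number_of_accounts(account_rows):
    """Aggregated Feature: many Account rows in, one Customer-level value out."""
    return len(account_rows)


# One Application row -> one value at the Application level.
application = {'application_id': 1, 'fico': 702}
print(risk_tier(application['fico']))  # B

# Several Account rows for one Customer -> one value at the Customer level.
accounts_for_customer = [
    {'account_id': 101, 'customer_id': 7},
    {'account_id': 102, 'customer_id': 7},
    {'account_id': 103, 'customer_id': 7},
]
print(total_number_of_accounts(accounts_for_customer))  # 3
```

The Simple function never looks beyond its own row, while the Aggregated function only makes sense once the child rows for a single parent entity have been collected.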

How to Register

The form has three sections: Attributes, Formula, and Properties. Required fields are marked with *.

Registering a Simple Feature

Step 1: Open the Create Form

  1. Open Feature Engineering → Feature Registry from the home page.
  2. Click Create.
  3. Type a clear, descriptive name at the top of the form (for example, Auto Loan Credit Tier) and click ✓.

Step 2: Fill in Attributes

Leave Is Aggregated unticked.

| Field | What to enter |
|---|---|
| Type * | The output type of your logic (for example, String, Numerical, Boolean, Date, or an Array type). The platform validates the return value against this type when you save. |
| Alias * | Short identifier used across the platform (for example, auto_loan_credit_tier). |
| Entity * | Prospect, Application, Account, or Customer: the entity the Feature is computed for. |
| Is Aggregated | Leave unticked. |

Step 3: Fill in Formula

This section has three parts: Input, Definition, and Additional Outputs.

| Field | What to enter |
|---|---|
| Input * | Pick the registered objects this Feature depends on. Inputs can include Data Elements, other Features, and Models. Only approved objects appear by default. Each chosen input becomes a variable in the editor. For example, a Data Element aliased primary_applicant_primary_score is referenced as primary_applicant_primary_score in code. |
| Definition * | The code editor. Pick an input language at the top, then write the transformation. Use Format Code to tidy indentation and Ask AI for inline suggestions. |
| Additional Outputs | Optional. Click Add Output to expose intermediate values from your logic as separate, queryable outputs alongside the main return value (useful for debugging or for surfacing reason codes). |

Each input is exposed as a plain Python value, already resolved for the current entity, under its alias. Write your logic and return a single value that matches the Type set in Step 2. You can import common libraries (numpy, math, datetime, etc.) inside the editor.

Example: Auto loan credit tier from primary applicant score

if primary_applicant_primary_score is None:
    return 'E'

if primary_applicant_primary_score >= 750:
    return 'A+'
elif primary_applicant_primary_score >= 700:
    return 'A'
elif primary_applicant_primary_score >= 670:
    return 'B'
elif primary_applicant_primary_score >= 640:
    return 'C'
elif primary_applicant_primary_score >= 600:
    return 'D'
else:
    return 'E'

Use Test Syntax in the editor to validate before saving.
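
It can also help to exercise the thresholds locally before pasting the logic into the editor. The sketch below assumes the editor runs the Definition as the body of a function, which the bare return statements in the example suggest but this page does not state. The wrapper name auto_loan_credit_tier is a hypothetical stand-in for the editor, not a platform API.

```python
# Local test harness; the wrapper function stands in for the editor
# and is an assumption, not a platform API.

def auto_loan_credit_tier(primary_applicant_primary_score):
    if primary_applicant_primary_score is None:
        return 'E'
    if primary_applicant_primary_score >= 750:
        return 'A+'
    elif primary_applicant_primary_score >= 700:
        return 'A'
    elif primary_applicant_primary_score >= 670:
        return 'B'
    elif primary_applicant_primary_score >= 640:
        return 'C'
    elif primary_applicant_primary_score >= 600:
        return 'D'
    else:
        return 'E'


# Check each threshold on both sides of the boundary, plus the missing case.
cases = [
    (None, 'E'),
    (750, 'A+'), (749, 'A'),
    (700, 'A'), (699, 'B'),
    (670, 'B'), (669, 'C'),
    (640, 'C'), (639, 'D'),
    (600, 'D'), (599, 'E'),
]
for score, expected in cases:
    actual = auto_loan_credit_tier(score)
    assert actual == expected, f'{score}: expected {expected}, got {actual}'
print('all boundary cases pass')
```

Note that a missing score and a score below 600 both map to 'E' in this example; if those two cases should be distinguishable downstream, give the missing case its own label.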

Step 4: Fill in Properties

| Field | What to enter |
|---|---|
| Description * | Plain-language explanation of what the Feature represents and how the logic works (for example, Credit Tier for auto loan based on primary score). |
| Permissible Purpose * | Tags that control where the Feature may be used (for example, Underwriting, Marketing). |
| Group * | Logical grouping (for example, Credit Risk Assessment). Type to search existing groups. |
| Keywords | Free-form tags that help search for the Feature later. |

Step 5: Click Create

The Feature is saved as a draft and you land on its details page.

Registering an Aggregated Feature

Step 1: Open the Create Form

  1. Open Feature Engineering → Feature Registry from the home page.
  2. Click Create.
  3. Type a clear, descriptive name at the top of the form (for example, Max Account Balance) and click ✓.

Step 2: Fill in Attributes

Tick Is Aggregated. The Formula section will reveal a logic editor that operates on entity-level tables.

| Field | What to enter |
|---|---|
| Type * | The output type of your aggregation, not the source column type. Summing or taking a max → Numerical; returning a list → an Array type; returning a flag → Boolean; returning a label → String. |
| Alias * | Short identifier (for example, max_account_balance). |
| Entity * | The entity to roll up to (for example, Customer). Inputs must come from this entity or a lower one. |
| Is Aggregated | Ticked. |

Step 3: Fill in Formula

For an Aggregated Feature, you select platform entity tables at or below the Feature's entity level. In Python and Pandas, the platform handles the groupBy for you: your code runs as if the input table contains only the rows for one entity. In Spark, your code receives the full table and must do the grouping itself.

| Field | What to enter |
|---|---|
| Input * | One or more platform entity tables. With Entity set to Customer, you can pick the Customer table plus any lower-level entity tables (Prospect, Application, Account). Each chosen table becomes a variable in the editor: an Account-level table is referenced as account in code. |
| Definition * | The code editor. Pick an input language at the top, then write the aggregation. |
| Additional Outputs | Optional intermediate outputs, same as for Simple Features. |

How the platform groups data

With Entity = Customer and account (an Account-level table) as the input, the platform groups Account rows by Customer before handing them to your code. Python and Pandas receive the data already filtered to one customer's rows, so you write logic for a single entity — no groupBy needed. Spark receives the full source table and you must do the groupBy('customer_id') yourself, returning a DataFrame keyed by the entity.
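
The per-entity grouping described above for Python can be approximated locally. The following is a rough emulation under stated assumptions, not the platform's implementation: the helper group_by_entity, the function name max_open_balance, and the sample rows are all hypothetical. It splits Account rows by customer_id, pivots each group into a dictionary of lists, and calls single-entity logic once per customer.

```python
from collections import defaultdict

# Hypothetical Account-level rows covering three customers.
account_rows = [
    {'customer_id': 7, 'account_id': 101, 'current_balance': 1200, 'status': 'open'},
    {'customer_id': 7, 'account_id': 102, 'current_balance': 4500, 'status': 'open'},
    {'customer_id': 7, 'account_id': 103, 'current_balance': 800, 'status': 'closed'},
    {'customer_id': 8, 'account_id': 201, 'current_balance': 300, 'status': 'open'},
    {'customer_id': 9, 'account_id': 301, 'current_balance': 950, 'status': 'closed'},
]


def max_open_balance(account):
    """Single-entity logic: `account` is a dict of lists for ONE customer."""
    open_balances = [
        bal
        for bal, status in zip(account['current_balance'], account['status'])
        if status == 'open' and bal is not None
    ]
    return max(open_balances) if open_balances else None


def group_by_entity(rows, key):
    """Split rows by the entity key and pivot each group into a dict of lists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            grouped[row[key]][column].append(value)
    return grouped


# Call the single-entity logic once per customer.
results = {
    customer_id: max_open_balance(account)
    for customer_id, account in group_by_entity(account_rows, 'customer_id').items()
}
print(results)  # {7: 4500, 8: 300, 9: None}
```

Customer 9 has no open accounts, so the logic returns None for that entity. Writing the logic for one entity at a time keeps it free of any grouping code.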

The editor supports three input languages for Aggregated Features: Python, Pandas, and Spark.

Python

Each input table is exposed as a dictionary of lists, sliced to the rows belonging to the current entity. Each column lookup returns a Python list.

What the variable looks like.

If account has these rows for one customer:

account = {
    'account_id':      [101, 102, 103],
    'current_balance': [1200, 4500, 800],
    'status':          ['open', 'open', 'closed'],
}

account['current_balance']      # → [1200, 4500, 800]   (Python list)
max(account['current_balance']) # → 4500

You can import common libraries (numpy, math, datetime, etc.) inside the editor.

Example: Maximum balance across a customer's open accounts

# `account` is a dictionary of lists, one list per column
open_balances = [
    bal for bal, status in zip(
        account['current_balance'], account['status']
    )
    if status == 'open' and bal is not None
]

if not open_balances:
    return None
return max(open_balances)

Pandas

Each input table is exposed as a pandas DataFrame, sliced to the rows belonging to the current entity.

What the variable looks like.

If account has these rows for one customer:

# `account` is a pandas DataFrame:
#    account_id  current_balance  status
# 0         101             1200    open
# 1         102             4500    open
# 2         103              800  closed
account['current_balance'].max()           # → 4500
account[account['status'] == 'open']       # → DataFrame: open accounts only

Example: Maximum balance across a customer's open accounts

# `account` is a pandas DataFrame
open_accounts = account[account['status'] == 'open']
if open_accounts.empty:
    return None
return open_accounts['current_balance'].max()
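
One edge case worth handling in the Pandas version: if every open account has a missing balance, Series.max() returns NaN rather than None, and the result is a NumPy scalar rather than a plain Python number. The sketch below is a hedged variant that drops missing balances first and converts the result to a float. Whether the platform requires a plain Python value is not stated on this page, so treat the conversion as a precaution. The wrapper function and sample data are hypothetical and exist only so the logic can run locally.

```python
import pandas as pd


def max_open_balance(account):
    """Stand-in for the editor body; `account` holds ONE customer's rows."""
    # Keep open accounts only, then drop missing balances.
    open_balances = account.loc[
        account['status'] == 'open', 'current_balance'
    ].dropna()
    if open_balances.empty:
        return None
    # Convert the NumPy scalar to a plain Python float.
    return float(open_balances.max())


# Hypothetical rows for one customer; account 102 has a missing balance.
account = pd.DataFrame({
    'account_id': [101, 102, 103],
    'current_balance': [1200.0, None, 800.0],
    'status': ['open', 'open', 'closed'],
})
print(max_open_balance(account))  # 1200.0

# Every open account has a missing balance: returns None instead of NaN.
all_missing = pd.DataFrame({
    'account_id': [201],
    'current_balance': [float('nan')],
    'status': ['open'],
})
print(max_open_balance(all_missing))  # None
```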

Spark

Each input table is exposed as a Spark DataFrame containing the full, ungrouped table. You do the groupBy yourself in code, and the return value must be a 2-column DataFrame: the entity key and the aggregated value.

Example: Maximum balance across a customer's open accounts

import pyspark.sql.functions as F

open_accounts = account.filter(F.col('status') == 'open')
return open_accounts.groupBy('customer_id').agg(
    F.max('current_balance').alias('max_account_balance')
)

Spark constraints

  • SELECT only. CREATE, INSERT, DROP, and similar mutations are not allowed.
  • No external data reads. Use only the tables provided as inputs.
  • Self-contained logic. Do not depend on intermediate state from another Feature.
  • No cross-Feature UDFs or temp views. UDFs and temp views registered in one Feature must not be used in another.
  • Set the Output Dataframe Key. Specify the entity key column name in the Output Dataframe Key input below the editor (for example, customer_id).

Return value

The returned object must match the Type set in Step 2:

  • Simple types: a single value (per entity).
  • Array types: a list or Series.
  • Spark: a DataFrame with the entity key and aggregation columns.

Use Test Syntax in the editor to validate before saving.

Step 4: Fill in Properties

| Field | What to enter |
|---|---|
| Description * | Plain-language explanation of what the Feature represents. |
| Permissible Purpose * | Tags that control where the Feature may be used. |
| Group * | Logical grouping. |
| Keywords | Free-form tags for search. |

Step 5: Click Create

The Feature is saved as a draft and you land on its details page.

How can it be used?

Once registered, you can:

  • Run a Simulation to generate basic summary statistics (min, max, mean, distribution, percentage missing).
  • Run a Comparison job to evaluate this Feature against previous iterations.
  • Use the Feature downstream in other Features, Models, and Policies.
  • Track its lineage to see every input it reads from and every object that consumes it.

What's Next

  • Build a Model that uses this Feature.
  • Use the Feature in a Policy.
  • Send the Feature for approval so it can be used outside your draft workspace.