Register a Table

TL;DR
Open Data Vault → Table Registry, click + New Data Table, and register each source table that downstream Data Elements, Features, Models, or Policies will read from. Registration gives you change history, lineage, and quality checks for free.

What is it?

A DataTable is the starting point of the platform: it is where you register each data source needed for analytics.

For example, if you plan to register and monitor a Model with the following structure, you must first register all tables the Model depends on.

%%{init: {'theme':'base', 'themeVariables': {'edgeLabelBackground':'#ffffff'}}}%%
flowchart TD
    M["Model: Probability of Default"]

    M -->|uses| V1["Variable: Debt Capacity"]
    M -->|uses| V2["Variable: Age of Credit File"]

    V1 -->|uses| DE1["Column: Loan Amount"]
    V1 -->|uses| DE2["Column: Annual Income"]

    V2 -->|uses| DE3["Column: Application Date"]
    V2 -->|uses| DE4["Column: Earliest Credit Line"]

    DE1 & DE2 & DE3 -->|from| T1["Table: Application"]
    DE4 -->|from| T2["Table: Bureau Summary"]

    classDef model    fill:#fffbea,stroke:#d4a017,color:#6b4c00,font-weight:600
    classDef variable fill:#f0faf4,stroke:#34a85a,color:#144d28,font-weight:600
    classDef column   fill:#f3f0ff,stroke:#9b7fe8,color:#3b1f8c
    classDef table    fill:#f3f0ff,stroke:#9b7fe8,color:#3b1f8c

    class M model
    class V1,V2 variable
    class DE1,DE2,DE3,DE4 column
    class T1,T2 table

In this case, both the Application table and the Bureau Summary table must be registered as DataTables before you register the Model.
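
The dependency walk in the diagram can be sketched in plain Python. This is an illustrative model only, not platform code: the `DEPENDS_ON` mapping and the `tables_to_register` helper are hypothetical names that mirror the example diagram.

```python
# Hypothetical dependency graph mirroring the example diagram.
# Keys are objects; values are the objects they read from.
DEPENDS_ON = {
    "Model: Probability of Default": [
        "Variable: Debt Capacity",
        "Variable: Age of Credit File",
    ],
    "Variable: Debt Capacity": ["Column: Loan Amount", "Column: Annual Income"],
    "Variable: Age of Credit File": [
        "Column: Application Date",
        "Column: Earliest Credit Line",
    ],
    "Column: Loan Amount": ["Table: Application"],
    "Column: Annual Income": ["Table: Application"],
    "Column: Application Date": ["Table: Application"],
    "Column: Earliest Credit Line": ["Table: Bureau Summary"],
}


def tables_to_register(root, graph=DEPENDS_ON):
    """Walk the graph from `root` and collect every table it ultimately reads from."""
    tables = set()
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.startswith("Table: "):
            tables.add(node)
        # Nodes with no entry in the graph (tables) have no further dependencies.
        stack.extend(graph.get(node, []))
    return sorted(tables)


required = tables_to_register("Model: Probability of Default")
print(required)
```

Every table the walk returns needs a DataTable entry before the Model can be registered.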

Benefits of a registered DataTable

  • Change history is automatically recorded so every modification to the table is tracked.
  • Lineage shows how the table is consumed across downstream Features, Models, and Policies.
  • Quality Checks can be run on the table to verify data integrity.

How to Register?

The Table Registry maintains records of the location, content, and structure of source data tables used for analytics.

Step 1: Open the Create Form

  1. Open Data Vault → Table Registry from the home page.
  2. Click + New Data Table.
Table Registry page in Corridor with the + New Data Table button highlighted.
Table Registry is the data table catalog. Use + New Data Table to start a new Data Table entry.

A name prompt opens first. Enter a name for the table (for example, Application Table) and confirm to land on the full New Data Table form.

New Data Table form with the name field at the top.
Give the table a descriptive name

Step 2: Fill in Attributes

Field | What to enter
Alias (required) | Short identifier used across the platform (for example, app_table).
Group (required) | Logical grouping for the table, used for display and organising similar use cases.
Primary Table | Tick this if the table is the exhaustive list of unique IDs for an entity (for example, every Application ID). Each entity can have only one primary table.
Input Source (required) | Where the data lives and how the platform reads it. See Input Source below.
Description (required) | A short description of what the table contains.
Attributes section of the New Data Table form with Alias, Group, and Description filled in.
Attributes to capture the table metadata

Input Source

Input Source is where you tell the platform where to read the data from and, optionally, how to preprocess it before any job runs against this table.

Fill in:

  1. Location: the path or connection string (for example, hdfs://..., s3://..., or a JDBC URL). For CSV, you can also upload a file directly instead of providing a path.
  2. Format: Parquet (default), Hive, ORC, CSV, Snowflake, or JDBC.
Input Source selector showing source types in the New Data Table form.
Choose the input source, or paste the input source location.

To preprocess the data after it is read, click Add Logic on the right of the Input Source row. This is optional; see Step 3.

Step 3: (Optional) Add Read Logic

Read Logic lets you define custom Python or PySpark logic to read and preprocess data before it is used in any job.

Common use cases:

  • Type casting specific columns (for example, ID from string to bigint).
  • Transforming column values (string cleanup, normalising case, replacing nulls with defaults).
  • Merging multiple data sources into a unified table (joining application data with bureau pulls).
  • Subsetting to a relevant slice (for example, the last 10 years of data, or a specific product segment).

Inside the editor, location is a pre-defined variable that contains the path you entered in Input Source. Use it directly to read the data. Whatever object you return is the data the platform uses when running any job against this DataTable.

Add Logic editor in the New Data Table form with Clear Logic option.
The Read Logic editor lives next to Input Source. Use Clear Logic to drop back to a plain Location read.

Example 1: Typecast the ID column to bigint

import pyspark
from pyspark.sql import functions as F

# set up spark session
spark = pyspark.sql.SparkSession.builder.getOrCreate()

# read in the original data
# `location` is a pre-defined variable that contains the data location provided in Input Source
original_data = spark.read.parquet(location)

# type cast the ID column from string to bigint
type_casted_data = original_data.withColumn('ID', F.col('ID').cast('bigint'))

return type_casted_data

Example 2: Merge an applications file with a bureau pull and keep only the last 10 years

import pyspark
from pyspark.sql import functions as F

spark = pyspark.sql.SparkSession.builder.getOrCreate()

# `location` points to the applications dataset configured in Input Source
applications = spark.read.parquet(location)

# read the bureau pull from its own absolute path
bureau = spark.read.parquet('s3://risk-data-prod/bureau/bureau_summary/')

# join on application_id, then keep only the last 10 years of applications
merged = applications.join(bureau, on='application_id', how='left')
recent = merged.where(F.col('application_date') >= F.add_months(F.current_date(), -120))

return recent

If you leave Read Logic empty, the platform reads the source directly using the format you selected.
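
One way to picture how Read Logic and the plain read fit together is the sketch below. It is a mental model under stated assumptions, not the platform's implementation: `load_table`, `default_read`, `example_read_logic`, and the `s3://example-bucket/...` path are all hypothetical, and the stand-ins return strings where the real platform would return data.

```python
from typing import Callable, Optional


def default_read(location: str, fmt: str) -> str:
    """Hypothetical stand-in for a plain read of `location` in the selected format."""
    return f"read {location} as {fmt}"


def load_table(
    location: str,
    fmt: str,
    read_logic: Optional[Callable[[str], object]] = None,
) -> object:
    """Return the data a job would see for this DataTable.

    Assumption: the Read Logic body behaves like a function that receives
    `location` and whose return value replaces the plain read.
    """
    if read_logic is None:
        # No Read Logic: fall back to reading the source with the selected format.
        return default_read(location, fmt)
    # Read Logic present: whatever it returns is what jobs run against.
    return read_logic(location)


def example_read_logic(location: str) -> str:
    """Hypothetical Read Logic body, wrapped as a function for illustration."""
    return f"custom read of {location}"


plain = load_table("s3://example-bucket/applications/", "parquet")
custom = load_table("s3://example-bucket/applications/", "parquet", example_read_logic)
print(plain)
print(custom)
```

Under this reading, the Read Logic examples above act like the body of such a function, which would explain why they use `location` without defining it and end with a bare `return`.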

Step 4: Click Create

Click Create at the bottom right. The DataTable is saved and you land on its details page, which shows the Details tab and a Change History tab.

Create button at the bottom right of the New Data Table form.

Step 5: Fetch Columns

After creation, fetch the table's columns from the source. The columns appear in the Columns section at the bottom of the Details tab.

Fetch Columns action on the DataTable details page.
Fetch Columns inspects the source (or runs your Read Logic) and pulls the schema into the registry.

For each fetched column, configure:

Field | What to enter
Alias | Column name (auto-populated from the source).
Type | Detected data type, such as string, double, or bigint.
Key | For entity identifiers, pick the matching key from the dropdown. Keys are configured under Settings.
Constraint | Primary for the entity's primary identifier (only available on primary tables), Unique for columns whose values must not repeat, or None.
Description | A short note describing the column. Worth filling in, since it makes the table much easier to use later.
Columns section populated after fetching the schema from the source.
Each column is now available in the registry
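
To make the Constraint choices concrete, the sketch below checks a column's values against each option. It is an illustration, not the platform's validation logic: `check_constraint` and the sample columns are hypothetical, and treating Primary as "no repeats and no missing values" is an assumption.

```python
def check_constraint(values, constraint):
    """Return True if `values` satisfy the given constraint.

    Assumptions for illustration only:
    - "Unique": non-missing values must not repeat.
    - "Primary": values must not repeat and none may be missing.
    - "None": no restriction.
    """
    present = [v for v in values if v is not None]
    no_repeats = len(present) == len(set(present))
    if constraint == "None":
        return True
    if constraint == "Unique":
        return no_repeats
    if constraint == "Primary":
        return no_repeats and len(present) == len(values)
    raise ValueError(f"unknown constraint: {constraint}")


application_ids = [101, 102, 103]       # hypothetical sample column
emails = ["a@x.com", None, "a@x.com"]   # one repeated value, one missing value

print(check_constraint(application_ids, "Primary"))
print(check_constraint(emails, "Unique"))
print(check_constraint(emails, "None"))
```

A check like this can help decide which constraint fits a column before you set it in the registry.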

Once columns are reviewed, the DataTable appears in Table Registry and is ready to use downstream.

How can it be used?

The Table Registry page displays a list of registered DataTables, serving as a data catalog of all available data sources.

Once a DataTable is registered, you can:

  • Navigate to Data Vault → Quality Profile to compute basic summary statistics, such as the minimum, maximum, and number of missing values for any column.
  • Begin registering individual columns as DataElements so they can be used in Feature creation, the Model Registry, and other parts of the platform.
  • Run jobs against any registered object. By default, the platform retrieves data from the location specified in the DataTable.
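
The summary statistics mentioned for Quality Profile can be sketched in plain Python. This illustrates the statistics themselves, not how Quality Profile computes them; `profile_column` and the sample values are hypothetical.

```python
def profile_column(values):
    """Compute minimum, maximum, and missing-value count for one column.

    Missing values are represented as None in this sketch.
    """
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "missing": len(values) - len(present),
    }


loan_amounts = [12000, None, 5000, 30000, None]  # hypothetical sample column
stats = profile_column(loan_amounts)
print(stats)
```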

What's Next

  • Register a data element to expose columns from this table for use across the platform.
  • See where tables fit in the broader flow in Start with a Model.