Pseudonymisation Quickstart: Banking KYC

Register four query-time pseudonymisation rules on a retail-bank KYC table, redact SSN, mask phone, keyed-hash last name, generalize date of birth, and prove that COUNT/SUM/AVG still read the underlying clean values.

Category: pseudonymisation

Syntax

-- ============================================================================
-- SETUP: Zone, schema, Delta table, and 4 pseudonymisation rules
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
    TYPE EXTERNAL
    COMMENT 'Demo zone for pseudonymisation quickstart';

CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
    COMMENT 'Schema for banking KYC pseudonymisation demo';

CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.bank_customers (
    customer_id    VARCHAR,
    first_name     VARCHAR,
    last_name      VARCHAR,
    date_of_birth  DATE,
    email          VARCHAR,
    phone          VARCHAR,
    ssn            VARCHAR,
    address_line   VARCHAR,
    city           VARCHAR,
    state          VARCHAR,
    zip_code       VARCHAR,
    account_tier   VARCHAR,
    balance        DOUBLE,
    active         BOOLEAN
) LOCATION '{{data_path}}/bank_customers';

GRANT ADMIN ON TABLE {{zone_name}}.pseudonymisation_demos.bank_customers TO USER {{current_user}};

-- 1. Redact SSN, fully hidden
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (ssn)
    TRANSFORM redact
    PARAMS (mask = '***-**-****');

-- 2. Mask phone, show last 5 characters
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (phone)
    TRANSFORM mask
    PARAMS (show = 5);

-- 3. Keyed-hash last name, deterministic pseudonym for linkage
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (last_name)
    TRANSFORM keyed_hash
    SCOPE person
    PARAMS (salt = 'bank_name_salt_2024');

-- 4. Generalize date of birth, bucket to decade
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (date_of_birth)
    TRANSFORM generalize
    SCOPE relationship
    PARAMS (range = 10);

-- ============================================================================
-- QUERIES: Inspect rules, read pseudonymised data, aggregate
-- ============================================================================
-- All 4 rules visible in the catalog
ASSERT ROW_COUNT = 4
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.bank_customers;

-- Protected SELECT: SSN is literally '***-**-****' for every row
ASSERT ROW_COUNT = 6
ASSERT VALUE ssn_redacted = '***-**-****' WHERE customer_id = 'C001'
ASSERT VALUE account_tier = 'Premium' WHERE customer_id = 'C001'
SELECT customer_id, first_name,
       last_name     AS last_name_hashed,
       date_of_birth AS dob_generalized,
       phone         AS phone_masked,
       ssn           AS ssn_redacted,
       account_tier, balance, active
FROM {{zone_name}}.pseudonymisation_demos.bank_customers;

-- Aggregations untouched, balance totals reflect stored bytes
ASSERT ROW_COUNT = 2
ASSERT VALUE customer_count = 3 WHERE account_tier = 'Premium'
ASSERT VALUE customer_count = 3 WHERE account_tier = 'Standard'
SELECT account_tier,
       COUNT(*)               AS customer_count,
       ROUND(AVG(balance), 2) AS avg_balance,
       ROUND(SUM(balance), 2) AS total_balance
FROM {{zone_name}}.pseudonymisation_demos.bank_customers
GROUP BY account_tier
ORDER BY account_tier;

-- ============================================================================
-- CLEANUP: Rules drop per-table when the column pattern is omitted
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers;
DROP TABLE   IF EXISTS {{zone_name}}.pseudonymisation_demos.bank_customers;
DROP SCHEMA  IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE    IF EXISTS {{zone_name}};

Description

## When to Use Use this demo when onboarding analysts or auditors to a KYC / retail-banking dataset where the raw PII (SSN, full phone, full name, exact date of birth) must never reach the screen, but downstream aggregations, balance per tier, active-customer counts, cohort averages, must remain exact. It is the shortest path from "I have a Delta table with PII" to "my analysts see safe values while my numbers stay correct", using only four CREATE PSEUDONYMISATION RULE statements and no data rewrites. ## What You Will Learn 1. How CREATE PSEUDONYMISATION RULE attaches a runtime transform to a Delta table without touching a single row on disk [ref: CREATE_PSEUDONYM_RULE] 2. How the four core transforms differ in intent: `redact` for totally hidden fields, `mask` for partial disclosure, `keyed_hash` for deterministic pseudonyms that still support joins and GROUP BY, and `generalize` for k-anonymity-style bucketing 3. Why SCOPE matters: `PERSON` gives the same pseudonym everywhere for the same individual (enabling cross-table linkage), `RELATIONSHIP` is the default and scopes to related records, `TRANSACTION` scopes only to the current query 4. Why COUNT, SUM, and AVG over `balance` keep producing correct, byte-exact values: pseudonymisation rewrites the TableProvider's display path, not the stored bytes 5. How SHOW PSEUDONYMISATION RULES FOR `<table>` returns exactly the active rule set for a single table so you can audit coverage ## Prerequisites - DeltaForge GUI or CLI session connected to a workspace with CREATE ZONE / CREATE SCHEMA / CREATE DELTA TABLE privileges - `{{data_path}}` resolvable to a writable location (the demo creates `{{data_path}}/bank_customers`) - `{{current_user}}` has (or will receive) ADMIN on the created table - A Key Vault / KeyStore backing the `keyed_hash` salt, in the default compute-node deployment this is the universal Key Vault backed by the OS credential store; in standalone/test mode the salt is held in session memory only

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →