Register four query-time pseudonymisation rules on a retail-bank KYC table, redact SSN, mask phone, keyed-hash last name, generalize date of birth, and prove that COUNT/SUM/AVG still read the underlying clean values.
-- ============================================================================
-- SETUP: Zone, schema, Delta table, and 4 pseudonymisation rules
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
TYPE EXTERNAL
COMMENT 'Demo zone for pseudonymisation quickstart';
CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
COMMENT 'Schema for banking KYC pseudonymisation demo';
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.bank_customers (
customer_id VARCHAR,
first_name VARCHAR,
last_name VARCHAR,
date_of_birth DATE,
email VARCHAR,
phone VARCHAR,
ssn VARCHAR,
address_line VARCHAR,
city VARCHAR,
state VARCHAR,
zip_code VARCHAR,
account_tier VARCHAR,
balance DOUBLE,
active BOOLEAN
) LOCATION '{{data_path}}/bank_customers';
GRANT ADMIN ON TABLE {{zone_name}}.pseudonymisation_demos.bank_customers TO USER {{current_user}};
-- 1. Redact SSN, fully hidden
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (ssn)
TRANSFORM redact
PARAMS (mask = '***-**-****');
-- 2. Mask phone, show last 5 characters
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (phone)
TRANSFORM mask
PARAMS (show = 5);
-- 3. Keyed-hash last name, deterministic pseudonym for linkage
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (last_name)
TRANSFORM keyed_hash
SCOPE person
PARAMS (salt = 'bank_name_salt_2024');
-- 4. Generalize date of birth, bucket to decade
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers (date_of_birth)
TRANSFORM generalize
SCOPE relationship
PARAMS (range = 10);
-- ============================================================================
-- QUERIES: Inspect rules, read pseudonymised data, aggregate
-- ============================================================================
-- All 4 rules visible in the catalog
ASSERT ROW_COUNT = 4
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.bank_customers;
-- Protected SELECT: SSN is literally '***-**-****' for every row
ASSERT ROW_COUNT = 6
ASSERT VALUE ssn_redacted = '***-**-****' WHERE customer_id = 'C001'
ASSERT VALUE account_tier = 'Premium' WHERE customer_id = 'C001'
SELECT customer_id, first_name,
last_name AS last_name_hashed,
date_of_birth AS dob_generalized,
phone AS phone_masked,
ssn AS ssn_redacted,
account_tier, balance, active
FROM {{zone_name}}.pseudonymisation_demos.bank_customers;
-- Aggregations untouched, balance totals reflect stored bytes
ASSERT ROW_COUNT = 2
ASSERT VALUE customer_count = 3 WHERE account_tier = 'Premium'
ASSERT VALUE customer_count = 3 WHERE account_tier = 'Standard'
SELECT account_tier,
COUNT(*) AS customer_count,
ROUND(AVG(balance), 2) AS avg_balance,
ROUND(SUM(balance), 2) AS total_balance
FROM {{zone_name}}.pseudonymisation_demos.bank_customers
GROUP BY account_tier
ORDER BY account_tier;
-- ============================================================================
-- CLEANUP: Rules drop per-table when the column pattern is omitted
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.bank_customers;
DROP TABLE IF EXISTS {{zone_name}}.pseudonymisation_demos.bank_customers;
DROP SCHEMA IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE IF EXISTS {{zone_name}};
## When to Use Use this demo when onboarding analysts or auditors to a KYC / retail-banking dataset where the raw PII (SSN, full phone, full name, exact date of birth) must never reach the screen, but downstream aggregations, balance per tier, active-customer counts, cohort averages, must remain exact. It is the shortest path from "I have a Delta table with PII" to "my analysts see safe values while my numbers stay correct", using only four CREATE PSEUDONYMISATION RULE statements and no data rewrites. ## What You Will Learn 1. How CREATE PSEUDONYMISATION RULE attaches a runtime transform to a Delta table without touching a single row on disk [ref: CREATE_PSEUDONYM_RULE] 2. How the four core transforms differ in intent: `redact` for totally hidden fields, `mask` for partial disclosure, `keyed_hash` for deterministic pseudonyms that still support joins and GROUP BY, and `generalize` for k-anonymity-style bucketing 3. Why SCOPE matters: `PERSON` gives the same pseudonym everywhere for the same individual (enabling cross-table linkage), `RELATIONSHIP` is the default and scopes to related records, `TRANSACTION` scopes only to the current query 4. Why COUNT, SUM, and AVG over `balance` keep producing correct, byte-exact values: pseudonymisation rewrites the TableProvider's display path, not the stored bytes 5. How SHOW PSEUDONYMISATION RULES FOR `<table>` returns exactly the active rule set for a single table so you can audit coverage ## Prerequisites - DeltaForge GUI or CLI session connected to a workspace with CREATE ZONE / CREATE SCHEMA / CREATE DELTA TABLE privileges - `{{data_path}}` resolvable to a writable location (the demo creates `{{data_path}}/bank_customers`) - `{{current_user}}` has (or will receive) ADMIN on the created table - A Key Vault / KeyStore backing the `keyed_hash` salt, in the default compute-node deployment this is the universal Key Vault backed by the OS credential store; in standalone/test mode the salt is held in session memory only