Insurance-claims walkthrough of the full pseudonymisation rule lifecycle: CREATE five rules with PRIORITY, SHOW them, SELECT through them, DROP a single rule, and verify that surviving rules still protect PII after the catalog sync.
-- ============================================================================
-- SETUP: Insurance claims + 5 pseudonymisation rules with PRIORITY
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
TYPE EXTERNAL
COMMENT 'External tables, demo datasets and file-backed data';
CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
COMMENT 'Insurance claims with protection rules';
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.insurance_claims (
claim_id VARCHAR,
policy_holder_id VARCHAR,
claimant_name VARCHAR,
date_of_birth VARCHAR,
ssn VARCHAR,
claim_type VARCHAR,
description VARCHAR,
amount DOUBLE,
status VARCHAR,
filed_date VARCHAR
) LOCATION '{{data_path}}/insurance_claims';
GRANT ADMIN ON TABLE {{zone_name}}.pseudonymisation_demos.insurance_claims TO USER {{current_user}};
-- Five rules, different PRIORITY levels
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (ssn)
TRANSFORM redact PRIORITY 20
PARAMS (mask = '***-**-****');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (claimant_name)
TRANSFORM keyed_hash SCOPE person PRIORITY 10
PARAMS (salt = 'insurance_name_salt');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (date_of_birth)
TRANSFORM generalize SCOPE relationship
PARAMS (range = 10);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (policy_holder_id)
TRANSFORM tokenize SCOPE person PRIORITY 5;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (description)
TRANSFORM mask
PARAMS (show = 10);
-- ============================================================================
-- QUERIES: Lifecycle: list, read, aggregate, drop one, re-verify
-- ============================================================================
-- 1. Enumerate the five rules
ASSERT ROW_COUNT = 5
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.insurance_claims;
-- 2. Protected SELECT across all 5 columns
ASSERT ROW_COUNT = 5
ASSERT VALUE claim_type = 'Auto' WHERE claim_id = 'CLM-2024-001'
ASSERT VALUE ssn_redacted = '***-**-****' WHERE claim_id = 'CLM-2024-001'
SELECT claim_id,
policy_holder_id AS holder_token,
claimant_name AS name_hash,
date_of_birth AS dob_generalized,
ssn AS ssn_redacted,
claim_type,
description AS desc_masked,
amount, status
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims;
-- 3. Aggregations untouched by pseudonymisation
ASSERT ROW_COUNT = 3
ASSERT VALUE claim_count = 2 WHERE claim_type = 'Auto'
ASSERT VALUE total_amount = 7700.00 WHERE claim_type = 'Auto'
SELECT claim_type, COUNT(*) AS claim_count, SUM(amount) AS total_amount
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims
GROUP BY claim_type;
-- 4. Selective DROP, remove the description mask only
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (description);
-- 5. Re-verify, catalog sync is asynchronous, so <= 5
ASSERT ROW_COUNT <= 5
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.insurance_claims;
-- SSN still redacted after the DROP
ASSERT VALUE ssn = '***-**-****' WHERE claim_id = 'CLM-2024-001'
SELECT claim_id, ssn, claim_type, amount, status
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims;
-- ============================================================================
-- CLEANUP
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims;
DROP TABLE IF EXISTS {{zone_name}}.pseudonymisation_demos.insurance_claims;
DROP SCHEMA IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE IF EXISTS {{zone_name}};
## When to Use Use this demo when your compliance or data-protection team needs to see the day-two operational story for pseudonymisation, not just creating rules, but reviewing, selectively dropping, and auditing coverage over time. The scenario is an insurance claims pipeline: routine analytics needs five PII columns protected (SSN, claimant name, date of birth, policy-holder id, free-text description), but a policy review later determines that the `description` column no longer needs masking. This demo exercises exactly that motion: DROP one rule, leave the other four intact, confirm the result with SHOW. It is also the canonical demo for understanding rule PRIORITY: five rules on the same table with priorities 0, 5, 10, 20, the highest priority wins when more than one rule matches a column. ## What You Will Learn 1. How to register five heterogeneous rules on one table (`redact`, `keyed_hash`, `generalize`, `tokenize`, `mask`) covering all the common PII shapes a claims team sees [ref: CREATE_PSEUDONYM_RULE] 2. How PRIORITY resolves overlapping column patterns: higher priority wins, and ties break by creation order 3. How SHOW PSEUDONYMISATION RULES FOR `<table>` becomes the audit surface, call it before, after DROP, and again post-cleanup to prove rule count [ref: SHOW_PSEUDONYM_RULES] 4. How DROP PSEUDONYMISATION RULE ON `<table>` `(column)` removes exactly one rule, leaves the others untouched, and propagates through the catalog sync so subsequent SELECT sees the un-masked column immediately 5. Why the post-DROP ASSERT uses `ROW_COUNT <= 5` rather than `= 4`: catalog sync is asynchronous, so the harness tolerates either the pre- or post-sync state [ref: DROP_PSEUDONYM_RULE] ## Prerequisites - Workspace with CREATE / DROP authority on zones, schemas, Delta tables, and pseudonymisation rules - `{{data_path}}/insurance_claims` reachable and writable - KeyStore configured for the two salts used (`insurance_name_salt`), test environments may keep these in session memory - Understanding that `tokenize` emits opaque `TOK_...` strings that are only re-identifiable by the tokenisation service, unlike `keyed_hash` which a holder of the salt can re-derive