Pseudonymisation Lifecycle: Insurance Claims

Insurance-claims walkthrough of the full pseudonymisation rule lifecycle: CREATE five rules with PRIORITY, SHOW them, SELECT through them, DROP a single rule, and verify that surviving rules still protect PII after the catalog sync.

Category: pseudonymisation

Syntax

-- ============================================================================
-- SETUP: Insurance claims + 5 pseudonymisation rules with PRIORITY
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
    TYPE EXTERNAL
    COMMENT 'External tables, demo datasets and file-backed data';

CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
    COMMENT 'Insurance claims with protection rules';

CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.insurance_claims (
    claim_id         VARCHAR,
    policy_holder_id VARCHAR,
    claimant_name    VARCHAR,
    date_of_birth    VARCHAR,
    ssn              VARCHAR,
    claim_type       VARCHAR,
    description      VARCHAR,
    amount           DOUBLE,
    status           VARCHAR,
    filed_date       VARCHAR
) LOCATION '{{data_path}}/insurance_claims';

GRANT ADMIN ON TABLE {{zone_name}}.pseudonymisation_demos.insurance_claims TO USER {{current_user}};

-- Five rules, different PRIORITY levels
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (ssn)
    TRANSFORM redact  PRIORITY 20
    PARAMS (mask = '***-**-****');

CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (claimant_name)
    TRANSFORM keyed_hash  SCOPE person  PRIORITY 10
    PARAMS (salt = 'insurance_name_salt');

CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (date_of_birth)
    TRANSFORM generalize  SCOPE relationship
    PARAMS (range = 10);

CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (policy_holder_id)
    TRANSFORM tokenize  SCOPE person  PRIORITY 5;

CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (description)
    TRANSFORM mask
    PARAMS (show = 10);

-- ============================================================================
-- QUERIES: Lifecycle: list, read, aggregate, drop one, re-verify
-- ============================================================================
-- 1. Enumerate the five rules
ASSERT ROW_COUNT = 5
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.insurance_claims;

-- 2. Protected SELECT across all 5 columns
ASSERT ROW_COUNT = 5
ASSERT VALUE claim_type   = 'Auto'        WHERE claim_id = 'CLM-2024-001'
ASSERT VALUE ssn_redacted = '***-**-****' WHERE claim_id = 'CLM-2024-001'
SELECT claim_id,
       policy_holder_id AS holder_token,
       claimant_name    AS name_hash,
       date_of_birth    AS dob_generalized,
       ssn              AS ssn_redacted,
       claim_type,
       description      AS desc_masked,
       amount, status
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims;

-- 3. Aggregations untouched by pseudonymisation
ASSERT ROW_COUNT = 3
ASSERT VALUE claim_count  = 2       WHERE claim_type = 'Auto'
ASSERT VALUE total_amount = 7700.00 WHERE claim_type = 'Auto'
SELECT claim_type, COUNT(*) AS claim_count, SUM(amount) AS total_amount
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims
GROUP BY claim_type;

-- 4. Selective DROP, remove the description mask only
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims (description);

-- 5. Re-verify, catalog sync is asynchronous, so <= 5
ASSERT ROW_COUNT <= 5
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.insurance_claims;

-- SSN still redacted after the DROP
ASSERT VALUE ssn = '***-**-****' WHERE claim_id = 'CLM-2024-001'
SELECT claim_id, ssn, claim_type, amount, status
FROM {{zone_name}}.pseudonymisation_demos.insurance_claims;

-- ============================================================================
-- CLEANUP
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.insurance_claims;
DROP TABLE   IF EXISTS {{zone_name}}.pseudonymisation_demos.insurance_claims;
DROP SCHEMA  IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE    IF EXISTS {{zone_name}};

Description

## When to Use Use this demo when your compliance or data-protection team needs to see the day-two operational story for pseudonymisation, not just creating rules, but reviewing, selectively dropping, and auditing coverage over time. The scenario is an insurance claims pipeline: routine analytics needs five PII columns protected (SSN, claimant name, date of birth, policy-holder id, free-text description), but a policy review later determines that the `description` column no longer needs masking. This demo exercises exactly that motion: DROP one rule, leave the other four intact, confirm the result with SHOW. It is also the canonical demo for understanding rule PRIORITY: five rules on the same table with priorities 0, 5, 10, 20, the highest priority wins when more than one rule matches a column. ## What You Will Learn 1. How to register five heterogeneous rules on one table (`redact`, `keyed_hash`, `generalize`, `tokenize`, `mask`) covering all the common PII shapes a claims team sees [ref: CREATE_PSEUDONYM_RULE] 2. How PRIORITY resolves overlapping column patterns: higher priority wins, and ties break by creation order 3. How SHOW PSEUDONYMISATION RULES FOR `<table>` becomes the audit surface, call it before, after DROP, and again post-cleanup to prove rule count [ref: SHOW_PSEUDONYM_RULES] 4. How DROP PSEUDONYMISATION RULE ON `<table>` `(column)` removes exactly one rule, leaves the others untouched, and propagates through the catalog sync so subsequent SELECT sees the un-masked column immediately 5. Why the post-DROP ASSERT uses `ROW_COUNT <= 5` rather than `= 4`: catalog sync is asynchronous, so the harness tolerates either the pre- or post-sync state [ref: DROP_PSEUDONYM_RULE] ## Prerequisites - Workspace with CREATE / DROP authority on zones, schemas, Delta tables, and pseudonymisation rules - `{{data_path}}/insurance_claims` reachable and writable - KeyStore configured for the two salts used (`insurance_name_salt`), test environments may keep these in session memory - Understanding that `tokenize` emits opaque `TOK_...` strings that are only re-identifiable by the tokenisation service, unlike `keyed_hash` which a holder of the salt can re-derive

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →