Exercise all seven transform types and all three linkability scopes across three healthcare formats: HL7 v2 ADT, FHIR R4 Patient, EDI HIPAA X12, including wildcard column patterns (`address_*`, `*_name`) that protect many columns with a single rule.
-- ============================================================================
-- SETUP: Three tables, 22 rules total (6 HL7 + 8 FHIR + 8 EDI)
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
TYPE EXTERNAL
COMMENT 'External tables, demo datasets and file-backed data';
CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
COMMENT 'Healthcare data with protection rules';
-- HL7 v2 ADT, materialised PID/PV1/EVN fields
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.hl7_patients (
df_message_id VARCHAR,
pid_3 VARCHAR, pid_5 VARCHAR, pid_7 VARCHAR, pid_8 VARCHAR,
pid_11 VARCHAR, pid_13 VARCHAR, pid_19 VARCHAR,
pv1_2 VARCHAR, pv1_3 VARCHAR, pv1_7 VARCHAR,
evn_1 VARCHAR, status VARCHAR
) LOCATION '{{data_path}}/hl7_patients';
-- FHIR R4 Patient, flattened
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.fhir_patients (
patient_id VARCHAR, family_name VARCHAR, given_name VARCHAR,
birth_date DATE, gender VARCHAR,
email VARCHAR, phone VARCHAR,
address_line VARCHAR, address_city VARCHAR,
address_state VARCHAR, address_postal VARCHAR,
mrn VARCHAR, ssn VARCHAR, marital_status VARCHAR, active BOOLEAN
) LOCATION '{{data_path}}/fhir_patients';
-- EDI HIPAA X12, materialised NM1/CLM/BPR/DMG
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.edi_claims (
df_transaction_id VARCHAR,
st_1 VARCHAR, bht_2 VARCHAR,
nm1_1 VARCHAR, nm1_3 VARCHAR, nm1_4 VARCHAR, nm1_8 VARCHAR,
dmg_1 VARCHAR, dmg_2 VARCHAR,
clm_1 VARCHAR, clm_2 DOUBLE,
bpr_1 VARCHAR, bpr_2 DOUBLE, bpr_8 VARCHAR, bpr_14 VARCHAR
) LOCATION '{{data_path}}/edi_claims';
-- HL7 rules (6): PID-3 keyed_hash, PID-5 tokenize, PID-7 generalize,
-- PID-11 hash, PID-13 mask, PID-19 redact
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_3)
TRANSFORM keyed_hash SCOPE person PRIORITY 10 PARAMS (salt = 'hl7_mrn_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_19)
TRANSFORM redact PRIORITY 20 PARAMS (mask = '***-**-****');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_7)
TRANSFORM generalize SCOPE relationship PARAMS (range = 10000);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_13)
TRANSFORM mask PARAMS (show = 5);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_11)
TRANSFORM hash;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_5)
TRANSFORM tokenize SCOPE person PRIORITY 5;
-- FHIR rules (8): incl. wildcard address_* and *_name, AES256 on email
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (email)
TRANSFORM encrypt SCOPE person PRIORITY 10 PARAMS (algorithm = 'AES256');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (patient_id)
TRANSFORM tokenize SCOPE person PRIORITY 10;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (ssn)
TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'fhir_ssn_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (phone)
TRANSFORM mask PRIORITY 5 PARAMS (show = 4);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (mrn)
TRANSFORM redact PARAMS (mask = '[REDACTED]');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (address_*)
TRANSFORM hash PRIORITY 1;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (birth_date)
TRANSFORM generalize SCOPE relationship PARAMS (range = 10);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (*_name)
TRANSFORM keyed_hash SCOPE person PRIORITY 3 PARAMS (salt = 'fhir_name_salt_2024');
-- EDI rules (8): member id, bank accounts, claim number, charges, DOB
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_8)
TRANSFORM keyed_hash SCOPE person PRIORITY 20 PARAMS (salt = 'edi_member_id_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (bpr_8)
TRANSFORM redact PRIORITY 20 PARAMS (mask = '**********');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (bpr_14)
TRANSFORM redact PRIORITY 20 PARAMS (mask = '**********');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (clm_1)
TRANSFORM tokenize SCOPE transaction PRIORITY 10;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (clm_2)
TRANSFORM mask PARAMS (show = 2);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_3)
TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'edi_name_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_4)
TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'edi_name_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (dmg_1)
TRANSFORM generalize SCOPE relationship PARAMS (range = 10000);
-- ============================================================================
-- QUERIES: Per-table audit, protected reads, wildcard verification, drop one
-- ============================================================================
ASSERT ROW_COUNT = 6
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.hl7_patients;
ASSERT ROW_COUNT = 8
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.fhir_patients;
ASSERT ROW_COUNT = 8
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.edi_claims;
-- HL7 protected SELECT
ASSERT ROW_COUNT = 4
ASSERT VALUE pv1_2 = 'I' WHERE df_message_id = 'MSG001'
ASSERT VALUE status = 'Active' WHERE df_message_id = 'MSG001'
ASSERT VALUE physician = 'DR JONES' WHERE df_message_id = 'MSG001'
SELECT df_message_id,
pid_3 AS mrn_hash, pid_5 AS name_token,
pid_7 AS dob_generalized, pid_13 AS phone_masked,
pid_19 AS ssn_redacted, pv1_2, pv1_7 AS physician, status
FROM {{zone_name}}.pseudonymisation_demos.hl7_patients;
-- FHIR wildcard proof, all four address_* columns are hashed
ASSERT ROW_COUNT = 4
SELECT patient_id AS id_token,
address_line AS addr_hash,
address_city AS city_hash,
address_state AS state_hash,
address_postal AS zip_hash
FROM {{zone_name}}.pseudonymisation_demos.fhir_patients;
-- EDI 837 claims with scope=transaction on clm_1
ASSERT ROW_COUNT = 3
ASSERT VALUE txn_type = '837' WHERE df_transaction_id = 'TXN-837-001'
ASSERT VALUE bank_acct_redacted = '**********' WHERE df_transaction_id = 'TXN-837-001'
SELECT df_transaction_id, st_1 AS txn_type,
nm1_3 AS name_hash, nm1_4 AS first_hash, nm1_8 AS member_id_hash,
clm_1 AS acct_token, clm_2 AS amount_masked,
bpr_8 AS bank_acct_redacted, bpr_14 AS bank_acct2_redacted
FROM {{zone_name}}.pseudonymisation_demos.edi_claims
WHERE st_1 = '837';
-- Drop one, phone masking no longer required on HL7
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_13);
ASSERT ROW_COUNT <= 6
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.hl7_patients;
-- ============================================================================
-- CLEANUP
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients;
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients;
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims;
DROP TABLE IF EXISTS {{zone_name}}.pseudonymisation_demos.edi_claims;
DROP TABLE IF EXISTS {{zone_name}}.pseudonymisation_demos.fhir_patients;
DROP TABLE IF EXISTS {{zone_name}}.pseudonymisation_demos.hl7_patients;
DROP SCHEMA IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE IF EXISTS {{zone_name}};
## When to Use Use this demo when standing up pseudonymisation across a mixed healthcare lake: HL7 messages from the EHR, FHIR resources from the integration layer, and EDI X12 transactions from the payer side. It is the most complete pseudonymisation surface in the demo suite: 22 rules across three tables, all 7 built-in transforms (`keyed_hash`, `encrypt`, `redact`, `generalize`, `tokenize`, `mask`, `hash`), and all 3 SCOPEs (`PERSON`, `RELATIONSHIP`, `TRANSACTION`). It is also the demo where wildcard patterns become load-bearing: `address_*` protects four FHIR address columns with one rule, and `*_name` covers both `family_name` and `given_name`. It is the right demo to reference when mapping HIPAA Safe Harbor's 18 identifier categories and GDPR Article 4(5) pseudonymisation to concrete CREATE PSEUDONYMISATION RULE statements on realistic column layouts. ## What You Will Learn 1. How to use all seven transforms against appropriate PII: `keyed_hash` for MRN/member-id, `encrypt` (AES256) for email that must be reversible, `redact` for SSN and bank accounts, `generalize` for DOB, `tokenize` for patient_id, `mask` for phone, `hash` for unsalted fingerprints [ref: CREATE_PSEUDONYM_RULE] 2. How wildcard patterns work: `address_*` matches `address_line`, `address_city`, `address_state`, `address_postal` in one rule; `*_name` matches both `family_name` and `given_name` 3. How SCOPE `TRANSACTION` on `clm_1` (patient account number) ensures the pseudonym is only stable within a single claim-processing query, preventing cross-query correlation by the external party 4. How PRIORITY resolves wildcard versus exact-match collisions on the FHIR table (e.g. an explicit `ssn` rule at priority 0 coexisting with a wildcard `*_name` rule at priority 3) 5. How the same data is SHOW-able per table (`FOR <table>`), letting you audit HL7 / FHIR / EDI rule coverage independently before an external handoff [ref: SHOW_PSEUDONYM_RULES] ## Prerequisites - Workspace with authority to CREATE ZONE, CREATE SCHEMA, CREATE DELTA TABLE, and register pseudonymisation rules on all three tables - `{{data_path}}` writable, the demo creates `hl7_patients`, `fhir_patients`, `edi_claims` subdirectories - KeyStore / Key Vault configured for the salts (`hl7_mrn_salt_2024`, `fhir_ssn_salt_2024`, `fhir_name_salt_2024`, `edi_member_id_salt_2024`, `edi_name_salt_2024`) and for the AES256 encryption key used by the FHIR `email` rule - Familiarity with HL7 v2 PID field positions (PID-3 MRN, PID-5 name, PID-7 DOB, PID-11 address, PID-13 phone, PID-19 SSN), FHIR R4 Patient resource shape, and EDI X12 segment ids (NM1, CLM, BPR, DMG)