Pseudonymisation Healthcare: HL7, FHIR, and EDI De-identification

Exercise all seven transform types and all three linkability scopes across three healthcare formats: HL7 v2 ADT, FHIR R4 Patient, EDI HIPAA X12, including wildcard column patterns (`address_*`, `*_name`) that protect many columns with a single rule.

Category: pseudonymisation

Syntax

-- ============================================================================
-- SETUP: Three tables, 22 rules total (6 HL7 + 8 FHIR + 8 EDI)
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}}
    TYPE EXTERNAL
    COMMENT 'External tables, demo datasets and file-backed data';

CREATE SCHEMA IF NOT EXISTS {{zone_name}}.pseudonymisation_demos
    COMMENT 'Healthcare data with protection rules';

-- HL7 v2 ADT, materialised PID/PV1/EVN fields
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.hl7_patients (
    df_message_id VARCHAR,
    pid_3  VARCHAR, pid_5  VARCHAR, pid_7 VARCHAR, pid_8 VARCHAR,
    pid_11 VARCHAR, pid_13 VARCHAR, pid_19 VARCHAR,
    pv1_2 VARCHAR, pv1_3 VARCHAR, pv1_7 VARCHAR,
    evn_1 VARCHAR, status VARCHAR
) LOCATION '{{data_path}}/hl7_patients';

-- FHIR R4 Patient, flattened
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.fhir_patients (
    patient_id VARCHAR, family_name VARCHAR, given_name VARCHAR,
    birth_date DATE, gender VARCHAR,
    email VARCHAR, phone VARCHAR,
    address_line VARCHAR, address_city VARCHAR,
    address_state VARCHAR, address_postal VARCHAR,
    mrn VARCHAR, ssn VARCHAR, marital_status VARCHAR, active BOOLEAN
) LOCATION '{{data_path}}/fhir_patients';

-- EDI HIPAA X12, materialised NM1/CLM/BPR/DMG
CREATE DELTA TABLE IF NOT EXISTS {{zone_name}}.pseudonymisation_demos.edi_claims (
    df_transaction_id VARCHAR,
    st_1 VARCHAR, bht_2 VARCHAR,
    nm1_1 VARCHAR, nm1_3 VARCHAR, nm1_4 VARCHAR, nm1_8 VARCHAR,
    dmg_1 VARCHAR, dmg_2 VARCHAR,
    clm_1 VARCHAR, clm_2 DOUBLE,
    bpr_1 VARCHAR, bpr_2 DOUBLE, bpr_8 VARCHAR, bpr_14 VARCHAR
) LOCATION '{{data_path}}/edi_claims';

-- HL7 rules (6): PID-3 keyed_hash, PID-5 tokenize, PID-7 generalize,
--                PID-11 hash, PID-13 mask, PID-19 redact
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_3)
    TRANSFORM keyed_hash SCOPE person PRIORITY 10 PARAMS (salt = 'hl7_mrn_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_19)
    TRANSFORM redact PRIORITY 20 PARAMS (mask = '***-**-****');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_7)
    TRANSFORM generalize SCOPE relationship PARAMS (range = 10000);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_13)
    TRANSFORM mask PARAMS (show = 5);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_11)
    TRANSFORM hash;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_5)
    TRANSFORM tokenize SCOPE person PRIORITY 5;

-- FHIR rules (8): incl. wildcard address_* and *_name, AES256 on email
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (email)
    TRANSFORM encrypt SCOPE person PRIORITY 10 PARAMS (algorithm = 'AES256');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (patient_id)
    TRANSFORM tokenize SCOPE person PRIORITY 10;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (ssn)
    TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'fhir_ssn_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (phone)
    TRANSFORM mask PRIORITY 5 PARAMS (show = 4);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (mrn)
    TRANSFORM redact PARAMS (mask = '[REDACTED]');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (address_*)
    TRANSFORM hash PRIORITY 1;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (birth_date)
    TRANSFORM generalize SCOPE relationship PARAMS (range = 10);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients (*_name)
    TRANSFORM keyed_hash SCOPE person PRIORITY 3 PARAMS (salt = 'fhir_name_salt_2024');

-- EDI rules (8): member id, bank accounts, claim number, charges, DOB
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_8)
    TRANSFORM keyed_hash SCOPE person PRIORITY 20 PARAMS (salt = 'edi_member_id_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (bpr_8)
    TRANSFORM redact PRIORITY 20 PARAMS (mask = '**********');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (bpr_14)
    TRANSFORM redact PRIORITY 20 PARAMS (mask = '**********');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (clm_1)
    TRANSFORM tokenize SCOPE transaction PRIORITY 10;
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (clm_2)
    TRANSFORM mask PARAMS (show = 2);
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_3)
    TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'edi_name_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (nm1_4)
    TRANSFORM keyed_hash SCOPE person PARAMS (salt = 'edi_name_salt_2024');
CREATE PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims (dmg_1)
    TRANSFORM generalize SCOPE relationship PARAMS (range = 10000);

-- ============================================================================
-- QUERIES: Per-table audit, protected reads, wildcard verification, drop one
-- ============================================================================
ASSERT ROW_COUNT = 6
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.hl7_patients;

ASSERT ROW_COUNT = 8
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.fhir_patients;

ASSERT ROW_COUNT = 8
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.edi_claims;

-- HL7 protected SELECT
ASSERT ROW_COUNT = 4
ASSERT VALUE pv1_2     = 'I'         WHERE df_message_id = 'MSG001'
ASSERT VALUE status    = 'Active'    WHERE df_message_id = 'MSG001'
ASSERT VALUE physician = 'DR JONES'  WHERE df_message_id = 'MSG001'
SELECT df_message_id,
       pid_3  AS mrn_hash, pid_5  AS name_token,
       pid_7  AS dob_generalized, pid_13 AS phone_masked,
       pid_19 AS ssn_redacted, pv1_2, pv1_7 AS physician, status
FROM {{zone_name}}.pseudonymisation_demos.hl7_patients;

-- FHIR wildcard proof, all four address_* columns are hashed
ASSERT ROW_COUNT = 4
SELECT patient_id    AS id_token,
       address_line  AS addr_hash,
       address_city  AS city_hash,
       address_state AS state_hash,
       address_postal AS zip_hash
FROM {{zone_name}}.pseudonymisation_demos.fhir_patients;

-- EDI 837 claims with scope=transaction on clm_1
ASSERT ROW_COUNT = 3
ASSERT VALUE txn_type           = '837'        WHERE df_transaction_id = 'TXN-837-001'
ASSERT VALUE bank_acct_redacted = '**********' WHERE df_transaction_id = 'TXN-837-001'
SELECT df_transaction_id, st_1 AS txn_type,
       nm1_3 AS name_hash, nm1_4 AS first_hash, nm1_8 AS member_id_hash,
       clm_1 AS acct_token, clm_2 AS amount_masked,
       bpr_8 AS bank_acct_redacted, bpr_14 AS bank_acct2_redacted
FROM {{zone_name}}.pseudonymisation_demos.edi_claims
WHERE st_1 = '837';

-- Drop one, phone masking no longer required on HL7
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients (pid_13);

ASSERT ROW_COUNT <= 6
SHOW PSEUDONYMISATION RULES FOR {{zone_name}}.pseudonymisation_demos.hl7_patients;

-- ============================================================================
-- CLEANUP
-- ============================================================================
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.hl7_patients;
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.fhir_patients;
DROP PSEUDONYMISATION RULE ON {{zone_name}}.pseudonymisation_demos.edi_claims;
DROP TABLE  IF EXISTS {{zone_name}}.pseudonymisation_demos.edi_claims;
DROP TABLE  IF EXISTS {{zone_name}}.pseudonymisation_demos.fhir_patients;
DROP TABLE  IF EXISTS {{zone_name}}.pseudonymisation_demos.hl7_patients;
DROP SCHEMA IF EXISTS {{zone_name}}.pseudonymisation_demos;
DROP ZONE   IF EXISTS {{zone_name}};

Description

## When to Use Use this demo when standing up pseudonymisation across a mixed healthcare lake: HL7 messages from the EHR, FHIR resources from the integration layer, and EDI X12 transactions from the payer side. It is the most complete pseudonymisation surface in the demo suite: 22 rules across three tables, all 7 built-in transforms (`keyed_hash`, `encrypt`, `redact`, `generalize`, `tokenize`, `mask`, `hash`), and all 3 SCOPEs (`PERSON`, `RELATIONSHIP`, `TRANSACTION`). It is also the demo where wildcard patterns become load-bearing: `address_*` protects four FHIR address columns with one rule, and `*_name` covers both `family_name` and `given_name`. It is the right demo to reference when mapping HIPAA Safe Harbor's 18 identifier categories and GDPR Article 4(5) pseudonymisation to concrete CREATE PSEUDONYMISATION RULE statements on realistic column layouts. ## What You Will Learn 1. How to use all seven transforms against appropriate PII: `keyed_hash` for MRN/member-id, `encrypt` (AES256) for email that must be reversible, `redact` for SSN and bank accounts, `generalize` for DOB, `tokenize` for patient_id, `mask` for phone, `hash` for unsalted fingerprints [ref: CREATE_PSEUDONYM_RULE] 2. How wildcard patterns work: `address_*` matches `address_line`, `address_city`, `address_state`, `address_postal` in one rule; `*_name` matches both `family_name` and `given_name` 3. How SCOPE `TRANSACTION` on `clm_1` (patient account number) ensures the pseudonym is only stable within a single claim-processing query, preventing cross-query correlation by the external party 4. How PRIORITY resolves wildcard versus exact-match collisions on the FHIR table (e.g. an explicit `ssn` rule at priority 0 coexisting with a wildcard `*_name` rule at priority 3) 5. How the same data is SHOW-able per table (`FOR <table>`), letting you audit HL7 / FHIR / EDI rule coverage independently before an external handoff [ref: SHOW_PSEUDONYM_RULES] ## Prerequisites - Workspace with authority to CREATE ZONE, CREATE SCHEMA, CREATE DELTA TABLE, and register pseudonymisation rules on all three tables - `{{data_path}}` writable, the demo creates `hl7_patients`, `fhir_patients`, `edi_claims` subdirectories - KeyStore / Key Vault configured for the salts (`hl7_mrn_salt_2024`, `fhir_ssn_salt_2024`, `fhir_name_salt_2024`, `edi_member_id_salt_2024`, `edi_name_salt_2024`) and for the AES256 encryption key used by the FHIR `email` rule - Familiarity with HL7 v2 PID field positions (PID-3 MRN, PID-5 name, PID-7 DOB, PID-11 address, PID-13 phone, PID-19 SSN), FHIR R4 Patient resource shape, and EDI X12 segment ids (NM1, CLM, BPR, DMG)

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →