Permanently transforms existing data (destructive -- cannot be undone).
APPLY PSEUDONYMISATION ON <table> (<column_pattern>)
TRANSFORM <type>
[PARAMS (<key>=<val>, ...)]
[WHERE <condition>]
## Overview APPLY PSEUDONYMISATION permanently transforms column data by generating and executing an UPDATE statement against the target table. Unlike CREATE PSEUDONYMISATION RULE, which applies transforms at query time without modifying stored data, APPLY PSEUDONYMISATION writes transformed values directly to disk. This operation is destructive and cannot be reversed through the pseudonymisation system. However, because DeltaForge tables maintain a Delta log, prior versions of the data may still be accessible through time travel until the retention period expires or a VACUUM operation removes old data files. This command is intended for data lifecycle scenarios where PII must be permanently removed from storage, such as archiving records that have exceeded a retention period, de-identifying datasets before sharing with external parties, or fulfilling GDPR erasure requests on historical data. ## Behavior - The executor maps the specified transform type to a SQL expression and generates an UPDATE statement of the form: UPDATE <table> SET <column> = <transform_expression> [WHERE <condition>]. This UPDATE is then executed through the unified executor pipeline. - Each transform type produces a specific SQL expression: keyed_hash and encrypt generate sha2(concat(key, CAST(column AS VARCHAR)), 256); hash generates sha2(CAST(column AS VARCHAR), 256); redact replaces the column with a literal string; mask uses concat(substring(...), repeat('*', ...)) to preserve leading characters; generalize uses floor division and multiplication to bucket values; tokenize generates concat('TOK_', substring(sha2(...), 1, 16)). - The WHERE clause restricts which rows are updated. When omitted, all rows in the table are transformed. The clause is passed as raw SQL and supports any valid boolean expression. - The response message includes the number of rows affected by the UPDATE. This count reflects the actual number of rows modified on disk. - The command does not interact with the pseudonymisation rule system. Existing query-time rules on the same column remain active after APPLY completes. If both a permanent transformation and a query-time rule are applied to the same column, the query-time rule will transform the already-transformed stored values, which may produce unintended results. - The generated UPDATE writes new data files to the Delta log. The original data files remain on disk until a VACUUM operation removes them. During the retention window, time travel queries (SELECT ... VERSION AS OF) can still access the pre-transformation data. ## Compatibility APPLY PSEUDONYMISATION supports GDPR Article 17 (right to erasure) and HIPAA de-identification requirements for permanently removing identifiable information from stored datasets. The optional WHERE clause enables targeted transformations, such as de-identifying only records older than a specified date, which aligns with data minimization principles.
| Name | Type | Description |
|---|---|---|
table_name | Specify the fully qualified table name in zone.schema.table format. The table must already exist and contain the target column. | |
column_pattern | Specify the column name to transform. Provide the name in parentheses after the table name. Unlike CREATE PSEUDONYMISATION RULE, this parameter targets a single concrete column rather than a pattern, because the command generates a physical UPDATE statement. | |
transform_type | Specify the transformation function to apply permanently. Valid values: keyed_hash (SHA256 of the concatenation of a key string and the column value), hash (one-way SHA256 of the column value), encrypt (SHA256 with a key, functionally equivalent to keyed_hash), redact (replace with a fixed mask string), mask (preserve the first N characters and replace the remainder with '*'), generalize (bucket numeric values into ranges by dividing and flooring), tokenize (generate a 'TOK_' prefixed 16-character SHA256 substring). | |
params | Provide transform-specific parameters as key-value pairs. Supported parameters vary by transform: 'key' (salt string for keyed_hash and encrypt, defaults to 'default_key' and 'encryption_key' respectively), 'mask' (replacement string for redact, defaults to '***REDACTED***'), 'show' (number of leading characters to preserve for mask, defaults to 4), 'range' (bucket size for generalize, defaults to 10). | |
where_clause | Specify a SQL boolean expression to restrict which rows are transformed. When omitted, all rows in the table are updated. The WHERE clause is appended verbatim to the generated UPDATE statement. |
APPLY PSEUDONYMISATION ON medical.patients (patient_name)
TRANSFORM keyed_hash
WHERE discharge_date < '2020-01-01';
APPLY PSEUDONYMISATION ON finance.accounts (ssn)
TRANSFORM redact
PARAMS (mask = '***-**-****');
APPLY PSEUDONYMISATION ON hr.former_employees (salary)
TRANSFORM generalize
PARAMS (range = 10000)
WHERE termination_date < '2023-01-01';
APPLY PSEUDONYMISATION ON clinical.archived_trials (participant_name)
TRANSFORM hash;
APPLY PSEUDONYMISATION ON insurance.closed_claims (claimant_name)
TRANSFORM tokenize
WHERE status = 'closed';
APPLY PSEUDONYMISATION ON banking.dormant_accounts (phone)
TRANSFORM mask
PARAMS (show = 3);