Parses 22 UN/EDIFACT and EANCOM messages (ORDERS, ORDRSP, INVOIC, CUSCAR, BAPLIE, PAXLST, PNRGOV, DESADV, PRICAT, IFTSTA, APERAK, CONTRL) spanning directories D:95B through D:13B and six syntax identifiers, unified into a single external table keyed on UNB interchange and UNH message headers.
-- Zone & schema
CREATE ZONE IF NOT EXISTS external
TYPE EXTERNAL
COMMENT 'External tables, demo datasets and file-backed data';
CREATE SCHEMA IF NOT EXISTS external.edi_demos
COMMENT 'EDI transaction-backed external tables';
-- External table: UNB/UNH envelope + full message JSON
CREATE EXTERNAL TABLE IF NOT EXISTS external.edi_demos.edifact_messages
USING EDI
LOCATION '/path/to/edi/*.edi'
OPTIONS (
edi_config = '{"ediFormat": "edifact"}',
file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
-- Query 1, interchange header per message
SELECT
df_file_name,
unb_1 AS syntax_id,
unb_2 AS sender,
unb_3 AS receiver,
unb_4 AS prepared_at,
unh_2 AS message_identifier
FROM external.edi_demos.edifact_messages
ORDER BY df_file_name;
-- Query 2, decompose UNH_2 into message type + directory
SELECT
SPLIT_PART(unh_2, ':', 1) AS message_type,
SPLIT_PART(unh_2, ':', 3) AS directory,
COUNT(*) AS msg_count
FROM external.edi_demos.edifact_messages
GROUP BY SPLIT_PART(unh_2, ':', 1), SPLIT_PART(unh_2, ':', 3)
ORDER BY message_type, directory;
-- Query 3, syntax identifier distribution (UNB_1 drives charset)
SELECT unb_1 AS syntax_id, COUNT(*) AS msgs
FROM external.edi_demos.edifact_messages
GROUP BY unb_1
ORDER BY msgs DESC;
-- Query 4, deep access via df_transaction_json
SELECT
df_file_name,
json_array_length(df_transaction_json::json -> 'segments') AS segment_count
FROM external.edi_demos.edifact_messages
ORDER BY segment_count DESC
LIMIT 10;
## When to Use Use this demo when your feed is UN/EDIFACT (or its GS1 subset EANCOM) rather than X12, typical in European trade, customs, logistics, and passenger data pipelines. EDIFACT differs from X12 in envelope structure: an interchange opens with UNB and closes with UNZ, each message is wrapped in UNH/UNT, and the UNA service-string advice declares separators explicitly. This demo shows how a single CREATE EXTERNAL TABLE ingests files from multiple directory versions (D:95B, D:96A, D:01B, D:03B, D:10A, D:13B, D:00B) and multiple syntax identifiers (UNOA/UNOB/UNOC/UNOL/IATA/IATB) without per-version configuration. ## What You Will Learn 1. How `edi_config = '{"ediFormat": "edifact"}'` switches the parser from X12 to EDIFACT 2. How UNB envelope columns (UNB_1 syntax id, UNB_2 sender, UNB_3 receiver, UNB_4 timestamp, UNB_5 control reference) always surface 3. How UNH_2 carries the full message identifier (e.g. `ORDERS:D:96A:UN`, `INVOIC:D:01B:UN:EAN010`) and how to split it into message type + directory + association code with SPLIT_PART 4. How df_transaction_json exposes every segment, enabling on-demand access to message-specific segments (BGM, NAD, LIN, MOA, TAX, TDT, LOC, EQD, etc.) 5. How one envelope can carry multiple UNH..UNT messages, the parser emits one row per message (e.g. `edifact_multi_message.edi` produces 2 rows, `CONTRL` with 2 UCI messages also produces 2 rows) 6. How EANCOM files parse with the same `ediFormat=edifact` setting: EANCOM is syntactically EDIFACT with GS1-specific code values ## Prerequisites - EDIFACT or EANCOM `.edi` files on a reachable path - Familiarity with the UNB/UNH header concept (analogous to X12 ISA/ST) - A zone of TYPE EXTERNAL with a schema available