Extracts deeply-nested X12 segments ad hoc using SQL JSON functions, json_typeof, json_array_length, json_extract_path_text, jsonb_pretty, and the #>> path operator, against the df_transaction_json column, without pre-configuring materialized_paths.
-- Envelope-only table; full transaction body lives in df_transaction_json
CREATE EXTERNAL TABLE IF NOT EXISTS external.edi_demos.json_extraction_messages
USING EDI
LOCATION '/path/to/edi/*.edi'
OPTIONS (
edi_config = '{"ediFormat": "x12"}',
file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
-- Query 1, structural probe: segment count per transaction
SELECT
df_file_name,
st_1 AS txn_type,
json_typeof(df_transaction_json::json) AS root_type,
json_array_length(df_transaction_json::json -> 'segments') AS segment_count
FROM external.edi_demos.json_extraction_messages
ORDER BY segment_count DESC;
-- Query 2, read the first segment's name and third element via path operators
SELECT
df_file_name,
df_transaction_json::json #>> '{segments,0,name}' AS first_seg_name,
df_transaction_json::json #>> '{segments,0,elements,2}' AS first_seg_elem3,
json_extract_path_text(df_transaction_json::json, 'segments', '1', 'name') AS second_seg_name
FROM external.edi_demos.json_extraction_messages
LIMIT 5;
-- Query 3, pretty-print one transaction for debugging
SELECT
df_file_name,
jsonb_pretty(df_transaction_json::jsonb) AS pretty
FROM external.edi_demos.json_extraction_messages
WHERE df_file_name = 'x12_850_purchase_order_a.edi'
LIMIT 1;
-- Query 4, count PO1 line items without materializing
WITH parsed AS (
SELECT
df_file_name,
st_1,
df_transaction_json::json AS j
FROM external.edi_demos.json_extraction_messages
WHERE st_1 = '850'
)
SELECT
df_file_name,
(
SELECT COUNT(*)
FROM json_array_elements(j -> 'segments') AS seg
WHERE seg ->> 'name' = 'PO1'
) AS po1_line_count
FROM parsed
ORDER BY po1_line_count DESC;
## When to Use Use this pattern when materialized_paths is the wrong tool: when you receive hundreds of heterogeneous X12 messages daily and cannot pre-declare every segment of interest, when a one-off investigation needs access to an obscure segment (SAC, TXI, HL, PID, MSG) that is not worth materializing permanently, or when segment occurrence counts vary so wildly that neither `indexed` nor a fixed `max_repeating_segments` fits. `df_transaction_json` is produced on every EDI external table by default and contains the complete parsed transaction as a JSON object, every segment, every element, every component, preserving occurrence order. Materialized_paths should remain your default for hot-path columns. Use JSON extraction when the query is exploratory, the segment is rare, or the shape varies across files. ## What You Will Learn 1. How df_transaction_json structures an X12 transaction as `{segments: [{name, elements: [...]}, ...]}` (same shape for EDIFACT/TRADACOMS, only element content differs) 2. How to use `json_typeof`, `json_array_length`, and the `#>>` path operator to probe the structure without pre-knowledge 3. How to cast STRING to JSON (`df_transaction_json::json`), required; raw JSON functions on a STRING return NULL 4. How to extract a named segment using a WHERE clause inside a LATERAL subquery over the segments array 5. How to count occurrences of a specific segment name (e.g. PO1 line items) without materializing them 6. When JSON extraction is slower than materialized_paths, every query reparses the JSON string, so high-fanout queries should graduate to materialized columns ## Prerequisites - Familiarity with the X12 or EDIFACT structure of the files you are querying - Familiarity with PostgreSQL-style JSON operators (`->`, `->>`, `#>`, `#>>`) and functions (`json_typeof`, `json_array_length`, `json_extract_path_text`) - A working external table that leaves `materialized_paths` unset (or minimally set) so df_transaction_json carries the full payload