Parse ASC X12 EDI messages used in North American B2B transactions (healthcare, supply chain, finance)
## Overview ASC X12 is the dominant electronic data interchange standard for business-to-business transactions in North America. Maintained by the Accredited Standards Committee X12, the format covers hundreds of transaction sets spanning healthcare claims (837), remittance advice (835), eligibility inquiries (270/271), purchase orders (850), invoices (810), advance ship notices (856), and more. Each X12 file consists of nested envelopes: an Interchange (ISA/IEA) wrapping one or more Functional Groups (GS/GE), each containing one or more Transaction Sets (ST/SE). DeltaForge reads X12 files through the unified EDI format handler. The format identifier in SQL is `EDI`, and the `edi_config` option with `ediFormat` set to `x12` selects the X12 dialect. Delimiters are auto-detected from the fixed-position ISA header: the element separator at ISA position 4, the sub-element separator at ISA position 105, and the segment terminator immediately following ISA-16. ## Usage X12 files are registered as external tables using `CREATE EXTERNAL TABLE` with the `EDI` format: ```sql CREATE EXTERNAL TABLE IF NOT EXISTS zone.edi_demos.x12_transactions USING EDI LOCATION '{{data_path}}/*.edi' OPTIONS ( edi_config = '{"ediFormat": "x12"}', file_metadata = '{"columns":["df_file_name","df_row_number"]}' ); ``` Once the external table is created, query it with standard SQL: ```sql SELECT ISA_6, ISA_8, GS_1, ST_1, df_transaction_json FROM zone.edi_demos.x12_transactions WHERE ST_1 = '850'; ``` Use JSON functions on the `df_transaction_json` column for deep segment-level access beyond the default envelope columns. ## Output Schema Each X12 transaction set produces one row. The default output columns are: | Column | Description | |---|---| | ISA_1 through ISA_16 | Interchange envelope fields (authorization, security, sender/receiver IDs, date/time, control number, etc.) | | GS_1 through GS_8 | Functional group header fields (functional identifier code, sender/receiver codes, date/time, group control number, responsible agency, version) | | ST_1 | Transaction set identifier code (e.g., 850, 810, 856, 837, 835) | | ST_2 | Transaction set control number | | df_transaction_json | Full transaction set content as a JSON string | | df_transaction_id | Unique deterministic hash identifying the transaction | When `file_metadata` is configured, additional columns such as `df_file_name` and `df_row_number` are appended. ## Key Options - **edi_config**: JSON string with `ediFormat` set to `x12`. This is the primary option that selects the X12 dialect within the unified EDI reader. - **file_metadata**: JSON string specifying which system columns to inject, e.g., `{"columns":["df_file_name","df_row_number"]}`. - **strict**: Validates that IEA/GE/SE trailer counts match their corresponding headers. Disable for lenient parsing of malformed files. - **flatten_composites**: Expands composite sub-elements into separate columns, useful for healthcare claim segments with multi-part fields. ## Common Transaction Sets - **850**: Purchase Order - **810**: Invoice - **856**: Advance Ship Notice / Manifest - **837**: Healthcare Claim (Professional, Institutional, Dental) - **835**: Healthcare Remittance Advice - **270/271**: Eligibility Inquiry and Response For healthcare transaction sets (837, 835, 270/271), X12 processing is subject to HIPAA electronic transaction regulations. DeltaForge preserves all control numbers and envelope metadata to support audit trail requirements.