Read FHIR R4/R5 resources from XML files using the `XML` format handler with `xml_flatten_config`.

## Overview

FHIR XML is the XML-based serialization of FHIR (Fast Healthcare Interoperability Resources) defined by HL7 International. While FHIR JSON has become the more popular format for modern API integrations, FHIR XML remains prevalent in enterprise healthcare environments, particularly those with existing CDA (Clinical Document Architecture) infrastructure, HL7 v3 messaging systems, and regulatory reporting workflows that mandate XML. FHIR XML uses namespace-aware elements under the `http://hl7.org/fhir` namespace and supports the full FHIR resource model, including the R4 and R5 releases.

DeltaForge reads FHIR XML files using the standard `XML` format handler (not a separate FHIR-specific format). The `xml_flatten_config` option controls how the FHIR XML element hierarchy is mapped to tabular columns. This approach mirrors how FHIR JSON files are read using the `JSON` format handler with `json_flatten_config`.

## Usage

FHIR XML files are registered as external tables using `CREATE EXTERNAL TABLE` with the `XML` format and an `xml_flatten_config` option:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS zone.fhir_demos.patients_xml
USING XML
LOCATION '{{data_path}}'
OPTIONS (
    file_filter = '*.xml',
    xml_flatten_config = '{
        "root_path": "Patient",
        "include_paths": ["id", "name", "gender", "birthDate"],
        "column_mappings": {"id": "patient_id", "birthDate": "birth_date"},
        "max_depth": 3,
        "separator": "_",
        "default_array_handling": "to_json"
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
```

Once the external table is created, query it with standard SQL:

```sql
SELECT patient_id, name, gender, birth_date
FROM zone.fhir_demos.patients_xml
WHERE gender = 'male';
```

## Output Schema

The output schema depends on the `xml_flatten_config` settings. XML element attributes (such as `value` on primitive types) are extracted into columns. Nested elements can be preserved as JSON strings or, when configured, flattened into columns whose names join the path segments with the `separator`.

When `file_metadata` is configured, additional columns such as `df_file_name` and `df_row_number` are appended.

## Key Options

- **xml_flatten_config**: JSON string controlling how nested FHIR XML elements are flattened into columns. Key fields within this config:
  - `root_path`: XPath-like expression identifying the root element for each resource
  - `include_paths`: Array of element paths to extract
  - `column_mappings`: Object mapping element paths to custom column names
  - `max_depth`: Maximum nesting depth to flatten
  - `separator`: Character used to join nested path segments into column names
  - `default_array_handling`: How to handle repeating elements (e.g., `to_json`)
- **file_filter**: Glob pattern to filter files within the LOCATION directory.
- **file_metadata**: JSON string specifying which system columns to inject.
- **suppress_narratives**: Removes rendered XHTML narrative blocks that often contain unstructured clinical text.

FHIR XML data is subject to the same HIPAA privacy and security regulations as other FHIR serializations. DeltaForge preserves resource identifiers, meta elements, and all namespace-qualified content to maintain data fidelity.
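
Because `file_metadata` appends `df_file_name` as a column, rows can be traced back to the files they came from. The query below is a minimal sketch against the `patients_xml` table from the Usage section. It assumes the metadata column can be selected and grouped like any other column, which this page implies but does not state outright.

```sql
-- Count extracted Patient rows per source file.
-- Assumption: df_file_name behaves as a regular, groupable column.
SELECT
    df_file_name,
    COUNT(*) AS patient_rows
FROM zone.fhir_demos.patients_xml
GROUP BY df_file_name
ORDER BY df_file_name;
```

A per-file count like this can serve as a quick sanity check that `file_filter` matched the intended files and that `root_path` located a resource in each of them.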