Read FHIR R4/R5 resources from JSON and NDJSON formats using the JSON format handler with json_flatten_config
## Overview FHIR (Fast Healthcare Interoperability Resources) is the modern healthcare data exchange standard published by HL7 International. FHIR JSON is the most widely adopted serialization format, used by EHR systems, health information exchanges, payer platforms, and public health registries. The standard defines over 150 resource types covering clinical data (Patient, Observation, Condition, MedicationRequest), administrative data (Encounter, Claim, Coverage), and infrastructure (Bundle, OperationOutcome). DeltaForge reads FHIR JSON files using the standard `JSON` format handler (not a separate FHIR-specific format). The `json_flatten_config` option controls how FHIR resource structure is mapped to tabular columns. This approach supports both individual JSON resource files and NDJSON (Newline Delimited JSON) bulk export files. ## Usage FHIR JSON files are registered as external tables using `CREATE EXTERNAL TABLE` with the `JSON` format and a `json_flatten_config` option that controls field extraction and flattening: ```sql CREATE EXTERNAL TABLE IF NOT EXISTS zone.fhir_demos.patients_bulk USING JSON LOCATION '{{data_path}}' OPTIONS ( file_filter = '*.ndjson', json_flatten_config = '{ "root_path": "$", "include_paths": ["$.resourceType", "$.id", "$.name", "$.gender", "$.birthDate"], "column_mappings": {"$.id": "patient_id", "$.birthDate": "birth_date"}, "max_depth": 3, "separator": "_", "default_array_handling": "to_json", "infer_types": true }', file_metadata = '{"columns":["df_file_name","df_row_number"]}' ); ``` Once the external table is created, query it with standard SQL: ```sql SELECT patient_id, name, gender, birth_date FROM zone.fhir_demos.patients_bulk WHERE gender = 'female'; ``` ## Output Schema The output schema depends on the `json_flatten_config` settings. With the example configuration above: | Column | Description | |---|---| | resourceType | FHIR resource type (e.g., Patient, Observation) | | patient_id | Resource identifier (renamed from $.id via column_mappings) | | name | Patient name array, serialized as JSON (due to default_array_handling = to_json) | | gender | Administrative gender code | | birth_date | Date of birth (renamed from $.birthDate via column_mappings) | When `file_metadata` is configured, additional columns such as `df_file_name` and `df_row_number` are appended. ## Key Options - **json_flatten_config**: JSON string controlling how nested FHIR resources are flattened into columns. Key fields within this config: - `root_path`: JSONPath to the root of each resource (typically `$`) - `include_paths`: Array of JSONPath expressions selecting which fields to extract - `column_mappings`: Object mapping JSONPath expressions to custom column names - `max_depth`: Maximum nesting depth to flatten (integer) - `separator`: Character used to join nested path segments into column names (e.g., `_`) - `default_array_handling`: How to handle arrays -- `to_json` serializes them as JSON strings - `infer_types`: When true, automatically infers column types from values - **file_filter**: Glob pattern to filter files within the LOCATION directory (e.g., `*.ndjson`). - **file_metadata**: JSON string specifying which system columns to inject. ## NDJSON Bulk Export Support The FHIR Bulk Data Access specification produces NDJSON files where each line is a separate FHIR resource. DeltaForge handles NDJSON files natively through the JSON format handler. Use the `file_filter` option set to `*.ndjson` when the LOCATION directory contains bulk export files. FHIR data is subject to HIPAA privacy and security regulations in the United States and comparable data protection frameworks internationally. DeltaForge preserves resource identifiers and metadata to support audit trail requirements.