FHIR Clinical Observations: Vital Signs & Lab Results

Flatten HL7 FHIR R5 Observation resources, heart rate NDJSON bulk exports, vital-sign panels, lab results, and blood pressure with nested component[] arrays, into queryable SQL tables.

Category: fhir

Syntax

-- 1. Zone + schema
CREATE ZONE IF NOT EXISTS {{zone_name}} TYPE EXTERNAL
    COMMENT 'External tables, demo datasets and file-backed data';
CREATE SCHEMA IF NOT EXISTS {{zone_name}}.fhir_demos
    COMMENT 'FHIR-backed external tables: HL7 FHIR R5 resources as JSON';

-- 2. Bulk heart-rate NDJSON, 100 Observation resources, one per line
CREATE EXTERNAL TABLE IF NOT EXISTS {{zone_name}}.fhir_demos.observations_bulk
USING JSON
LOCATION '{{data_path}}'
OPTIONS (
    file_filter = '*.ndjson',
    json_flatten_config = '{
        "root_path": "$",
        "include_paths": [
            "$.resourceType", "$.id", "$.status", "$.code",
            "$.subject", "$.effectiveDateTime", "$.valueQuantity"
        ],
        "json_paths": ["$.code", "$.subject", "$.valueQuantity"],
        "column_mappings": {
            "$.id": "observation_id",
            "$.effectiveDateTime": "effective_date",
            "$.valueQuantity": "value_quantity"
        },
        "max_depth": 4,
        "separator": "_",
        "default_array_handling": "to_json",
        "infer_types": true
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

-- 3. Mixed-type clinical Observations, 14 files (vital signs, labs, assessments)
CREATE EXTERNAL TABLE IF NOT EXISTS {{zone_name}}.fhir_demos.observations_clinical
USING JSON
LOCATION '{{data_path}}'
OPTIONS (
    file_filter = 'observation-example*.json',
    json_flatten_config = '{
        "root_path": "$",
        "include_paths": [
            "$.resourceType", "$.id", "$.status", "$.code", "$.category",
            "$.subject", "$.effectiveDateTime", "$.effectivePeriod",
            "$.valueQuantity", "$.valueString", "$.interpretation",
            "$.referenceRange", "$.component"
        ],
        "json_paths": ["$.code", "$.subject", "$.effectivePeriod", "$.valueQuantity"],
        "column_mappings": {
            "$.id": "observation_id",
            "$.effectiveDateTime": "effective_date",
            "$.effectivePeriod": "effective_period",
            "$.valueQuantity": "value_quantity",
            "$.valueString": "value_string",
            "$.referenceRange": "reference_range"
        },
        "max_depth": 4,
        "separator": "_",
        "default_array_handling": "to_json",
        "infer_types": true
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

-- 4. Schema-evolution probe, which optional fields does each observation populate?
SELECT observation_id,
       CASE WHEN value_quantity IS NOT NULL THEN 'Y' ELSE '-' END AS has_value,
       CASE WHEN component IS NOT NULL THEN 'Y' ELSE '-' END AS has_component,
       CASE WHEN reference_range IS NOT NULL THEN 'Y' ELSE '-' END AS has_ref_range,
       CASE WHEN interpretation IS NOT NULL THEN 'Y' ELSE '-' END AS has_interp
FROM {{zone_name}}.fhir_demos.observations_clinical
ORDER BY observation_id;

-- 5. Cleanup
DROP EXTERNAL TABLE IF EXISTS {{zone_name}}.fhir_demos.observations_bulk WITH FILES;
DROP EXTERNAL TABLE IF EXISTS {{zone_name}}.fhir_demos.observations_clinical WITH FILES;
DROP SCHEMA IF EXISTS {{zone_name}}.fhir_demos;

Description

## When to Use Use this demo when integrating clinical telemetry, vital-sign panels, or lab results from a FHIR server into DeltaForge. `Observation` is the most heterogeneous FHIR resource type: vital signs populate `valueQuantity`, blood pressure uses `component[]` with systolic/diastolic sub-observations, lab results add `referenceRange[]` and `interpretation`, and assessments like the Glasgow Coma Scale use their own component shape. This demo shows how one `json_flatten_config` handles all of them with graceful NULL padding. ## What You Will Learn 1. How to flatten FHIR `CodeableConcept` (LOINC / SNOMED coding): `code.coding[].system`, `code.coding[].code`, `code.coding[].display`, via `max_depth: 4`. 2. How to preserve complex sub-trees (`code`, `subject`, `valueQuantity`, `effectivePeriod`) as JSON blobs using `json_paths`, keeping them queryable but not exploded. 3. How FHIR Bulk Data NDJSON (100 heart rate readings) and individual Observation JSON files (14 mixed types) coexist in one directory and are split by `file_filter`. 4. How `Observation.subject.reference` links each observation to a `Patient/{id}` resource for downstream joins. 5. How schema evolution between vital-signs, labs, and assessments produces a unified table where `valueQuantity`, `valueString`, `component`, `referenceRange`, and `interpretation` are each populated only where applicable. 6. How to write audit-grade provenance queries using `df_file_name` and `df_row_number`. ## Prerequisites - Completion of `fhir-patient-demographics` (recommended), covers the baseline JSON flattening pattern. - Familiarity with FHIR `Observation` resource basics (status, code, subject, effective[x], value[x], component[]). - Knowledge of LOINC/SNOMED coding systems is helpful for interpreting query results.

Pitfalls

`Observation.value[x]` is a choice element, a single observation has exactly one of `valueQuantity`, `valueString`, `valueCodeableConcept`, `valueBoolean`, `valueInteger`, `valueRange`, `valueRatio`, `valueSampledData`, `valueTime`, `valueDateTime`, or `valuePeriod`. Blood pressure has NONE of these because it uses `component[]` for systolic/diastolic. A query that assumes `value_quantity IS NOT NULL` will silently drop BP observations.
`effective[x]` is also a choice: `effectiveDateTime`, `effectivePeriod`, `effectiveTiming`, or `effectiveInstant`. Filtering only on `effective_date` (mapped from `effectiveDateTime`) excludes any observation that used `effectivePeriod` for a timed measurement window.
`CodeableConcept.coding` is an ARRAY, a single observation can carry both a LOINC and a SNOMED coding for the same concept. Treating `code.coding[0].code` as THE code hides duplicates and obscures the multi-terminology binding that makes FHIR interoperable.
`subject.reference` is a relative string like `Patient/bulk-patient-41`, not a resolved resource. Joining to a Patient table requires matching on that exact reference format, do not strip the `Patient/` prefix unless both sides agree.
`value_quantity` kept as a JSON blob (via `json_paths`) cannot be aggregated with SQL arithmetic. To compute mean heart rate you must either drop it from `json_paths` and let the flattener produce `value_quantity_value` / `value_quantity_unit`, or parse the JSON at query time.
`Observation.status` is required with values `registered | preliminary | final | amended | corrected | cancelled | entered-in-error | unknown`. Always filter `status = 'final'` before computing clinical analytics: `preliminary` and `amended` rows will otherwise skew your cohorts.
`max_depth: 4` is tuned for `code.coding[*].code` (4 levels). Deeper FHIR structures like `component[*].valueQuantity.code` require `max_depth: 5` or they will be emitted as JSON blobs instead of scalar columns.

FHIR Clinical Observations: Vital Signs & Lab Results

Syntax

Description

Pitfalls

See Also