JSON Customers — Basics

Read a flat JSON array file into an external table with include_paths, type inference, and file_metadata.

Category: json

Syntax

-- ============================================================================
-- Setup: zone, schema, and external table over customers.json
-- ============================================================================
CREATE ZONE IF NOT EXISTS {{zone_name}} TYPE EXTERNAL
    COMMENT 'External tables — demo datasets and file-backed data';

CREATE SCHEMA IF NOT EXISTS {{zone_name}}.json_demos
    COMMENT 'JSON-backed external tables';

CREATE EXTERNAL TABLE IF NOT EXISTS {{zone_name}}.json_demos.customers
USING JSON
LOCATION '{{data_path}}'
OPTIONS (
    json_flatten_config = '{
        "root_path": "$",
        "include_paths": [
            "$.id",
            "$.email",
            "$.first",
            "$.last",
            "$.company",
            "$.created_at",
            "$.country"
        ],
        "max_depth": 1,
        "separator": "_",
        "default_array_handling": "to_json",
        "infer_types": true
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
GRANT ADMIN ON TABLE {{zone_name}}.json_demos.customers TO USER {{current_user}};

-- ============================================================================
-- Queries: verify row count, column mapping, type inference, metadata
-- ============================================================================
-- Total row count (array length)
ASSERT ROW_COUNT = 200
SELECT * FROM {{zone_name}}.json_demos.customers;

-- Browse with auto-detected column names
ASSERT ROW_COUNT = 10
ASSERT VALUE first = 'Torrey' WHERE id = '1'
ASSERT VALUE country = 'Switzerland' WHERE id = '1'
SELECT id, email, first, last, company, created_at, country
FROM {{zone_name}}.json_demos.customers
ORDER BY id
LIMIT 10;

-- Type inference: created_at is a real TIMESTAMP, so MIN/MAX work
ASSERT ROW_COUNT = 1
ASSERT VALUE earliest_signup >= '2014-04-02'
ASSERT VALUE latest_signup  <  '2015-03-22'
SELECT MIN(created_at) AS earliest_signup,
       MAX(created_at) AS latest_signup
FROM {{zone_name}}.json_demos.customers;

-- Country distribution (top 10)
ASSERT ROW_COUNT = 10
ASSERT VALUE customer_count = 4 WHERE country = 'Congo'
SELECT country, COUNT(*) AS customer_count
FROM {{zone_name}}.json_demos.customers
GROUP BY country
ORDER BY customer_count DESC
LIMIT 10;

-- file_metadata: df_file_name + row numbering 1..200
ASSERT ROW_COUNT = 1
ASSERT VALUE df_file_name LIKE '%customers%'
ASSERT VALUE first_row = 1
ASSERT VALUE last_row  = 200
SELECT df_file_name,
       MIN(df_row_number) AS first_row,
       MAX(df_row_number) AS last_row
FROM {{zone_name}}.json_demos.customers
GROUP BY df_file_name;

-- Required-field integrity
ASSERT VALUE null_row_count = 0
SELECT COUNT(*) AS null_row_count
FROM {{zone_name}}.json_demos.customers
WHERE id IS NULL OR email IS NULL
   OR first IS NULL OR last IS NULL
   OR country IS NULL;

-- ============================================================================
-- Cleanup
-- ============================================================================
DROP EXTERNAL TABLE IF EXISTS {{zone_name}}.json_demos.customers WITH FILES;
DROP SCHEMA IF EXISTS {{zone_name}}.json_demos;
DROP ZONE   IF EXISTS {{zone_name}};

Description

## When to Use Use this demo when you have a single JSON file containing a top-level array of flat objects (a classic CRM / API export shape) and you want a queryable external table with proper types and source-file provenance. It is the canonical starting point for JSON in DeltaForge: no nesting, no explode, just `json_flatten_config` with `include_paths`, `infer_types`, and `file_metadata` on top of `CREATE EXTERNAL TABLE ... USING JSON`. ## What You Will Learn 1. How to register a JSON array file (`[{...}, {...}]`) as an external table. 2. How `json_flatten_config.root_path = "$"` tells DeltaForge to emit one row per array element. 3. How `include_paths` restricts the projection to just the fields you need. 4. How `max_depth: 1` skips any (absent) nested structures and keeps the table flat. 5. How `infer_types: true` promotes ISO-8601 strings (e.g. `2014-12-25T04:06:27.981Z`) to real `TIMESTAMP` so `MIN`/`MAX` and range filters work. 6. How auto-detected column names drop the `$.` prefix and lowercase the leaf (so `$.first` becomes `first`). 7. How `file_metadata` injects `df_file_name` and `df_row_number` for per-row provenance. 8. How `ASSERT ROW_COUNT` and `ASSERT VALUE ... WHERE ...` build a self-verifying demo. ## Prerequisites A running DeltaForge catalog, permission to create zones and schemas, and the demo data file `customers.json` (200 records) in the directory passed as `{{data_path}}`. No Delta table, Iceberg catalog, or object-store credential is needed — the file is read directly from the local path.

Pitfalls

`max_depth: 1` is correct ONLY because the objects are flat. If the file later gains a nested `address` object, those fields will be silently dropped; bump `max_depth` or add the nested paths to `include_paths` before expecting them in SQL.
Auto-detected column names strip the `$.` prefix and lowercase the leaf, so `$.first` becomes `first` — a reserved-adjacent word in some dialects. Queries must quote or alias it if you layer views on top, and renaming via `column_mappings` (e.g. `first_name`) is strongly preferred for anything that will ship to BI.
`infer_types: true` types `created_at` as `TIMESTAMP` ONLY when every row parses. A single malformed or empty string in that column will demote the whole column back to `VARCHAR` and break `MIN`/`MAX` comparisons. Validate with `ASSERT VALUE earliest_signup >= '...'` as this demo does.
`id` is inferred as an integer when every value is numeric, but the ASSERT `WHERE id = '1'` uses a string literal; DeltaForge coerces the literal. If you write `WHERE id = 1` and the column ends up `VARCHAR` (e.g. one row has `"1a"`), the predicate flips to the string path and may miss rows — prefer explicit `CAST` in production views.
`default_array_handling: to_json` is set, but this file has no inner arrays, so it is a no-op here. Do not assume it will explode arrays — that is what `explode_paths` is for (see the music-catalog demo).
`file_metadata` columns (`df_file_name`, `df_row_number`) are injected by DeltaForge and are NOT fields in the source JSON; they disappear if you rebuild the table without that option, which can silently break downstream queries that rely on them.

JSON Customers — Basics

Syntax

Description

Pitfalls

See Also