Creates a Delta table from a SELECT query by splitting the source on a numeric key column and running multiple writers concurrently.
PARALLEL CREATE TABLE <table> AS
<select_query>
SPLIT BY <column>
[BATCH SIZE <n>]
[THREADS <n>]
## Overview PARALLEL CREATE TABLE AS executes a CTAS using a range-partitioning strategy on a numeric key column. The source is divided into contiguous value ranges, each range is materialized by an independent writer, and the writers run concurrently up to the configured thread cap. ## Behavior - The runtime queries `MIN(<split_by>)` and `MAX(<split_by>)` on the source query, then divides the value space into ranges using the supplied BATCH SIZE (or a default) as the target rows per range. - One INSERT subquery is generated per range, of the form `<source> WHERE <split_by> >= <lo> AND <split_by> <= <hi>`. A separate trailing batch handles rows where `<split_by>` is NULL. - Batches execute concurrently in chunks of THREADS (default 8). All batches must complete successfully for the CTAS to succeed. - The result is identical to a single-threaded CTAS over the same source query: every source row lands in exactly one range or in the NULL batch. - Empty source: the runtime returns 'source is empty: no rows inserted' with no commit. - All-NULL split column: only the NULL batch executes; range batches are skipped. ## Restrictions - SPLIT BY is mandatory. Omitting it raises a parse error; there is no auto-detection. - The split column must be numeric. Non-numeric columns fail at MIN/MAX cast time. - The target table must not already exist. To replace, DROP the table first or use the non-parallel CREATE OR REPLACE form. ## Result format The statement returns a concurrent-execution summary table with one row per batch. Columns: `index`, `status`, `sql`, `message`, `rows_affected`, `elapsed_ms`, `error`. A failed batch surfaces its error on its row; surviving batches continue. ## Access Control No specific privilege is required beyond what the source SELECT and target CREATE need. ## Compatibility DeltaForge extension.
| Name | Type | Description |
|---|---|---|
table | Specifies the target table name. Must not already exist. | |
as_query | Specifies the source SELECT query. Must include the `SPLIT BY` column in its result schema. | |
split_by | Specifies the column used to range-partition the source. Must be numeric (cast to Int64 internally) so contiguous value ranges can be computed. | |
batch_size | Approximate rows per range batch. Affects how the value range is divided. | |
threads | Maximum concurrent writers. The runtime executes batches in chunks of this size. |
-- Eight-way parallel CTAS over a numeric primary key
PARALLEL CREATE TABLE gold.summary AS
SELECT * FROM silver.facts
SPLIT BY fact_id
THREADS 8;
-- Smaller batches to reduce per-writer memory pressure
PARALLEL CREATE TABLE gold.facts AS
SELECT * FROM silver.facts
SPLIT BY fact_id
BATCH SIZE 50000
THREADS 4;