PARALLEL CREATE TABLE AS

Creates a Delta table from a SELECT query by splitting the source on a numeric key column and running multiple writers concurrently.

Category: dmlDeltaForge extension

Syntax

PARALLEL CREATE TABLE <table> AS
  <select_query>
  SPLIT BY <column>
  [BATCH SIZE <n>]
  [THREADS <n>]

Description

## Overview PARALLEL CREATE TABLE AS executes a CTAS using a range-partitioning strategy on a numeric key column. The source is divided into contiguous value ranges, each range is materialized by an independent writer, and the writers run concurrently up to the configured thread cap. ## Behavior - The runtime queries `MIN(<split_by>)` and `MAX(<split_by>)` on the source query, then divides the value space into ranges using the supplied BATCH SIZE (or a default) as the target rows per range. - One INSERT subquery is generated per range, of the form `<source> WHERE <split_by> >= <lo> AND <split_by> <= <hi>`. A separate trailing batch handles rows where `<split_by>` is NULL. - Batches execute concurrently in chunks of THREADS (default 8). All batches must complete successfully for the CTAS to succeed. - The result is identical to a single-threaded CTAS over the same source query: every source row lands in exactly one range or in the NULL batch. - Empty source: the runtime returns 'source is empty: no rows inserted' with no commit. - All-NULL split column: only the NULL batch executes; range batches are skipped. ## Restrictions - SPLIT BY is mandatory. Omitting it raises a parse error; there is no auto-detection. - The split column must be numeric. Non-numeric columns fail at MIN/MAX cast time. - The target table must not already exist. To replace, DROP the table first or use the non-parallel CREATE OR REPLACE form. ## Result format The statement returns a concurrent-execution summary table with one row per batch. Columns: `index`, `status`, `sql`, `message`, `rows_affected`, `elapsed_ms`, `error`. A failed batch surfaces its error on its row; surviving batches continue. ## Access Control No specific privilege is required beyond what the source SELECT and target CREATE need. ## Compatibility DeltaForge extension.

Parameters

NameTypeDescription
tableSpecifies the target table name. Must not already exist.
as_querySpecifies the source SELECT query. Must include the `SPLIT BY` column in its result schema.
split_bySpecifies the column used to range-partition the source. Must be numeric (cast to Int64 internally) so contiguous value ranges can be computed.
batch_sizeApproximate rows per range batch. Affects how the value range is divided.
threadsMaximum concurrent writers. The runtime executes batches in chunks of this size.

Examples

-- Eight-way parallel CTAS over a numeric primary key
PARALLEL CREATE TABLE gold.summary AS
  SELECT * FROM silver.facts
  SPLIT BY fact_id
  THREADS 8;
-- Smaller batches to reduce per-writer memory pressure
PARALLEL CREATE TABLE gold.facts AS
  SELECT * FROM silver.facts
  SPLIT BY fact_id
  BATCH SIZE 50000
  THREADS 4;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →