CREATE METASTORE

Mirrors a Delta table's `_delta_log/` action stream and full Parquet footer state into the Postgres-backed metastore so the query planner can serve `statistics()` and `scan()` from indexed SQL.

Category: metastorePrivilege: READ on the tableDeltaForge extension

Syntax

CREATE METASTORE [IF NOT EXISTS] '<zone.schema.table>'
  [WITH OPTIONS ('<key>' = '<value>', ...)]

Description

## Overview Mirrors a Delta table into the metastore (`df_metastore` Postgres schema) and runs an initial full ingest. After this command completes, every `_delta_log/` action and every Parquet footer field the planner uses for `statistics()` and `scan()` is mirrored into indexed SQL. The metastore stores only the 3-part name; the control plane is the authoritative `qname -> storage_path` map and is consulted at create time. No property is written into `_delta_log/` and the Delta table itself is never modified. ## When to use Use CREATE METASTORE when **any** of the following is true and the table is read-heavy enough to make the cost of metadata I/O matter: - The table lives on **cloud object storage** (S3 / ADLS / GCS). Per-footer fetch latency dominates planning on cold queries; the metastore turns that into one indexed SQL call. - The table has **thousands of active files** (or thousands of row groups). At that scale, opening Parquet footers per query is the bottleneck even on fast storage. - The table is queried by **BI tools** (Power BI, Tableau, Excel via ODBC). BI workloads issue many small planning queries per dashboard; the metastore makes each plan a single-digit-millisecond Postgres lookup. - The deployment runs **multiple compute nodes**. All nodes share one metastore over Arrow Flight, so cold-start planning latency disappears for every worker, not just the one whose in-memory snapshot is warm. Do **not** use CREATE METASTORE when: - The table has only a handful of files on local NVMe (footer reads cost microseconds; the metastore adds operational surface for no planning win). - The table is write-only or never read by an interactive workload (the metastore's value is on the read path). - The table is on a deployment where the metastore pool hasn't been wired into the UnifiedExecutor yet (the command will return a configuration error). ## Behavior - Inserts a row into `df_metastore.table_registry` keyed on `table_qname`. - Creates per-table LIST partitions in every partitioned metastore table (`parquet_file`, `row_group`, `column_chunk`, the typed `column_stat_*` tables, deletion vectors, etc.). - Walks the table's `_delta_log/` from version 0 to current log head, projecting every action type into its destination table. - Opens every Parquet data file's footer in bounded morsel-style parallelism (default 32 concurrent) and mirrors the full footer state into `row_group` / `column_chunk` / the typed stat tables / page indexes. - Stamps `cache_state.cached_through_version` so the planner knows which Delta version the metastore authoritatively serves. - Idempotent: re-running CREATE METASTORE on a table already mirrored is a full ingest. Use `IF NOT EXISTS` to no-op, or `ALTER METASTORE ... REFRESH` for incremental catch-up. ## Read path enabled by this command Once mirrored, the planner can fetch the table's scan plan over Arrow Flight from the control plane in one round trip per query (file list, row-group offsets, column-chunk byte ranges, deletion vectors, stats) instead of replaying `_delta_log/` and opening each Parquet footer on object storage. Typical planning latency on a mirrored table drops from seconds to milliseconds. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | READ | Table | The caller must be able to read the table (storage path is resolved via the control plane's permission-aware lookup). | ## Compatibility CREATE METASTORE is a DeltaForge extension. It has no effect on the Delta protocol; external Delta engines (Spark, Trino, DuckDB) read the table unchanged.

Parameters

NameTypeDescription
qualified_nameCanonical 3-part name (`zone.schema.table`) of the Delta table to mirror. The table must already be a fully-registered Delta table in the control plane; the metastore resolves its storage path from there.
IF NOT EXISTSWhen present, silently succeed if the table is already registered in the metastore instead of returning `TableAlreadyRegistered`. Use in idempotent setup scripts.
concurrencyWITH OPTIONS key. Controls the number of Parquet footers ingested in parallel during the initial full ingest. Higher values speed up registration of tables with thousands of files at the cost of more concurrent reads against object storage.

Examples

-- Mirror a Delta table into the metastore (initial full ingest, default concurrency).
CREATE METASTORE 'analytics.sales.orders';
-- Idempotent variant for setup scripts.
CREATE METASTORE IF NOT EXISTS 'analytics.sales.orders';
-- Higher concurrency for a table with many files.
CREATE METASTORE 'analytics.sales.orders'
  WITH OPTIONS ('concurrency' = '128');

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →