CONVERT TO DELTA

Converts an existing Parquet or Iceberg directory to Delta Lake format in place by writing a fresh _delta_log alongside the existing files. No data is rewritten.

Category: maintenance (DeltaForge extension)

Syntax

CONVERT TO DELTA '<path>' [PARTITIONED BY (<columns>)] [NO STATISTICS]

Description

## Overview

CONVERT TO DELTA transforms an existing Parquet (or Iceberg) directory into a Delta Lake table by creating a Delta transaction log alongside the existing data files. The original Parquet files are preserved in place; no data is rewritten or moved.

## Path resolution

Absolute filesystem paths and cloud URLs are honored verbatim. CONVERT TO DELTA does **not** clamp absolute paths to a zone storage_root: it operates on the data the user actually points at. Only a relative path is resolved against the (best-effort) zone root, so `CONVERT TO DELTA 'foo'` lands in the same place a prior `CREATE DELTA TABLE LOCATION 'foo'` did.

This means you can safely convert a directory that is outside any catalog zone (for example, a directory written by an external tool such as DuckDB or PySpark): the directory is wrapped in place and can subsequently be opened via `OPEN DELTA TABLE '<path>' AS <alias>`.

## How it works

1. **File discovery**: Scans the specified directory for Parquet files. For partitioned tables, Hive-style partition directories (`col=value`) are traversed recursively.
2. **Schema inference**: Reads Parquet file footers to infer the table schema. All files in the directory must share a compatible schema.
3. **Transaction log creation**: Creates the `_delta_log/` directory and writes an initial commit (version 0) containing:
   - A Metadata action with the inferred schema, partition columns, and table properties.
   - An AddFile action for each discovered Parquet file, including file size and optional column statistics.
4. **Statistics collection** (unless NO STATISTICS is specified): Reads Parquet file footers to extract min/max column statistics and row counts. These statistics enable data skipping during queries.

## Partition schema

For Hive-style partitioned directories, the `PARTITIONED BY` clause declares the partition column names and types. The converter uses this schema to parse partition values from directory names. The partition columns are excluded from the data file schema and stored as partition metadata in the Delta log.

## Iceberg conversion

When the source directory contains Iceberg metadata, the converter reads the Iceberg file manifest, schema, and partitioning information. The Iceberg data files are referenced in the new Delta transaction log without modification.

## Result set

Returns a result set with four rows:

| metric | value |
|--------|-------|
| files_added | Number of Parquet files registered in the Delta log |
| bytes_converted | Total bytes of data files |
| version | Initial table version (0) |
| statistics_collected | 1 if statistics were collected, 0 otherwise |

## Access control

| Privilege | Object | Notes |
|-----------|--------|-------|
| write | Target directory | Required to create the _delta_log directory and write metadata. |
| read | Source files | Required to read Parquet footers for schema inference. |

## Compatibility

CONVERT TO DELTA is a DeltaForge extension implementing the Delta Lake in-place conversion protocol.
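
The steps under "How it works" can be sketched in a few lines of Python. This is an illustration of the general technique, not DeltaForge's implementation: the function names `plan_version_zero` and `write_commit` are hypothetical, the schema string is assumed to have been inferred from the Parquet footers elsewhere, and statistics are omitted (the NO STATISTICS case). The action field names and the zero-padded commit file name follow the open Delta Lake protocol; the `protocol` action and its version numbers are an assumption about what a converter would write.

```python
import json
import os
import uuid


def plan_version_zero(root, schema_string, partition_cols=()):
    """Build the actions of an initial Delta commit over existing Parquet files.

    root: local directory that already holds the Parquet files.
    schema_string: Delta schema JSON (assumed to be inferred elsewhere).
    partition_cols: partition column names, in directory order.
    """
    adds = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden and metadata directories such as _delta_log.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith(("_", ".")))
        for name in sorted(filenames):
            if not name.endswith(".parquet"):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Hive-style layout: every parent directory is "col=value".
            # URL-escaping of values is ignored in this sketch.
            partition_values = {}
            for part in rel.split("/")[:-1]:
                col, sep, value = part.partition("=")
                if not sep:
                    raise ValueError(f"not a col=value directory: {part}")
                partition_values[col] = value
            if list(partition_values) != list(partition_cols):
                raise ValueError(f"unexpected partition layout: {rel}")
            adds.append({"add": {
                "path": rel,
                "partitionValues": partition_values,
                "size": os.path.getsize(full),
                "modificationTime": int(os.path.getmtime(full) * 1000),
                "dataChange": True,
            }})
    metadata = {"metaData": {
        "id": str(uuid.uuid4()),
        "format": {"provider": "parquet", "options": {}},
        "schemaString": schema_string,
        "partitionColumns": list(partition_cols),
        "configuration": {},
    }}
    # Assumed protocol versions; a real converter may choose differently.
    protocol = {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}
    return [protocol, metadata] + adds


def write_commit(root, actions, version=0):
    """Write the actions as newline-delimited JSON into _delta_log/."""
    log_dir = os.path.join(root, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"{version:020d}.json")
    with open(log_file, "w", encoding="utf-8") as fh:
        for action in actions:
            fh.write(json.dumps(action) + "\n")
    return log_file
```

Only the log file is written; the Parquet files themselves are never opened for writing, which is what makes the conversion in place.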

Parameters

| Name | Type | Description |
|------|------|-------------|
| path | | Storage path of the directory to convert. Absolute filesystem paths (`B:/data/...`, `/data/...`, `file:///data/...`) and cloud URLs (`s3://bucket/...`, `abfss://container@acct.dfs.core.windows.net/...`, `gs://bucket/...`) are honored verbatim. A bare relative path (e.g. `'my_parquet_dir'`) is resolved against the current zone's storage_root the same way CREATE DELTA TABLE LOCATION does. |
| partitioned_by | | Partition column names and types in DDL form (e.g. `PARTITIONED BY (year INT, month INT)`). Required when the source directory is laid out in Hive-style `col=value/...` subdirectories; the converter parses partition values from the directory names using this declaration. |
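
The path rule described for `path` can be pictured with a small Python sketch. It is a simplified model only: `resolve_convert_path` and the `zone_root` argument are hypothetical names, and the engine's actual resolution logic (including its best-effort lookup of the zone root) is not shown in this page.

```python
import re

# URL schemes such as s3://, abfss://, gs://, file://
_URL = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")
# Windows drive paths such as B:/data or B:\data
_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def resolve_convert_path(path, zone_root):
    """Return the location a conversion would target under the documented rule."""
    # Absolute filesystem paths and URLs are used verbatim, never clamped.
    if _URL.match(path) or _DRIVE.match(path) or path.startswith("/"):
        return path
    # Only a bare relative path is joined onto the zone's storage root.
    return zone_root.rstrip("/") + "/" + path
```

With an assumed zone root of `/zones/main`, `'incoming/orders'` would resolve to `/zones/main/incoming/orders`, while `'s3://my-bucket/lake/transactions'` is returned unchanged.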

Examples

-- Convert an unpartitioned Parquet directory at an absolute path
CONVERT TO DELTA '/data/warehouse/customers';
-- Convert a partitioned Parquet table with explicit partition schema
CONVERT TO DELTA '/data/warehouse/events'
  PARTITIONED BY (year INT, month INT);
-- Convert from a cloud URL (S3)
CONVERT TO DELTA 's3://my-bucket/lake/transactions';
-- Convert from ADLS Gen2
CONVERT TO DELTA 'abfss://gold@storageacct.dfs.core.windows.net/marts/orders';
-- Convert without collecting statistics for faster conversion
CONVERT TO DELTA '/data/warehouse/logs'
  PARTITIONED BY (date STRING)
  NO STATISTICS;
-- Convert a zone-relative directory (resolves under the active zone's storage_root)
CONVERT TO DELTA 'incoming/orders';
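
After converting a directory on a local filesystem, the initial commit can be inspected from outside the engine. The Python sketch below assumes the converter wrote a standard Delta Lake version-0 commit (newline-delimited JSON at `_delta_log/00000000000000000000.json`); `summarize_initial_commit` is a hypothetical helper, and the totals it computes are only expected to line up with the `files_added` and `bytes_converted` metrics, not guaranteed to.

```python
import json
import os


def summarize_initial_commit(table_path):
    """Count the add actions in the version-0 commit of a local Delta table."""
    log_file = os.path.join(table_path, "_delta_log", "00000000000000000000.json")
    files_added = 0
    bytes_converted = 0
    files_with_stats = 0
    with open(log_file, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            # Each non-empty line is one action: protocol, metaData, add, ...
            add = json.loads(line).get("add")
            if add is None:
                continue
            files_added += 1
            bytes_converted += add["size"]
            if add.get("stats"):
                files_with_stats += 1
    return {
        "files_added": files_added,
        "bytes_converted": bytes_converted,
        "files_with_stats": files_with_stats,
    }
```

A `files_with_stats` of zero is what one would expect after a conversion run with NO STATISTICS.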

Pitfalls

See Also
