Loads data from external files into a Delta table, with format inference, glob filtering, and idempotent file tracking.
COPY INTO <table> FROM '<location>'
[FILE_FORMAT = <format>]
[FILES = ('<f1>', '<f2>', ...)]
[PATTERN = '<glob>']
[COPY_OPTIONS (<key>=<val>, ...)]
[FORMAT_OPTIONS (<key>=<val>, ...)]
[FORCE]
## Overview COPY INTO loads data from external file sources into an existing Delta table. It reads files from the specified location, parses them according to the declared (or inferred) format, and appends the resulting rows as a new Delta transaction. If the target table does not yet contain data, the schema is inferred from the source files and the table is initialized automatically. The command returns a result set with execution metrics: rows_copied, files_processed, files_added, bytes_read, bytes_written, and target_version. ## Format Inference When FILE_FORMAT is omitted, the engine inspects the file extensions at the source location: | Extension | Inferred Format | |-----------|----------------| | .csv | CSV | | .json | JSON | | .parquet | PARQUET | If no extension is recognizable, CSV is assumed. ## Format Options (CSV) | Option | Type | Default | Description | |-------------|---------|---------|-------------| | header | boolean | true | First row contains column names | | has_header | boolean | true | Alias for header | | delimiter | char | , | Field separator character | | quote | char | " | Quote character for fields containing delimiters | | escape | char | none | Escape character within quoted fields | ## Format Options (JSON) | Option | Type | Default | Description | |-----------------|--------|---------|-------------| | flatten_config | JSON | none | Controls flattening of nested JSON documents | ## Idempotent File Tracking By default, COPY INTO tracks which source files have already been loaded into the target table. On subsequent executions with the same location, only new (unprocessed) files are loaded. This makes COPY INTO safe to run on a schedule or in a retry loop without producing duplicates. To override this behavior and reload all files, specify FORCE = TRUE. ## Cloud Storage Paths The location parameter accepts cloud URIs with credentials resolved from the session's permission token: | Cloud Provider | URI Scheme | |---------------------|---------------------------------------------------------| | Amazon S3 | s3://bucket/prefix/ | | Azure Blob / ADLS | abfss://container@account.dfs.core.windows.net/path/ | | Google Cloud Storage | gs://bucket/prefix/ | | Local filesystem | file:///absolute/path/ or /absolute/path/ | ## Write Behavior The write operation uses CreateOrAppend mode. If the target Delta table is empty or does not yet have a commit log, the first COPY INTO creates the table with the inferred schema. Subsequent executions append new data files to the existing table. Each COPY INTO produces exactly one new Delta version. ## Result Columns | Column | Type | Description | |------------------|--------|-------------| | metric | STRING | Metric name | | value | BIGINT | Metric value | Metric rows: rows_copied, files_processed, files_added, bytes_read, bytes_written, target_version.
| Name | Type | Description |
|---|---|---|
table | Specifies the target Delta table to load data into. The table must already be registered in the session (via CREATE DELTA TABLE or OPEN DELTA TABLE). Accepts unqualified names (e.g., orders) or fully qualified three-part names (e.g., bronze.raw.orders). | |
location | Specifies the source path or URI containing the files to load. Supports local file paths (file:///data/incoming/), Amazon S3 (s3://bucket/prefix/), Azure Blob Storage (abfss://container@account.dfs.core.windows.net/path/), Google Cloud Storage (gs://bucket/prefix/), and tilde-expanded home directory paths (~/data/). When FILES or PATTERN is not provided, all files at the location are considered. | |
file_format | Specifies the format of the source files. Accepted values: CSV, JSON, PARQUET, AVRO, ORC. If omitted, the engine infers the format from the file extension at the source location (e.g., .csv, .json, .parquet). When the extension is ambiguous or absent, CSV is assumed. The format may also be specified inside a FILE_FORMAT parenthesized block using TYPE = <format>. | |
files | Specifies an explicit list of file names to load from the source location. Each file name is a single-quoted string. When provided, only the listed files are processed and PATTERN is ignored. Example: FILES = ('2024-01-01.csv', '2024-01-02.csv'). | |
pattern | Specifies a glob pattern appended to the source location to filter which files to load. Example: PATTERN = '*.csv' loads only files ending in .csv. The pattern is combined with the location as <location>/<pattern>. | |
force | When set to TRUE, forces the engine to reload all matching files regardless of whether they have been loaded in a previous COPY INTO execution. By default (FORCE omitted or false), the engine tracks which files have already been processed and skips them, making repeated COPY INTO invocations idempotent. |
-- Load CSV files from a local directory into an existing Delta table
COPY INTO orders
FROM '/data/incoming/orders/'
FILE_FORMAT = CSV
FORMAT_OPTIONS (header = 'true', delimiter = '|');
-- Load JSON files from S3 with an explicit file list
COPY INTO events
FROM 's3://data-lake/events/2024/'
FILE_FORMAT = JSON
FILES = ('jan.json', 'feb.json', 'mar.json');
-- Load Parquet files matching a glob pattern from Azure
COPY INTO transactions
FROM 'abfss://raw@storageaccount.dfs.core.windows.net/txn/'
FILE_FORMAT = PARQUET
PATTERN = '*/part-*.parquet';
-- Re-load all files (bypass idempotent tracking)
COPY INTO orders
FROM 's3://data-lake/orders/'
FILE_FORMAT = CSV
FORMAT_OPTIONS (header = 'true')
FORCE = TRUE;
-- Infer format from file extension (no FILE_FORMAT clause)
COPY INTO sensor_readings
FROM 'gs://iot-bucket/sensors/2024/*.json';
-- Use parenthesized FILE_FORMAT block with TYPE
COPY INTO logs
FROM '/data/logs/'
FILE_FORMAT = (TYPE = CSV, header = 'true', delimiter = '\t')
COPY_OPTIONS (purge = 'false');