Queries a target Delta table for maximum column values and generates a WHERE clause for incremental data loading.
GET INCREMENTAL FILTER FROM <table>
[INCREMENTAL COLUMNS (<columns>)]
[DATE COLUMN <col>]
[OVERLAP <n> DAYS]
[WHERE <condition>]
[FOR DIALECT MSSQL|MYSQL|POSTGRES]
## Overview GET INCREMENTAL FILTER implements watermark-based incremental loading. It scans the target Delta table to find the current high-water marks (maximum values) of the specified columns, then generates a SQL WHERE clause that can be applied to the source system to extract only rows newer than those marks. This approach avoids full-table scans on the source and is the recommended pattern for loading data from OLTP databases, event streams, and other append-oriented sources into Delta tables. ## How It Works 1. The executor opens the Delta table snapshot and resolves the specified columns against the table schema (case-insensitive match). 2. It builds and runs an internal query: `SELECT MAX(col1), MAX(col2), ... FROM <table>`. 3. If the target table is empty (all MAX values are NULL), the result sets `is_empty = true` and produces no filter, signaling that a full initial load is required. 4. For each key column, a filter clause of the form `col > max_value` is generated. Integer and float values are unquoted; strings are single-quoted with escaping. 5. For the date column, the OVERLAP days are subtracted from the maximum value before generating the filter. This creates a lookback window that re-processes recent rows. 6. The output includes separate `key_filter`, `date_filter`, and `combined_filter` columns, plus a JSON `max_values` object containing the raw high-water marks. ## Parameter Merging Every optional parameter follows a merge strategy: if the parameter is provided inline, that value takes precedence. If omitted, the value stored by SET INCREMENTAL CONFIG (persisted as a `delta.forge.incremental.*` table property) is used. If neither exists, the parameter is treated as absent. This allows a one-time configuration via SET INCREMENTAL CONFIG with per-run overrides when needed. ## Result Columns | Column | Type | Description | |--------|------|-------------| | key_filter | VARCHAR | WHERE clause fragment for key-based columns (e.g., `order_id > 1042`). NULL if no key columns are configured. | | date_filter | VARCHAR | WHERE clause fragment for the date column with overlap applied (e.g., `modified_date > '2025-12-28'`). NULL if no date column is configured. | | combined_filter | VARCHAR | key_filter AND date_filter AND static filter, joined with AND. NULL if the target table is empty. | | is_empty | BOOLEAN | TRUE when the target table contains no data, indicating a full initial load is needed. | | max_values | VARCHAR | JSON object mapping each column name to its maximum value as a string. | ## Dialect-Specific Formatting The FOR DIALECT clause controls how identifiers and timestamp literals are formatted in the output filter: - **MSSQL / SQLSERVER / SYNAPSE**: Lowercase unquoted identifiers. Timestamps use ISO 8601 format (`'2025-12-28T14:30:00.000'`). - **MYSQL / MARIADB**: Lowercase unquoted identifiers. Timestamps use microsecond precision (`'2025-12-28 14:30:00.000000'`). - **POSTGRES / POSTGRESQL**: Lowercase unquoted identifiers. Timestamps use `TIMESTAMP '...'` literal prefix. - **ORACLE**: Timestamps use `TO_TIMESTAMP('...', 'YYYY-MM-DD HH24:MI:SS.FF6')` syntax. - **Default (ANSI)**: Double-quoted identifiers (`"OrderId" > 1042`). Timestamps use millisecond precision. ## Incremental Loading Patterns This command supports two complementary incremental strategies that can be combined: - **Key-based**: Uses monotonically increasing identity or sequence columns. Produces exact boundaries with no overlap needed. Best for tables with reliable surrogate keys. - **Date-based**: Uses a modification timestamp column with an overlap buffer. Handles late-arriving data and clock skew. Best for event tables and change-tracked source tables. Combining both strategies (key columns plus a date column) produces the tightest filter, minimizing data transfer while maintaining correctness.
| Name | Type | Description |
|---|---|---|
table | Specifies the target Delta table to scan for maximum values. The table must be registered in the current session (via CREATE DELTA TABLE or OPEN DELTA TABLE). Use a fully qualified name (catalog.schema.table) when multiple schemas are in scope. | |
incremental_columns | Key-based incremental columns. | |
date_column | Date/timestamp column for time-based loading. | |
overlap_days | Days to subtract from max date (safety buffer). | |
filter | Static filter condition. | |
dialect | SQL dialect for output: MSSQL, MYSQL, POSTGRES. |
-- Basic key-based incremental filter
GET INCREMENTAL FILTER FROM staging.orders
COLUMNS (order_id);
-- Combined key and date filter with overlap buffer
GET INCREMENTAL FILTER FROM warehouse.staging.orders
COLUMNS (order_id, batch_id)
DATECOL modified_date
OVERLAP 3 DAYS
FOR DIALECT MSSQL;
-- Date-only filter for PostgreSQL with a static partition predicate
GET INCREMENTAL FILTER FROM analytics.events
DATECOL event_timestamp
OVERLAP 1 DAYS
FILTER region = 'us-east'
FOR DIALECT POSTGRES;
-- Use stored configuration (no inline overrides)
GET INCREMENTAL FILTER FROM warehouse.orders;
-- Use the generated filter in a pipeline
INSERT INTO warehouse.orders
SELECT *
FROM read_mssql('SELECT * FROM dbo.Orders WHERE ' ||
(GET INCREMENTAL FILTER FROM warehouse.orders
COLUMNS (order_id)
FOR DIALECT MSSQL).combined_filter
);