GET INCREMENTAL FILTER

Queries a target Delta table for maximum column values and generates a WHERE clause for incremental data loading.

Category: incrementalDeltaForge extension

Syntax

GET INCREMENTAL FILTER FROM <table>
  [INCREMENTAL COLUMNS (<columns>)]
  [DATE COLUMN <col>]
  [OVERLAP <n> DAYS]
  [WHERE <condition>]
  [FOR DIALECT MSSQL|MYSQL|POSTGRES]

Description

## Overview GET INCREMENTAL FILTER implements watermark-based incremental loading. It scans the target Delta table to find the current high-water marks (maximum values) of the specified columns, then generates a SQL WHERE clause that can be applied to the source system to extract only rows newer than those marks. This approach avoids full-table scans on the source and is the recommended pattern for loading data from OLTP databases, event streams, and other append-oriented sources into Delta tables. ## How It Works 1. The executor opens the Delta table snapshot and resolves the specified columns against the table schema (case-insensitive match). 2. It builds and runs an internal query: `SELECT MAX(col1), MAX(col2), ... FROM <table>`. 3. If the target table is empty (all MAX values are NULL), the result sets `is_empty = true` and produces no filter, signaling that a full initial load is required. 4. For each key column, a filter clause of the form `col > max_value` is generated. Integer and float values are unquoted; strings are single-quoted with escaping. 5. For the date column, the OVERLAP days are subtracted from the maximum value before generating the filter. This creates a lookback window that re-processes recent rows. 6. The output includes separate `key_filter`, `date_filter`, and `combined_filter` columns, plus a JSON `max_values` object containing the raw high-water marks. ## Parameter Merging Every optional parameter follows a merge strategy: if the parameter is provided inline, that value takes precedence. If omitted, the value stored by SET INCREMENTAL CONFIG (persisted as a `delta.forge.incremental.*` table property) is used. If neither exists, the parameter is treated as absent. This allows a one-time configuration via SET INCREMENTAL CONFIG with per-run overrides when needed. ## Result Columns | Column | Type | Description | |--------|------|-------------| | key_filter | VARCHAR | WHERE clause fragment for key-based columns (e.g., `order_id > 1042`). NULL if no key columns are configured. | | date_filter | VARCHAR | WHERE clause fragment for the date column with overlap applied (e.g., `modified_date > '2025-12-28'`). NULL if no date column is configured. | | combined_filter | VARCHAR | key_filter AND date_filter AND static filter, joined with AND. NULL if the target table is empty. | | is_empty | BOOLEAN | TRUE when the target table contains no data, indicating a full initial load is needed. | | max_values | VARCHAR | JSON object mapping each column name to its maximum value as a string. | ## Dialect-Specific Formatting The FOR DIALECT clause controls how identifiers and timestamp literals are formatted in the output filter: - **MSSQL / SQLSERVER / SYNAPSE**: Lowercase unquoted identifiers. Timestamps use ISO 8601 format (`'2025-12-28T14:30:00.000'`). - **MYSQL / MARIADB**: Lowercase unquoted identifiers. Timestamps use microsecond precision (`'2025-12-28 14:30:00.000000'`). - **POSTGRES / POSTGRESQL**: Lowercase unquoted identifiers. Timestamps use `TIMESTAMP '...'` literal prefix. - **ORACLE**: Timestamps use `TO_TIMESTAMP('...', 'YYYY-MM-DD HH24:MI:SS.FF6')` syntax. - **Default (ANSI)**: Double-quoted identifiers (`"OrderId" > 1042`). Timestamps use millisecond precision. ## Incremental Loading Patterns This command supports two complementary incremental strategies that can be combined: - **Key-based**: Uses monotonically increasing identity or sequence columns. Produces exact boundaries with no overlap needed. Best for tables with reliable surrogate keys. - **Date-based**: Uses a modification timestamp column with an overlap buffer. Handles late-arriving data and clock skew. Best for event tables and change-tracked source tables. Combining both strategies (key columns plus a date column) produces the tightest filter, minimizing data transfer while maintaining correctness.

Parameters

Name	Type	Description
`table`		Specifies the target Delta table to scan for maximum values. The table must be registered in the current session (via CREATE DELTA TABLE or OPEN DELTA TABLE). Use a fully qualified name (catalog.schema.table) when multiple schemas are in scope.
`incremental_columns`		Key-based incremental columns.
`date_column`		Date/timestamp column for time-based loading.
`overlap_days`		Days to subtract from max date (safety buffer).
`filter`		Static filter condition.
`dialect`		SQL dialect for output: MSSQL, MYSQL, POSTGRES.

Examples

-- Basic key-based incremental filter
GET INCREMENTAL FILTER FROM staging.orders
    COLUMNS (order_id);

-- Combined key and date filter with overlap buffer
GET INCREMENTAL FILTER FROM warehouse.staging.orders
    COLUMNS (order_id, batch_id)
    DATECOL modified_date
    OVERLAP 3 DAYS
    FOR DIALECT MSSQL;

-- Date-only filter for PostgreSQL with a static partition predicate
GET INCREMENTAL FILTER FROM analytics.events
    DATECOL event_timestamp
    OVERLAP 1 DAYS
    FILTER region = 'us-east'
    FOR DIALECT POSTGRES;

-- Use stored configuration (no inline overrides)
GET INCREMENTAL FILTER FROM warehouse.orders;

-- Use the generated filter in a pipeline
INSERT INTO warehouse.orders
SELECT *
FROM read_mssql('SELECT * FROM dbo.Orders WHERE ' ||
    (GET INCREMENTAL FILTER FROM warehouse.orders
         COLUMNS (order_id)
         FOR DIALECT MSSQL).combined_filter
);

Pitfalls

If the target table is empty, combined_filter is NULL. Always check the is_empty flag before using the filter in a source query. An empty target requires a full initial load with no WHERE clause.
The OVERLAP buffer re-processes rows in the lookback window on every run. The downstream MERGE or INSERT logic must handle duplicates (e.g., use MERGE ON matching keys).
Column name resolution is case-insensitive against the Delta table schema, but the output filter uses the original casing provided in the COLUMNS clause. Ensure the casing matches the source system's column names.
The table must be registered in the current session. If you receive a 'Table not registered' error, run OPEN DELTA TABLE or CREATE DELTA TABLE first.
When no parameters are provided inline and no stored configuration exists, the command returns NULL filters. Always run SET INCREMENTAL CONFIG first or provide explicit parameters.