INCREMENTAL VACUUM

Performs a resumable, batch-based vacuum of very large tables.

Category: maintenance (DeltaForge extension)

Syntax

INCREMENTAL VACUUM <table> [RETAIN <n> HOURS] [BATCH SIZE <m>] [MAX BATCHES <k>] [DRY RUN] [STATE FILE '<path>']

Description

## Overview

INCREMENTAL VACUUM performs a resumable, batch-based vacuum of very large Delta tables, where a standard VACUUM would take too long or use too much memory. It splits file scanning and deletion into batches of configurable size, so the operation can be paused and resumed across multiple invocations. Prefer this command for tables with millions of files, or for tables on high-latency cloud storage, where a full vacuum would not fit within an operational time window.

## Behavior

- Files are evaluated in batches of BATCH SIZE. Each batch lists files from storage, checks them against the Delta log's active file set, and deletes unreferenced files older than the retention period.
- Retention semantics are identical to standard VACUUM: a file is deleted only if it is both unreferenced in the current Delta log and older than the retention threshold.
- Progress is tracked internally, within the table's _delta_log directory. When MAX BATCHES is reached, the operation stops and records which storage prefixes have been processed.
- Each subsequent invocation resumes where the previous one stopped and skips prefixes that were already processed.
- Once all prefixes have been processed, the tracking state is cleared and the next invocation starts a fresh scan.
- DRY RUN works with batch processing, so large file sets can be previewed incrementally.

## Access Control

No privileges are required beyond table-level access.

## Compatibility

INCREMENTAL VACUUM is a DeltaForge extension for the operational management of very large Delta tables.
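The behavior rules above can be modeled in a few lines. The following Python sketch is a hypothetical, in-memory model, not DeltaForge's implementation. It makes these assumptions:

- Storage is a dict that maps each prefix to `{path: modification time in seconds}`.
- A plain set of finished prefixes stands in for the internal progress tracking.
- All names are illustrative.
- A prefix is recorded only after all of its batches finish, so a run that stops partway through a prefix re-evaluates that prefix next time. This is one plausible reading of the documented behavior.

```python
from dataclasses import dataclass, field

HOUR = 3600.0  # seconds


@dataclass
class VacuumState:
    """Hypothetical progress record: storage prefixes fully processed so far."""
    done_prefixes: set = field(default_factory=set)


def incremental_vacuum(storage, active_files, now, retain_hours, batch_size,
                       max_batches=None, dry_run=False, state=None):
    """Model one INCREMENTAL VACUUM invocation (illustrative only).

    storage maps prefix -> {path: modification time in seconds}.
    Returns (candidate paths, state to pass to the next run, finished flag).
    """
    if state is None:
        state = VacuumState()
    cutoff = now - retain_hours * HOUR
    candidates = []
    batches_run = 0
    for prefix in sorted(storage):
        if prefix in state.done_prefixes:
            continue  # already processed by an earlier invocation
        paths = sorted(storage[prefix])
        for start in range(0, len(paths), batch_size):
            if max_batches is not None and batches_run == max_batches:
                # MAX BATCHES reached: stop, keeping the recorded progress
                return candidates, state, False
            batches_run += 1
            for path in paths[start:start + batch_size]:
                # eligible only if unreferenced AND older than the cutoff
                if path not in active_files and storage[prefix][path] < cutoff:
                    candidates.append(path)
                    if not dry_run:
                        del storage[prefix][path]
        state.done_prefixes.add(prefix)
    # every prefix processed: clear the state so the next run starts a fresh scan
    return candidates, VacuumState(), True
```

Calling the function repeatedly with a small `max_batches` and passing back the returned state walks through the table a few batches per call. The call that finishes the last prefix returns `finished=True` with an empty state.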

Parameters

| Name | Description |
|---|---|
| table | Specifies the Delta table to vacuum. Accepts a table name or a storage path. |
| retention_hours | Set with RETAIN ... HOURS. Specifies the retention period in hours. Files that are no longer referenced by the Delta log and are older than this threshold are eligible for deletion. If omitted, the table's configured retention period is used (default: 168 hours). |
| batch_size | Set with BATCH SIZE. Specifies the number of files evaluated, and potentially deleted, per batch. Controls memory usage and processing time per iteration. |
| max_batches | Set with MAX BATCHES. Specifies the maximum number of batches processed in this invocation. When the limit is reached, the operation stops and records its progress in the state file so that a later invocation can resume. If omitted, all remaining batches are processed. |
| dry_run | Set with DRY RUN. Enables preview mode, which lists the files that would be deleted without removing them. Can be combined with BATCH SIZE and MAX BATCHES to preview incrementally. |
| state_file | Set with STATE FILE. Specifies the path of the file that stores the resumable vacuum state. |
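The precedence rule for the retention period can be expressed as a small function. This Python sketch assumes the following:

- The table's configured retention is available as a number of hours, or `None` when it is unset.
- The function names are hypothetical.

The sketch computes only the cutoff timestamp. An unreferenced file becomes eligible for deletion if it was last modified before that timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_RETENTION_HOURS = 168  # documented default (7 days)


def effective_retention_hours(retain_hours: Optional[float] = None,
                              table_retention_hours: Optional[float] = None) -> float:
    """RETAIN <n> HOURS wins; else the table's configured period; else 168."""
    if retain_hours is not None:
        return retain_hours
    if table_retention_hours is not None:
        return table_retention_hours
    return DEFAULT_RETENTION_HOURS


def retention_cutoff(now: datetime, retain_hours: Optional[float] = None,
                     table_retention_hours: Optional[float] = None) -> datetime:
    """Unreferenced files modified before this instant are eligible for deletion."""
    hours = effective_retention_hours(retain_hours, table_retention_hours)
    return now - timedelta(hours=hours)


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
print(retention_cutoff(now))                   # 2024-01-01 00:00:00+00:00
print(retention_cutoff(now, retain_hours=24))  # 2024-01-07 00:00:00+00:00
```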

Examples

-- Run incremental vacuum with batch processing
INCREMENTAL VACUUM large_table RETAIN 168 HOURS BATCH SIZE 10000;
-- Process a limited number of batches per run
INCREMENTAL VACUUM large_table BATCH SIZE 5000 MAX BATCHES 10;
-- Preview what would be deleted in batches
INCREMENTAL VACUUM large_table BATCH SIZE 10000 DRY RUN;
-- Resume a previously interrupted run: re-issuing the command skips prefixes already processed
INCREMENTAL VACUUM large_table BATCH SIZE 10000 MAX BATCHES 5;
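When scheduling runs like the MAX BATCHES examples above, the number of invocations needed to cover a table can be estimated with ceiling division. This Python sketch makes these assumptions:

- Every batch except the last is full.
- The total file count under the table's storage location is known in advance. Each batch evaluates every listed file, referenced or not, so this count includes both.
- The function names are hypothetical.

If batches can be partially filled, treat the result as a lower bound.

```python
def batches_needed(file_count: int, batch_size: int) -> int:
    """Full batches plus one partial batch for any remainder (ceiling division)."""
    return -(-file_count // batch_size)


def invocations_needed(file_count: int, batch_size: int, max_batches: int) -> int:
    """Runs of INCREMENTAL VACUUM needed to evaluate every file once."""
    return -(-batches_needed(file_count, batch_size) // max_batches)


# 1,000,000 files with BATCH SIZE 5000 MAX BATCHES 10:
# 200 batches in total, 10 per run, so 20 invocations
print(batches_needed(1_000_000, 5000), invocations_needed(1_000_000, 5000, 10))  # prints: 200 20
```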

