CREATE BLOOM FILTER INDEX

Creates bloom filter indexes on specified columns for efficient file-level data skipping.

Category: indexing (DeltaForge extension)

Syntax

CREATE BLOOM FILTER INDEX ON TABLE <table> FOR COLUMNS (<columns>) [OPTIONS (<key>=<val>, ...)]

Description

## Overview

Creates bloom filter indexes on specified columns of a Delta table. Bloom filters are probabilistic data structures that enable efficient file-level data skipping during query execution. When a query contains an equality predicate on a bloom-filtered column, the engine can skip files that definitely do not contain the target value, reducing I/O significantly.

## Behavior

- Bloom filter indexes are stored as additional metadata alongside the Delta table's data files.
- The indexes are built during subsequent OPTIMIZE or write operations. Creating the index definition does not immediately scan existing data.
- Bloom filters are most effective for equality predicates (WHERE col = value). They do not benefit range queries or LIKE patterns.
- The false positive probability (fpp) controls the trade-off between filter size and accuracy. Lower values produce larger filters that skip more files accurately.
- The num_items option helps the engine size the filter appropriately. When omitted, a default estimate is used.
- Multiple columns can be indexed in a single statement. Each column gets an independent bloom filter.
- Bloom filter indexes are consulted during query planning alongside min/max statistics and partition pruning.

## Access Control

No specific privilege required beyond table-level access.

## Compatibility

CREATE BLOOM FILTER INDEX is a DeltaForge extension for adding probabilistic indexes to Delta tables.
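
The file-level skipping described in the Overview can be illustrated with a small Python sketch. This is not DeltaForge's implementation or on-disk format: the `BloomFilter` class, the `files_to_scan` helper, the file names, and the SHA-256 double-hashing scheme are all assumptions made for illustration. It shows the one property the engine relies on: a filter may report a value that is absent (a false positive), but it never misses a value that is present, so a file whose filter says "no" can be skipped safely.

```python
import hashlib
import math


class BloomFilter:
    """Tiny illustrative bloom filter (hypothetical, not DeltaForge's format)."""

    def __init__(self, num_items: int, fpp: float):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / num_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def files_to_scan(filters: dict, target: str) -> list:
    """Return the files whose filter cannot rule out the target value."""
    return [name for name, bf in filters.items() if bf.might_contain(target)]


# Hypothetical data files, each holding a few order_id values.
file_contents = {
    "part-0.parquet": ["A100", "A101", "A102"],
    "part-1.parquet": ["B200", "B201", "B202"],
    "part-2.parquet": ["C300", "C301", "C302"],
}

filters = {}
for name, values in file_contents.items():
    bf = BloomFilter(num_items=len(values), fpp=0.01)
    for v in values:
        bf.add(v)
    filters[name] = bf

# Equivalent in spirit to: WHERE order_id = 'B201'
candidates = files_to_scan(filters, "B201")
print(candidates)
```

The file that actually holds the value is always among the candidates. Other files are usually excluded, but any of them may occasionally appear because of a false positive, in which case the engine reads the file and finds no matching rows.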

Parameters

| Name | Type | Description |
|---|---|---|
| table | | Specifies the Delta table on which to create bloom filter indexes. The table must already exist. |
| columns | | Specifies the columns to index with bloom filters. Multiple columns can be listed, separated by commas. Best suited for high-cardinality columns used in equality predicates. |
| options | | Specifies bloom filter configuration options. Supported keys: fpp (false positive probability, default 0.1), num_items (expected number of distinct items per file). Lower fpp values produce larger but more accurate filters. |
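
To make the fpp and num_items trade-off concrete, the sketch below applies the textbook bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Whether DeltaForge sizes its filters exactly this way is an assumption; the `bloom_filter_size` function is hypothetical and exists only to show how the filter grows as fpp shrinks for a fixed num_items.

```python
import math


def bloom_filter_size(num_items: int, fpp: float) -> tuple:
    """Return (bits, hash_count) from the textbook formulas (assumed, not DeltaForge-specific)."""
    if num_items <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("num_items must be positive and fpp must be in (0, 1)")
    bits = math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / num_items * math.log(2)))
    return bits, hash_count


# Compare the default fpp (0.1) with the lower values used in the examples.
for fpp in (0.1, 0.05, 0.01):
    bits, hash_count = bloom_filter_size(100_000, fpp)
    print(f"fpp={fpp}: {bits / 8 / 1024:.1f} KiB per filter, {hash_count} hash functions")
```

Under these formulas, going from fpp = 0.1 to fpp = 0.01 roughly doubles the filter size for the same num_items, which is the cost of fewer false positives.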

Examples

-- Create bloom filter indexes on two columns
CREATE BLOOM FILTER INDEX ON TABLE orders FOR COLUMNS (order_id, txn_id);
-- Create a bloom filter with custom false positive probability
CREATE BLOOM FILTER INDEX ON TABLE customers FOR COLUMNS (customer_id) OPTIONS (fpp = 0.01);
-- Create bloom filters with expected item count hint
CREATE BLOOM FILTER INDEX ON TABLE events FOR COLUMNS (session_id, event_type) OPTIONS (fpp = 0.05, num_items = 100000);
-- Index a single high-cardinality column
CREATE BLOOM FILTER INDEX ON TABLE transactions FOR COLUMNS (reference_number);
