Creates bloom filter indexes on specified columns for efficient file-level data skipping.
CREATE BLOOM FILTER INDEX ON TABLE <table> FOR COLUMNS (<columns>) [OPTIONS (<key>=<val>, ...)]
## Overview

Creates bloom filter indexes on specified columns of a Delta table. Bloom filters are probabilistic data structures that enable efficient file-level data skipping during query execution. When a query contains an equality predicate on a bloom-filtered column, the engine can skip files that definitely do not contain the target value, which can reduce I/O significantly.

## Behavior

- Bloom filter indexes are stored as additional metadata alongside the Delta table's data files.
- The indexes are built during subsequent OPTIMIZE or write operations. Creating the index definition does not immediately scan existing data.
- Bloom filters are most effective for equality predicates (WHERE col = value). They do not benefit range queries or LIKE patterns.
- The false positive probability (fpp) controls the trade-off between filter size and accuracy. Lower values produce larger filters with fewer false positives, so fewer files are read unnecessarily.
- The num_items option helps the engine size the filter appropriately. When omitted, a default estimate is used.
- Multiple columns can be indexed in a single statement. Each column gets an independent bloom filter.
- Bloom filter indexes are consulted during query planning alongside min/max statistics and partition pruning.

## Access Control

No specific privilege is required beyond table-level access.

## Compatibility

CREATE BLOOM FILTER INDEX is a DeltaForge extension for adding probabilistic indexes to Delta tables.
| Name | Type | Description |
|---|---|---|
| table | | Specifies the Delta table on which to create bloom filter indexes. The table must already exist. |
| columns | | Specifies the columns to index with bloom filters. Multiple columns can be listed, separated by commas. Best suited for high-cardinality columns used in equality predicates. |
| options | | Specifies bloom filter configuration options. Supported keys: fpp (false positive probability, default 0.1), num_items (expected number of distinct items per file). Lower fpp values produce larger but more accurate filters. |
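
To make the fpp and num_items trade-off concrete, the following Python sketch applies the textbook Bloom filter sizing formulas. It is an illustration only: the function name `bloom_filter_size` is hypothetical, and the source does not state how DeltaForge actually sizes its filters, so the real on-disk sizes may differ.

```python
import math


def bloom_filter_size(num_items: int, fpp: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a textbook Bloom filter.

    Assumption: standard optimal-sizing formulas, not DeltaForge internals.
    """
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    # Optimal bit count: m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))
    # Optimal number of hash functions: k = (m / n) * ln 2
    hashes = max(1, round(bits / num_items * math.log(2)))
    return bits, hashes


if __name__ == "__main__":
    # Compare filter sizes for the same expected item count at several fpp values.
    for fpp in (0.1, 0.05, 0.01):
        bits, hashes = bloom_filter_size(100_000, fpp)
        print(f"fpp={fpp}: {bits / 8 / 1024:.1f} KiB, {hashes} hash functions")
```

Under these formulas, moving fpp from 0.1 to 0.01 roughly doubles the filter size for the same num_items, which is the size-versus-accuracy trade-off the options describe.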
-- Create bloom filter indexes on two columns
CREATE BLOOM FILTER INDEX ON TABLE orders FOR COLUMNS (order_id, txn_id);
-- Create a bloom filter with custom false positive probability
CREATE BLOOM FILTER INDEX ON TABLE customers FOR COLUMNS (customer_id) OPTIONS (fpp = 0.01);
-- Create bloom filters with expected item count hint
CREATE BLOOM FILTER INDEX ON TABLE events FOR COLUMNS (session_id, event_type) OPTIONS (fpp = 0.05, num_items = 100000);
-- Index a single high-cardinality column
CREATE BLOOM FILTER INDEX ON TABLE transactions FOR COLUMNS (reference_number);
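
The file-level skipping that these indexes enable can be sketched with a toy in-memory Bloom filter. Everything here is hypothetical and for illustration: the `ToyBloomFilter` class, the `files_to_scan` helper, the file names, and the order IDs are not part of DeltaForge, and the hashing scheme (seeded SHA-256) is simply a convenient choice.

```python
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter; one byte per bit for readability, not efficiency."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, value: str):
        # Derive num_hashes bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def files_to_scan(filters: dict[str, ToyBloomFilter], target: str) -> list[str]:
    """Keep only the files whose filter cannot rule out the target value."""
    return [name for name, bf in filters.items() if bf.might_contain(target)]


# Hypothetical data files and the order_id values each one holds.
files = {
    "part-0001": ["A100", "A101", "A102"],
    "part-0002": ["B200", "B201"],
    "part-0003": ["C300", "C301", "C302"],
}

# Build one filter per file, as an index build would do per data file.
filters = {}
for name, values in files.items():
    bf = ToyBloomFilter(num_bits=1024, num_hashes=3)
    for value in values:
        bf.add(value)
    filters[name] = bf

# Equivalent in spirit to: SELECT ... WHERE order_id = 'B200'
candidates = files_to_scan(filters, "B200")
print(candidates)
```

The file that really holds the value is always among the candidates, because a Bloom filter never produces false negatives. Other files may occasionally appear as false positives, and those are the cases a lower fpp makes rarer.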