Partitioning and Dynamic Partition Pruning

Partition a Delta fact table by region and watch the engine skip entire partition directories when a JOIN or subquery filter on the dimension side narrows the match set.

Category: delta

Syntax

-- SETUP
CREATE ZONE IF NOT EXISTS demo_zone TYPE EXTERNAL
    COMMENT 'External and Delta tables -- demo datasets';

CREATE SCHEMA IF NOT EXISTS demo_zone.delta_demos
    COMMENT 'Delta table management tutorial demos';

CREATE DELTA TABLE IF NOT EXISTS demo_zone.delta_demos.sales_facts (
    id          INT,
    product_id  INT,
    region      VARCHAR,
    quarter     VARCHAR,
    amount      DOUBLE,
    qty         INT,
    channel     VARCHAR,
    sale_date   VARCHAR
) LOCATION '~/delta-data/sales_facts'
PARTITIONED BY (region);

INSERT INTO demo_zone.delta_demos.sales_facts VALUES
    (1,  101, 'us-east', 'Q1-2024', 1200.00, 10, 'online',    '2024-01-05'),
    (2,  102, 'us-east', 'Q1-2024', 850.50,  5,  'retail',    '2024-01-12'),
    (3,  103, 'us-east', 'Q1-2024', 2300.00, 20, 'wholesale', '2024-01-19'),
    (4,  104, 'us-east', 'Q1-2024', 475.00,  3,  'online',    '2024-01-26'),
    (5,  105, 'us-east', 'Q2-2024', 320.00,  0,  'retail',    '2024-04-03'),
    (6,  106, 'us-east', 'Q2-2024', 1550.00, 12, 'wholesale', '2024-04-10'),
    (7,  107, 'us-east', 'Q2-2024', 690.00,  7,  'online',    '2024-04-17'),
    (8,  108, 'us-east', 'Q2-2024', 3100.00, 25, 'retail',    '2024-04-24'),
    (9,  109, 'us-east', 'Q3-2024', 410.00,  2,  'wholesale', '2024-07-05'),
    (10, 110, 'us-east', 'Q3-2024', 1875.00, 15, 'online',    '2024-07-12'),
    (11, 111, 'us-east', 'Q3-2024', 560.00,  4,  'retail',    '2024-07-19'),
    (12, 112, 'us-east', 'Q3-2024', 2200.00, 18, 'wholesale', '2024-07-26'),
    (13, 113, 'us-east', 'Q4-2024', 995.00,  8,  'online',    '2024-10-05'),
    (14, 114, 'us-east', 'Q4-2024', 1430.00, 11, 'retail',    '2024-10-12'),
    (15, 115, 'us-east', 'Q4-2024', 780.00,  6,  'wholesale', '2024-10-19'),
    (16, 201, 'us-west', 'Q1-2024', 1450.00, 12, 'online',    '2024-01-08'),
    (17, 202, 'us-west', 'Q1-2024', 920.00,  8,  'retail',    '2024-01-15'),
    (18, 203, 'us-west', 'Q1-2024', 3400.00, 30, 'wholesale', '2024-01-22'),
    (19, 204, 'us-west', 'Q1-2024', 610.00,  5,  'online',    '2024-01-29'),
    (20, 205, 'us-west', 'Q2-2024', 250.00,  0,  'retail',    '2024-04-06'),
    (21, 206, 'us-west', 'Q2-2024', 1780.00, 14, 'wholesale', '2024-04-13'),
    (22, 207, 'us-west', 'Q2-2024', 830.00,  6,  'online',    '2024-04-20'),
    (23, 208, 'us-west', 'Q2-2024', 2650.00, 22, 'retail',    '2024-04-27'),
    (24, 209, 'us-west', 'Q3-2024', 540.00,  4,  'wholesale', '2024-07-08'),
    (25, 210, 'us-west', 'Q3-2024', 1990.00, 16, 'online',    '2024-07-15'),
    (26, 211, 'us-west', 'Q3-2024', 720.00,  5,  'retail',    '2024-07-22'),
    (27, 212, 'us-west', 'Q3-2024', 2850.00, 24, 'wholesale', '2024-07-29'),
    (28, 213, 'us-west', 'Q4-2024', 1100.00, 9,  'online',    '2024-10-08'),
    (29, 214, 'us-west', 'Q4-2024', 1650.00, 13, 'retail',    '2024-10-15'),
    (30, 215, 'us-west', 'Q4-2024', 890.00,  7,  'wholesale', '2024-10-22'),
    (31, 301, 'eu-west', 'Q1-2024', 980.00,  8,  'online',    '2024-01-10'),
    (32, 302, 'eu-west', 'Q1-2024', 1340.00, 11, 'retail',    '2024-01-17'),
    (33, 303, 'eu-west', 'Q1-2024', 2750.00, 23, 'wholesale', '2024-01-24'),
    (34, 304, 'eu-west', 'Q1-2024', 415.00,  3,  'online',    '2024-01-31'),
    (35, 305, 'eu-west', 'Q2-2024', 190.00,  0,  'retail',    '2024-04-09'),
    (36, 306, 'eu-west', 'Q2-2024', 1620.00, 13, 'wholesale', '2024-04-16'),
    (37, 307, 'eu-west', 'Q2-2024', 750.00,  6,  'online',    '2024-04-23'),
    (38, 308, 'eu-west', 'Q2-2024', 2400.00, 19, 'retail',    '2024-04-30'),
    (39, 309, 'eu-west', 'Q3-2024', 380.00,  3,  'wholesale', '2024-07-10'),
    (40, 310, 'eu-west', 'Q3-2024', 1700.00, 14, 'online',    '2024-07-17'),
    (41, 311, 'eu-west', 'Q3-2024', 630.00,  5,  'retail',    '2024-07-24'),
    (42, 312, 'eu-west', 'Q3-2024', 2100.00, 17, 'wholesale', '2024-07-31'),
    (43, 313, 'eu-west', 'Q4-2024', 870.00,  7,  'online',    '2024-10-10'),
    (44, 314, 'eu-west', 'Q4-2024', 1520.00, 12, 'retail',    '2024-10-17'),
    (45, 315, 'eu-west', 'Q4-2024', 950.00,  8,  'wholesale', '2024-10-24'),
    (46, 401, 'ap-south', 'Q1-2024', 520.00,  4,  'online',    '2024-01-11'),
    (47, 402, 'ap-south', 'Q1-2024', 1180.00, 9,  'retail',    '2024-01-18'),
    (48, 403, 'ap-south', 'Q1-2024', 2900.00, 25, 'wholesale', '2024-01-25'),
    (49, 404, 'ap-south', 'Q1-2024', 350.00,  2,  'online',    '2024-02-01'),
    (50, 405, 'ap-south', 'Q2-2024', 280.00,  0,  'retail',    '2024-04-11'),
    (51, 406, 'ap-south', 'Q2-2024', 1400.00, 11, 'wholesale', '2024-04-18'),
    (52, 407, 'ap-south', 'Q2-2024', 660.00,  5,  'online',    '2024-04-25'),
    (53, 408, 'ap-south', 'Q2-2024', 2050.00, 17, 'retail',    '2024-05-02'),
    (54, 409, 'ap-south', 'Q3-2024', 430.00,  3,  'wholesale', '2024-07-11'),
    (55, 410, 'ap-south', 'Q3-2024', 1560.00, 0,  'online',    '2024-07-18'),
    (56, 411, 'ap-south', 'Q3-2024', 590.00,  4,  'retail',    '2024-07-25'),
    (57, 412, 'ap-south', 'Q3-2024', 1850.00, 15, 'wholesale', '2024-08-01'),
    (58, 413, 'ap-south', 'Q4-2024', 740.00,  6,  'online',    '2024-10-11'),
    (59, 414, 'ap-south', 'Q4-2024', 1290.00, 10, 'retail',    '2024-10-18'),
    (60, 415, 'ap-south', 'Q4-2024', 810.00,  7,  'wholesale', '2024-10-25');

CREATE DELTA TABLE IF NOT EXISTS demo_zone.delta_demos.region_targets (
    region        VARCHAR,
    target_amount DOUBLE,
    target_qty    INT
) LOCATION '~/delta-data/region_targets';

INSERT INTO demo_zone.delta_demos.region_targets VALUES
    ('us-east',  75000.00, 500),
    ('us-west',  60000.00, 400),
    ('eu-west',  45000.00, 350),
    ('ap-south', 30000.00, 250);

-- Partition-scoped DML: only rewrites region=ap-south/ files
UPDATE demo_zone.delta_demos.sales_facts
SET amount = ROUND(amount * 0.90, 2)
WHERE region = 'ap-south';

-- Remove the 5 cancelled orders (qty = 0)
DELETE FROM demo_zone.delta_demos.sales_facts
WHERE qty = 0;

-- QUERIES
-- Row distribution across partition directories
SELECT region, COUNT(*) AS row_count,
       ROUND(SUM(amount), 2) AS total_sales,
       SUM(qty) AS total_qty
FROM demo_zone.delta_demos.sales_facts
GROUP BY region
ORDER BY region;

-- Static pruning: 3 of 4 partitions skipped at plan time
SELECT COUNT(*) AS row_count,
       ROUND(SUM(amount), 2) AS total_sales
FROM demo_zone.delta_demos.sales_facts
WHERE region = 'us-east';

-- JOIN-driven dynamic pruning: dimension filter target_amount > 50000 restricts to {us-east, us-west}
-- Engine log: "File filtering: 2 of 4 files skipped (partition pruning: 2, ...)"
SELECT s.region, t.target_amount,
       COUNT(*) AS fact_rows,
       ROUND(SUM(s.amount), 2) AS actual_sales,
       ROUND(SUM(s.amount) / t.target_amount * 100, 1) AS pct_of_target
FROM demo_zone.delta_demos.sales_facts s
JOIN demo_zone.delta_demos.region_targets t
    ON s.region = t.region
WHERE t.target_amount > 50000
GROUP BY s.region, t.target_amount
ORDER BY s.region;

-- IN-subquery pruning: subquery yields {us-east, us-west}; engine prunes the other 2 partitions
SELECT region,
       COUNT(*) AS row_count,
       ROUND(SUM(amount), 2) AS total_sales
FROM demo_zone.delta_demos.sales_facts
WHERE region IN (
    SELECT region FROM demo_zone.delta_demos.region_targets
    WHERE target_qty >= 400
)
GROUP BY region
ORDER BY region;

-- Anti-join pruning: NOT IN exclusion path
SELECT region,
       COUNT(*) AS row_count,
       ROUND(SUM(amount), 2) AS total_sales
FROM demo_zone.delta_demos.sales_facts
WHERE region NOT IN (
    SELECT region FROM demo_zone.delta_demos.region_targets
    WHERE target_amount < 50000
)
GROUP BY region
ORDER BY region;

-- CONTRAST: channel is NOT the partition column, so no partitions can be pruned
-- Engine log: "File filtering: 0 of 4 files skipped"
SELECT region,
       COUNT(*) AS row_count,
       ROUND(SUM(amount), 2) AS total_sales
FROM demo_zone.delta_demos.sales_facts
WHERE channel = 'online'
GROUP BY region
ORDER BY region;

-- Verify the partition-scoped UPDATE only affected ap-south
SELECT id, region, amount,
       CASE
           WHEN region = 'ap-south' THEN 'Discounted (x0.90)'
           ELSE 'Original price'
       END AS price_status
FROM demo_zone.delta_demos.sales_facts
WHERE id IN (1, 16, 31, 46)
ORDER BY id;

-- CLEANUP
DROP DELTA TABLE IF EXISTS demo_zone.delta_demos.region_targets WITH FILES;
DROP DELTA TABLE IF EXISTS demo_zone.delta_demos.sales_facts WITH FILES;
DROP SCHEMA IF EXISTS demo_zone.delta_demos;
DROP ZONE IF EXISTS demo_zone;

Description

## When to Use Anywhere you have a large partitioned fact table joined against a small dimension table and the dimension carries the filter (star schema). Dynamic partition pruning evaluates the dimension filter first, derives the set of partition values that can possibly match, and opens only those partition directories on the fact side. The rest are skipped without ever reading a Parquet file. This demo layers four flavors side by side (JOIN pruning, IN-subquery pruning, NOT IN anti-join pruning, static WHERE pruning) plus a contrast case where the filter column is NOT the partition column so nothing can be pruned. Run the queries with engine INFO logging on to see the pruning counts. ## What You Will Learn 1. How CREATE DELTA TABLE ... PARTITIONED BY (region) lays out one subdirectory per partition value 2. Static pruning: WHERE region = 'us-east' skips 3 of 4 partition directories at plan time 3. JOIN-driven dynamic pruning: WHERE t.target_amount > 50000 on the dimension prunes fact partitions that cannot join 4. IN-subquery pruning: WHERE region IN (SELECT region FROM dim WHERE ...) evaluates the inner query first, then prunes 5. Anti-join pruning: WHERE region NOT IN (...) is the exclusion form, equally prunable 6. Contrast case: WHERE channel = 'online' cannot prune partitions because channel is NOT the partition column -- every partition directory is scanned 7. Partition-scoped DML: UPDATE ... WHERE region = 'ap-south' only rewrites files inside that one partition directory 8. How to read the engine's file-filtering log lines (e.g. "File filtering: 2 of 4 files skipped") to verify pruning actually happened ## Prerequisites None. The setup seeds 60 fact rows (15 per region, 4 regions) plus a 4-row region_targets dimension. A partition-scoped UPDATE discounts ap-south by 10% and a DELETE removes 5 cancelled orders, leaving 55 rows across 4 partition directories.

Parameters

NameTypeDescription
FACT_PATHDelta table location for the partitioned fact table.
DIM_PATHDelta table location for the region targets dimension.
PARTITION_COLUMNColumn used in PARTITIONED BY. Pick the dimension with the most selective query filters -- not a high-cardinality column like id.

Pitfalls

Open in interactive docs →   DeltaForge home →