SPARK_PARTITION_ID

Return the id of the physical partition that produced the current row.

Category: miscReturns: INTDialect: Standard

Syntax

SPARK_PARTITION_ID()

Description

## Overview Returns the zero-based id of the physical partition that produced the current row. Use this function to diagnose data layout, identify partition skew, or correlate rows with the underlying parallel task that read them. It is a diagnostic primitive and is not intended for business logic. ## Behavior - Returns an INT partition id starting at 0. - The value is assigned by the execution engine and depends on the query plan and parallelism settings. - Returns NULL when called outside of a scan context (for example, on constants or non-scan subqueries). - Running the same query twice can produce different partition ids for the same rows. - Side effect free. ## Compatibility - Standard analytical SQL primitive for exposing physical partitioning to queries.

Examples

-- Inspect partition assignment per row
SELECT SPARK_PARTITION_ID() AS partition_id, *
FROM obs.catalog.events;
-- Count rows per partition to diagnose skew
SELECT SPARK_PARTITION_ID() AS pid, COUNT(*) AS row_count
FROM obs.catalog.events
GROUP BY SPARK_PARTITION_ID()
ORDER BY row_count DESC;
-- Combine with INPUT_FILE_NAME to trace a row back to its source file
SELECT SPARK_PARTITION_ID() AS pid, INPUT_FILE_NAME() AS file, event_id
FROM obs.catalog.events
LIMIT 50;
-- Do not use as a stable identifier; partition ids are execution-local
SELECT SPARK_PARTITION_ID() AS pid, COUNT(*)
FROM obs.catalog.events
GROUP BY pid;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →