INPUT_FILE_NAME

Return the path of the underlying data file from which the current row was read.

Category: misc | Returns: STRING | Dialect: Standard

Syntax

INPUT_FILE_NAME()

Description

## Overview

Returns the path of the data file that produced the current row, typically a Parquet file inside a Delta table. Use this function for data lineage, debugging partition layout, or narrowing a query to specific input files.

## Behavior

- Returns a STRING containing the storage path of the data file.
- Returns NULL when evaluated outside of a table scan context (for example, in a subquery over constant values).
- The path format reflects the underlying storage URI (local, S3, Azure Blob, etc.).
- Deterministic for a specific scan; the same row always reports the same file.
- Side effect free.

## Compatibility

- Standard analytical SQL primitive for exposing physical storage location to queries.

Examples

-- Attach the source file to every row during a scan
SELECT INPUT_FILE_NAME() AS source_file, *
FROM obs.catalog.events;

-- Count rows per source file to diagnose skew or late-arriving data
SELECT INPUT_FILE_NAME() AS file, COUNT(*) AS row_count
FROM obs.catalog.events
GROUP BY INPUT_FILE_NAME()
ORDER BY row_count DESC;

-- Filter to rows from a specific Parquet part file
SELECT *
FROM obs.catalog.events
WHERE INPUT_FILE_NAME() LIKE '%part-00001%';
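
The NULL case noted under Behavior can be checked with a query that never scans a table. This is a sketch only: it assumes the engine accepts a derived table of constants in the FROM clause, and the alias `constants` and column `x` are made up for illustration.

```sql
-- No data file backs these rows (hypothetical constant-only derived table),
-- so per the Behavior notes source_file is expected to be NULL
SELECT INPUT_FILE_NAME() AS source_file
FROM (SELECT 1 AS x) AS constants;
```

If a query that is expected to report file paths returns NULL instead, a constant-only source like this one is a likely cause.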

Pitfalls

See Also
