Subqueries

Nested queries used in FROM, WHERE, or SELECT clauses.

Category: select-grammar

Syntax

-- In FROM:
SELECT * FROM (<subquery>) AS alias

-- In WHERE:
WHERE <col> IN (<subquery>)
WHERE EXISTS (<subquery>)

-- Scalar:
SELECT (SELECT MAX(x) FROM t) AS max_val

Description

## Overview A subquery is a complete SELECT statement nested inside another SQL statement. Subqueries can appear in the FROM clause (as derived tables), in the WHERE or HAVING clauses (as filter conditions), or in the SELECT list (as scalar expressions). They enable complex data retrieval patterns that reference intermediate results without requiring temporary tables or CTEs. ## Execution Order The execution order of subqueries depends on their position and correlation: - **Uncorrelated subqueries** (no references to outer query columns) can be evaluated once and their result reused. The optimizer typically executes these first. - **Correlated subqueries** (referencing columns from the outer query) are logically evaluated once per outer row. The optimizer aggressively decorrelates these into join operations when possible. - **Derived tables** (FROM subqueries) are evaluated as part of the FROM clause, before WHERE. ## Behavior - **Derived tables**: A subquery in the FROM clause must be aliased. It produces a temporary result set that the outer query can reference like a regular table. Predicates from the outer WHERE clause can be pushed into the derived table by the optimizer. - **IN subquery**: `WHERE col IN (SELECT ...)` tests whether the column value matches any row in the subquery result. The optimizer rewrites IN subqueries to semi-joins when the subquery is uncorrelated or can be decorrelated, which is significantly more efficient than per-row evaluation. - **EXISTS subquery**: `WHERE EXISTS (SELECT ...)` returns true if the subquery produces at least one row. NOT EXISTS returns true if the subquery produces zero rows. The optimizer rewrites EXISTS to semi-joins and NOT EXISTS to anti-joins. - **Scalar subquery**: A subquery in the SELECT list or in a comparison must return exactly one row and one column. If it returns zero rows, the result is NULL. If it returns more than one row, an error is raised at runtime. - **ALL/ANY/SOME**: Comparison operators combined with ALL or ANY (SOME is a synonym for ANY) compare a value against every row or any row of a subquery. For example, `> ALL (subquery)` means greater than every value in the subquery result. - **Decorrelation and unnesting**: The optimizer converts correlated subqueries into equivalent join operations using PostgreSQL-style subquery unnesting. IN subqueries become semi-joins; EXISTS becomes semi-joins; NOT EXISTS becomes anti-joins. After unnesting, the join optimizer can reorder tables and apply runtime filters. - **Dynamic partition pruning**: When an IN subquery is decorrelated into a semi-join, the optimizer may inject a dynamic partition filter derived from the subquery result. This allows file skipping on the probe side before data is read, dramatically reducing I/O for selective subqueries on partition columns. - **NULL handling**: IN subqueries follow three-valued logic. If the subquery result contains NULL, `x IN (subquery)` may return UNKNOWN rather than FALSE for values not found. Use EXISTS for NULL-safe existence checks. ## Compatibility Subqueries conform to the SQL:1992 standard. Correlated subqueries, scalar subqueries, IN, EXISTS, ALL, ANY, and SOME are all supported. The optimizer's decorrelation and semi-join/anti-join rewriting follows PostgreSQL-style optimization techniques.

Examples

-- Derived table in FROM
SELECT *
FROM (
  SELECT region, COUNT(*) AS cnt
  FROM orders
  GROUP BY region
) sub
WHERE cnt > 100;
-- IN subquery: filter by a set of values
SELECT *
FROM customers
WHERE id IN (
  SELECT customer_id FROM orders WHERE amount > 1000
);
-- EXISTS subquery: check for related rows
SELECT c.name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
    AND o.order_date >= '2025-01-01'
);
-- NOT EXISTS: customers with no orders
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
-- Scalar subquery in SELECT list
SELECT
  name,
  (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
FROM customers c;
-- Correlated subquery in WHERE
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary) FROM employees e2
  WHERE e2.department = e.department
);

See Also

Open in interactive docs →   DeltaForge home →