GROUP BY

Groups rows by column values for aggregate calculations.

Category: select-grammar

Syntax

GROUP BY <item>, ...

Items:
  <column>
  <expression>

Description

## Overview GROUP BY partitions the result set into groups where all rows share the same values for the specified columns or expressions. Once grouped, each group collapses into a single output row. Any column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause (or be functionally dependent on a GROUP BY column). GROUP BY is the prerequisite for meaningful aggregate calculations such as COUNT, SUM, AVG, MIN, and MAX. ## Execution Order GROUP BY is evaluated after FROM, JOIN, and WHERE. Rows that survive the WHERE filter are partitioned into groups. The HAVING clause (if present) then filters these groups. The SELECT list expressions and window functions are evaluated after grouping. ## Behavior - **Simple grouping**: Groups rows by one or more column references or arbitrary expressions. Two values belong to the same group when they are equal, with NULLs treated as equal to each other for grouping purposes. - **Ordinal references**: GROUP BY supports referencing SELECT list positions by number (e.g., GROUP BY 1, 2). These references are resolved during query planning. - **Aggregate pushdown**: For simple aggregates (COUNT(*), MIN, MAX) over ungrouped or partition-aligned queries, the optimizer can compute results directly from Parquet file metadata statistics without reading row data. This optimization applies when the grouping columns align with the table's partition structure. - **NULL handling**: All NULLs in a grouping column are placed into a single group. ## Compatibility GROUP BY conforms to the SQL:1999 and SQL:2003 standards. Ordinal position references in GROUP BY follow widespread convention supported by most SQL engines. ROLLUP, CUBE, and GROUPING SETS are not supported; use explicit UNION ALL over multiple grouped queries for the equivalent result.

Examples

-- Basic grouping with COUNT
SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region;
-- Group by expression
SELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
-- Multiple aggregates with GROUP BY
SELECT
  department,
  COUNT(*) AS headcount,
  AVG(salary) AS avg_salary,
  MIN(hire_date) AS earliest_hire,
  MAX(hire_date) AS latest_hire
FROM employees
GROUP BY department
ORDER BY headcount DESC;
-- Group by ordinal position (column number)
SELECT status, priority, COUNT(*)
FROM tickets
GROUP BY 1, 2
ORDER BY 3 DESC;
-- Combine with HAVING to filter groups
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;

See Also

Open in interactive docs →   DeltaForge home →