What Is the SUM Function in PostgreSQL, With Code Example

SUM Function in PostgreSQL

SUM is an essential PostgreSQL aggregate function that calculates the total of the non-NULL input values across a set of rows. It accepts smallint, integer, bigint, real, double precision, numeric, interval, and money arguments. If all of the input values are NULL, it returns NULL. When used with the GROUP BY clause, SUM computes a separate total for each group of rows, which allows categorised aggregation such as total sales per city.

It is important to remember that WHERE filters individual rows before aggregation, whereas HAVING filters groups of rows after the SUM calculation is complete. This is why SUM is allowed within HAVING but not within WHERE. SUM can also operate as a window function, performing its computation across a specified “window” or “partition” of rows without collapsing them into a single output row. The OVER clause governs how it behaves as a window function: it can contain PARTITION BY to group rows and ORDER BY to define a logical order inside these groups, allowing for calculations such as running totals.

PostgreSQL is also extensible: users can define their own aggregate functions, much like SUM, by specifying a state transition function and an optional final function. Tools such as DBeaver provide an “Aggregate panel” and a “Grouping Panel” that make it simple to use SUM and other aggregate functions for data analysis. To optimise queries that use SUM, the EXPLAIN command, particularly with the ANALYZE option, can be used to inspect the query plan and pinpoint aggregation-related performance bottlenecks.

Basic Usage and Concepts

SUM is one of PostgreSQL’s fundamental aggregate functions: it processes many input rows and returns a single total of their non-NULL values. It supports the smallint, integer, bigint, real, double precision, numeric, interval, and money data types. If all input values in the set are NULL, or there are no input rows at all, SUM returns NULL, not zero. When a zero result is needed instead, the COALESCE function can be wrapped around the aggregate, as in COALESCE(SUM(x), 0).
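The NULL handling described above can be illustrated with a minimal sketch. The table name payments and its columns are hypothetical and exist only for this example.

```sql
-- Hypothetical table used only for illustration.
CREATE TEMP TABLE payments (customer TEXT, amount NUMERIC);

INSERT INTO payments VALUES
('ann', 10),
('ann', NULL),   -- NULL inputs are skipped by SUM
('bob', NULL);   -- bob has only NULL amounts

-- NULL inputs are ignored: ann's total is 10, bob's total is NULL.
SELECT customer, SUM(amount) AS total
FROM payments
GROUP BY customer
ORDER BY customer;

-- Wrapping the aggregate in COALESCE turns bob's NULL total into 0.
SELECT customer, COALESCE(SUM(amount), 0) AS total
FROM payments
GROUP BY customer
ORDER BY customer;

-- With no matching rows at all, SUM still returns one row containing NULL.
SELECT SUM(amount) AS total
FROM payments
WHERE customer = 'nobody';
```

Note that COALESCE is applied to the result of SUM here. Applying it to each input value instead, as in SUM(COALESCE(amount, 0)), would still yield NULL when no rows are selected.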

SUM is usually used with the GROUP BY clause to total data by category. The GROUP BY clause groups rows by column values, and SUM calculates a total for each group. Because WHERE is evaluated before aggregation occurs, aggregate functions like SUM cannot be used in a WHERE clause to filter rows. The HAVING clause filters groups after aggregation, making it the place to apply conditions on SUM and other aggregate functions.

Interaction with GROUP BY and HAVING Clauses

The GROUP BY clause is used together with SUM to compute totals for particular categories or groups of rows. It groups the input rows by the values of one or more designated columns, and SUM then returns a single total for each group. For example, SELECT city, SUM(sales) FROM stores GROUP BY city returns the total sales for each city.

Knowing the distinction between WHERE and HAVING is crucial for SUM.

Prior to grouping and aggregating the input data, the WHERE clause filters each individual row. In a WHERE clause, SUM and other aggregate functions cannot be utilised because the aggregation has not yet taken place.
Once aggregation has been completed, the HAVING clause filters the groups of rows (the output of the GROUP BY clause). To set conditions on the aggregated results, the HAVING clause can use SUM and other aggregate functions. For instance, SELECT city, SUM(sales) FROM stores GROUP BY city HAVING SUM(sales) > 1000 would only display cities with total sales greater than 1000.
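The difference between the two clauses can be sketched with a small stores table. The sample rows and city names below are invented for illustration.

```sql
-- Illustrative data; the rows are made up for this sketch.
CREATE TEMP TABLE stores (city TEXT, sales INT);

INSERT INTO stores VALUES
('Paris', 700),
('Paris', 600),
('Rome',  400),
('Rome',  300),
('Oslo', 1200);

-- One total per city: Oslo 1200, Paris 1300, Rome 700.
SELECT city, SUM(sales) AS total_sales
FROM stores
GROUP BY city
ORDER BY city;

-- WHERE removes individual rows (sales below 500) before SUM runs,
-- so Rome disappears and the other totals only include the surviving rows.
SELECT city, SUM(sales) AS total_sales
FROM stores
WHERE sales >= 500
GROUP BY city
ORDER BY city;

-- HAVING filters whole groups after SUM has been computed:
-- only Oslo (1200) and Paris (1300) exceed 1000.
SELECT city, SUM(sales) AS total_sales
FROM stores
GROUP BY city
HAVING SUM(sales) > 1000
ORDER BY city;

-- This would be rejected, because aggregates are not allowed in WHERE:
-- SELECT city FROM stores WHERE SUM(sales) > 1000 GROUP BY city;
```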

SUM as a Window Function

SUM can also be used as a window function in PostgreSQL. Unlike an ordinary aggregate, a window function computes across a related group of rows (a “window” or “partition”) without combining them into a single output row. Each row in the query result retains its individual identity.

SUM’s OVER clause controls its window function behaviour.

PARTITION BY: Inside the OVER clause, PARTITION BY groups the query’s rows much like GROUP BY, but it only divides them for the window function’s computation and does not reduce the number of output rows. If PARTITION BY is omitted, the whole set of query rows is treated as a single partition.

ORDER BY: Inside the OVER clause, ORDER BY sorts the rows within each partition. This affects the sum, most notably by producing “running totals”: SUM(salary) OVER (ORDER BY salary) computes a cumulative sum from the start of the partition through the current row, including any rows whose salary duplicates the current row’s value.
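The effect of duplicate ORDER BY values on a running total is easy to miss, so here is a minimal sketch. The pay table is hypothetical. The second column uses an explicit ROWS frame, a standard window-frame option not discussed above, to get a strict row-by-row total for comparison.

```sql
-- Hypothetical table; b and c share the same salary on purpose.
CREATE TEMP TABLE pay (emp TEXT, salary INT);

INSERT INTO pay VALUES
('a', 100),
('b', 200),
('c', 200),
('d', 300);

SELECT
    emp,
    salary,
    -- Default frame: rows with equal salary are summed together,
    -- so b and c both show 500.
    SUM(salary) OVER (ORDER BY salary) AS running_with_duplicates,
    -- Explicit ROWS frame with a tie-breaker: a strict row-by-row total.
    SUM(salary) OVER (
        ORDER BY salary, emp
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_row_by_row
FROM pay
ORDER BY salary, emp;
```

With this data the first column yields 100, 500, 500, 800 and the second yields 100, 300, 500, 800.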

SUM and other window functions are only allowed in the SELECT list and the ORDER BY clause of a query. Logically, they are evaluated after WHERE, GROUP BY, and HAVING have been processed. A window function’s arguments can contain aggregate function calls (e.g., SUM(COUNT(*)) OVER ()), but not vice versa.
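As a rough sketch of nesting an aggregate inside a window function, the query below first totals each group and then sums those totals across all groups. The orders_demo table and its values are assumptions made for this example.

```sql
-- Hypothetical table used only for this sketch.
CREATE TEMP TABLE orders_demo (region TEXT, amount INT);

INSERT INTO orders_demo VALUES
('north', 10),
('north', 20),
('south', 30),
('south', 40),
('south', 50);

SELECT
    region,
    SUM(amount)              AS region_total,  -- ordinary aggregate per group
    SUM(SUM(amount)) OVER () AS grand_total,   -- window SUM over the group totals
    COUNT(*)                 AS region_rows,
    SUM(COUNT(*)) OVER ()    AS all_rows       -- window SUM over the group counts
FROM orders_demo
GROUP BY region
ORDER BY region;
```

Here north shows a region_total of 30 and south shows 120, while grand_total is 150 and all_rows is 5 on both rows, because the window function runs after GROUP BY has produced one row per region.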

Code Example:

CREATE TABLE employees (
    dept TEXT,
    emp  TEXT,
    salary INT
);

INSERT INTO employees VALUES
('HR', 'Alice', 5000),
('HR', 'Bob',   4000),
('IT', 'Eve',   7000),
('IT', 'Tom',   6000),
('IT', 'Sam',   5000);

-- Using SUM as a window function
SELECT 
    dept,
    emp,
    salary,
    SUM(salary) OVER (PARTITION BY dept) AS dept_total,
    SUM(salary) OVER (PARTITION BY dept ORDER BY salary) AS running_total
FROM employees
ORDER BY dept, salary;  -- make the output order deterministic

Output:

CREATE TABLE
INSERT 0 5
 dept |  emp  | salary | dept_total | running_total 
------+-------+--------+------------+---------------
 HR   | Bob   |   4000 |       9000 |          4000
 HR   | Alice |   5000 |       9000 |          9000
 IT   | Sam   |   5000 |      18000 |          5000
 IT   | Tom   |   6000 |      18000 |         11000
 IT   | Eve   |   7000 |      18000 |         18000
(5 rows)

User-Defined Aggregates

PostgreSQL also supports user-defined aggregate functions, letting users go beyond SUM. To define a new aggregate, users must specify the data type of an internal state value, an initial value for that state, and a state transition function. The transition function is an ordinary function that updates the internal state value for each input row in the group. If the desired result differs from the running state value, an optional final function can compute the aggregate’s return value from the accumulated state.

For example, to make SUM work on complex numbers, define a complex addition function as the state transition function and set the initial condition to ‘(0,0)’. Because PostgreSQL supports function overloading, it selects which sum aggregate to apply based on the argument’s data type. A custom sum aggregate with no initial condition (a NULL initial state) and a “strict” transition function returns NULL when there are no non-NULL input values, matching the behaviour the SQL standard requires of SUM.

These custom aggregates are registered with the CREATE AGGREGATE command, which links the state transition function (SFUNC), state type (STYPE), optional final function (FINALFUNC), and initial condition (INITCOND). Aggregates can also be polymorphic: array_accum, for example, collects all of its inputs into an array of the actual input type.
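The pieces described above can be put together in a minimal sketch of a SUM-like aggregate over numeric. The names my_sum and my_sum_step are hypothetical, and this is an illustration of the mechanism rather than how the built-in SUM is implemented.

```sql
-- Hypothetical state transition function: adds the next value to the state.
-- STRICT means it is not called for NULL inputs, so those are skipped.
CREATE FUNCTION my_sum_step(state NUMERIC, val NUMERIC)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
STRICT
AS $$ SELECT state + val $$;

-- Hypothetical aggregate built on that function.
CREATE AGGREGATE my_sum (NUMERIC) (
    SFUNC = my_sum_step,
    STYPE = NUMERIC
    -- No INITCOND: the state starts as NULL, and with a strict transition
    -- function the first non-NULL input becomes the initial state.
);

-- NULL inputs are ignored, so this returns 3.5.
SELECT my_sum(x) FROM (VALUES (1::NUMERIC), (NULL), (2.5)) AS t(x);

-- With no non-NULL inputs the result is NULL, mirroring SUM.
SELECT my_sum(x) FROM (VALUES (NULL::NUMERIC)) AS t(x);
```

No FINALFUNC is needed here because the running state is already the desired result. An aggregate such as an average would keep a sum and a count in its state and use a final function to divide them.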

Performance Considerations

The performance of SUM depends on how the query planner and executor handle aggregation. PostgreSQL’s cost-based query planner chooses among candidate plans by estimating their disk and CPU costs. Those estimates rely on table statistics, which are refreshed by ANALYZE and VACUUM ANALYZE. If the statistics are out of date, the planner may choose an inefficient execution strategy.

PostgreSQL implements grouped aggregation with either sorting or hashing. Roughly speaking, the planner prefers in-memory hash-based aggregation when the number of groups is small enough to fit in memory; otherwise, it uses sort-based aggregation. The work_mem setting is critical here because it defines the maximum memory each such operation may use before spilling to temporary disk files. A complex query may run several sort and hash operations at once, and each of them can use up to work_mem.
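One way to observe the planner’s choice is to inspect the plan under different work_mem settings. The sketch below assumes a hypothetical sales_demo table filled with generated data. The plan actually chosen depends on the PostgreSQL version, configuration, and statistics, so no particular output is implied.

```sql
-- Hypothetical table: 100,000 rows spread over 1,000 store ids.
CREATE TEMP TABLE sales_demo AS
SELECT (g % 1000) AS store_id, g AS amount
FROM generate_series(1, 100000) AS g;

-- Refresh statistics so the planner can estimate the number of groups.
ANALYZE sales_demo;

-- Show the current per-operation memory limit.
SHOW work_mem;

-- Raise the limit for this session only, then look at the plan.
-- A HashAggregate node indicates hash-based aggregation;
-- a GroupAggregate node above a Sort indicates sort-based aggregation.
SET work_mem = '64MB';
EXPLAIN
SELECT store_id, SUM(amount)
FROM sales_demo
GROUP BY store_id;

-- Return to the configured default.
RESET work_mem;
```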

PostgreSQL can parallelise query plans across several CPU cores, which can improve the performance of SUM, notably for analytical queries over huge datasets. In parallel aggregation, each background worker computes a partial sum, and the leader process combines these partial results into the final result. However, parallel aggregation is not supported for ordered-set aggregates or for aggregate calls that contain a DISTINCT or ORDER BY clause.
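Whether a given SUM query is parallelised can be checked with EXPLAIN. The following is a sketch under assumptions: big_demo is a hypothetical table, and the planner may still choose a serial plan if it judges the table too small or no workers are available.

```sql
-- Hypothetical regular table (temporary tables cannot be scanned in parallel).
CREATE TABLE big_demo AS
SELECT g AS amount
FROM generate_series(1, 2000000) AS g;

ANALYZE big_demo;

-- Allow up to two workers per Gather node for this session.
SET max_parallel_workers_per_gather = 2;

-- In a parallel plan, look for Partial Aggregate nodes under a Gather node,
-- topped by a Finalize Aggregate node that combines the partial sums.
EXPLAIN
SELECT SUM(amount) FROM big_demo;

-- For comparison: an aggregate call containing DISTINCT.
EXPLAIN
SELECT SUM(DISTINCT amount) FROM big_demo;

RESET max_parallel_workers_per_gather;
DROP TABLE big_demo;
```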

JIT (Just-In-Time) compilation, available in PostgreSQL 11 and later, can accelerate expression evaluation, including aggregates, by generating native code specialised for the expressions and data types involved. This is especially useful for CPU-bound, long-running queries that process large amounts of data, but the compilation overhead may outweigh the benefit for short-running queries.
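The JIT settings can be inspected and adjusted per session. This sketch assumes a server built with JIT support. On builds without it, the settings exist but no compilation takes place.

```sql
-- Check whether JIT is enabled and at what estimated cost it kicks in.
SHOW jit;
SHOW jit_above_cost;

-- Run a CPU-bound aggregate; when JIT was used, the EXPLAIN ANALYZE
-- output includes a "JIT:" section with compilation timings.
EXPLAIN (ANALYZE)
SELECT SUM(g) FROM generate_series(1, 1000000) AS g;

-- Disable JIT for this session to compare timings for short queries.
SET jit = off;
EXPLAIN (ANALYZE)
SELECT SUM(g) FROM generate_series(1, 1000000) AS g;

RESET jit;
```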

The EXPLAIN command, especially with the ANALYZE option, is essential for analysing and optimising SUM queries. It shows the query plan with its estimated costs, access methods (e.g., sequential scan, index scan), and join methods; with ANALYZE, it also reports actual execution times and row counts. Developers can use this to discover bottlenecks and optimise queries by adding indexes or revising GROUP BY and HAVING clauses.
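A typical inspection of a grouped SUM query might look like the sketch below. The sales_explain table and its generated contents are hypothetical. Keep in mind that EXPLAIN with ANALYZE actually executes the statement.

```sql
-- Hypothetical table with generated data spread over five cities.
CREATE TEMP TABLE sales_explain AS
SELECT 'city_' || (g % 5) AS city, (g % 100) AS sales
FROM generate_series(1, 50000) AS g;

ANALYZE sales_explain;

-- Estimated plan only: the query is not executed.
EXPLAIN
SELECT city, SUM(sales)
FROM sales_explain
GROUP BY city
HAVING SUM(sales) > 1000;

-- Executes the query and adds actual times, row counts, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT city, SUM(sales)
FROM sales_explain
GROUP BY city
HAVING SUM(sales) > 1000;
```

Comparing the estimated row counts with the actual ones is a quick way to spot stale statistics, which in turn suggests running ANALYZE on the table.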

DBeaver Support

The SUM function in PostgreSQL is extensively supported by DBeaver, which treats it as a crucial aggregate function for data processing and analysis. Through DBeaver’s graphical user interface, users may immediately utilise SUM and other aggregate methods, simplifying data aggregations without requiring manual SQL coding for each instance.

The Grouping panel in DBeaver is specifically designed to compute statistics, such as sums, based on a table or a custom SQL query. It uses GROUP BY queries to extract unique values and computes analytics functions including COUNT, SUM, AVG, MIN, and MAX, displaying the results in dedicated columns. COUNT is the default, but users can readily add SUM and other functions to their grouping.

Additionally, SUM is explicitly supported for grouping and aggregating in DBeaver’s Visual Query Builder. Within this builder, users can choose an aggregate function such as COUNT, AVG, MAX, MIN, or SUM to process their selected data visually. This feature enables the Grouping Panel to produce aggregated results that can subsequently be displayed in a variety of chart forms. By offering a user-friendly graphical method for aggregations, DBeaver’s integration of SUM into various tools simplifies the analysis of PostgreSQL data.
