Page Content

Tutorials

What Are The SubQueries in PostgreSQL With Code Example

SubQueries in PostgreSQL

Subqueries, also known as sub-selects or nested queries, allow one SELECT statement to be contained within another query in PostgreSQL. This allows the inner query result to be utilised as input for the outer query, forming a larger query. Subqueries can return a table or a list of values and be used in the FROM clause (where they create a “derived table” that must be aliased), the SELECT list (as scalar subqueries that return a single row and column), and the WHERE clause for filtering with IN, NOT IN, EXISTS, NOT EXISTS, ANY, or ALL.

The LIMIT clause in PostgreSQL enables scalar subqueries. To optimise performance, PostgreSQL can decorrelate various classes of correlated subqueries during query rewrite and combine a noncorrelated subquery into the upper-level query block. Some linked subqueries cannot be decorrelated. Understanding and optimising subquery performance requires the EXPLAIN command, which shows how PostgreSQL’s planner will run the query. PostgreSQL introduced lateral subqueries (using LATERAL JOIN since version 9.3), which allow a subquery in the FROM clause to reference columns from preceding table expressions, making them a simpler and faster alternative to complex window functions or inefficient correlated subqueries.

Usage Contexts

In the FROM Clause: Subqueries in the FROM Clause are easy and popular. The main query uses a “derived table” created by FROM clause subqueries. From clause subqueries must have clear aliases. This method can streamline complicated queries, particularly those that need aggregation or grouping that cannot be simplified to a simple join. For instance, the outer query can filter or join the aggregated data that a subquery computes.

In the WHERE Clause (Filtering Conditions):Within the WHERE clause, subqueries are commonly used to filter data according to conditions that are generated from the results of another query.

IN and NOT IN: The operations IN and NOT IN are employed when a list of values is returned by the subquery. In the set that the subquery returns, the IN predicate determines whether a value is present. NOT IN, on the other hand, determines if a value is present in the result set of the subquery. Functionally, NOT IN() is equivalent to <> ALL() while IN() is equivalent to = ANY(). Note that NULL values in IN subqueries can create unexpected results or delay. Avoid this by adding a subquery WHERE IS NOT NULL condition.

EXISTS and NOT EXISTS: If subquery produces rows, EXISTS returns true, else false. Subqueries end when they locate rows. The subquery returns true if no rows are returned. These are useful for complex comparisons that require the subquery rather than the data.

ANY/SOME and ALL: ALL and ANY/SOME allow multi-row subquery comparisons. ANY (or SOME) returns true if the comparison operator is true for all subquery values. Comparison operator true for all subquery values yields true.

In the SELECT List (Scalar Subqueries): SELECT, WHERE, and GROUP BY clauses can employ scalar subqueries. Dynamic or aggregated values might appear as columns in the primary query output. These subqueries can employ the WHERE clause to reference main query components, enabling complex, row-specific computations.

In UPDATE, INSERT, and DELETE Statements: Subqueries can describe which rows to update or add and which values to utilise in DML statements. In a UPDATE statement, a sub-select can determine column values based on conditions or data from other tables. You can also INSERT data FROM SELECT statements.

In the LIMIT Clause: Scalar subqueries are specially supported by PostgreSQL. This can change how many rows the query returns on the fly.

Code Example:

DROP TABLE IF EXISTS employees, departments;
CREATE TABLE employees(id SERIAL, name TEXT, dept_id INT, salary INT);
CREATE TABLE departments(id SERIAL, dept_name TEXT);

INSERT INTO departments VALUES (1,'HR'),(2,'IT'),(3,'Sales');
INSERT INTO employees(name,dept_id,salary) VALUES
('Alice',1,50000),('Bob',2,60000),('Charlie',2,70000),('Diana',3,40000);

SELECT dept_id, avg_salary
FROM (SELECT dept_id, AVG(salary) AS avg_salary FROM employees GROUP BY dept_id) t
WHERE avg_salary>50000;

SELECT name,salary FROM employees
WHERE salary>(SELECT AVG(salary) FROM employees);

SELECT name FROM employees
WHERE dept_id IN(SELECT id FROM departments WHERE dept_name='IT');

SELECT dept_name FROM departments d
WHERE EXISTS(SELECT 1 FROM employees e WHERE e.dept_id=d.id);

SELECT name FROM employees
WHERE salary>ALL(SELECT salary FROM employees WHERE dept_id=1);

SELECT name,(SELECT dept_name FROM departments d WHERE d.id=e.dept_id) dept
FROM employees e;

UPDATE employees SET salary=salary+5000
WHERE dept_id=(SELECT id FROM departments WHERE dept_name='Sales');

INSERT INTO employees(name,dept_id,salary)
SELECT 'Eve',id,55000 FROM departments WHERE dept_name='HR';

SELECT name,salary FROM employees
ORDER BY salary DESC
LIMIT (SELECT COUNT(*) FROM employees WHERE dept_id=2);

Output:

DROP TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 3
INSERT 0 4
 dept_id |     avg_salary     
---------+--------------------
       2 | 65000.000000000000
(1 row)

  name   | salary 
---------+--------
 Bob     |  60000
 Charlie |  70000
(2 rows)

  name   
---------
 Bob
 Charlie
(2 rows)

 dept_name 
-----------
 HR
 IT
 Sales
(3 rows)

  name   
---------
 Bob
 Charlie
(2 rows)

  name   | dept  
---------+-------
 Alice   | HR
 Bob     | IT
 Charlie | IT
 Diana   | Sales
(4 rows)

UPDATE 1
INSERT 0 1
  name   | salary 
---------+--------
 Charlie |  70000
 Bob     |  60000
(2 rows)

Correlated Subqueries

Correlated subqueries depend on the outer query for execution. A correlated subquery is assessed repeatedly for every row processed by the outer query, unlike a standard subquery. WHERE clauses in subqueries reference columns from the outer query, causing this dependency. SELECT subqueries that reference a primary query table by alias in their WHERE conditions are executed for each record from the main table.

Correlated subqueries are useful for complex comparisons and calculations, such as determining the oldest buddy in each state or dynamically counting items per category, but their looped nature can pose performance concerns. PostgreSQL’s optimiser tries to “decorrelate” several types of connected subqueries to build more efficient execution plans, although it fails sometimes. When performance bottlenecks occur, rewriting associated subqueries with INNER JOIN or IN may enhance speed.

Performance and Optimization

Careless use of subqueries can result in query designs that are ineffective. One typical error is to regard subqueries as autonomous “black boxes” or to overuse them in SELECT lists. This “piecemeal thinking” can lead to slower execution and numerous pointless table scans.

Finding the plan with the lowest predicted cost is the main goal of PostgreSQL cost-based optimiser. It attempts to “collapse” subqueries into the parent query in order to develop an execution strategy that is more effective. However, the planner may choose not to collapse a subquery if doing so will produce an excessive number of FROM items in order to save planning time, which could result in a less-than-ideal plan.

To comprehend and enhance subquery queries, the EXPLAIN command is a great tool. The execution plan is shown, with indentation indicating nested operations, and should be read bottom-up. The query is conducted using EXPLAIN (ANALYSE), which also gives real-time runtime data to help locate bottlenecks. VACUUM ANALYSE and other similar tools are essential because they gather statistics that the query optimiser utilises to create effective plans.

Frequently, supplementary features or alternatives outperform subqueries with subpar design:

Joins: Subqueries in the FROM clause create a “derived table” that the main query can use. The derived table can be joined to other tables, views, or subqueries using normal join syntax. Allocating an alias to any FROM clause subquery is crucial. This method simplifies complex queries, especially those that include grouping or aggregation and cannot be reduced to a join. Subqueries can join their aggregated or filtered results to another table in the outer query.

Common Table Expressions (CTEs): The WITH clause allows a named subquery to be referenced numerous times using Common Table Expressions (CTEs). By splitting complex logic into smaller, reusable bits, programming becomes more visible and modular. Some CTEs boost performance, such recursive searches. If the optimiser does not fold the results into the primary query, CTEs may be “materialised,” blocking indexes from the tables.

Lateral Subqueries: Tables before subqueries in the FROM list can be referenced by LATERAL JOINS. Rerunning a subquery for every outer table entry improves performance.

CASE Expressions: The powerful SQL construct CASE expressions enable conditional reasoning like IF-THEN-ELSE statements in other programming languages. They can be used in the SELECT list, WHERE clause, and GROUP BY clause to filter or dynamically output based on conditions. A CASE expression analyses a sequence of conditions (WHEN) and returns the first expression whose condition is met, NULL if no condition is true and an ELSE clause is present. For checking expression equality against many values, CASE syntax is simplified.

Subqueries are a basic and adaptable component of SQL that enable sophisticated data retrieval and manipulation, to sum up. To write effective and maintainable database queries, it is essential to comprehend their behaviour, particularly correlation and performance consequences.

Kowsalya
Kowsalya
Hi, I'm Kowsalya a B.Com graduate and currently working as an Author at Govindhtech Solutions. I'm deeply passionate about publishing the latest tech news and tutorials that bringing insightful updates to readers. I enjoy creating step-by-step guides and making complex topics easier to understand for everyone.
Index