Page Content

Tutorials

What Are The UNION Operator in PostgreSQL With Example

UNION Operator in PostgreSQL

Two or more SELECT queries can be combined vertically into one result set using PostgreSQL’s UNION operator. Appends rows from one query to another. UNION removes duplicate rows by default, like SELECT DISTINCT. The ALL keyword must be used to include all rows, including duplicates, as in UNION ALL .

To function with UNION, queries must be “union compatible,” meaning they return the same number of columns and have compatible data types. PostgreSQL can evaluate mixed-type expressions and conduct implicit type conversions. If type compatibility concerns emerge, explicit type conversion may be needed. Avoid type resolution issues by ensuring the leftmost UNION has an input of the desired result type when merging NULL items with different types.

UNION Operator in PostgreSQL
UNION Operator in PostgreSQL

Basic Syntax

Two or more SELECT statements are used in the basic UNION syntax, with the UNION keyword separating them:

<query1> UNION <query2>;

UNION allows you to combine an infinite number of SELECT statements. When merging queries, the first subquery’s output columns are usually used for the combined result set.

The UNION operator’s duplicate row handling is crucial. Like SELECT DISTINCT, UNION removes duplicate entries from the result set by default. Use ALL to keep all rows, including duplicates. The UNION ALL clause ensures that every row from each SELECT expression is produced, preserving duplicates.

Compatibility Requirements

Strict compatibility guidelines must be followed in order for UNION to correctly combine queries:

Same Number of Columns: Due to a compatibility restriction, PostgreSQL’s UNION operator can only join the result sets of two or more SELECT queries that return the same number of columns. If the first SELECT query in a UNION operation gives two columns, all following queries must return two columns. This condition guarantees the combined result set is constant, allowing vertical row concatenation. Without this rule, PostgreSQL cannot align data properly, causing an error.

Compatible Data Types: Every SELECT statement needs to have corresponding columns with suitable data types. A type resolution mechanism will be used by PostgreSQL to identify a common data type for every output column across all queries, however they don’t have to be the same.

There will be a problem if a compatible type cannot be located. The INTEGER may be implicitly cast to NUMERIC by PostgreSQL, for instance, if one column is INTEGER and the matching column is NUMERIC. They resolve to text if all of the inputs are of an unknown type.

Code Example:

CREATE TABLE students (
    id INT,
    name TEXT
);
CREATE TABLE employees (
    emp_id NUMERIC,  
    emp_name TEXT
);
INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO employees VALUES (3, 'Charlie'), (4, 'Diana');
SELECT id, name FROM students
UNION
SELECT emp_id, emp_name FROM employees;

Output:

CREATE TABLE
CREATE TABLE
INSERT 0 2
INSERT 0 2
 id |  name   
----+---------
  4 | Diana
  3 | Charlie
  1 | Alice
  2 | Bob
(4 rows)

Ordering and Limiting Results

The PostgreSQL UNION operator’s ORDER BY and LIMIT clauses have precise criteria for their use, which control the output. An ORDER BY clause can only be used at the end of a UNION statement to sort the results of all SELECT queries, not each subquery. Sorting columns can be selected by their output column names (inherited from the first SELECT statement in the UNION) or ordinal position. Sorting the final output using ORDER using subqueries is pointless.

The final result set of the UNION operation additionally uses the LIMIT and OFFSET clauses to limit the number of rows returned and skipped. To apply LIMIT or OFFSET to a subquery before the UNION operation, that SELECT query must be placed in brackets to limit its scope. Otherwise, the clause will apply to the result of the entire set operation. When you want a predictable result order, you need an ORDER BY clause and LIMIT.

Without explicit sorting, the query optimizer’s plans can change the row order. The query optimiser actively considers LIMIT, which might result in alternative execution plans and inconsistent row orders if ORDER BY is not utilised to impose a unique and predictable ordering.

UNION vs JOIN

Differentiating between UNION and JOIN operations is essential:

UNION (Vertical Combination): Rows from several queries are combined into a single column set using UNION (Vertical Combination). The total of the rows (less duplicates if ALL is not used) determines the number of rows, while the number of columns stays the same as queries. Combining a list of aquatic and terrestrial creatures is one example of how this might be helpful when listing all entities from various, possibly unrelated data.

JOIN (Horizontal Combination): An INNER JOIN, LEFT JOIN, or CROSS JOIN are examples of horizontal combinations that combine columns from two or more tables by matching rows according to a defined condition. The total of the columns from the linked tables determines the number of columns in the result, while the join condition determines the number of rows. UNION is the best option when there isn’t a logical join key or when combining data from different but structurally related only objective. Using FROM T1, T2, or T1 CROSS JOIN T2, a CROSS JOIN combines each row from one table with every row from another to produce a Cartesian product.

Precedence with Other Set Operators

Certain precedence criteria govern the order in which the UNION operator is evaluated when combined with other SQL set operators, such as INTERSECT and EXCEPT, in a single statement. It is essential to comprehend these guidelines since they specify how complicated queries should be interpreted and run.

In PostgreSQL, the UNION and EXCEPT operations are automatically arranged from left to right. This implies that in a sequence such as query1 UNION query2 EXCEPT query3, the EXCEPT operation would be applied to that intermediate result and query3 after the UNION operation between queries 1 and 2 has been processed. Regarding the general precedence among these operators,however, provide contradictory information.

Role in Recursive Queries

Common Table Expressions (CTEs) require the UNION operator, notably UNION ALL, especially in recursive searches. There are usually two components to a recursive CTE:

Non-Recursive Term (Anchor Member): This is the first SELECT query that creates the base result set. The CTE itself cannot be referred to, and it is only run once.

Recursive Term: The non-recursive term is joined with the CTE itself using UNION or UNION ALL in this SELECT statement. Until it generates no more rows, this term is assessed iteratively.

In some situations, recursive CTEs can avoid infinite loops by using UNION (without ALL), which discards duplicate entries created during the iterative steps. Each iteration’s rows are included in the UNION ALL variation. Calculating factorials or finding a product’s subparts often requires recursive searches.

Overall, PostgreSQL’s UNION operator is powerful for vertically integrating SELECT operations. Controlling duplicate rows is crucial to advanced query tactics like recursive CTEs. Its appropriate use necessitates consideration of data types, column compatibility, and the query’s logical order of operations.

Kowsalya
Kowsalya
Hi, I'm Kowsalya a B.Com graduate and currently working as an Author at Govindhtech Solutions. I'm deeply passionate about publishing the latest tech news and tutorials that bringing insightful updates to readers. I enjoy creating step-by-step guides and making complex topics easier to understand for everyone.
Index