Page Content

Tutorials

What is the use of Self Join in Oracle with an example?

Self Join in Oracle

Joining a table to itself is known as a self-join. By assigning table aliases, the same table appears twice in the FROM clause but is handled as though it were two distinct tables. Hierarchical relationships, such an employee-manager structure where managers and employees are kept in the same EMPLOYEES table, are frequently handled with this technique. To obtain the manager’s details of an employee, for example, you can link rows inside the same table.

To illustrate a self-join for hierarchical data, let’s look to the FAMILY table example:

-- Create table FAMILY
CREATE TABLE family (
  name       CHAR(10) NOT NULL,
  birth_year NUMBER(4) NOT NULL,
  father     CHAR(10)
);
-- Insert values into FAMILY
INSERT INTO family VALUES ('Moishe', 1894, NULL);
INSERT INTO family VALUES ('Joseph', 1930, 'Moishe');
INSERT INTO family VALUES ('Michael', 1957, 'Joseph');
INSERT INTO family VALUES ('David', 1959, 'Joseph');
INSERT INTO family VALUES ('Ian', 1963, 'Joseph');
INSERT INTO family VALUES ('Baila', 1989, 'Ian');
INSERT INTO family VALUES ('Jillian', 1991, 'Ian');
-- Display all records from FAMILY
SELECT * FROM family;
-- Output:
-- NAME       BIRTH_YEAR FATHER
-- ---------- ---------- ----------
-- Moishe           1894
-- Joseph           1930 Moishe
-- Michael          1957 Joseph
-- David            1959 Joseph
-- Ian              1963 Joseph
-- Baila            1989 Ian
-- Jillian          1991 Ian

To determine each person’s father’s birth year, let’s now do a self-join:

SELECT a.name AS ChildName,
       a.birth_year AS ChildBirthYear,
       a.father AS FatherName,
       b.birth_year AS FatherBirthYear
FROM family a, family b
WHERE a.father = b.name;
-- Output:
-- CHILDNAME  CHILDBIRTHYEAR FATHERNAME FATHERBIRTHYEAR
-- ---------- -------------- ---------- ---------------
-- Joseph               1930 Moishe                1894
-- Michael              1957 Joseph                1930
-- David                1959 Joseph                1930
-- Ian                  1963 Joseph                1930
-- Baila                1989 Ian                   1963
-- Jillian              1991 Ian                   1963

In this self-join, the FAMILY table is referenced twice with aliases a and b. By joining a.father (the father’s name of the current row) to b.name (the name in another row of the same table), we can retrieve additional details about the father, such as their birth year (b.birth_year). This technique allows us to navigate relationships inherently present within a single dataset.

Imagine comparing two identical photo albums side by side. When you compare the images in albums A and B, you are performing an outer join. With a left outer join, all of the images from album A are displayed, and if a photo from album B matches, they are combined. If not, you display the picture from A with a blank area in place of the B-photo. The opposite is an outside join on the right. In a full outer join, all of the photos from both albums are displayed, with blanks for non-matches and matches when possible.

A self-join is similar to having a single photo album in which you wish to identify relationships. You may want to compare a child’s photo to a parent’s photo that is in the same album. If you have two copies of the album, you can find the connection by aligning the parent’s photo in the “second album” with the child’s photo in the “first album.”

Advantages of Self Join

Here are the key advantages and uses of self-joins:

Advantages of Self Join in Oracle
Advantages of Self Join in Oracle
  • Handling Hierarchical Data – Self-joins are particularly well-suited for situations where all related records are stored within the same table, such as a family tree or an employee-manager hierarchy. The CONNECT BY clause is specifically designed to define these hierarchical relationships in queries, allowing you to report on family members, corporate management structures, or other tree-like data structures, and even exclude branches or individuals.
  • Data Retrieval and Relationships – When performing a self-join, Oracle Database combines and returns rows from the table that satisfy a specified join condition. This allows for complex queries that reveal connections between rows of data that reside in the same table. For example, in a FAMILY table, you can join a person’s father column back to the name column within the same table to retrieve the father’s details, such as their birth year.
  • Flexibility in Portraying Inheritance Groups – Self-joins (especially when combined with hierarchical query clauses) offer the advantage of accurately portraying many inheritance groups in more than one way, like families, projects, or company divisions.

When using a self-join, it’s necessary to refer to the table multiple times in the FROM clause, using different table aliases to distinguish between the instances of the table being joined. For example, if joining the EMPLOYEE table to itself to find a manager’s details, you might use aliases E for employees and M for managers.

While self-joins are powerful, it’s important to remember that they are an alternative approach to other join types like inner or outer joins, designed for specific scenarios where a table needs to relate to itself.

Disadvantages of Self Join

While self-joins offer powerful capabilities for revealing relationships within a single table, particularly for hierarchical data, they do come with several disadvantages and specific restrictions:

Disadvantages of Self Join
Disadvantages of Self Join
  1. Complexity – Self-joins can “appear complex on the surface”. This complexity is particularly evident when working with hierarchical queries using the CONNECT BY clause, where the specific set of commands is noted as something “few people are likely to remember correctly”, suggesting a steeper learning curve or a higher propensity for errors in query construction.
  2. Restrictions with Database Links – A significant limitation is that tree-structured queries, which commonly involve self-joins with CONNECT BY, will “largely fail when using database links”. This means that if your data is distributed across multiple databases and you need to access hierarchical information traditional self-joins using CONNECT BY are generally not a viable option.
  3. Limitations with Oracle’s Outer Join Operator ((+)) – You cannot directly use the (+) operator to outer-join a table to itself without employing table aliases. For example, a statement like SELECT employee_id, manager_id FROM employees WHERE employees.manager_id(+) = employees.employee_id; is not valid. While self-joins are otherwise valid, aliases are crucial when using the (+) operator for outer self-joins, as demonstrated by the valid syntax SELECT e1.employee_id, e1.manager_id, e2.employee_id FROM employees e1, employees e2 WHERE e1.manager_id(+) = e2.employee_id;.
  4. Risk of Cartesian Products – Like any join operation, if the join condition for a self-join is not correctly specified, it can result in a Cartesian product. This outcome combines every row from one instance of the table with every row from the other instance, leading to an extremely large and often meaningless result set. For example, an 80-row table joined to itself without a proper condition could yield 6,400 rows.
  5. Incompatibility of CONNECT BY with WHERE Clause Joins – The CONNECT BY clause, frequently used with self-joins for hierarchical reporting, cannot be directly used with a table join in the WHERE clause.

In essence, while self-joins are indispensable for modelling and querying self-referential data, their implementation requires careful attention to syntax and an understanding of their inherent limitations, particularly in complex or distributed environments.

Index