What Is Array Function in PostgreSQL With Code Example

Array Function in PostgreSQL

In PostgreSQL arrays, columns can have multiple values of the same type. Unlike the relational paradigm, this capability lets databases store complex, multivalued characteristics directly, making data administration more efficient. With a wide collection of operators and functions and the ability to support multidimensional structures, PostgreSQL arrays are extremely mature.

Any base type, enum type, composite type, range type, or domain can be found in arrays created with PostgreSQL or created by the user. Their adaptability makes them appropriate for a range of applications, including the modelling of multivalued properties, such as an article with several tags or a clothing with multiple colours. Additionally, they can be used to store scientific data or to mimic a hash store utilising two arrays: one for keys and one for values. Moreover, arrays can help with difficult SQL and coding tasks like loop execution with generate_series or sending many arguments to a function using the VARIADIC array option.

Declaring and Inputting Array Values

Square brackets ([]) around array items’ data types define a table array column. While integer[] produces a one-dimensional array of integers, text[] produces a two-dimensional array of text. Despite allowing exact sizes, PostgreSQL does not restrict this.

There are two major ways to enter array values:

Literal Constant: Use curly braces {} and commas to express literal constants as array values. Whitespace, NULL, commas, curly braces, and empty strings require double quotes. Example: ‘{red,green,blue}’::text[.

ARRAY Constructor: ARRAY Constructor syntax follows the SQL standard and uses the ARRAY keyword with a list of expressions in square brackets []. Example: ARRAY[‘red’,’green’,’blue’] as primary_colors. SQL instructions often employ this method because element values are expressed as SQL constants or expressions.

Nesting these structures creates multidimensional arrays, like {{red,green,blue}} for a two-dimensional array. By default, PostgreSQL array subscripts are numbered one.

Accessing and Modifying Arrays

An integer index enclosed in square brackets [] is used to access individual array elements. An index that is out of bounds returns NULL. Array slicing is another feature of PostgreSQL that lets you extract a subarray by giving one or more dimensions a lower-bound:upper-bound. If bounds are not specified, the array’s boundaries will be used. A wildcard accessor [*] yields every entry in the array.

There are various methods for altering arrays:

Replacing Entire Array: UPDATE statements in PostgreSQL can replace an array’s content by assigning a new array value to the array column. The new array can be provided as a literal constant in curly braces {} or using the ARRAY constructor syntax.

Updating Elements/Slices: Update particular elements or rectangular slices in an array using the UPDATE statement and array subscripting in PostgreSQL. To change an element’s index, use square brackets [] and a new value. Use UPDATE to adjust an employee’s fourth-quarter salary. sal_emp Set pay_by_quarter = 15000 WHERE name = ‘Bill’.

Appending Elements: To concatenate items, use the || operator (e.g., ARRAY [‘red’, ‘green’] || ‘{blue}’::text[]). or array_append/prepend.

Removing Elements: Array_remove() removes all instances of a value from a one-dimensional array. The WITH ORDINALITY unnest function removes index-based elements. Use trim_array() to delete a particular number of entries from an array’s first dimension.

Replacing Elements by Value: PostgreSQL’s array_replace() method replaces array members based on their value. The target array, value to replace, and new value are the inputs to this function. The array is searched for all occurrences of the specified value and replaced with the new value.

Enlarging Arrays: One-dimensional arrays expand by filling intervening spots with NULLs when an index is provided a value larger than its current size.

Code Example:

DROP TABLE IF EXISTS emp;
CREATE TABLE emp(name text, pay int[]);
INSERT INTO emp VALUES
('Bill', ARRAY[10000,11000,12000,13000]),
('Alice', ARRAY[9000,9500,9700,9800]);

SELECT name, pay[1] AS q1, pay[2:3] AS mid FROM emp;
UPDATE emp SET pay = ARRAY[15000,15000,15000,15000] WHERE name='Alice';
UPDATE emp SET pay[4] = 15000 WHERE name='Bill';
UPDATE emp SET pay = array_append(pay,16000) WHERE name='Bill';
UPDATE emp SET pay = array_remove(pay,9500) WHERE name='Alice';
UPDATE emp SET pay = array_replace(pay,15000,15500);
UPDATE emp SET pay[7] = 20000 WHERE name='Bill';

SELECT * FROM emp;

Output:

DROP TABLE
CREATE TABLE
INSERT 0 2
 name  |  q1   |      mid      
-------+-------+---------------
 Bill  | 10000 | {11000,12000}
 Alice |  9000 | {9500,9700}
(2 rows)

UPDATE 1
UPDATE 1
UPDATE 1
UPDATE 1
UPDATE 2
UPDATE 1
 name  |                    pay                     
-------+--------------------------------------------
 Alice | {15500,15500,15500,15500}
 Bill  | {10000,11000,12000,15500,16000,NULL,20000}
(2 rows)

Querying and Searching Arrays

For array queries, PostgreSQL has strong operators and functions:

Comparison Operators: Array comparisons require comparison operators beyond equality (=). When it matches any array element, an expression returns true; when it matches all, it returns true.

Overlap and Containment: Specific PostgreSQL operators handle overlap and containment while querying and searching arrays. && (overlaps) returns true if two arrays share elements. For instance, ARRAY && ARRAY is true since both arrays contain 1. The @> (contains) operator checks if the first array fully embraces the second, meaning all elements in the second array must be in the first.

Element Position: Array_position() returns the subscript of the value’s initial occurrence, while array_positions() delivers an array of all subscripts in which a value is located.

Unnest Function:Using the unnest function, which returns a set, an array is converted into a set of rows, each record of which represents an array member. It can be applied to shorter arrays by padding them with NULLs and using the FROM clause with multiple array parameters. Using WITH ORDINALITY also adds a bigint column containing the row numbers.

generate_subscripts: SQL queries that iterate across array members can use generate_subscripts to provide permitted subscripts for a particular array dimension.

String Conversion: String conversion functions in PostgreSQL allow array data to be converted to text and vice versa. The array_to_string() function joins array items with a delimiter to create a text string. Null entries in the array are omitted unless array_to_string() is given a null_string.

Array Information and Advanced Functions

Array-related metadata is provided via several functions:

array_dims(): PostgreSQL’s array_dims() method returns an array’s dimensions in text. The “Dimensions index range” of any array value is returned by this function. For a two-dimensional array like ARRAY[,], array_dims() returns [1:2][1:3]. One-dimensional arrays like ARRAY return [1:3].

array_length(): PostgreSQL’s array_length() method returns an array dimension’s length. It requires the array and an integer indicating the dimension’s length. Examples include array_length(two_dim_array, 1), which returns its first dimension.

array_ndims(): PostgreSQL’s array_ndims() function returns an array’s dimensions. The array is its only argument. For example, array_ndims() returns 2 for a two-dimensional array ARRAY[,].

cardinality(): The PostgreSQL cardinality() method returns the entire number of elements in an array, including all dimensions. This function helps determine an array’s size regardless of structure or dimensionality. Cardinality() returns 3 for a one-dimensional array like ARRAY.

array_fill(): PostgreSQL’s array_fill() method creates an array of copies of a value. This function takes two arguments: the value to fill the array and an integer array defining the new array’s dimensions. Optionally, a third integer array parameter can provide each dimension’s lower-bound values, which default to 1.

Randomization: In PostgreSQL, randomization is primarily achieved through functions that generate non-deterministic results, crucial for various applications from data simulation to security. The core function is random(), which returns a floating-point value between 0.0 and 1.0. This function is categorized as volatile, meaning it can produce a different output each time it is called, even with identical inputs.

Indexing Arrays for Performance

As said, PostgreSQL array data queries perform best using GIN (Generalised Inverted Index) index types. GIN indexes are “inverted indexes” that create index entries for each component value in data values like arrays. This structure lets them efficiently handle queries that check array component values. Searches using array operators like @>, <@, &&, and = are considerably accelerated using GIN indexes.

A GIN index can speed up WHERE pay_by_quarter && ARRAY queries. GIN lookups are faster than GiST, but they cost more to build and update and take up more disc space. Recent PostgreSQL versions have improved GIN query speed and index footprint. The maintenance_work_mem configuration variable speeds up GIN index build time. BRIN (Block Range Index) can also benefit large tables, especially when array contents are well-correlated with table row physical order, by keeping summary information for block ranges to balance index size and search efficiency.

Array Functions in Aggregation and Procedural Contexts

PostgreSQL arrays improve data summarisation and server-side logic in aggregation and procedural situations. Array_agg() aggregates all input values, including NULLs, into one array. The aggregate call’s sort BY clause can sort the array’s elements in this function and others like string_agg() and xmlagg(). FILTER clauses can be used with array_agg() to selectively collect rows based on a criterion and are clearer and faster than CASE WHEN expressions. Json_agg() and jsonb_agg() aggregate input values into JSON arrays.

State transition functions change a state value for each input tuple in PostgreSQL aggregate functions, which use arrays as internal states to manage many data points like the total, count, and sum of squares used in the avg(float8) function. PostgreSQL enables user-defined aggregate methods that use polymorphic state transition or final functions to serve numerous input data types, such as an array_accum aggregate using array_append for anyarray. If they support “Partial Mode” and offer a combine function, aggregate functions like array_agg can be parallelised for performance.

Arrays make procedural function definitions more flexible and powerful. The VARIADIC keyword lets functions accept variable numbers of arguments, which are internally interpreted as an array of the specified type. VARIADIC anyarray functions can receive any array of arguments without automatic expansion. VARIADIC “any” for the last parameter allows C-language functions to accept numerous arguments of any type, but they are not automatically assembled into an array and must be manually handled.

Polymorphic functions in PostgreSQL can operate on anyarray and anyelement, with the array type resolved at parsing time. Procedures like SQL, PL/pgSQL, C Language, Python, and others can return or use arrays. This flexibility and procedures like generate_series allowdevelopers to implement complex jobs and loops directly in SQL, sometimes mirroring row_number before specific window operations, minimising the need for complex procedural code.

Conclusion

In conclusion, PostgreSQL gives developers strong capabilities for handling a variety of data structures because to its strong support for array data types and extensive function and operator library. The capabilities of PostgreSQL are expanded beyond conventional relational paradigms by array functions, which offer flexible data modelling, sophisticated querying, and performance optimisation through specialised indexing.

Page Content

Tutorials