Data Types in PostgreSQL
Choosing a data type for each column is one of the most important decisions when defining a PostgreSQL table, because the type determines what values a column can hold and how those values behave. Every column in a table must be given both a name and a data type. Well-chosen types help keep data consistent, storage compact, and queries fast.
Data types specify what kind of data each column can hold, how that data behaves, and how it is stored. PostgreSQL offers a wide variety of native data types, and because it is extensible, users can also define new types, operators, functions, and index methods. The choice of data type for each column directly affects the efficiency and maintainability of the database: it balances flexibility against storage usage and helps guarantee data correctness.
The CREATE TABLE command defines column names and their data types, for example: CREATE TABLE products (product_no integer, name text, price numeric); The most basic PostgreSQL data types fall into the boolean, character, and numeric categories.
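As a rough sketch of how the assigned types can be checked afterwards, the statements below create the products table mentioned above and then read the standard information_schema.columns view. The sketch assumes that no other table named products exists in the database.

```sql
-- Create the example table: every column gets a name and a data type.
CREATE TABLE products (
    product_no integer,
    name       text,
    price      numeric
);

-- List each column together with the data type PostgreSQL recorded for it.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'products'
ORDER BY ordinal_position;
```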
Integer Types
Integer types are a class of numeric data types in PostgreSQL for storing whole numbers. Several are available, differing in range and storage size, and choosing the right one balances performance, data validity, and storage usage.
PostgreSQL provides three integer data types with different storage sizes and ranges:
Smallint: A 2-byte type with a range of -32768 to +32767. It is more compact than integer or bigint and is generally used only when disk space is at a premium. Integer remains the usual choice for whole numbers, but smallint is reasonable when the smaller range is sufficient and minimising the storage footprint is crucial.
Integer (or int): A 4-byte type with a range of -2147483648 to +2147483647. It is the usual recommendation because it offers the best balance between range, storage size, and performance. Integer is adequate for most whole-number values; bigint is reserved for values that exceed its range, and smallint for cases where disk space is very limited.
Bigint: An 8-byte type (also available under the PostgreSQL alias int8) with a range of roughly -9.2 × 10^18 to +9.2 × 10^18. It should be used when the range of integer is insufficient. Its much larger capacity costs twice the storage of integer and can carry a small performance penalty. For tables expected to use more than 2^31 identifiers over their lifetime, the bigserial type is a convenient shorthand for an autoincrementing column: it automatically creates a bigint column and an associated SEQUENCE object.
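The trade-off between storage size and range can be observed directly. The following is a minimal sketch using the built-in pg_column_size function and explicit casts; the literal values are chosen only to sit on the smallint boundary.

```sql
-- Storage size in bytes of one value of each integer type.
SELECT pg_column_size(1::smallint) AS smallint_bytes,  -- 2
       pg_column_size(1::integer)  AS integer_bytes,   -- 4
       pg_column_size(1::bigint)   AS bigint_bytes;    -- 8

-- The largest smallint value is accepted.
SELECT 32767::smallint;

-- One more does not wrap around; the statement fails with an out-of-range error.
SELECT 32768::smallint;

-- The same value fits comfortably in integer.
SELECT 32768::integer;
```

Because an out-of-range value is rejected rather than silently wrapped, picking a type that is too small shows up as failed statements, not as corrupted data.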
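PostgreSQL provides smallserial, serial, and bigserial for unique identifier columns. These are not true types but a notational shorthand: each creates an integer column of the corresponding size together with a SEQUENCE object, and sets the column's default to the next value of that sequence, so the column autoincrements on insert. A serial column is not automatically unique; add a PRIMARY KEY or UNIQUE constraint if uniqueness is required.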
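The shorthand can also be written out by hand. The following sketch shows roughly what PostgreSQL does for a serial column; the table name orders_demo and the column name id are hypothetical and chosen only for illustration. Recent versions additionally declare the sequence AS integer, a detail omitted here.

```sql
-- Shorthand form:
--   CREATE TABLE orders_demo (id serial);

-- Roughly equivalent long form:
CREATE SEQUENCE orders_demo_id_seq;

CREATE TABLE orders_demo (
    id integer NOT NULL DEFAULT nextval('orders_demo_id_seq')
);

-- Tie the sequence to the column so it is dropped together with it.
ALTER SEQUENCE orders_demo_id_seq OWNED BY orders_demo.id;

-- Omitting id lets the default draw the next value from the sequence.
INSERT INTO orders_demo DEFAULT VALUES;
INSERT INTO orders_demo DEFAULT VALUES;

SELECT id FROM orders_demo ORDER BY id;  -- 1 and 2 on a fresh sequence
```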
Code Example:
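CREATE TABLE int_types_demo (
    id         SERIAL PRIMARY KEY,  -- autoincrementing 4-byte integer
    age        SMALLINT,            -- 2 bytes
    salary     INTEGER,             -- 4 bytes
    population BIGINT,              -- 8 bytes
    big_id     BIGSERIAL            -- autoincrementing 8-byte integer
);
-- id and big_id are omitted, so both are filled from their sequences.
-- 9000000000 exceeds the integer range and therefore needs BIGINT.
INSERT INTO int_types_demo (age, salary, population)
VALUES (25, 50000, 9000000000);
SELECT * FROM int_types_demo;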
Output:
CREATE TABLE
INSERT 0 1
 id | age | salary | population | big_id
----+-----+--------+------------+--------
  1 |  25 |  50000 | 9000000000 |      1
(1 row)
Character Types
PostgreSQL stores textual data in fixed-length and variable-length character types, each with different behaviour and storage characteristics. These types are character(n), character varying(n), and text.
Character(n) (or char(n)) stores exactly n characters. Values shorter than n are right-padded with spaces to the declared length. The padding spaces are treated as semantically insignificant and are disregarded when two character values are compared. Because of the padding, char(n) usually needs more storage than the other types and offers no speed advantage. Inserting or updating a string longer than n characters raises an error, unless the excess characters are all spaces, in which case the string is truncated to n characters. When an over-length value is explicitly cast to character(n), it is truncated silently without an error.
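The padding and truncation rules for character(n) can be tried out with a few casts. This is a minimal sketch; the column aliases are illustrative only.

```sql
-- Trailing spaces are ignored when two character values are compared.
SELECT 'abc'::char(5) = 'abc  '::char(5) AS equal_despite_padding;  -- true

-- char_length ignores the padding, while octet_length applied directly
-- to a character value counts the padded bytes as well.
SELECT char_length('abc'::char(5))  AS chars_without_padding,
       octet_length('abc'::char(5)) AS bytes_with_padding;

-- An explicit cast truncates an over-length value without raising an error.
SELECT 'abcdefgh'::char(5) AS truncated;  -- abcde
```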
Character varying(n) (or varchar(n)) stores variable-length strings of up to n characters. Values are not space-padded, and trailing spaces are semantically significant. Storing a string longer than n characters raises an error (unless the excess characters are all spaces), whereas an explicit cast to varchar(n) truncates it silently. Text is also a variable-length character string, but with no declared length limit (in practice, the longest possible value is about 1 GB). It is PostgreSQL's native string type, and most built-in string functions accept and return text.
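The difference between storing an over-length value and casting one can be sketched as follows. The temporary table varchar_demo and its column code are hypothetical names used only for this illustration.

```sql
CREATE TEMP TABLE varchar_demo (code varchar(5));

-- Fits within the limit and is stored as-is, without padding.
INSERT INTO varchar_demo VALUES ('abc');

-- The excess characters are all spaces, so the value is truncated to 'abcde'.
INSERT INTO varchar_demo VALUES ('abcde   ');

-- The excess characters are not spaces: this statement fails with a
-- "value too long" error and stores nothing.
INSERT INTO varchar_demo VALUES ('abcdefgh');

-- An explicit cast truncates silently instead of raising an error.
SELECT 'abcdefgh'::varchar(5) AS truncated;  -- abcde
```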
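Note that there is no performance difference between varchar(n) and text in PostgreSQL, apart from the few extra CPU cycles needed to check the length of a varchar(n) value. Char(n) is usually the slowest of the three because of its padding overhead. In most cases, text or character varying is the better choice for both flexibility and storage. Using text with a CHECK constraint to emulate a length-limited varchar makes it possible to adjust the limit later by replacing the constraint rather than changing the column's type. The characters that can be stored in these types are determined by the database character set, and the null character (character code zero) cannot be stored in any of them.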
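The text-plus-CHECK approach might look like the sketch below. The table users_demo, the column username, the constraint name username_len, and the limits 20 and 50 are all hypothetical values chosen for illustration.

```sql
-- A text column whose length is limited by a named CHECK constraint.
CREATE TABLE users_demo (
    username text
        CONSTRAINT username_len CHECK (char_length(username) <= 20)
);

-- Raising the limit later: replace the constraint, leave the column type alone.
BEGIN;
ALTER TABLE users_demo DROP CONSTRAINT username_len;
ALTER TABLE users_demo
    ADD CONSTRAINT username_len CHECK (char_length(username) <= 50);
COMMIT;
```

Wrapping both ALTER TABLE statements in one transaction means there is no moment at which the table is visible to other sessions without a length limit. Adding the new constraint still validates existing rows, so this is a convenience for schema evolution, not a way to skip checking the data.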
Code Example:
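CREATE TABLE char_types_demo (
    fixed     CHAR(5),      -- always padded to 5 characters
    variable  VARCHAR(10),  -- up to 10 characters, no padding
    unlimited TEXT          -- no declared length limit
);
INSERT INTO char_types_demo (fixed, variable, unlimited)
VALUES
    ('Hi', 'Hello', 'This is a long text example');
-- LENGTH does not count the trailing pad spaces of a CHAR value,
-- so fixed_len is 2 even though 5 characters are stored.
SELECT fixed,     LENGTH(fixed)     AS fixed_len,
       variable,  LENGTH(variable)  AS var_len,
       unlimited, LENGTH(unlimited) AS text_len
FROM char_types_demo;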
Output:
CREATE TABLE
INSERT 0 1
 fixed | fixed_len | variable | var_len |          unlimited          | text_len
-------+-----------+----------+---------+-----------------------------+----------
 Hi    |         2 | Hello    |       5 | This is a long text example |       27
(1 row)
Boolean Type
Boolean is a core PostgreSQL data type that stores truth values in 1 byte. It has three possible states: true, false, and unknown, the last of which is represented by the SQL NULL value. As input for true, PostgreSQL accepts the keyword TRUE and the string literals 'true', 'yes', 'on', and '1'; for false, it accepts the keyword FALSE and the string literals 'false', 'no', 'off', and '0'. Unique prefixes of these strings, such as 't' or 'y' for true and 'f' or 'n' for false, are also accepted. Case does not matter, and leading or trailing whitespace is ignored.
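A quick way to see which inputs are accepted is to cast string literals to boolean. This is a minimal sketch; the column aliases are illustrative only.

```sql
SELECT 'yes'::boolean    AS from_yes,     -- t
       ' TRUE '::boolean AS from_padded,  -- t (whitespace and case ignored)
       'y'::boolean      AS from_prefix,  -- t (unique prefix of 'yes')
       'off'::boolean    AS from_off,     -- f
       '0'::boolean      AS from_zero;    -- f
```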
When outputting a boolean value, PostgreSQL always uses t for true and f for false. Although many input forms are supported, the SQL-compliant keywords TRUE and FALSE are the clearest choice. A NULL boolean may need an explicit cast (for example, NULL::boolean), because NULL can be of any type and the parser cannot always infer that boolean is intended.
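The three states and their effect on filtering can be sketched with a small temporary table. The names flags_demo, label, and active are hypothetical and used only for this illustration.

```sql
CREATE TEMP TABLE flags_demo (label text, active boolean);

INSERT INTO flags_demo VALUES ('a', TRUE), ('b', FALSE), ('c', NULL);

-- active is shown as t or f; the NULL row is shown using the client's
-- NULL display setting (empty by default in psql).
SELECT label, active FROM flags_demo ORDER BY label;

-- WHERE keeps only rows whose condition is true, so the unknown row 'c'
-- appears in neither of these results.
SELECT label FROM flags_demo WHERE active;      -- a
SELECT label FROM flags_demo WHERE NOT active;  -- b

-- The unknown state has to be tested with IS NULL.
SELECT label FROM flags_demo WHERE active IS NULL;  -- c

-- An explicit cast gives a bare NULL the boolean type.
SELECT NULL::boolean AS unknown_flag;
```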
Selecting the best data type for every column is a crucial design choice. It significantly affects the database's overall performance and maintainability, balancing flexibility against storage usage and helping to guarantee data correctness.
