Optimization Techniques in PostgreSQL
PostgreSQL offers a wide range of methods and parameters for maximising database performance, including server-level settings, query planning, indexing strategies, and data organisation. Configuration adjustment is essential at the server level, since the default values are frequently too small for production environments. One of the most important options is max_connections, which specifies the maximum number of concurrent client connections, each of which consumes memory. By reducing this default value (usually 100), memory can be freed for other uses, such as raising work_mem.
To take advantage of available RAM, memory parameters such as shared_buffers and work_mem (used for sorting and hash joins) should be adjusted. effective_cache_size gives the planner an estimate of the disk cache available to a query, which affects its preference for index scans. random_page_cost, which estimates the cost of retrieving a non-sequential disk page, defaults to 4.0 to suit mechanical drives; it can be greatly reduced for SSDs, which makes index scans look more attractive to the query planner.
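As a minimal sketch, these parameters can be set with ALTER SYSTEM. The values below are illustrative assumptions for a dedicated server with roughly 16 GB of RAM, not recommendations:

```sql
-- Illustrative values for a dedicated ~16 GB server (assumptions, not recommendations)
ALTER SYSTEM SET shared_buffers = '4GB';        -- commonly around 25% of RAM
ALTER SYSTEM SET work_mem = '64MB';             -- per sort/hash operation, per connection
ALTER SYSTEM SET effective_cache_size = '12GB'; -- planner hint only, not an allocation
ALTER SYSTEM SET random_page_cost = 1.1;        -- closer to seq_page_cost on SSDs
-- shared_buffers requires a server restart; the others take effect after:
SELECT pg_reload_conf();
```

Note that work_mem applies per operation and per connection, which is why lowering max_connections can safely fund a higher work_mem.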

Query Processing and Optimization
Query processing in PostgreSQL proceeds in multiple stages:
Rewrite Phase: This phase first handles rules for UPDATE, DELETE, and INSERT statements. The query is then repeatedly rewritten until a fixed point is reached, after which rules involving SELECT statements are fired.
Planning and Optimization Phase: After each query block has been rewritten, it is optimised to produce an execution plan. This cost-based strategy aims to create the plan with the lowest predicted cost. The cost model takes into account CPU costs (processing heap tuples, index tuples, and simple predicates) as well as I/O costs (sequential and random page fetches).
Standard Planner: The standard planner optimises join order using a bottom-up dynamic programming algorithm, similar to the System R optimisation algorithm.
Genetic Query Optimizer (GEQO): For queries joining many tables (twelve or more by default, controlled by geqo_threshold), where dynamic programming becomes too costly, PostgreSQL switches to a genetic algorithm. This method, originally developed for problems such as the travelling-salesman problem, can manage intricate join queries.
Query Executor: This module processes the query plan produced by the optimiser. In this demand-driven pipeline approach, each operator (such as sort, aggregation, and join) implements an iterator interface.
Key Optimization Techniques
EXPLAIN Command and Execution Plans: The main tool for understanding how PostgreSQL will carry out a query is the EXPLAIN command. It displays the execution plan along with estimated row counts, join methods, and data scan techniques (sequential and index scans).
EXPLAIN ANALYSE: This variant runs the query and reports actual run times and row counts alongside the estimates, which helps find differences between planned and actual execution.
EXPLAIN (ANALYSE, BUFFERS): This option reports shared buffer hits, which show how much data was already in memory, and so helps determine the efficacy of caching.
Graphical EXPLAIN: By providing graphical representations of query execution plans, tools such as DBeaver facilitate the visualisation of bottlenecks.
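For illustration, a typical invocation against a hypothetical orders table might look like this (the table, index name, and plan fragments below are assumptions; real output varies):

```sql
-- Hypothetical table; the annotated output is abridged and illustrative.
EXPLAIN (ANALYSE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Fragments to look for in the output:
--   Index Scan using orders_customer_id_idx on orders   (an index was chosen)
--   Buffers: shared hit=5                               (pages already in shared_buffers)
--   actual time=... rows=... loops=1                    (compare against estimated rows)
```

Large gaps between estimated and actual row counts usually point to stale statistics or correlated predicates.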
Indexing: Indexes are physical database objects defined on table columns or expressions that enable the efficient retrieval of specific rows. PostgreSQL supports several index types, each appropriate for a certain workload:
B-tree: The default index type is B-tree. On sortable data, it effectively enables equality, range searches, and some pattern matching. Primary and unique keys are also stored in B-trees.
Hash: Helpful for simple equality comparisons. Discouraged in the past because it lacked write-ahead logging and performed poorly in comparison to B-trees, it saw major improvements in PostgreSQL 10 and 11.
Generalized Search Tree (GiST): An extensible indexing framework built on a balanced tree. It benefits complex data types such as full-text search, multidimensional cubes, and geometric data. “Nearest-neighbour” search optimisation is another capability of GiST.
Space-Partitioned Generalized Search Tree (SP-GiST): Supports non-balanced disk-based data structures like quadtrees and may be quicker than GiST for some data distributions.
Generalized Inverted Index (GIN): Helpful when several values map to a single row, as in full-text search and arrays.
Block Range Index (BRIN): Provides a trade-off between index size and search efficiency and may be useful for naturally clustered data.
Index-only Scans: These can greatly improve performance, particularly for key-value searches and aggregates, by allowing queries to be answered from the index alone, avoiding visits to the main table (heap).
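The index types above can be sketched with a few CREATE INDEX statements (the orders table and its columns are hypothetical):

```sql
-- B-tree (the default) on a sortable column
CREATE INDEX idx_orders_created ON orders (created_at);

-- Hash index for pure equality lookups (crash-safe since PostgreSQL 10)
CREATE INDEX idx_orders_token ON orders USING hash (token);

-- GIN index for array containment or full-text search on a tags array
CREATE INDEX idx_orders_tags ON orders USING gin (tags);

-- BRIN index for a large, naturally ordered column (tiny index, coarse filtering)
CREATE INDEX idx_orders_created_brin ON orders USING brin (created_at);

-- Covering index (PostgreSQL 11+) to encourage index-only scans:
-- SELECT customer_id, status ... WHERE customer_id = ... can be answered
-- from the index alone, without touching the heap.
CREATE INDEX idx_orders_covering ON orders (customer_id) INCLUDE (status);
```

Whether an index-only scan actually occurs also depends on the table's visibility map, so a recently vacuumed table benefits most.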
Table Partitioning: A huge table can be physically divided into smaller sections along one or more partitioning attributes. When query predicates match those attributes, this can decrease maintenance overhead and enhance query performance. PostgreSQL supports range and list partitioning since version 10, with hash partitioning added in version 11.
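A minimal range-partitioning sketch (the sales table is hypothetical):

```sql
-- Declarative range partitioning; predicates on sale_date enable partition pruning.
CREATE TABLE sales (
    id        BIGSERIAL,
    sale_date DATE NOT NULL,
    amount    NUMERIC
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- A query such as WHERE sale_date >= '2024-03-01' scans only sales_2024.
```

Dropping an old partition is also far cheaper than a bulk DELETE, which is part of the maintenance benefit mentioned above.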
Just-in-Time (JIT) Compilation: As of PostgreSQL 11, JIT compilation (using LLVM) speeds up tuple deforming and expression evaluation. This is especially helpful for long-running, CPU-bound analytical queries; for brief queries, the drawbacks of JIT compilation may exceed the advantages.
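JIT behaviour is governed by cost thresholds, so it only engages for expensive plans. A sketch of the relevant knobs (the threshold shown is the documented default):

```sql
-- JIT is enabled by default from PostgreSQL 12 onward.
SET jit = on;
SET jit_above_cost = 100000;   -- only JIT-compile plans whose estimated cost exceeds this

-- With JIT active, EXPLAIN (ANALYSE) output gains a "JIT:" section reporting
-- generation, inlining, and optimisation times, which helps judge whether
-- compilation overhead pays off for a given query.
```

For short OLTP queries, raising jit_above_cost (or disabling jit per session) avoids paying compilation cost for no gain.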
Configuration Tuning: Performance depends on tuning server parameters. Important settings include:
effective_cache_size: An estimate of the amount of memory available for disk caching; affects the planner's choice between sequential and index scans.
random_page_cost: Lets the query planner estimate the relative cost of fetching a disk page non-sequentially as opposed to sequentially. It is expressed as a multiple of the sequential page fetch cost (seq_page_cost, default 1.0); the default of 4.0 suits mechanical drives, while much lower values are appropriate for SSDs.
maintenance_work_mem: Temporarily raising the memory available for maintenance activities (such as CREATE INDEX and VACUUM) helps speed up bulk operations.
max_connections: Limits the database server's concurrent client connections. The default is usually 100, although it can be lower if the kernel settings do not support it, as determined during database initialisation (initdb).
max_wal_size and checkpoint_timeout: These have a significant impact on write-ahead log (WAL) behaviour and checkpoint frequency, especially for write-heavy systems and bulk loads.
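A sketch of how these settings might be applied around a bulk load (the values are illustrative assumptions, not recommendations):

```sql
-- Session-level boost for building indexes after a bulk load
SET maintenance_work_mem = '1GB';

-- Server-level WAL/checkpoint settings; larger values mean fewer forced
-- checkpoints during heavy writes, at the price of longer crash recovery.
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET checkpoint_timeout = '15min';
SELECT pg_reload_conf();   -- both settings take effect on reload, no restart needed
```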
Writing Better Queries: Performance is enhanced by avoiding typical pitfalls:
Accurate Statistics: Running ANALYSE (or VACUUM ANALYSE) on a regular basis ensures the query planner has up-to-date statistics for the best plan selection.
Judicious use of CTEs: CTEs are powerful, but if not handled appropriately they can block predicate pushdown, resulting in less-than-ideal plans. Before PostgreSQL 12 a CTE was always materialised; later versions can inline it, controllable with the MATERIALIZED and NOT MATERIALIZED keywords.
Avoid unnecessary operations: Poorly written subqueries, or operations such as DISTINCT when they are not required, can hurt performance.
Index foreign keys: Foreign keys guarantee referential integrity by binding columns in a “referencing” table to primary keys or unique constraints in a “referenced” table, preventing improper data entry; PostgreSQL offers CASCADE (changes/deletes dependent rows), RESTRICT (prevents the operation if referenced), SET NULL, and SET DEFAULT for fine-tuned row updates and deletions. Crucially, PostgreSQL automatically indexes only the referenced side (the primary key or unique constraint), not the referencing columns, so explicitly indexing foreign-key columns usually pays off for joins and cascaded updates and deletes.
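Two of these pitfalls can be sketched concretely (the customers/orders tables and names are hypothetical):

```sql
-- The referencing column customer_id gets no index automatically; add one.
CREATE TABLE customers (id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers (id) ON DELETE CASCADE
);
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Speeds up joins on customer_id and cascaded deletes from customers.

-- Since PostgreSQL 12, CTE materialisation can be controlled explicitly:
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE id > 1000
)
SELECT * FROM recent WHERE customer_id = 42;
-- NOT MATERIALIZED lets the outer predicate be pushed into the CTE.
```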
Code Example:
To illustrate, we use a simple table and two EXPLAIN ANALYSE queries: one without and one with an index.
CREATE TABLE test_performance (
id SERIAL PRIMARY KEY,
value TEXT NOT NULL
);
INSERT INTO test_performance (value)
SELECT md5(random()::text)
FROM generate_series(1, 100000);
Output:
CREATE TABLE
INSERT 0 100000
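The two promised queries can then be sketched as follows. The plan fragments in the comments are illustrative of what to expect, not exact output; timings will vary by machine:

```sql
-- Without an index on value: a sequential scan reads all 100,000 rows.
EXPLAIN ANALYSE SELECT * FROM test_performance WHERE value = 'abc';
-- Expect:  Seq Scan on test_performance ... Filter: (value = 'abc'::text)

CREATE INDEX idx_test_performance_value ON test_performance (value);
ANALYSE test_performance;

-- With the index: only matching rows are touched.
EXPLAIN ANALYSE SELECT * FROM test_performance WHERE value = 'abc';
-- Expect:  Index Scan using idx_test_performance_value on test_performance
```

The drop from scanning the whole table to probing a B-tree is exactly the difference the EXPLAIN output makes visible.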
Conclusion
To sum up, PostgreSQL provides a robust ecosystem of query optimisation strategies that together deliver excellent database performance, ranging from intelligent query planning, indexing strategies, and execution plan analysis to sophisticated features like JIT compilation and configuration tuning. By keeping statistics accurate, writing efficient queries, and choosing the appropriate index types for specific workloads, developers can reduce the need for expensive full table scans and enable quicker, more reliable query execution.
Tools such as EXPLAIN and EXPLAIN ANALYSE give deep insight into planner choices, helping locate performance bottlenecks and validate optimisations. Paired with appropriate server parameter tuning and best practices like partitioning large tables, indexing foreign keys, and avoiding pointless query operations, PostgreSQL can effectively handle both transactional and analytical workloads with consistent scalability and speed.