Рус Eng Cn Translate this page:
Please select your language to translate the article


You can just close the window to don't translate
Library
Your profile

Back to contents

Software systems and computational methods
Reference:

Comparative analysis of indexing strategies in PostgreSQL under various load scenarios

Zolotukhina Dar'ya

Independent researcher

394062, Russia, Voronezh region, Voronezh, lane Antokolsky, 4

dar.zolott@gmail.com
Other publications by this author
 

 

DOI:

10.7256/2454-0714.2025.1.73138

EDN:

UUZPDW

Received:

24-01-2025


Published:

03-04-2025


Abstract: The subject of the study is the effectiveness of various indexing strategies implemented in PostgreSQL and their impact on the performance of SELECT, UPDATE and INSERT operations in conditions of different data scales. The object of the study are the B-Tree, GIN and BRIN indexes used to optimize the operation of databases. The author examines in detail such aspects of the topic as the time characteristics of operations, the size of indexes and their resource intensity. Special attention is paid to the impact of data volume on the performance of indexes and their suitability for working with various types of data, including JSONB. The research is aimed at systematizing knowledge about the use of indexes to improve the efficiency of highly loaded systems, which require optimization of access operations, updating and inserting data, as well as analysis of resource consumption. The leading research method is an empirical approach, which includes the development of a test database with orders, customers, and products tables. Experiments were conducted for SELECT, UPDATE, and INSERT operations on small, medium, and large amounts of data. The metrics of query execution time and index size obtained using PostgreSQL tools were used for the analysis. The novelty of the research lies in conducting a comprehensive performance analysis of the B-Tree, GIN, and BRIN indexes in PostgreSQL when performing typical operations on various amounts of data. The main conclusions of the conducted research are recommendations on the choice of indexes depending on the types of queries and their execution conditions. B-Tree indexes have proven their versatility, demonstrating high performance for SELECT and UPDATE operations. GIN indexes have shown advantages for working with JSONB data, but their use is limited by high resource intensity. BRIN indexes have proven to be effective for large amounts of data, especially for SELECT operations, due to their compactness and low overhead. A special contribution of the author to the research of the topic is the creation of recommendations for database developers, which makes it possible to improve application performance by choosing an optimal indexing strategy.


Keywords:

indexing strategies, B-Tree indexes, BRIN indexes, GIN indexes, query optimization, PostgreSQL performance, relational databases, JSONB data, high-load systems, data management

This article is automatically translated. You can find original text of the article here.

Introduction

Modern relational databases such as PostgreSQL provide extensive query optimization opportunities through a variety of indexing strategies. The efficiency of indexes largely determines the performance of data access, update, and insertion operations, especially in conditions of high loads and large amounts of information [1]. Despite significant progress in indexing, the issue of choosing the optimal index type for specific work scenarios remains relevant. This is especially important for applications where the type of operations, data structure, and load scale vary significantly.

The B-Tree, GIN, and BRIN indexes implemented in PostgreSQL are universal tools for solving various tasks. B-Tree indexes provide stable performance and are suitable for a wide range of operations [2]. GIN indexes are optimized for complex queries using JSONB data types, which makes them in demand in modern applications working with unstructured information [3]. Due to their compactness, BRIN indexes are effective in scenarios with large amounts of data and low update intensity [4]. However, differences in the performance of these indexes, depending on the nature of operations and the scale of data, require additional research to identify their optimal applications [5].

The object of the research is the indexing strategies used in PostgreSQL, with an emphasis on the three most common types of indexes: B-Tree, GIN and BRIN. These indexes are widely used in high-load systems and have a significant impact on database performance.

The subject of the study is the effectiveness of using various indexes, depending on the type of operations performed (SELECT, UPDATE, INSERT), the amount of data and the specifics of their storage. The paper provides a detailed analysis of query execution time, index size, and their impact on system performance in different load scenarios.

This study is conducted to determine the best indexing strategies for PostgreSQL under changing load conditions. The work is aimed at systematizing data on the execution time of operations and the size of indexes. The experiments were carried out on a database containing the orders, customers, and products tables, which simulate real-world scenarios of highly loaded systems. The results obtained provide the basis for recommendations on the choice of an indexing strategy, which is especially important for improving the performance of applications operating in conditions of big data and high loads.

The scientific novelty of the study lies in a comprehensive comparison of the B-Tree, GIN and BRIN indexes in PostgreSQL, taking into account not only the time characteristics of query execution, but also their impact on the size of the database and the overall load on the system. In contrast to the existing studies [6, 7, 8], which focused on certain aspects of indexing, this work examines the effectiveness of various types of indexes in conditions of varying workload and different categories of operations. This approach allows us to identify patterns of their application and formulate practical recommendations for choosing the optimal indexing strategy. Additionally, the analysis of the dependence of the index size on the increase in data volume is carried out, which makes it possible to predict the growth of infrastructure costs when using various indexing strategies.

Materials and methods

In this study, aimed at analyzing the performance of various types of indexes in PostgreSQL, experiments were conducted on the PostgreSQL database version 15.2. This version was chosen due to its support for a wide range of indexes, including B-Tree, GIN and BRIN, which allowed us to get a complete picture of their effectiveness in various use cases [9].

To build the test environment, a database was developed that includes three tables: orders, customers, and products. The structure of the tables and their contents have been carefully designed to simulate real-world scenarios of highly loaded systems. The orders table is a list of orders containing fields such as order ID, customer ID, product ID, order date, and price. The customers table stores customer data, including identifiers, names, and JSONB metadata about customers, which made it possible to test indexes specific to working with JSONB data types [10]. The products table contains information about products, including the ID, name, and price.

The tables were filled in using pseudorandom data to create load scenarios that were as close as possible to the conditions of real systems. For example, order dates were distributed evenly over one year, and product prices ranged from 1 to 1,000 units. Three filling options were created for each table: small volume (50 thousand rows), medium volume (500 thousand rows) and large volume (5 million rows). This allowed us to evaluate the performance of indexes at different data scales. The data was generated in such a way as to exclude the influence of anomalies and ensure the representativeness of the results.

The experiments included performing three types of operations: SELECT, UPDATE, and INSERT. The queries were designed to evaluate the performance of indexes in various scenarios. Example of a SELECT query for the orders table:

SELECT *

FROM orders

WHERE order_date > '2023-01-01';

A JSONB query was used for the customers table:

SELECT *

FROM customers

WHERE interests @> '{"sports": true}';

Example of an UPDATE request for the products table:

UPDATE products

SET price = price * 1.1

WHERE price BETWEEN 100 AND 500;

Example of an INSERT query for the orders table:

INSERT INTO orders (customer_id, product_id, order_date, price)

SELECT generate_series(1, 100000),

random() * 1000,

NOW() - interval '1 day' * random() * 365,

random() * 1000;

The following types of indexes were tested as part of the study:

  1. The B-Tree is a universal index applicable to most queries [11].
  2. GIN is designed to work with JSONB data types in the customers table [12].
  3. BRIN is effective for working with large amounts of data in the orders and products tables [13].

Performance measurements were performed for each combination of index, data volume, and operation type. The main metrics included:

  1. The execution time was recorded using the EXPLAIN ANALYZE command and expressed in milliseconds (ms).
  2. The index size was measured using the PostgreSQL functions (pg_relation_size) and expressed in megabytes (MB).

The experiments were carried out on a machine with the following characteristics:

  • Processor: Apple M2 @ 3.49 GHz, 8 cores, 8 threads.
  • RAM: 8 GB.
  • Storage: 256GB PCIe® NVMe™ M.2 SSD.

The same test environment was created for each experiment, excluding the influence of external factors. All measurements were carried out repeatedly to obtain accurate and reproducible results.

The data obtained is presented graphically, which made it possible to analyze the impact of the amount of data, the type of index, and the nature of the operation on PostgreSQL performance.

Results

The study analyzed the timing of SELECT, UPDATE, and INSERT operations on various amounts of data (small, medium, and large) using B-Tree, GIN, and BRIN indexes, as well as in the absence of indexes. The data obtained is presented graphically, which makes it possible to visually assess the dependence of the operation time on the amount of data and the type of index.

Figure 1. The effect of the index type and the amount of data on the execution time of the SELECT operation

Figure 1, which shows the results of the SELECT operation, shows that with a small amount of data, the use of B-Tree, GIN, and BRIN indexes provides similar time characteristics, significantly exceeding execution without an index. As the amount of data increases, the efficiency of GIN and B-Tree increases, but the BRIN index shows minimal execution time for a large amount of data.

Figure 2. The effect of the index type and data volume on the execution time of the INSERT operation

The analysis of the UPDATE operation, which is shown in Figure 2, showed that for all data volumes, the B-Tree index demonstrates the greatest efficiency. GIN, on the contrary, increases the execution time of the operation on medium and large amounts of data. BRIN remains the optimal choice for a large amount of data, but its advantages become noticeable only with significant table sizes.

Figure 3. The effect of the index type and the amount of data on the execution time of the UPDATE operation

Figure 3 shows for the INSERT operation that the GIN and BRIN indexes lead to an increase in execution time compared to the absence of an index, especially with a large amount of data. The B-Tree index also slightly increases the insertion time, but its effect is less significant.

Figure 4. Sizes of indexes of various types depending on the amount of data

The sizes of indexes depend on their type and amount of data, as shown in Figure 4. GIN indexes exhibit the largest size, reaching 612.7 MB with a large amount of data, while B-Tree indexes occupy up to 450.3 MB. BRIN indexes remain the most compact, increasing to 92.1 MB under similar conditions. The dynamics of index size changes as the volume of data increases shows linear growth for all types of indexes, with the most noticeable increase in GIN indexes, reflecting their high resource intensity with increasing data volume. B-Tree indexes grow more moderately, maintaining a balance between efficiency and size, while BRIN indexes show minimal changes, emphasizing their cost-effectiveness in conditions of large amounts of data.

Discussion

The experimental results allow us to identify the characteristic features of each type of index, depending on the amount of data and the type of operation. B-Tree indexes prove to be a universal choice, demonstrating stable performance on all volumes of data and for all types of operations. Their effectiveness in performing SELECT and UPDATE operations is particularly noticeable on large amounts of data, where they maintain acceptable time characteristics without significantly increasing execution time.

GIN indexes, despite their superiority in SELECT operations with highly indexed fields, turn out to be less suitable for UPDATE and INSERT operations. Their resource intensity and complexity of updating lead to increased execution time on medium and large amounts of data. These features make GIN indexes relevant in tasks where the frequency of data updates is minimal and the sampling rate from complex data structures is critical.

BRIN indexes have proven their effectiveness for working with large amounts of data, especially in SELECT operations. Their compact structure and minimal storage overhead make them the optimal choice for data analysis at the large array level. However, their use in UPDATE and INSERT operations is limited, as their architecture is designed to minimize costs in scenarios where data updates occur less frequently than sampling.

Particular attention should be paid to changes in index sizes as data volumes grow. The linear growth observed for all types of indexes reflects their predictability and scalability. GIN indexes show the most significant increase in size, which indicates the need for careful consideration of their use in systems with limited resources. B-Tree indexes show a more moderate increase in size, which makes them a balanced choice for systems where performance and cost-effectiveness must be combined. BRIN indexes remain the most compact, which confirms their suitability for processing large amounts of data with minimal memory consumption.

Conclusion

The results of the study demonstrate the significant impact of indexing strategies on PostgreSQL performance under various load scenarios and data scales. Experiments have shown that choosing an index type is a critical decision that directly affects the efficiency of SELECT, UPDATE, and INSERT operations, especially when the amount of data increases.

B-Tree indexes have proven their versatility and performance stability, making them the optimal choice for most scenarios. These indexes showed minimal execution time for SELECT and UPDATE operations on large amounts of data, as well as acceptable characteristics for INSERT operations, which confirms their balance between efficiency and resource consumption.

GIN indexes have proven to be an effective tool for complex JSONB data samples, however, their resource intensity and long execution time of UPDATE and INSERT operations limit their use in scenarios with a high frequency of data changes. These features make GIN indexes suitable for analytical systems where complex structured data reading operations prevail.

BRIN indexes have shown their advantages in working with large amounts of data, especially for SELECT operations, due to their compactness and minimal storage overhead. However, their use in INSERT and UPDATE operations requires a careful approach, given the specifics of their architecture, focused on rare data updates.

Analysis of changes in index sizes as the volume of data increases has confirmed their linear nature, which indicates the predictability and scalability of all types of indexes. The most significant increase in size was recorded for GIN indexes, which emphasizes the need for their balanced use in systems with limited resources.

The results of the study provide practical recommendations for database developers and administrators on choosing an indexing strategy in PostgreSQL. These recommendations can be useful for building highly efficient systems that require a balance between performance, resource consumption, and application specifics.

The prospects for further research include extending the analysis to other types of indexes, such as Hash indexes and GiST, in order to assess their applicability in various load scenarios. An additional direction may be to study the impact of multiversion (MVCC) in PostgreSQL on index performance with a high degree of competitive access. It is also an important task to develop adaptive indexing strategies that allow dynamically changing the structure of indexes depending on the load and types of queries.

References
1. Dombrovskaya, G. (2022). Query optimization in PostgreSQL. Moscow: DMK-Press.
2. Mostafa, A. S. (2020). A Case Study on B-Tree Database Indexing Technique. Journal of Soft Computing and Data Mining, 27-3. https://penerbit.uthm.edu.my/ojs/index.php/jscdm/article/view/6828
3. Borodin, A., Mirvoda, S., Porshnev, S., & Ponomareva, O. (2018). Improving generalized inverted index lock wait times. Journal of Physics: Conf, 944. https://iopscience.iop.org/article/10.1088/1742-6596/944/1/012022/pdf
4. Borodin, A., Mirvoda, S., Kulikov, I., & Porshnev, S. (2017). Optimization of Memory Operations in Generalized Search Trees of PostgreSQL. Communications in Computer and Information Science, 716. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-58274-0_19
5. Rogov, E. V. (2024). PostgreSQL 16 from the inside. Moscow: DMK Press.
6. Morozov, S. V., Nesterov, S. A. (2024). Comparative analysis of index types in SQL Server and PostgreSQL DBMS. SAEC, 2, 485–491.
7. Selivanov, E. O. (2023). Comparison of index types in various database management systems. Youth and Science: Current Issues of Fundamental and Applied Research: Proceedings of the VI All-Russian National Scientific Conference of Young Scientists. Komsomolsk-on-Amur: Komsomolsk-on-Amur State University. Pp. 357–361.
8. Kudashov, A. S., Agapova, V. A., Dyachkov, D. A., Kazakova, I. A. (2023). Review of index types and their application in database management systems. Modern Digital Technologies. Proceedings of the II All-Russian Scientific and Practical Conference. Barnaul: Polzunov Altai State Technical University. Pp. 299–303.
9Documentation PostgreSQL 15. https://www.postgresql.org/docs/15/release-15-2.html
10. Sorokin, V. E. (2017). Storage and efficient processing of fuzzy data in PostgreSQL. Software Products and Systems, 4. https://app.amanote.com/v4.1.10/research/note-taking?resourceId=Pprz23MBKQvf0Bhi37cu
11B-Tree Indexes in Databases: The Example of PostgreSQL. https://techtrain.ru/talks/e2273ec8ca2b4ea692c65318a50c4be5
12. Medzhidov, R. G. (2019). Analysis of multicolumn database indexes. Current problems of applied mathematics, computer science, and mechanics: Proceedings of the International Scientific Conference. Voronezh : Nauchno-issledovatelskie publikatsii. Pp. 420–422.
13. Bogatov, I. V. (2022). Efficient optimization of queries in PostgreSQL. Academic journalism. Ufa: Aeterna. Pp. 59–64.

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.
The list of publisher reviewers can be found here.

The subject of the review is the effectiveness of using various indexes in the Russian version of the free open source database management system PostgreSQL under various load scenarios, depending on the type of operations performed (SELECT, UPDATE, INSERT), the amount of data and the specifics of their storage. The research methodology is based on conducting experiments and comparative analysis of query execution time, index size and their impact on system performance under conditions of varying load and various categories of operations using graphical methods for presenting the results. The authors attribute the relevance of the study to the fact that modern relational databases provide ample opportunities for query optimization due to a variety of indexing strategies that affect the performance of data access, update, and insertion operations, especially under high loads and large amounts of information, as well as insufficient scientific study of the issue of choosing the optimal type of index for specific work scenarios.. The scientific novelty of the peer-reviewed study lies in a comprehensive comparison of the B-Tree, GIN, and BRIN indexes in PostgreSQL, taking into account not only the time characteristics of query execution, but also their impact on the size of the database and the overall load on the system. The following sections are highlighted in the text of the article: Introduction, Materials and methods, Results, Discussion, Conclusion and Bibliography. The paper considers a non-commercial version of an object-relational database management system based on the POSTGRES program developed at the Faculty of Computer Science at the University of California at Berkeley. The publication focuses on a comparative analysis of the three most common types of indexes that are widely used in high-load systems and have a significant impact on database performance: B-Tree, GIN, and BRIN. The article reflects the influence of the index type and the amount of data on the execution time of SELECT, INSERT, UPDATE operations; the sizes of indexes of various types depending on the amount of data are shown. The significant impact of indexing strategies on PostgreSQL performance under various load scenarios and data scales has been demonstrated. Experiments have shown that choosing the type of index is a critically important decision that directly affects the effectiveness of the above operations, especially with increasing data volume. The bibliographic list includes 13 sources – modern scientific publications by Russian and English authors, as well as online resources on the topic under consideration, to which the text contains targeted links, which confirms the existence of an appeal to opponents. The reviewed material corresponds to the direction of the journal "Software Systems and Computational Methods", reflects the results of the work carried out by the authors, may arouse the interest of readers, and is recommended for publication.