RBOIRA: Integrating Rules and Reinforcement Learning to Improve Index Recommendation

INTRODUCTION: The index is one of the most effective ways to improve the database query performance. The expert-based index recommendation approach cannot adjust the index configuration in real time. At the same time, reinforcement learning can automatically update the index and improve the recommended configuration by leveraging expert experience. OBJECTIVES: This paper proposes the RBOIRA, which combines rules and reinforcement learning to recommend the optimal index configuration for a set of workloads in a dynamic database. METHODS: Firstly, RBOIRA designed three heuristic rules for pruning index candidates. Secondly, it uses reinforcement learning to recommend the optimal index configuration for a set of workloads in the database. Finally, we conducted extensive experiments to evaluate RBOIRA using the TPC-H database benchmark. RESULTS: RBOIRA recommends index configurations with superior performance compared to the baselines we define and other reinforcement learning methods used in related work and also has robustness in different database sizes.


Introduction
The advent of big data has significantly impacted the query efficiency of massive data in traditional relational databases.In database optimization, there are many different ways to improve the query performance of a database, such as by creating indexes and materializing views.Creating appropriate indexes for a set of workloads in the database can effectively improve query performance [1].The approach to index creation is no longer limited to a traditional manual approach that relies on the expertise and experience of the database administrator (DBA) to recommend index configurations for a set of workloads in a database.With the rise of machine learning in various research fields [2,3], it is gradually being applied to the self-tuning of databases, such as using reinforcement learning to recommend the optimal index configuration [3] and join order selection for query statements for a set of workloads in a database [4,5].
The most typical traditional index recommendation for relational databases is expert-based, with some limitations: Firstly, it targets static databases to create an index.Secondly, it cannot update the index configuration on time.Although using reinforcement learning to recommend index configurations is better than the traditional approach.Therefore, this paper optimizes the following two aspects: (1) Many index candidates are trained, which reduces the training efficiency of the algorithm with the recommended suboptimal index configuration because many related works extract index candidates from W. Yu et al.
databases [6] or workloads [7] (such as group by or order by) without pre-processing, which reduces the training efficiency of the algorithm and increases the execution cost.
(2) Part of the related work still only targets static databases without considering the actual production environment of databases, which leads to recommending suboptimal index configuration.
Based on the above, this paper proposes the RBOIRA, which combines rules and reinforcement learning to recommend the optimal index configuration for a set of workloads in a dynamic database.The RBOIRA execution steps include two main steps: firstly, it prunes the index candidates using designed heuristic rules.Secondly, it uses reinforcement learning to update the indexes in real time, enabling the recommendation of the optimal index configuration for a set of workloads in a dynamic database.
In summary, our work contributes the following: (1) Three heuristic rules are proposed for the pruning index candidates, significantly reducing the dimensions of action and state space, which improves execution efficiency and reduces the execution cost.
(2) RBOIRA integrates rules and reinforcement learning to configure and update indexes in real time for a set of workload recommendation indexes under a dynamic database.
(3) We conducted extensive experiments to evaluate RBOIRA's performance using the TPC-H database benchmark.Experimental results show that RBOIRA recommends index configurations with superior performance to comparison methods and it also has some robustness in different database sizes.

Related work
Machine learning (ML) is currently applied to many fields, such as a distributed cooperative coevolutionary genetic algorithm to optimize multi-objective data publishing [8], optimize stragglers in edge federated learning (EFL) [9], and uncertain data query [10].In recent years, machine learning has been continuously integrated into traditional relational databases or NoSQL databases to implement components for automation and self-optimization [11][12][13].The limitations of traditional tuning methods can be effectively addressed by using ML techniques, which significantly promote the development of AI4DB [11,12].Moreover, researchers can consider enhancing database performance using hardware devices [14].
Database tuning can be divided into external tuning and internal tuning [15,16].Configuring the database management system through the application programming interface (API) is called external tuning, and embedding algorithms into the database management system is called internal tuning.Integrating ML technology into index selection is part of internal tuning and is one of the most critical parts of achieving database self-tuning [17,18].For example, Ge et al. [19] designed a distributed prediction-randomness framework for the evolutionary dynamic multiobjective partitioning optimization of databases.
Creating an index will directly affect the database's query performance and transaction load-handling capabilities [20].Especially when the database system processes transactions, creating indexes on corresponding columns can improve the query efficiency of the database [21].Jan et al. [22] described and analyzed 8 traditional index selection methods and compared them in different dimensions, such as the time complexity of the algorithm.They also designed a system that could choose the right index for different situations.On the contrary, Ding et al. [23] proposed a method to improve the query efficiency of workload in the database by using data rules, which improved the query performance to some extent.Sadri et al. [24] designed a deep reinforcement learning algorithm to minimize the total execution cost of the workload in the cluster to recommend the best index configuration for the cluster database, where each replica database can recommend the same index configuration or a different index configuration.
Lan et al. [7] proposed five heuristic rules to extract the corresponding index candidates from the workload, which significantly reduced the dimensionality size of the action space and state space, and designed a deep reinforcement learning algorithm to recommend the index, but the shortcoming was that it only targeted at static databases and could not update the index configuration in time.Licks et al. [6] designed SmartIX that is a reinforcement learning algorithm to recommend the index configuration for the next set of workloads in a dynamic database.The limitation of this method is that it does not prune the index candidates extracted from the database, but takes all columns as the index candidates, resulting in low efficiency of algorithm training, so it recommends the second-best index configuration.Sharma.et al. [25] designed MANTIS which uses a deep reinforcement learning algorithm to implement index type recommendation and index selection.However, MANTIS still neglected to further process the index candidates, it did not screen the candidates according to the characteristics of the data in the database, which resulted in long algorithm training time and high cost.

The architecture of RBOIRA
By constructing its architecture to clarify further the execution process of each functional module of RBOIRA, as shown in Figure 1.The description of each functional module is detailed as follows: (2) Index candidate based on rules module.Heuristic rules prune the index candidates extracted from the database table and input them into the reinforcement learning agent module as an input stream.
(3) Reinforcement learning agent module.It is responsible for training the algorithm, including updating the action parameters, updating the status, and selecting the action by using strategies and other operations.
(4) Index selection module.Execute the operation of creating an index and deleting an index.
(5) Reward module.The reward for the action is calculated by the reward function and then feedback to the agent.
(6) Recommended optimal index configuration module.The module recommends the index configuration in the last training episode as the optimal index configuration.

Heuristic rules
The heuristic rules are defined as follows: Rule 1 (R1): When the data volume is the smallest and much smaller than other tables (at least one scale is 10 3 ), the columns in this table are not considered index candidates.
Rule 2 (R2): The column of the longest type in the database table is not recommended to be indexed which is not considered an index candidate.
Rule 3 (R3): A column with too many duplicate values or null values is not considered an index candidate, but must meet two conditions as follows: Condition 1: It is the minimum selectivity of a column in the table.
Condition 2: Selectivity is less than 20%.This column is not considered an index candidate if and only if both conditions are met.
The above three rules are applicable rules defined based on database index optimization experience.The reasons for selecting them are as follows: For R1: (i) Indexes occupy storage space and require maintenance.The additional storage space and maintenance costs of establishing indexes on small data tables outweigh the performance advantages.(ii) For small tables, the database query optimizer may choose a full table scan instead of using an index to improve query performance because it may be faster.
For R2: (i) Building indexes on long fields consumes larger storage space and increases index maintenance costs.(ii) It may hurt query performance and does not work with all database storage engines.
For R3: (i) Selectivity measures the number of distinct values in the index column relative to the total number of rows.If the selectivity is very small, that is, most of the values in the index columns are the same, then the index will provide a limited filtering effect and cannot bring about performance improvement.(ii) The query optimizer may not be able to use this index, and it may be difficult to maintain in a highly concurrent database environment.
In addition, the index candidates are pruned by each rule, as shown in Table 1.

6.
The agent selects action A based on the current state S t and ε -greedy policy

7.
Execute A to get the next state S t+1 and reward R 8.
Update parameters θ of A 9.
Store (S t , R, A, S t+1 ) in the experience pool E 10.
Extract mini-batch data from E to perform experience playback 11.

end
The execution steps are as follows: Step

Evaluation metric
The TPC-H is a well-known non-profit organization that creates database performance benchmarks [26].We get outputs from three metrics based on the TPC-H benchmark: @ Power Size , @ Throughput Size , and @ QphH Size .While @ QphH Size is obtained by computing @ Power Size and @ Throughput Size metrics.The @ Power Size evaluates how fast the DBMS computes the answers to single queries.This metric is computed using formula (1): Where SF is the scale factor or database size, ( , 0) QI i is a set of query streams, ( , 0) RI j is a refresh function, and 3600 is the number of seconds per hour.The @ Throughput Size measures the ability of the system to process the most queries in the least amount of time, taking advantage of I/O and CPU parallelism.It evaluates the system's performance against a multi-user workload that is completed in a set amount of time, using the formula in (2): Where S is the number of query streams executed, and s T is the total time required to run the throughput test for s streams.
Equation ( 3) depicts the Queries-per-Hour Performance @ QphH Size metric, calculated by taking the geometric mean of the previous two metrics and reflecting various aspects of a database's query processing capability.@ @ @ QphH Size Power Size Throughput Size = × (3) The ACT is used to compare the time consumption of the algorithm, which reflects the efficiency of the algorithm, as equation ( 4).Where The Total Costing Time represents the total training time of the algorithm, and The Number Of Episodes represents the total training episodes.

The Total Costing Time ACT The Number Of Episodes = (4)
The Selectivity is used to control the selectivity of attributes, as shown in equation (5).Col represents the corresponding column;

Baseline
To make a full and comprehensive experimental comparison of the proposed RBOIRA method, this section describes in detail the self-defined comparison baseline and the related method of using reinforcement learning algorithm to implement database index recommendation, as follows: 1. Initial_State: It is the default TPC-H configuration and contains only the indexes on the primary and foreign keys.
2. Expert-Based: Indicates the index configuration based on expert suggestions.
3. ALL-S: Indicates the index configuration in which all columns in the database are indexed.
4. ALL-R: Using heuristic rules prune single-attribute index candidates, and indexes are built on the remaining index candidates.
5. SmartIX [6]: A real-time creation and deletion of single-attribute index candidates in a dynamic database to recommend the optimal index configuration.
6. DQN-S [7]: The deep reinforcement learning algorithm in the Index Advisor method is used to recommend the single-attribute index configuration for a set of workloads in the database.
7. NoDBA [27]: A system based on cross-entropy deep reinforcement learning method used to recommend the best index configuration for a given workload in a set of databases.
8. POWA [28]: Index configuration recommended by PostgreSQL Workload Analyser which is an open-source index recommendation tool.9. ITLCS [29]: An index tuning and learning classifier, which combines a learning classifier and genetic algorithm to make efficient index configuration recommendations.

Model Training
The RBOIRA model was trained based on algorithm 1, where the training episodes of RBOIRA are set to 30, and the training steps for each episode are set to 100.The index configuration corresponding to the maximum reward of the last training episode is taken as the optimal index configuration and its convergence is shown in Figure 2.  As can be seen from Figures 2 and 3, although the reinforcement learning algorithm 1 finally successfully converged and the improved algorithm was more efficient, it still consumed huge time costs and hardware costs.In addition, the dimensional space explored by the agent is still relatively large, which increases the complexity of reinforcement learning to explore the optimal solution.

Incremental data experiments
It conducts an incremental data experiment to prove the robustness of RBOIRA in different data sizes.Data description, query performance, and TPC-H benchmark are shown in Table 3, Figure 4, and Table 4, respectively.Table 3 includes 10 levels of data sizes to simulate data sizes.The data size ranges from 100MB to 1GB, and the experimental data is increased by 100MB each time.A detailed description of the number of records for tables is shown in Table 3.
By analyzing the experimental results in Figure 4 and Table 4, we can find that the query performance gradually stabilizes as the data size gradually increases where the high performance at the beginning is caused by smallvolume data.This is mainly because when the size of data in the database is too small, even creating primary and foreign keys in the database still brings significant performance improvement to the database.On the contrary, as the size of database data gradually increases, reinforcement learning dynamically adjusts the index configuration according to the current data changes in the database, ensuring that the query performance of the database does not fall dramatically and keeps the database query performance stable.Therefore, this incremental data analysis experiment further illustrates the robustness of RBOIRA in different data sizes.5.
The analysis of the performance comparative experimental results in Figure 5, shows that RBOIRA outperforms the other comparison methods on query performance.By analyzing the experimental results in Figure 5, it is clear that the index configuration recommended by RBORIA outperforms the other baselines in terms of QphH.The following two reasons mainly explain this: (1) Compared to traditional methods such as Expert-Based, ALL-S, and POWA, RBOIRA maximizes the query performance of a set of workloads in the dynamic database by using reinforcement learning methods to constantly update index configurations, which helps reinforcement learning agents find better index configurations.
(2) Compared to machine learning methods such as SmartIX, DQN-S, and ITLCS, RBOIRA uses rules to prune the initial index candidates.This operation effectively improves the learning efficiency of the agent and reduces the execution cost of the algorithm, thus helping the agent explore better index configuration.
Besides, by comparing and analyzing the experimental results in Figure 6, it is found that although RBOIRA does not have the smallest of all methods on index size, it's still much smaller than other baselines like ALL-S and SmartIX.Although the index size of RBOIRA is somewhat larger than that of POWA, DQN-S, and NoDBA, RBORIA improves query performance by sacrificing less space, and the swap of space for performance is adequate.This situation indicates that the selectivity of 0.2 can reduce the number of index candidates and ensure that the database query performance does not change significantly.

Conclusion
In this paper, we propose RBOIRA which is a practical and flexible index advisor that integrates three heuristic rules and reinforcement learning to recommend the optimal index configuration for a set of workloads in a dynamic database.We have designed extensive experiments to prove the superiority of RBOIRA, and the experimental results show that RBOIRA outperforms other existing related methods.
The RBOIRA still has some areas that can be optimized, such as improving the algorithm efficiency of reinforcement learning and reducing the resource usage required for intelligent index recommendation.In the future, we will improve the efficiency and reduce the cost of the reinforcement learning recommendation index.

Figure 1 .
Figure 1.The architecture of RBOIRA

3. 3 .Algorithm 1 : 1 .> do 3 .
The algorithm of RBOIRA The basic design of the pseudo-code of the RBOIRA is shown in algorithm 1.Index recommendation of RBOIRA Input: Index candidate set DBS, a set of workloads W. Output: Optimal Indexing Configuration OIC.Initializing the environment Env 2. While 0 episode Initializing the environment Env 4. Extract index candidates from the database based on rules and map to the initial state of S 5.While 0 step > do W. Yu et al.
Countrepresents the total number of rows in the column in the table.

Figure 3 .
Figure 3. Algorithm time efficiency comparison

Figure 4 .
Figure 4. Query performance of RBOIRA in different data sizes

Figure 7 .
Figure 7.Comparison experiment of selectivity threshold

Table 1 .
Pruned Index candidates under each rule

Table 2 .
1, initializes the environment, including the initialization state and action parameters, in which , α γ and ε are set as 0.01, 0.85, and 0.9.Step 2 to step 12 into each episode and set it to 30.Step 3, initialize the environment again to delete all indexes.Step 4, prune index candidates based on heuristic rules.Step 5 to step 11 into each step and set it to 100.Step 6, select an action based on ε -greedy and the current state S t .Step 7 executes the action to update the next current state and get a reward.Step 8 updates the parameters of action [6].Step 9 store (S t , R, A, S t+1 ) in the experience pool.Step 10, random extract mini-batch data used to train.Step 11 updates the current state.Step 12 end of steps.Step 13 update ε .Step 14 end of episodes and output OIC.We use the TPC-H tool to generate a 1 GB dataset (containing 8 relation tables and 8,661,245 records as shown in Table 2 stored in MySQL and generate a workload consisting of 22 queries with different levels of complexity.Index candidates.All index candidates are extracted from the database tables and pruned by our three heuristic rules.The number of index candidates is pruned from the initial 45 to the final 30 index candidates for model training.The description of 8 relational tables

Table 3 .
Description of the number of data records in different data sizes EAI Endorsed Transactions on Scalable Information SystemsOnline First

Table 4 .
TPC-H performance of RBOIRA under different data sizes