JCC  Vol.4 No.5 , May 2016
Automated Performance Tuning of Data Management Systems with Materializations and Indices
Abstract: Automated performance tuning of data management systems offer various benefits such as improved performance, declined administration costs, and reduced workloads to database administrators (DBAs). Currently, DBAs tune the performance of database systems with a little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. Firstly, we show how to use the periods of low workload time for performance improvements in the periods of high workload time. We demonstrate that extensions of a database system with materialised views and indices when a workload is low may contribute to better performance for a successive period of high workload. The paper proposes several online algorithms for continuous processing of estimated database workloads and for the discovery of the best plan for materialised view and index database extensions and of elimination of the extensions that are no longer needed. We present the results of experiments that show how the proposed automated performance tuning technique improves the overall performance of a data management system.  

1. Introduction

Database management systems used by the business organisations need expert database administrators (DBAs) to configure the systems before a startup time and to tune performance while the systems are running. Currently, to achieve acceptable performance of a database system an administrator has to tune the system by herself/ himself with her/his knowledge of the anticipated workload [1]. A sudden increase in the total number of users accessing a database system during peak hours may cause the significant delays, and it may even block entire system. One of the solutions to increase efficiency and reliability of database systems is automated or self- performance tuning [2].

Self-tuning database management systems automatically create and drop persistent storage structures like indices and materialised views and dynamically re-allocate transient storage resources such as data buffer cache, library cache, and so on. Materialzied views, indices, and better management of cache in transient storage can speed up a database system to reduce response time.

Automated performance tuning of database systems is a challenging problem. At the periods of high workload, the query optimizer has to assign different schedules to execute the query statements that contribute to delays and high processing costs [3]. Ad hoc optimisation of query processing with materializations and indexing would reduce those costs.

Materialized views are created as precomputed joins, stored aggregated data, stored summarised data and so on. Materializations are less expensive for joins and aggregations for processing of complex queries of high importance. In-addition, materializations can also increase the speed of query processing in large database systems which include expensive operations such a complex aggregation with joins. Moreover, it improves the performance of query processing by pre-calculating expensive operations on the database before execution and sorting the results in the database. Appropriate use od the materialized view and indices allows for optimal adjustment of the size of data sets to a given collection of queries through partitioning and creating indices on the database. For example, a materialised view restricts relational tables vertically only to the columns needed by a query. An index restricts relational tables horizontally only to the rows that satisfy the conditions used in a query.

The main objective of this research is to invent the algorithms to generate performance tuning task that reduces high workload and arrange the job within limited time and limited I/O cost level (limited workload). The algorithm uses the predicted query workload [4] of the database system. The predicted workload is used to prepare a database system for better performance during future data processing. First, a database is extended with materializations discovered from the analysis queries. Next, the algorithms generate a set of operations and create indices on extended database system from the previous step.

We present the algorithms, which are used to reduce a high workload for a given set of queries. The given workload can be a high or low workload. To reduce the high workload, we choose the best preparation plan before that high workload occurs. Besides, the best plan cannot be greater than the limited workload, and it also execute within the short time. Figure 1 shows the high and low workload structure with limited time and workload level. The limited time t and workload level l located at the period of low-level workload. Our preparation plans will generate at the duration of the low-level workload with the limited t and l to reduce heavy workload.

In this paper, we present two group of algorithms for automated tuning of database management systems. In the first step, we accept a set of queries which suppose to be processed in high workload time. Then, we find the minimized projections for each query and store the projections as the schemas of materialized views. After that, we minimize a set of schemas of materialized views and arrange them from high to low priority. Next, we check whether the selected schemas can execute within the limited workload time or not. Finally, we generate the materialized views. In the second step, we analyse the queries and extract the operations from extended relational algebra expressions. Then, we eliminate the duplicated operations and arrange them by priority level of cost multiply by frequencies of the operations. Next, we estimate the cost of indices and check whether the cost fits into limited workload time. Finally, we create indices on materialized views.

The major contributions of our work are the following.

・ We show how to discover the schemas needed to execute the queries and eliminate the duplicated one from given collection of queries q1, ∙∙∙, qn for extended database system with materialized view.

・ We show how to discover common operations from database extended with materialized view.

・ We show how to apply indexing technique on database extended with materialized view.

Figure 1. Sample circle for low and high workload.

・ We show how to generate tuning scripts (preparation scripts) within the limited time and workload level.

The paper consists of 6 sections. The previous works in related research areas will discuss in the next section. Sections 3 and 4 explain the algorithms. Experimental results are presented and discussed in Section 5. Finally, Section 6 concludes the paper.

2. Related Work

The manual processes are needed to achieve high-performance of the database system. It is an enormous effort and the complicated job which database management administrators (DBAs) or users are done [5]. There are too many techniques to tune the database system which is proposed by [6]. They discussed papers written over the past decade, which are related to self-tuning database systems. At first, they described papers that are using indexes [7] to tune the database. Next, they discussed materialized view and partitioning, which is another technique to tune the database system. Finally, they listed commercial DBMS tools for tuning database system and analysis the pros and cons of those tools.

There are some commercial databases tools like Oracle Database 11 g [3] providing self-tuning tools to tuning database system automatically. This tool accepts a set of SQL statements, then generates execution plans and allows users to choose the best plan. Once users have accepted the recommended plans, the performance tunning progress will complete and execute the tuning scripts.

Moreover, the authors from [8] show that how to self-tuning for the Database system and proposed Fuzzy- based tuning. They proposed buffer cache size (BCS), Buffer-Hit-Ration (BHR), and Database size (DBSize) to show how to self-tuning the database system. Their methods might have the problem of limited buffer size.

Furthermore, there are many research works proposed indexing technique to tune the database system. Among them, the authors from [9] proposed two automatic on-line index selection algorithms and one semi- automatic on-line index algorithm. They proposed both automatic and semi-automatic tuning by using best selection of index technique. The semi-automatic on-line index provides the list of the index (recommends the set of indexes) and allows DBA to select to create or drop the index.

Materialized view is one of the best techniques for tuning database. In this book [10] shows how materialized views can speed up queries especially for table joins and the use of aggregates like SUM. Materialized view can reduce the queries cost because it has already had the summary information pre-computed in them. The queries cost means I/O, CPU, and memory costs which are related to processing queries.

Above researchers focused on how to achieve the best performance for given the set of queries, and they show the results that reduced cost and time for given the set of queries. Although, they did not describe the consuming cost and time to execute for tuning scripts or tuning processes. In our research, we discuss how tuning scripts occupy within limited time and cost before high workload occurs. Furthermore, we discuss how workload reduces by using our new techniques.

3. Materializations

This section presents the algorithms that find the schemas of materialized views to be created in a period of low workload time. The algorithms maximise the performance improvements during a high workload period and minimise the costs of materialised views creation at a low workload period.

Algorithm 1 takes on input a set of queries {q1, ∙∙∙, qn} predicted for the nearest period of high workload. The output of the algorithm is a multiset of schemas of materialized views V = {v1, ∙∙∙, vm} that improve performance of query processing. Each vi V is a schema of materialized view such as vi = ri[ci1, ∙∙∙, cik] where ri is a relational table processed by one of the queries qj{q1, ∙∙∙, qn} and {ci1, ∙∙∙, cin} is the smallest set of columns from ri needed to process a query qj.

Algorithm 1. performs the following actions.

1. Make a multiset of schemas of materialized views V empty.

2. Iterate from q1 to qn. Let current query be qi. Let ri1, ∙∙∙, rin be the relational tables processed by qi.

2.1. Apply EXPLAIN PLAN statement to qi to get query processing plans. Then, for each relational table ri processed in qi find its smallest projection ri[ci1, ∙∙∙, cik] needed to process a query qi.

2.2. Append each projection ri[ci1, ∙∙∙, cik] found to a multiset of schemas of materializations V.

3. Repeat until all the queries are processed.

As a simple example, we consider the queries q1: SELECT a FROM r WHERE b > 10; and q2: SELECT s.a, t.b FROM s JOIN t ON s.a = t.b; and q3: SELECT r.a, s.a FROM r JOIN s ON r.a = s.a. The output from Algorithm 1 is a multiset V = {r[a, b], r[a], s[a], t[b], s[a]}.

Algorithm 2 takes on input a multiset of schemas of materialized views V. The algorithm replaces each schema vi V with a pair vi:fi where fi is a counter how many times the respective materialised view will be used at high workload time when processing the queries {q1, ∙∙∙, qn} and it eliminates the schemas that do not significantly improve performance and the schemas which are to expensive to be created.

Algorithm 2 performs the following actions.

1. Replace each vi V with a pair vi:0.

2. Iterate from v1 to vn in V. Let current schema of materialised view be vi = ri[Xi] where Xi is a set of columns in ri.

2.1. Iterate from v1 to vn in V - {vi}. Let current schema of materialised view be vj := rj[Xj] where Xi is a set of columns in rj.

2.1.1. If Xi = Xj then increase a frequency counter fi in vi:fi by one and eliminate the duplicated schema vj.

2.1.2. If vi vj or vi vj, then estimate the costs cost(vi) and cost(vj). Let a limit workload level for materialised views be lv. If lv > (cost(vi) + cost(vj)) then we calculate the profit cost by computing as p = |cost(vi) − cost(vj)| where p is profit. If p (cost(vi) + cost(vj)) * 0.5), i.e. profit must be greater than 50% of total cost for both schemas then we take both schemas of materialized views (no elimination). Else if cost(vi) > cost(vj) then eliminate the vj and extend frequency fi.

Assuming that cost(r[a]) in the previous example is almost the same as cost(r[a, b]) then Algorithm 2 applied to a multiset V = {r[a, b], r[a], s[a], t[b], s[a]} returns a set V' = {r[a, b]:2, s[a]:2, t[b]:1}.

Algorithm 3 takes on input a set of schemas of materialized views V' created by Algorithm 2. The algorithm replaces each schema vi:fi V with vi:fi:ci where ci is an estimated cost of creation of materialized view vi. Then, it allocates schemas of materialized views V'' = {v1, ∙∙∙, vn} within the limited workload lv and time tv for materialized views and execute them. Finally, Algorithm 3 verifies whether the queries q1, ∙∙∙, qn benefit from the existence of the views in V''.

Algorithm 3 performs the following actions.

1. Replace each vi:fi V with vi:fi:0.

2. Iterate from v1 to vn in V. Let current schema of materialised view be vi = ri[Xi]:fi:ci where Xi is schema of materialized view in ri.

2.1. Estimate the cost of materialized view ci and add such value to vi:fi:ci.

3. Sort the vi:fi:ci in descending order of ci*fi and update the V.

4. Iterate from v1 to vn in V. Let current schema of materialized view be vi = ri[Xi]:fi:ci. Let estimated creation time for materialized view be iv, limited workload level be lv, and limited time be tv.

4.1. If lv > ci and tv > iv then we create a materialized view vi and remove vi from V'. Next, update the lv = lv − ci.

4.2. Iterate until all schemas are allocated into limited workload.

5. In the final step, EXPLAIN PLAN statement is used to find the query processing plans for q1, ∙∙∙, qn and to verify whether all materialized views created in the previous step are used by the queries.

4. Indexing

This section presents the algorithms that find the best index to be created in a low workload time. The algorithms improve performance in a database system and reduce the high workload period.

Algorithm 4 processes a set of queries {q1, ∙∙∙, qn}. Then, we transform operations 1, ∙∙∙, n into a sequence of sets of statements S = i, ∙∙∙, S n>. Each statement in a set S i, for I = 1, ∙∙∙, n takes a form s := (x y) where x, y are the arguments of the operation, and s is a result of operation (x y). The arguments x and y can be the database relational tables or the results of operations which computed earlier.

Algorithm 4 performs the following actions.

1. Let a sequence of sets of statements S be empty.

2. Formulate set of queries {q1, ∙∙∙, qn} as expressions 1, ∙∙∙, e n> of an extended relational algebra.

3. Iterate from e1 to en and reduce expressions into a single name of the temporary result then stop the iteration.

3.1. Let the new current set of statements be Si.

3.2. Find all operations like (x y) in the expressions e1, ∙∙∙, en.

3.3. Take each operation from the previous step and transforms into a form like sij := (x y) and added into the current set of statements Si.

3.4. Get all operations like (x y) from expressions ei, ∙∙∙, en and store into temporary result like sij. Then append into current set Si.

As a simple example, consider the queries are q1, q2 and q3. Their processing plans expressed as the expressions of extended relational algebras are q1:(a1r), q2:(a1(s2t)), q3:(b1(s2t)) where 1 and 2 are the operations and a, b, r, t, and s are the relational tables or the results of operations computed earlier. The transformation results of sets of statements are <{s11 := (a1r), s12 := (s2t), s13 := (s2t)}, {s21 := (a1s12), s22 := (b1s13)}>.

Algorithm 5 processes a sequence of sets of statements S. The algorithm appends each operation with a frequency like (xiy):fi where fi is a counter of how many time appear i in S. Then algorithm finds the common operations and eliminates one from a sequence of sets of statements and increase the frequency.

Algorithm 5 performs the following actions.

1. Make all frequency be 0.

2. Iterate over the sets of statements 1, ∙∙∙, S p> in S and let the current set statements be S i.

2.1. For each statements from Si and let current statement be sii := (Xi):fi where Xi is (xiy).

2.1.1. For each statemets from Si − {sii} and let current statement be sij := (Xj):fj. If Xi = Xj then eliminate sij and incease the counter fi = fi + 1. Get eliminated statement sij and replace all sij with sii in S.

According to Algorithm 5, we found that s12 and s13 are the same and eliminate the s12 then, extend the frequency by one. After that, algorithm replace s13 with s12 in S. Then, algorithm return the S = <{s11 := (a1r):1, s12 := (s2t):2}, {s21 := (a1s12), s22 := (b1s12)}>.

Algorithm 6 processes updated a sequence of sets of statements S. The algorithm changes each statement sii:fi Si with sii:fi:oi where oi is estimated cost for index of operation i. Then, it allocates the indexes if they can fit into low workload li and time ti.

Algorithm 6 performs the following actions.

1. Replace each sii:fi with sii:fi:0 in S.

2. Iterate over the sequence of sequences 1, ∙∙∙, S n> in S. Let current set of statements be S i. Let estimated creation time for index be i i, limited workload level be l i and limited time be t i.

2.1. For each statement in Si like sii := (xiy):fi:oi use a specification (xiy) and the estimated cost of index oi for operation i then, add such value to sii:fi:oi.

2.2. Sort the sii:fi:oi in descending order of oi*fi and update the Si.

2.3. Iterate each statement in Si. Let current statement be sii := (xiy):fi:oi.

2.3.1. The algorithm discovers the indexing by searching the type of operations like SELECTION, PROJECTION, NATURAL JOIN, SEMIJOIN, and so on. If i is not projection then processes below because we no need to create index on projections. If li > oi and ti > ii then create the index. Then update the li = li − oi.

5. Experiment

In our experiments, we show that our results are better than original respond time and cost. We used a synthetic TPC-H 4 GB benchmark relational database [11] which ready built relational tables by the user applications. We used commercial database software which includes procedure and packages to estimate the size of materialized view to compute the cost and virtual index to estimate the complexity of the operation for indexes. As a high workload, we choose eight complex queries from TPC’s Template Set.

We start our experiment by analyzing the queries and getting the schemas for materialized views which are necessary for each query. Then, we remove the duplicated set of schemas and compute the estimated cost and time for materialized view. Next, we generate the materialized view if they fit into limited workload time lm. After that, we verify weather all created materialized views are used by queries or not. The second part of our experiment analyzing the queries. Then, we get the sequence of the set of statements of operations. Next, we remove the duplicated operations and append the frequencies. After that, we create indexes when they can fit into limited workload time li.

We did our experiments for several times to get the reliable results. Our experiments show that our algorithms achieve the better result than the original query. Table 1 shows that average percentages profit results of our algorithm is the best by comparing with the original one. The profits we achieve for executions times are range between 20% - 65% and the costs are range between 66% - 89%.

6. Summary and Conclusions

In this paper, we present the algorithms for automated Performance Tuning of Database System by using Materialized View and Indexing. The input is a set of low and high workloads of queries. The algorithm focuses on reducing high workload. To reduce the high workload, we execute tuning script (preparation stage) on low workload time. There are two main steps in this paper. One is how to create materialized views within the limited workload lv and time tv. The other is how to execute indexes on materialized views within the limited workload li and time ti.

In the first stage, we analysis the queries and extract schemas of materialized views then, remove the duplicated schemas. Next, we execute the materialized view when it is fitted into limited workload time lv. The materialized view can extract the database schemas into smaller pieces of database schemas. It can store summarized data, precomputed joins with or without aggregations. Besides, it is suitable for large or important queries because it can eliminate the overhead associated with expensive joins and aggregations. Furthermore, it allows us to create indexes with minimal creation time and cost.

Table 1. Compare original queries execution time and cost with tuning queries execution time and cost: experiemnt results for q1-q8.

In the second stage, we extract the operations from queries. Then, we eliminate duplicated operations and create the index over materialized view they are fitted on limited workload time li. There are different ways to choose the best index and generally, an index created based on WHERE clause. In this paper, we make a decision based on operations like SELECTION, PROJECTION, NATURAL JOIN, SEMIJOIN, and so on. Finally, we set up our experiments base on our algorithms, and the results show that our methods perform better than original execution time and cost.

Cite this paper: Noon, N. and Getta, J. (2016) Automated Performance Tuning of Data Management Systems with Materializations and Indices. Journal of Computer and Communications, 4, 46-52. doi: 10.4236/jcc.2016.45007.

[1]   Weikum, G., Moenkeberg, A., Hasse, C. and Zabback, P. (2002) Self-Tuning Database Technology and Information Services: From Wishful Thinking to Viable Engeering. Proceedings of the 28th International Conference on Very Large DataBase, 20-31.

[2]   Almeida, A.C., Lifschitz, S. and Breithman, K. (2009) A Knowledge Representation and Data Provenance Model to Self-Tuning Database Systems. 3rd Annual IEEE Software Engeering Workshop (SEW), 144-150.

[3]   Belknap, P., Dageville, B., Dias, K. and Yagoub, K. (2009) Self-Tuning for SQL Performance in Oracle Database 11g. Data Engineering ICDE’09. IEEE 25th International Conference, 1694-1700.

[4]   Marcin, Z., Janusz, R.G. and Wolfgang, B. (2014) Deriving Composite Periodic Patterns from Database Audit Trails. Intelligent Information and Database Systems, Spring International Publishing, 310-321.

[5]   Cong, G., Chung, I.H., Wen, H.F., Klepacki, D., Murata, H., Negishi, Y. and Moriyama, T. (2012) A System-atic Approach toward Automated Performance Analysis and Tuning. IEEE Transactions on Parallel and Distributed Systems, 23, 426-435.

[6]   Surajit, C. and Vivek, N. (2007) Self-Tuning Database System: A Decade of Progress. Very Large Database Endowment Inc. (VLDB).

[7]   Valentin, G., Zuliani, M., Zilio, D.C., Lohman, G. and Skelley, A. (2000) Db2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. Data Engineering, Proceedings, 16th International Conference, 101-110.

[8]   Rodd, S. and Kulkarni, U. (2013) Adaptive Self-Tuning Techniques for Performance Tuning of Database Systems: A Fuzzy-Based Appraoch. Advanced Computing, Networking and Security (ADCONS), 2013 2nd International Conference, IEEE, 124-129.

[9]   Schnaitter, K. (2010) On-Line Index Selection for Physical Database Tuning. Ph.D. Thesis, Santa.

[10]   Alapati, S.R. (2008) Expert Oracle Database 11g Administration. Apress, Berkely.

[11]   TPC (2015)