OSS Effort Expense Optimization Based on Wiener Process Model and GA

Show more

1. Introduction

Recently, the network-oriented open source software (OSS) has been used many users. In particular, the cloud and edge computings attract the extensive attention of the developers and users, because of the network revolution such as 5 G. Also, the OSS is helpful for many users to make a cost reduction, standardization, and quick delivery. In particular, several research papers in terms of cloud and edge computings such as “OpenStack” are proposed in the past [1] [2] [3]. From past to present, the development and maintenance paradigm of OSS has been changed with the times.

Historically, the software reliability growth models have been applied to the system testing phase of software development [4]. Also, our research team has developed several reliability assessment methods [5] [6] [7]. Moreover, several optimal software release problems have been discussed by the specified researchers [8]. Furthermore, our research team has discussed several optimum release problems. However, the cost parameters have been assumed as the known parameters considering the optimum software release problem. Then, we propose an estimation method of cost parameters based on the genetic algorithm. Thereby, we can obtain the software effort expense required for the OSS management. Considering the characteristics of OSS such as the cost reduction and standardization, the OSS will increasingly give an advantage in the cloud and edge computing. Then, there is a need to consider the influences from the big data behind the OSS management.

As the research background, it is difficult for the OSS managers to comprehend the human resources required before the OSS project in advance. Especially, the environment of software development in the cloud and edge computing will be the complex situation. Also, there are several optimal release problems based on the software reliability growth models. Then, the cost parameters are given in the past researches. On the other hand, we can appropriately control the human resources, if the software managers can estimate the effort parameters in advance.

The problems and solutions in order to solve in the paper are listed as follows:

1) the estimation method of the given parameters included in the existing optimal release problems;

2) the optimal release problem with application to the effort expense optimization;

3) the optimal solution by using GA.

This paper discusses a method of OSS project management considering irregular fluctuation via the big data arose from OSS development and management under the cloud and edge computing. In particular, the effort expense optimization method based on Wiener process model in terms of effort is proposed in order to estimate the effort parameters in this paper. Then, the genetic algorithm is applied to the proposed method. Also, we analyze actual software effort expense data by showing numerical examples of OSS project optimization analysis.

2. Wiener Process Model Description

In the assumptions of Wiener process modeling, we consider the following situations:

1) the noisy event of software effort expense arise from the complex management in big data environment;

2) the factor of big data is defined as “3 V model” such as Volume, Variety, and Velocity;

3) the velocity directly influence the software effort, the variety and volume indirectly influence one.

Moreover, it is important to consider the effort control, because several research papers considering the relationship between effort and fault have been proposed by in the past [9] [10] [11]. Also, the Wiener process has been useful in precision required high accuracy. As the other examples of applying the Wiener process effectively, several research papers have been proposed in the past [12].

Then, we can obtain the following two dimensional stochastic differential equation with Brownian motion derived from software reliability growth modeling approach based on Wiener process [13]:

$\begin{array}{c}\text{d}\Psi \left(t\right)=\left\{\zeta \left(t\right)-\frac{1}{2}\left({\upsilon}_{1}^{2}+{\upsilon}_{2}^{2}\right)\right\}\left\{\lambda -\Psi \left(t\right)\right\}\text{d}t\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\upsilon}_{1}\left\{\lambda -\Psi \left(t\right)\right\}\text{d}{\omega}_{1}\left(t\right)+{\upsilon}_{2}\left\{\lambda -\Psi \left(t\right)\right\}\text{d}{\omega}_{2}\left(t\right)\mathrm{.}\end{array}$ (1)

Then, each parameter is as follows:

$\Psi \left(t\right)$ : the cumulative maintenance effort expense at up to operational time; $t\left(t\ge 0\right)$ in the OSS development project, which takes on continuous real values;

$\zeta \left(t\right)$ : the increase rate of maintenance effort expense at operational time *t* and a non-negative function;

$\lambda $ : the estimated maintenance effort required until the end of operation;

${\omega}_{1}\left(t\right)$ : 1st Wiener process considering the “Volume” factor;

${\omega}_{2}\left(t\right)$ : 2nd Wiener process considering the “Variety” factor;

${\upsilon}_{1}$ : 1st positive constant for “Volume” representing a magnitude of the irregular fluctuation;

${\upsilon}_{2}$ : 2nd positive constant for “Variety” representing a magnitude of the irregular fluctuation.

Moreover, we define the increase rate of maintenance effort expense in case of $\zeta \left(t\right)$. We assume the following equations based on software reliability models [4] as the reliability growth function of the proposed model:

$\zeta \left(t\right)\doteq \frac{\frac{\text{d}I\left(t\right)}{\text{d}t}}{o-I\left(t\right)}=\frac{p}{1+q\cdot \mathrm{exp}\left(-pt\right)}\mathrm{.}$ (2)

Then, the parameters in Equation (2) are given by

$I\left(t\right)$ : the mean value function of the inflection S-shaped software reliability growth model based on a nonhomogeneous Poisson process (NHPP);

*o*: the expected number of latent faults;

*p*: the fault detection rate per fault;

*q*: is defined as
$\frac{1-r}{r}$ ;

*r*: the impact parameter in terms of “Velocity”.

Therefore, the cumulative maintenance effort expense up to time *t* is obtained as follows:

$\Psi \left(t\right)=\lambda \left[1-\frac{1+p}{1+q\cdot \mathrm{exp}\left(-pt\right)}\cdot \mathrm{exp}\left\{-pt-{\upsilon}_{1}{\omega}_{1}\left(t\right)-{\upsilon}_{2}{\omega}_{2}\left(t\right)\right\}\right].$ (3)

Similarly, the estimated maintenance effort expense required until the end of operation can give as follows:

${\Psi}_{r}\left(t\right)=\lambda \left[\frac{1+p}{1+q\cdot \mathrm{exp}\left(-pt\right)}\cdot \mathrm{exp}\left\{-pt-{\upsilon}_{1}{\omega}_{1}\left(t\right)-{\upsilon}_{2}{\omega}_{2}\left(t\right)\right\}\right].$ (4)

Above mentioned model parameters can easily estimate by using the method of maximum likelihood [12].

3. Effort Expense Optimization Based on Genetic Algorithm

We discuss the effort expense optimization based on the conventional optimal software release problems. Generally, it is interesting for the software developers to estimate the time when we should stop the testing phase in order to release software efficiently. Therefore, several researchers have discussed about the determination of software release times minimizing the expected total software cost [8].

According to the conventional optimal release problems, we can consider the optimal software maintenance problem with software effort expense from the relationship between cost and effort. Our research group has proposed several optimal maintenance problem based on software effort expense. Then, we define the following effort parameters:

${\tau}_{1}$ : the fixing effort expense per fault during the operation;

${\tau}_{2}$ : the effort expense per unit time during the operation;

${\tau}_{3}$ : the maintenance effort expense per fault after the maintenance. Then, the software effort in the operation can be formulated as:

${\Theta}_{1}\left(t\right)={\tau}_{1}\Psi \left(t\right)+{\tau}_{2}t\mathrm{.}$ (5)

Also, the software maintenance effort expense after the maintenance is represented as follows:

${\Theta}_{2}\left(t\right)={\tau}_{3}{\Psi}_{r}\left(t\right)\mathrm{.}$ (6)

Consequently, from Equations (5) and (6), the total software maintenance effort expense is given by

$\Theta \left(t\right)={\Theta}_{1}\left(t\right)+{\Theta}_{2}\left(t\right)\mathrm{.}$ (7)

The optimum maintenance time ${t}^{\mathrm{*}}$ is obtained by minimizing $\Theta \left(t\right)$ in Equation (7).

In the past, the effort expense parameter has been assumed as the given parameter according to the conventional optimal software release problem. We propose the estimation method of the effort parameters based on genetic algorithm. Then, we consider as the search problem minimizing the total software maintenance effort expense in Equation (7).

In the past, we have proposed several parameter estimation methods for jump diffusion process models [12]. Then, we focus on a genetic algorithm for the estimation of the effort expense parameters included in the total software maintenance effort expense in Equation (7). The procedure of parameter estimation based on genetic algorithm is given in the following [14]:

Step 1. The initial values are randomly generated;

Step 2. The crossover is executed;

Step 3. The value of goodness-of-fit is calculated from the following evaluation function ${\kappa}_{i}$ included parameters:

$\begin{array}{l}\underset{\delta}{\mathrm{min}}{\kappa}_{i}\left(\delta \right),\\ {\kappa}_{i}={\displaystyle \underset{i=0}{\overset{I}{\sum}}}{\left\{\Theta \left(i\right)-{\alpha}_{i}\right\}}^{2}\text{\hspace{1em}}\left(i=1,2,\cdots ,I\right).\end{array}$ (8)

Then, the parameters included in evaluation function are as follows:

$\Theta \left(i\right)$ : the cumulative amount of maintenance effort expense at up to operation time *i* in Equation (4);

${\alpha}_{i}$ : *i*-th actual cumulative amount of maintenance effort expense;

$\delta $ : the set of effort expense parameters ${\tau}_{1}$, ${\tau}_{2}$, and ${\tau}_{3}$.

Step 4 and Step 2-Step 3* *are continued until minimizing the value of evaluation function*
${\kappa}_{i}$.*

Then, the fitness function based on our model is formulated as follows:

$\{\begin{array}{l}\underset{{\tau}_{1},{\tau}_{2},{\tau}_{3}}{\mathrm{min}}{\kappa}_{i}\left({\tau}_{1},{\tau}_{2},{\tau}_{3}\right),\\ {\kappa}_{i}\left({\tau}_{1},{\tau}_{2},{\tau}_{3}\right)={\displaystyle \underset{i=0}{\overset{I}{\sum}}}{\left\{\Theta \left(i\right)-{\alpha}_{i}\right\}}^{2}.\end{array}$ (9)

Specifically, ${\kappa}_{i}\left({\tau}_{1},{\tau}_{2},{\tau}_{3}\right)$ in Equation (9) is given by the following equation

$\begin{array}{l}{\kappa}_{i}\left({\tau}_{1},{\tau}_{2},{\tau}_{3}\right)\\ ={\displaystyle \underset{i=0}{\overset{I}{\sum}}}{\left\{\Theta \left(i\right)-{\alpha}_{i}\right\}}^{2}={\displaystyle \underset{i=0}{\overset{I}{\sum}}}{\left\{{\Theta}_{1}\left(i\right)+{\Theta}_{2}\left(i\right)-{\alpha}_{i}\right\}}^{2}\\ ={\displaystyle \underset{i=0}{\overset{I}{\sum}}}{\left\{{\tau}_{1}\Psi \left(i\right)+{\tau}_{2}i+{\tau}_{3}{\Psi}_{r}\left(i\right)-{\alpha}_{i}\right\}}^{2}\\ ={\displaystyle \underset{i=0}{\overset{I}{\sum}}}\{{\tau}_{1}\lambda \left[1-\frac{1+p}{1+q\cdot \mathrm{exp}\left(-pi\right)}\cdot \mathrm{exp}\left\{-pi-{\upsilon}_{1}{\omega}_{1}\left(i\right)-{\upsilon}_{2}{\omega}_{2}\left(i\right)\right\}\right]+{\tau}_{2}i\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{{\tau}_{3}\lambda \left[\frac{1+p}{1+q\cdot \mathrm{exp}\left(-pi\right)}\cdot \mathrm{exp}\left\{-pi-{\upsilon}_{1}{\omega}_{1}\left(i\right)-{\upsilon}_{2}{\omega}_{2}\left(i\right)\right\}\right]-{\alpha}_{i}\}}^{2}.\end{array}$ (10)

Then, the effort expense parameters ${\tau}_{1}\mathrm{,}{\tau}_{2}$, and ${\tau}_{3}$ are searched by the genetic algorithm under the flow in Figure 1.

4. Numerical Examples for Effort Expense Optimization

We show several numerical examples by using the effort expense data of Apache

Figure 1. Flow of effort expense parameters estimation in Equation (7) based on genetic algorithm.

HTTP Server Project [15]. The actual effort data set is used in the proposed method. The effort data sets are obtained from the bug tracking system of actual OSS project. The effort data requires the data in terms of the person and time. Then, the data sets in terms of the person and time are obtained from “Opened”, “Reporter”, and “Assignee” recorded on the bug tracking system. We have arranged the effort data from fault big data recorded on the bug tracking system. Then, the unit of effort is (“man × days”), because the unit of date is the day. First, we confirm the state of changing for several effort parameters. Figures 2-4 show the sensitivity analysis for effort expense parameter ${\tau}_{1}$, ${\tau}_{2}$, and ${\tau}_{3}$, respectively. We find that the effort expense parameters ${\tau}_{1}$ and ${\tau}_{2}$ affect the second half of the operation from Figure 2 & Figure 3. On the other hand, the effort expense parameter ${\tau}_{1}$ affect the first half of the operation from Figure 4. In particular, the optimum maintenance times strongly depend on the ${\tau}_{1}$, ${\tau}_{2}$,

Figure 2. The sensitivity analysis for effort expense parameter ${\tau}_{1}$.

Figure 3. The sensitivity analysis for effort expense parameter ${\tau}_{2}$.

Figure 4. The sensitivity analysis for effort expense parameter ${\tau}_{3}$.

and ${\tau}_{3}$, respectively. Similarly, we show the sensitivity analysis of the sample paths for effort expense parameter ${\tau}_{1}$, ${\tau}_{2}$, and ${\tau}_{3}$ in Figures 5-7, respectively. From Figures 5-7, we find that the noises become large in the early phase of operation.

In the past, the effort or cost parameters are the given ones. Conversely, we consider that the effort parameters can be estimated by using Equation (9). Figure 8 and Figure 9 show the estimated software effort based on genetic algorithm in Equations (5) and (6). Moreover, we show the estimated total software effort based on genetic algorithm in Figure 10. Furthermore, we show the convergence status in Figure 11. Then, the effort parameters are estimated by using Equations (8)-(10) in section 3 as follows:

${\tau}_{1}=0.90915,{\tau}_{2}=10130.2,{\tau}_{3}=\mathrm{2.19561.}$

From the estimated results, we found that the effort parameters are appropriately estimated by using the combination optimization problem. Then, the software manager will be able to allocate the human resources for the software development appropriately. In other words, the OSS managers can comprehend the maintenance effort expense required in advance by using the proposed method. Therefore, the OSS managers will be able to prepare the human resources required before the OSS project.

The key significance points of the research contribution is as follows:

· There is no research paper in terms of the estimation method of cost parameters and effort ones based on the software reliability growth models. The

Figure 5. The sensitivity analysis of the sample paths for effort expense parameter ${\tau}_{1}$.

Figure 6. The sensitivity analysis of the sample paths for effort expense parameter ${\tau}_{2}$.

Figure 7. The sensitivity analysis of the sample paths for effort expense parameter ${\tau}_{3}$.

Figure 8. The estimated software effort expense in Equation (5) based on genetic algorithm.

Figure 9. The estimated software effort expense in Equation (6) based on genetic algorithm.

Figure 10. The estimated total software effort expense based on genetic algorithm.

Figure 11. The convergence status based on genetic algorithm.

cost parameters and effort ones have been given as the experimental parameters of software managers in the past. Therefore, we cannot compare the proposed method with the conventional method, because there is no similar method.

· The managers of OSS project can decide the optimal software operation effort by using the estimated effort parameters ${\tau}_{1}$, ${\tau}_{2}$, and ${\tau}_{3}$, respectively. Thereby, the OSS managers can achieve the cost reduction and quick delivery.

Also, several research papers in terms of the optimization algorithms have been proposed in the past [16] [17]. However, these methods are not for the area of software engineering.

From above mentioned, it is difficult to compare the proposed method with the conventional one. Alternatively, we show the comparison results based on several GA’s. Figures 10 and Figures 11 are the results based on “L-BFGS-B” [18]. As the comparison results by using the “Nelder-Mead” [19], we show the estimated total software effort based on GA in Figures 12. Similarly, we show the convergence status in Figures 13. Then, the effort parameters are estimated by using Equations (8)-(10) in section 3 as follows:

${\tau}_{1}=1.0362,{\tau}_{2}=9617.0,{\tau}_{3}=\mathrm{2.1984.}$

As the comparison results based on two kinds of algorithms, the estimation results based on “Nelder-Mead” is pessimistically estimated from Figure 10 and Figure 12. However, we found that there is almost no difference between “L-BFGS-B” and “Nelder-Mead” in terms of the optimization algorithms in GA.

Figure 12. The estimated total software effort expense based on genetic algorithm based on “Nelder-Mead”.

Figure 13. The convergence status based on genetic algorithm based on “Nelder-Mead”.

Therefore, the proposed method is stable without regard to the optimization algorithms in GA.

5. Conclusions

We have discussed the optimum maintenance problem based on the genetic algorithm. In particular, we have proposed an estimation method of effort expense parameters included in our effort function. Moreover, we have concretely shown numerical examples of effort optimization. Then, we have found that the OSS managers can comprehend the human resources required before the OSS project in advance.

Finally, we have discussed the contribution of the proposed method. In particular, we have focused on two dimensional Wiener processes model for effort expense estimation. We conclude the advantage points as follows:

1st Point: The proposed stochastic differential equation model can assess OSS projects considering the influence of big data in terms of 3 V model.

2nd Point: The proposed method can search the optimum maintenance time under the complex situation such as Equation (9).

3rd Point: Before the OSS operation, the OSS managers can comprehend the required human resource in advance.

The OSS managers will be able to control the software effort expense from the stand point of 3 V model as big data by using the proposed effort expense optimization analysis.

At present, the cloud and edge computing based on OSS is attracting the most attention. It is difficult to assess the optimum maintenance time, because the cloud and edge computing have the network-based-service and many distributed node servers. As the future study, it will be important to discuss the optimum maintenance problem for the cloud and edge computing service.

Acknowledgements

This work was supported in part by the JSPS KAKENHI Grant No. 20K11799 in Japan.

References

[1] Ibrahim, I.M., et al. (2018) A Robust Generic Multi-Authority Attributes Management System for Cloud Storage Services. IEEE Transactions on Cloud Computing, 30 August 2018, 1.

https://doi.org/10.1109/TCC.2018.2867871

[2] Ahmad, A.A., et al. (2019) Scalability Analysis Comparisons of Cloud-Based Software Services. Journal of Cloud Computing: Advances, Systems and Applications, 8, Article No. 10.

[3] Taleb, T., Samdanis, K., Mada, B., Flinck, H., Dutta S. and Sabella, D. (2017) On Multi-Access Edge Computing: A Survey of the Emerging 5G Network Edge Cloud Architecture and Orchestration. IEEE Communications Surveys & Tutorials, 19, 1657-1681.

https://doi.org/10.1109/COMST.2017.2705720

[4] Kapur, P.K., Pham, H., Gupta, A. and Jha, P.C. (2011) Software Reliability Assessment with OR Applications. Springer-Verlag, London.

https://doi.org/10.1007/978-0-85729-204-9

[5] Yamada, S. and Tamura, Y. (2016) OSS Reliability Measurement and Assessment. Springer International Publishing, Switzerland.

https://doi.org/10.1007/978-3-319-31818-9

[6] Norris, J. (2004) Mission-Critical Development with Open Source Software. IEEE Software Magazine, 21, 42-49.

https://doi.org/10.1109/MS.2004.1259211

[7] Singh, V.B., Sharma, M. and Pham, H. (2017) Entropy Based Software Reliability Analysis of Multi-Version Open Source Software. IEEE Transactions on Software Engineering, 44, 1207-1223.

https://doi.org/10.1109/TSE.2017.2766070

[8] Yamada, S. and Osaki, S. (1985) Cost-Reliability Optimal Software Release Policies for software Systems. IEEE Transactions on Reliability, 34, 422-424.

https://doi.org/10.1109/TR.1985.5222222

[9] Khatri, S.K., John S.A. and Majumdar, R. (2016) Quantifying Software Reliability Using Testing Effort. Proceedings of International Conference on Information Technology (InCITe)—The Next Generation IT Summit on the Theme—Internet of Things: Connect Your Worlds, Noida, 6-7 October 2016, 23-26.

https://doi.org/10.1109/INCITE.2016.7857582

[10] Lance, F. and Swapna, S.G. (2008) Software Reliability Models Incorporating Testing Effort. OPSEARCH, 45, 351-368.

https://doi.org/10.1007/BF03398825

[11] Kapur, P.K., Gupta, A. and Jha, P. (2007) Reliability Analysis of Project and Product Type Software in Operational Phase Incorporating the Effect of Fault Removal Efficiency. International Journal of Reliability, Quality and Safety Engineering, 14, 219-240.

https://doi.org/10.1142/S021853930700260X

[12] Tamura, Y. and Yamada, S. (2019) Maintenance Effort Management Based on Double Jump Diffusion Model for OSS Project. Annals of Operations Research, 1-16.

https://doi.org/10.1007/s10479-019-03170-w

[13] Arnold, L. (1974) Stochastic Differential Equations-Theory and Applications. John Wiley & Sons, New York.

[14] Holland, J.H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

[15] The Apache Software Foundation. The Apache HTTP Server Project.

http://httpd.apache.org/

[16] Nayak, S.C. and Misra, B.B. (2020) Extreme Learning with Chemical Reaction Optimization for Stock Volatility Prediction. Financial Innovation, 6, Article No. 16.

https://doi.org/10.1186/s40854-020-00177-2

[17] Balderas, F., Fernandez, E., Gomez-Santillan, C., Rangel-Valdez, N. and Cruz, L. (2019) An Interval-Based Approach for Evolutionary Multi-Objective Optimization of Project Portfolios. International Journal of Information Technology & Decision Making, 18, 1317-1358.

https://doi.org/10.1142/S021962201950024X

[18] Byrd, R.H., Lu, P., Nocedal, J. and Zhu, C. (1995) A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing, 16, 1190-1208.

https://doi.org/10.1137/0916069

[19] Nash, J.C. (1990) Compact Numerical Methods for Computers, Linear Algebra and Function Minimisation. Adam Hilger, Bristol.