The maintenance objective is as a rule to increase the lifetime of maintained device and to reduce the maintenance costs simultaneously. As these two requirements are in fact contradictory, one has to consider a criterion combining them, using for instance relative cost to lifetime unit, or, on the contrary, the proportion of effective lifetime to costs needed for the device acquisition and maintenance, not neglecting costs caused by eventual device failure.
The main contribution of the present study is the following: firstly, a quite general model of optimal preventive maintenance problem is formulated, including a rather universal scheme of sequential maintenance actions. Then, a procedure of numerical solution is proposed. It is based on the MCMC (Markov Chain Monte Carlo) method solving the stochastic optimization problem via controlled simulation of possible maintenance scenarios. Both these features of the approach are rather original (to author’s knowledge). The way of solution is demonstrated on artificial examples.
There exists a number of works dealing with the optimal maintenance problems from different points of view. Some are closely connected with concrete applications; others are more theoretical; however, most of them can serve at least as an inspiration for solutions of real problems. It should be also the case of the present paper. In order to put this to a wider research context, let us here present a brief selection of references relevant to the theme.
The paper  reviews computational approaches to maintenance scheduling. Several major categories are distinguished. One group of approaches is those utilizing traditional optimization-based techniques, such as dynamic programming, yielding exact optimal solution. Another type of approach, taking into account stochastic aspects of the problem, deals with heuristic methods, including simulated annealing, genetic algorithm, or guided randomized search methods. A similar overview of both the state of the art and directions of development is given in  . Both categorizations concern prevailingly to methods of solution; however, first, the problem description (model) has to be specified, concerning the lifetime distribution, possible actions, their consequences as well as costs.
Sometimes, the schedule is given (allowing for certain tolerance); the objective is to optimize distribution and allocation of sources and people to carry out all required actions at proper time, see for instance  . In the present paper, on the contrary, the time schedule and, eventually, the extent of maintenance are to be optimized, while their impact and costs are given (though depending, naturally, on maintenance actions). Schematically, the setting is sketched in Figure 1 below. As the lifetime (survival) of a device is random, characterized by its probability distribution, we deal with the problem of stochastic optimization. In general cases, it could be rather difficult to find one optimal solution. Sometimes a unique solution does not exist, or the surface of objective function is flat around its optimum. Thus, the intensive simulation (under selected strategy) is a common way of how to find at least sufficiently sub-optimal solution (see also  ).
More complicated problem arises in connection with optimal maintenance in complex systems. A crucial task there is to describe the dependence among system components and their mutual impact. Such a description could be based on the
Figure 1. Scheme of repeated maintenance actions.
7FTA analysis, which should contain also a model of dynamic reaction to events (defects, maintenances) of the system elements. One example of such an approach could be found in  . A nice literature overview concerning maintenance decision objective functions is presented in  . The authors also discuss the gap between maintenance impact and costs, leading to a multi-criteria optimization problem. A useful overview of problems connected with maintenance optimization is given also in  , it can serve as an inspiration to further deeper study of this area and offers also a number of relevant sources.
Another set of models studied in optimal maintenance literature, see for instance  , considers also inspections of actual system state. Then the maintenance is optimized in a sequential manner conditionally on the current prediction of system reliability. In the framework of our setting, inspections could be included as a special maintenance action, then the optimal maintenance policy has to be recomputed sequentially, when a new information on actual lifetime distribution is provided.
It is necessary to mention here also the models of imperfect repairs (we shall return to this theme in more detail in Section 3). The models are often based on the reduction of the hazard rate, either directly or indirectly (by shifting the virtual age of the system). If the state of the system is characterized by its degradation, the repair degree can be connected with the reduction of the degradation level. However, here other problems arise. First, how the degradation should be characterized and evaluated. Actually, the degradation could be connected directly with the growth of cumulated hazard rate, i.e. with one of basic characteristics of survival time probability distribution. There are numerous other possibilities. One of standard approaches consists in describing the process of degradation and repairs as a discrete (Markov or hidden Markov) chain on a finite set of states, see for instance  . However, the second problem is how to connect the state improvement with maintenance costs. This has to be settled as well.
In the present study, the specification of device degradation is not solved, as, in fact, it could be as well characterized by cumulated hazard rate, i.e. given by lifetime probability distribution. The impact of a maintenance is then modeled via certain change of this distribution. The organization of the paper is the following: In Section 2 a sketch of general model of maintenance sequence and its impact is provided and the objective function of stochastic optimization problem is formulated. Then, Section 3 recalls several models of non-complete maintenance and its impact to future lifetime. One of them, namely the model of accelerated ageing, is then described in more details. Its particular version is then specified to be used in concluding example. In Section 4, the method of numerical solution is described. It utilizes the random search via the Metropolis algorithm combined with simulated annealing. It is, in Section 5, applied to a solution of the optimal maintenance in an artificial example. Finally, the conclusion discusses how the approach can be adapted to different more complicated variants of the optimal maintenance problem.
2. Problem Formulation
Figure 1 shows a general scheme of a sequence of maintenance actions. The symbols used there have the following sense:
—times of maintenance actions, to be optimized, .
—time of failure (random), it depends on maintenances performed before its occurrence.
—survival times after j-th maintenance (random variables),
—initial time to failure (random variable).
—cumulative distribution functions and densities of .
—corresponding survival functions.
—cost of j-th maintenance, —cost of failure,
—cost of device acquisition, , C—total costs.
—time periods between maintenances.
—random objective function.
—events, maintenance actions, F—event, failure.
While the initial distribution of , i.e. , is assumed to be known, other distributions depend on maintenances, that is why the model of such a dependence should be specified. Moreover, in a general case also the costs of actions may depend on their extent. Further, as the problem is stochastic, we have to select an objective function, i.e. a corresponding random variable and its representative characteristics, e.g. its expectation, or its certain quantile. Let us first assume that all components and parameters of the scheme are fixed. Then it is possible, at least formally, to evaluate distribution of following random variables, which might be of the interest:
1) Distribution of the time to failure TF: In each interval its distribution is given (conditionally, provided the device survives ) by a probability density function with , and, simultaneously, the probability that the device survives over equals . Let us assume that there are exactly K maintenance times, fixed in advance (the device can but need not survive to them). Hence, when densities are specified, we are able to derive distribution of lifetime . In particular, its expectation then equals:
2) Distribution of costs, C: It is discrete; if the failure occurs in interval , then the total costs are , after the last the costs equal . Corresponding probabilities are then , while , is the probability of survival over the last .
3) Distribution of random variable , the time-to-failure to costs- to-failure. Again, when the model is specified, there is no problem to derive this distribution from above. As it was already said, we might be interested in maximization of its certain quantile, however, in the present paper, we shall search for parameters maximizing its expectation. For given all model parameters, the expectation EZ has a form quite similar to (1), each term of (1) has to be divided by corresponding costs shown in point 2. For instance, when the number of planned maintenances is just K = 2, then
It is seen that, in general case, the solution of the optimization task could be a rather demanding problem. Though the structure is sequential, it is not possible to solve the problem parts separately. For instance, it is rather difficult to find an optimal as a function of . On the other hand, as shown above, for given all variables and parameters, the objective function can be evaluated. Hence, optimal parameters can be obtained by a convenient method of (randomized, possibly) search.
2.1. Sketch of Optimization Procedure
While a numerical example is solved in detail in Section 5, here the main features of the solution approach are described. The objective is to find the configuration of maintenance times (and, eventually, the maintenance extent parameters) maximizing EZ. In the numerical example in Section 5 the corresponding parameter (namely the parameter of time acceleration, ) is fixed, in a more general case such parameters could also be the subject of optimal search, then, naturally, the costs of maintenances have to depend on them, too.
The procedure starts from a reasonably chosen initial configuration and is searching for the best one, comparing their values EZ. It means that for each considered configuration corresponding EZ has to be computed. The way of search depends on selected search method. There exists a number of random search methods, each consisting, as a rule, of repeating two sub-steps: First, a new configuration is proposed, second, it is accepted or not. For instance, genetic algorithms can propose a new configuration as a combination of already accepted, they store a whole set of best results. On the contrary, the method used here is based on the Metropolis algorithm. The method creates a (Markov) chain of configurations, at each step both the proposal distribution and the acceptance probability depend just on the actual state and proposed new state. The algorithm accepts also non-optimal configurations, which helps the procedure to escape from the regions of local extremes. In order to force the procedure to converge to the global extreme, the method is combined with simulated annealing. Other details of the method are given in Section 4. As a rule, such a global search method should be started several times, in order to be more sure that really the global extreme has been approached.
When the search ends (and we hope that the optimal solution is achieved), it could be of interest to evaluate corresponding distributions of optimized variable Z, also of TF and C, eventually their other characteristics, e.g. selected quantiles.
2.2. Illustrative Example
The objective of the following example is just to illustrate the form of results, i.e. the evaluation of mentioned distributions and also the maintenance times optimization, for a rather simple special case. Let the initial distribution of X0 be Weibull with parameters a = 100 (scale), b = 2 (shape), i.e. its characteristics are , . Further, let us for simplicity assume that this distribution remains the same after all maintenances (i.e. the maintenances are “complete”), their maximal number is K = 10, inter-maintenance times are also the same, . Further, , all other . Figure 2 shows distributions of all three variables, i.e. . Obtained characteristics of these variables (namely their expectations and standard deviations) then are: , , , , , , .
In fact, is approximately the optimal inter-maintenance interval in this simplified setting. For comparison, let us consider the same case as above, except that the number of maintenances is limited to only 3, on the other hand we optimized all 3 inter-maintenance times (using already the method of random
Figure 2. Cumulative distribution function of time-to-failure T (above), the dotted curve there is the c.d.f. of Weibull(100, 2) r.v. X0. The distribution of total costs C is in the middle, cumulative distribution function of variable Z is below.
search described later in Section 4, however assuming just complete repairs at maintenance times). It was found that the maximum of EZ was attained approximately for with , while median .
2.3. A Variant Model with Finite Time Horizon
The time of model introduced in (1) has no upper bound, assuming that each finite lifetime can be exceeded with positive probability. While it holds theoretically, in practice it is not realistic. Therefore let us now assume that there is an upper time limit TR, given as a rule by the manufacturer, after which the device is innovated completely (or exchanged for new). The model is then slightly different. For instance, let us consider again fixed number K of planned maintenances between 0 and TR. Let us recall that CA is the cost of the device acquisition, CF the cost of failure consequences, . Now, after a convenient re-arrangement, the expression for EZ is
where for , is the time remaining to after the last maintenance action. Thus, in general, except the maintenance times , also their number K should be optimized.
There are many model variants. For instance, one just slightly different from the setting considered here assumes the renewal after a given number of maintenances, similarly like in  . It means that K is given, at TK the device is innovated, however the maintenance times Tm should be found optimally as well. Expression (3) is adapted easily to this case. The way of solution could be also quite similar to our solution here.
In the numerical example above it was assumed that after a maintenance the device is as new. This is evidently unrealistic in many instances. The model (3), on the contrary, is more universal and considers changed distributions. That is why the next section is devoted to a brief presentation of commonly used models taking into account the maintenance impact.
3. Models of Partial Repairs
The renewal (or the “complete repair”) means that the device is repaired fully (or exchanged for a new one) and that, consequently, the successive random variables—times to failure—are distributed identically and independently. On the contrary, the minimal repair means that the repair has no innovative effect (we can imagine just switching an electronic device off and on, to restart it).
There are several natural ways how the models of complete repairs can be extended to repairs incomplete. Some authors, e.g.  , speak in this context on “Generalized renewal processes” meaning that after the maintenance and other interventions the distribution of time-to-failure is changed. One of basic contributions is in  : Let be the distribution function of the time to failure of a new system. Assume that a maintenance reduces the virtual age. It could be modeled directly by switching from the actual age to , with a parameter characterizing the degree of repair. Hence, the remaining lifetime is from this moment described by the distribution function .
In  , several sub-models of imperfect repairs are then specified. Denote by the degree of the n-th repair, the time survived after the last repair. Then in Model I the n-th repair cannot remove old damages, . On the contrary, the Model II allows for a reduction of the whole virtual age, namely . Notice that special cases contain the complete repair model with , minimal repair model, , and frequently used variant with constant degree . Naturally, there are many other approaches, e.g. considering a randomized degree of repair, the regressed degree (based on the system history), accelerated virtual ageing, change of hazard rate (i.e. its reduction as a rule) etc. A nice overview of models of partial maintenance can be found e.g. in  .
Accelerated Failure Time Model
In the rest of this paper, the specification of the model presented schematically in Figure 1 will use the Accelerated Failure Time (AFT) model assuming that after an event the subjective time speed is changed, as a rule accelerated. Formally, the maintenance returns the virtual age back to zero, however, the speed of ageing is then higher, by an accelerating parameter . In general, this parameter value, , can depend on maintenance action . Then, when the distribution function of the variable is , after the next j-th maintenance the distribution function of is and its time starts from zero anew. Hence, for probability densities it means that .
In the sequel, it will be assumed that the initial lifetime distribution of is given and that the acceleration parameter is constant and known. It means that also the distributions of variables are known, they do not depend on maintenance timing. If we further assume that all maintenance costs are the same, , the problem description seems to be rather simple. Nevertheless, still the maintenance schedule optimization is not less difficult and needs a convenient numerical procedure.
4. Method of Solution
The method used here will be based on the Markov Chain Monte Carlo approach (MCMC, see for instance  ) generating a chain of possibly sub-optimal solutions, and on its combination with the simulated annealing reducing variability of accepted sequence of solution and “forcing” it to approach the optimum. In this framework, two variants of optimization can be considered. First, optimal times are found for several fixed values K, then the optimal K from them is selected. The second method should be able to change K during the process of search. In the framework of the MCMC algorithms, the method is known as the reversible jumps MCMC. As the structure of corresponding algorithm is then more complicated, we shall use the first approach.
Hence, let us assume that K is fixed, we search for optimal vector of maintenance times from the subset . The model sketch in Figure 1 as well as its description in (3) suggest that a reasonable method could consists in a sequential alternating of times . We shall follow the Metropolis algorithm. It starts from an initial configuration of , e.g. selected equidistantly inside . At each step, one new component is proposed, selected randomly in , and accepted with probability
Hence we can control the distribution proposing new candidates, the most rough choice is the uniform distribution in , a finer selection could be beta distribution, or another distribution limited to this interval, with the last accepted value as its parameter (constrained Gauss distribution could be used as well). In every case, such a procedure innovating one component after another creates a Markov sequence of sets of times , with limit distribution given by a probability density function proportional to . The rule (4) accepts, with positive probability, also non-optimal values, therefore it is able to jump from a local maximum region. However, the objective is to find the global maximum of EZ. The method of simulated annealing suggests, instead of acceptance probability (4), to use its modification, namely
Then, the acceptance rule is controlled by a variable (“temperature”), it is seen that when is close to zero, acceptance probability of non-optimal solutions (i.e. the cases ) is reduced and closer to zero as well. The success of simulated annealing application depends on the selection of rate of convergence of to zero. For instance, , with n denoting the iteration loop number, suffices theoretically. However, for practical use it tends to zero too quickly, there does not exist a universal and exact rule how to adapt this rate to actual case.
5. Numerical Example of Optimal Maintenance
For numerical illustration we selected again the distribution of , final time of renewal was set to . Notice that without any maintenance . Further, the acceleration parameter was . Thus, unlike in the illustrative example at the end of Section 2, the distribution changed, in the AFT model manner, after each action. The objective was to find an optimal scheduling of maintenances, i.e. times (or, equivalently, inter-maintenance periods ), and also optimal number K maximizing EZ, where random variable Z was now the proportion of to costs spent to T, including if failure occurred before , or just if device survived to . The expression for EZ is written in (3), the method of randomized search for optimal solution is sketched roughly above. As it was said there, optimal times were found for different fixed K, then the best K was selected.
In the first experiment the costs were set to for maintenances , the cost of failure impact , too. Initial location of points was regular inside . Then, the iterative search followed the Metropolis algorithm described above, with N = 10,000 loops. In each loop the innovation of each point was proposed sequentially, candidates were selected randomly uniformly from an interval inside (where , ). It means that computation contained together steps. New point was accepted along (5), with variable . Here was the number of loop, c was such that final (i.e. ). Notice that such a choice yields an “S-curve” decreasing from 1 to 0 (for instance , in fact it is the survival function of Weibull distribution). Its shape allows for rough allocation of optimal configuration in the first part of iteration, making it more accurate in the second part. Simultaneously, also the width of proposal interval was decreased, which resulted in reduction of number of rejected proposals.
The best result was found for . Optimal times of maintenance were . Achieved maximal . Figure 3 shows the distribution of costs and the distribution of variable Z in optimal case. Then, Figure 4 presents, in its upper part, the trace of search for maximal value of EZ, while its lower part shows how location of points was concentrated around their optimal values. Both graphs display values from the second part of iteration, i.e. last 5000 loops, 25,000 iterations. It must be said that the shape of objective function around its optimum was rather flat, maximal values of EZ for selections or were just slightly smaller (by several %), as well as the result for quite regular inter-maintenance periods.
In order to explore a solution sensitivity, namely the change of optimal solution for different costs, in the second case we set for maintenances, while remained unchanged. Now, in order to reduce the probability of rather expensive failure event, optimal number of maintenances was higher, namely , optimal times then , maximum of .
On the contrary, the third case assumed that there were no additional failure expenses. We selected , and for maintenances.
Figure 3. Probability distribution of total costs C (upper plot), circles correspond to costs with failure, the star sign at value 10 to costs of all maintenances plus CA (i.e. without failure). Below is the cumulative distribution function of variable Z.
Figure 4. Upper plot shows the sequence of EZ, lower plot the histogram of accepted locations of times Tk. Both from the second half of iterations.
As expected, optimal number of maintenances was lower, namely , optimal maintenance times were , and achieved maximum of EZ was 10.93.
As shown in Figure 3, for given K and maintenance times it is possible to derive the distribution function of variable Z, therefore the task of maximization of its another characteristics (e.g. of a selected quantile) can be solved rather straightforwardly. Recall that the p-quantile of a random variable Z, , fulfils . In other words, it means that Z is larger than its quantile value with guaranteed probability. Let us consider again the 1-st case, i.e. the case with . The maximal quantile problem was solved with the aid of the same algorithm as the search for maximal expectation. The result for maximal median of Z was not far from optimal solution for EZ, with achieved now for . Then, for instance, the 25% quantile of Z was maximized for , optimal and achieved max .
As the last example, let us consider a variant where two types of preventive maintenance are available. Let the parameters of the case be the same as above, namely the initial distribution being Weibull(100, 2), , , and let there be a choice between two actions, the first accelerating the lifetime by , with cost (i.e. as above), the second cheaper, with cost , however with the lifetime accelerated more, with . The method of random search via the Metropolis algorithm with simulated annealing, proposing also types of actions, yielded an optimal solution consisting of maintenances in and combining both maintenance types. Namely, the procedure found a solution with , optimal sequence of times 51.83, 105.21, 161.21, 212.10, 251.53, 279.43, corresponding maintenance action types were 2, 2, 1, 1, 2, 2, achieved . This is more than in the case with just 1-st action (results are above), and also when just the 2-nd type is used (in this case, maximal was achieved, for ).
In the present paper, a rather general model of sequential maintenance actions, including their effects and costs, was sketched, and the problem of stochastic optimization of action times was formulated. Further, the method of numerical solution, in fact also rather general, was proposed. Different approaches to maintenance impact characterization were then described and discussed. Finally, in order to demonstrate the way of solution, the model was specified, the method applied and illustrated on an artificial example.
The approach proposed in the present contribution can easily be adapted to different formulations of the maintenance problem. The modification can concern to following aspects (among others):
1) Modeling the reliability via different probabilistic distributions of time-to- failure. For reliability of one unit, the standard choices are the Weibull (exponential as well), or the log-normal, gamma, Gumbel and Gauss distribution; sometimes their combination is applied. However, even the empirical distribution based on lifetime data directly, i.e. yielding a stepwise reliability function, can be used.
2) Impact of maintenance action is, in general, described via a change of failure rate. Two ways on how this could be formulated are presented in Section 3. Naturally, other descriptions are possible.
3) Maintenance actions scheduling. Standardly, regular inter-maintenance periods are considered (as in  ), in such a way this part of problem is simplified and one can concentrate to other aspects of optimal solutions (e.g. the extent of maintenance). It has to be said that the solution assuming constant periods could be, as a rule, close to the optimal solution without this assumption. Still, even a simple example in subsection 2.2 shows that there is a difference.
4) Formal objective of optimization. Its choice depends on preferences. In the present contribution, the proportion of time-to-failure to spent costs is considered, its mean value taken as a criterion. Use of selected quantile is discussed, too. A variant approach considers multiple criteria (see, for instance,  comparing several scenarios). As a rule, in the end, it is necessary to select weights of each criterion and to optimize a combination of them.
5) Optimization with constraints (for instance concerning the budget or other sources for maintenance actions). This is another, rather practical approach to multi-objective optimization, where an optimal solution is found in restricted space of “strategies”. Most of randomized optimization procedures (including the one presented here) can control its search to stay inside prescribed space (the simplest way consists in rejection of inadmissible configurations).
6) Maintenance of a system of components, the theme of many studies. The key point of presented approach (see Figure 1) is the knowledge (i.e. a good model) of the time-to-failure probabilistic distribution of individual components before and after each maintenance action. Then, the distribution for the whole system failure can be recomputed, provided its proper scheme is given. Again, optimal times and also the type of actions should be selected, by the action type we mean here what component (and, eventually, to which degree) is to be maintained.
In fact, in a system with parallel structure (including “K of N” system), the failures of individual components should be taken into account rather than the failure of the whole system. The system is still available after one component failure; however, its capacity is reduced during its repair, which causes additional costs. Further, the number of redundant components could also be a matter of optimal choice (cf.  ).
 Sharma, A., Yadava, G.S. and Deshmukh, S.G. (2011) A Literature Review and Future Perspectives on Maintenance Optimization. Journal of Quality in Maintenance Engineering, 17, 5-25.
 Lahiani, N., El Mhamedi, A., Hani, Y. and Triki, A. (2014) Simulation Based Optimization Approach to Solve a Maintenance Process Problem. Proceedings of 2014 International Conference on Control, Decision and Information Technologies (CoDIT), Metz, 3-5 November 2014, 146-151.
 Turan, O., Lazakis, I., Judah, S. and Incecik, A. (2011) Investigating the Reliability and Criticality of the Maintenance Characteristics of a Diving Support Vessel. Quality and Reliability Engineering International, 27, 931-946.
 Hatsey, N.H. and Birkie, S.E. (2021) Total Cost Optimization of Submersible Irrigation Pump Maintenance Using Simulation. Journal of Quality in Maintenance Engineering, 27, 187-202.
 Van Horenbeek, A., Pintelon, L. and Muchiri, P. (2010) Maintenance Optimization Models and Criteria. International Journal of System Assurance Engineering and Management, 1, 189-200.
 Wu, L. and Zhou, Q. (2021) Adaptive Sequential Predictive Maintenance Policy with Nonperiodic Inspection for Hard Failures. Quality and Reliability Engineering International, 37, 1173-1185.
 Tanwar, M., Rai, R.N. and Bolia, N. (2014) Imperfect Repair Modeling Using Kijima type Generalized Renewal Process. Reliability Engineering and System Safety, 124, 24-31.
 Coria, V.H., Maximov, S., Rivas-Dvalos, F., Melchor, C.L. and Guardado, J.L. (2015) Analytical Method for Optimization of Maintenance Policy Based on Available System Failure Data. Reliability Engineering and System Safety, 135, 55-63.
 Linnéusson, G., Ng, A.H.C. and Aslam, T. (2018) Quantitative Analysis of a Conceptual System Dynamics Maintenance Performance Mode Using Multi-Objective Optimisation. Journal of Simulation, 12, 171-189.
 Mellal, M.A., Zio, E. and Williams, E.J. (2020) Cost Minimization of Repairable Systems Subject to Availability Constraints Using Efficient Cuckoo Optimization Algorithm. Quality and Reliability Engineering International, 36, 1098-1110.