Let the following two Markov control processes on a Borel space be given:
where , are the controls forming the control policies , (see   for definitions); are sequences of independent and identically distributed (i.i.d.) random vectors in a separable metric space . In what follows, the distributions of are denoted by and , respectively. Let c be a given bounded measurable one-step cost function; for any initial state and control policy ( is the set of all control policies, see  ), the expected total α-discounted cost criterion is as follows:
Under Assumptions 3.1 and 3.2 given in Section 3, there exist stationary optimal policies and such that
To set the stability estimation problem, first suppose that the process given in Equation (1.2) is interpreted as an “available approximation” to the process given in Equation (1.1), i.e., is an approximation to .
Second, the policy (optimal with respect to Equation (1.4)) is applied to control the “original process” given in Equation (1.1) (instead of the “unavailable” optimal policy ).
Following the definition given in     , we introduce the stability index:
where is the value function defined in Equation (1.5). This definition means that represents an extra cost paid for using instead of the optimal policy .
Under certain Lipschitz conditions it was proved (for the processes with bounded costs c) that
where is an explicitly calculated constant, and is the Lévy-Prokhorov metric (see Section 2 for definition). The convergence in is equivalent to the weak convergence plus the convergence of first absolute moments (see ).
Inequalities such as Equation (1.6) have been developed for other types of metrics (Kantorovich, total variation, etc.) and optimization criteria (the average cost); see, e.g.,   . Other types of criteria used to obtain the stability of the process can be consulted in  .
The aim of the present paper is to take advantage of the boundedness of c and the well-known contractive properties of the operators related to the expected total discounted cost optimality equations in order to prove the “stability inequality” of Equation (1.6) with the Lévy-Prokhorov distance on its right-hand side.
This paper is organized as follows: Section 2 defines the Markov control model and the problem of its stability. Section 3 presents the Lipschitz conditions and the assumptions that guarantee the existence of an optimal control for the Markov control process, as well as the main result of this work, Theorem 3.1, which establishes the conditions under which stability is achieved. Section 4 presents a couple of application examples, for which the assumptions are validated and the result obtained in Theorem 3.1 is then applied. Finally, Section 5 presents the proof of Theorem 3.1, as well as a couple of lemmas required for this proof.
2. Setting of the Problem
In the standard way (see for instance  ), a stationary, homogeneous, discrete-time Markov Control Process (MCP) with infinite horizon will be denoted as the following five-tuple:
where it will be assumed that the components of the controllable process M have the following characteristics:
• The state space X is a metric space with a metric , and denotes the sigma-algebra;
• The action space A is a metric space with a metric l;
• The set of admissible actions is compact for every ;
• The set of admissible state-action pairs is a non-empty (and measurable) Borel subset of the set and is equipped with the metric ;
• p is a stochastic kernel on X given . This stochastic kernel specifies the transition probability:
where and .
• Finally, is a bounded and measurable function called the one-step cost function.
On the other hand, in many applications the evolution of the MCP given in Equation (2.1) is specified by the following model:
where represents the initial state and is a sequence of i.i.d. random vectors that take values in a Borel space S with a common distribution . In fact, it is assumed that S is a metric space equipped with a metric r and that is a measurable function. The process given in Equation (2.3) will be referred to as the original process.
Let be the initial state and the applied control policy ( is the set of all control policies; see   for definitions); then the performance criterion, called the expected total α-discounted cost, is defined as usual by the following functional:
where is a fixed discount coefficient, and denotes the expected value corresponding to the distribution of the process with initial state and control policy applied.
Now, the function with is called the value function, and a control policy (provided it exists) is called optimal (with respect to the criterion ) if it satisfies the following:
Later, conditions will be imposed that guarantee the existence of an optimal stationary policy for Equation (2.5) (see ).
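Although the setting above is a general Borel state space, the discounted criterion and its value function can be illustrated numerically on a small finite model. The following sketch uses entirely hypothetical data (a randomly generated kernel and cost, not taken from the paper) and computes the value function by value iteration:

```python
import numpy as np

# Toy MDP: 3 states, 2 actions, discount alpha = 0.9 (all data hypothetical).
# p[a, x, y] = transition probability P(x_{t+1} = y | x_t = x, a_t = a);
# c[x, a] = bounded one-step cost, as required by the criterion above.
rng = np.random.default_rng(0)
n_states, n_actions, alpha = 3, 2, 0.9
p = rng.random((n_actions, n_states, n_states))
p /= p.sum(axis=2, keepdims=True)  # normalize rows into probabilities
c = rng.random((n_states, n_actions))

# Iterate V_{n+1}(x) = min_a [ c(x, a) + alpha * sum_y p(y | x, a) V_n(y) ];
# by contraction, the iterates converge to the value function.
V = np.zeros(n_states)
for _ in range(1000):
    Q = c + alpha * np.einsum('axy,y->xa', p, V)
    V_next = Q.min(axis=1)
    if np.max(np.abs(V_next - V)) < 1e-12:
        V = V_next
        break
    V = V_next

# The minimizing action at each state gives a stationary policy for the toy model.
f_star = (c + alpha * np.einsum('axy,y->xa', p, V)).argmin(axis=1)
print(V, f_star)
```

The same fixed-point structure underlies the optimality equations used in Section 5.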
The stability index and its estimation problem. The stability estimation problem arises when there is uncertainty about the transition probability p defined in Equation (2.2). The original task of the controller consists of the search for (or approximation of) the optimal policy that satisfies Equation (2.5) for the original process. In many applications, this task cannot be accomplished directly because of, among other reasons, any of the following:
1) Frequently, p or some of its parameters are unknown to the controller, and this transition probability is estimated using statistical procedures (from observations). From the results of these estimates, another transition probability is generated, which is interpreted as an accessible approximation to the unknown p.
2) There are situations in which p is known but too complicated to leave any hope of solving the control policy optimization problem. In such cases, p is sometimes replaced by a “theoretical approximation” , resulting in a controllable process with a simpler structure.
In both cases, when optimizing policies the controller has to work with the controllable Markov process defined by the accessible transition probability . This means that instead of the original process given in Equation (2.3), the controller uses an approximate process given by the following equation:
, , with given (2.6)
where are the states of the process; is the action at the corresponding state; and is a sequence of i.i.d. random vectors with values in S. The only possible difference between the processes given in Equations (2.3) and (2.6) is the different distributions and of the random vectors and , respectively.
Changing for in Equations (2.4) and (2.5), the corresponding optimization criterion for the approximate process is defined as
, with .
Suppose now that it is possible (at least theoretically) to find an optimal policy for the process , i.e., the value function for the approximate process is defined as
The control policy in Equation (2.7) is used as an approximation to the inaccessible optimal policy (assuming it exists). In other words, the policy is used to control the original process M instead of the unknown policy .
The increase in cost from such an approach is estimated by means of the following stability index (see  ):
As proposed in  , the stability estimation problem consists of the search for inequalities of the following type (stability inequalities):
is a “distance” between the transition probabilities p and (expressed in terms of a probabilistic metric);
is a continuous function such that as ; and is a function whose values can be calculated explicitly.
The results presented in   provide inequalities such as (2.9), using for and the so-called “strong metrics”: the total variation metric and the weighted total variation metric.
The aim of this article is to obtain stability inequalities such as (2.9) with and the use of “weak” probabilistic metrics, specifically the Lévy-Prokhorov metric ( ).
For instance, Theorem 3.1, presented in the next section (see inequalities (3.1) and (3.2)), ensures that under appropriate conditions the following holds:
is the Lévy-Prokhorov metric; and denotes the Borel sigma-algebra of the metric space .
It is well known (see ) that metrizes weak convergence in any separable metric space: a sequence of random vectors that converges under the metric converges weakly.
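For distributions on the real line, the Lévy-Prokhorov distance between two equal-size samples can be computed numerically through Strassen's coupling characterization: π(P, Q) ≤ ε if and only if some pairing of the atoms leaves at most an ε-fraction of them unmatched within distance ε. The sketch below is an illustration under these assumptions (not part of the paper), using a greedy one-dimensional matching and bisection on ε:

```python
import numpy as np

def _max_matched(a, b, eps):
    """Greedy maximum number of pairs (a_i, b_j) with |a_i - b_j| <= eps,
    each point used at most once; both arrays must be sorted."""
    count, j = 0, 0
    for x in a:
        while j < len(b) and b[j] < x - eps:
            j += 1                      # b[j] is too far to the left of x
        if j < len(b) and b[j] <= x + eps:
            count += 1                  # match x with the leftmost feasible b
            j += 1
    return count

def levy_prokhorov(a, b):
    """Lévy-Prokhorov distance between the empirical measures of two
    equal-size samples on the real line (Strassen characterization)."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    n = len(a)
    lo, hi = 0.0, 1.0                   # the distance never exceeds 1
    for _ in range(60):                 # bisection; feasibility is monotone in eps
        eps = 0.5 * (lo + hi)
        unmatched = n - _max_matched(a, b, eps)
        lo, hi = (lo, eps) if unmatched <= eps * n else (eps, hi)
    return hi

# Two point masses at distance 0.3 are at Lévy-Prokhorov distance 0.3.
print(round(levy_prokhorov([0.0], [0.3]), 6))
```

This makes the weak-convergence role of the metric tangible: the distance between empirical samples from nearby distributions shrinks as the samples grow.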
3. Assumptions and Results
The Hausdorff distance (h) between compact subsets of the metric space is given by
Likewise, the so-called “strong metric”, the total variation metric ( ), is given by
where are in the space of probability distributions over and is the supremum norm. Of course, if , then
On the other hand, one of the so-called “weak” metrics is the Kantorovich metric ( κ ):
where the function is Lipschitz; namely, the set is defined as
It is well known (see ) that, in the case of , if and only if (weak convergence), and that .
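For discrete distributions on the real line, both metrics reduce to simple formulas: the total variation distance is half the ℓ1 distance between the probability vectors, and the Kantorovich distance is the integral of the absolute difference of the distribution functions. A minimal numerical sketch, with made-up probability vectors (purely illustrative, not from the paper):

```python
import numpy as np

# Two probability vectors mu, nu supported on the points x (illustrative
# discrete case on the real line, not the general Borel-space setting).
x = np.arange(5).astype(float)
mu = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
nu = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Total variation: sup_B |mu(B) - nu(B)| = (1/2) * sum_i |mu_i - nu_i|.
tv = 0.5 * np.abs(mu - nu).sum()

# Kantorovich metric on R: integral of |F_mu - F_nu|, computed piecewise
# between consecutive atoms via the CDF representation valid on the line.
F_mu, F_nu = np.cumsum(mu), np.cumsum(nu)
kappa = np.sum(np.abs(F_mu - F_nu)[:-1] * np.diff(x))
print(tv, kappa)
```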
In the remainder of the article, B will denote the Banach space of all measurable functions for which the norm is finite.
The first set of technical assumptions is required to ensure the existence of minimizers in the value functions of the original and the approximate model, see .
1) The set A is compact for each ; moreover, the set-valued mapping is upper semicontinuous with respect to the Hausdorff metric.
2) The one-step cost function is bounded, namely for each , ; and for each , the one-step cost function is lower semicontinuous in A.
3) For each bounded continuous function , the functions
with , are continuous in .
The second set of assumptions imposes the “Lipschitz conditions” on the one-step cost function as well as on the transition probabilities of the original and approximate processes.
There are finite constants such that the following is true:
1) for each ;
2) for all ;
3) for all where is the Hausdorff metric;
4) for all , where is the total variation metric;
5) for all , ;
6) For each , and each bounded function , the function is lower semicontinuous in .
For a proof of the following proposition, see .
Proposition 1 (Well-known result). Under Assumptions 3.1 and 3.2, for the control processes given in Equations (2.3) and (2.6) there are optimal stationary control policies, denoted by and respectively, such that and do not depend on the initial state and
In addition, the corresponding value functions . In particular, for each fixed , expected values and are well defined.
Now, we are in position to formulate the main result of the paper.
Theorem 3.1. Under Assumptions 3.1 and 3.2, the stability index given in Equation (2.8) satisfies the following inequality:
where the stability constant is
Note that if , then the constant in inequality (3.2) is of order .
4. Some Examples
4.1. The Process of Regularization of the Water Level in a Dam
An important application of control problems (deterministic and stochastic) is the one related to water reservoir operations. An excellent introduction to many of these problems, including their connection with inventory systems, is given in .
In the simplest case of regularization of the water level in a dam, the following modeling can be used for the original process:
and the respective approximate model is given by
In this model, the state variable represents the level of the stock (volume) of water that the dam has at the beginning of period t; the control is the amount of water that is released from the dam for family consumption, irrigation, electric power, etc. during period t; and the “disturbance” is the amount of water that the dam receives randomly, via rain for instance.
In this example, we have , , , with , where U is the maximum capacity of the dam.
Let be the cost paid for the released-water service; for example, one can use a cost function given by , proportional to water consumption, where represents the cost of a unit of water.
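As an illustration only, the dam dynamics and the discounted cost they accumulate can be simulated as follows. The recursion used here (release bounded by the current level, next level truncated at capacity), the myopic policy, the exponential inflow distribution, and all parameter values are assumptions of this sketch, not prescriptions of the paper:

```python
import numpy as np

# Simulated dam under the (assumed) recursion x_{t+1} = min(U, x_t - a_t + xi_t),
# with the release constrained by 0 <= a_t <= x_t.
rng = np.random.default_rng(1)
U, d, x = 10.0, 2.0, 5.0           # capacity, fixed demand, initial level
alpha, cost_rate = 0.9, 1.0        # discount coefficient, cost per unit released
total_cost = 0.0
for t in range(200):
    a = min(x, d)                  # myopic release policy (the control)
    total_cost += (alpha ** t) * cost_rate * a
    xi = rng.exponential(3.0)      # random inflow during period t
    x = min(U, x - a + xi)         # next water level, truncated at capacity
print(total_cost)
```

Since the one-step cost is bounded by `cost_rate * d`, the discounted total stays below `cost_rate * d / (1 - alpha)`, in line with the bounded-cost setting of the paper.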
To ensure compliance with Assumption 3.2 for this example, it is assumed that the following conditions are met:
➢ C1. The one-step cost satisfies clauses (1) and (2) of Assumption 3.2.
➢ C2. The random variable has a density , which:
1) Is bounded by a constant ;
2) Satisfies the Lipschitz condition with a constant .
For , clause (3) of Assumption 3.2 is verified directly (using the definition of the Hausdorff metric) with the constant . Now, denoting , it is easy to see that for each fixed y, the function is Lipschitz in S with constant 1. Then clause (5) of this assumption holds with . Next, clause (4) of Assumption 3.2 will be verified.
Denoting by and with , , , consider the following random variables:
it is enough to prove that, for some constant , the following inequality holds:
Note that, according to the definition of the total variation metric, to prove inequality (4.3) it must be shown that for each measurable function , with , the following holds:
where for a random variable , we get that .
Using the same representation for , we get that
For the second term on the right side of the last inequality, we get that
Let, for instance, . Then, from Equation (4.5) and condition C2, we get that
Let be an arbitrary but fixed number, and let denote an infinitesimal interval centered at z. Since
Then in the inequality (4.4) (taking into account that for ):
or, assuming for example that , we get that
(Applying the conditions C1 and C2).
Combining inequalities (4.4), (4.6) and (4.7) yields inequality (4.3) with .
Finally, it has been established that for this example clause (4) of Assumption 3.2 is met with . By similar arguments it can be shown that clause (6) of Assumption 3.2 is also true. Therefore, Theorem 3.1 can be applied to this example, and inequality (3.1) yields the following:
On the other hand, the distance appearing in inequality (4.8) is very difficult to calculate. Therefore, the result given in inequality (4.8) can be expressed in terms of other probabilistic metrics, as shown in the following:
➢ Total variation metric. Using the well-known relationship , see , between the Lévy-Prokhorov metric and the total variation metric, and since in this example , we can bound the right-hand side of inequality (4.8) to obtain the following stability inequality:
where the constant is given in inequality (4.9).
➢ Kantorovich metric ( ). Let ( ) be
the distribution functions of the random variables and , respectively, in Equations (4.1) and (4.2). Then, using the fact that , see , which relates the Lévy-Prokhorov metric and the Kantorovich metric (defined in Section 2), the right-hand side of inequality (4.8) can be bounded as
where the constant is given in inequality (4.9).
The integral in the last inequality represents the Kantorovich metric between and . Inequality (4.11) is more informative than inequality (4.10), since it allows the approximation of by the corresponding empirical distribution functions.
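This observation is easy to exploit in practice: for two samples of equal size, the integral of the absolute difference of the empirical distribution functions equals the average distance between the sorted order statistics. A short sketch with simulated inflow samples (the distributions and sample sizes are purely illustrative assumptions):

```python
import numpy as np

def kantorovich_1d(a, b):
    """Kantorovich distance between the empirical distribution functions of
    two equal-size samples: the mean |a_(i) - b_(i)| over sorted order
    statistics equals the integral of |F_a - F_b| on the real line."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(2)
xi = rng.exponential(3.0, size=1000)         # "true" inflow sample
xi_tilde = rng.exponential(3.2, size=1000)   # approximating inflow sample
print(kantorovich_1d(xi, xi_tilde))
```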
4.2. Example 4.2
Let , , , with A a compact set in . Now define the following processes:
where , , and are bounded Lipschitz functions with constants and , respectively.
In , it is shown that assumption 3.1 is satisfied for this model.
By properly selecting a bounded Lipschitz cost function, it is ensured that clauses (1) and (2) of Assumption 3.2 are fulfilled; for instance, if the cost function is selected, then, given that , we get that
so, by selecting the constant , clause (2) is satisfied.
On the other hand, it is clear that clause (3) is satisfied for any positive constant . To validate clause (4) of Assumption 3.2, first define the following random variables:
so it is clear that the probability densities of the previous random variables are, respectively,
then, since in this example , after some direct calculations we arrive at the following result:
and since it was assumed that the functions H and G are Lipschitz with constants , respectively, from the last inequality we get that
So, by selecting the constant , clause (4) of Assumption 3.2 is satisfied. To validate clause (5) of this assumption, let , and note that
and since the functions H and G are bounded, let be a finite constant such that for all . Therefore, from the last inequality we get that
So, for the constant , clause (5) is satisfied. Finally, since the function is continuous in all its arguments, clause (6) is also true.
In conclusion, Example 4.2 satisfies Assumption 3.2, so the result of Theorem 3.1 can be applied (see inequalities (3.1) and (3.2)) to bound the stability index using the Lévy-Prokhorov metric:
5.1. Some Preliminary Lemmas
For the proof of Theorem 3.1, the following lemmas will be used:
Lemma 5.1. Under Assumption 3.2, the value function defined in Equation (2.5) satisfies the Lipschitz condition on the state space X, with the
Proof. By clause (1) of Assumption 3.2, for each we get that is bounded by ; then .
On the other hand, in  it is proved that the following operators:
are contractive on the Banach space B with modulus .
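As a brief numerical aside (a synthetic check, not part of the proof), the contraction property ‖Tu − Tv‖ ≤ α‖u − v‖ of such a discounted Bellman-type operator can be verified on a small, randomly generated model; the kernel, cost, and dimensions below are hypothetical:

```python
import numpy as np

# Small synthetic model: p[a, x, y] is a transition kernel, c[x, a] a bounded
# cost, alpha the discount modulus of the contraction.
rng = np.random.default_rng(3)
n_states, n_actions, alpha = 4, 3, 0.9
p = rng.random((n_actions, n_states, n_states))
p /= p.sum(axis=2, keepdims=True)
c = rng.random((n_states, n_actions))

def T(v):
    """Discounted Bellman-type operator acting on a value vector v."""
    return (c + alpha * np.einsum('axy,y->xa', p, v)).min(axis=1)

# Check the contraction inequality in the supremum norm for two random vectors.
u, v = rng.random(n_states), rng.random(n_states)
lhs = np.max(np.abs(T(u) - T(v)))
rhs = alpha * np.max(np.abs(u - v))
print(lhs <= rhs)
```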
Now, from these operators the terms inside the “brackets” will be selected to define the following function:
It is claimed that the function is Lipschitz with the constant .
To prove this, let , ; then
Applying clause (2) of Assumption 3.2 and the fact that , we get that
Then, applying clause (4) of Assumption 3.2, the previous inequality can be expressed as
therefore, in , where .
Since the operators given in Equation (5.1) are contractive, we get that .
Then, to prove that the function is Lipschitz in X with the constant , it is enough to prove the following:
For the function , so that , with , it follows that for all , :
Remark 5.1. Observe that
so, we have that
So if the inequality (5.2) is met, then it is true that
which allows us to conclude that is Lipschitz in X.
Next the proof of the inequality (5.2) is presented.
Let .
Then, by the triangle inequality, we get the following:
It will be proven that
The proof is by contradiction. Suppose that inequality (5.4) does not hold; then there is a such that the following is satisfied:
Due to the compactness of the sets , and the continuity of g, there are elements , at which the infima in I are attained; see Equation (5.3).
Suppose, for example, that
Now, by clause (3) of Assumption 3.2, since , there exists such that , and consequently we get that
The above implies that
Substituting this last inequality into inequality (5.6), we obtain
which contradicts the fact that is the element at which the minimum of over is attained. Therefore, the assumption made in inequality (5.5) is false. Then we get that , which implies and consequently
Finally, by the observations made in Remark 5.1, we get that
which proves Lemma 5.1.
Lemma 5.2. Under Assumption 3.2, the value function defined in Equation (2.5) satisfies the Lipschitz condition on the space S with the constant
Proof. For the proof of Lemma 5.2, the following function will be used: let ; define the function as
Let , . By the definition of the functional , we get that
and, by Lemma 5.1, we arrive at the following inequality:
Now, applying clause (4) of Assumption 3.2 to the previous inequality, we get the following:
5.2. The Proof of the Theorem 3.1
To prove inequality (3.1), we take advantage of the method proposed in . Nevertheless, this technique needs to be modified: the combination of certain Lyapunov-like conditions in the results of the paper allows the use of the contractive properties of the operators related to the discounted cost optimality equations. The following are the developments required for the proof of Theorem 3.1 obtained in this article, with a bounded cost function.
Let , be the optimal stationary policies for processes given in Equations (2.3) and (2.6) respectively, and the corresponding value functions.
Then (see Chapter 8 of ), and satisfy the following optimality equations (moreover, they are the only solutions to these equations):
For all , define
As it has been proved in , the stability index given in Equation (2.8) can be represented as
and is the trajectory of the process given in Equation (2.3) applying the control policy .
By the definition given in Equations (5.10) and (5.8) along with the fact that is optimal for the process given in Equation (2.6), we have that
and by the definition of the functions H and
Then, applying the triangle inequality,
Now define the following pseudo-metric:
Then, from Equation (5.12) it is observed that the first summand on the right side of Equation (5.11) is bounded by .
On the other hand, taking the supremum in Equation (5.11), the second term on the right-hand side is
As already mentioned, the operators given in Equation (5.1) are contractive on B with modulus α. So Equations (5.7) and (5.8) can be expressed as and . Now, given that and are fixed points of these operators, we get that
now, applying the triangle inequality,
Using the definition given in Equation (5.12), we obtain that the previous inequality can be expressed as
Substituting this last expression into inequality (5.13), we get that the second term on the right-hand side of inequality (5.11) is bounded by , and so inequality (5.11) is bounded by
Finally, substituting inequality (5.14) in Equation (5.9) we obtain
To find a bound for , the definition of the Dudley metric ( ) on the space of distributions on will be used:
(See  for definition and properties of ).
By Lemma 5.2, we get that , and since , the stability index can be bounded in terms of the Dudley metric by the following expression:
Now, using the well-known relationship between the Dudley metric and Lévy-Prokhorov metric (see ) and after some direct calculations, the desired inequality (3.1) is obtained with the constant given in Equation (3.2).
Despite the vast literature that exists on controllable Markov processes, few studies have been carried out on the estimation of stability. The study of stability for Markov control processes represents a challenge from both a theoretical and a practical point of view. Proposing appropriate probabilistic metrics to achieve the so-called stability inequalities is an additional effort.
In this article, conditions were found to obtain the stability of a Markov control process under the optimization criterion of expected total α-discounted cost with a bounded cost function using the Lévy-Prokhorov metric.
The importance of being able to use the Lévy-Prokhorov metric lies in the fact that, in applied problems, it allows estimation of the stability index using empirical distributions of the random elements, since under this metric these converge weakly to the distributions they are intended to estimate (unlike under the so-called “strong metrics”).
On the other hand, since in applications no enterprise can bear unlimited (unbounded) costs, the results found in this work, using simple techniques such as contractive operators, provide an estimate of the increase in cost (the stability index) incurred when controlling the “original process” with the optimal policy of the “approximate process”. Of course, the stability constant ( ) affects this stability index; specifically, in this work it was found that this constant is of order if . There are arguments to support the hypothesis that the left-hand side of inequality (3.1) satisfies (for each fixed initial state x): when and the distributions of and are fixed. It is not clear what the rate of such growth is. Therefore, it is proposed that future research, based on particular (and simple) control processes, verify the growth rate of using computational experiments and process simulation.
The author is particularly grateful to Professor Edgar Vladyvosky M.S. for his instructive discussions on a generalization of the Markov processes and properties of the Lévy-Prokhorov metric.
Gordienko, E.I. (1992) An Estimate of the Stability of Optimal Control of Certain Stochastic and Deterministic Systems. Journal of Soviet Mathematics, 59, 891-899.
Gordienko, E.I. and Salem, F.S. (1998) Robustness Inequality for Markov Control Processes with Unbounded Costs. Systems & Control Letters, 33, 125-130.
 Gordienko, E.I. and Yushkevich, A. (2003) Stability Estimates in the Problem of Optimal Switching of a Markov Chain. Mathematical Methods of Operations Research, 57, 345-365.
 Gordienko, E.I., Lemus-Rodriguez, E. and Montes-de-Oca, R. (2008) Discounted Cost Optimality Problem: Stability with Respect to Weak Metrics. Mathematical Methods of Operations Research, 68, 77-96.
 Gordienko, E.I., Lemus-Rodriguez, E. and Montes-de-Oca, R. (2009) Average Cost Markov Control Processes: Stability with Respect to the Kantorovich Metric. Mathematical Methods of Operations Research, 70, 13-33.
 Montes-de-Oca, R. and Salem-Silva, F. (2005) Estimates for Perturbations of Average Markov Decision Processes with a Minimal State and Upper Bounded by Stochastically Ordered Markov Chains. Kybernetika, 41, 757-772.
Arapostathis, A., Borkar, V.S., Fernandez-Gaucherand, E., Ghosh, M.K. and Marcus, S.I. (1993) Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey. SIAM Journal on Control and Optimization, 31, 282-344.