Nonlinear optimal control problems disturbed by random noise are an interesting research topic. In the presence of random noise, the entire state trajectory cannot be measured exactly. Because of the nonlinear structure and the fluctuating behavior of the dynamical system, an efficient computational approach is required to estimate the state dynamics. The state estimate is then used to optimize and control the dynamical system, from which the optimal control policy is derived. Applications of nonlinear stochastic optimal control have been widely studied in the literature; examples include vehicle trajectory planning, portfolio selection, building structural systems, investment in insurance, switching systems, machine maintenance, nonlinear differential games, and viscoelastic systems.
In recent years, a linear optimal control model with model-reality differences has been proposed for solving nonlinear optimal control problems, especially discrete-time nonlinear stochastic optimal control problems. This method is known as the integrated optimal control and parameter estimation (IOCPE) algorithm. In this approach, adjusted parameters are introduced into the model so that the differences between the real plant and the model can be calculated repeatedly. The algorithm is an iterative procedure in which system optimization and parameter estimation are integrated interactively. During the computation, the optimal solution of the model is updated iteratively. Once convergence is achieved, the iterative solution of the model approximates the true optimal solution of the original optimal control problem, in spite of model-reality differences.
The IOCPE algorithm has also been shown to provide both the expectation solution and the filtering solution of the discrete-time nonlinear stochastic optimal control problem. In addition, the optimal output solution obtained from the IOCPE algorithm has been improved by using the weighted output residual, which is introduced into the model cost function, and the output matching scheme, in which the adjusted parameter is introduced into the model output. Moreover, the least-squares and Gauss-Newton approaches, which apply the principle of model-reality differences without using the adjusted parameters, enhance the practical usefulness of the IOCPE algorithm for delivering the optimal solution of the original optimal control problem.
These improvements demonstrate the efficiency of the IOCPE algorithm for solving the discrete-time nonlinear stochastic optimal control problem. However, we find that the output residual obtained from Kalman filtering theory can be reduced further, which in turn yields a model output that represents the original output more accurately. Hence, in this paper, we aim to improve the accuracy of the output solution of the model used. To this end, we apply the stochastic approximation approach, which is an iterative stochastic optimization algorithm. The advantage of the stochastic approximation algorithm is that it finds the optimum of a function that cannot be computed directly but can only be estimated from noisy observations, and its applications to control systems are well established. This advantage motivates us to apply the stochastic approximation algorithm within the IOCPE algorithm, with the aim of significantly reducing the output residual compared with that obtained from Kalman filtering theory. Here, the optimal control law, which is based on the state mean propagation, is constructed. At the end of the iterations, the state and control trajectories are obtained in the expectation sense, while the model output trajectory tracks the real output closely. Hence, the proposed approach is efficient and is recommended.
The rest of the paper is organized as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem is described. In Section 3, the stochastic approximation scheme, combined with the principle of model-reality differences, is discussed, and the calculation procedure is formulated as an iterative algorithm. In Section 4, an illustrative example of a continuous stirred-tank reactor problem is studied to show the applicability of the proposed approach. Finally, some concluding remarks are made.
2. Problem Statement
Consider a general discrete-time nonlinear stochastic optimal control problem given by
where , and are, respectively, control sequence, state sequence and output sequence. The process noise sequence and the measurement noise sequence are the stationary Gaussian white noise sequences with zero mean and their covariance matrices are, respectively, given by and , which both are positive definite matrices. While, is a process noise coefficient matrix, represents the real plant, and is the output measurement, whereas is the terminal cost and is the cost under summation. Here, is the scalar cost function and is the expectation operator. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.
The initial state
where is a random vector with mean and covariance are, respectively, given by
Here, is a positive definite matrix. It is assumed that initial state, process noise and measurement noise are statistically independent.
This problem, regarded as a discrete-time stochastic optimal control problem, is referred to as Problem (P). Note that the exact solution of Problem (P) cannot, in general, be obtained. Moreover, applying nonlinear filtering theory to estimate the state of the real plant is computationally demanding. Nevertheless, the output can be measured from the real plant process.
In view of these difficulties, a linear model-based optimal control problem, referred to as Problem (M), is constructed as follows:
where and are, respectively, the expected state sequence and the expected output sequence; is a state transition matrix, is a control coefficient matrix, and is an output coefficient matrix, while and are positive semi-definite matrices and is a positive definite matrix. Here, is the scalar cost function.
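For orientation, a linear-quadratic model of the kind described above is commonly written as in the following sketch. The symbols used here ($\bar{x}$, $\bar{y}$, $A$, $B$, $C$, $S(N)$, $Q$, $R$, $J_m$, $\bar{x}_0$) are assumptions chosen for illustration and may differ from the notation of equation (2); any adjusted parameters that the formulation adds to the model are not shown. Note also that the matrix $A$ in this sketch is unrelated to the stability constant A of the gain sequence in Section 3.3.

```latex
\begin{aligned}
\min_{u(k)} \quad & J_m(u) = \tfrac{1}{2}\,\bar{x}(N)^{\mathsf T} S(N)\,\bar{x}(N)
  + \sum_{k=0}^{N-1} \tfrac{1}{2}\left( \bar{x}(k)^{\mathsf T} Q\,\bar{x}(k)
  + u(k)^{\mathsf T} R\,u(k) \right) \\
\text{subject to} \quad & \bar{x}(k+1) = A\,\bar{x}(k) + B\,u(k), \qquad \bar{x}(0) = \bar{x}_0, \\
& \bar{y}(k) = C\,\bar{x}(k), \\
& S(N) \succeq 0, \quad Q \succeq 0, \quad R \succ 0 .
\end{aligned}
```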
It is emphasized that solving Problem (M) alone does not give the optimal solution of Problem (P). However, by adding to Problem (M) an efficient matching scheme based on the output error, that is, the difference between the real output and the model output, it is possible to obtain the optimal solution of Problem (P) by solving Problem (M) iteratively. From this point of view, we are motivated to construct an expanded optimal control model that includes the output error. This model formulation aims to obtain the true optimal solution of Problem (P) despite model-reality differences.
3. Optimal Control with Stochastic Approximation
Now, let us define the expanded optimal control problem, referred to as Problem (E), as follows:
where is introduced to separate the output sequence from the respective signals in the output error problem. It is important to note that the algorithm is to be designed such that the constraint will be satisfied at the end of the iterations. In this situation, the output will be used for the output error problem and the establishment of the matching scheme, whereas the corresponding output will be reserved for the model output after optimizing the model-based optimal control problem. Here, the output error is defined as
3.1. Necessary Optimality Conditions
Define the Hamiltonian function as follows
then, the augmented cost function becomes
where and are the appropriate multipliers to be determined later.
Applying the calculus of variations to the augmented cost function (6), the following necessary optimality conditions are obtained:
1) Stationary condition:
2) Co-state equation:
3) State equation:
with the boundary conditions and .
4) Output equation:
5) Separable variables:
with the multipliers , and .
Among these necessary optimality conditions, conditions (7a), (7b) and (7c) are the necessary conditions for Problem (M), while condition (7d) gives an adjustable output measurement. With this adjustable output, the model output can track the real output as closely as possible once the output residual is sufficiently reduced.
3.2. Feedback Optimal Control Law
From (7a), the feedback optimal control law can be calculated from
For more detail, see     for the proof of the derivation on this feedback optimal control law.
Applying (8), the state equation is written as
and the co-state equation is given by
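As a rough illustration of how feedback gains of this kind are typically computed off-line, the sketch below runs the standard finite-horizon discrete-time Riccati recursion for a linear-quadratic model and then propagates the noise-free state mean under the resulting control law. It is not a transcription of equations (8) to (11): the function names, the matrices `A`, `B`, `Q`, `R`, `S_N` and the double-integrator example are assumptions, the feedforward terms that arise from the multipliers and the output error are omitted, and NumPy is assumed to be available.

```python
import numpy as np

def riccati_gains(A, B, Q, R, S_N, N):
    """Backward Riccati recursion; returns the gains K(0..N-1) and S(0)."""
    S = S_N
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        BtS = B.T @ S
        # K(k) = (B' S(k+1) B + R)^{-1} B' S(k+1) A
        K = np.linalg.solve(BtS @ B + R, BtS @ A)
        # S(k) = Q + A' S(k+1) (A - B K(k))
        S = Q + A.T @ S @ (A - B @ K)
        gains[k] = K
    return gains, S

def simulate(A, B, gains, x0):
    """Propagate the noise-free state mean under u(k) = -K(k) x(k)."""
    xs, us = [x0], []
    for K in gains:
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

# Hypothetical double-integrator model, for illustration only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
S_N = np.eye(2)

gains, S0 = riccati_gains(A, B, Q, R, S_N, N=50)
xs, us = simulate(A, B, gains, x0=np.array([1.0, 0.0]))
```

Because the gains depend only on the model matrices and the weights, they can be computed once before the iterations start and reused at every iteration.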
3.3. Stochastic Approximation Scheme
In general, the recursive equation for the stochastic approximation (SA) algorithm     is defined by
where is the set of the parameters to be estimated, is the stochastic gradient, and is the gain sequence. On this basis, refer to Problem (E), let us define and the stochastic gradient, which is assumed to be measurable for the objective function given in (3), is introduced as
Applying the SA algorithm (12) leads to the following iterative equations:
These equations are used to update the optimal solution of Problem (E) and, in turn, to approximate the optimal solution of Problem (P), in spite of model-reality differences.
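To make the recursion concrete, the following minimal sketch applies a generic stochastic approximation update, in which the parameter is moved against a noisy gradient scaled by a decaying gain, to a scalar toy problem. It does not reproduce the updates (13a) to (13c): the function `sa_minimize`, the quadratic toy objective, the noise level and the gain form `a / (i + 1 + A)**b` are assumptions made for illustration.

```python
import random

def sa_minimize(noisy_grad, theta0, n_iter, a=1.0, b=0.602, A=0.0):
    """Iterate theta <- theta - gain_i * noisy_grad(theta) with a decaying gain."""
    theta = theta0
    for i in range(n_iter):
        gain = a / (i + 1 + A) ** b  # assumed gain form; decays slowly with i
        theta = theta - gain * noisy_grad(theta)
    return theta

# Toy problem: minimize E[0.5 * (theta - z)^2], where z is a noisy
# observation of an unknown target. Only noisy gradients are available.
rng = random.Random(0)
target = 3.0

def noisy_grad(theta):
    z = target + rng.gauss(0.0, 1.0)  # noisy measurement of the target
    return theta - z                  # gradient of 0.5 * (theta - z)^2

theta_hat = sa_minimize(noisy_grad, theta0=0.0, n_iter=5000)
```

In this toy setting the estimate settles near the target even though no single gradient evaluation is exact, which is the property the SA scheme relies on.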
To evaluate the stochastic gradient, rewrite the output error defined in (4) at time step k + 1 as
where the separable-variable condition (7e) is satisfied. Then, taking the expected output (7d) at time step k + 1 and substituting the state equation (10), we have
Hence, applying the chain rule to the objective function (3) in Problem (E), the stochastic gradient is calculated from
The convergence and asymptotic normality of the SA iterates generated with the gain sequence in (12) are well established. In particular, the gain sequence takes the form
where a and b are strictly positive and the stability constant satisfies A ≥ 0. A practically effective value is b = 0.602, which gives a slowly decaying gain (17) that is generally more desirable.
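The effect of the constants can be checked numerically. The sketch below assumes the commonly used form a / (i + 1 + A)^b for the gain at iteration i = 0, 1, 2, ...; this form, the function name `sa_gain` and the default value a = 1 are assumptions, since only the roles of a, b and A are stated in the text.

```python
def sa_gain(i, a=1.0, b=0.602, A=0.0):
    """Gain for iteration i, assuming the form a / (i + 1 + A)**b."""
    if a <= 0 or b <= 0 or A < 0:
        raise ValueError("require a > 0, b > 0 and A >= 0")
    return a / (i + 1 + A) ** b

# Gains for the first few iterations with A = 0 and with A = 10.
gains_plain = [sa_gain(i) for i in range(5)]
gains_stab = [sa_gain(i, A=10.0) for i in range(5)]
```

A larger stability constant A lowers the early gains, which damps the first few updates without changing the long-run decay rate set by b.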
3.4. Computational Algorithm
From the discussion above, the resulting algorithm provides the optimal solution of the linear model-based optimal control problem. This optimal solution is then updated by the stochastic approximation algorithm to approximate the true optimal solution of the original optimal control problem. The computation procedure of the iterative algorithm is summarized as follows.
Iterative algorithm with SA scheme
Data: Given .
Step 0: Compute a nominal solution. Calculate and from (9a) and (9b), respectively. Then, solve Problem (M) defined by (2) to obtain , and . Set , , and .
Step 1: Compute the output error from (4).
Step 2: With the determined , solve Problem (E) defined by (3) to obtain the new , the new , and the new , respectively, from (8), (10) and (7d).
Step 3: Update the optimal solution given, respectively, by (13a), (13b) and (13c). If , and , within a given tolerance, stop; else set and repeat from Step 1.
Remarks:
1) The off-line computation is done, as stated in Step 0, to calculate and , for the control law design. Then, these parameters are used for solving Problem (M) in Step 0 and for solving Problem (E) in Step 2, respectively.
2) The variable is zero in Step 0 and the calculated value of changes from iteration to iteration.
3) Problem (P) need not be linear or have a quadratic cost function.
4) The conditions and are required to be satisfied for the converged optimal control sequence and the converged state estimate sequence. The following averaged 2-norms are computed and then they are compared with a given tolerance to verify the convergence of and :
5) The gain sequence , which is considered in the algorithm proposed, is
where A = 0 from (17).
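The convergence test described in the fourth remark can be sketched as follows. The exact definition of the averaged 2-norm is not recoverable from the text, so the form used here (the square root of the mean squared Euclidean change over the time steps) is an assumption, as are the function names and the default tolerance.

```python
import math

def averaged_2norm(new_seq, old_seq):
    """Square root of the mean squared Euclidean change between two sequences."""
    if len(new_seq) != len(old_seq) or len(new_seq) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for v_new, v_old in zip(new_seq, old_seq):
        # Squared Euclidean distance between the vectors at one time step.
        total += sum((p - q) ** 2 for p, q in zip(v_new, v_old))
    return math.sqrt(total / len(new_seq))

def has_converged(u_new, u_old, x_new, x_old, tol=1e-6):
    """True when both the control and the state sequences changed by less than tol."""
    return (averaged_2norm(u_new, u_old) < tol
            and averaged_2norm(x_new, x_old) < tol)
```

Each sequence is passed as a list of per-time-step vectors, so the same check works for scalar and vector-valued controls and states.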
4. Illustrative Example
Consider the optimal control of a continuous stirred-tank reactor problem  :
with the initial condition
Here, and are Gaussian white noise sequences with their respective covariance given by and .
This problem is referred to as Problem (P).
The linear model-based optimal control problem, which is simplified from Problem (P) and is referred to as Problem (M), is defined by
with the initial condition
and the adjusted parameter is added into the output measurement channel.
The simulation result of the proposed approach is shown in Table 1, where it is compared with the filtering solution. The proposed approach requires more iterations than the filtering model, and its final cost is higher. However, its output residual is reduced to 0.000216 units, a reduction of 99 percent. This reduction shows that the model output obtained by the proposed approach is very close to the real output trajectory, which indicates that the proposed approach is practically useful for reproducing the real output.
The trajectories of control, state, and output are shown in Figures 1-3, respectively. The control and state trajectories are smooth and free from the disturbance of the random noise sequences, because they are an ideal deterministic optimal solution to the nonlinear model-based optimal control problem. In contrast, the real output, which is disturbed by the random noise sequences, fluctuates considerably. By applying the proposed approach, the model output trajectory follows the real output trajectory as closely as possible. Additionally, the output error, which represents the difference between the real output and the model output, is shown in Figure 4. It is therefore concluded that the proposed approach is efficient and its applicability is demonstrated.
Table 1. Simulation result.
Figure 1. Control trajectory.
Figure 2. State trajectories.
Figure 3. Output trajectories.
Figure 4. Output error.
5. Concluding Remarks
This paper discussed the application of the stochastic approximation scheme within the IOCPE algorithm, with the aim of improving the output solution of the model used. In previous studies, the IOCPE algorithm was used for solving the discrete-time nonlinear stochastic optimal control problem, while stochastic approximation was used for stochastic optimization. In combining these two approaches, the state mean propagation is constructed, and the adjusted parameter is added into the model output. During the calculation procedure, the differences between the real plant and the model used are taken into account in updating the iterative solution. In addition, the least-squares output error is established so that the stochastic gradient can be derived. Consequently, the iterative solution approximates the optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, an example of a continuous stirred-tank reactor problem was studied to show the applicability of the proposed approach. In conclusion, the proposed approach is efficient and is recommended.
For future research, it is suggested that the SA algorithm be applied to solve the linear model-based optimal control problem without calculating the adjusted parameter, in order to obtain the true optimal solution of the nonlinear optimal control problem. The result would then be compared with that obtained by using the Gauss-Newton method. In this way, the calculation procedure in the IOCPE algorithm could be simplified.
 Liu, H.F., Zhang, Y., Chen, S.F. and Chen, J. (2012) Autonomous Vehicle Trajectory Planning under Uncertainty Using Stochastic Collocation. Advanced Materials Research, 580, 175-179.
 Li, X.P., Yu, C., Zhang, J.Y., Zhou, J.J. and Zhang, L.M. (2013) Instantaneous Stochastic Optimal Control of Seismically Excited Structures Based on Time Domain Explicit Method. Advanced Materials Research, 790, 215-218.
 Liu, J., Yiu, K.F.C., Loxton, R. and Teo, K.L. (2013) Optimal Investment and Proportional Reinsurance with Risk Constraint. Journal of Mathematical Finance, 3, 437-447.
 Abushov, Q. and Aghayeva, C. (2014) Stochastic Maximum Principle for Nonlinear Optimal Control Problem of Switching Systems. Journal of Computational and Applied Mathematics Part B, 259, 371-376.
 Sun, Y., Aw, G., Loxton, R. and Teo, K.L. (2014) An Optimal Machine Maintenance Problem with Probabilistic State Constraints. Information Sciences, 281, 386-398.
 Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2010) An Integrated Optimal Control Algorithm for Discrete-Time Nonlinear Stochastic System. International Journal of Control, 83, 2536-2545.
 Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2012) Filtering Solution of Nonlinear Stochastic Optimal Control Problem in Discrete-Time with Model-Reality Differences. Numerical Algebra, Control and Optimization, 2, 207-222.
 Kek, S.L., Ismail, A.A.M., Teo, K.L. and Rohanin, A. (2013) An Iterative Algorithm Based on Model-Reality Differences for Discrete-Time Nonlinear Stochastic Optimal Control Problems. Numerical Algebra, Control and Optimization, 3, 109-125.
 Kek, S.L., Teo, K.L. and Ismail, A.A.M. (2014) Efficient Output Solution for Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Mathematical Problems in Engineering, 2014, Article ID 659506.
 Kek, S.L., Li, J. and Teo, K.L. (2017) Least Squares Solution for Discrete Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Applied Mathematics, 8, 1-14.
 Kek, S.L., Li, J., Leong, W.J. and Ismail, A.A.M. (2017) A Gauss-Newton Approach for Nonlinear Optimal Control Problem with Model-Reality Differences. Open Journal of Optimization, 6, 85-100.
 Nemirovski, A., Juditsky, A., Lan, G. and Shapiro, A. (2009) Robust Stochastic Approximation Approach to Stochastic Programming. SIAM Journal on Optimization, 19, 1574-1609.
 Spall, J.C. and Cristion, J.A. (1998) Model-Free Control of Nonlinear Stochastic Systems with Discrete-Time Measurements. IEEE Transactions on Automatic Control, 43, 1198-1210.
 Aksakalli, V. and Ursu, D. (2006) Control of Nonlinear Stochastic Systems: Model-Free Controllers versus Linear Quadratic Regulators. Proceedings of the 45th IEEE Conference on Decision and Control (CDC ’06), San Diego, CA, December 2006, 4145-4150.