Many real processes are nonlinear in nature, so the actual model may not be known exactly. In such cases, modeling the real process as a dynamical system is an alternative solution. Since dynamical systems evolve over time, efficient computational approaches are in high demand, and their development towards optimizing and controlling dynamical systems is properly required. This situation motivates the search for the optimal solution of the real process. However, the difficulty of solving optimal control problems increases with the nonlinearity of the dynamical system. At the same time, the use of output measurement, especially in industrial control applications, has become important in constructing the corresponding dynamical system; this covers model predictive control, system identification, and data-driven control.
In fact, solution methods for the linear optimal control problem are well developed. In particular, the linear quadratic regulator (LQR) technique is recognized as a standard procedure for solving linear optimal control problems. Recently, an efficient computational method, based on the LQR optimal control model, was proposed to solve nonlinear stochastic optimal control problems in discrete time. This approach is known as the integrated optimal control and parameter estimation (IOCPE) algorithm. It is an extension of the dynamic integrated system optimization and parameter estimation (DISOPE) algorithm. The DISOPE algorithm is well established for solving deterministic nonlinear optimal control problems. Building on this, the IOCPE algorithm was developed, based on the principle of model-reality differences, for solving discrete-time deterministic and stochastic nonlinear optimal control problems.
Indeed, in both of these iterative algorithms, adjusted parameters are introduced into the model-based optimal control problem. Their aim is to capture the differences between the real plant and the model used. These differences are then taken into account when updating the model iteratively. Once convergence is achieved, the iterative solution approximates the correct optimal solution of the original optimal control problem, in spite of model-reality differences. On the other hand, the use of the model output is an additional feature of the IOCPE algorithm that is not present in the DISOPE algorithm.
In this paper, the use of the output measurement, rather than adding adjusted parameters to the model used, is further discussed. In our approach, the LQR optimal control model with output measurement is simplified from the nonlinear optimal control problem. The differences between the output measurements obtained, respectively, from the model used and from the real plant are defined. From this, a least-squares scheme is established. The aim is to approximate the output measured from the real plant in such a way that the residual between the two output measurements is minimized. In doing so, the linear dynamic system in the model used is reformulated and the control sequence is added into the output channel. The model output is then presented as a set of input-output equations.
During the computational procedure, the control trajectory is updated iteratively by using the Gauss-Newton algorithm. As a result, the output residual between the original output and the model output is minimized. Here, the optimal control law is constructed from the model-based optimal control problem without adding adjusted parameters. By feeding the updated control trajectory back into the dynamic system, the iterative solution of the model used approximates the correct optimal solution of the original optimal control problem, in spite of model-reality differences. Hence, the proposed approach is efficient. On this basis, the application of the least-squares updating scheme to solving discrete-time nonlinear optimal control problems, in both the deterministic and stochastic cases, is presented; see the references for more details on the stochastic case.
The rest of the paper is organized as follows. In Section 2, a discrete-time nonlinear optimal control problem is described and the corresponding model-based optimal control problem is simplified. In Section 3, the construction of the feedback optimal control law is discussed. The output residual is defined, from which a least-squares minimization problem for the model-based optimal control problem is formulated. The iterative algorithm based on the Gauss-Newton method is established, and the computational procedure is summarized. In Section 4, two illustrative examples, a current converter problem and an isothermal reaction reactor problem, are demonstrated, and their results show the efficiency of the proposed approach. Finally, some concluding remarks are made.
2. Problem Statement
Consider a general discrete time nonlinear optimal control problem, given by
where u(k), x(k), and y(k) are, respectively, the control, state, and output sequences, f represents the real plant, and h is the output measurement, whereas the first term of the cost is the terminal cost and the second is the cost under summation. Here, the cost is a scalar function and the initial state is given. It is assumed that all functions in Equation (1) are continuously differentiable with respect to their respective arguments.
This problem, referred to as Problem (P), is complex: solving it directly would increase the computational burden, and an exact solution might not exist due to its nonlinear structure. Nevertheless, in order to obtain the optimal solution of Problem (P), a linear model-based optimal control problem, referred to as Problem (M), is proposed. This problem is given by
where the model output sequence is introduced, A is a state transition matrix, B is a control coefficient matrix, and C is an output coefficient matrix, while the terminal and state weighting matrices are positive semi-definite and the control weighting matrix is positive definite. Here, the model cost is a scalar function.
Notice that solving Problem (M) alone would not give the optimal solution of Problem (P). However, by constructing an efficient matching scheme, it is possible to obtain the optimal solution of the original optimal control problem, in spite of model-reality differences.
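To make the model-reality difference concrete, the following sketch simulates a hypothetical nonlinear plant f alongside a linear model (A, B, C) driven by the same control sequence; the dynamics and matrices are illustrative placeholders, not the plants studied in this paper.

```python
import numpy as np

def f(x, u):
    # hypothetical nonlinear plant: linear part plus a quadratic perturbation
    return np.array([0.9 * x[0] + 0.1 * x[1] + 0.05 * x[0] ** 2,
                     -0.1 * x[0] + 0.8 * x[1] + u])

def h(x):
    return x[0]                      # single output measurement

# linear model used in Problem (M), deliberately ignoring the quadratic term
A = np.array([[0.9, 0.1], [-0.1, 0.8]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])

x_p = x_m = np.array([1.0, 0.0])     # common initial state
for k in range(20):
    u = 0.1                          # same control applied to both systems
    x_p = f(x_p, u)                  # real plant (Problem (P))
    x_m = A @ x_m + B * u            # linear model (Problem (M))
gap = abs(h(x_p) - C @ x_m)          # model-reality output difference
```

Even with identical controls, the two outputs drift apart; the matching scheme of Section 3 is what closes this gap.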
3. System Optimization with Gauss-Newton Updating Scheme
Now, consider the following solution method on system optimization. Define the Hamiltonian function for Problem (M) as follows:
Then, the augmented objective function becomes
where p(k) is the appropriate multiplier to be determined later.
3.1. Necessary Optimality Conditions
Applying the calculus of variations to the augmented cost function in Equation (4), the necessary optimality conditions are obtained, as shown below:
(a) Stationary condition:
(b) Costate equation:
(c) State equation:
with the boundary conditions given by the initial state and the terminal costate.
3.2. Feedback Optimal Control Law
According to the necessary conditions given in Equations (5) to (7), a feedback optimal control law can be constructed from which the optimal solution of Problem (M) is obtained. For this purpose, the corresponding result is stated in the following theorem.
Theorem 1. For the given Problem (M), the optimal control law is the feedback control law defined by
with the boundary condition given.
Proof: From Equation (5), the stationary condition is rewritten as follows:
Applying the sweep method, that is,
and substituting Equation (12) into Equation (11) yields
Substituting Equation (7) into Equation (13) and performing some algebraic manipulations, the feedback control law (8) is obtained, where Equation (9) is satisfied.
From Equation (6), after substituting Equation (12), the costate equation is rewritten as follows:
Considering the state Equation (7) in Equation (14), we have
Applying the feedback control law (8) in Equation (15) and performing some algebraic manipulations, it is concluded that Equation (10) is satisfied after comparing the result with Equation (12). This completes the proof. □
Substituting Equation (8) into Equation (7), the state equation becomes
and the model output is measured from
Hence, the solution procedure of solving Problem (M) is summarized below:
Algorithm 1: Feedback control algorithm
Step 0 Calculate the feedback gain and the Riccati solution from Equations (9) and (10), respectively.
Step 1 Solve Problem (M), defined by Equation (2), to obtain the control, state, and model output sequences, respectively, from Equations (8), (16) and (17).
Step 2 Evaluate the cost function from Equation (2).
a) The data are obtained from the linearization of the real plant and the output measurement in Problem (P).
b) In Step 0, the calculation of the feedback gain and the Riccati solution is done offline.
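Algorithm 1 can be sketched as follows for a generic LQR model. The matrices A, B, Q, R and the terminal weight S_N below are illustrative assumptions, and S and K play the roles of the sweep (Riccati) solution and feedback gain of Equations (9), (10) and (12); the paper's examples use different data.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical discretized double integrator
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                            # state weighting (positive semi-definite)
R = np.array([[1.0]])                    # control weighting (positive definite)
S_N = np.eye(2)                          # terminal weighting
N = 100

# Step 0: backward (offline) sweep for the Riccati solution S(k) and gains K(k)
S = S_N.copy()
K = [None] * N
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
    S = Q + A.T @ S @ (A - B @ K[k])

# Step 1: forward pass with the feedback law u(k) = -K(k) x(k)
x = np.array([[1.0], [0.0]])
J = 0.0
for k in range(N):
    u = -K[k] @ x
    J += float(x.T @ Q @ x + u.T @ R @ u)    # Step 2: accumulate the running cost
    x = A @ x + B @ u
J += float(x.T @ S_N @ x)                    # add the terminal cost
```

Because the sweep depends only on the model matrices, the gains can indeed be prepared offline, as note b) states.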
3.3. Gauss-Newton Updating Scheme
Now, let us define the output residual by
where the model output (17) is reformulated as
Rewrite Equation (19) as the following input-output equations:
Notice that the coefficient matrix of the initial state is the extended observability matrix, and the coefficient matrix of the control sequence is a type of block Hankel matrix.
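Assuming the standard construction behind input-output equations of this kind, the extended observability matrix stacks the products C A^i, while the block lower-triangular matrix collects the Markov parameters C A^(i-j-1) B. The sketch below builds both for placeholder matrices and checks them against a direct simulation of the linear model.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # placeholder model matrices
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
N = 5
x0 = np.array([[1.0], [0.5]])

# extended observability matrix O = [C; CA; ...; CA^(N-1)]
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(N)])

# matrix of Markov parameters: H[i, j] = C A^(i-j-1) B for j < i, zero otherwise
H = np.zeros((N, N))
for i in range(N):
    for j in range(i):
        H[i, j] = (C @ np.linalg.matrix_power(A, i - j - 1) @ B).item()

# check the input-output form y = O x(0) + H u against a direct simulation
u = np.random.default_rng(0).standard_normal(N)
x = x0.copy()
y_sim = []
for k in range(N):
    y_sim.append((C @ x).item())
    x = A @ x + B * u[k]
y_sim = np.array(y_sim)
y_io = (O @ x0).ravel() + H @ u
```

The two output sequences coincide, confirming that the stacked form reproduces the recursive state-space model exactly.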
Hence, consider the objective function, which represents the sum of squared errors (SSE), given by
Then, an optimization problem, which is referred to as Problem (O), is defined as follows:
Find a control sequence such that the objective function is minimized.
To solve Problem (O), consider the second-order Taylor expansion about the current control iterate:
The first-order optimality condition for Equation (22) with respect to the control update is expressed by
Rearranging Equation (23) yields the normal equation,
Notice that the gradient of the objective function is calculated from
and the Hessian matrix of the objective function is computed from
where the Jacobian matrix of the output residual appears, with entries denoted by
From Equations (25) and (26), Equation (24) can be rewritten as
By ignoring the second-order derivative term, that is, the first term on the left-hand side of Equation (28), we obtain the following recurrence relation:
with the initial iterate given. Hence, Equation (29) is known as the Gauss-Newton recursive equation.
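A minimal sketch of the Gauss-Newton recursion (29) on a toy residual (not the paper's output error): each iteration solves the normal equation (24), with the Hessian approximated by the Jacobian product as in Equation (28) once the second-order term is dropped.

```python
import numpy as np

def residual(u):
    # hypothetical smooth residual e(u); stands in for the output error (18)
    return np.array([u[0] ** 2 + u[1] - 1.0,
                     u[0] - u[1] ** 2])

def jacobian(u):
    # Jacobian of the residual, as in Equation (27)
    return np.array([[2 * u[0], 1.0],
                     [1.0, -2 * u[1]]])

u = np.array([0.5, 0.5])                      # initial control iterate
for _ in range(20):
    e = residual(u)
    J = jacobian(u)
    du = np.linalg.solve(J.T @ J, -J.T @ e)   # normal equation (24)
    u = u + du                                # Gauss-Newton recursion (29)
    if np.linalg.norm(du) < 1e-10:            # 2-norm convergence test
        break
sse = float(residual(u) @ residual(u))        # sum of squared errors (21)
```

The SSE is driven essentially to zero within a few iterations, mirroring the very small SSE values reported for the examples in Section 4.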
From the discussion above, the updating scheme based on the Gauss-Newton recursive approach for the control sequence is summarized below:
Algorithm 2: Gauss-Newton updating scheme
Step 0 Given an initial control sequence and a tolerance, set the iteration number to zero.
Step 1 Evaluate the output error and the Jacobian matrix from Equations (18) and (27), respectively.
Step 2 Solve the normal equation.
Step 3 Update the control sequence with the computed correction. If the change is within the given tolerance, stop; otherwise, increase the iteration number and repeat from Step 1 to Step 3.
a) In Step 1, the calculation of the output error is done online, while the Jacobian matrix may be computed offline.
b) In Step 2, the inverse of the coefficient matrix in the normal equation must exist. The solution of the normal equation represents the step size for the control set-point.
c) In Step 3, the initial control sequence is taken from Equation (8). The convergence condition is required to be satisfied for the converged optimal control sequence. The following 2-norm is computed and compared with a given tolerance to verify convergence:
d) In order to provide a convergence mechanism for the state sequence, a simple relaxation method is employed:
where the former is the state sequence of the real plant and the latter is updated from Equation (16).
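The relaxation step in note d) can be sketched as follows, assuming the common convex-combination form x_new = x_model + γ (x_plant − x_model) with a relaxation factor 0 < γ ≤ 1; the paper's exact formula and factor are not reproduced here.

```python
import numpy as np

gamma = 0.5                                      # assumed relaxation factor
x_plant = np.array([1.4, 1.8])                   # state measured from the real plant
x_model = np.array([1.0, 2.0])                   # state updated from Equation (16)
x_new = x_model + gamma * (x_plant - x_model)    # relaxed state update
```

With γ = 1 the model state jumps straight to the plant state; smaller γ damps the correction and helps the state sequence converge smoothly across iterations.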
4. Illustrative Examples
In this section, two examples are illustrated. The first example is a direct current/alternating current (DC/AC) converter model, while the second is a model of an isothermal series/parallel Van de Vusse reaction in a continuous stirred-tank reactor. In these models, the real plants have a nonlinear structure and a single output is measured. Since the models are in continuous time, a simple discretization scheme with the respective sampling time is applied. The optimal solution is obtained by using the proposed approach, and the solution procedure is implemented in the MATLAB environment.
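The simple discretization mentioned above can be sketched as a forward Euler step with sampling time T, which is one common choice (the paper does not spell out its scheme). The continuous dynamics below are a placeholder oscillator, not the converter or reactor models.

```python
import numpy as np

def f_c(x, u):
    # hypothetical continuous-time dynamics: an undamped oscillator
    return np.array([x[1], -x[0] + u])

T = 0.01                        # sampling time
x = np.array([1.0, 0.0])        # initial state
for k in range(100):            # simulate 1 second of the discretized system
    x = x + T * f_c(x, 0.0)     # one Euler step per sample: x(k+1) = x(k) + T f_c
```

For this test system the exact solution at t = 1 is (cos 1, −sin 1), and the Euler iterate stays within a few thousandths of it at this sampling time.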
For convenience, the quadratic cost function is employed for both Problem (P) and Problem (M), that is,
4.1. Example 1
Consider the state-space representation of a direct current/alternating current (DC/AC) converter model given by
with the given initial state, where the two state variables represent the current (in amperes) and the voltage (in volts) in the circuit, and the input is the control signal. This problem is referred to as Problem (P).
The discrete time model of Problem (M) is formulated by
for the given horizon, with the sampling time in minutes.
The simulation result is shown in Table 1. The initial cost of 0.0429 units, which is the cost function value for Problem (M), is calculated before the iteration. After five iterations, convergence is achieved. The final cost of 110.8926 units is obtained instead of the original cost of 1.0885 × 10³ units, a reduction of 89.8 percent. The SSE value of 7.647011 × 10⁻¹² shows that the model output is very close to the real output. Hence, the proposed approach efficiently obtains the optimal solution of Problem (P).
Figure 1. Final control trajectory.
Figure 2. Final output (?) and real output (+) trajectories.
Table 1. Simulation result for Example 1.
It can be seen that the two output trajectories fit each other, with the smallest value of SSE.
Figure 3 shows the control trajectory applied in the real plant. With the matching scheme established in the proposed approach, the final state trajectory tracks the real state trajectory closely, as shown in Figure 4.
Figure 3. Real control trajectory.
Figure 4. Real state (+) and final state (?) trajectories.
Figure 5. Initial control trajectory.
Figure 6. Initial state trajectory.
In spite of model-reality differences, the results reveal the applicability and reliability of the proposed approach, where the output error is clearly minimized.
4.2. Example 2
Consider the dynamical system of an isothermal series/parallel Van de Vusse reaction in a continuous stirred-tank reactor:
with the given initial state, where the two state variables are, respectively, the dimensionless reactant and product concentrations in the reactor, and the input is the dimensionless dilution rate. This problem is referred to as Problem (P).

In Problem (M), the model used is presented for the given horizon, with the sampling time in seconds.

Table 2 shows the simulation result, where the number of iterations is 5.
Figure 7. Output error after iteration.
Figure 8. Output error before iteration.
Table 2. Simulation result for Example 2.
The implementation of the proposed approach begins with the initial cost of 12.6916 units. During the iterative procedure, convergence is achieved, giving the final cost of 543.1649 units. This represents a reduction of 99.8 percent from the original cost of 3.0122 × 10⁵ units. The SSE value of 1.587211 × 10⁻¹² indicates that the proposed approach efficiently generates the optimal solution of Problem (P).
The graphical results in Figure 9 and Figure 10 show, respectively, the trajectory of the final control and the trajectories of the final and real outputs. The final control is stable, and this stabilization makes the final output settle at a steady state of 1.2324. Moreover, the final output fits the real output very well.
Figure 11 and Figure 12 show the control and state trajectories in the real plant. With this control trajectory, the state trajectories converge to 2.8250 and 1.2324, respectively. In addition, by using the proposed approach, this steady state is tracked closely by the final state trajectory.
Figure 9. Final control trajectory.
Figure 10. Final output (?) and real output (+) trajectories.
Figure 11. Real control trajectory.
Figure 12. Real state (+) and final state (?) trajectories.
Figure 13. Initial control trajectory.
Figure 14. Initial state trajectory.
Figure 15. Output error after iteration.
Figure 16. Output error before iteration.
From Examples 1 and 2, the structures of Problem (M) and Problem (P) are clearly different. Solving Problem (M) while taking the Gauss-Newton updating scheme into consideration provides an iterative solution that approximates the correct optimal solution of Problem (P), in spite of model-reality differences. The results obtained are evidently demonstrated in Figures 1 to 16. Hence, the applicability of the proposed approach is significantly proven.
5. Concluding Remarks
In this paper, an efficient computational approach was proposed in which a least-squares scheme is established. In our approach, the model-based optimal control problem is solved in advance. Consequently, the feedback control law, which is constructed from the model used, is added into the output measurement. Through minimizing the sum of squared errors, the Gauss-Newton updating scheme is derived. On this basis, the control trajectory is updated repeatedly during the computational procedure. By feeding the updated control trajectory back into the dynamic system, the iterative solution of the model used approximates the correct optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, two examples were studied. Their simulation results and graphical solutions indicate the applicability and reliability of the proposed approach. In conclusion, the efficiency of the proposed approach is proven.
Acknowledgements

The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) and the Ministry of Higher Education (MOHE) for financially supporting this study under the Incentive Grant Scheme for Publication (IGSP) VOT. U417 and the Fundamental Research Grant Scheme (FRGS) VOT. 1561. The second author was supported by the NSF (11501053) of China and the fund (15C0026) of the Education Department of Hunan Province.