Application of Conjugate Gradient Approach for Nonlinear Optimal Control Problem with Model-Reality Differences

Show more

1. Introduction

Recently, the integrated optimal control and parameter estimation (IOCPE) algorithm has been proposed [1] in solving the nonlinear optimal control problem, both for discrete time deterministic and stochastic cases (see for more detail in [2] - [9] ). In essence, the concept of the IOCPE algorithm is come from the dynamic integrated system optimization and parameter estimation (DISOPE) algorithm, which was developed by [10] . By using the DISOPE algorithm, optimal control of the deterministic dynamical systems, not only for continuous time but also for discrete time, has been widely discussed [10] [11] . On this point of view, the applications of the DISOPE algorithm have been well-defined. Date back to the 70s, [12] and [13] proposed the integrated system optimization and parameter estimation (ISOPE) algorithm, which is for solving the static optimization problems. Since then, the development of ISOPE algorithm in the dynamic version is rapidly growing up till today.

In fact, the basic idea for ISOPE, DISOPE and IOCPE algorithms is the principle of model-reality differences [1] [10] [13] . Because the structure of the nonlinear optimal control problem is complex and solving such problem is computationally demanding, the simplified model for the original optimal control problem is proposed to be solved iteratively. By adding the adjusted parameters into the model used, the differences between the model used and the real plant can be measured. This measurement is done repeatedly, in turn, to update the optimal solution of the model used. Once the convergence is achieved, the iterative solution approximates to the true optimal solution of the original optimal control problem, in spite of model-reality differences [1] [10] [13] . Besides, for solving the discrete time nonlinear stochastic optimal control problem, the Kalman filtering theory is associated with the principle of model-reality differences in order to do state estimation and system optimization [2] [3] [4] [6] .

By virtue of the evolution of these algorithms, the feedback optimal control law is provided in solving the nonlinear optimal control problems, and their effectiveness has been well-confirmed. Nevertheless, the applicability of the open-loop optimal control law in these algorithms shall be investigated such that the popularity of these algorithms could be promoted. This is because of the open-loop optimal control sequences could be generated by taking the advantage of the power of the state-of-the-art nonlinear programming (NLP) solver. Thus, as an efficient optimization technique, the conjugate gradient method [14] [15] has been explored to solve the optimal control problem [16] [17] [18] since last few decades. Thereby, the use of the conjugate gradient method inspires us to explore this method in the IOCPE algorithm practically.

Hence, the application of the conjugate gradient method is discussed in this paper for solving the nonlinear optimal control problem, where the model-reality differences are considered. Apparently, the model-based optimal control problem, which is simplified from the nonlinear optimal control problem, is constructed. Follow from this, the Hamiltonian function is defined and the augmented cost function is obtained. Then, the set of the necessary conditions for optimality is derived. Consequently, the modified model-based optimal control problem is converted to be a nonlinear optimization problem. By applying the conjugate gradient approach, the nonlinear optimization problem is solved and the optimal control sequences are generated. With this open-loop control law, the dynamical system is optimized and the cost function is evaluated. For illustration, optimal control of an economic growth problem [19] is discussed. The results obtained show the applicability of the algorithm proposed.

The structure of the paper is organized as follows. In Section 2, the problem statement is described briefly, where the simplified model from the nonlinear optimal control problem is discussed. In Section 3, system optimization with parameter estimation is further discussed. The use of the conjugate gradient approach in solving the model-based optimal control problem is presented and the calculation procedure is summarized as an iterative algorithm. In Section 4, an economic growth problem is solved and the results are obtained. Finally, the concluding remarks are made.

2. Problem Statement

Consider a general discrete-time optimal control problem, given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{0}\left(u\right)=\phi \left(x\left(N\right),N\right)+{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}L\left(x\left(k\right),u\left(k\right),k\right)}\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(k+1\right)=f\left(x\left(k\right),u\left(k\right),k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)={x}_{0}\end{array}$ (1)

where $u\left(k\right)\in {\Re}^{m},k=0,1,\cdots ,N-1$ and $x\left(k\right)\in {\Re}^{n},k=0,1,\cdots ,N$ are, respectively, the control sequences and the state sequences. Here, $f:{\Re}^{n}\times {\Re}^{m}\times \Re \to {\Re}^{n}$ represents the real plant, $L:{\Re}^{n}\times {\Re}^{m}\times \Re \to \Re $ is the cost under summation and $\phi :{\Re}^{n}\times \Re \to \Re $ is the terminal cost, whereas ${J}_{0}$ is the scalar cost function and ${x}_{0}$ is the known initial state vector. It is assumed that all functions in Equation (1) are continuously differentiable with respect to their respective arguments.

This problem is regarded as the real optimal control problem, and is referred to as Problem (P). Note that the structure of Problem (P) is complex and nonlinear, solving Problem (P) requires the efficient computation techniques. On this point of view, the simplified model of Problem (P) is probably suggested to be solved in order to approximate the true optimal solution of Problem (P). Therefore, let us define this simplified model-based optimal control problem as follows:

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{1}\left(u\right)=\frac{1}{2}x{\left(N\right)}^{\text{T}}S\left(N\right)x\left(N\right)+\gamma \left(N\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}\frac{1}{2}\left(x{\left(k\right)}^{\text{T}}Qx\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)}+\gamma \left(k\right)\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(k+1\right)=Ax\left(k\right)+Bu\left(k\right)+\alpha \left(k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)={x}_{0}\end{array}$ (2)

where $\gamma \left(k\right),k=0,1,\cdots ,N$ and $\alpha \left(k\right),k=0,1,\cdots ,N-1$ are introduced as the adjusted parameters, whereas A is an $n\times n$ transition matrix and B is an $n\times m$ control coefficient matrix. Besides, $S\left(N\right)$ and Q are $n\times n$ positive semi-definite matrices, and R is a $m\times m$ positive definite matrix. Here, ${J}_{1}$ is the scalar cost function.

This problem is referred to as Problem (M).

Notice that, due to the different structures and parameters, only solving Problem (M), without the adjusted parameters, would not obtain the optimal solution of Problem (P). However, by adding the adjusted parameters into Problem (M), the differences between the real plant and the model used can be calculated. In such a way, solving Problem (M) iteratively could give the correct optimal solution of Problem (P), in spite of model-reality differences.

3. System Optimization with Parameter Estimation

Now, introduce an expanded optimal control problem, which is referred to as Problem (E), given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{2}\left(u\right)=\frac{1}{2}x{\left(N\right)}^{\text{T}}S\left(N\right)x\left(N\right)+\gamma \left(N\right)+{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}\frac{1}{2}\left(x{\left(k\right)}^{\text{T}}Qx\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+\gamma \left(k\right)+\frac{1}{2}{r}_{1}{\Vert u\left(k\right)-v\left(k\right)\Vert}^{2}+\frac{1}{2}{r}_{2}{\Vert x\left(k\right)-z\left(k\right)\Vert}^{2}\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(k+1\right)=Ax\left(k\right)+Bu\left(k\right)+\alpha \left(k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)={x}_{0}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{1}{2}z{\left(N\right)}^{\text{T}}S\left(N\right)z\left(N\right)+\gamma \left(N\right)=\phi \left(z\left(N\right),N\right)\end{array}$

$\begin{array}{l}\frac{1}{2}\left(z{\left(k\right)}^{\text{T}}Qz\left(k\right)+v{\left(k\right)}^{\text{T}}Rv\left(k\right)\right)+\gamma \left(k\right)=L\left(z\left(k\right),v\left(k\right),k\right)\\ Az\left(k\right)+Bv\left(k\right)+\alpha \left(k\right)=f\left(z\left(k\right),v\left(k\right),k\right)\\ v\left(k\right)=u\left(k\right)\\ z\left(k\right)=x\left(k\right)\end{array}$ (3)

where $v\left(k\right)\in {\Re}^{m},k=0,1,\cdots ,N-1$ and $z\left(k\right)\in {\Re}^{n},k=0,1,\cdots ,N$ are introduced to separate the sequences of control and state in the optimization problem from the respective signals in the parameter estimation problem, and $\Vert \text{\hspace{0.05em}}\cdot \text{\hspace{0.05em}}\Vert $ de

notes the usual Euclidean norm. The term $\frac{1}{2}{r}_{1}{\Vert u\left(k\right)-v\left(k\right)\Vert}^{2}$ and $\frac{1}{2}{r}_{2}{\Vert x\left(k\right)-z\left(k\right)\Vert}^{2}$ with ${r}_{1},{r}_{2}\in \Re $ are introduced to improve the convexity and

to facilitate the convergence of the resulting iterative algorithm. Here, it is classified that the algorithm is designed such that the constraints $v\left(k\right)=u\left(k\right)$ and $z\left(k\right)=x\left(k\right)$ are satisfied upon termination of the iterations, assuming that convergence is achieved. Moreover, the state constraint $z\left(k\right)$ and the control constraint $v\left(k\right)$ are used for the computation of the parameter estimation and matching scheme, while the corresponding state constraint $x\left(k\right)$ and control constraint $u\left(k\right)$ are reserved for optimizing the model-based optimal control problem. By virtue of this, system optimization and parameter estimation are mutually integrated.

3.1. Necessary Conditions for Optimality

Define the Hamiltonian function for Problem (E) by

$\begin{array}{c}{H}_{2}\left(k\right)=\frac{1}{2}\left(x{\left(k\right)}^{\text{T}}Qx\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)+\gamma \left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\frac{1}{2}{r}_{1}{\Vert u\left(k\right)-v\left(k\right)\Vert}^{2}+\frac{1}{2}{r}_{2}{\Vert x\left(k\right)-z\left(k\right)\Vert}^{2}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+p{\left(k+1\right)}^{\text{T}}\left(Ax\left(k\right)+Bu\left(k\right)+\alpha \left(k\right)\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}-\lambda {\left(k\right)}^{\text{T}}u\left(k\right)-\beta {\left(k\right)}^{\text{T}}x\left(k\right)\end{array}$ (4)

where $\lambda \left(k\right)\in {\Re}^{m},k=0,1,\cdots ,N-1$ , $\beta \left(k\right)\in {\Re}^{n},k=0,1,\cdots ,N$ and $p\left(k\right)\in {\Re}^{n},k=0,1,\cdots ,N$ are modifiers. Then, the augmented cost function becomes

$\begin{array}{c}{{J}^{\prime}}_{2}\left(u\right)=\frac{1}{2}x{\left(N\right)}^{\text{T}}S\left(N\right)x\left(N\right)+\gamma \left(N\right)+p{\left(0\right)}^{\text{T}}x\left(0\right)-p{\left(N\right)}^{\text{T}}x\left(N\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\xi \left(N\right)\left(\phi \left(z\left(N\right),N\right)-\frac{1}{2}z{\left(N\right)}^{\text{T}}S\left(N\right)z\left(N\right)-\gamma \left(N\right)\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\Gamma}^{\text{T}}\left(x\left(N\right)-z\left(N\right)\right)+{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}{H}_{2}(k)}-p{\left(k\right)}^{\text{T}}x\left(k\right)+\lambda {\left(k\right)}^{\text{T}}v\left(k\right)+\beta {\left(k\right)}^{\text{T}}z\left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\xi \left(k\right)\left(L\left(z\left(k\right),v\left(k\right),k\right)-\frac{1}{2}\left(z{\left(k\right)}^{\text{T}}Qz\left(k\right)+v{\left(k\right)}^{\text{T}}Rv\left(k\right)\right)-\gamma \left(k\right)\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\mu {\left(k\right)}^{\text{T}}\left(f\left(z\left(k\right),v\left(k\right),k\right)-Az\left(k\right)-Bv\left(k\right)-\alpha \left(k\right)\right)\end{array}$ (5)

where $p\left(k\right),\xi \left(k\right),\lambda \left(k\right),\beta \left(k\right),\mu \left(k\right)$ and $\Gamma $ are the appropriate multipliers to be determined later.

Applying the calculus of variation [20] [21] [22] , the following necessary conditions for optimality are obtained:

1) Stationary condition:

$Ru\left(k\right)+{B}^{\text{T}}p\left(k+1\right)-\lambda \left(k\right)+{r}_{1}\left(u\left(k\right)-v\left(k\right)\right)=0$ (6a)

2) Co-state equation:

$p\left(k\right)=Qx\left(k\right)+{A}^{\text{T}}p\left(k+1\right)-\beta \left(k\right)+{r}_{2}\left(x\left(k\right)-z\left(k\right)\right)$ (6b)

3) State equation:

$x\left(k+1\right)=Ax\left(k\right)+Bu\left(k\right)+\alpha \left(k\right)$ (6c)

4) Boundary conditions:

$p\left(N\right)=S\left(N\right)x\left(N\right)+\Gamma $ and $x\left(0\right)={x}_{0}$ (6d)

5) Adjusted parameter equations:

$\phi \left(z\left(N\right),N\right)=\frac{1}{2}z{\left(N\right)}^{\text{T}}S\left(N\right)z\left(N\right)+\gamma \left(N\right)$ (7a)

$L\left(z\left(k\right),v\left(k\right),k\right)=\frac{1}{2}\left(z{\left(k\right)}^{\text{T}}Qz\left(k\right)+v{\left(k\right)}^{\text{T}}Rv\left(k\right)\right)+\gamma \left(k\right)$ (7b)

$f\left(z\left(k\right),v\left(k\right),k\right)=Az\left(k\right)+Bv\left(k\right)+\alpha \left(k\right)$ (7c)

6) Modifier equations:

$\Gamma ={\nabla}_{z\left(N\right)}\phi -S\left(N\right)z\left(N\right)$ (8a)

$\lambda \left(k\right)=-\left({\nabla}_{v\left(k\right)}L-Rv\left(k\right)\right)-{\left(\frac{\partial f}{\partial v\left(k\right)}-B\right)}^{\text{T}}\stackrel{^}{p}\left(k+1\right)$ (8b)

$\beta \left(k\right)=-\left({\nabla}_{z\left(k\right)}L-Qz\left(k\right)\right)-{\left(\frac{\partial f}{\partial z\left(k\right)}-A\right)}^{\text{T}}\stackrel{^}{p}\left(k+1\right)$ (8c)

with $\xi \left(k\right)=1$ and $\mu \left(k\right)=\stackrel{^}{p}\left(k+1\right)$.

7) Separable variables:

$v\left(k\right)=u\left(k\right)$ , $z\left(k\right)=x\left(k\right)$ , $\stackrel{^}{p}\left(k\right)=p\left(k\right)$. (9)

Notice that the parameter estimation problem is defined by Equation (7) and the computation of multipliers is given by Equation (8). Indeed, the necessary conditions, which are defined by Equations (6a) to (6d), are the optimality for the modified model-based optimal control problem.

3.2. Modified Model-Based Optimal Control Problem

The modified model-based optimal control problem, which is referred to as Problem (MM), is given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{3}\left(u\right)=\frac{1}{2}x{\left(N\right)}^{\text{T}}S\left(N\right)x\left(N\right)+{\Gamma}^{\text{T}}x\left(N\right)+\gamma \left(N\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}\frac{1}{2}\left(x{\left(k\right)}^{\text{T}}Qx\left(k\right)+u{\left(k\right)}^{\text{T}}Ru\left(k\right)\right)}+\gamma \left(k\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+\frac{1}{2}{r}_{1}{\Vert u\left(k\right)-v\left(k\right)\Vert}^{2}+\frac{1}{2}{r}_{2}{\Vert x\left(k\right)-z\left(k\right)\Vert}^{2}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}-\lambda {\left(k\right)}^{\text{T}}u\left(k\right)-\beta {\left(k\right)}^{\text{T}}x\left(k\right)\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(k+1\right)=Ax\left(k\right)+Bu\left(k\right)+\alpha \left(k\right),\text{\hspace{0.17em}}x\left(0\right)={x}_{0}\end{array}$ (10)

with the specified $\alpha \left(k\right),\gamma \left(k\right),\lambda \left(k\right),\beta \left(k\right),\Gamma ,v\left(k\right)$ and $z\left(k\right)$ , where the boundary conditions are given by ${x}_{0}$ and $p\left(N\right)$ with the specified multiplier $\Gamma $.

3.3. Open-Loop Optimal Control Law

For simplicity, define Problem (MM) as an equivalent nonlinear optimization problem with the initial control ${u}^{\left(0\right)}=u{\left(k\right)}^{0}$ , given by

$\underset{u\left(k\right)}{\mathrm{min}}{J}_{3}\left(u\right)$ subject to $u\in {\Re}^{m}$ for $k=0,1,\cdots ,N-1$ (11)

where the admissible control variable u is set to be

$u=\left[{\left(u\left(0\right)\right)}^{\text{T}},{\left(u\left(1\right)\right)}^{\text{T}},\cdots ,{\left(u\left(N-1\right)\right)}^{\text{T}}\right]$.

Let this problem as Problem (N). To proceed, it is noticed that solving Problem (N) could be done once the state Equation (6c) is solved forward and the costate Equation (6b) is solved backward with the corresponding control sequences u. In addition, the gradient function for the objective function ${J}_{3}\left(u\right)$ is evaluated from

$\nabla {J}_{3}\left(u\right)={\nabla}_{u\left(k\right)}H$ (12)

which can be calculated from the Hamiltonian function (4) and the stationary condition (6a) once the necessary conditions for optimality, given by Equations (6) - (9), are satisfied.

Suppose the gradient function (12) is represented as

$g\left(u\right)={\nabla}_{u\left(k\right)}H$. (13)

Then, for arbitrary initial control ${u}^{\left(0\right)}$ , the initial gradient and the initial direction are, respectively, given by

${g}^{\left(0\right)}=g\left({u}^{\left(0\right)}\right)$ and ${d}^{\left(0\right)}=-{g}^{\left(0\right)}$. (14)

By using the line search equation [14] [16] , the control sequences can be generated from

${u}^{\left(i+1\right)}={u}^{\left(i\right)}+{a}_{i}\cdot {d}^{\left(i\right)}$ (15)

where ${a}_{i}\in \Re $ is determined from the one-dimensional search, that is,

${a}_{i}=\underset{a\ge 0}{\mathrm{arg}\mathrm{min}}{J}_{3}\left({u}^{\left(i\right)}+a\cdot {d}^{\left(i\right)}\right)$. (16)

Later, the gradient and the direction are updated as follow:

${g}^{\left(i+1\right)}=g\left({u}^{\left(i+1\right)}\right)$ (17)

${d}^{\left(i+1\right)}=-{g}^{\left(i+1\right)}+{b}_{i}\cdot {d}^{\left(i\right)}$ (18)

with the coefficient

${b}_{i}=\frac{{g}^{\left(i+1\right)\text{T}}{g}^{\left(i+1\right)}}{{g}^{\left(i\right)\text{T}}{g}^{\left(i\right)}}$ (19)

where $i=0,1,2,\cdots $ represents the iteration numbers.

Thus, we present the result on the obtaining optimal control law discussed above as a proposition, given below:

Proposition 1. Consider Problem (N). The control sequences ${u}^{\left(i\right)}$ , which is defined in Equation (15) and is represented by

${u}^{\left(i\right)}=\left[{\left(u\left(0\right)\right)}^{\text{T}},{\left(u\left(1\right)\right)}^{\text{T}},\cdots ,{\left(u\left(N-1\right)\right)}^{\text{T}}\right]$ ,

is generated through a set of the direction vectors ${d}^{\left(i\right)}$ whose components are linearly independent. Also, the direction ${d}^{\left(i\right)}$ is conjugacy.

Proof: Refer [14] .

Here, the conjugate gradient algorithm for obtaining the optimal control law is summarized below:

Algorithm 1: Conjugate gradient algorithm

Data Choose the arbitrary initial control ${u}^{\left(0\right)}$. Compute the initial gradient ${g}^{\left(0\right)}$ and the initial direction ${d}^{\left(0\right)}$ from Equation (14). Set $i$ = 0.

Step 1 Solve the state Equation (6c) forward in time from $k$ = 0 to $k$ = $N$ with the initial condition (6d) to obtain $x{\left(k\right)}^{i},k=0,1,\cdots ,N$.

Step 2 Solve the costate Equation (6b) backward in time from $k$ = $N$ to $k$ = 0 with the boundary condition (6d), where $p{\left(k\right)}^{i}$ is the solution obtained.

Step 3 Calculate the value of the cost functional ${J}_{3}\left({u}^{\left(i\right)}\right)$ from Equation (10).

Step 4 Determine the step size ${a}_{i}$ from Equation (16).

Step 5 Update the control ${u}^{\left(i+1\right)}$ from Equation (15).

Step 6 Update the gradient ${g}^{\left(i+1\right)}$ from Equation (17). If the gradient ${g}^{\left(i+1\right)}=0$ , stop, else go to Step 7.

Step 7 Compute the coefficient ${b}_{i}$ from Equation (19).

Step 8 Update the direction ${d}^{\left(i+1\right)}$ from Equation (18). Set $i=i+1$ , go to Step 1.

Remarks:

1) The initial control ${u}^{\left(0\right)}$ can be any valued-vectors, including the zero vector.

2) The gradient function $g\left(u\right)$ for Problem (N) defined by Equation (11) is calculated from the stationary condition (6a). This is the turning point of using the conjugate gradient algorithm for solving Problem (M) defined by Equation (2) and Problem (MM) defined by Equation (10).

3) The optimal control sequences generated by the line search equation in Equation (15) is known as the open-loop control law.

4) The necessary conditions (6b) and (6c) shall be satisfied in solving Problem (N) defined by Equation (11).

3.4. Iterative Procedure

Accordingly, from the discussion above, a summary of the calculation procedure for the integrated system optimization and parameter estimation is made as follows:

Algorithm 2: Iterative procedure

Data $A,B,Q,R,S\left(N\right),N,{x}_{0},{r}_{1},{r}_{2},{k}_{v},{k}_{z},{k}_{p},f,L$. Note that A and B could be determined based on the linearization of $f$ at ${x}_{0}$ or from the linear terms of $f$.

Step 0 Compute a nominal solution. Assume that $\alpha \left(k\right)=0,k=0,1,\cdots ,N-1$ and ${r}_{1}={r}_{2}=0$. Solve Problem (M) defined by Equation (2) to obtain $u{\left(k\right)}^{0},k=0,1,\cdots ,N-1$ and $x{\left(k\right)}^{0},p{\left(k\right)}^{0},k=0,1,\cdots ,N$. Then, with $\alpha \left(k\right)=0,k=0,1,\cdots ,N-1$ and using ${r}_{1},{r}_{2}$ from the data. Set $i=0$ , $v{\left(k\right)}^{0}=u{\left(k\right)}^{0}$ , $z{\left(k\right)}^{0}=x{\left(k\right)}^{0}$ and $\stackrel{^}{p}{\left(k\right)}^{0}=p{\left(k\right)}^{0}$.

Step 1 Compute the parameters $\gamma {\left(k\right)}^{i},k=0,1,\cdots ,N$ and $\alpha {\left(k\right)}^{i},k=0,1,\cdots ,N-1$ from Equation (7). This is called the parameter estimation step.

Step 2 Compute the modifiers ${\Gamma}^{i},\lambda {\left(k\right)}^{i}$ and $\beta {\left(k\right)}^{i},k=0,1,\cdots ,N-1$ from Equation (8). Notice that this step requires taking the derivatives of f and L with respect to $v{\left(k\right)}^{i}$ and $z{\left(k\right)}^{i}$.

Step 3 With $\gamma {\left(k\right)}^{i},\alpha {\left(k\right)}^{i},{\Gamma}^{i},\lambda {\left(k\right)}^{i},\beta {\left(k\right)}^{i},v{\left(k\right)}^{i}$ and $z{\left(k\right)}^{i}$ , solve Problem (N) by using Algorithm 1. This is called the system optimization step.

a) Use Equation (15) to obtain the new control $u{\left(k\right)}^{i},k=0,1,\cdots ,N-1$.

b) Use Equation (6c) to obtain the new state $x{\left(k\right)}^{i},k=0,1,\cdots ,N$.

c) Use Equation (6b) to obtain the new costate $p{\left(k\right)}^{i},k=0,1,\cdots ,N$.

Step 4 Test the convergence and update the optimal solution of Problem (P). In order to provide a mechanism for regulating convergence, a simple relaxation method is employed:

$v{\left(k\right)}^{i+1}=v{\left(k\right)}^{i}+{k}_{v}\left(u{\left(k\right)}^{i}-v{\left(k\right)}^{i}\right)$ (20a)

$z{\left(k\right)}^{i+1}=z{\left(k\right)}^{i}+{k}_{z}\left(x{\left(k\right)}^{i}-z{\left(k\right)}^{i}\right)$ (20b)

$\stackrel{^}{p}{\left(k\right)}^{i+1}=\stackrel{^}{p}{\left(k\right)}^{i}+{k}_{p}\left(p{\left(k\right)}^{i}-\stackrel{^}{p}{\left(k\right)}^{i}\right)$ (20c)

where ${k}_{v},{k}_{z},{k}_{p}\in \left(0,1\right]$ are scalar gains. If $v{\left(k\right)}^{i+1}=v{\left(k\right)}^{i},k=0,1,\cdots ,N-1$ and $z{\left(k\right)}^{i+1}=z{\left(k\right)}^{i},k=0,1,\cdots ,N$ , within a given tolerance, stop; else set $i=i+1$ , and repeat the procedure starting from Step 1.

Remarks:

1) The variable $\alpha \left(k\right)$ is zero in Step 0. The calculated value of $\alpha \left(k\right)$ changes from iteration to iteration during the calculation procedure.

2) The conjugate gradient algorithm is applied to generate the control sequences $u\left(k\right)$ for Problem (M) and Problem (MM), respectively.

3) Problem (P) is not necessary to be linear or to have a quadratic cost function.

4) The conditions $u{\left(k\right)}^{i+1}=u{\left(k\right)}^{i}$ and $x{\left(k\right)}^{i+1}=x{\left(k\right)}^{i}$ are required to be satisfied for the converged optimal control sequence and the converged state sequence. The following averaged 2-norms are computed and then they are compared with a given tolerance to verify the convergence of $u\left(k\right)$ and $x\left(k\right)$ :

${\Vert {u}^{i+1}-{u}^{i}\Vert}_{2}={\left(\frac{1}{N-1}{\displaystyle \underset{k=0}{\overset{N-1}{\sum}}\Vert u{\left(k\right)}^{i+1}-u{\left(k\right)}^{i}\Vert}\right)}^{1/2}$ (21a)

${\Vert {x}^{i+1}-{x}^{i}\Vert}_{2}={\left(\frac{1}{N}{\displaystyle \underset{k=0}{\overset{N}{\sum}}\Vert x{\left(k\right)}^{i+1}-x{\left(k\right)}^{i}\Vert}\right)}^{1/2}$ (21b)

5) The convergence result on the conjugate gradient algorithm can be referred to [14] , and the convergence result for Algorithm 2 is presented in [4] and [10] .

4. Illustrative Example

Consider a basic economic growth model [19] [23] , which is a discrete time minimization problem, given by

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{0}\left(u\right)={\displaystyle \underset{k=0}{\overset{N-1}{\sum}}{\rho}^{k}L\left(x\left(k\right),u\left(k\right),k\right)}\\ \text{subjectto}\text{\hspace{0.17em}}x\left(k+1\right)=f\left(x\left(k\right),u\left(k\right),k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)=5\end{array}$

where the payoff function and dynamics system are, respectively, defined by

$L\left(x\left(k\right),u\left(k\right),k\right)=-\mathrm{ln}\left(\Lambda x{\left(k\right)}^{\nu}-u\left(k\right)\right)$ and $f\left(x\left(k\right),u\left(k\right),k\right)=u\left(k\right)$.

Here, x is the capital stock, $x$ is the control variable, $\rho $ is the discount factor, whereas $\Lambda {x}^{\nu}$ is a production function with constants $\Lambda >0$ , $0<\nu <1$. The difference between the output and the next period’s capital stock is the consumption.

Let us refer this problem as Problem (P). In literature, the exact solution of Problem (P) is known [24] , and is given by

$V\left(x\right)=C+D\mathrm{ln}\left(x\right),$

with

$C=\frac{\mathrm{ln}\left(\left(1-\nu \cdot \rho \right)\cdot \Lambda \right)+D\cdot \rho \cdot \mathrm{ln}\left(\nu \cdot \rho \cdot \Lambda \right)}{1-\rho}$ and $D=\frac{\nu}{1-\nu \cdot \rho}$.

The unique optimal equilibrium for Problem (P) is given by

${x}^{*}=\frac{1}{\sqrt[\left(\nu -1\right)]{\nu \cdot \rho \cdot \Lambda}}$.

By using the specified parameters $\Lambda =5$ , $\nu =0.34$ and $\rho =0.95$ , the optimal equilibrium is ${x}^{*}\approx 2.0673$ [23] .

In the following, we introduce a simplified model-based optimal control model, which is derived from Problem (P) and is referred to as Problem (M), given below:

$\begin{array}{l}\underset{u\left(k\right)}{\mathrm{min}}{J}_{1}\left(u\right)={\displaystyle \underset{k=0}{\overset{30}{\sum}}\left(\frac{1}{2}\left(0.1\cdot x{\left(k\right)}^{2}+10\cdot u{\left(k\right)}^{2}\right)+\gamma \left(k\right)\right)}\\ \text{subjectto}\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(k+1\right)=x\left(k\right)+u\left(k\right)+\alpha \left(k\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)=5.\end{array}$

Note that Problem (M) and Problem (P) are different from the structures and the parameters used.

After running the algorithm proposed within the tolerance (10^{−6}), the result is shown in Table 1. The initial cost, which is 13.072, is the cost spent before taking into account system optimization with parameter estimation. At the end of implementing the algorithm proposed, the final cost is 22.198. There are 41 iterations with 7.84 seconds to reach the convergence.

The graphical results for this economic growth model illustrate the application of the algorithm proposed. Figure 1 shows the final control trajectory and Figure 2 shows the final state trajectory, respectively. With this final control solution, it is observed that the final state towards to the steady state at x = 2.0673. Figure 3 shows the final costate trajectory, while Figure 4 and Figure 5 show, respectively, the adjusted parameters γ(k) and α(k). Overall, these solutions are in the optimal sense, which are verified by the satisfaction of the stationary condition shown in Figure 6.

5. Concluding Remarks

The use of the conjugate gradient approach in solving the nonlinear optimal control problem with model-reality differences was discussed in this paper. Essentially, the simplified model of the original optimal control problem, which is the linear optimal control problem by adding the adjusted parameters, is

Table 1. Simulation result.

Figure 1. Final control trajectory $u\left(k\right)$.

Figure 2. Final state trajectory $x\left(k\right)$ (--) and state equilibrium ${x}^{*}$ (×××××).

Figure 3. Final costate trajectory $p\left(k\right)$.

Figure 4. Adjusted parameter $\gamma \left(k\right)$.

Figure 5. Adjusted parameter $\alpha \left(k\right)$.

Figure 6. Stationary condition g.

formulated. In solving this model-based optimal control problem, the conjugate gradient approach is employed to generate the open-loop control sequences such that the optimal solution is obtained. Here, the stationary condition is used to be the gradient function in the conjugate gradient approach. On the other hand, due on the different structure of the problems, the differences between the real plant and the model used, which is measured by the adjusted parameters repeatedly, are taken into consideration during the iteration calculation procedure. At the convergence, the optimal solution of the model used approximates to the true optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, the application of the algorithm proposed was discussed for solving a basic economic growth model. The results obtained show the efficiency of the algorithm proposed. In conclusion, the applicability of the algorithm is highly recommended.

Acknowledgements

The authors would like to acknowledge the Universiti Tun Hussein Onn Malaysia (UTHM) and the Ministry of Higher Education (MOHE) for the financial support for this study under the research grant FRGS VOT. 1561.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Kek, S.L., Teo, K.L. and Mohd Ismail, A.A. (2010) An Integrated Optimal Control Algorithm for Discrete-Time Nonlinear Stochastic System. International Journal of Control, 83, 2536-2545.

https://doi.org/10.1080/00207179.2010.531766

[2] Kek, S.L., Teo, K.L. and Mohd Ismail, A.A. (2012) Filtering Solution of Nonlinear Stochastic Optimal Control Problem in Discrete-Time with Model-Reality Differences. Numerical Algebra, Control and Optimization, 2, 207-222.

https://doi.org/10.3934/naco.2012.2.207

[3] Kek, S.L., Mohd Ismail, A.A., Teo, K.L. and Rohanin, A. (2013) An Iterative Algorithm Based on Model-Reality Differences for Discrete-Time Nonlinear Stochastic Optimal Control Problems. Numerical Algebra, Control and Optimization, 3, 109-125.

https://doi.org/10.3934/naco.2013.3.109

[4] Kek, S.L., Teo, K.L. and Mohd Ismail, A.A. (2015) Efficient Output Solution for Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Mathematical Problems in Engineering, 2015, Article ID: 659506.

https://doi.org/10.1155/2015/659506

[5] Kek, S.L., Mohd Ismail, A.A. and Teo, K.L. (2015) A Gradient Algorithm for Optimal Control Problems with Model-Reality Differences. Numerical Algebra, Control and Optimization, 5, 252-266.

[6] Kek, S.L. and Mohd Ismail, A.A. (2015) Output Regulation for Discrete-Time Nonlinear Stochastic Optimal Control Problems with Model-Reality Differences. Numerical Algebra, Control and Optimization, 5, 275-288.

https://doi.org/10.3934/naco.2015.5.275

[7] Kek, S.L., Li, J. and Teo, K.L. (2017) Least Squares Solution for Discrete Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Applied Mathematics, 8, 1-14.

https://doi.org/10.4236/am.2017.81001

[8] Kek, S.L., Li, J., Leong, W.J. and Mohd Ismail, A.A. (2017) A Gauss-Newton Approach for Nonlinear Optimal Control Problem with Model-Reality Differences. Open Journal of Optimization, 6, 85-100.

https://doi.org/10.4236/ojop.2017.63007

[9] Kek, S.L., Sim, S.Y., Leong, W.J. and Teo, K.L. (2018) Discrete-Time Nonlinear Stochastic Optimal Control Problem Based on Stochastic Approximation Approach. Advances in Pure Mathematics, 8, 232-244.

https://doi.org/10.4236/apm.2018.83012

[10] Becerra, V.M. and Roberts, P.D. (1996) Dynamic Integrated System Optimization and Parameter Estimation for Discrete Time Optimal Control of Nonlinear Systems. International Journal of Control, 63, 257-281.

https://doi.org/10.1080/00207179608921843

[11] Roberts, P.D. and Becerra, V.M. (2001) Optimal Control of a Class of Discrete-Continuous Nonlinear Systems Decomposition and Hierarchical Structure. Automatica, 37, 1757-1769.

https://doi.org/10.1016/S0005-1098(01)00141-8

[12] Roberts, P.D. (1979) An Algorithm for Steady-State System Optimization and Parameter Estimation. International Journal of Systems Science, 10, 719-734.

https://doi.org/10.1080/00207727908941614

[13] Roberts, P.D. and Williams, T.W.C. (1981) On an Algorithm for Combined System Optimization and Parameter Estimation. Automatica, 17, 199-209.

https://doi.org/10.1016/0005-1098(81)90095-9

[14] Chong, E.K.P. and Zak, S.H. (2013) An Introduction to Optimization. 4th Edition, John Wiley & Sons, Inc., Hoboken, New Jersey.

[15] Mostafa, E.M.E. (2014) A Nonlinear Conjugate Gradient Method for A Special Class of Matrix Optimization Problems. Journal of Industrial and Management Optimization, 10, 883-903.

https://doi.org/10.3934/jimo.2014.10.883

[16] Lasdon, L.S. and Mitter, S.K. (1967) The Conjugate Gradient Method for Optimal Control Problems. IEEE Transactions on Automatic Control, 12, 132-138.

https://doi.org/10.1109/TAC.1967.1098538

[17] Nwaeze, E. (2011) An Extended Conjugate Gradient Method for Optimizing Continuous-Time Optimal Control Problems. Canadian Journal on Computing in Mathematics, Natural Sciences, Engineering & Medicine, 2, 39-44.

[18] Raji, R.A. and Oke, M.O. (2015) Higher-Order Conjugate Gradient Method for Solving Continuous Optimal Control Problems. IOSR Journal of Mathematics, 11, 88-90.

[19] Brock, W.A. and Mirman, L. (1972) Optimal Economic Growth and Uncertainty: The Discounted Case. Journal of Economy Theory, 4, 479-513.

https://doi.org/10.1016/0022-0531(72)90135-4

[20] Bryson, A.E. and Ho, Y.C. (1975) Applied Optimal Control. Hemisphere, Washington DC.

[21] Lewis, F.L., Vrabie, V. and Symos, V.L. (2012) Optimal Control. 3rd Edition, John Wiley & Sons, Inc., New York.

https://doi.org/10.1002/9781118122631

[22] Kirk, D.E. (2004) Optimal Control Theory: An Introduction. Dover Publications, New York.

[23] Grüne, L., Semmler, W. and Stieler, M. (2015) Using Nonlinear Model Predictive Control for Dynamic Decision Problems in Economics. Journal of Economic Dynamics and Control, 60, 112-133.

https://doi.org/10.1016/j.jedc.2015.08.010

[24] Santos, M.S. and Vigo-Aguiar, J. (1998) Analysis of a Numerical Dynamic Programming Algorithm Applied to Economic Models. Econometrica, 66, 409-426.

https://doi.org/10.2307/2998564