Portfolio optimization is of fundamental importance in financial investment theory and practice. It was introduced by Harry Markowitz in his seminal work on portfolio selection. Later, Samuelson extended Markowitz's work to a dynamic framework, applying a dynamic programming approach to derive the optimal decision for a consumption-investment model. Merton then used optimal stochastic control in continuous time to determine a closed-form solution of the optimal portfolio problem in a financial market comprising a risk-less asset and a stock as investment alternatives.
Numerous empirical studies of the portfolio selection problem have revealed that portfolio returns are asymmetric. Moreover, because financial markets are complex and historical data are often lacking, future security returns may be treated as uncertain variables based on experts' estimations. One study introduced skewness as a measure of the asymmetry of portfolio returns and proposed a mean-risk-skewness model for portfolio selection in an uncertain environment; a hybrid intelligent algorithm was used to solve the optimization model. A related study, also treating portfolio returns as uncertain variables, proposed a semivariance technique for the diversified portfolio selection problem. There, the "99 Method" was employed to compute the expected value and semivariance of the uncertain variables, while a genetic algorithm was used to seek the best allocation strategy under the uncertain environment. A further study discussed portfolio selection in an uncertain environment where security returns are subject to experts' estimations and are likewise modeled as uncertain variables; a hybrid intelligent algorithm was designed to provide a general method for solving the new optimization models.
Returning to continuous-time stochastic models of financial variables as pioneered by Merton, the problems of portfolio optimization have been extensively studied. The general issue is how to construct an optimal portfolio that maximizes the investor's expected returns while limiting possible losses (risks). Thus, given a number of assets available for investment, with their prices and return distributions, the question is: what should the optimal portfolio be? This question is central to investment and portfolio management.
In financial investment, the idea behind this problem is to select the strategies that yield optimal results when an investor faces a wide variety of decisions about allocating his or her wealth. Investors dynamically allocate wealth between the risky and the risk-less assets with the objective of maximizing total expected return while minimizing the variance, i.e. the risk. An optimal investment strategy, if it exists, will depend on factors such as: 1) current information about the market; 2) the initial wealth of the investor; 3) the investor's attitude towards market risk; and 4) the decision criterion used by the investor to judge the optimality of the investment strategy.
In studying the continuous-time portfolio optimization problem, researchers have also noted the impact of mean reversion on optimal portfolio choice, which is of central importance in the asset allocation problem. The random walk model was the first basic model of stock prices, based on the assumption of market efficiency. The basic idea is that returns can be represented as unforecastable fluctuations around some mean return. This assumption implies that the distribution of returns at time t is independent of, or at least uncorrelated with, the distribution of returns at previous times. Mean reversion can be thought of as a modification of the random walk in which changes in returns are not completely independent of one another but are related. Mean reversion has received considerable attention in the financial world as a classic indicator of predictability in financial markets, and it has more economic logic than the geometric Brownian motion model.
In general, the portfolio optimization problem can be solved by the theory of optimal stochastic control, in which the dynamic programming principle (DPP) and Hamilton-Jacobi-Bellman (HJB) theory are instrumental. Thus, considering the optimal control of Itô-type processes that satisfy a stochastic differential equation (SDE) with respect to some Wiener process, our goal is to choose the investment control strategy (i.e. the dynamic portfolio strategy) that maximizes the expected utility of wealth at some future time.
The main focus is on the portfolio problem of an investor who trades continuously from some time t and maximizes the expected utility of wealth at some future time T. The problem of finding the optimal strategy is classical and has been extensively studied. Most of these studies model the stock price as a Markov process. One study of portfolio optimization for an investor who can trade in a risk-free bond and a stock included stochastic volatility in the dynamics of the risky asset. The drawback is that volatility, unlike the stock price, is not directly observable in the market, so it is impossible in practice to follow portfolio rules that depend explicitly on the level of volatility.
Another study investigated portfolio selection with instruments whose logarithms are mean-reverting. The authors assumed constant portfolios with short-selling and borrowing allowed, and found optimal strategies in the sense of time-independent portfolios, i.e. portfolios that do not depend on asset prices, which is not the case in real life.
In this study, we focus on optimal strategies in which the portfolio depends on asset prices and neither borrowing nor short-selling is allowed (that is, there are no external inflows or outflows of cash). Since previous observations may be useful in predicting future prices of the risky asset, stock-price indexes can be characterized as mean-reverting processes. Therefore, in this work the price dynamics of the risky asset are described by the geometric mean reversion (GMR) model.
The paper is organized as follows. The wealth stochastic differential equation is formulated in Section 2, and the stochastic optimal control problem is discussed in Section 3. Section 4 applies dynamic programming and the HJB equation to obtain the explicit solution of the stochastic optimal control problem. The results are analyzed using MATLAB in Section 5. Finally, the conclusion and recommendations are given in Section 6.
2. Formulation of the Wealth SDE
The stochastic portfolio optimization problem is formulated in continuous time, and the stochastic control technique is used to find the optimal portfolio value by maximizing the utility of wealth at some future time T.
The formulation considers a dynamic system characterized by its state at any time and evolving in an environment formalized by a filtered probability space satisfying the usual conditions, on which a 1-dimensional (real-valued) standard Brownian motion is defined.
The portfolio allocation problem is set in a Black-Scholes financial market with two investment possibilities, namely a risk-free asset with positive price evolving as
and a risky asset with price at time t described dynamically by the geometric mean-reversion model
The market parameters are positive constants: the long-term mean (equilibrium) level, i.e. the value around which future trajectories settle in the long run; the speed of that convergence; and the degree of volatility.
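To make the roles of these three parameters concrete, the following is a minimal simulation sketch. It assumes one common form of geometric mean reversion, dS = kappa (theta - ln S) S dt + sigma S dW; this form and the names `kappa`, `theta`, `sigma` and `simulate_gmr` are illustrative assumptions and are not taken from the paper's Equation (2).

```python
import math
import random


def simulate_gmr(s0, kappa, theta, sigma, horizon, n_steps, seed=0):
    """Euler-Maruyama path of an assumed geometric mean-reversion model.

    Assumed form (hypothetical, not the paper's Equation (2)):
        dS = kappa * (theta - ln S) * S dt + sigma * S dW.
    By Ito's formula, Y = ln S then satisfies
        dY = (kappa * (theta - Y) - 0.5 * sigma**2) dt + sigma dW,
    which is stepped here so that S = exp(Y) stays positive.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    y = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        y += (kappa * (theta - y) - 0.5 * sigma ** 2) * dt + sigma * dw
        path.append(math.exp(y))
    return path


# Example: a five-year path sampled 500 times.
prices = simulate_gmr(s0=1.0, kappa=2.0, theta=0.5, sigma=0.2, horizon=5.0, n_steps=500)
```

Under this assumed form, setting `sigma = 0` makes the log-price relax deterministically towards `theta` at a rate governed by `kappa`, which is the sense in which one parameter fixes the long-run level and another the speed of convergence.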
If the incremental change in the stock price is governed by the above geometric mean reversion relation, then solving (2), assuming it has a unique solution, gives the stock price at time t as
We consider the investment problem of an investor who has access to the capital market and wants to allocate current wealth between the bond and the stock. The investor's aim is to choose the portfolio strategy dynamically so as to maximize the expected utility of wealth at some future time T. To describe the investor's actions, portfolio strategies are introduced.
Definition 1 ( ). A portfolio strategy is a two-dimensional stochastic process
satisfying the following conditions:
1) is progressively measurable;
2) is adapted i.e. , is -measurable.
The financial interpretation of the portfolio strategy is that is the number of units of bonds held by the investor at time t and is the number of units of stocks held by the investor at time t.
Therefore the wealth (portfolio value) of an investor with initial capital is such that
The pair is said to be self-financing provided that the corresponding wealth process is a continuous and adapted process such that
This implies that changes in the wealth are only due to changes in the bond or stock prices, i.e. no external inflows or outflows of cash.
The investor needs to monitor his/her wealth; therefore, the fraction of the wealth invested in stocks is taken as the control of the system at time t. Thus, we set
It is assumed that, be almost surely continuous in and since is assumed to be self-financing, then from (6), the differential equation
below is formulated, and by (1) and (2) Equation (8) takes the form
Collecting like terms in the equation above gives
With further elimination of and using (7), the wealth stochastic differential equation is finally obtained:
To simplify Equation (9) further, a substitution is made in (9) to get
which is a simplified stochastic differential equation for the wealth.
It is assumed that the investor has complete information about the market at every instant, i.e. the control is adapted. The investment policy is therefore defined by an adapted control process. Given a portfolio process, it is convenient to rewrite (10) as
The notation used emphasizes the dependence of the wealth process on the initial wealth and the control. If Equation (11) has a unique solution X for given data, then X is called the controlled process, as its dynamics are driven by the actions of the control process.
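The dependence of the controlled wealth on the initial wealth and the control can be illustrated numerically. The sketch below is not the paper's Equation (11): it assumes a bond dB = r B dt, a stock dS = kappa (theta - ln S) S dt + sigma S dW, and a hypothetical callable `policy(t, x)` returning the fraction of wealth held in the stock. Under those assumptions, self-financing gives dX = X [(r + u (mu - r)) dt + u sigma dW] with mu = kappa (theta - ln S).

```python
import math
import random


def simulate_wealth(x0, s0, policy, r, kappa, theta, sigma, horizon, n_steps, seed=0):
    """Euler-Maruyama terminal wealth under a self-financing strategy.

    All dynamics and names are illustrative assumptions (see the lead-in);
    policy(t, x) is the fraction of wealth invested in the stock.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    x, y = x0, math.log(s0)  # wealth and log stock price
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        mu = kappa * (theta - y)      # instantaneous stock drift rate
        u = policy(k * dt, x)         # control: fraction of wealth in the stock
        x += x * ((r + u * (mu - r)) * dt + u * sigma * dw)
        y += (mu - 0.5 * sigma ** 2) * dt + sigma * dw  # log-price update
    return x


# A constant 50/50 split between bond and stock (no borrowing, no short-selling).
terminal = simulate_wealth(100.0, 1.0, lambda t, x: 0.5,
                           r=0.05, kappa=2.0, theta=0.5, sigma=0.2,
                           horizon=1.0, n_steps=1000)
```

With `policy` identically zero the wealth grows at the bond rate only, which gives a quick sanity check of the discretization.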
3. The Stochastic Optimal Control Problem
From (11), it is supposed that at time . The investor wants to maximize the expected utility of the wealth at some future time . We assume that , and by the concept of utility function from which the utility function U to the wealth is assigned, then the Optimization criterion or a Reward function is then defined as
where is the first exit time from the region defined in Theorem 1 below.
Theorem 1 (  ). Let X be a cad-lag, adapted process and be an open subset of
1) If the filtration satisfies the usual conditions, then the hitting time of defined by
(with the convention ) is a stopping time.
2) If X is continuous, then the exit time of defined by
is a predictable stopping time.
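On a discretized path, an exit time of this kind can be approximated by the first sample that lies outside the region. The helper below is a hypothetical illustration for an open interval (`lower`, `upper`); returning `None` when the path never leaves plays the role of the usual convention that the infimum of the empty set is infinite, which is an assumption about the convention intended in Theorem 1.

```python
def first_exit_index(path, lower, upper):
    """Index of the first sample outside the open interval (lower, upper).

    Returns None if every sample stays inside the interval.
    """
    for i, value in enumerate(path):
        if not (lower < value < upper):
            return i
    return None


samples = [1.0, 2.0, 3.0, 4.0]
exit_at = first_exit_index(samples, lower=0.0, upper=3.5)   # sample 4.0 is outside
never = first_exit_index(samples, lower=0.0, upper=10.0)    # path stays inside
```

Because the check uses only samples up to the current index, the decision to stop never looks into the future, which mirrors the defining property of a stopping time.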
Actually, is the amount of the wealth at any time before exit from the region . We notice that, Equation (12) is a performance criterion of the form:
with and .
It is required to maximize the expected utility of the wealth over the class of all admissible portfolio strategies that satisfy
Now, the value function of the control problem, which is our stochastic optimal control problem, is defined as follows
The aim is to find an optimal strategy for which the optimal value is attained, that is
4. Dynamic Programming and Hamilton-Jacobi-Bellman Equation
The stochastic optimal control problem (15) is now solved by maximizing the performance function (12), subject to condition (14) and to the state (wealth) Equation (11).
The stochastic version of Bellman's principle of optimality, commonly known as the dynamic programming principle (DPP), is stated below for reference.
Theorem 2 (Bellman’s equation  ). For all and
Briefly the principle says that, an optimal policy from to T passing through is also optimal in . Its thorough proof is in , and one can also find it in  and .
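For orientation, the principle can be written in generic notation for a terminal-utility problem. The symbols below are assumptions chosen for illustration and need not match the paper's Equation (16); in particular, the exit time used in the paper is ignored here.

```latex
% Generic notation (an assumption, not the paper's symbols):
% X^{t,x,u} is the wealth started from x at time t under control u,
% \mathcal{A} is the admissible class, and 0 < h \le T - t.
V(t, x) = \sup_{u \in \mathcal{A}} \mathbb{E}\left[ U\big(X^{t,x,u}_{T}\big) \right],
\qquad
V(t, x) = \sup_{u \in \mathcal{A}} \mathbb{E}\left[ V\big(t + h,\; X^{t,x,u}_{t+h}\big) \right].
```

The second identity says that optimizing over the whole horizon is equivalent to optimizing up to time t + h and then continuing optimally from the state reached.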
Note that the optimal control problem (15) has the same form as Bellman's Equation (16) in Theorem 2, with .
The differential operator is applied to the value function V in (15) to get
whereby, the comparison with the wealth SDE (11), provides that
and , being the substitution made in (9) to yield (11). Hence from Equation (15), it is possible to deduce the HJB equation
where U stands for any utility function to be applied.
Therefore, for all , the main interest is to find the value of which, in turn, it maximizes the function
Since , then let such that . Thus
for simplicity, before dealing with the value of which maximizes above, it is first better to linearly approximate the function by Taylor series at . Thus by Taylor series the expression below is found
Therefore, making substitution for , an approximated linear expression
is obtained. Plug Equation (20) into the function in Equation (19), to get the approximated function which is named as .
Equation (21) then modifies the HJB Equation (18), which becomes
which is the same as
Now, assuming that, V satisfies conditions of being strictly concave and increasing, and that has a maximum value at some , then
is achieved and solving for from the expression above, finally the result is obtained to be
With substitution of Equation (24) into HJB Equation (23), the partial differential equation below is obtained
which is a boundary value problem for V. This boundary value problem is extremely hard to solve for a general utility function U, so the work is simplified by considering specific utility functions. The first such result, stated and proved as Theorem 3, concerns the power utility.
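The first-order step used above is the maximization of a concave quadratic in the control. As a hedged illustration, the sketch below treats a generic objective of the form excess_drift x V_x u + (1/2) sigma^2 x^2 V_xx u^2; the name `excess_drift` is a placeholder for whatever (linearized) drift term appears in the paper's Equation (22), and the numerical inputs are hypothetical.

```python
def hjb_objective(u, excess_drift, sigma, x, v_x, v_xx):
    """u-dependent part of a generic HJB generator for a wealth fraction u."""
    return excess_drift * x * v_x * u + 0.5 * sigma ** 2 * x ** 2 * v_xx * u ** 2


def hjb_maximizer(excess_drift, sigma, x, v_x, v_xx):
    """Solve excess_drift*x*v_x + sigma^2*x^2*v_xx*u = 0 for u."""
    if v_xx >= 0:
        raise ValueError("the value function must be strictly concave (v_xx < 0)")
    return -excess_drift * v_x / (sigma ** 2 * x * v_xx)


# Hypothetical inputs: an increasing (v_x > 0), strictly concave (v_xx < 0) value function.
u_star = hjb_maximizer(excess_drift=0.04, sigma=0.2, x=10.0, v_x=1.0, v_xx=-0.05)
```

Strict concavity of V in x is what makes the stationary point a maximum; with `v_xx >= 0` the quadratic has no interior maximizer, hence the guard.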
Theorem 3. Suppose that, for all -adapted control process of the wealth , the solution for the boundary value problem (25) exists, and that, the investor’s behaviour is modeled by the power utility function
Then, the optimal control strategy is given by
where the constants and are positive and depend on the market parameters.
Proof. Since V is a function of two variables t and x, then by separation of variables (or product method), the goal is to have a solution of the form
satisfying the boundary value problem (25), and therefore, it is required to solve for h. From (27), it is found that
Then, substituting Equation (28) into BVP (25), the equation below is obtained
Since as then a simplified equation below is obtained
which is a separable differential equation. The Equation (29) is solved, while setting , for . The solution is then found to be
Hence Equation (27) becomes
where . Now, from (31) the partial derivatives and are obtained, which are then plugged into (24) to get
which is equivalent to (26) with , and , and the proof is hence complete.
Consequently, Equation (31) is the solution of the HJB Equation (23), provided that .
Theorem 4. Suppose that the first hypothesis of Theorem 3 holds, and that the exponential utility function
is used to model the investor's behavior in the market. Then the optimal policy is inversely proportional to the wealth, that is,
Proof. By the separation technique, the proof begins by assuming that, the value function is given by such that:
Substituting (33) into (25), we then solve for the function h.
Since then, through setting for , it appears to have separable ordinary differential equation
from which the solution is simply obtained. That is
whereby . Hence, the solution is achieved such that
So, from (35), the partial derivatives
are easily obtained, and then substitution into (24) is performed to get another expression for the optimal strategy in the case of exponential utility considered as the investor’s behavioral measure. That is
and the proof is complete.
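The inverse proportionality to wealth can be checked numerically for a separable value function. The sketch below assumes the exponential form V(t, x) = -h(t) exp(-a x) / a and a generic first-order condition u = -excess_drift V_x / (sigma^2 x V_xx); both are illustrative assumptions rather than the paper's Equations (24) and (35), and the constant 3.0 merely stands in for a positive h(t).

```python
import math


def merton_type_fraction(value_fn, x, excess_drift, sigma, h=1e-4):
    """Fraction -excess_drift * V_x / (sigma^2 * x * V_xx) via central differences."""
    v_x = (value_fn(x + h) - value_fn(x - h)) / (2.0 * h)
    v_xx = (value_fn(x + h) - 2.0 * value_fn(x) + value_fn(x - h)) / h ** 2
    return -excess_drift * v_x / (sigma ** 2 * x * v_xx)


a = 0.5  # hypothetical risk-aversion coefficient


def value_fn(x):
    # 3.0 stands in for h(t) > 0; it cancels in the ratio V_x / V_xx.
    return -3.0 * math.exp(-a * x) / a


for wealth in (1.0, 2.0, 4.0):
    u = merton_type_fraction(value_fn, wealth, excess_drift=0.04, sigma=0.2)
    # u * wealth is approximately the same at every wealth level
    print(wealth, round(u * wealth, 4))
```

Since the time-dependent factor cancels in the ratio of derivatives, the amount held in the stock (fraction times wealth) is constant under these assumptions, so the fraction itself falls like 1/x.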
The optimal control obtained for both utility functions depends on the wealth x, the market parameters, and the parameter of the utility function in each case. These results differ from those found by other researchers.
The differences arise because most research on optimal portfolio problems describes the dynamics of the risky assets (stocks) by geometric Brownian motion. The controlled SDE for the wealth process formulated from that model leads to a value function from which the optimal policy is found to be independent of time and, in particular, of the wealth.
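As a point of comparison, the classical geometric Brownian motion benchmark for power utility is Merton's constant fraction. The sketch below assumes the utility U(x) = x^gamma / gamma with gamma < 1 and gamma nonzero, a constant stock drift `mu`, interest rate `r` and volatility `sigma`; these symbols are chosen for illustration and are not the paper's notation.

```python
def merton_fraction(mu, r, sigma, gamma):
    """Merton's optimal stock fraction under GBM and power utility x**gamma / gamma."""
    if gamma >= 1.0:
        raise ValueError("gamma must be below 1 for a risk-averse investor")
    return (mu - r) / ((1.0 - gamma) * sigma ** 2)


# The fraction depends only on market and preference parameters,
# not on the current time or the current wealth.
fraction = merton_fraction(mu=0.08, r=0.03, sigma=0.2, gamma=0.5)
```

The function takes neither time nor wealth as an argument, which is exactly the independence property referred to above.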
In this study, the dynamics of the risky asset are described by the geometric mean reversion (GMR) process of Equation (2). The controlled wealth SDE incorporates the deterministic differential Equation (1) and the GMR model (2); from it the value function (15) is defined, and hence the optimal policies, which depend on the wealth and the market parameters, are determined as indicated above.
5. Analysis of the Results
In this section, MATLAB is used to simulate the optimal strategy and study its behavior in relation to the wealth. The value function is also simulated with respect to time and wealth for the same market parameters used in the simulation of the optimal policy. The power utility and exponential utility cases are analyzed separately.
5.1. The Analysis of Optimal Strategy in the Case of Power Utility
This subsection simulates the results obtained by solving the portfolio problem when the power utility is used as the measure of the investor's behavior. The optimal strategy is simulated with respect to the wealth of the portfolio with the market parameters , , , , and . Figure 1 shows that the optimal investment strategy decreases almost to zero as the wealth increases.
Figure 1. The optimal policy with respect to wealth for the power utility case.
This implies that the richer the investor becomes, the less he/she invests in risky assets. This result looks somewhat counterintuitive, as it contradicts the economic interpretation of absolute risk aversion (ARA) for power utility, which signifies that someone with more capital is less afraid of taking risk by investing in risky assets. On the other hand, the result agrees with what is often observed in real life: as someone gains more wealth, he/she deposits most of it in bank accounts rather than investing in risky assets, looking for somewhere to invest with minimal or almost no risk while still expecting an assured return.
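The ARA interpretation mentioned here can be made explicit. The sketch below assumes the standard forms U(x) = x^gamma / gamma for power utility and U(x) = -exp(-a x) / a for exponential utility (the paper's exact utility definitions are not reproduced here), for which ARA(x) = -U''(x) / U'(x) has a closed form.

```python
def ara_power(x, gamma):
    """ARA of U(x) = x**gamma / gamma: -U''/U' = (1 - gamma) / x."""
    return (1.0 - gamma) / x


def ara_exponential(x, a):
    """ARA of U(x) = -exp(-a * x) / a: -U''/U' = a, independent of x."""
    return a


# Power utility: absolute risk aversion falls as wealth grows (decreasing ARA).
low_wealth, high_wealth = ara_power(10.0, 0.5), ara_power(100.0, 0.5)
# Exponential utility: absolute risk aversion is the same at every wealth level.
flat = ara_exponential(10.0, 0.3) == ara_exponential(100.0, 0.3)
```

Decreasing ARA is the property behind the statement that a wealthier power-utility investor is less afraid of risk, which is why a stock fraction that shrinks with wealth appears at odds with it.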
5.2. The Analysis of Value Function in the Case of Power Utility
Here the aim is to study graphically how the value function behaves in relation to time and wealth, with the same market parameters used above.
The value function decreases with time and wealth. It does not decrease exactly to zero; rather, it reaches a level beyond which changes with respect to wealth are barely noticeable, while it continues to decrease exponentially as time increases. The surface in Figure 2 shows a nonlinear relationship between the value function and both time and wealth.
5.3. The Analysis of Optimal Strategy in the Case of Exponential Utility
We now consider the results obtained when the exponential utility is used as the measure of the investor's behavior in the market. The market parameters , , , , and are used, and the optimal investment strategy is plotted against the wealth of the portfolio.
Figure 3 shows that the optimal strategy varies inversely with wealth: as the wealth increases, the optimal policy decreases. This result has the same implication as the one discussed above for the power utility. That is, the investor reduces the proportion invested in risky assets and deposits it in bank accounts. This suggests that the investor withdraws from heavy trading and seeks more time to relax and avoid stress.
5.4. The Analysis of Value Function in the Case of Exponential Utility
The value function for the exponential utility is plotted against time and the wealth of the portfolio with the same market parameters used above. Figure 4 shows that the value function does not vary with wealth but varies exponentially with time. It becomes more negative as time advances, with no effect from the wealth: it stays at the same level however much the wealth increases, which is not the case with time.
Figure 2. The value function with respect to time and wealth for the power utility case.
Figure 3. The optimal strategy with respect to current wealth for the exponential utility case.
Figure 4. The value function with respect to time and wealth for the exponential utility case.
6. Conclusion and Recommendation
This paper has discussed portfolio management under mean-reverting stock returns and a constant force of interest for bond returns. The portfolio optimization problem was approached through stochastic optimal control. Optimal investment strategies and value functions were determined in two theorems, stated and proved for the power utility and exponential utility cases. The results show that the optimal investment rules are inversely related to wealth. They therefore contradict the popular investment allocation advice that the wealthier someone is, the less he/she fears investing in risky assets and the more he/she invests in them for quick and better expected returns. The investment problem studied involves only two assets, namely a bond whose price at time t evolves exponentially with constant interest rate r and a stock whose price at time t is described by the geometric mean-reversion model. Introducing extra features such as consumption, human capital and transaction costs may improve the model and hence the optimal asset allocation. The use of other utility functions is also recommended before drawing general conclusions from the results obtained in this work.
A.1. Codes for Numerical Simulation of Optimal Policy and Value Function for Power Utility Case
A.2. Codes for Numerical Simulation of Optimal Investment Strategy and Value Function for Exponential Utility Case
 Benth, F.E., Karlsen, K.H. and Reikvam, K. (2003) Merton’s Portfolio Optimization Problem in a Black-Scholes Market with Non-Gaussian Stochastic Volatility of Ornstein-Uhlenbeck Type. Mathematical Finance, 13, 215-244.
 Chen, L., Peng, J., Zhang, B. and Rosyida, I. (2017) Diversified Models for Portfolio Selection Based on Uncertain Semivariance. International Journal of Systems Science, 48, 637-638. https://doi.org/10.1080/00207721.2016.1206985
 Zhang, B., Peng, J. and Li, S. (2015) Uncertain Programming Models for Portfolio Selection with Uncertain Returns. International Journal of Systems Science, 46, 2510-2519. https://doi.org/10.1080/00207721.2013.871366
 Koijen, R.S.J., Rodriguez, J.C. and Sbuelz, A. (2009) Momentum and Mean-Reversion in Strategic Asset Allocation. Management Science, 55, 1199-1213.
 Munk, C., Sorensen, C. and Vinther, T.N. (2004) Dynamic Asset Allocation under Mean-Reverting Returns, Stochastic Interest Rates and Inflation Uncertainty: Are Popular Recommendations Consistent with Rational Behaviour? International Review of Economics and Finance, 13, 141-166.