The conjugate gradient method (CG) and Quasi-Newton method are two major popular iterative methods for solving smooth unconstrained optimization problems, and Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is one of the most efficient quasi-Newton methods for solving small and medium sized unconstrained optimization problems    . The iterative method is computed by
where is a step size and is a search direction. For continuously differentiable function , the minimization problem:
has been well studied for several decades. Conjugate gradient method is among the preferable methods for solving problem (2) with search direction given by
where is the gradient of an objective function at k iterate and is a scalar describing the attributes of the CG methods.
Some well-known formulas for the scalar are the Hestenes-Stiefel (HS) , Fletcher-Reeves (FR) , Polak-Ribière and Polak (PRP) , and Dai-Yuan (DY)  given by
where and denotes the Euclidean norm. Due to their simplicity and low memory requirement, CG methods are more effective and desirable for large scale unconstrained smooth problems  . The global convergence properties of nonlinear CG methods have been analyzed under the weak Wolfe line search
and the strong Wolfe line search:
where . CG methods use relatively little memory for large scale problems and require no numerical linear algebra, so each step is quite fast. However, they do not have second order information of the objective function, and typically converge much more slowly than Newton or quasi-Newton methods.
The quasi-Newton method is an iterative method with second order information of the objective function, and BFGS is the effective quasi-Newton method with the search direction
where is an approximation of the Hessian matrix of h at . The update formula for is defined by
where is defined as , and the Hessian approximation of (7) satisfies the standard secant equation
if , which is known as the curvature condition. The BFGS method has very interesting properties and remains one of the most respectable quasi-Newton methods for unconstrained optimization . The theory of BFGS method and its global convergence have been established by many researchers (see ). For convex objective function, using some special inexact line search, it has been proved that the BFGS method is globally convergent (see   ). However, when the objective function is nonconvex, the BFGS method under exact line search may fail to converge . Moreover, Dai  proved that the BFGS method may fail for nonconvex functions with Wolfe line search techniques given in (4) and (5) . Wolfe line search technique is the most common monotone line search technique, and it may leads to small steps without making significant progress to the minimum when the contours of the objective functions are a family of curves with large curvature (see    ). In order to overcome this drawback, the first nonmonotone line search technique was proposed by Grippo et al.  for Newton’s method. With this initiative, many nonmonotone line search techniques have been proposed in recent years . Yuan et al.  developed a modified limited memory BFGS method with the update formula that has a higher order approximation to exact Hessian, and its convergence property is analyzed under the nonmonotone line search type. However, the method converges for only uniformly convex functions. Li et al.  proposed a new BFGS algorithm with modified secant equation which achieves both global and superlinear convergence for generally convex functions under the nonmonotone line search of . Su and Rong  introduced and established a new spectral CG method and its implementation under a modified nonmonotone line search technique. They introduced a new spectral conjugate gradient direction
and . It is not difficult to notice that the denominator of (11) is the convex combination of the denominator of the conjugate parameters in HS and PRP conjugate gradient methods. The choice of spectral parameter given (10) ensures the sufficient descent property of the search direction without dependence of line search. The convergence property of their method analyzed under a new modified nonmonotone line search with some mild conditions. However, this spectral CG method has only first order information, and excludes second order information. When the number of dimension is large, the CG methods are more effective compared to the BFGS methods in term of the CPU-time but in term of the number of iterations and the number of function evaluations, the BFGS methods are better. In order to incorporate the remarkable properties of the CG and BFGS methods and to overcome their drawbacks, many hybrid of CG and BFGS methods are introduced for unconstrained smooth optimization   . However, the usage of these methods is mainly restricted to solve smooth optimization problems. Recently, Yuan et al.     introduced some CG approaches to solve nonsmooth convex large scale problems using the smoothing regularization, and under some assumptions, the global convergence properties of these approaches are analyzed. Yuan and Wei  proposed the Barzilai and Borwein (BB) gradient method with nonmonotone line search to solve nonsmooth convex optimization problems. Some implementable quasi-Newton methods are also introduced for solving the same problem (see    ). More recently, Ou and Zhou  introduced a modified scaled BFGS preconditioned CG algorithm, and under appropriate assumptions, the method is proven to possess global convergence for nonsmooth convex functions.
Motivated by the work of Ou and Zhou , in this paper, we propose a hybrid approach of the a scaled CG method and a modified BFGS method to combine the simplicity of CG method and the Hessian approximation of BFGS method. Our work is mainly focused in developing the scaled conjugate search direction that includes the second order information of the objective function by incorporating the modified secant equation of BFGS method. Opposing the work of Ou and Zhou , our method has both the function and gradient value information of the objective function. Moreover, our method leads to better descent direction than the CG methods proposed so far. To the best of our knowledge, this is the first work to incorporate the scaled CG algorithm with the BFGS secant equation which contains both the function and gradient value information of the objective function for solving large scale nonsmooth optimization. Under the new modified nonmonotone line search technique, the global convergence of the algorithm is analyzed for nonsmooth convex problems.
The paper is organized as follows. In the next section, we consider a nonsmooth convex problem and review their basic results. In Section 3, we propose a new scaled CG algorithm that incorporates the BFGS secant equation which has both function value and gradient information of the objective function via the smoothing regularization. Using the new modified nonmonotone line search technique, we prove the global convergence of our new algorithm for nonsmooth convex problems. Numerical results and related comparisons are reported in Section 4. Finally, Section 5 concludes our work.
2. Nonsmooth Convex Problems and Their Basic Results
In this section, we consider the unconstrained optimization problem
where is a possibly nonsmooth convex function. This problem is equivalent to the following problem
where is the Moreau-Yosida regularization of f , which is defined by
where is a positive parameter. The function F is a finite-valued, continuously differentiable convex function even though the function f is nondifferentiable (see ). Let be the unique solution of (14). In what follows, we can express as
Moreover, the gradient of F is globally Lipschitz continuous, i.e.,
The point is an optimal solution to (12) if and only if (see ). Furthermore, under reasonable conditions the gradient of F is semismooth and some of its remarkable properties are given in  .
Several methods have been proposed to solve (13) by incorporating bundle methods and quasi-Newton methods ideas   , but it is burdensome to evaluate the exact value of at any given point x . Luckily, for each and any , we can have such that
Therefore, we can approximate and by
respectively. Implementable algorithms to define such a for nonsmooth convex model can be seen in . The noticeable attributes of and by the following proposition .
Proposition 1. Let be a vector that satisfies (18), and let and be defined by (19) and (20), respectively. Then we obtain
Proposition 1 shows that the approximations of and can be made arbitrarily close to the exact values of and respectively.
3. A Scaled CG Method Based on New BFGS Secant Equation
In this section, we introduce the new scaled CG search direction that incorporates the modified BFGS secant equation, and then describe the new algorithm for solving nonsmooth problems. We make use of a modified nonmonotone line search technique introduced by  to compute a step size. Based on the above approximations, we redefine the search direction of CG method (3) to solve problem (13) as follows:
where is an appropriately chosen positive number. Ou and Zhou  provided a search direction defined by
where is defined
The vector and in (27) are defined as
It is easy to observe that the (27) has only gradient value information. In order to have both gradient and function value information, we replace (27) and (29) by
respectively. Thus, the BFGS method with the secant equation
and the update formula
has both gradient and function value information, and the matrix inherits the positive definiteness of for generally convex functions. Using the secant Equation (32), we propose the new search direction is defined by
Now, based on the above search direction, we describe our new scaled CG algorithm with a modified nonmonotone line search for solving problem (13) as follows.
Step 0. Given , and a point . Set and .
Step 1. If , then stop, else go to the next step.
Step 2. Compute the search direction by using (34)-(37).
Step 3. Set trial step size .
Step 4. Set and choose a scalar such that .
Step 5. Let , is a positive integer, define , and choose
Let be bounded above and satisfy:
If (38) does not holds, define and go to step 5.
Step 6. Set K := k + 1 and go to step 1.
It can be observed that the line search technique in step 5 of Algorithm 1 is a nonmonotone line search technique with some modifications.
In this subsection, we establish the global convergence of our method for nonsmooth convex problem (12). To prove the global convergence of Algorithm 1, the following Lemmas are needed.
Lemma 1. Assume that the search direction is generated by Algorithm 1, then for all , we have
Proof. If , then
Let , then from (34) we have
Once more, (34) yields that
Thus, the proof is completed.
Lemma 1 shows that the search direction developed in (34)-(37) leads to the most sufficiently descent direction and it belongs to a trust region.
Lemma 2. Let the step size satisfy (38), then there exist satisfy a
Proof. If satisfies the formula (38), then the proof is completed. Otherwise, there exist such that
Using mean value theorem, we have
Combining the above inequality with (42), we have
Thus, the proof is completed.
Lemma 3. Assume that the sequence is generated by Algorithm 1, then we have
Proof. We prove this lemma by induction. For , by (38) and , we have
Assume the equation holds for , and we need to show for . To show the condition, we have considered two cases.
Then, from (38), we have
let . Then, again from (38),
Thus, by imposing
Theorem 1. Assume that the sequences and are generated by Algorithm 1. Let F is bounded below on the level set and
Proof. Suppose that (43) is not true. Then there exist constants and such that
From Lemma 3, we have
By (40), (41) and (44), we have
Letting , we have
and this contradicts our assumption on F. Hence the theorem is proved.
Theorem 2. Let the conditions in Lemma 1 and Theorem 1 hold, then Algorithm 1 converges for nonsmooth problem (12).
Proof. From Lemma 1 and Theorem 1, we have
Thus, (23) and convergence of sequence yield
Let be an accumulation point of . Then there exists a subsequence satisfying
Thus, (17), (43) and (47) yield . Therefore is an optimal solution of nonsmooth problem (12).
4. Numerical Experiments for Large Scale Nonsmooth Problems
In this section, we present some numerical experiments to examine the efficiency of Algorithm 1 for some large scale nonsmooth academic test problems which are introduced in . The details of these large scale nonsmooth academic test problems with their initial points and the minimum values are listed as follows:
The problems 1 - 5 are convex functions, and the others are nonconvex functions. We test the above problems with the dimension of , , , , , , , , and . For convenience sake, we denote Algorithm 1 by scaled conjugate gradient method based on modified secant equation of BFGS method (SCG-MBFGS), and in order to demonstrate validity of our algorithm, we also list the results of other three algorithms MPRP in , MHS in  and MSBFGS-CG in . All algorithms were implemented in Fortran 90 and run on a PC with an intel(R) Core(TM)i3-3110M CPU at 2.40 GHz, 4.00 GB of RAM, and the Windows 7 operating system. We stopped the iteration when the condition was satisfied. The parameters for SCG-MBFGS were chosen as . All parameters for other three methods are chosen as in   and  respectively. Table 1 shows the numerical results of SCG-MBFGS, MPRP, MHS and MSBFGS-CG on the given test problems. The columns in Table 1 have the following meanings:
Dim: the dimensions of problem.
NI: the total number of iterations.
NF: the number of function evaluations.
TIME: the CPU time in seconds.
: the value of at the final iteration.
From the numerical results in Table 1, it is not difficult to see that
Table 1. Numerical results for 10 problems with given initial points and dimensions.
SCG-MBFGS is superior or competitive to the other three methods in solving the given problems in terms of number of iteration, number of function evaluations and CPU time. Furthermore, to directly illustrate the performances of our method, we employed the tool provided by Dolan and Moré  to analyze and compare the efficiency of the method in terms of the number of iterations, number of function evaluations and CPU time. Figures 1-3 represent the computational performance profiles of the above algorithms regarding the number of iterations, number of function evaluations and CPU time respectively. From the 3 figures, we can observe that for the given test problems, SCG-MBFGS is competitive or superior to other three methods in terms of number of iteration, function evaluations and CPU time respectively.
From Figure 1 and Figure 2, we also notice that SCG-MBFGS performs better than the other methods do in terms of the numbers of iterations and function evaluations. Figure 3 indicates that MHS is comparable to SCG-MBFGS in terms of CPU time, and since the search direction of MHS is developed with only first order information while SCG-MBFGS, MPRP and MSBFGS-CG are with second order information, it is reasonable to need less CPU time for MHS.
Figure 1. Performance profiles of these three methods based on NI.
Figure 2. Performance profiles of these three methods based on NF.
Figure 3. Performance profiles of these three methods based on CPUTIME.
In this paper, we propose a new scaled conjugate gradient method which incorporates a modified secant equation of BFGS method. This modified secant equation contains both function value and gradient information of the objective function, and its Hessian approximation update generates positive definite matrix. Under a modified nonmonotone line search and some mild conditions, the strong global convergence of the proposed method is analyzed for nonsmooth convex problems. The search direction of our new method generates sufficiently descent condition and belongs to a trust region. Compared with existing nonsmooth CG methods, the search direction of our approach is more descent direction. Numerical results and related comparisons show that the proposed method is effective for solving large scale nonsmooth optimization problems.
The authors would like to thank the reviewers and editor for their valuable comments which greatly improve our paper. This work is supported by the National Natural Science Foundation of China [Grant No. 11771003].
 Broyden, C.G. (1970) The Convergence of a Class of Double-Rank Minimization Algorithms: I. General Considerations. Journal of the Institute of Mathematics and Its Applications, 6, 76-90.
 Hestenes, M.R. and Stiefel, E. (1952) Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49, 409-436.
 Dennis, J.E. and Moré, J.J. (1974) A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods. Mathematics of Computation, 28, 549-560.
 Byrd, R., Nocedal, J. and Yuan, Y. (1987) Global Convergence of a Class of Quasi-Newton Methods on Convex Problems. SIAM Journal on Numerical Analysis, 24, 1171-1189.
 Byrd, R. and Nocedal, J. (1989) A Tool for the Analysis of Quasi-Newton Methods with Application to Unconstrained Minimization. SIAM Journal on Numerical Analysis, 26, 727-739.
 Griewank, A. (1991) The Global Convergence of Partitioned BFGS on Problems with Convex Decompositions and Lipschitzian Gradients. Mathematical Programming, 50, 141-175.
 Panier, E.R. and Tits, A.L. (1991) Avoiding Maratos Effect by Means of Nonmonotone Line Search Constrained Problems. SIAM Journal on Numerical Analysis, 28, 1183-1190.
 Yu, Z. and Pu, D. (2008) A New Nonmonotone Line Search Technique for Unconstrained Optimization. Journal of Computational and Applied Mathematics, 219, 134-144.
 Yuan, G., Wei, Z. and Wu, Y. (2010) Modified Limited Memory BFGS Method with Nonmonotone Line Search for Unconstrained Optimization Problems. Journal of the Korean Mathematical Society, 47, 767-788.
 Li, X., Wang, B. and Hu, W. (2017) A Modified Nonmonotone BFGS Algorithm for Unconstrained Optimization. Journal of Inequalities and Applications, 183, 1-18.
 Babaie-Kafaki, S. and Chanbari, R. (2017) A Class of Descent Four-Term Extension of the Dai-Liao Conjugate Gradient Method Based on the Scaled Memoryless BFGS Update. Journal of Industrial & Management Optimization, 13, 649-658.
 Yuan, G., Wei, Z. and Li, G. (2014) A Modified Polak-Ribière-Polyak Conjugate Gradient Algorithm for Nonsmooth Convex Programs. Journal of Computational and Applied Mathematics, 255, 86-96.
 Yuan, G., Wei, Z. and Li, Y. (2015) A Modified Hestenes and Stiefel Conjugate Gradient Algorithm for Large-Scale Nonsmooth Minimizations and Nonlinear Equations. Journal of Optimization Theory and Applications, 168, 129-152.
 Yuan, G. and Wei, Z. (2015) A Modified PRP Conjugate Gradient Algorithm with Nonmonotone Line Search for Nonsmooth Convex Optimization Problems. Journal of Applied Mathematics and Computing, 51, 397-412.
 Yuan, G. and Wei, Z. (2012) The Barzilai and Browein Gradient Method with Nonmonotone Line Search for Nonsmooth Convex Optimization Problems. Mathematical Modelling and Analysis, 17, 203-216.
 Cui, Z., Yuan, G., Sheng, Z., Liu, W., Wang, X. and Duan, X. (2015) A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations. PLoS ONE, 10, e0140606.
 Burke, J.V. and Qian, M. (2000) On the Superlinear Convergence of the Variable Metric Proximal Point Algorithm Using Broyden and BFGS Matrix Secant Updating. Mathematical Programming, 88, 157-181.
 Ou, Y. and Zhou, X. (2018) A Modified Scaled Memoryless BFGS Preconditioned Conjugate Gradient Algorithm for Nonsmooth Convex Optimization. Journal of Industrial & Management Optimization, 14, 785-801.
 Fukushima, M. and Qi, L. (1996) A Global and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization. SIAM Journal on Optimization, 6, 1106-1120.
 Lemaréchal, C. and Sagastizábal, C. (1997) Practical Aspects of the Moreu-Yosida Regularization: Theoretical Preliminaries. SIAM Journal on Optimization, 7, 367-385.
 Rauf, A.I. and Fukushima, M. (2000) A Globally Convergent BFGS Method for Nonsmooth Convex Optimization. Journal of Optimization Theory and Applications, 104, 539-558.
 Li, D.H. and Fukushima, M. (1999) On the Global Convergence of BFGS Method for Nonconvex Unconstrained Optimization Problems. SIAM Journal on Optimization, 11, 1054-1064.
 Haarala, M. and Mäkelä, M.M. (2004) New Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization. Optimization Methods and Software, 19, 673-692.