An Efficient Projected Gradient Method for Convex Constrained Monotone Equations with Applications in Compressive Sensing


1. Introduction

This paper is dedicated to solving the following nonlinear convex constrained monotone equations:

$F\left(x\right)=0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\in \Omega ,$ (1)

where $F:{R}^{n}\to {R}^{n}$ is a continuous nonlinear mapping and the feasible region $\Omega \subset {R}^{n}$ is a nonempty closed convex set, e.g. an n-dimensional box $\Omega =\left\{x\in {R}^{n}:l\le x\le u\right\}$. The mapping F is monotone if

$\langle F\left(x\right)-F\left(y\right),x-y\rangle \ge 0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall x,y\in {R}^{n},$ (2)

where $\langle \cdot ,\cdot \rangle $ denotes the inner product of vectors. Problem (1) arises in many fields, such as economic equilibrium problems [1], chemical equilibrium systems [2] and power flow equations [3]. Based on the work of Solodov and Svaiter [4], Wang et al. [5] proposed a projection-type method for solving problem (1). The method in [5] is globally convergent without any regularity assumptions; however, it requires solving a system of linear equations at each iteration. To avoid solving linear systems and to improve efficiency, several projected conjugate gradient methods [6] [7] [8] [9] have been developed based on the projection technique of Solodov and Svaiter [4]. The numerical results reported in [6] [7] [8] [9] indicate that projected conjugate gradient type methods for problem (1) are efficient and promising. In this paper, by combining the well-known Polak-Ribière-Polyak (PRP) method [10] [11] with the projection technique of Solodov and Svaiter [4], we propose an efficient conjugate gradient projection method for nonlinear monotone equations with convex constraints, and we establish its global convergence under mild conditions. The method has three useful properties: 1) the search direction satisfies the sufficient descent condition; 2) global convergence does not depend on any merit function; and 3) it is derivative-free and is effective for large-scale nonlinear convex constrained monotone equations (tested with dimensions up to 100,000). Furthermore, the method is extended to the ${l}_{1}$-norm regularized problem by reformulating it as a system of nonsmooth monotone equations.

Section 2 presents the modified PRP-type conjugate gradient projection method together with some preliminary properties. Global convergence is established in Section 3. Section 4 reports numerical experiments, including applications of the method to ${l}_{1}$-norm regularized compressive sensing problems. Section 5 concludes the paper.

2. The Proposed Method and Corresponding Algorithm

We first introduce the projection operator ${P}_{\Omega}[\cdot ]$, which maps ${R}^{n}$ onto $\Omega $ and is defined by

${P}_{\Omega}\left[x\right]=\mathrm{arg}\mathrm{min}\left\{\Vert y-x\Vert |y\in \Omega \right\},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall x\in {R}^{n},$

where $\Vert \text{\hspace{0.05em}}\cdot \text{\hspace{0.05em}}\Vert $ denotes the Euclidean norm of vectors, $\Omega $ is a nonempty closed convex subset of ${R}^{n}$.

The projection operator is non-expansive, namely, for any $x,y\in {R}^{n}$, the following condition holds

$\Vert {P}_{\Omega}\left[y\right]-{P}_{\Omega}\left[x\right]\Vert \le \Vert x-y\Vert .$ (3)
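For the box $\Omega =\left\{x:l\le x\le u\right\}$ mentioned in the introduction, ${P}_{\Omega}$ reduces to componentwise clipping, and the nonnegative orthant ${R}_{+}^{n}$ used in Problems 1-3 below is the special case $l=0$, $u=+\infty $. The following NumPy sketch (the name `project_box` is our own, hypothetical choice) implements this projection and checks the non-expansiveness property (3) numerically on random points; it is an illustration only, not code from the paper.

```python
import numpy as np

def project_box(x, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper}.

    For a box the minimization defining P_Omega separates by coordinate,
    so the projection is simply componentwise clipping.
    """
    return np.clip(x, lower, upper)

# Numerical check of non-expansiveness (3) on random pairs of points.
rng = np.random.default_rng(0)
n = 1000
lower, upper = -np.ones(n), np.ones(n)
worst_ratio = 0.0
for _ in range(100):
    x, y = 3.0 * rng.standard_normal(n), 3.0 * rng.standard_normal(n)
    lhs = np.linalg.norm(project_box(y, lower, upper) - project_box(x, lower, upper))
    rhs = np.linalg.norm(x - y)
    worst_ratio = max(worst_ratio, lhs / rhs)

print(f"max ||P[y]-P[x]|| / ||x-y|| over 100 pairs: {worst_ratio:.4f}")
```

Passing `lower=0.0, upper=np.inf` gives the projection onto ${R}_{+}^{n}$.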

Let us briefly review the Polak-Ribière-Polyak (PRP) conjugate gradient method [10] [11]. The PRP method was originally designed for the unconstrained optimization problem

$\mathrm{min}\left\{f\left(x\right)|x\in {R}^{n}\right\},$ (4)

where $f:{R}^{n}\to R$ is continuously differentiable. It generates the iteration sequence $\left\{{x}_{k}\right\}$ in the form

${x}_{k+1}={x}_{k}+{\alpha}_{k}{d}_{k},$ (5)

where ${x}_{k}$ is the current iterate, ${\alpha}_{k}>0$ is the step length, and ${d}_{k}$ is the search direction given by

${d}_{k}=\{\begin{array}{ll}-{g}_{k}+{\beta}_{k-1}^{PRP}{d}_{k-1},\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k>0,\hfill \\ -{g}_{k},\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=0,\hfill \end{array}$ (6)

where ${g}_{k}=\nabla f\left({x}_{k}\right)$, ${\beta}_{k-1}^{PRP}=\frac{{g}_{k}^{\text{T}}{y}_{k-1}}{{\Vert {g}_{k-1}\Vert}^{2}}$ and ${y}_{k-1}={g}_{k}-{g}_{k-1}$.
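The PRP iteration (5)-(6) can be illustrated on a strictly convex quadratic $f\left(x\right)=\frac{1}{2}{x}^{\text{T}}Qx-{b}^{\text{T}}x$, a test function we choose for illustration (it is not from the paper). For a quadratic the exact line search has the closed form ${\alpha}_{k}=-{g}_{k}^{\text{T}}{d}_{k}/\left({d}_{k}^{\text{T}}Q{d}_{k}\right)$, and in exact arithmetic PRP with exact line search coincides with linear conjugate gradients, so it terminates in at most n steps. The function name `prp_cg_quadratic` is hypothetical.

```python
import numpy as np

def prp_cg_quadratic(Q, b, x0, max_iter=None, tol=1e-10):
    """PRP conjugate gradient (5)-(6) with exact line search on 0.5 x^T Q x - b^T x."""
    x = np.asarray(x0, dtype=float).copy()
    g = Q @ x - b                       # gradient g_k = Q x_k - b
    d = -g                              # d_0 = -g_0
    max_iter = max_iter or len(b)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = -(g @ d) / (d @ Q @ d)  # exact minimizer of f along d
        x = x + alpha * d               # iteration (5)
        g_new = Q @ x - b
        y = g_new - g                   # y_{k-1} = g_k - g_{k-1}
        beta = (g_new @ y) / (g @ g)    # beta^{PRP}_{k-1}
        d = -g_new + beta * d           # direction (6)
        g = g_new
    return x, np.linalg.norm(g)

rng = np.random.default_rng(1)
n = 50
M = rng.standard_normal((n, n))
Q = M.T @ M + n * np.eye(n)             # symmetric positive definite
b = rng.standard_normal(n)
x, gnorm = prp_cg_quadratic(Q, b, np.zeros(n))
print(f"final gradient norm: {gnorm:.2e}")
```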

Combining the projection technique of Solodov and Svaiter [4] with the PRP method (5)-(6), we define the following modified PRP search direction:

$\begin{array}{r}\hfill {d}_{k}=\{\begin{array}{ll}-{g}_{k}+\frac{{g}_{k}^{\text{T}}{y}_{k-1}{d}_{k-1}-{d}_{k-1}^{\text{T}}{g}_{k}{y}_{k-1}}{\mathrm{max}\left\{2\gamma \Vert {d}_{k-1}\Vert \Vert {y}_{k-1}\Vert ,{d}_{k-1}^{\text{T}}{y}_{k-1},{\Vert {g}_{k-1}\Vert}^{2}\right\}},\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k>0\hfill \\ -{g}_{k},\hfill & \text{if}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}k=0,\hfill \end{array}\end{array}$ (7)

where ${y}_{k-1}={g}_{k}-{g}_{k-1}$ and $\gamma >0$ is a constant.

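It should be noted that if an exact line search is used, then ${d}_{k-1}^{\text{T}}{g}_{k}=0$ and Equation (7) reduces to ${d}_{k}=-{g}_{k}+{\beta}_{k-1}{d}_{k-1}$ with ${\beta}_{k-1}={g}_{k}^{\text{T}}{y}_{k-1}/\mathrm{max}\left\{2\gamma \Vert {d}_{k-1}\Vert \Vert {y}_{k-1}\Vert ,{d}_{k-1}^{\text{T}}{y}_{k-1},{\Vert {g}_{k-1}\Vert}^{2}\right\}$, which coincides with the PRP formula whenever the maximum equals ${\Vert {g}_{k-1}\Vert}^{2}$. Furthermore, the sufficient descent condition holds automatically for all k, since ${d}_{k}^{\text{T}}g\left({x}_{k}\right)=-{\Vert g\left({x}_{k}\right)\Vert}^{2}$. Conjugate gradient methods based on ideas similar to Equation (7) have been studied in [12]-[19].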
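The two properties claimed for Equation (7), the exact identity ${d}_{k}^{\text{T}}{g}_{k}=-{\Vert {g}_{k}\Vert}^{2}$ and the bound $\Vert {d}_{k}\Vert \le \left(1+1/\gamma \right)\Vert {g}_{k}\Vert $ proved in Lemma 1 below, do not depend on any line search, so they can be checked on arbitrary random data. A minimal sketch, with `mprp_direction` as a hypothetical helper name and $\gamma =0.5$ chosen arbitrarily:

```python
import numpy as np

def mprp_direction(g, g_prev, d_prev, gamma):
    """Modified PRP direction of Equation (7) for k > 0."""
    y = g - g_prev                                     # y_{k-1}
    denom = max(2.0 * gamma * np.linalg.norm(d_prev) * np.linalg.norm(y),
                d_prev @ y,
                g_prev @ g_prev)
    # The two-term correction is orthogonal to g, which yields d^T g = -||g||^2.
    return -g + ((g @ y) * d_prev - (d_prev @ g) * y) / denom

rng = np.random.default_rng(2)
gamma = 0.5
max_descent_err, max_bound_ratio = 0.0, 0.0
for _ in range(1000):
    g, g_prev, d_prev = rng.standard_normal((3, 20))   # random, line-search-free data
    d = mprp_direction(g, g_prev, d_prev, gamma)
    max_descent_err = max(max_descent_err, abs(d @ g + g @ g) / (g @ g))
    max_bound_ratio = max(max_bound_ratio,
                          np.linalg.norm(d) / ((1 + 1 / gamma) * np.linalg.norm(g)))

print(max_descent_err, max_bound_ratio)   # tiny rounding error; ratio at most 1
```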

The modified PRP conjugate gradient projection algorithm for solving problem (1) is stated as follows.

Algorithm 1:

Step 0 Choose an initial point ${x}_{0}\in \Omega $ and constants $\rho \in \left(0,1\right)$, $\gamma >0$, $\sigma >0$, $\xi >0$, $\epsilon \in \left(0,1\right)$, and set ${d}_{0}=-F\left({x}_{0}\right)$. Let $k:=0$.

Step 1 If $\Vert F\left({x}_{k}\right)\Vert \le \epsilon $, stop. Otherwise, compute the search direction ${d}_{k}$ by Equation (7) with ${g}_{k}$ and ${g}_{k-1}$ replaced by ${F}_{k}=F\left({x}_{k}\right)$ and ${F}_{k-1}=F\left({x}_{k-1}\right)$, respectively.

Step 2 Let ${z}_{k}={x}_{k}+{\alpha}_{k}{d}_{k}$, where ${\alpha}_{k}=\mathrm{max}\left\{\xi {\rho}^{i}|i=0,1,\cdots \right\}$ such that

$-\langle F\left({x}_{k}+{\alpha}_{k}{d}_{k}\right),{d}_{k}\rangle \ge \sigma {\alpha}_{k}{\Vert {d}_{k}\Vert}^{2}.$ (8)

Step 3 If $\Vert F\left({z}_{k}\right)\Vert \le \epsilon $, stop and let ${x}_{k+1}={z}_{k}$. Otherwise, compute the next iterate by

${x}_{k+1}={P}_{\Omega}\left[{x}_{k}-{\beta}_{k}F\left({z}_{k}\right)\right],$ (9)

where

${\beta}_{k}=\frac{\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle}{{\Vert F\left({z}_{k}\right)\Vert}^{2}}$ (10)

Step 4 Let $k:=k+1$, and go to Step 1.
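Steps 0-4 can be sketched in NumPy as follows, assuming the caller supplies F and the projection ${P}_{\Omega}$ as Python functions. The name `mprp_solve`, the default $\gamma =1$ (the paper does not report its value of γ) and the lower limit $\alpha >{10}^{-12}$ in the backtracking loop are our own assumptions; the lower limit is a floating-point safeguard that is not part of Step 2.

```python
import numpy as np

def mprp_solve(F, project, x0, gamma=1.0, xi=1.0, rho=0.4, sigma=1e-4,
               eps=1e-6, max_iter=10_000):
    """Sketch of Algorithm 1. F maps R^n -> R^n; project implements P_Omega."""
    x = project(np.asarray(x0, dtype=float))
    Fx = F(x)
    d = -Fx                                   # Step 0: d_0 = -F(x_0)
    F_prev = d_prev = None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= eps:         # Step 1: stopping test
            return x, k
        if k > 0:                             # Step 1: direction (7) with g replaced by F
            y = Fx - F_prev
            denom = max(2.0 * gamma * np.linalg.norm(d_prev) * np.linalg.norm(y),
                        d_prev @ y, F_prev @ F_prev)
            d = -Fx + ((Fx @ y) * d_prev - (d_prev @ Fx) * y) / denom
        # Step 2: alpha = xi * rho^i with the smallest i such that (8) holds
        alpha = xi
        z = x + alpha * d
        Fz = F(z)
        while -(Fz @ d) < sigma * alpha * (d @ d) and alpha > 1e-12:  # safeguard (ours)
            alpha *= rho
            z = x + alpha * d
            Fz = F(z)
        if np.linalg.norm(Fz) <= eps:         # Step 3: accept z_k
            return z, k + 1
        beta = (Fz @ (x - z)) / (Fz @ Fz)     # Equation (10)
        F_prev, d_prev = Fx, d
        x = project(x - beta * Fz)            # Equation (9)
        Fx = F(x)                             # Step 4: k := k + 1
    return x, max_iter

# Usage on Problem 2 (Section 4.1) with Omega = R^n_+
F2 = lambda v: 2.0 * v - np.sin(v)
x, iters = mprp_solve(F2, lambda v: np.maximum(v, 0.0), np.ones(1000))
print(iters, np.linalg.norm(F2(x)))
```

Note that, as in Step 3, the returned point may be ${z}_{k}$, which is not guaranteed to lie in $\Omega $.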

Remark 1: In Algorithm 1, the step size ${\alpha}_{k}$ determined by the line search (8) ensures that

$\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle >0,$

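where ${z}_{k}={x}_{k}+{\alpha}_{k}{d}_{k}$ and ${d}_{k}$ is the search direction; indeed, $\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle =-{\alpha}_{k}\langle F\left({z}_{k}\right),{d}_{k}\rangle \ge \sigma {\alpha}_{k}^{2}{\Vert {d}_{k}\Vert}^{2}>0$. Moreover, the monotonicity of F implies that for any ${x}^{*}$ such that $F\left({x}^{*}\right)=0$,

$\langle F\left({z}_{k}\right),{x}^{*}-{z}_{k}\rangle \le 0.$

This means that the hyperplane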

${H}_{k}=\left\{x\in {R}^{n}|\langle F\left({z}_{k}\right),x-{z}_{k}\rangle =0\right\}$

strictly separates the current point ${x}_{k}$ from the solution set of problem (1). Together with Step 3, these facts show that ${x}_{k}-{\beta}_{k}F\left({z}_{k}\right)$ is the projection of ${x}_{k}$ onto the hyperplane ${H}_{k}$, and that the next iterate ${x}_{k+1}$ is obtained by projecting this point onto the feasible set $\Omega $.
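The separation argument of Remark 1 can be checked on one step. The sketch below is our own illustration: it uses the Problem 2 mapping $F\left(x\right)=2x-\mathrm{sin}\left(x\right)$ (unique zero ${x}^{*}=0$), the first direction ${d}_{0}=-F\left({x}_{0}\right)$ and the parameters $\rho =0.4$, $\sigma ={10}^{-4}$. It verifies that ${x}_{k}$ and ${x}^{*}$ lie on opposite sides of ${H}_{k}$, that $w={x}_{k}-{\beta}_{k}F\left({z}_{k}\right)$ lies on ${H}_{k}$, and that w is closer to ${x}^{*}$ than ${x}_{k}$.

```python
import numpy as np

def F(x):
    # Problem 2 mapping: monotone, with unique zero x* = 0
    return 2.0 * x - np.sin(x)

rng = np.random.default_rng(3)
x = rng.random(500)                 # current iterate x_k in R^n_+
d = -F(x)                           # first-iteration direction d_0 = -F(x_0)
sigma, rho, alpha = 1e-4, 0.4, 1.0
while -(F(x + alpha * d) @ d) < sigma * alpha * (d @ d):
    alpha *= rho                    # backtracking until (8) holds
z = x + alpha * d
Fz = F(z)
beta = (Fz @ (x - z)) / (Fz @ Fz)   # Equation (10)
w = x - beta * Fz                   # projection of x_k onto the hyperplane H_k
x_star = np.zeros_like(x)

print(Fz @ (x - z) > 0)             # x_k is strictly on the positive side of H_k
print(Fz @ (x_star - z) <= 0)       # the solution is on the other side
print(abs(Fz @ (w - z)))            # approximately 0: w lies on H_k
```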

3. Convergence Analysis

In this section, we analyze the convergence of the proposed method. We first state some basic assumptions on problem (1).

Assumption 1: The mapping F is Lipschitz continuous on ${R}^{n}$ with constant $L>0$, that is, for all $x,y\in {R}^{n}$,

$\Vert F\left(x\right)-F\left(y\right)\Vert \le L\Vert x-y\Vert .$ (11)

Assumption 2: The solution set of problem (1), denoted by S, is nonempty and convex.

For conjugate gradient methods, the sufficient descent property is essential in the convergence analysis. The following lemma shows that the search directions $\left\{{d}_{k}\right\}$ generated by Algorithm 1 satisfy the sufficient descent condition independently of the line search.

Lemma 1: Let the sequences $\left\{{x}_{k}\right\}$ and $\left\{{d}_{k}\right\}$ be generated by Algorithm 1. Then, for all $k\ge 0$,

$F{\left({x}_{k}\right)}^{\text{T}}{d}_{k}=-{\Vert F\left({x}_{k}\right)\Vert}^{2},$ (12)

and

$\Vert {d}_{k}\Vert \le \left(1+\frac{1}{\gamma}\right)\Vert F\left({x}_{k}\right)\Vert .$ (13)

Proof: For $k=0$, Equations (12) and (13) follow directly from ${d}_{0}=-F\left({x}_{0}\right)$. For $k\ge 0$, the definition (7) of the search direction ${d}_{k+1}$ gives

${d}_{k+1}^{\text{T}}{F}_{k+1}=-{\Vert {F}_{k+1}\Vert}^{2}+{\left[\frac{{F}_{k+1}^{\text{T}}{y}_{k}{d}_{k}-{d}_{k}^{\text{T}}{F}_{k+1}{y}_{k}}{\mathrm{max}\left\{2\gamma \Vert {d}_{k}\Vert \Vert {y}_{k}\Vert ,{d}_{k}^{\text{T}}{y}_{k},{\Vert {F}_{k}\Vert}^{2}\right\}}\right]}^{\text{T}}{F}_{k+1}=-{\Vert {F}_{k+1}\Vert}^{2},$

similarly,

$\begin{array}{c}\Vert {d}_{k+1}\Vert =\Vert -{F}_{k+1}+\frac{{F}_{k+1}^{\text{T}}{y}_{k}{d}_{k}-{d}_{k}^{\text{T}}{F}_{k+1}{y}_{k}}{\mathrm{max}\left\{2\gamma \Vert {d}_{k}\Vert \Vert {y}_{k}\Vert ,{d}_{k}^{\text{T}}{y}_{k},{\Vert {F}_{k}\Vert}^{2}\right\}}\Vert \\ \le \Vert {F}_{k+1}\Vert +\frac{\Vert {F}_{k+1}\Vert \Vert {y}_{k}\Vert \Vert {d}_{k}\Vert +\Vert {d}_{k}\Vert \Vert {F}_{k+1}\Vert \Vert {y}_{k}\Vert}{\mathrm{max}\left\{2\gamma \Vert {d}_{k}\Vert \Vert {y}_{k}\Vert ,{d}_{k}^{\text{T}}{y}_{k},{\Vert {F}_{k}\Vert}^{2}\right\}}\\ \le \left(1+\frac{1}{\gamma}\right)\Vert {F}_{k+1}\Vert ,\end{array}$

where the last inequality follows from the fact

$\mathrm{max}\left\{2\gamma \Vert {d}_{k}\Vert \Vert {y}_{k}\Vert ,{d}_{k}^{\text{T}}{y}_{k},{\Vert {F}_{k}\Vert}^{2}\right\}\ge 2\gamma \Vert {d}_{k}\Vert \Vert {y}_{k}\Vert .$

In the remainder of this paper, we assume that ${F}_{k}\ne 0$ for all $k\ge 0$; otherwise, a solution of problem (1) has already been found.

Lemma 2: Let the sequences $\left\{{x}_{k}\right\}$ and $\left\{{z}_{k}\right\}$ be generated by Algorithm 1, and suppose that Assumption 1 holds. Then for every $k\ge 0$ there exists a step size ${\alpha}_{k}>0$ satisfying Equation (8).

Proof: By the construction of the line search, if ${\alpha}_{k}\ne \xi $, then ${{\alpha}^{\prime}}_{k}={\rho}^{-1}{\alpha}_{k}$ does not satisfy Equation (8), namely,

$-\langle F\left({{z}^{\prime}}_{k}\right),{d}_{k}\rangle <\sigma {{\alpha}^{\prime}}_{k}{\Vert {d}_{k}\Vert}^{2},$

where ${{z}^{\prime}}_{k}={x}_{k}+{{\alpha}^{\prime}}_{k}{d}_{k}$. From Equation (12) and Assumption 1 we have

$\begin{array}{c}{\Vert {F}_{k}\Vert}^{2}=-\langle {F}_{k},{d}_{k}\rangle =\langle F\left({{z}^{\prime}}_{k}\right)-F\left({x}_{k}\right),{d}_{k}\rangle -\langle F\left({{z}^{\prime}}_{k}\right),{d}_{k}\rangle \\ \le L{{\alpha}^{\prime}}_{k}{\Vert {d}_{k}\Vert}^{2}+\sigma {{\alpha}^{\prime}}_{k}{\Vert {d}_{k}\Vert}^{2}\le {\rho}^{-1}{\alpha}_{k}\left(L+\sigma \right){\Vert {d}_{k}\Vert}^{2}\end{array}$

which means that

${\alpha}_{k}\ge \mathrm{min}\left\{\xi ,\frac{\rho}{L+\sigma}\frac{{\Vert {F}_{k}\Vert}^{2}}{{\Vert {d}_{k}\Vert}^{2}}\right\}.$ (14)

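The backtracking procedure in Step 2 terminates after finitely many trials: if (8) failed for every trial $\alpha =\xi {\rho}^{i}$, letting $i\to \infty $ and using the continuity of F would give ${\Vert {F}_{k}\Vert}^{2}=-\langle {F}_{k},{d}_{k}\rangle \le 0$, contradicting ${F}_{k}\ne 0$. Inequality (14) further provides a lower bound on the accepted step size.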
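The lower bound (14) can be checked numerically for a mapping whose Lipschitz constant is known. In the sketch below (our own illustration) we use the Problem 2 mapping $F\left(x\right)=2x-\mathrm{sin}\left(x\right)$, whose componentwise derivative $2-\mathrm{cos}\left(t\right)$ lies in $\left[1,3\right]$, so $L=3$. Directions are built from Equation (7) with random previous data, and $\gamma =1$ is an arbitrary choice.

```python
import numpy as np

L = 3.0   # Lipschitz constant of F below, since 1 <= 2 - cos(t) <= 3

def F(x):
    return 2.0 * x - np.sin(x)      # Problem 2 mapping

def backtrack(x, d, xi, rho, sigma):
    """Step 2 of Algorithm 1: first alpha in {xi, xi*rho, xi*rho^2, ...} satisfying (8)."""
    alpha = xi
    while -(F(x + alpha * d) @ d) < sigma * alpha * (d @ d):
        alpha *= rho
    return alpha

rng = np.random.default_rng(4)
xi, rho, sigma, gamma = 1.0, 0.4, 1e-4, 1.0
violations = 0
for _ in range(200):
    x, x_prev, d_prev = 5.0 * rng.standard_normal((3, 100))
    Fx, F_prev = F(x), F(x_prev)
    y = Fx - F_prev
    denom = max(2 * gamma * np.linalg.norm(d_prev) * np.linalg.norm(y),
                d_prev @ y, F_prev @ F_prev)
    d = -Fx + ((Fx @ y) * d_prev - (d_prev @ Fx) * y) / denom     # direction (7)
    alpha = backtrack(x, d, xi, rho, sigma)
    bound = min(xi, rho / (L + sigma) * (Fx @ Fx) / (d @ d))     # right-hand side of (14)
    violations += int(alpha < bound)

print(violations)   # number of trials in which (14) failed
```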

Lemma 3: Let the sequences $\left\{{x}_{k}\right\}$ and $\left\{{z}_{k}\right\}$ be generated by Algorithm 1, and suppose that Assumptions 1 and 2 hold. Then both $\left\{{x}_{k}\right\}$ and $\left\{{z}_{k}\right\}$ are bounded. Moreover, we have

$\underset{k\to \infty}{\mathrm{lim}}\Vert {x}_{k}-{z}_{k}\Vert =0,$ (15)

and

$\underset{k\to \infty}{\mathrm{lim}}\Vert {x}_{k+1}-{x}_{k}\Vert =0.$ (16)

In particular, Equation (15) implies that

$\underset{k\to \infty}{\mathrm{lim}}{\alpha}_{k}\Vert {d}_{k}\Vert =0.$ (17)

Proof: Let ${x}^{*}\in S$ be an arbitrary solution of problem (1). The monotonicity of F and the line search (8) imply

$\langle F\left({z}_{k}\right),{x}_{k}-{x}^{*}\rangle \ge \langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle \ge \sigma {\alpha}_{k}^{2}{\Vert {d}_{k}\Vert}^{2}\ge 0.$ (18)

Equation (3), Equation (9) and Equation (18) imply

$\begin{array}{c}{\Vert {x}_{k+1}-{x}^{*}\Vert}^{2}={\Vert {P}_{\Omega}\left[{x}_{k}-{\beta}_{k}F\left({z}_{k}\right)\right]-{x}^{*}\Vert}^{2}\le {\Vert {x}_{k}-{\beta}_{k}F\left({z}_{k}\right)-{x}^{*}\Vert}^{2}\\ ={\Vert {x}_{k}-{x}^{*}\Vert}^{2}-2{\beta}_{k}\langle F\left({z}_{k}\right),{x}_{k}-{x}^{*}\rangle +{\beta}_{k}^{2}{\Vert F\left({z}_{k}\right)\Vert}^{2}\\ \le {\Vert {x}_{k}-{x}^{*}\Vert}^{2}-2{\beta}_{k}\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle +{\beta}_{k}^{2}{\Vert F\left({z}_{k}\right)\Vert}^{2}\\ \le {\Vert {x}_{k}-{x}^{*}\Vert}^{2}-\frac{{\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle}^{2}}{{\Vert F\left({z}_{k}\right)\Vert}^{2}}\\ \le {\Vert {x}_{k}-{x}^{*}\Vert}^{2}-\frac{{\sigma}^{2}{\Vert {x}_{k}-{z}_{k}\Vert}^{4}}{{\Vert F\left({z}_{k}\right)\Vert}^{2}}.\end{array}$ (19)

Equation (19) shows that the sequence $\left\{\Vert {x}_{k}-{x}^{*}\Vert \right\}$ is nonincreasing and hence convergent; in particular, $\Vert {x}_{k}-{x}^{*}\Vert \le \Vert {x}_{0}-{x}^{*}\Vert $ for all k, so the sequence $\left\{{x}_{k}\right\}$ is bounded. Then, by Assumption 1, we have

$\Vert F\left({x}_{k}\right)\Vert =\Vert F\left({x}_{k}\right)-F\left({x}^{*}\right)\Vert \le L\Vert {x}_{k}-{x}^{*}\Vert \le L\Vert {x}_{0}-{x}^{*}\Vert .$ (20)

Setting ${M}_{1}=L\Vert {x}_{0}-{x}^{*}\Vert $, we obtain

$\Vert F\left({x}_{k}\right)\Vert \le {M}_{1},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ge 0.$ (21)

From the Cauchy-Schwarz inequality, the line search Equation (8), the monotonicity of F and Equation (18), it follows that

$0<\sigma {\Vert {x}_{k}-{z}_{k}\Vert}^{2}\le \langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle \le \langle F\left({x}_{k}\right),{x}_{k}-{z}_{k}\rangle \le \Vert F\left({x}_{k}\right)\Vert \Vert {x}_{k}-{z}_{k}\Vert ,$

and dividing by $\Vert {x}_{k}-{z}_{k}\Vert >0$ gives

$\sigma \Vert {x}_{k}-{z}_{k}\Vert \le \Vert F\left({x}_{k}\right)\Vert \le {M}_{1},$ (22)

which, together with the boundedness of $\left\{{x}_{k}\right\}$, shows that the sequence $\left\{{z}_{k}\right\}$ is bounded. Hence the sequence $\left\{\Vert {z}_{k}-{x}^{*}\Vert \right\}$ is also bounded, that is, there exists ${M}_{2}>0$ such that

$\Vert {z}_{k}-{x}^{*}\Vert \le {M}_{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ge 0.$ (23)

Based on Equation (23) and Assumption 1 it follows

$\Vert F\left({z}_{k}\right)\Vert =\Vert F\left({z}_{k}\right)-F\left({x}^{*}\right)\Vert \le L\Vert {z}_{k}-{x}^{*}\Vert \le L{M}_{2}.$ (24)

Substituting Equation (24) into Equation (19) and summing over k, we obtain

$\frac{{\sigma}^{2}}{{\left(L{M}_{2}\right)}^{2}}{\displaystyle {\sum}_{k=0}^{\infty}{\Vert {x}_{k}-{z}_{k}\Vert}^{4}}\le {\displaystyle {\sum}_{k=0}^{\infty}\left({\Vert {x}_{k}-{x}^{*}\Vert}^{2}-{\Vert {x}_{k+1}-{x}^{*}\Vert}^{2}\right)}<\infty ,$ (25)

which implies

$\underset{k\to \infty}{\mathrm{lim}}\Vert {x}_{k}-{z}_{k}\Vert =0.$

From the definition of ${z}_{k}$ and Equation (15), it holds that

$\underset{k\to \infty}{\mathrm{lim}}{\alpha}_{k}\Vert {d}_{k}\Vert =0.$

Combining the definition of ${\beta}_{k}$, Equation (3), and the Cauchy-Schwarz inequality, we have

$\begin{array}{c}\Vert {x}_{k+1}-{x}_{k}\Vert =\Vert {P}_{\Omega}\left[{x}_{k}-{\beta}_{k}F\left({z}_{k}\right)\right]-{x}_{k}\Vert \\ \le \Vert {x}_{k}-{\beta}_{k}F\left({z}_{k}\right)-{x}_{k}\Vert \\ =\frac{\langle F\left({z}_{k}\right),{x}_{k}-{z}_{k}\rangle}{\Vert F\left({z}_{k}\right)\Vert}\\ \le \Vert {x}_{k}-{z}_{k}\Vert \end{array}$

which together with Equation (15), proves Equation (16).
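The conclusions of Lemma 3 can be observed on a small run. The sketch below is our own illustration: it applies Algorithm 1 to Problem 1, $F\left(x\right)={\text{e}}^{x}-1$ on ${R}_{+}^{n}$, whose zero is ${x}^{*}=0$, and records $\Vert {x}_{k}-{x}^{*}\Vert $ (nonincreasing by (19)) and $\Vert {x}_{k}-{z}_{k}\Vert $ (tending to zero by (15)). The function name `lemma3_trace`, the iteration count and $\gamma =1$ are assumptions made for the example.

```python
import numpy as np

def F(x):
    # Problem 1 mapping; its unique zero x* = 0 lies in Omega = R^n_+
    return np.exp(x) - 1.0

def lemma3_trace(x0, iters=60, gamma=1.0, xi=1.0, rho=0.4, sigma=1e-4):
    """Run Algorithm 1 and record ||x_k - x*|| and ||x_k - z_k||, with x* = 0."""
    x = np.maximum(x0, 0.0)
    Fx = F(x)
    d, F_prev, d_prev = -Fx, None, None
    dist, gap = [np.linalg.norm(x)], []
    for k in range(iters):
        if k > 0:                                                  # direction (7)
            y = Fx - F_prev
            denom = max(2 * gamma * np.linalg.norm(d_prev) * np.linalg.norm(y),
                        d_prev @ y, F_prev @ F_prev)
            d = -Fx + ((Fx @ y) * d_prev - (d_prev @ Fx) * y) / denom
        alpha = xi
        while -(F(x + alpha * d) @ d) < sigma * alpha * (d @ d):   # line search (8)
            alpha *= rho
        z = x + alpha * d
        Fz = F(z)
        gap.append(np.linalg.norm(x - z))
        if not Fz.any():                                           # F(z_k) = 0: solved
            break
        F_prev, d_prev = Fx, d
        x = np.maximum(x - (Fz @ (x - z)) / (Fz @ Fz) * Fz, 0.0)   # (9)-(10), P onto R^n_+
        Fx = F(x)
        dist.append(np.linalg.norm(x))
    return np.array(dist), np.array(gap)

dist, gap = lemma3_trace(np.random.default_rng(5).random(200))
print(np.all(np.diff(dist) <= 1e-12), gap[-1] / gap[0])
```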

Theorem 1: Let the sequences $\left\{{x}_{k}\right\}$ and $\left\{{z}_{k}\right\}$ be generated by Algorithm 1, and suppose that Assumptions 1 and 2 hold. Then

$\underset{k\to \infty}{\mathrm{lim}}\mathrm{inf}\Vert {F}_{k}\Vert =0.$ (26)

Proof: We prove this Theorem by contradiction. Assume that Equation (26) does not hold, namely, there exists $\epsilon >0$ such that

$\Vert {F}_{k}\Vert \ge \epsilon ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ge 0.$ (27)

From Equation (12) and Equation (27),

$\begin{array}{c}{\Vert {d}_{k}\Vert}^{2}={\Vert {d}_{k}+{F}_{k}-{F}_{k}\Vert}^{2}\\ ={\Vert {d}_{k}+{F}_{k}\Vert}^{2}-2\langle {d}_{k}+{F}_{k},{F}_{k}\rangle +{\Vert {F}_{k}\Vert}^{2}\\ \ge -2\langle {d}_{k},{F}_{k}\rangle -{\Vert {F}_{k}\Vert}^{2}\\ ={\Vert {F}_{k}\Vert}^{2},\end{array}$

which implies

$\Vert {d}_{k}\Vert \ge \epsilon ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ge 0.$ (28)

On the other hand, Equation (13), Equation (21) and the definition of ${d}_{k}$ imply

$\Vert {d}_{k}\Vert \le \left(1+\frac{1}{\gamma}\right)\Vert {F}_{k}\Vert \le \left(1+\frac{1}{\gamma}\right){M}_{1},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\forall k\ge 0.$

Finally, from Equation (14), Equation (27) and Equation (28),

$\begin{array}{c}{\alpha}_{k}\Vert {d}_{k}\Vert \ge \text{min}\left\{\xi ,\frac{\rho}{L+\sigma}\frac{{\Vert {F}_{k}\Vert}^{2}}{{\Vert {d}_{k}\Vert}^{2}}\right\}\Vert {d}_{k}\Vert \\ \ge \text{min}\left\{\xi \epsilon ,\frac{\rho {\epsilon}^{2}}{\left(L+\sigma \right)\left(1+{\gamma}^{-1}\right){M}_{1}}\right\}\end{array}$

which contradicts Equation (17). Thus, Equation (26) holds.

4. Numerical Experiments

This section studies the numerical performance of Algorithm 1 on large-scale nonlinear convex constrained monotone equations with various dimensions and initial points. Algorithm 1 is then extended to ${l}_{1}$-norm regularized problems that arise in decoding sparse signals in compressive sensing. The algorithm is coded in MATLAB R2015a and run on a PC with a Core i5 CPU and 4 GB of memory.

4.1. Experiments on Nonlinear Convex Constrained Monotone Equations

The test problems are as follows.

Problem 1. (Wang et al. [5]) The elements of $F\left(x\right)$ are given by

${F}_{i}\left(x\right)={\text{e}}^{{x}_{i}}-1,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,2,3,\cdots ,n.$

and $\Omega ={R}_{+}^{n}$.

Problem 2. The example is taken from [7]. The elements of $F\left(x\right)$ are given by

${F}_{i}\left(x\right)=2{x}_{i}-\mathrm{sin}\left({x}_{i}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,2,3,\cdots ,n.$

and $\Omega ={R}_{+}^{n}$.

Problem 3. The example is taken from [9].

$\begin{array}{l}{F}_{1}\left(x\right)={x}_{1}-{\text{e}}^{\mathrm{cos}\left(\frac{{x}_{1}+{x}_{2}}{n+1}\right)},\\ {F}_{i}\left(x\right)={x}_{i}-{\text{e}}^{\mathrm{cos}\left(\frac{{x}_{i-1}+{x}_{i}+{x}_{i+1}}{n+1}\right)},\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=2,3,\cdots ,n-1,\\ {F}_{n}\left(x\right)={x}_{n}-{\text{e}}^{\mathrm{cos}\left(\frac{{x}_{n-1}+{x}_{n}}{n+1}\right)}.\end{array}$

and $\Omega ={R}_{+}^{n}$.

Problem 4. The example is taken from [20].

${F}_{i}\left(x\right)={x}_{i}-\mathrm{sin}\left(\left|{x}_{i}-1\right|\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,2,3,\cdots ,n.$

and $\Omega =\left\{x\in {R}^{n}|{\displaystyle {\sum}_{i=1}^{n}{x}_{i}}\le n,{x}_{i}\ge -1,i=1,2,\cdots ,n\right\}$.
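Problems 1-4 and their feasible sets can be written in vectorized NumPy as below. All function names are hypothetical. The paper does not say how the projection onto the feasible set of Problem 4 is computed; the sketch uses a standard choice of ours: by the KKT conditions the projection has the form $\mathrm{max}\left(x-\mu ,-1\right)$ for some multiplier $\mu \ge 0$ of the sum constraint, and μ is located by bisection.

```python
import numpy as np

def F1(x):                       # Problem 1
    return np.exp(x) - 1.0

def F2(x):                       # Problem 2
    return 2.0 * x - np.sin(x)

def F3(x):                       # Problem 3 (tridiagonal coupling)
    s = np.empty_like(x)
    s[0] = x[0] + x[1]
    s[1:-1] = x[:-2] + x[1:-1] + x[2:]
    s[-1] = x[-2] + x[-1]
    return x - np.exp(np.cos(s / (len(x) + 1)))

def F4(x):                       # Problem 4
    return x - np.sin(np.abs(x - 1.0))

def proj_nonneg(x):
    """Projection onto R^n_+ (Problems 1-3)."""
    return np.maximum(x, 0.0)

def proj_problem4(x):
    """Projection onto {x : sum(x) <= n, x >= -1} (Problem 4).

    The projection is max(x - mu, -1) with mu >= 0; mu is found by bisection
    on the nonincreasing function mu -> sum(max(x - mu, -1)) - n.
    """
    n = x.size
    p = np.maximum(x, -1.0)
    if p.sum() <= n:
        return p                          # sum constraint inactive: mu = 0
    lo, hi = 0.0, float(np.max(x)) + 1.0  # at mu = hi every entry equals -1
    for _ in range(100):                  # bisection on mu
        mu = 0.5 * (lo + hi)
        if np.maximum(x - mu, -1.0).sum() > n:
            lo = mu
        else:
            hi = mu
    return np.maximum(x - hi, -1.0)       # hi always yields a feasible point
```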

For convenience, we refer to Algorithm 1 as MPRP. We compare MPRP with the CGD method [8] on Problems 1-4. For both methods we set $\xi =1$, $\rho =0.4$, $\sigma ={10}^{-4}$. To evaluate the efficiency and robustness of both methods, we test Problems 1-4 with dimensions $n=10000,50000,100000$ and the initial points ${x}_{1}={\left(1,\frac{1}{2},\cdots ,\frac{1}{n}\right)}^{\text{T}}$, ${x}_{2}=\frac{1}{n}\text{ones}\left(n,1\right)$, ${x}_{3}=\text{ones}\left(n,1\right)$, ${x}_{4}=2\text{ones}\left(n,1\right)$, ${x}_{5}=\text{rand}\left(n,1\right)$, where, in MATLAB, $\text{ones}\left(n,1\right)$ returns an n-by-1 array of ones and $\text{rand}\left(n,1\right)$ returns an n-by-1 array of uniformly distributed random values.
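For readers reproducing the setup outside MATLAB, the five initial points can be built as follows. This is a sketch under the assumption that NumPy's uniform generator on $\left[0,1\right)$ is an acceptable stand-in for MATLAB's `rand(n,1)`; the exact random values will of course differ, and `initial_points` is a hypothetical name.

```python
import numpy as np

def initial_points(n, seed=0):
    """Initial points x_1, ..., x_5 of Section 4.1 as NumPy arrays of length n."""
    rng = np.random.default_rng(seed)
    return {
        "x1": 1.0 / np.arange(1, n + 1),   # (1, 1/2, ..., 1/n)^T
        "x2": np.ones(n) / n,              # ones(n,1)/n
        "x3": np.ones(n),                  # ones(n,1)
        "x4": 2.0 * np.ones(n),            # 2*ones(n,1)
        "x5": rng.random(n),               # analogue of rand(n,1): uniform on [0, 1)
    }

pts = initial_points(10000)
print({name: v[:3] for name, v in pts.items()})
```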

Numerical results are shown in Tables 1-4, in which Init (Dim), NI and NF denote the initial point (dimension), the number of iterations and the number of function evaluations, respectively; $\Vert F\left(x\right)\Vert $ is the Euclidean norm of F at the final iterate, and CPU-time is the computing time in seconds.

Tables 1-4 indicate that the dimension of the problem has little effect on the number of iterations, although the computing time grows with the dimension. Moreover, Algorithm 1 solves all test instances with fewer iterations and less CPU time than the CGD algorithm, which shows that the proposed method is competitive and efficient on these problems.

The performance of both methods is also evaluated with the performance profiles of Dolan and Moré [21]. Figure 1 shows that the proposed MPRP method is more efficient and robust than the CGD method on this test set.
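A Dolan-Moré performance profile is computed from a cost matrix ${t}_{p,s}$ (for instance NF or CPU time of solver s on problem p): with ratios ${r}_{p,s}={t}_{p,s}/{\mathrm{min}}_{s}{t}_{p,s}$, the profile ${\rho}_{s}\left(\tau \right)$ is the fraction of problems with ${r}_{p,s}\le \tau $. The sketch below (function name hypothetical) marks failures with `np.inf`; the toy cost matrix in the usage lines is invented for illustration and is not the data behind Figure 1.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.

    T[p, s] is the cost of solver s on problem p (np.inf marks a failure).
    Returns rho with rho[s, j] = fraction of problems on which solver s
    is within a factor taus[j] of the best solver.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)    # best cost on each problem
    ratios = T / best                      # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Toy costs for two solvers on three problems (illustrative numbers only)
T = np.array([[10.0, 20.0], [30.0, 15.0], [5.0, np.inf]])
profile = performance_profile(T, [1.0, 2.0, 4.0])
print(profile)
```

Plotting each row of the result against τ (usually on a log scale) gives curves like those in Figure 1.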

Table 1. Numerical results for MPRP/CGD on problem 1.

Table 2. Numerical results for MPRP/CGD on problem 2.

Table 3. Numerical results for MPRP/CGD on problem 3.

Table 4. Numerical results for MPRP/CGD on problem 4.

Figure 1. Performance profiles of the two methods MPRP and CGD: the left panel is based on the number of function evaluations and the right panel on the CPU time.

4.2. Experiments on the ${l}_{1}$-Norm Regularization Problem

Signal reconstruction often leads to problems whose cost function combines the ${l}_{2}$ and ${l}_{1}$ norms, i.e.,

$\mathrm{min}\frac{1}{2}{\Vert y-Ax\Vert}_{2}^{2}+\lambda {\Vert x\Vert}_{1},$ (29)

where ${\Vert \cdot \Vert}_{2}$ is the Euclidean norm,

${\Vert x\Vert}_{1}={\sum}_{j=1}^{n}\left|{x}_{j}\right|$

is the ${l}_{1}$ norm, A is the system matrix, $y\in {R}^{m}$ is the observed data, $x\in {R}^{n}$ is the signal to be reconstructed, and $\lambda >0$ is a regularization parameter.

Optimization problems of the form (29) appear in many signal reconstruction problems, such as sparse signal deblurring [22], medical image reconstruction [23], compressed sensing [24] and super-resolution [25]. Iterative line search methods and fixed-point iteration schemes are commonly used to solve problem (29). Using the technique of Figueiredo et al. [26], problem (29) can be reformulated as a convex quadratic program. Let $x=u-v$ with $u,v\in {R}^{n}$, $u\ge 0$, $v\ge 0$, where ${u}_{i}=\mathrm{max}\left(0,{x}_{i}\right)$ and ${v}_{i}=-\mathrm{min}\left(0,{x}_{i}\right)$ for all $i=1,\cdots ,n$. Then ${\Vert x\Vert}_{1}={e}_{n}^{\text{T}}u+{e}_{n}^{\text{T}}v$, where ${e}_{n}={\left(1,1,\cdots ,1\right)}^{\text{T}}\in {R}^{n}$, and problem (29) becomes the bound-constrained quadratic program

$\underset{u,v}{\mathrm{min}}\frac{1}{2}{\Vert y-A\left(u-v\right)\Vert}_{2}^{2}+\lambda {e}_{n}^{\text{T}}u+\lambda {e}_{n}^{\text{T}}v,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{s.t.}\text{\hspace{0.17em}}\text{\hspace{0.17em}}u\ge 0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}v\ge 0.$ (30)

Furthermore, problem (30) can be rewritten as the standard convex quadratic program

$\underset{z}{\mathrm{min}}\frac{1}{2}{z}^{\text{T}}Bz+{c}^{\text{T}}z,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{s.t.}\text{\hspace{0.17em}}\text{\hspace{0.17em}}z\ge 0,$ (31)

where

$z=\left(\begin{array}{c}u\\ v\end{array}\right)$, $b={A}^{\text{T}}y$, $c=\lambda {e}_{2n}+\left(\begin{array}{c}-b\\ b\end{array}\right)$, $B=\left(\begin{array}{cc}{A}^{\text{T}}A& -{A}^{\text{T}}A\\ -{A}^{\text{T}}A& {A}^{\text{T}}A\end{array}\right)$.

This follows from the expansion $\frac{1}{2}{\Vert y-A\left(u-v\right)\Vert}_{2}^{2}=\frac{1}{2}{\Vert y\Vert}_{2}^{2}-{b}^{\text{T}}\left(u-v\right)+\frac{1}{2}{\left(u-v\right)}^{\text{T}}{A}^{\text{T}}A\left(u-v\right)$ after dropping the constant term. The matrix B is positive semidefinite. Xiao et al. [8] [27] reformulated problem (31) as a linear variational inequality (LVI) problem and pointed out that this LVI problem is equivalent to a linear complementarity problem; moreover, z solves the linear complementarity problem if and only if it solves the nonlinear monotone equations

$F\left(z\right)=\mathrm{min}\left\{z,Bz+c\right\}=0,$ (32)

where the minimum is taken componentwise and F is Lipschitz continuous. Hence problem (29) can be solved by the MPRP projection method.
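The reformulation can be sketched in NumPy without forming the $2n\times 2n$ matrix B, since $Bz=\left({A}^{\text{T}}A\left(u-v\right);-{A}^{\text{T}}A\left(u-v\right)\right)$. The sketch assumes $c=\lambda {e}_{2n}+\left(-b;b\right)$ with $b={A}^{\text{T}}y$, which follows from expanding $\frac{1}{2}{\Vert y-A\left(u-v\right)\Vert}_{2}^{2}$ as in Figueiredo et al. [26]. The helper names `split` and `make_qp` are hypothetical. The checks confirm that the quadratic objective plus $\frac{1}{2}{\Vert y\Vert}^{2}$ reproduces the ${l}_{1}$-regularized objective at $z=\left(u;v\right)$, and that $F\left(0\right)=\mathrm{min}\left\{0,c\right\}$ vanishes exactly when $\lambda \ge {\Vert {A}^{\text{T}}y\Vert}_{\infty}$.

```python
import numpy as np

def split(x):
    """Return z = (u, v) with x = u - v, u = max(x, 0), v = -min(x, 0)."""
    return np.concatenate([np.maximum(x, 0.0), -np.minimum(x, 0.0)])

def make_qp(A, y, lam):
    """Return c, a matrix-free product z -> Bz, and F(z) = min(z, Bz + c)."""
    n = A.shape[1]
    b = A.T @ y
    c = lam * np.ones(2 * n) + np.concatenate([-b, b])
    def apply_B(z):
        w = A.T @ (A @ (z[:n] - z[n:]))      # A^T A (u - v)
        return np.concatenate([w, -w])
    def F(z):
        return np.minimum(z, apply_B(z) + c)  # componentwise minimum
    return c, apply_B, F

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
x = rng.standard_normal(50) * (rng.random(50) < 0.2)   # sparse test vector
lam = 0.1
c, apply_B, F = make_qp(A, y, lam)
z = split(x)
qp_value = 0.5 * z @ apply_B(z) + c @ z
f_value = 0.5 * np.sum((y - A @ x) ** 2) + lam * np.abs(x).sum()
print(qp_value + 0.5 * y @ y, f_value)   # the two values agree
```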

In this part of the numerical experiments, we consider a compressive sensing scenario, which aims to reconstruct a length-n sparse signal from m observations, where $m\ll n$. The quality of restoration is measured by the mean squared error (MSE) with respect to the original signal $\bar{x}$, that is,

$\text{MSE}=\frac{1}{n}{\Vert \bar{x}-{x}^{*}\Vert}^{2},$

where ${x}^{*}$ is the restored signal. In the experiments, $n={2}^{12}$ and $m={2}^{10}$, and the original signal contains ${2}^{6}$ randomly placed nonzero elements. A is a Gaussian matrix generated by the MATLAB command $\text{randn}\left(m,n\right)$, and the measurement y contains noise:

$y=A\stackrel{\xaf}{x}+\omega ,$

where $\omega $ is the Gaussian noise distributed as $N\left(0,{10}^{-4}\right)$. The merit function is

$f\left(x\right)=\frac{1}{2}{\Vert y-Ax\Vert}_{2}^{2}+\tau {\Vert x\Vert}_{1}$,

where the parameter $\tau $ is decreased during the iterations by a continuation strategy. The iteration starts from ${x}_{0}={A}^{\text{T}}y$ and terminates when the relative change of the objective function values satisfies

$\text{Tol}=\frac{\left|{f}_{k}-{f}_{k-1}\right|}{\left|{f}_{k-1}\right|}<{10}^{-5},$

where ${f}_{k}$ is the function value at ${x}_{k}$.
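The test problem, the MSE, the merit function and the relative-change stopping rule above can be sketched as follows. Several details are assumptions of ours: the amplitudes of the nonzero entries are drawn from a standard normal distribution (the text only says they are random), $N\left(0,{10}^{-4}\right)$ is read as variance ${10}^{-4}$ (standard deviation ${10}^{-2}$), NumPy's `standard_normal` plays the role of MATLAB's `randn`, and the MSE uses the squared norm, as the name "mean squared error" indicates. The continuation rule for τ is not specified in the text and is not implemented here; all function names are hypothetical.

```python
import numpy as np

def make_cs_problem(n=2**12, m=2**10, k=2**6, noise_std=1e-2, seed=0):
    """Sparse signal x_bar, Gaussian sensing matrix A, noisy measurement y."""
    rng = np.random.default_rng(seed)
    x_bar = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)    # k randomly placed nonzeros
    x_bar[support] = rng.standard_normal(k)           # assumed amplitude distribution
    A = rng.standard_normal((m, n))                   # analogue of MATLAB randn(m, n)
    y = A @ x_bar + noise_std * rng.standard_normal(m)
    return A, y, x_bar

def mse(x_bar, x_rec):
    """Mean squared error (1/n) ||x_bar - x_rec||^2."""
    return np.sum((x_bar - x_rec) ** 2) / x_bar.size

def merit(x, A, y, tau):
    """f(x) = 0.5 ||y - A x||_2^2 + tau ||x||_1."""
    r = y - A @ x
    return 0.5 * (r @ r) + tau * np.abs(x).sum()

def should_stop(f_k, f_prev, tol=1e-5):
    """Relative change |f_k - f_{k-1}| / |f_{k-1}| < tol."""
    return abs(f_k - f_prev) / abs(f_prev) < tol

A, y, x_bar = make_cs_problem()
x0 = A.T @ y                                          # starting point x_0 = A^T y
print(A.shape, np.count_nonzero(x_bar), mse(x_bar, x0))
```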

We compare the proposed MPRP method with the CGD method on this problem. In both methods, the parameters are set to $\xi =10$, $\sigma ={10}^{-4}$ and $\rho =0.5$. Both methods use the same initial point and the same continuation technique for the parameter $\tau $.

Figure 2 shows the results of MPRP and CGD for a sparse signal reconstruction: both methods recover the original sparse signal almost exactly. Figure 3 compares the MSE and the objective function values as the number of iterations and the computing time increase; both quantities decrease faster for MPRP. The experiments are repeated for 15 different random noise samples, and Table 5 reports the number of iterations (Niter) and the CPU time (in seconds) required for the whole testing process.

Figure 2. From top to bottom: the original signal, the measurement, and the signals recovered by MPRP and CGD, respectively.

Figure 3. Comparison results of the MPRP and CGD methods. From left to right: the evolution of the MSE and of the objective function value versus the number of iterations and the CPU time in seconds, respectively.

Table 5. The experiment results for MPRP/CGD on ${l}_{1}$ -norm regularization problem.

Table 5 shows that MPRP performs better than CGD: its iteration numbers and CPU times are much smaller than those of CGD. In summary, these results indicate that the proposed MPRP algorithm works efficiently on this problem.

5. Conclusion

In this paper, we proposed a conjugate gradient projection algorithm for solving large-scale nonlinear convex constrained monotone equations, based on the well-known Polak-Ribière-Polyak conjugate gradient method, one of the most effective conjugate gradient methods for unconstrained optimization. The algorithm combines the conjugate gradient technique with a projection scheme and is derivative-free; owing to its low storage requirement, it can be applied to large-scale nonsmooth equations. Under some technical conditions, we established its global convergence. Another contribution of this paper is the application of the method to ${l}_{1}$-norm regularized problems in compressive sensing.

Acknowledgements

This work was supported by the Scientific Research Project of Tianjin Education Commission (No. 2019KJ232).

References

[1] Dirkse, S.P. and Ferris, M.C. (1995) MCPLIB: A Collection of Nonlinear Mixed Complementarity Problems. Optimization Methods & Software, 5, 319-345.

https://doi.org/10.1080/10556789508805619

[2] Meintjes, K. and Morgan, A.P. (1990) Chemical Equilibrium Systems as Numerical Test Problems. ACM Transactions on Mathematical Software, 16, 143-151.

https://doi.org/10.1145/78928.78930

[3] Wood, A.J. and Wollenberg, B.F. (1996) Power Generation, Operation, and Control. Wiley, New York.

[4] Solodov, M.V. and Svaiter, B.F. (1998) A Globally Convergent Inexact Newton Method for Systems of Monotone Equations. In: Fukushima, M. and Qi, L., Eds., Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, Kluwer Academic, 355-369.

https://doi.org/10.1007/978-1-4757-6388-1_18

[5] Wang, C.W. and Wang, Y.J. (2009) A Superlinearly Convergent Projection Method for Constrained Systems of Nonlinear Equations. Journal of Global Optimization, 44, 283-296.

https://doi.org/10.1007/s10898-008-9324-8

[6] Hu, Y.P. and Wei, Z.X. (2015) Wei-Yao-Liu Conjugate Gradient Projection Algorithm for Nonlinear Monotone Equations with Convex Constraints. International Journal of Computer Mathematics, 92, 2261-2272.

https://doi.org/10.1080/00207160.2014.977879

[7] Liu, J.K. and Li, S.J. (2015) A Projection Method for Convex Constrained Monotone Nonlinear Equations with Applications. Computers and Mathematics with Applications, 70, 2442-2453.

https://doi.org/10.1016/j.camwa.2015.09.014

[8] Xiao, Y.H. and Zhu, H. (2013) A Conjugate Gradient Method to Solve Convex Constrained Monotone Equations with Applications in Compressive Sensing. Journal of Mathematical Analysis and Applications, 405, 310-319.

https://doi.org/10.1016/j.jmaa.2013.04.017

[9] Yu, G.H., Niu, S.Z. and Ma, J.H. (2013) Multivariate Spectral Gradient Projection Method for Nonlinear Monotone Equations with Convex Constraints. Journal of Industrial and Management Optimization, 9, 117-129.

https://doi.org/10.3934/jimo.2013.9.117

[10] Polyak, B.T. (1969) The Conjugate Gradient Method in Extremal Problems. USSR Computational Mathematics and Mathematical Physics, 9, 94-112.

https://doi.org/10.1016/0041-5553(69)90035-4

[11] Polak, E. and Ribière, G. (1969) Note sur la convergence de méthodes de directions conjuguées. Revue Française d’Informatique et de Recherche Opérationnelle, 3, 35-43.

https://doi.org/10.1051/m2an/196903R100351

[12] Zhang, L. and Li, J.L. (2011) A New Globalization Technique for Nonlinear Conjugate Gradient Methods for Nonconvex Minimization. Applied Mathematics and Computation, 217, 10295-10304.

https://doi.org/10.1016/j.amc.2011.05.032

[13] Hu, Y.P. and Wei, Z.X. (2014) A Modified Liu-Storey Conjugate Gradient Projection Algorithm for Nonlinear Monotone Equations. International Mathematical Forum, 9, 1767-1777.

https://doi.org/10.12988/imf.2014.411197

[14] Yuan, G.L. and Hu, W.J. (2018) A Conjugate Gradient Algorithm for Large-Scale Unconstrained Optimization Problems and Nonlinear Equations. Journal of Inequalities and Applications, 2018, Article No. 113.

https://doi.org/10.1186/s13660-018-1703-1

[15] Yuan, G.L., Meng, Z.H. and Li, Y. (2016) A Modified Hestenes and Stiefel Conjugate Gradient Algorithm for Large-Scale Nonsmooth Minimizations and Nonlinear Equations. Journal of Optimization Theory and Applications, 168, 129-152.

https://doi.org/10.1007/s10957-015-0781-1

[16] Yuan, G.L., Wei, Z.X. and Li, G.Y. (2014) A Modified Polak-Ribière-Polyak Conjugate Gradient Algorithm for Nonsmooth Convex Programs. Journal of Computational and Applied Mathematics, 255, 86-96.

https://doi.org/10.1016/j.cam.2013.04.032

[17] Yuan, G.L. and Zhang, M.J. (2015) A Three-Terms Polak-Ribière-Polyak Conjugate Gradient Algorithm for Large-Scale Nonlinear Equations. Journal of Computational and Applied Mathematics, 286, 186-195.

https://doi.org/10.1016/j.cam.2015.03.014

[18] Yuan, G.L. and Zhang, M.J. (2013) A Modified Hestenes-Stiefel Conjugate Gradient Algorithm for Large-Scale Optimization. Numerical Functional Analysis and Optimization, 34, 914-937.

https://doi.org/10.1080/01630563.2013.777350

[19] Yuan, G.L., Wei, Z.X. and Zhao, Q.M. (2014) A Modified Polak-Ribière-Polyak Conjugate Gradient Algorithm for Large-Scale Optimization Problems. IIE Transactions, 46, 397-413.

https://doi.org/10.1080/0740817X.2012.726757

[20] Yu, Z.S., Lin, J., Sun, J., Xiao, Y.H., Liu, L.Y. and Li, Z.H. (2009) Spectral Gradient Projection Method for Monotone Nonlinear Equations with Convex Constraints. Applied Numerical Mathematics, 59, 2416-2423.

https://doi.org/10.1016/j.apnum.2009.04.004

[21] Dolan, E.D. and Moré, J.J. (2002) Benchmarking Optimization Software with Performance Profiles. Mathematical Programming, 91, 201-213.

https://doi.org/10.1007/s101070100263

[22] Elad, M. (2010) Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer Science & Business Media, LLC, Berlin.

[23] Fessler, J.A. (2010) Model-Based Image Reconstruction for MRI. IEEE Signal Processing Magazine, 27, 81-89.

https://doi.org/10.1109/MSP.2010.936726

[24] Romberg, J.K. (2008) Imaging via Compressive Sampling. IEEE Signal Processing Magazine, 25, 14-20.

https://doi.org/10.1109/MSP.2007.914729

[25] Yang, J.C., Wright, J., Huang, T.S. and Ma, Y. (2010) Image Super-Resolution via Sparse Representation. IEEE Transactions on Image Processing, 19, 2861-2873.

https://doi.org/10.1109/TIP.2010.2050625

[26] Figueiredo, M., Nowak, R. and Wright, S.J. (2007) Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. IEEE Journal of Selected Topics in Signal Processing, 1, 586-597.

https://doi.org/10.1109/JSTSP.2007.910281

[27] Xiao, Y.H., Wang, Q.Y. and Hu, Q.J. (2011) Non-Smooth Equations Based Method for l1-Norm Problems with Applications to Compressed Sensing. Nonlinear Analysis: Theory, Methods & Applications, 74, 3570-3577.

https://doi.org/10.1016/j.na.2011.02.040