The Sliding Gradient Algorithm for Linear Programming

1. Introduction

The simplex method developed by Dantzig [2] has been widely used to solve large-scale optimization problems with linear constraints. Its practical performance is good, and researchers have found that the expected number of iterations is polynomial under certain conditions [3] [4] [5] [6] . However, in 1972 Klee and Minty gave a counterexample showing that its worst-case performance is $\mathcal{O}\left({2}^{n}\right)$ [7] . Their example is a deliberately constructed deformed cube that exploits a weakness of the original simplex pivot rule, which is sensitive to scaling [8] . With a different pivot rule, the Klee-Minty deformed cube can be solved in one iteration. For all known pivot rules, however, one can construct a different deformed cube that requires an exponential number of iterations to solve [9] [10] [11] . More recently, the interior point method [12] has been gaining popularity as an efficient and practical LP solver. However, it was found that this method may exhibit similar worst-case performance when a large set of redundant inequalities is added to the Klee-Minty cube [13] .

Is it possible to develop a strongly polynomial algorithm to solve the linear programming problem, where the number of iterations is a polynomial function of only the number of constraints and the number of variables? The work by Barasz and Vempala sheds some light on this question. Their AFFINE algorithm [14] takes only $\mathcal{O}\left({n}^{2}\right)$ iterations to solve a broad class of deformed products defined by Amenta and Ziegler [15] , which includes the Klee-Minty cube and many of its variants.

In certain aspects, the Gravity Sliding algorithm [1] is similar to the AFFINE algorithm, as it also passes through the interior of the feasible region. The main difference is in the calculation of the next descending vector. In the gravity falling approach, a gravity vector is first defined (see Section 3.1 for details). This is the principal gradient descending direction, from which the other descending directions are derived. In each iteration, the algorithm first computes the descending direction and then descends along this direction until it hits one or more facets that form the boundary of the feasible region. In order not to penetrate the boundary of the feasible region, the descending direction needs to be changed. The trajectory is likened to a water droplet falling from the sky that is blocked by linear planar structures (e.g. the rooftop structure of a building) and needs to slide along the structure. The core of the Gravity Sliding algorithm is how to calculate the projection of the gravity vector g onto the intersection of a group of facets. This projection vector lies on the intersection of the facets and hence lies in the null space defined by these facets. The conventional approach is to compute the null space first and then find the projection of g onto this null space. An alternative approach is disclosed in [1] , which operates directly on the subspace formed by the intersecting facets. This direct approach is more suitable for the Gravity Sliding algorithm. In this paper, we further present an efficient method to compute the gradient projections on complementary facets and also introduce the notion of selecting the steepest-descent projection among a set of candidates. With these refinements, we rename the Gravity Sliding algorithm as the Sliding Gradient algorithm. We have implemented our algorithm and tested it on the Klee-Minty cube. We observe that it can solve the Klee-Minty deformed cube problem in only two iterations, irrespective of the dimension of the cube.

This paper is organized as follows: Section 2 gives an overview of the Cone-Cutting Theory [16] , which is the intuitive background of the Gravity Sliding algorithm. Section 3 discusses the Sliding Gradient algorithm in detail. The pseudo-code of this algorithm is summarized in Section 4, and Section 5 gives a walk-through of this algorithm using the Klee-Minty cube as an example. That section also discusses practical implementation issues. Finally, Section 6 discusses future work.

2. Cone-Cutting Principle

The cone-cutting theory [16] offers a geometric interpretation of a set of linear inequalities. Instead of considering the full set of constraints in an LP problem, the cone-cutting theory enables us to consider a subset of the constraints and to see how an additional constraint shapes the feasible region. This geometric insight forms the basis of our algorithm development.

2.1. Cone-Cutting Principle

In an m-dimensional space ${\mathbb{R}}^{m}$ , a hyperplane ${y}^{\text{T}}\tau =c$ cuts ${\mathbb{R}}^{m}$ into two half spaces. Here $\tau $ is the normal vector of the hyperplane and c is a constant. We call the positive half space $\left\{y|{y}^{\text{T}}\tau \ge c\right\}$ the accepted zone of the hyperplane and the negative half space $\left\{y|{y}^{\text{T}}\tau <c\right\}$ the rejected zone. Note that the normal vector $\tau $ points to the accepted zone, and we call a hyperplane with such an orientation a facet $\alpha :\left(\tau ,c\right)$ . When there are m facets in ${\mathbb{R}}^{m}$ and $\left\{{\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{m}\right\}$ are linearly independent, the corresponding set of linear equations has a unique solution, which is a point V in ${\mathbb{R}}^{m}$ . Geometrically, $\left\{{\alpha}_{1},{\alpha}_{2},\cdots ,{\alpha}_{m}\right\}$ form a cone and V is the vertex of the cone. We now give a formal definition of a cone, which is taken from [1] .

Definition 1. Given m hyperplanes in ${\mathbb{R}}^{m}$ , with rank $r\left({\alpha}_{1},\cdots ,{\alpha}_{m}\right)=m$ and intersection V, $C=C\left(V;{\alpha}_{1},\cdots ,{\alpha}_{m}\right)={\alpha}_{1}\cap \cdots \cap {\alpha}_{m}$ is called a cone in ${\mathbb{R}}^{m}$ . The area $\left\{y|{y}^{\text{T}}{\tau}_{i}\ge {c}_{i}\left(i=1,2,\cdots ,m\right)\right\}$ is called the accepted zone of C. The point V is the vertex and ${\alpha}_{j}$ is the facet plane, or simply the facet, of C.

A cone C also has m edge lines. They are formed by the intersection of (m − 1) facets. Hence, a cone can also be defined as follows.

Definition 2. Given m rays ${R}_{j}=\left\{V+t{r}_{j}|0\le t<+\infty \right\}\left(j=1,\cdots ,m\right)$ shooting from a point V with rank $r\left({r}_{1},\cdots ,{r}_{m}\right)=m$ , $C=C\left(V;{r}_{1},\cdots ,{r}_{m}\right)=c\left[{R}_{1},\cdots ,{R}_{m}\right]$ , the convex closure of the m rays, is called a cone in ${\mathbb{R}}^{m}$ . ${R}_{j}$ is the edge, ${r}_{j}$ the edge direction, and ${R}_{j}^{+}=\left\{V+t{r}_{j}|-\infty <t<+\infty \right\}$ the edge line of the cone C.

The two definitions are equivalent. Furthermore, P.Z. Wang [11] has observed that ${R}_{i}^{+}$ and ${\alpha}_{i}$ are opposite to each other for $i=1,\cdots ,m$ . Edge-line ${R}_{i}^{+}$ is the intersection of all C-facets except ${\alpha}_{i}$ , while facet ${\alpha}_{i}$ is bounded by all C-edges except ${R}_{i}^{+}$ . This is the duality between facets and edges. For $i=1,\cdots ,m,\left\{{\tau}_{i},{R}_{i}\right\}$ is called a pair of cone C.

It is obvious that ${r}_{j}^{\text{T}}{\tau}_{i}=0$ (for $i\ne j$ ) since ${r}_{j}$ lies on ${\alpha}_{i}$ . Moreover, we have

${r}_{i}^{\text{T}}{\tau}_{i}\ge 0\text{\hspace{0.17em}}\left(\text{for}\text{\hspace{0.17em}}i=1,\cdots ,m\right)$ (1)

2.2. Cone Cutting Algorithm

Consider a linear programming (LP) problem and its dual:

(Primal): $\mathrm{max}\left\{{\tilde{c}}^{\text{T}}x|Ax\le b;\text{\hspace{0.17em}}x\ge 0\right\}$ (2)

(Dual): $\mathrm{min}\left\{{y}^{\text{T}}b|{y}^{\text{T}}A\ge \tilde{c};\text{\hspace{0.17em}}y\ge 0\right\}$ (3)

In the following, we focus on solving the dual LP problem. The standard simplex tableau can be obtained by appending an $m\times m$ identity matrix ${I}_{m\times m}$ which represents the slack variables as shown below:

$\left[\begin{array}{ccccccc}{a}_{11}& \cdots & {a}_{1n}& 1& \cdots & 0& {b}_{1}\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ {a}_{m1}& \cdots & {a}_{mn}& 0& \cdots & 1& {b}_{m}\\ {\tilde{c}}_{1}& \cdots & {\tilde{c}}_{n}& 0& \cdots & 0& 0\end{array}\right]$

We can construct a facet tableau whereby each column is a facet denoted as ${\alpha}_{i}:\left({\tau}_{i},{c}_{i}\right)$ . For $1\le i\le n$ , ${\tau}_{i}={\left({a}_{1i},{a}_{2i},\cdots ,{a}_{mi}\right)}^{\text{T}}$ ; for $n<i\le m+n$ , ${\tau}_{i}$ is the corresponding column of the identity matrix ${I}_{m\times m}$ ; and

${c}_{i}=\left\{\begin{array}{ll}{\tilde{c}}_{i}& \text{for}\text{\hspace{0.17em}}1\le i\le n\\ 0& \text{for}\text{\hspace{0.17em}}n<i\le m+n\end{array}\right.$ (4)

The facet tableau is depicted as follows. The last column ${\left({b}_{1},{b}_{2},\cdots ,{b}_{m},0\right)}^{\text{T}}$ is not represented in this tableau.

$\begin{array}{l}\text{\hspace{0.05em}}\begin{array}{cccc}{\alpha}_{1}& {\alpha}_{2}& \cdots & {\alpha}_{m+n}\end{array}\\ \left[\begin{array}{llll}{\tau}_{1}\hfill & {\tau}_{2}\hfill & \cdots \hfill & {\tau}_{m+n}\hfill \\ {c}_{1}\hfill & {c}_{2}\hfill & \cdots \hfill & {c}_{m+n}\hfill \end{array}\right]\end{array}$
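The construction of the facet tableau can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `facet_tableau` and the small sample data are hypothetical, and facets are stored as plain `(tau, c)` pairs.

```python
def facet_tableau(A, c_tilde):
    """Build the m+n facets (tau_i, c_i) of the dual constraints.

    The first n facets come from the columns of A with c_i = c_tilde[i];
    the last m facets come from the identity columns (y >= 0) with c_i = 0.
    """
    m, n = len(A), len(A[0])
    facets = []
    for i in range(n):
        tau = [A[r][i] for r in range(m)]      # i-th column of A
        facets.append((tau, c_tilde[i]))
    for i in range(m):
        tau = [1.0 if r == i else 0.0 for r in range(m)]  # identity column
        facets.append((tau, 0.0))
    return facets


# Hypothetical 2 x 2 example (illustrative data only).
A = [[1.0, 2.0],
     [3.0, 1.0]]
c_tilde = [4.0, 5.0]
facets = facet_tableau(A, c_tilde)
```

Each entry of `facets` then plays the role of one column of the facet tableau.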

When a cone $C=C\left(V;{\alpha}_{1},\cdots ,{\alpha}_{m}\right)=C\left(V;{r}_{1},\cdots ,{r}_{m}\right)$ is intersected by another facet ${\alpha}_{j}$ , the i^{th} edge of the cone is intersected by ${\alpha}_{j}$ at a certain point ${q}_{ij}$ . We say that ${\alpha}_{j}$ cuts the cone C, and the cut points ${q}_{ij}$ can be obtained from the following equation:

${q}_{ij}=V+{t}_{i}{r}_{i}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{where}\text{\hspace{0.17em}}{t}_{i}=\left({c}_{j}-{V}^{\text{T}}{\tau}_{j}\right)/{r}_{i}^{\text{T}}{\tau}_{j}$ (5)

The intersection is called real if ${t}_{i}\ge 0$ and fictitious if ${t}_{i}<0$ . Cone cutting greatly alters the accepted zone, as can be seen from the simple 2-dimensional example shown in Figures 1(a)-(e). In 2 dimensions, a facet $\alpha :\left(\tau ,c\right)$ is a line. The normal vector $\tau $ is perpendicular to this line and points to the accepted zone of this facet. Furthermore, a cone is formed by two non-parallel facets in 2 dimensions. Figure 1(a) shows such a cone $C\left(V;{\alpha}_{1},{\alpha}_{2}\right)$ . The accepted zone of the cone is the intersection of the two accepted zones of facets α_{1} and α_{2}. This is represented by the shaded area A in Figure 1(a). In Figure 1(b), a new facet α_{3} intersects the cone at two cut points ${q}_{13}$ and ${q}_{23}$ . They are both real cut points. Since the arrow of the normal vector ${\tau}_{3}$ points in the same general direction as the cone, V lies in the rejected zone of α_{3} and we say α_{3} rejects V. Moreover, the accepted zone of α_{3} intersects with the accepted zone of the cone so that the overall accepted zone is reduced to the shaded area marked as B. In Figure 1(c), ${\tau}_{3}$ points in the opposite direction. α_{3} accepts V and the overall accepted zone is confined to the area marked as C. As the dual feasible region 𝒟 of an LP problem must satisfy all the constraints, it must lie within area C. In Figure 1(d), α_{3} cuts the cone at two fictitious points. Since ${\tau}_{3}$ points in the same direction as the cone, V is accepted by α_{3}. However, the accepted zone of α_{3} covers that of the cone. As a result, α_{3} does not contribute to any reduction of the overall accepted zone, and so it can be deleted from further consideration without affecting the LP solution. In Figure 1(e), ${\tau}_{3}$ points in the opposite direction to the cone. The intersection between the accepted zone of α_{3} and that of the cone is an empty set. This means that the dual feasible region 𝒟 is empty and the LP is infeasible. This is one of the criteria that can be used for detecting infeasibility.

Figure 1. Accepted zone area of a cone and after it is cut by a facet.
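To make the cut-point computation concrete, the following Python sketch evaluates the displacement along one edge and classifies the intersection as real or fictitious. The helper names `dot` and `cut_point` and the 2-dimensional data are illustrative assumptions; the displacement uses the constant of the cutting facet, in line with the line-facet intersection formula (7).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cut_point(V, r_i, tau_j, c_j):
    """Intersect the edge line V + t * r_i with the facet (tau_j, c_j).

    Returns (t, q, kind), where kind is "real" if t >= 0 and "fictitious"
    otherwise, or None when the edge is parallel to the facet.
    """
    denom = dot(r_i, tau_j)
    if denom == 0:
        return None
    t = (c_j - dot(V, tau_j)) / denom
    q = [v + t * r for v, r in zip(V, r_i)]
    return t, q, ("real" if t >= 0 else "fictitious")


# Hypothetical coordinate cone in 2-D: vertex at the origin, edges along the axes.
V = [0.0, 0.0]
r1 = [1.0, 0.0]

# Cutting facet y1 + y2 >= 2 gives a real cut point on edge 1.
t_a, q_a, kind_a = cut_point(V, r1, [1.0, 1.0], 2.0)

# Cutting facet y1 + y2 >= -1 gives a fictitious cut point on edge 1.
t_b, q_b, kind_b = cut_point(V, r1, [1.0, 1.0], -1.0)
```

In this sketch a facet whose cut points are all fictitious and which accepts V corresponds to the redundant case of Figure 1(d).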

Based on this cone-cutting idea, P.Z. Wang [16] [17] has developed a cone-cutting algorithm to solve the dual LP problem. Each cone is a combination of m facets selected from (m + n) choices. Let Δ denote the index set of facets of C (i.e. if $\Delta \left(i\right)=j$ , then ${\tau}_{\Delta \left(i\right)}={\tau}_{j}$ ). The algorithm starts with an initial coordinate cone C_{o}, then finds a facet

Table 1. Cone-cutting algorithm.

In steps 2 and 3, this algorithm finds the facet that rejects V the least and uses it as the cutting facet. This facet cuts the edges of the cone at m points. In steps 4 and 5, the real cut point ${q}_{{I}^{*}}$ that is closest to the vertex V is identified. This becomes the vertex of a new cone. This new cone retains all the facets of the original cone except that the cutting facet replaces the facet corresponding to the edge I^{*}. The edge I^{*} is retained, but the rest of the edges must be recomputed as shown in step 6. Remarkably, P.Z. Wang shows that when $b>0$ , this algorithm produces exactly the same result as the original simplex algorithm proposed by Dantzig [2] . Hence, the cone-cutting theory offers a geometric interpretation of the simplex method. More significantly, it inspires the authors to explore a new approach to tackle the LP problem.

3. Sliding Gradient Algorithm

Expanding on the cone-cutting theory, the Gravity Sliding Algorithm [1] was developed to find the optimal solution of the LP problem from a point within the feasible region 𝒟. Since then, several refinements have been made and they are presented in the following sections.

3.1. Determining the General Descending Direction

The feasible region 𝒟 is a convex polyhedron formed by the constraints $\left({y}^{\text{T}}{\tau}_{j}\ge {c}_{j};1\le j\le n+m\right)$ , and the optimal feasible point is at one of its vertices. Let $\Omega =\left\{{V}_{i}|{V}_{i}^{\text{T}}{\tau}_{j}\ge {c}_{j};j=1,\cdots ,n+m\right\}$ be the set of feasible vertices. The dual LP problem (3) can then be stated as: $\mathrm{min}\left\{{V}_{i}^{\text{T}}b|{V}_{i}\in \Omega \right\}$ . As ${V}_{i}^{\text{T}}b$ is the inner product of vertex ${V}_{i}$ and b, the optimal vertex ${V}^{*}$ is the vertex that yields the lowest inner-product value. Thus we can set the principal descending direction ${g}_{0}$ to be the opposite of the b vector (i.e. ${g}_{0}=-b$ ), and this is referred to as the gravity vector. The descending path then descends along this principal direction inside 𝒟 until it reaches the lowest point in 𝒟 viewed along the direction of b. This point is then the optimal vertex ${V}^{*}$ .

3.2. Circumventing Blocking Facets

The basic principle of the new algorithm is illustrated in Figure 2. Notice that in 2 dimensions, a facet is a line. In this figure, these facets (lines) form a closed polygon, which is the dual feasible region 𝒟. Here the initial point P_{0} is inside 𝒟. From P_{0}, the path attempts to descend along the ${g}_{0}=-b$ direction. It can go as far as P_{1}, which is the point of intersection between the ray $R={P}_{0}+t{g}_{0}$ and the facet α_{1}. In essence, α_{1} is blocking this ray and hence it is called the blocking facet relative to this ray. In order not to leave 𝒟, the descending direction needs to change from g_{0} to g_{1} at P_{1}, and the path slides along g_{1} until it hits the next blocking facet α_{2} at P_{2}. Then it needs to change course again and slides along the direction g_{2} until it hits P_{3}. In this figure, P_{3} is the lowest point in the dual feasible region 𝒟 and hence it is the optimal point ${V}^{*}$ .

It can be observed from Figure 2 that g_{1} is the projection of g_{0} onto α_{1} and g_{2} is the projection of g_{0} onto α_{2}. Thus from P_{1}, the descending path slides along α_{1} to reach P_{2} and then slides along α_{2} to reach P_{3}. Hence we call this algorithm the Sliding Gradient Algorithm. The basic idea is to compute a new descending direction that circumvents the blocking facets, and to advance until the next blocking facet is found, repeating until the path reaches the bottom vertex viewed along the direction of b.

Let ${\sigma}_{t}$ denote the set of blocking facets at the t^{th} iteration. From an initial point P_{0} and a gradient descent vector g_{0}, the algorithm iteratively performs the following steps:

1) Compute a gradient direction g_{t} based on ${\sigma}_{t}$ . In this example, the initial set of blocking facets ${\sigma}_{0}$ is empty and ${g}_{0}=-b$ .

2) Move ${P}_{t}$ to ${P}_{t+1}$ along ${g}_{t}$ , where ${P}_{t+1}$ is a point on the first blocking facet.

3) Incorporate the newly encountered blocking facet into ${\sigma}_{t}$ to form ${\sigma}_{t+1}$ .

4) Go back to step 1.

Figure 2. Sliding gradient illustration.

The algorithm stops when it cannot find any direction to descend in step (1). This is discussed in detail in Section 3.6, where a formal stopping criterion is given.
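The four steps above can be sketched as a short Python loop. This is a simplified illustration, not the full algorithm: it always uses the projection of g_{0} onto all blocking facets as the new direction, it stops as soon as that projection vanishes (a stand-in assumption for the formal criterion of Section 3.6), and the names `project_out` and `slide`, the tolerances, and the 2-dimensional test problem are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(g, normals, eps=1e-12):
    """Remove from g its components along span(normals) via Gram-Schmidt."""
    basis = []
    for tau in normals:
        o = list(tau)
        for q in basis:
            coef = dot(o, q)
            o = [x - coef * y for x, y in zip(o, q)]
        norm = dot(o, o) ** 0.5
        if norm > eps:                      # skip linearly dependent normals
            basis.append([x / norm for x in o])
    out = list(g)
    for q in basis:
        coef = dot(out, q)
        out = [x - coef * y for x, y in zip(out, q)]
    return out


def slide(P, b, facets, max_iter=50, eps=1e-9):
    """Simplified sliding loop; facets are (tau, c) pairs with y^T tau >= c."""
    g0 = [-x for x in b]                    # gravity vector
    sigma = []                              # indices of blocking facets
    for _ in range(max_iter):
        # Step 1: direction from the current blocking set.
        g = project_out(g0, [facets[j][0] for j in sigma])
        if dot(g, g) ** 0.5 < eps:
            break                           # simplified stopping rule
        # Step 2: move to the first blocking facet along g.
        best_t, best_j = None, None
        for j, (tau, c) in enumerate(facets):
            denom = dot(g, tau)
            if abs(denom) < eps:
                continue                    # g is parallel to this facet
            t = (c - dot(P, tau)) / denom
            if t > eps and (best_t is None or t < best_t):
                best_t, best_j = t, j
        if best_j is None:
            break                           # nothing blocks the ray
        P = [p + best_t * x for p, x in zip(P, g)]
        # Step 3: add the new blocking facet; step 4: loop again.
        sigma.append(best_j)
    return P, sigma


# Hypothetical problem: minimise y1 + y2 subject to y1 >= 0, y2 >= 0.
facets = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
P_final, sigma_final = slide([2.0, 1.0], [1.0, 1.0], facets)
```

On this toy problem the path first falls onto the facet y2 = 0, then slides along it to the vertex at the origin.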

3.3. Minimum Requirements for the Gradient Direction g_{t}

For the first step, the gradient descent vector ${g}_{t}$ needs to satisfy the following requirements.

Proposition 1. ${g}_{t}$ must satisfy ${\left({g}_{t}\right)}^{\text{T}}{g}_{0}\ge 0$ so that the dual objective function ${y}^{\text{T}}b$ will be non-increasing when y moves from ${P}_{t}$ to ${P}_{t+1}$ along the direction of ${g}_{t}$ .

Proof. Since ${\left({g}_{t}\right)}^{\text{T}}{g}_{0}\ge 0$ , ${g}_{t}$ aligns with the principal direction ${g}_{0}$ . As ${P}_{t+1}={P}_{t}+t{g}_{t}$ , ${P}_{t+1}$ moves along the principal direction of ${g}_{0}$ when the step length $t>0$ .

Since ${P}_{t+1}^{\text{T}}b={P}_{t}^{\text{T}}b+t{\left({g}_{t}\right)}^{\text{T}}b={P}_{t}^{\text{T}}b-t{\left({g}_{t}\right)}^{\text{T}}{g}_{0}$ , ${P}_{t+1}^{\text{T}}b\le {P}_{t}^{\text{T}}b$ when ${\left({g}_{t}\right)}^{\text{T}}{g}_{0}\ge 0$ . END

This means that if ${\left({g}_{t}\right)}^{\text{T}}{g}_{0}\ge 0$ , then ${P}_{t+1}$ is “lower than” ${P}_{t}$ when viewed along the b direction.

Proposition 2. If ${P}_{0}\in \mathcal{D}$ , ${g}_{t}$ must satisfy ${\left({\tau}_{{\sigma}_{t}\left(j\right)}\right)}^{\text{T}}{g}_{t}\ge 0$ for all $j\in {\sigma}_{t}$ to ensure that ${P}_{t+1}$ remains dual feasible (i.e. ${P}_{t+1}\in \mathcal{D}$ ).

Proof. If for some j, ${\left({\tau}_{{\sigma}_{t}\left(j\right)}\right)}^{\text{T}}{g}_{t}<0$ , this means that ${g}_{t}$ points against the normal vector of facet ${\alpha}_{{\sigma}_{t}\left(j\right)}$ , so a ray $Q={P}_{t}+t{g}_{t}$ will penetrate this facet for some positive value of t. This means that Q will be rejected by ${\alpha}_{{\sigma}_{t}\left(j\right)}$ and hence Q is no longer a dual feasible point. END

3.4. Maximum Descent in Each Iteration

To ensure that ${P}_{t+1}\in \mathcal{D}$ , we need to make sure that it does not advance too far. The following proposition stipulates the requirement.

Proposition 3. Assume that 𝒟 is non-empty and ${P}_{0}\in \mathcal{D}$ . If ${g}_{t}$ satisfies Propositions 1 and 2, and not all ${g}_{t}^{\text{T}}{\tau}_{j}=0$ for $j=1,\cdots ,m+n$ , then ${P}_{t+1}\in \mathcal{D}$ provided that the next point ${P}_{t+1}$ is determined according to (6) below:

${P}_{t+1}={P}_{t}+{t}_{{j}^{*}}{g}_{t}$ (6)

where ${j}^{*}=\mathrm{arg}{\mathrm{min}}_{j}\left\{{t}_{j}|{t}_{j}=\frac{{c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}}{{g}_{t}^{\text{T}}{\tau}_{j}};{t}_{j}>0;j=1,\cdots ,m+n\right\}$ .

Proof. The equation for a line passing through P along the direction g is $P+tg$ . If this line is not parallel to the plane (i.e. ${g}^{\text{T}}\tau \ne 0$ ), it will intersect a facet $\alpha :\left(\tau ,c\right)$ at a point Q according to the following equation:

$Q=P+tg\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{where}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}t=\left(c-{P}^{\text{T}}\tau \right)/{g}^{\text{T}}\tau $ . (7)

We call t the displacement from P to Q. So ${t}_{j}=\frac{{c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}}{{g}_{t}^{\text{T}}{\tau}_{j}}$ is the displacement from ${P}_{t}$ to α_{j}. The condition t_{j} > 0 ensures that ${P}_{t+1}$ moves along the direction ${g}_{t}$ and not in the opposite direction. ${t}_{{j}^{*}}$ is the smallest of all the positive displacements; thus ${\alpha}_{{j}^{*}}$ is the first blocking facet, the one closest to ${P}_{t}$ .

To show that ${P}_{t+1}\in \mathcal{D}$ , we need to show ${P}_{t+1}^{\text{T}}{\tau}_{j}-{c}_{j}\ge 0$ for $j=1,\cdots ,m+n$ . Note that

${P}_{t+1}^{\text{T}}{\tau}_{j}-{c}_{j}={\left({P}_{t}+{t}_{{j}^{*}}{g}_{t}\right)}^{\text{T}}{\tau}_{j}-{c}_{j}=\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)+{t}_{{j}^{*}}{g}_{t}^{\text{T}}{\tau}_{j}$ .

Since ${P}_{t}\in \mathcal{D}$ , $\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)\ge 0$ for $j=1,\cdots ,m+n$ , so we need to show that adding ${t}_{{j}^{*}}{g}_{t}^{\text{T}}{\tau}_{j}$ keeps the sum non-negative for $j=1,\cdots ,m+n$ .

The displacements ${t}_{j}$ can be split into two groups. For those displacements where ${t}_{j}<0$ , we have ${g}_{t}^{\text{T}}{\tau}_{j}=\left({c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}\right)/{t}_{j}=-\left({c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}\right)/{k}_{1}$ , where ${k}_{1}=-{t}_{j}$ is a positive constant. Since ${t}_{{j}^{*}}>0$ , the ratio ${k}_{2}={t}_{{j}^{*}}/{k}_{1}$ is positive and

${t}_{{j}^{*}}{g}_{t}^{\text{T}}{\tau}_{j}=-\frac{{t}_{{j}^{*}}}{{k}_{1}}\left({c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}\right)={k}_{2}\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)\ge 0$ since ${P}_{t}\in \mathcal{D}$ and ${k}_{2}>0$ .

For those displacements where ${t}_{j}>0$ , ${t}_{{j}^{*}}$ is the minimum of all ${t}_{j}$ in this group. Let ${k}_{3}$ be the ratio between ${t}_{{j}^{*}}$ and ${t}_{j}$ . Obviously, $0<{k}_{3}=\frac{{t}_{{j}^{*}}}{{t}_{j}}\le 1$ , so

$\begin{array}{l}\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)+{t}_{{j}^{*}}{g}_{t}^{\text{T}}{\tau}_{j}\\ =\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)+{k}_{3}{t}_{j}{g}_{t}^{\text{T}}{\tau}_{j}=\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)+{k}_{3}\frac{{c}_{j}-{P}_{t}^{\text{T}}{\tau}_{j}}{{g}_{t}^{\text{T}}{\tau}_{j}}{g}_{t}^{\text{T}}{\tau}_{j}\\ =\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)-{k}_{3}\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)=\left(1-{k}_{3}\right)\left({P}_{t}^{\text{T}}{\tau}_{j}-{c}_{j}\right)\ge 0\end{array}$

When ${t}_{j}=0$ , ${P}_{t}$ lies on ${\alpha}_{j}$ and the requirement reduces to ${g}_{t}^{\text{T}}{\tau}_{j}\ge 0$ , which is the condition of Proposition 2.

So ${P}_{t+1}\in \mathcal{D}$ . END

If ${g}_{t}^{\text{T}}{\tau}_{j}=0$ , ${g}_{t}$ is parallel to ${\alpha}_{j}$ . Unless all facets are parallel to ${g}_{t}$ , Proposition 3 can still find the next descent point ${P}_{t+1}$ . If all facets are parallel to ${g}_{t}$ , the facets are linearly dependent on each other and the LP problem is not well formulated.
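The step rule (6) amounts to a ratio test over all facets. A minimal Python sketch is given below; the function name `next_point`, the tolerance `eps`, and the three sample facets are assumptions made for illustration. Facets parallel to the direction are skipped, and facets with a negative displacement lie behind the current point and are ignored.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_point(P, g, facets, eps=1e-12):
    """Ratio test of (6): step from P along g to the nearest facet.

    facets is a list of (tau, c) pairs describing y^T tau >= c.
    Returns (t_star, j_star, P_next), or None if no facet blocks the ray.
    """
    t_star, j_star = None, None
    for j, (tau, c) in enumerate(facets):
        denom = dot(g, tau)
        if abs(denom) < eps:               # g is parallel to facet j
            continue
        t = (c - dot(P, tau)) / denom      # displacement from P to facet j
        if t > eps and (t_star is None or t < t_star):
            t_star, j_star = t, j
    if j_star is None:
        return None
    P_next = [p + t_star * x for p, x in zip(P, g)]
    return t_star, j_star, P_next


# Hypothetical region: y1 >= 0, y2 >= 0, y1 + y2 <= 10.
facets = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], -10.0)]
t_star, j_star, P_next = next_point([2.0, 1.0], [-1.0, -1.0], facets)

# The new point should still satisfy every constraint (Proposition 3).
feasible = all(dot(P_next, tau) >= c - 1e-9 for tau, c in facets)
```

Here the third facet has a negative displacement, so only the first two compete for the minimum.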

3.5. Gradient Projection

We now show that the projection of ${g}_{0}$ onto the set of blocking facets ${\sigma}_{t}$ satisfies the requirements of Propositions 1 and 2. Before we do so, we first discuss projection operations in subspaces.

3.5.1. Projection in Subspaces

Projection is a basic concept defined in a vector space. Since we are only interested in the gradient descent direction ${g}_{t}$ but not the actual location of the projection, we can ignore the constant c in the hyperplane $\left\{y|{y}^{\text{T}}\tau =c\right\}$ . In other words, we focus on the subspace $V\left(\tau \right)$ spanned by $\tau $ and its null space $N\left(\tau \right)$ rather than the affine space spanned by the hyperplanes.

Let Y be the vector space ${\mathbb{R}}^{m}$ . Define $V\left(\tau \right)=\left\{y|y=t\tau ;t\in \mathbb{R}\right\}$ ; its corresponding null space is $N\left(\tau \right)=\left\{x\in Y|{x}^{\text{T}}y=0\text{\hspace{0.17em}}\text{for all}\text{\hspace{0.17em}}y\in V\left(\tau \right)\right\}$ . Extending to k hyperplanes, we have $V\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)=\left\{y|y={\displaystyle \sum _{j=1}^{k}{t}_{j}{\tau}_{j}};{t}_{j}\in \mathbb{R}\right\}$ and the null space is $N\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)=\left\{x\in Y|{x}^{\text{T}}y=0\text{\hspace{0.17em}}\text{for all}\text{\hspace{0.17em}}y\in V\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)\right\}$ . It can be shown that $N\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)=N\left({\tau}_{1}\right)\cap N\left({\tau}_{2}\right)\cap \cdots \cap N\left({\tau}_{k}\right)$ . Since $V\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ and $N\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ form an orthogonal decomposition of the whole space Y, a vector g in ${\mathbb{R}}^{m}$ can be decomposed into two components: the projection of g onto $V\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ and the projection of g onto $N\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ . We use the notation $g{\downarrow}_{\left[{\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right]}$ and $g{\downarrow}_{{\alpha}_{1}\cap \cdots \cap {\alpha}_{k}}$ to denote them, and they are called the direct projection and the null projection respectively.

The following definition and theorem were first presented in [1] and are repeated here for completeness.

Let the set of all subspaces of $Y={\mathbb{R}}^{m}$ be 𝒩, and let 𝒪 stand for 0-dim subspace, we now give an axiomatic definition of projection.

Definition 3. The projection defined on a vector space Y is a mapping

$\ast {\downarrow}_{\#}:Y\times \mathcal{N}\to Y$

where * is a vector in Y, # is a subspace X in 𝒩 satisfying that

(N.1) (Reflectivity).

For any $g\in Y,g{\downarrow}_{Y}=g$ ;

(N.2) (Orthogonal additivity).

For any $g\in Y$ and subspaces $X,Z\in \mathcal{N}$ , if X and Z are orthogonal to each other, then $g{\downarrow}_{X}+\text{\hspace{0.17em}}g{\downarrow}_{Z}\text{\hspace{0.17em}}=g{\downarrow}_{X+Z}$ , where $X+Z$ is the direct sum of X and Z.

(N.3) (Transitivity).

For any $g\in Y$ and subspaces $X,Z\in \mathcal{N}$ , $\left(g{\downarrow}_{X}\right){\downarrow}_{X\cap Z}=g{\downarrow}_{X\cap Z}$ ,

(N.4) (Attribution).

For any $g\in Y$ and subspace $X\in \mathcal{N}$ , $g{\downarrow}_{X}\in X$ , and especially,

(N.5) For any $g\in Y$ and subspace $X\in \mathcal{N}$ , ${g}^{\text{T}}g{\downarrow}_{X}\ge 0$ .

A conventional approach to find $g{\downarrow}_{{\alpha}_{1}\cap \cdots \cap {\alpha}_{k}}$ is to compute it directly from the null space $N\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ . We now show another approach that is more suitable for our overall algorithm.

Theorem 1. For any $g\in Y$ , we have

$g{\downarrow}_{{\alpha}_{1}\cap \cdots \cap {\alpha}_{k}}=g-\underset{i=1}{\overset{k}{{\displaystyle \sum}}}\left({g}^{\text{T}}{o}_{i}\right){o}_{i}$ (8)

where $\left\{{o}_{1},\cdots ,{o}_{k}\right\}$ are an orthonormal basis of subspace $V\left({\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right)$ .

Proof. Since ${\alpha}_{1}\cap \cdots \cap {\alpha}_{k}$ and $\left[{\tau}_{1},{\tau}_{2},\cdots ,{\tau}_{k}\right]$ are the orthogonal decomposition of Y, according to (N.2) and (N.1) we have

$g=g{\downarrow}_{\left[{\tau}_{1},\cdots ,{\tau}_{k}\right]}+\text{\hspace{0.17em}}g{\downarrow}_{{\alpha}_{1}\cap \cdots \cap {\alpha}_{k}}$ .

According to (N.2) and (N.4), the first term becomes

$g{\downarrow}_{\left[{\tau}_{1},\cdots ,{\tau}_{k}\right]}=\left({g}^{\text{T}}{o}_{1}\right){o}_{1}+\cdots +\left({g}^{\text{T}}{o}_{k}\right){o}_{k}$ .

Hence (8) is true. END
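As a numerical illustration of Theorem 1, the sketch below splits a vector into its direct and null projections, given an orthonormal basis of the subspace spanned by the facet normals. Each basis vector is scaled by its inner product with g before being subtracted, consistent with (14). The function names and the sample basis are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def direct_projection(g, basis):
    """Projection of g onto span(basis); basis must be orthonormal."""
    out = [0.0] * len(g)
    for o in basis:
        coef = dot(g, o)
        out = [x + coef * y for x, y in zip(out, o)]
    return out


def null_projection(g, basis):
    """Projection of g onto the orthogonal complement of span(basis)."""
    d = direct_projection(g, basis)
    return [x - y for x, y in zip(g, d)]


# Hypothetical orthonormal basis of a 2-D subspace of R^3.
basis = [[0.6, 0.8, 0.0], [-0.8, 0.6, 0.0]]
g = [1.0, 2.0, 3.0]

d = direct_projection(g, basis)   # component inside the subspace
n = null_projection(g, basis)     # component in the null space
```

The two components add back to g, and the null projection is orthogonal to every basis vector.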

The following theorem shows that the projection of ${g}_{0}$ onto the set of all blocking facets σ_{t} always satisfies Propositions 1 and 2. First, let us simplify the notation and use σ to represent σ_{t} in the following section and ${g}_{0}{\downarrow}_{\sigma}$ to stand for ${g}_{0}{\downarrow}_{{\alpha}_{\sigma \left(1\right)}\cap \cdots \cap {\alpha}_{\sigma \left(k\right)}}$ , where $k=\left|\sigma \right|$ is the number of elements in σ.

Theorem 2. ${\left({\tau}_{\sigma \left(j\right)}\right)}^{\text{T}}\left({g}_{0}{\downarrow}_{\sigma}\right)=0$ and ${g}_{0}^{\text{T}}\left({g}_{0}{\downarrow}_{\sigma}\right)\ge 0$ for all $j=1,\cdots ,k$ .

Proof. Since ${g}_{0}{\downarrow}_{\sigma}$ lies on the intersection of $\left({\alpha}_{\sigma \left(1\right)}\cap \cdots \cap {\alpha}_{\sigma \left(k\right)}\right)$ , it lies on each facet ${\alpha}_{\sigma \left(j\right)}$ for $j=1,\cdots ,k$ . Thus ${g}_{0}{\downarrow}_{\sigma}$ is perpendicular to the normal vector of ${\alpha}_{\sigma \left(j\right)}$ (i.e. ${\left({\tau}_{\sigma \left(j\right)}\right)}^{\text{T}}\left({g}_{0}{\downarrow}_{\sigma}\right)=0$ ). So it satisfies Proposition 2.

According to (N.5), ${g}_{0}^{\text{T}}\left({g}_{0}{\downarrow}_{\sigma}\right)\ge 0$ . So it satisfies Proposition 1 too. END

As such, ${g}_{0}{\downarrow}_{\sigma}$ , the projection of ${g}_{0}$ onto all the blocking facets, can be adopted as the next gradient descent vector ${g}_{t}$ .

3.5.2. Selecting the Sliding Gradient

In this section, we explore other projection vectors which also satisfy Propositions 1 and 2. Let the j^{th} complement blocking set ${\sigma}_{j}^{c}$ be the blocking set σ excluding the j^{th} element; i.e. ${\sigma}_{j}^{c}={\alpha}_{1}\cap \cdots \cap {\alpha}_{j-1}\cap {\alpha}_{j+1}\cap \cdots \cap {\alpha}_{k}$ . We examine the projection ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ for $j=1,\cdots ,k$ . Obviously, ${g}_{0}^{\text{T}}\left({g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}\right)\ge 0$ according to (N.5), as ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ is a projection of ${g}_{0}$ . So if ${\left({\tau}_{\sigma \left(j\right)}\right)}^{\text{T}}\left({g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}\right)\ge 0$ for a given $j\in \sigma $ , the projection satisfies Proposition 2 (it is already orthogonal to the normal vectors of the remaining facets in σ) and hence is a candidate for consideration. For all the candidates that satisfy this proposition, including ${g}_{0}{\downarrow}_{\sigma}$ , we can compute the inner product of each candidate with the initial gradient descent vector ${g}_{0}$ (i.e. ${g}_{0}^{\text{T}}\left({g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}\right)$ ) and select the maximum. This inner product is a measure of how close a candidate is to ${g}_{0}$ , so taking the maximum means getting the steepest descent gradient. Notice that if a particular ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ is selected as the next gradient descending vector, the corresponding ${\alpha}_{j}$ is no longer a blocking facet in computing ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ . Thus ${\alpha}_{j}$ needs to be removed from ${\sigma}_{t}$ to form the set of effective blocking facets ${\sigma}_{t}^{*}$ . The set of blocking facets for the next iteration ${\sigma}_{t+1}$ is ${\sigma}_{t}^{*}$ plus the newly encountered blocking facet. In summary, the next gradient descent vector ${g}_{t}$ is:

${g}_{t}=\underset{g\in G}{\mathrm{arg}\,\mathrm{max}}\,{g}_{0}^{\text{T}}g,\text{\hspace{0.17em}}\text{\hspace{0.17em}}G=\left\{{g}_{0}{\downarrow}_{\sigma},{g}_{\left[1\right]},{g}_{\left[2\right]},\cdots ,{g}_{\left[k\right]}\right\}$ (9)

where ${g}_{\left[j\right]}={g}_{0}{\downarrow}_{{\sigma}_{j}^{c}};j\in \sigma $ with ${\left({\tau}_{\sigma \left(j\right)}\right)}^{\text{T}}\left({g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}\right)\ge 0$ and $k=\left|\sigma \right|$ .

The effective blocking set ${\sigma}_{t}^{*}$ is

${\sigma}_{t}^{*}=\left\{\begin{array}{ll}{\sigma}_{t}& \text{if}\text{\hspace{0.17em}}{g}_{t}={g}_{0}{\downarrow}_{\sigma}\\ {\sigma}_{t}\backslash \left\{{\alpha}_{j}\right\}& \text{if}\text{\hspace{0.17em}}{g}_{t}={g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}\end{array}\right.$ . (10)

At first, this seems to increase the computation load substantially. However, we now show that once ${g}_{0}{\downarrow}_{\sigma}$ is computed, ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ can be obtained efficiently.
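The candidate selection of (9) and (10) can be sketched as follows. For clarity this Python fragment recomputes every complement projection from scratch, which is the naive approach and not the efficient update developed in Section 3.5.3. The names `null_projection` and `select_gradient`, the tolerance, and the 2-dimensional data are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def null_projection(g, normals, eps=1e-12):
    """Project g onto the orthogonal complement of span(normals)."""
    basis = []
    for tau in normals:
        o = list(tau)
        for q in basis:
            coef = dot(o, q)
            o = [x - coef * y for x, y in zip(o, q)]
        norm = dot(o, o) ** 0.5
        if norm > eps:
            basis.append([x / norm for x in o])
    out = list(g)
    for q in basis:
        coef = dot(out, q)
        out = [x - coef * y for x, y in zip(out, q)]
    return out


def select_gradient(g0, sigma, normals, eps=1e-12):
    """Pick the steepest admissible candidate and the effective blocking set.

    sigma is a list of facet indices; normals[i] is the normal of facet i.
    """
    best_g = null_projection(g0, [normals[i] for i in sigma])
    best_val = dot(g0, best_g)
    effective = list(sigma)
    for j in sigma:
        rest = [i for i in sigma if i != j]
        cand = null_projection(g0, [normals[i] for i in rest])
        if dot(normals[j], cand) < -eps:
            continue                  # would cross facet j (Proposition 2)
        val = dot(g0, cand)
        if val > best_val + eps:
            best_g, best_val, effective = cand, val, rest
    return best_g, effective


# Hypothetical vertex where facets y1 >= 0 and y2 >= 0 both block.
normals = [[1.0, 0.0], [0.0, 1.0]]
g_t, sigma_eff = select_gradient([1.0, -2.0], [0, 1], normals)
```

In this example the full projection is the zero vector, while dropping the first facet yields an admissible direction along the second facet, so the first facet leaves the effective blocking set.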

3.5.3. Computing the Gradient Projection Vectors

This section discusses a method of computing $g{\downarrow}_{\sigma}$ and $g{\downarrow}_{{\sigma}_{j}^{c}}$ for any vector g. According to (8), $g{\downarrow}_{\sigma}=g{\downarrow}_{{\alpha}_{1}\cap \cdots \cap {\alpha}_{k}}=g-{\displaystyle {\sum}_{i=1}^{k}\left({g}^{\text{T}}{o}_{i}\right){o}_{i}}$ . The orthonormal basis $\left\{{o}_{1},{o}_{2},\cdots ,{o}_{k}\right\}$ can be obtained from the Gram-Schmidt procedure as follows:

${o}_{1}={\tau}_{1};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{o}_{1}={o}_{1}/\left|{o}_{1}\right|$ (11)

${o}_{2}={\tau}_{2}-\left({\tau}_{2}^{\text{T}}{o}_{1}\right){o}_{1};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{o}_{2}={o}_{2}/\left|{o}_{2}\right|$ (12)

Let us introduce the notation $a\downarrow b$ to denote the projection of vector a onto vector b. We have $a\downarrow b=\left(\frac{{a}^{\text{T}}b}{{b}^{\text{T}}b}\right)b$ , then ${o}_{2}={\tau}_{2}-{\tau}_{2}\downarrow {o}_{1}$ as $\left({o}_{1}^{\text{T}}{o}_{1}\right)=1$ . Likewise,

${o}_{j}={\tau}_{j}-\underset{i=1}{\overset{j-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\tau}_{j}\downarrow {o}_{i};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{o}_{j}={o}_{j}/\left|{o}_{j}\right|$ . (13)

Thus from (8),

$g{\downarrow}_{\sigma}=g-\underset{i=1}{\overset{k}{{\displaystyle \sum}}}\text{\hspace{0.05em}}\left({g}^{\text{T}}{o}_{i}\right){o}_{i}=g-\underset{i=1}{\overset{k}{{\displaystyle \sum}}}\text{\hspace{0.05em}}g\downarrow {o}_{i}$ . (14)
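The Gram-Schmidt steps (11) to (13) and the projection (14) can be sketched in pure Python as below. The function names `gram_schmidt` and `project_onto_intersection` and the `eps` threshold for detecting linearly dependent normals are assumptions introduced for this illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(normals: Sequence[Vector], eps: float = 1e-12) -> List[List[float]]:
    """Orthonormalise the facet normals tau_1..tau_k in the given order."""
    basis: List[List[float]] = []
    for tau in normals:
        o = list(tau)
        for q in basis:
            coeff = dot(tau, q)  # tau projected on q, since q has unit length
            o = [oi - coeff * qi for oi, qi in zip(o, q)]
        norm = math.sqrt(dot(o, o))
        if norm < eps:
            raise ValueError("facet normals are linearly dependent")
        basis.append([oi / norm for oi in o])
    return basis


def project_onto_intersection(g: Vector, basis: Sequence[Vector]) -> List[float]:
    """Remove from g its components along every orthonormal basis vector."""
    out = list(g)
    for q in basis:
        coeff = dot(g, q)
        out = [x - coeff * qi for x, qi in zip(out, q)]
    return out
```

With normals `(0, 0, 1)` and `(0, 1, 1)` and `g = (-1, -2, -3)`, the orthonormal basis is the second and third coordinate axes, so only the first component of g survives the projection.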

After evaluating $g{\downarrow}_{\sigma}$ , we can find $g{\downarrow}_{{\sigma}_{j}^{c}}$ backward from $j=k$ to 1. Firstly,

$g{\downarrow}_{{\sigma}_{k}^{c}}=g-\underset{i=1}{\overset{k-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}g\downarrow {o}_{i}=g{\downarrow}_{\sigma}+\text{\hspace{0.17em}}g\downarrow {o}_{k}$ . (15)

Likewise, it can be shown that

$g{\downarrow}_{{\sigma}_{j}^{c}}=g-\underset{i=1}{\overset{j-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}g\downarrow {o}_{i}-\underset{i=j+1}{\overset{k}{{\displaystyle \sum}}}g\downarrow {o}_{i}^{\left(j\right)}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}j=2\text{\hspace{0.17em}}\text{to}\text{\hspace{0.17em}}k$ . (16)

The first summation is projections of g onto existing orthonormal basis ${o}_{i}$ . Each term in this summation has already been computed before and hence is readily available. However, the second summation is projections on new basis ${o}_{i}^{\left(j\right)}$ . Each of these basis must be re-computed as the facet ${\alpha}_{j}$ is skipped in ${\sigma}_{j}^{c}$ . Let

${T}_{k}=g{\downarrow}_{\sigma}+g\downarrow {o}_{k};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{k}=0$ (17)

${T}_{j}={T}_{j+1}+g\downarrow {o}_{j};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{j}=\underset{i=j+1}{\overset{k}{{\displaystyle \sum}}}g\downarrow {o}_{i}^{\left(j\right)}$ . (18)

Then we can obtain $g{\downarrow}_{{\sigma}_{j}^{c}}$ recursively from $j=k,k-1,\cdots ,1$ by:

$g{\downarrow}_{{\sigma}_{j}^{c}}={T}_{j}-{S}_{j}$ . (19)

To compute ${o}_{i}^{\left(j\right)}$ , some of the intermediate results in obtaining the orthonormal basis can also be reused.

Let ${\mu}_{j,0}=0$ for all $j=2,\cdots ,k$ and ${\mu}_{j,i}={\mu}_{j,i-1}+{\tau}_{j}\downarrow {o}_{i}$ for $i=1,\cdots ,j-1$ , then we have

${o}_{j}={\tau}_{j}-{\mu}_{j,j-1};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{o}_{j}={o}_{j}/\left|{o}_{j}\right|\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}j=2,\cdots ,k$ . (20)

The intermediate terms ${\mu}_{j,i}$ can be reused in computing ${o}_{m}^{\left(j\right)}$ as follows:

${o}_{m}^{\left(j\right)}={\tau}_{m}-{\mu}_{m,j-1}-\underset{i=j+1}{\overset{m-1}{{\displaystyle \sum}}}{\tau}_{m}\downarrow {o}_{i}^{\left(j\right)};\text{\hspace{0.17em}}\text{\hspace{0.17em}}{o}_{m}^{\left(j\right)}=\frac{{o}_{m}^{\left(j\right)}}{\left|{o}_{m}^{\left(j\right)}\right|}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}m=j+1,\cdots ,k$ . (21)

By using these intermediate results, the computation load can be reduced substantially.
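The reuse idea behind (15) to (19) can be illustrated with the following Python sketch. It keeps the basis vectors that precede the skipped facet and the partial sums of g already computed for them, and rebuilds only the vectors that follow the skipped facet. It is a simplified illustration under stated assumptions: indices are 0-based, the helper names are hypothetical, and the cached terms ${\mu}_{j,i}$ of (20) and (21) are not stored separately here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(tau: Vector, basis: Sequence[Vector]) -> List[float]:
    """One Gram-Schmidt step: strip the basis components of tau and normalise."""
    o = list(tau)
    for q in basis:
        coeff = dot(tau, q)
        o = [oi - coeff * qi for oi, qi in zip(o, q)]
    norm = math.sqrt(dot(o, o))
    if norm < 1e-12:
        raise ValueError("facet normals are linearly dependent")
    return [oi / norm for oi in o]


def leave_one_out_projections(
    g: Vector, normals: Sequence[Vector]
) -> Tuple[List[float], List[List[float]]]:
    """Return g projected on sigma and, for each j, g projected on sigma_j^c."""
    basis: List[List[float]] = []
    for tau in normals:
        basis.append(orthonormalize(tau, basis))
    # prefix[j] is g minus its components along the first j basis vectors.
    prefix: List[List[float]] = [list(g)]
    for o in basis:
        coeff = dot(g, o)
        prefix.append([x - coeff * oi for x, oi in zip(prefix[-1], o)])
    skipped: List[List[float]] = []
    for j in range(len(normals)):
        rebuilt = list(basis[:j])  # vectors before facet j are reused as is
        vec = prefix[j]            # partial sum already available
        for tau in normals[j + 1:]:
            o_new = orthonormalize(tau, rebuilt)  # rebuilt without facet j
            rebuilt.append(o_new)
            coeff = dot(g, o_new)
            vec = [x - coeff * oi for x, oi in zip(vec, o_new)]
        skipped.append(vec)
    return prefix[-1], skipped
```

When the last facet is skipped, no basis vector needs to be rebuilt and the result is simply the stored partial sum, which matches (15).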

3.6. Termination Criterion

When a new blocking facet is encountered, it will be added to the existing set of blocking facets. Hence both ${\sigma}_{t}$ and ${\sigma}_{t}^{*}$ will typically grow in each iteration unless one of the ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ is selected as ${g}_{t}$ . In this case, ${\alpha}_{\sigma \left(j\right)}$ is deleted from ${\sigma}_{t}$ according to (10). The following theorem, which was first presented in [1], shows that when $\left|{\sigma}_{t}^{*}\right|=m$ , the algorithm can stop.

Theorem 3 (Stopping criterion) Assume that the dual feasible region 𝒟 is non-empty. Let ${P}_{t}\in \mathcal{D}$ be descending along the initial direction ${g}_{0}=-b$ , and let $\left|{\sigma}_{t}^{*}\right|$ be the number of effective blocking facets in ${\sigma}_{t}^{*}$ at the t^{th} iteration. If $\left|{\sigma}_{t}^{*}\right|=m$ and the rank $r\left({\sigma}_{t}^{*}\right)=m$ , then ${P}_{t}$ is a lowest point in the dual feasible region 𝒟.

Proof. If $\left|{\sigma}_{t}^{*}\right|=m$ and the rank $r\left({\sigma}_{t}^{*}\right)=m$ , then the m facets in ${\sigma}_{t}^{*}$ form a cone C with vertex $V={P}_{t}$ . Since the rank is m, its corresponding null space contains only the zero vector. So ${g}_{0}{\downarrow}_{\sigma}={g}_{0}{\downarrow}_{{\alpha}_{\sigma \left(1\right)}\cap \cdots \cap {\alpha}_{\sigma \left(k\right)}}=0$ .

As mentioned about the facet/edge duality in Section 2, for $i=1,\cdots ,m$ , edge-line ${R}_{i}^{+}$ is the intersection of all C-facets except ${\alpha}_{i}$ . That means ${R}_{i}^{+}={\sigma}_{i}^{c}$ . Since an edge-line is a 1-dimensional line, the projection of a vector ${g}_{0}$ onto ${R}_{i}^{+}$ equals $\pm {r}_{i}$ and hence ${g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}={g}_{t}{\downarrow}_{{R}_{i}^{+}}=\pm {r}_{i}$ . Since ${g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}$ are projections of ${g}_{0}$ , according to (N.5), ${g}_{0}^{\text{T}}\left({g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}\right)\ge 0$ .

Since $\left|{\sigma}_{t}^{*}\right|=m$ , it means that ${g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}$ does not satisfy Proposition 2 for all $i=1,\cdots ,m$ . Otherwise, one of the ${g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}$ would have been selected as the next gradient descend vector and, according to (10), the corresponding facet would be deleted from ${\sigma}_{t}^{*}$ and hence $\left|{\sigma}_{t}^{*}\right|$ would be less than m. This means that at least one of $j\in \sigma $ has a value ${\tau}_{j}^{\text{T}}\left({g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}\right)<0$ . However, for all $k\ne i$ , ${g}_{t}{\downarrow}_{{\sigma}_{i}^{c}}$ is in the null space of

${\left(-{g}_{0}\right)}^{\text{T}}{r}_{i}$ means that edge ${r}_{i}$ is in opposite direction of ${g}_{0}$ . As this is true for all edges, there is no path for ${g}_{t}$ to descend further from this vertex. It is obvious that the vertex V is the lowest point of C when viewed in the b direction.

Since ${P}_{t}$ is dual feasible, V is a vertex of 𝒟. Because cone C coincides with the dual feasible region 𝒟 in a neighborhood N of V, ${P}_{t}$ is the lowest point of 𝒟 when viewed in the b direction. END

In essence, when the optimal vertex ${V}^{*}$ is reached, all the edges of the cone will be in opposite direction of the gradient vector ${g}_{0}=-b$ . There is no path to descend further so the algorithm terminates.
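The test in Theorem 3 amounts to a size check plus a rank check on the normals of the effective blocking facets. The sketch below illustrates one way to code it, assuming the normals are given as rows of numbers; the function names and the tolerance `tol` are hypothetical, and the rank is obtained by plain Gaussian elimination with partial pivoting rather than by any method stated in the paper.

```python
from typing import List, Sequence


def rank(rows: Sequence[Sequence[float]], tol: float = 1e-9) -> int:
    """Rank of a small dense matrix via Gaussian elimination."""
    a: List[List[float]] = [[float(x) for x in r] for r in rows]
    ncols = len(a[0]) if a else 0
    r = 0
    for col in range(ncols):
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(r, len(a)), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, len(a)):
            factor = a[i][col] / a[r][col]
            a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == len(a):
            break
    return r


def can_stop(effective_normals: Sequence[Sequence[float]], m: int) -> bool:
    """True when there are m effective blocking facets of full rank m."""
    return len(effective_normals) == m and rank(effective_normals) == m
```

The rank condition matters: three facets in three dimensions whose normals include two parallel vectors do not pin down a vertex, so `can_stop` returns `False` for them.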

4. The Pseudo Code of the Sliding Gradient Algorithm

The entire algorithm is summarized as follows in Table 2.

Table 2. The sliding gradient algorithm.

Step 0 is the initialization step that sets up the tableau and the starting point P. Step 2 is to find a set of initial blocking facets σ in preparation of step 4. In the inner loop, Step 4 calls the Gradient Select routine. It computes ${g}_{0}{\downarrow}_{\sigma}$ and ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ in view of σ using Equations (11) to (21) and selects the best gradient vector g according to (9). This routine not only returns g but also the effective blocking facets ${\sigma}^{*}$ and ${g}_{0}{\downarrow}_{{\sigma}_{j}^{c}}$ for subsequent use. Theorem 3 states that when the size of ${\sigma}^{*}$ reaches m, the optimal point is reached. So when it does, step 5 returns the optimal point and the optimal value to the calling routine. Step 6 is to find the closest blocking facet according to (6). Because P lies on every facet of σ, ${t}_{j}=0$ for $j\in \sigma $ . Hence, we only need to compute those ${t}_{j}$ where $j\notin \sigma $ . The newly found blocking facet is then included in σ in step 7 and the inner loop is repeated until the optimal vertex is found.
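Step 6 can be sketched as follows, using the displacement formula ${t}_{j}=\left({c}_{j}-{P}^{\text{T}}{\tau}_{j}\right)/\left({g}^{\text{T}}{\tau}_{j}\right)$ that appears in Section 5.1. Several points are assumptions made for this illustration: the feasible side of each facet is taken to be ${y}^{\text{T}}{\tau}_{j}\ge {c}_{j}$ , facets with ${g}^{\text{T}}{\tau}_{j}\ge 0$ are treated as never blocking, ties are collected within a tolerance `tol`, and the function name and dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def closest_blocking_facets(
    P: Vector,
    g: Vector,
    facets: Dict[int, Tuple[Vector, float]],
    sigma: Set[int],
    tol: float = 1e-12,
) -> Tuple[Optional[float], List[int]]:
    """Return the smallest displacement and the facets that attain it."""
    best_t: Optional[float] = None
    hit: List[int] = []
    for j, (tau, c) in facets.items():
        if j in sigma:
            continue  # P already lies on this facet, so t_j = 0
        denom = dot(g, tau)
        if denom >= -tol:
            continue  # moving parallel to or away from the facet
        t = (c - dot(P, tau)) / denom
        if best_t is None or t < best_t - tol:
            best_t, hit = t, [j]
        elif abs(t - best_t) <= tol:
            hit.append(j)  # tie: more than one facet is reached together
    return best_t, hit
```

For the planar region bounded by $x\ge 0$ , $y\ge 0$ and $x+y\ge 1$ , starting from $P=(2,2)$ with $g=(-1,-1)$ , the third facet is reached first.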

5. Implementation and Experimental Results

5.1. Experiment on the Klee-Minty Problem

^{1}Other derivations of the Klee-Minty formulas have also been tested and the same results are obtained.

We use the Klee-Minty example presented in [18] ^{1} to walk through the algorithm in this section. An instance of the Klee-Minty polytope problem is shown below:

$\mathrm{max}{2}^{m-1}{x}_{1}+{2}^{m-2}{x}_{2}+\cdots +2{x}_{m-1}+{x}_{m}$ .

For the standard simplex method, it needs to visit all ${2}^{m-1}$ vertices to find the optimal solution. Here we show that, with a specific choice of initial point ${P}_{0}$ , the Sliding Gradient algorithm can find the optimal solution in two iterations, no matter what the dimension m is.

To apply the Sliding Gradient algorithm, we first construct the tableau. For an example with $m=5$ , the simplex tableau is:

The b vector is $b={\left[5,25,125,625,3125\right]}^{\text{T}}$ . After adding the slack variables, the facet tableau becomes:

Firstly, notice that ${\alpha}_{5}$ and ${\alpha}_{10}$ have the same normal vector (i.e. ${\tau}_{5}={\tau}_{10}$ ) so we can exclude ${\alpha}_{10}$ from further consideration. This is true for all values of m.

If we choose ${P}_{0}=Mb$ , where M is a positive number (e.g. $M=100$ ), it can be shown that ${P}_{0}$ is inside the dual feasible region. The initial gradient descend vector is: ${g}_{0}=-b$ .

With ${P}_{0}$ and ${g}_{0}$ as initial conditions, the algorithm proceeds to find the first blocking facet using (6). The displacements ${t}_{j}$ for each facet can be found by:

${t}_{j}=\frac{{c}_{j}-{P}_{0}^{\text{T}}{\tau}_{j}}{{g}_{0}^{\text{T}}{\tau}_{j}}=\frac{{c}_{j}}{-{b}^{\text{T}}{\tau}_{j}}-\frac{M{b}^{\text{T}}{\tau}_{j}}{-{b}^{\text{T}}{\tau}_{j}}=\frac{{c}_{j}}{-{b}^{\text{T}}{\tau}_{j}}+M=M-\frac{{c}_{j}}{{b}^{\text{T}}{\tau}_{j}}$ . (22)

We now show that the minimum of all displacements is ${t}_{m}$ .

First of all, at $j=m$ , ${\tau}_{m}={\left[0,\cdots ,0,1\right]}^{\text{T}}$ , ${c}_{m}=1$ and ${b}_{m}={5}^{m}$ , so ${t}_{m}=M-{5}^{-m}$ .

For $m<j\le 2m-1$ , ${c}_{j}=0$ , so ${t}_{j}=M>{t}_{m}$ .

For $1\le j<m$ , ${c}_{j}={2}^{m-j}$ , and the elements of ${\tau}_{j}$ are:

${\tau}_{ij}=\{\begin{array}{l}0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}i<j\\ 1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}i=j\\ {2}^{i-j+1}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}j<i\le m\end{array}$ .

The 2^{nd} term of Equation (22) can be re-written as:

$\frac{{c}_{j}}{{b}^{\text{T}}{\tau}_{j}}=\frac{1}{{b}^{\text{T}}\left(\frac{{\tau}_{j}}{{c}_{j}}\right)}=\frac{1}{{b}^{\text{T}}\left(\frac{{\tau}_{j}}{{2}^{m-j}}\right)}$

The inner product of the denominator is:

${b}^{\text{T}}\left(\frac{{\tau}_{j}}{{2}^{m-j}}\right)=\underset{i=1}{\overset{m}{{\displaystyle \sum}}}\text{\hspace{0.05em}}{b}_{i}\left(\frac{{\tau}_{ij}}{{2}^{m-j}}\right)=\underset{i=1}{\overset{m-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}{b}_{i}\left(\frac{{\tau}_{ij}}{{2}^{m-j}}\right)+{b}_{m}\left(\frac{{2}^{m-j+1}}{{2}^{m-j}}\right)=\underset{i=1}{\overset{m-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}{b}_{i}\left(\frac{{\tau}_{ij}}{{2}^{m-j}}\right)+2{b}_{m}$

Since all the elements of the b vector are positive and those of ${\tau}_{j}$ are non-negative, with ${\tau}_{jj}=1$ , the summation is a positive number. Thus

${b}^{\text{T}}\left(\frac{{\tau}_{j}}{{2}^{m-j}}\right)=\underset{i=1}{\overset{m-1}{{\displaystyle \sum}}}\text{\hspace{0.05em}}{b}_{i}\left(\frac{{\tau}_{ij}}{{2}^{m-j}}\right)+2{b}_{m}>{b}_{m}$

Since the value of the denominator is bigger than ${b}_{m}={5}^{m}$ , we have

$\frac{1}{{b}^{\text{T}}\left(\frac{{\tau}_{j}}{{2}^{m-j}}\right)}<{5}^{-m}$

So

${t}_{j}=M-\frac{{c}_{j}}{{b}^{\text{T}}{\tau}_{j}}=M-\frac{1}{{b}^{\text{T}}\left(\frac{{\tau}_{j}}{{2}^{m-j}}\right)}>M-{5}^{-m}={t}_{m}$ .

Hence ${t}_{m}$ is the smallest displacement. For the case of $m=5$ , their values are shown in the first row (first iteration) of the following Table 3.
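As a quick numerical check of this ordering (a worked example assuming $m=5$ and $M=100$ , with ${\tau}_{j}$ and ${c}_{j}$ taken from the definitions above), the subtracted term ${c}_{j}/\left({b}^{\text{T}}{\tau}_{j}\right)$ in (22) evaluates to:

$\frac{{c}_{1}}{{b}^{\text{T}}{\tau}_{1}}=\frac{16}{111105}\approx 1.440\times {10}^{-4},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{{c}_{2}}{{b}^{\text{T}}{\tau}_{2}}=\frac{8}{55525}\approx 1.441\times {10}^{-4},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{{c}_{3}}{{b}^{\text{T}}{\tau}_{3}}=\frac{4}{27625}\approx 1.448\times {10}^{-4}$

$\frac{{c}_{4}}{{b}^{\text{T}}{\tau}_{4}}=\frac{2}{13125}\approx 1.524\times {10}^{-4},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\frac{{c}_{5}}{{b}^{\text{T}}{\tau}_{5}}=\frac{1}{3125}=3.2\times {10}^{-4}$

The largest subtracted term belongs to $j=5$ , so ${t}_{5}=100-3.2\times {10}^{-4}$ is the smallest displacement, while ${t}_{6}$ to ${t}_{9}$ all equal 100.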

Thus ${\alpha}_{m}$ is the closest blocking facet. Hence, ${\sigma}_{1}^{*}={\sigma}_{1}=\left\{{\alpha}_{m}\right\}$ . For the next iteration,

${P}_{1}={P}_{0}+{t}_{m}{g}_{0}=Mb+\left(M-{5}^{-m}\right)\left(-b\right)=Mb-Mb+{5}^{-m}b={5}^{-m}b$ .

The gradient vector ${g}_{1}$ is the projection of ${g}_{0}$ onto ${\alpha}_{m}$ . Because ${\tau}_{m}={\left[0,\cdots ,1\right]}^{\text{T}}$ is already a unit vector, we have according to (8)

${g}_{1}={g}_{0}-\left({g}_{0}^{\text{T}}{\tau}_{m}\right){\tau}_{m}={g}_{0}-{\left[0,\cdots ,-{5}^{m}\right]}^{\text{T}}={\left[-5,-{5}^{2},\cdots ,-{5}^{m-1},0\right]}^{\text{T}}$ .

In other words, ${g}_{1}$ is the same as $-b$ except that the last element is zeroed out. Using ${P}_{1}$ and ${g}_{1}$ , the algorithm proceeds to the next iteration and evaluates the displacements ${t}_{j}$ again. For $j=m+1$ to $2m-1$ , since ${c}_{j}=0$ and ${\tau}_{j}$ is a unit vector with only one non-zero entry at the j^{th} element,

${t}_{j}=-\frac{{P}_{1}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}=-\frac{{5}^{-m}{b}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}=-{5}^{-m}\frac{{b}_{j}}{-{b}_{j}}={5}^{-m}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}m+1\le j\le 2m-1$ .

Thus the displacements ${t}_{m+1}$ to ${t}_{2m-1}$ have the same value of ${5}^{-m}$ .

For $1\le j<m$ , we have:

${t}_{j}=\frac{{c}_{j}-{P}_{1}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}=\frac{{c}_{j}-{5}^{-m}{b}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}={5}^{-m}\frac{{5}^{m}{c}_{j}-{b}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}$ .

As mentioned before, ${g}_{1}$ is the same as $-b$ except that the last element is zero, we can express ${b}^{\text{T}}{\tau}_{j}$ in terms of ${g}_{1}^{\text{T}}{\tau}_{j}$ as follows:

${b}^{\text{T}}{\tau}_{j}=-{g}_{1}^{\text{T}}{\tau}_{j}+{b}_{m}{\tau}_{mj}$ .

The numerator then becomes:

${5}^{m}{c}_{j}-{b}^{\text{T}}{\tau}_{j}={5}^{m}{c}_{j}+{g}_{1}^{\text{T}}{\tau}_{j}-{b}_{m}{\tau}_{mj}$ .

Since ${c}_{j}={2}^{m-j}$ , ${b}_{m}={5}^{m}$ and ${\tau}_{mj}={2}^{m-j+1}$ , substituting these values into the above equation, the numerator becomes

${5}^{m}{c}_{j}-{b}^{\text{T}}{\tau}_{j}={5}^{m}{2}^{m-j}-{5}^{m}{2}^{m-j+1}+{g}_{1}^{\text{T}}{\tau}_{j}={g}_{1}^{\text{T}}{\tau}_{j}-{5}^{m}{2}^{m-j}$ .

Thus

${t}_{j}={5}^{-m}\frac{{5}^{m}{c}_{j}-{b}^{\text{T}}{\tau}_{j}}{{g}_{1}^{\text{T}}{\tau}_{j}}={5}^{-m}\left(1-\frac{{5}^{m}{2}^{m-j}}{{g}_{1}^{\text{T}}{\tau}_{j}}\right)$ .

Table 3. Displacement values ${t}_{i}$ in each iteration for $m=5$ .

Notice that all elements in ${g}_{1}$ are non-positive and all elements of ${\tau}_{j}$ are non-negative, and for $j<m$ the j^{th} elements of both are non-zero. So the inner product ${g}_{1}^{\text{T}}{\tau}_{j}$ is a negative number. As a result, the last term inside the bracket is a positive number which makes the whole value inside the bracket bigger than one and hence ${t}_{j}>{5}^{-m}$ for $1\le j<m$ . Moreover, ${t}_{m}$ is zero as ${g}_{1}$ lies on ${\alpha}_{m}$ . The actual displacement values for the case of $m=5$ are shown in the second row of Table 3.

Since ${t}_{m+1}$ to ${t}_{2m-1}$ have the same lowest displacement value, all of them are blocking facets, so ${\sigma}_{2}^{*}={\sigma}_{2}=\left\{{\alpha}_{m}\right\}\cup \left\{{\alpha}_{m+1},\cdots ,{\alpha}_{2m-1}\right\}=\left\{{\alpha}_{m},{\alpha}_{m+1},\cdots ,{\alpha}_{2m-1}\right\}$ . Also,

${P}_{2}={P}_{1}+{t}_{m+1}{g}_{1}={5}^{-m}b+{5}^{-m}{\left[-5,-{5}^{2},\cdots ,-{5}^{m-1},0\right]}^{\text{T}}={\left[0,0,\cdots ,0,1\right]}^{\text{T}}$ .

Now $\left|{\sigma}_{t}^{*}\right|=m$ , so ${P}_{2}$ has reached a vertex of a cone. According to Theorem 3, the algorithm stops. The optimal value is ${P}_{2}^{\text{T}}b={5}^{m}$ , which is the last element of the b vector.

Thus with a specific choice of the initial point ${P}_{0}=Mb$ , the Sliding Gradient algorithm can solve the Klee-Minty LP problem in two iterations, and it is independent of m.
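The two-iteration walk-through above can be replayed in exact rational arithmetic. The sketch below is an illustration under stated assumptions, not the authors' MATLAB code: it rebuilds the facets ${\alpha}_{1}$ to ${\alpha}_{2m-1}$ from the formulas for ${\tau}_{ij}$ and ${c}_{j}$ given in this section, takes the non-negativity facets ${\alpha}_{m+i}$ to have the i^{th} unit vector as normal, skips facets with ${g}^{\text{T}}{\tau}_{j}\ge 0$ , and uses hypothetical function names throughout.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Facet = Tuple[List[Fraction], Fraction]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def klee_minty_dual_facets(m: int) -> Dict[int, Facet]:
    """Facets alpha_1..alpha_{2m-1}, keyed by their 1-based index."""
    facets: Dict[int, Facet] = {}
    for j in range(1, m + 1):
        tau = [Fraction(0)] * m
        tau[j - 1] = Fraction(1)
        for i in range(j + 1, m + 1):
            tau[i - 1] = Fraction(2 ** (i - j + 1))
        facets[j] = (tau, Fraction(2 ** (m - j)))
    for i in range(1, m):  # non-negativity facets, c_j = 0
        tau = [Fraction(0)] * m
        tau[i - 1] = Fraction(1)
        facets[m + i] = (tau, Fraction(0))
    return facets


def displacements(P, g, facets: Dict[int, Facet]) -> Dict[int, Fraction]:
    """t_j from (22) for every facet that the direction g approaches."""
    out: Dict[int, Fraction] = {}
    for j, (tau, c) in facets.items():
        denom = dot(g, tau)
        if denom < 0:
            out[j] = (c - dot(P, tau)) / denom
    return out


def two_iterations(m: int, M: int = 100):
    b = [Fraction(5) ** i for i in range(1, m + 1)]
    facets = klee_minty_dual_facets(m)
    P0, g0 = [M * x for x in b], [-x for x in b]
    t1 = displacements(P0, g0, facets)
    first = min(t1, key=t1.get)  # expected to be facet m
    P1 = [p + t1[first] * g for p, g in zip(P0, g0)]
    tau = facets[first][0]       # a unit vector when first == m
    coeff = dot(g0, tau)
    g1 = [g - coeff * t for g, t in zip(g0, tau)]
    t2 = displacements(P1, g1, facets)
    t_min = min(t2.values())
    second = sorted(j for j, t in t2.items() if t == t_min)
    P2 = [p + t_min * g for p, g in zip(P1, g1)]
    return first, second, P2, dot(P2, b)
```

Calling `two_iterations(5)` reproduces the path described in the text: the first blocking facet is ${\alpha}_{5}$ , the second iteration hits ${\alpha}_{6}$ to ${\alpha}_{9}$ simultaneously, and the final point has objective value ${5}^{5}$ . Exact fractions are used here so that the ties in the second iteration are detected without a tolerance.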

5.2. Issues in Algorithm Implementation

The Sliding Gradient Algorithm has been implemented in MATLAB and tested on the Klee-Minty problems and also on self-generated LP problems with random coefficients. As a real number can only be represented with finite precision in a digital computer, care must be taken to deal with round-off. For example, when a point P lies on a plane ${y}^{\text{T}}\tau =c$ , the value $d={P}^{\text{T}}\tau -c$ should be exactly zero. But in an actual implementation, it may be a very small positive or negative number. Hence in step 2 of the aforementioned algorithm, we need to set a threshold δ so that if $\left|d\right|<\delta $ , we regard point P as lying on the plane. Likewise for the Klee-Minty problem, this algorithm relies on the fact that in the second iteration, the displacement values ${t}_{i}$ for $i=m+1$ to $2m-1$ should be the same and they should all be smaller than the values of ${t}_{j}$ for $j=1$ to $m-1$ . Due to round-off errors, we need to set a tolerance level to treat the first group as equal, and yet if this tolerance level is set too high, it cannot exclude members of the second group. The issue is more acute as m increases: higher and higher precision is required in setting the tolerance level to distinguish these two groups.
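The trade-off in choosing the tolerance can be illustrated with a small Python sketch. The function name `tied_minima` and the use of a relative tolerance through `math.isclose` are assumptions made for this illustration; the paper does not state how its MATLAB implementation compares displacements.

```python
import math
from typing import Dict, List


def tied_minima(displacements: Dict[int, float], rel_tol: float = 1e-9) -> List[int]:
    """Indices whose displacement equals the minimum up to a relative tolerance."""
    t_min = min(displacements.values())
    return sorted(
        j for j, t in displacements.items()
        if math.isclose(t, t_min, rel_tol=rel_tol)
    )
```

With three displacements that differ only by round-off and a fourth that is about 3% larger, a tight tolerance such as `1e-9` groups the first three and excludes the fourth, whereas a loose tolerance such as `0.05` wrongly admits the fourth as well.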

6. Conclusions and Future Work

We have presented a new approach to tackle the linear programming problem in this paper. It is based on the gradient descend principle. For any initial point inside the feasible region, it will pass through the interior of the feasible region to reach the optimal vertex. This is made possible by projecting the gravity vector onto a set of blocking facets and using that as the descending vector in each iteration. In fact, the descending trajectory is a sequence of line segments that hug either a single blocking facet or the intersections of them, and each line segment advances towards the optimal point. It should be noted that there are no parameters (such as step size) to tune in this algorithm, although one needs to take care of numerical round-off in an actual implementation.

This work opens up many areas of future research. On the one hand, we are extending this algorithm so that it can relax the constraint of starting from a point inside the feasible region. Promising development has been achieved in this area, though more thorough testing on obscure cases needs to be carried out.

On the theoretical front, we are encouraged that, from the algorithm walk-through on the Klee-Minty example, this algorithm exhibits strongly polynomial complexity characteristics. Its complexity does not appear to depend on the bit sizes of the LP coefficients. However, more rigorous proof is needed and we are working towards this goal.

Acknowledgements

The authors wish to thank all their friends for their valuable critiques and comments on the research. Special thanks are given to Prof. Yong Shi and Prof. Sizong Guo for their support. This study is partially supported by the grants (Grant Nos. 61350003, 70621001, 70531040, 90818025) from the Natural Science Foundation of China, and grant (Grant No. L2014133) from Department of Education of Liaoning Province.

References

[1] Wang, P.Z., Lui, H.C., Liu, H.T. and Guo, S.C. (2017) Gravity Sliding Algorithm for Linear Programming. Annals of Data Science, 4, 193-210.

https://doi.org/10.1007/s40745-017-0108-1

[2] Dantzig, G.B. (1963) Linear Programming and Extensions. Princeton University Press, Princeton.

https://doi.org/10.1515/9781400884179

[3] Megiddo, N. (1987) On the Complexity of Linear Programming. In: Bewley, T., Ed., Advances Economic Theory, 5th World Congress, Cambridge University Press, Cambridge, 225-268.

https://doi.org/10.1017/CCOL0521340446.006

[4] Megiddo, N. (1984) Linear Programming in Linear Time When the Dimension Is Fixed. Journal of ACM, 31, 114-127. https://doi.org/10.1145/2422.322418

[5] Smale, S. (1983) On the Average Number of Steps of the Simplex Method of Linear Programming. Mathematical Programming, 27, 251-262.

https://doi.org/10.1007/BF02591902

[6] Todd, M. (1986) Polynomial Expected Behavior of a Pivoting Algorithm for Linear Complementarity and Linear Programming Problems. Mathematical Programming, 35, 173-192.

https://doi.org/10.1007/BF01580646

[7] Klee, V. and Minty, G.J. (1972) How Good Is the Simplex Method. In: Shisha, O., Ed., Inequalities III, Academic Press, New York, 159-175.

[8] Chvatal, V. (1983) Linear Programming. W.H. Freeman and Company, New York.

[9] Goldfarb, D. and Sit, W. (1979) Worst Case Behavior of the Steepest Edge Simplex Method. Discrete Applied Mathematics, 1, 277-285.

https://doi.org/10.1016/0166-218X(79)90004-0

[10] Jeroslow, R. (1973) The Simplex Algorithm with the Pivot Rule of Maximizing Improvement Criterion. Discrete Mathematics, 4, 367-377.

https://doi.org/10.1016/0012-365X(73)90171-4

[11] Zadeh, N. (1980) What Is the Worst Case Behavior of the Simplex Algorithm? Technical Report 27, Dept. Operations Research, Stanford University, Stanford.

[12] Karmarkar, N.K. (1984) A New Polynomial-Time Algorithm for Linear Programming. Combinatorica, 4, 373-395. https://doi.org/10.1007/BF02579150

[13] Deza, A., Nematollahi, E. and Terlaky, T. (2008) How Good Are Interior Point Methods? Klee-Minty Cubes Tighten Iteration-Complexity Bounds. Mathematical Programming, 113, 1-14.

[14] Barasz, M. and Vempala, S. (2010) A New Approach to Strongly Polynomial Linear Programming.

[15] Amenta, N. and Ziegler, G. (1999) Deformed Products and Maximal Shadows of Polytopes. Contemporary Mathematics, 223, 57-90.

https://doi.org/10.1090/conm/223/03132

[16] Wang, P.Z. (2011) Cone-Cutting: A Variant Representation of Pivot in Simplex. Information Technology & Decision Making, 10, 65-82.

[17] Wang, P.Z. (2014) Discussions on Hirsch Conjecture and the Existence of Strongly Polynomial-Time Simplex Variants. Annals of Data Science, 1, 41-71.

https://doi.org/10.1007/s40745-014-0005-9

[18] Greenberg, H.J. (1997) Klee-Minty Polytope Shows Exponential Time Complexity of Simplex Method. University of Colorado at Denver, Denver.