The alternating least squares (ALS) method has several important applications, e.g.,  - . The origin of the method lies in the field of statistical Principal Components Analysis, e.g.,    . In the modern era, it is widely used in problems where standard SVD methods are not applicable. These problems include, for example, nonnegative matrix factorization     , matrix completion problems        , and tensor approximations      . In this note, we consider the ALS method as means for calculating low-rank approximations of large sparse matrices. Let be a given large sparse matrix, let k be a given integer (the desired matrix rank) which is considerably smaller than , and let
denote the set of all the real matrices whose rank is at most k. Then the term “low-rank approximation” refers to a matrix that approximates A. More precisely, we seek a matrix that solves the problem
where denotes the Frobenius matrix norm. Recall that a rank-k truncated SVD of A provides a solution of (1.1). However, when A is a large sparse matrix, a standard SVD of A can be “too expensive”. This motivates the use of “cheaper” methods which are able to exploit the sparsity of A. The ALS algorithm is aimed at solving the problem
Note that (1.1) and (1.2) are equivalent in the sense that a solution of one problem provides a solution to the other problem. The idea behind the ALS algorithm is rather simple. The iteration, is composed of the following two steps.
Step 1: Given compute to be a solution of the least squares problem
Step 2: Given , compute to be a solution of the least squares problem
The details of the ALS iteration are discussed in the next section.
The orthogonal iterations method has different aim and different motivation. Let be a given symmetric matrix. Then this method is aimed at computing k dominant eigenvectors of G. It is best suited for handling large sparse matrices in which a matrix-vector product needs only flops. It is also assumed that k is considerably smaller than n. Other names of this method are “subspace iterations”, “simultaneous iterations” and “block-Power method”, e.g.,           . The idea behind this method is to use a block version of the Power method that includes frequent orthogonalizations. The iteration, , is composed of the following two steps. It starts with a matrix that has orthonormal columns. The column space of approximates the desired invariant subspace of G.
Step 1: Given , compute the matrix
Step 2: Compute to be a matrix whose columns constitute an orthonormal basis of . In practice is obtained by applying a QR factorization of .
Using the Rayleigh-Ritz procedure it is possible to extract from the corresponding estimates of the desired eigenpairs of G. The details are discussed in Section 3.
The aim of this note is to show that ALS is closely related to “orthogonal iterations”. To see this relation we assume that . In this case the two methods generate the same sequence of subspaces, and the same sequence of low-rank approximations. The proof is given in Section 4.
The equivalence relations that we derive provide important insight into the behavior of both methods. In particular, as explained in Section 3, the rate of convergence of orthogonal iterations is determined by ratios between certain eigenvalues of G. This implies that the rate of convergence of the ALS method obeys a similar rule. Moreover, there are several ways to accelerate orthogonal iterations, and these methods can be adapted to accelerate the ALS method. Conversely, being a minimization method the objective function of the ALS method is monotonic decreasing. This suggests that the orthogonal iterations method has an analogous property.
The relation between ALS and the block-Power method was recently observed by Jain et al.  in the context of matrix completion algorithms. A further discussion of this relation is given in Hardt  . However, the observations made in these works are using several assumptions on the data matrix. For example, it is assumed that A has missing entries, and that the locations of the missing entries (or the known entries) satisfy certain statistical conditions. It is also assumed that the singular vectors of A satisfy a certain coherence requirement, that A is a low-rank matrix, and in   it is assumed to be symmetric. In contrast, our analysis makes no assumption on A. Consequently, the algorithms considered in    are quite different from the classic versions that are discussed below, which yield different results. Anyway, an important consequence made in    is that the convergence properties of orthogonal iterations can be used to understand the behavior of ALS when applied to matrix completion problems. The equivalence relations that we derive in the next sections help to achieve this goal.
2. The Alternating Least Squares (ALS) Method
In this section we describe two versions of the ALS iteration. The basic scheme solves the linear systems by using a QR factorization that is followed by a back substitution process, while the modified scheme avoids back substitution. This reduces the computational effort per iteration and helps to see the relation with orthogonal iterations.
The first step of the basic iteration requires the solution of (1.3). Let , denote the ith row of A, and let , denote the ith row of . Then, by comparing against we see that solves the linear least squares problem
where denotes the Euclidean vector norm. The solution of (2.1) is carried out by applying a QR factorization of of the form
where , , and is an upper triangular matrix.
Similar arguments enable us to solve (1.4). Let , denote the jth column of A, and let, denote the jth row of. Then by comparing against we see that solves the linear least squares problem
The computation of is carried out by applying a QR factorization of that has the form
where, and is an upper triangular matrix.
Assume for simplicity that the matrices and have full column rank, which means that and are invertible matrices. This enables us to express the solutions of Problem (2.1) and (2.3) as
respectively. In practice, a matrix-vector product of the form is computed by solving the system via back substitution. In matrix notations the above equalities are summarized as
Below we describe a modified scheme which save the multiplications by and. (That is, it avoids the back substitution processes.)
The modified scheme is based on the following observations. Let be any invertible matrix, then
The last equality allows us to replace with, which turns (1.3) into the form
while the last problem has explicit solution,
Similarly, it is possible to replace with, which turns (1.4) into the form
while the last problem has explicit solution,
The modified ALS iteration is summarized in the following two steps.
Step 1: Given compute the QR factorization (2.2) and obtain from (2.11).
Step 2: Given compute the QR factorization (2.4) and obtain from (2.13).
Let, denote the rank-k approximations that are generated by the basic ALS iterations (2.7) - (2.8). Then, in exact arithmetic, the modified scheme generates the same sequence of approximations. Consequently since (2.7) - (2.8) are repeatedly solving (1.3) and (1.4), the sequence, has the decreasing property
Moreover, as we now show, both versions satisfy
Consider first the iteration (2.7) - (2.8). Then combining (2.4) and (2.7) yields
while substituting the last expression for into (2.8) gives
Thus, in exact arithmetic (2.15) holds. Similarly, in the modified scheme (2.4) and (2.11) give
while from (2.13) we obtain
In the next sections we shall see that orthogonal iterations share a similar property.
Observe that the modified ALS iteration is not using the matrices and. This feature helps to overcome a possible rank deficiency of or. In classical Gram-Schmidt orthogonalization rank deficiency may cause a gradual loss of orthogonality, but this can be avoided by reorthogonalization and column pivoting, or by using Householder QR with column pivoting. For detailed discussions of the QR factorization see      . The modified ALS iteration has recently been considered in Oseledets et al.  under the name “simultaneous orthogonal iterations”. It is shown there that the modified version is equivalent to ALS and, therefore, has the same rate of convergence.
3. Orthogonal Iterations
Let be a real symmetric matrix with eigenvalues
Then the orthogonal iterations method is aimed at calculating the k largest eigenvalues of G, , and the corresponding eigenvectors. The iteration, , is composed of the following two steps.
Step 1: Given a matrix, compute a QR factorization of:
Step 2: Compute from the rule
The approximation of the desired eigenpairs is achieved by applying the Rayleigh-Ritz procedure. For this purpose the basic iteration is extended with the following three steps.
Step 3: Compute the related Rayleigh-quotient matrix
Step 4: Compute a spectral decomposition of:
where, and is a diagonal matrix such that
(The diagonal entries of are called “Ritz values”.)
Step 5: If desired compute the related matrix of k “Ritz vectors”,
It is important to note that Steps 3 - 5 are not essential for the computation of. Hence it is possible to defer these steps. The orthogonal iterations method and its convergence properties are derived in the pioneering works of Bauer , Jennings  , Stewart , Rutishauser  , and Clint and Jennings . In these papers the method is called the simultaneous iteration. See also the discussions in ( , pp. 54-55), ( , pp. 156-159), ( , pp. 367-368) and ( , pp. 288-299). It is shown there that the computed Ritz values in (3.4) - (3.7) converge toward the corresponding eigenvalues of G. Assume that, then for, the sequence, converges toward, and the asymptotic rate of convergence is determined by the ratio
In other words, the sequence, converges to zero at about the same speed as the sequence. Furthermore, if is a simple eigenvalue then the jth Ritz vector converges toward the jth eigenvector of G and the rate of convergence is determined by the ratio (3.8).
The last observations open the gate for accelerating the basic orthogonal iteration. Below we mention a number of ways to achieve this task.
Increasing the subspace dimension. In this approach the number of columns in the matrix is increased to be where is a small integer. (Typical values of q are k or 2k.) The advantage of this modification is that the convergence ratio changes to
and it can be much smaller than (3.8). The price paid for this gain is that the storage requirements and the computational efforts per iteration are increased. The next acceleration attempts to avoid this penalty.
Power acceleration. In this iteration the updating of in Step 2 is changed to
where is a small integer. The advantage of this modification is that now the convergence ratio is reduced to
Thus one iteration of this kind has the same effect as p iterations of the basic scheme. The main saving is, therefore, a smaller number of orthogonalizations. In practice p is often restricted to stay smaller than 10. The reason lies in the following difficulty. Assume for a moment that. In this case G has a unique dominant eigenvector. Then, as p increases the columns of the matrix tend toward the dominant eigenspace of G, and becomes highly rank-deficient. Other helpful modifications include polynomial acceleration (which is often based on Chebyshev polynomials), and locking (a type of deflation), e.g.,  and .
4. Equivalence Relations
In this section we derive equivalence relations between the ALS method and the orthogonal iterations method. To see these relations we make the assumption that
and use the following notations. Let, denote the sequence of matrices that are generated by the orthogonal iteration method, and let
denote the related QR factorization of. Similarly, let the matrices, , be obtained by the ALS method, and let
denote the related QR factorization of. Then Step 2 of the orthogonal iteration method gives
while from (2.15) we obtain
These equalities lead to the following conclusion.
Theorem 1. Assume that the initial matrices satisfy
Then in exact arithmetic we have
for. In other words, the two methods generate the same sequence of subspaces!
Proof. The proof is a direct consequence of (4.4) and (4.5) using induction on. □
We have seen that the matrix solves (2.10). Hence the rank-k approximation of A that corresponds to has the form
Similarly, the rank-k approximation of A that corresponds to has the form
The next theorem shows that these approximations are equal.
Theorem 2 (Rank-k approximations). Using the former assumptions and notations we have the equality
In other words, the two methods generate the same sequence of rank-k approximations.
Proof. Let be some vector in. Then from (4.8) we see that the projection of on Range () equals the projection of on Range (). That is,
while the last equality implies (4.11).
Corollary 3 (The decreasing property). Recall that the ALS method has the decreasing property (2.14). Now (4.11) implies that this property is also shared by the orthogonal iteration method.
The next lemma helps to convert the decreasing property into an equivalent increasing property.
Lemma 4. Let and be as above. Then
Proof. Let us complete the columns of to be an orthonormal basis of. This gives us an orthonormal matrix
where is an matrix that satisfies
and O denotes a null matrix. Observe that the structure of Z implies the equality
On the other hand, since the Frobenius norm is unitarily invariant,
Corollary 5. Since the sequence is monotonic decreasing, equality (4.13) implies that the sequence is monotonic increasing. That is,
Observe also that
where is the Rayleigh-quotient matrix (3.4), and, are the corresponding Ritz values. This relation leads to the following conclusion, which appears to be a new property of the orthogonal iteration method.
Corollary 6 (A trace increasing property). Let G be a symmetric positive semidefinite matrix and let the matrices, be generated by the orthogonal iterations method. Then
We have seen in Section 3 that the rate of convergence of the orthogonal iterations method depends on the ratio (3.8). Now the equivalence relations that we have proved suggest that the ALS method behaves in a similar way. To state this result more precisely we need the following notations. Let
denote the singular values of A, and let
denote the singular values of,. Then, clearly,
Similarly, since, the singular values of satisfy
These relations lead to the following observation, which seems to be a new property of the ALS method.
Theorem 7. Assume that, then for, the sequence
converges to zero at the same asymptotic rate as the sequence
Proof. From (4.21) and (4.22) we conclude that the sequence
converges to zero at the same asymptotic rate as the sequence (4.24). Yet, since
the sequence (4.23) shares this property. □
The last theorem implies that the rate of convergence of ALS can be improved by increasing the subspace dimension (See Section 3).
The fact that the two methods converge at the same speed raises the question of which iteration is more efficient to use. One advantage of the orthogonal iterations method is that it stores and updates only (estimates for) the right singular vectors of A. This halves the storage requirements and the number of orthogonalizations. The orthogonal iterations method achieves one QR factorization per iteration, while ALS requires two QR factorizations per iteration. The computation of the left singular vectors and the related low-rank approximation of A is deferred to the end of the iterative process. A further saving can be gained by applying Power acceleration.
However, being a variant of the Block Power method, the orthogonal iteration is expected to be slower than Krylov subspace methods that are based on Lanczos algorithm. See, for example, the comparisons in ( , pp. 554-555), ( , pp. 250-252), and ( , pp. 272-275). Recent Krylov methods are using implicitly restarted Lanczos schemes, e.g.,   , and this approach is considerably faster than orthogonal iterations. Consequently the ALS method is expected to be slower than restarted Krylov methods for low-rank approximations, e.g.,    . This drawback is, perhaps, the reason that the use of ALS has been moved to problems in which it is difficult to apply a standard SVD algorithm or a restarted Krylov method.
5. Concluding Remarks
As noted in the introduction, the relations between ALS and the block-Power method were recently observed in the context of matrix completion algorithms. However, the related matrix completion algorithms differ substantially from the classic versions discussed in this paper. Indeed, the equivalence between ALS and orthogonal iterations is somewhat surprising, as these methods are well known for many years, and the basic ALS iteration, which uses back-substitutions, is quite different from the orthogonal iteration. The modified version avoids back substitutions, which helps to see the similarity between the two methods.
The equivalence relations bring important insight into the behavior of both methods. One consequence is that the convergence properties of ALS are identical to those of orthogonal iterations. This means that the rate of convergence of the ALS method is determined by the ratios in (4.24), which appears to be a new result. Similarly, the descent property of ALS implies a trace increasing property of the orthogonal iteration method.
The orthogonal iterations method needs less storage requirements, and less QR factorizations per iteration. In addition, it has a number of useful accelerations. These advantages suggest that replacing ALS with orthogonal iterations might be helpful in some applications. On the other hand, the ALS method can be modified to handle problems that other methods cannot handle, such as non-negative matrix factorizations (NMF), matrix completion problems, and tensor decompositions. The ALS iteration that is implemented in these problems is often quite different from the basic iteration (2.7) - (2.8). Yet in some cases it has a similar asymptotic behavior. This happens, for example, in NMF problems when (nearly) all the entries in the converging factors are positive. Another example is the proximal-ALS algorithm in matrix completion, see ( , p. 134). In such cases the new results provide important insight into the asymptotic behaviour of the algorithm.
 Baglama, J. and Reichel, L. (2013) An Implicitly Restartedf Block Lanczos Bidiagonalization Method, Using Leja Shifts. BIT Numerical Mathematics, 64, 263-293.
 Bai, A., Demmel, J., Dongarra, J., Ruhe, A. and van der Vorst, H. (1999) Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, PA.
 Bauer, F.L. (1957) Das Verfahren der Treppeniteration und verwandte Verfahren zur Lösung algebraischers Eigenwertprobleme. Zeitschrift für angewandte Mathematik und Physik, 8, 214-235.
 Bell, R. and Koren, Y. (2007) Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights. Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, Omaha, 28-31 October 2007, 43-52.
 Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P. and Plemmons, R.J. (2006) Algorithms and Applications for Approximate Nonnegative Matrix Factorization, 2006. Computational Statistics and Data Analysis.
 Clint, M. and Jennings, A. (1970) The Evaluation of Eigenvalues and Eigenvectors of Real Symmetric Matrices by Simultaneous Iteration. The Computer Journal, 13, 76-80.
 Dax, A. (2000) A Modified Gram-Schmidt Algorithm with Iterative Orthogonalization and Column Pivoting. Linear Algebra and Its Applications, 310, 25-42.
 Espig, M. and Khachatryan, A. (2014) Convergence of Alternating Least Squares Optimisation for Rank-One Approximation to High Order Tensors.
 Hardt, M. (2014) Understanding Alternating Minimization for Matrix Completion. Proceedings of the2014 55th Foundations of Computer Science (FOCS), Philadelphia, PA, 18-21 October 2014.
 Jain, P., Netrapalli, P. and Sanghavi, S. (2013) Low-Rank Matrix Completion Using Alternating Minimization. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, ACM, New York, 665-675.
 Jennings, A. (1967) A Direct Iteration Method for Obtaining Latent Roots and Vectors of a Symmetric Matrix. Proceedings of the Cambridge Philosophical Society, 63, 755-765.
 Kiers, H.A.L. (2002) Setting up Alternating Least Squares and Iterative Majorization Algorithm for Solving Various Matrix Optimization Problems. Computational Statistics and Data Analysis, 41, 157-170.
 Kim, H. and Park, H. (2006) Sparse Non-Negative Matrix Factorizations via Alternating Non-Negativity-Constrained Least Squares. In: Proceedings of the IASTED International Conference on Computational and Systems Biology (CASB2006), ACM, New York, 95-100.
 Krijnen, W.P. (2006) Convergence of the Sequences of Parameters Generated by Alternating Least Squares Algorithms. Computational Statistics and Data Analysis, 5, 481-489.
 Kuroda, M., Mori, Y., Lizuka, M. and Sakakihara, M. (2011) Acceleration of the Alternative Least Squares Algorithm. Computational Statistics and Data Analysis, 55, 143-153.
 de Leeuw, J., Young, F.W. and Takane, Y. (1976) Additive Structure in Qualitative Data: An Alternating Least Squares Method with Optimal Scaling Features. Psychometrika, 4, 471-503.
 Oseledets, I.V., Rakhuba, M. and Uschmajew, A. (2018) Alternating Least Squares as Moving Subspace Correction. SIAM Journal on Numerical Analysis, 56, 3459-3479.
 Takane, Y., Young, F.W. and de Leeuw, J. (1977) Nonmetric Individual Differences Multidimensional Scaling: An Alternating Least Squares Method with Optimal Scaling Features. Psychometrika, 42, 7-67.
 Uschmajew, A. (2012) Local Convergence of the Alternating Least Squares Algorithm for Canonical Tensor Approximation. SIAM Journal on Matrix Analysis and Applications, 33, 639-652.
 Wang, L. and Chu, M.T. (2014) On the Global Convergence of the Alternating Least Squares Method for Rank-One Approximation to Generic Tensors. SIAM Journal on Matrix Analysis and Applications, 35, 1058-1072.
 Wang, L., Chu, M.T. and Yu, B. (2015) Orthogonal Low Rank Tensor Approximation: Alternating Least Squares Method and Its Global Convergence. SIAM Journal on Matrix Analysis and Applications, 36, 1-19.
 Wu, K. and Simon, H. (2000) Thick-Restarted Lanczos Method for Large Symmetric Eigenvalue Problems. SIAM Journal on Matrix Analysis and Applications, 22, 602-616.
 Young, F.W., de Leeuw, J. and Takane, Y. (1976) Regression with Qualitative and Quantitative Variables: An Alternating Least Squares Method with Optimal Scaling Features. Psychometrika, 41, 505-529.
 Young, F.W., Takane, Y. and de Leeuw, J. (1978) Principal Components of Mixed Measurement Level Multivariate Data: An Alternating Least Squares Method with Optimal Scaling Features. Psychometrika, 43, 279-281.
 Zacharih, D., Sundin, M., Jansson, M. and Chatterjee, S. (2012) Alternating Least-Squares for Low-Rank Matrix Reconstruction. Signal Processing Letters, 19, 231-234.