The problem of exploiting low-dimensional structures in high-dimensional data is taking on increasing importance in image, text and video processing, and web search, where the observed data lie in very high-dimensional spaces. Principal component analysis (PCA) is the most widely used tool for low-rank component extraction and dimensionality reduction. It is easily implementable and efficient for data mildly corrupted by small noise. However, PCA is sensitive to data corrupted by heavy impulse noise or outlying observations. Consequently, a number of robust PCA methods were proposed, but none of these approaches yields a polynomial-time algorithm with strong performance guarantees under broad conditions. The Robust PCA of Candes et al. is among the earliest polynomial-time algorithms with such guarantees. Assume that a data matrix $M$ is the superposition of a low-rank matrix $L_0$ and a sparse matrix $S_0$, i.e., $M = L_0 + S_0$. Then $L_0$ and $S_0$ can be recovered with high probability by solving the following convex relaxation problem (provided that $L_0$ is low-rank and satisfies certain incoherence conditions, and $S_0$ is sufficiently sparse):

$\min_{L,S}\ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad L + S = M, \quad (1)$
where $\|L\|_*$ denotes the nuclear norm of $L$ (the sum of its singular values) and $\|S\|_1$ denotes the $\ell_1$-norm of $S$ (the sum of the absolute values of its entries). The nuclear norm and the $\ell_1$-norm are used to induce low rank and sparsity, respectively, and $\lambda$ is a parameter balancing the two terms. Candes et al. prove that Equation (1) works with high probability for recovering any low-rank, incoherent matrix. Notably, Chandrasekaran et al. also consider the problem of decomposing a given data matrix into sparse and low-rank components, and give sufficient conditions for convex programming to succeed. Their work was motivated by applications in system identification and learning of graphical models. In contrast, Candes et al. were motivated by robust principal component computations in high-dimensional settings with erroneous and missing entries; missing entries were not considered by Chandrasekaran et al. Moreover, in the latter work the parameter $\lambda$ is data-dependent and may have to be selected by solving a number of convex programs, while Candes et al. provide a universal value, namely $\lambda = 1/\sqrt{\max(n_1, n_2)}$ for an $n_1 \times n_2$ data matrix. The distinction between the two results is a consequence of different assumptions about the origin of the data matrix $M$.
In many real-world applications, we need to consider the model defined in Equation (1) under more complicated circumstances. First, only a fraction of the entries of $M$ can be observed; this is the well-known matrix completion problem. If the unknown matrix is known to have low rank or approximately low rank, then accurate and even exact recovery is possible by nuclear norm minimization. Second, the observed data are corrupted by both impulse noise (sparse but large) and Gaussian noise (small but dense). Let $M$ be the superposition of the low-rank matrix $L_0$, the impulse-noise matrix $S_0$ and the Gaussian noise $N_0$, i.e., $M = L_0 + S_0 + N_0$. The Gaussian noise on the observed entries is assumed to be small in the sense that $\|P_\Omega(N_0)\|_F \le \delta$, where $\delta$ is the Gaussian noise level and $\|\cdot\|_F$ is the Frobenius norm. Then, to be broadly applicable, we consider the following extension of the model defined in Equation (1):

$\min_{L,S}\ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad \|P_\Omega(M - L - S)\|_F \le \delta, \quad (2)$
where $\Omega$ is a subset of the index set of entries, and it is assumed that only the entries in $\Omega$ can be observed. The operator $P_\Omega$ is the orthogonal projection onto the span of matrices vanishing outside of $\Omega$, so that the $(i,j)$-th entry of $P_\Omega(X)$ is $X_{ij}$ if $(i,j) \in \Omega$ and zero otherwise. The problem defined in Equation (2) can be solved by the classical Augmented Lagrangian Method (ALM). The separable structure emerging in the objective function and the constraints suggests splitting the corresponding augmented Lagrangian function to derive more efficient numerical algorithms. Tao et al. developed the Alternating Splitting Augmented Lagrangian Method (ASALM) and its variant (VASALM), and the Parallel Splitting Augmented Lagrangian Method (PSALM) for solving Equation (2).
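For concreteness, the sampling operator $P_\Omega$ acts entrywise; below is a minimal numpy sketch (the function and variable names are ours, not from the paper):

```python
import numpy as np

def proj_omega(X, mask):
    """Orthogonal projection P_Omega: keep entries in Omega, zero elsewhere.

    `mask` is a boolean array of the same shape as X, True on Omega.
    """
    return np.where(mask, X, 0.0)

# Example: observe roughly 50% of the entries of a 5x5 matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
mask = rng.random((5, 5)) < 0.5
print(proj_omega(X, mask))
```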
One shortcoming of the model defined in Equation (2) is that it can only handle matrix (two-way) data. However, real-world data are ubiquitously multi-way, also referred to as tensors. For example, a color image is a 3-way object with column, row and color modes; a greyscale video is indexed by two spatial variables and one temporal variable. If we use the model defined in Equation (2) to process tensor data, we have to unfold the multi-way data into a matrix. Such preprocessing usually leads to the loss of the inherent structural information in the original high-dimensional observations. To avoid this drawback, a common approach is to manipulate the tensor data directly, taking advantage of its multi-dimensional structure. Tensor analysis has many applications in computer vision, diffusion Magnetic Resonance Imaging (MRI), the quantum entanglement problem, spectral hypergraph theory and higher-order Markov chains.
The goal of this paper is to study Tensor Robust PCA, which aims to accurately recover a low-rank tensor from impulse and Gaussian noise; the observations can also be incomplete. Tensors of low rank appear in a variety of applications such as video processing (d = 3), time-dependent 3D imaging (d = 4), ray tracing, where the material-dependent bidirectional reflection function is an order-four tensor that has to be determined from measurements, numerical solution of the electronic Schrödinger equation (d = 3N, where N is the number of particles), machine learning, and more. More specifically, suppose we are given a data tensor $\mathcal{M} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ ($d$ is the number of ways), and it can be decomposed as

$\mathcal{M} = \mathcal{L}_0 + \mathcal{S}_0 + \mathcal{N}_0, \quad (3)$
where $\mathcal{L}_0$ is low-rank and $\mathcal{S}_0$ is sparse. $\mathcal{N}_0$ is Gaussian noise with noise level $\delta$, in the sense that $\|P_\Omega(\mathcal{N}_0)\|_F \le \delta$. Then, we try to recover the low-rank $\mathcal{L}_0$ through the following convex relaxation problem:

$\min_{\mathcal{L},\mathcal{S}}\ \|\mathcal{L}\|_* + \lambda \|\mathcal{S}\|_1 \quad \text{s.t.} \quad \|P_\Omega(\mathcal{M} - \mathcal{L} - \mathcal{S})\|_F \le \delta. \quad (4)$
1.1. Related Work
Although the recovery of low-rank matrices has been well studied, research on low-rank tensor recovery is still limited. This is mainly because it is difficult to define a satisfactory tensor rank that enjoys properties as good as those in the matrix case. Several different definitions of tensor rank have been proposed, but each has its limitations. For example, the CP rank is defined as the smallest number of rank-one tensors that sum up to $\mathcal{X}$, where a rank-one tensor is of the form $a^{(1)} \circ a^{(2)} \circ \cdots \circ a^{(d)}$. Unfortunately, the CP rank is NP-hard to compute, and its convex relaxation is also intractable. Another, more popular direction is to use the tractable Tucker rank and its convex relaxation. For a $d$-way tensor $\mathcal{X}$, the Tucker rank is a vector defined as

$\operatorname{rank}_{tc}(\mathcal{X}) := \big( \operatorname{rank}(X_{(1)}),\ \operatorname{rank}(X_{(2)}),\ \ldots,\ \operatorname{rank}(X_{(d)}) \big),$
where $X_{(i)}$ is the mode-$i$ matricization of $\mathcal{X}$. Motivated by the fact that the nuclear norm is the convex envelope of the matrix rank within the unit ball of the spectral norm, the Sum of Nuclear Norms (SNN), defined as $\sum_{i} \|X_{(i)}\|_*$, is used as a convex surrogate of the Tucker rank. This approach is effective, but SNN is not a tight convex relaxation of the Tucker rank.
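A minimal numpy sketch of the mode-$i$ matricization, the Tucker rank and SNN (the helper names are ours):

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization: arrange the mode-`mode` fibers as columns.

    The column ordering differs across conventions, but this does not
    affect the rank or the nuclear norm of the unfolding.
    """
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_rank(T):
    """Vector of ranks of the mode-i unfoldings."""
    return [np.linalg.matrix_rank(unfold(T, i)) for i in range(T.ndim)]

def snn(T):
    """Sum of Nuclear Norms: sum_i ||T_(i)||_*."""
    return sum(np.linalg.norm(unfold(T, i), ord='nuc') for i in range(T.ndim))
```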
More recently, the tensor tubal rank was proposed based on a new tensor decomposition scheme known as the t-SVD. The t-SVD is built on a tensor-tensor product that enjoys many properties analogous to the matrix case. Based on the computable t-SVD, the tensor nuclear norm is used in place of the tubal rank for low-rank tensor recovery from incomplete or corrupted tensors. The problem of recovering an unknown low-rank tensor from incomplete samples can be solved by the following convex program:

$\min_{\mathcal{L}}\ \|\mathcal{L}\|_* \quad \text{s.t.} \quad P_\Omega(\mathcal{L}) = P_\Omega(\mathcal{M}), \quad (5)$
where $\|\mathcal{L}\|_*$ is the tensor nuclear norm of $\mathcal{L}$, $\Omega$ is the index set of the known entries of the original tensor, and $P_\Omega$ is the projector onto the span of tensors vanishing outside of $\Omega$. Lu et al. extended this work and studied the Tensor Robust Principal Component Analysis (TRPCA) problem, defined as follows:

$\min_{\mathcal{L},\mathcal{S}}\ \|\mathcal{L}\|_* + \lambda \|\mathcal{S}\|_1 \quad \text{s.t.} \quad \mathcal{L} + \mathcal{S} = \mathcal{M}. \quad (6)$
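The tensor nuclear norm induced by the t-SVD can be evaluated by a DFT along the third mode followed by slice-wise matrix nuclear norms. Below is a sketch under the common convention that averages over the Fourier slices (the normalization varies across papers, so treat the constant as an assumption):

```python
import numpy as np

def tensor_nuclear_norm(T):
    """Tensor nuclear norm induced by the t-SVD (one common convention):
    FFT along the third mode, then average the matrix nuclear norms of
    the frontal slices in the Fourier domain."""
    n3 = T.shape[2]
    F = np.fft.fft(T, axis=2)
    return sum(np.linalg.norm(F[:, :, k], ord='nuc') for k in range(n3)) / n3
```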
In this work, we go one step further, and consider recovering low-rank and sparse components of tensors from incomplete and noisy observations as defined in Equation (4).
1.2. Paper Contribution
The contributions of this work are two-fold:
・ A unified convex relaxation framework is proposed for the problem of recovering low-rank and sparse components of tensors from incomplete and noisy observations. Three augmented-Lagrangian-based algorithms are developed for the optimization problem.
・ Numerical experiments on synthetic data validate the efficacy of our proposed denoising approach.
1.3. Paper Organization
The rest of the paper is organized as follows. In Section 2, some preliminaries that are useful for the subsequent analysis are provided. In Section 3, three augmented-Lagrangian-based methods are developed for the problem defined in Equation (4). In Section 4, numerical experiments verify the validity of the model defined in Equation (4) and the efficiency of the proposed algorithms. Finally, in Section 5, we draw some conclusions and discuss topics for future work.
2. Preliminaries

In this section, we list some lemmas concerning the shrinkage operators, which will be used at each iteration of the proposed augmented-Lagrangian-type methods to solve the generated subproblems.
Lemma 1. For $\mu > 0$ and $Y \in \mathbb{R}^{m \times n}$, the solution of the following problem (7),

$\min_{X}\ \mu \|X\|_1 + \frac{1}{2}\|X - Y\|_F^2, \quad (7)$

is given by $\mathcal{S}_\mu(Y)$, where $\mathcal{S}_\mu$ is the soft shrinkage operator defined as:

$[\mathcal{S}_\mu(Y)]_{ij} = \operatorname{sgn}(Y_{ij}) \max(|Y_{ij}| - \mu,\ 0). \quad (8)$
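In code, the soft shrinkage operator of Lemma 1 is one line of numpy (a sketch; the names are ours):

```python
import numpy as np

def soft_shrink(Y, mu):
    """Entrywise soft shrinkage: sign(y) * max(|y| - mu, 0).

    Solves min_X mu*||X||_1 + 0.5*||X - Y||_F^2 (Lemma 1).
    """
    return np.sign(Y) * np.maximum(np.abs(Y) - mu, 0.0)
```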
Lemma 2. Consider the singular value decomposition (SVD) of a matrix $X \in \mathbb{R}^{n_1 \times n_2}$ of rank $r$:

$X = U \Sigma V^{\top}, \quad \Sigma = \operatorname{diag}\big(\{\sigma_i\}_{1 \le i \le r}\big), \quad (9)$

where $U$ and $V$ are orthogonal and the singular values $\sigma_i$ are real and positive. Then, for all $\tau \ge 0$, define the soft-thresholding operator $D_\tau$:

$D_\tau(X) := U D_\tau(\Sigma) V^{\top}, \quad D_\tau(\Sigma) = \operatorname{diag}\big(\{(\sigma_i - \tau)_+\}\big), \quad (10)$

where $t_+ = \max(t, 0)$. Then, for each $\tau \ge 0$ and $Y \in \mathbb{R}^{n_1 \times n_2}$, the singular value shrinkage operator (10) obeys

$D_\tau(Y) = \arg\min_{X} \Big\{ \tau \|X\|_* + \frac{1}{2}\|X - Y\|_F^2 \Big\}. \quad (11)$
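The singular value shrinkage operator of Lemma 2 is equally short in numpy (a sketch; the names are ours):

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: D_tau(Y) = U diag((sigma - tau)_+) V^T.

    Solves min_X tau*||X||_* + 0.5*||X - Y||_F^2 (Lemma 2).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt  # scale columns of U by thresholded singular values
```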
3. Augmented-Lagrangian-Based Methods

An alternative model for the problem defined in Equation (4) is the following nuclear-norm- and $\ell_1$-norm-regularized least squares problem:

$\min_{\mathcal{L},\mathcal{S}}\ \|\mathcal{L}\|_* + \lambda \|\mathcal{S}\|_1 + \frac{\mu}{2}\|P_\Omega(\mathcal{M} - \mathcal{L} - \mathcal{S})\|_F^2. \quad (12)$

By introducing the auxiliary variable $\mathcal{Z} = \mathcal{M} - \mathcal{L} - \mathcal{S}$, Equation (12) can be reformulated into the following favourable form:

$\min_{\mathcal{L},\mathcal{S},\mathcal{Z}}\ \|\mathcal{L}\|_* + \lambda \|\mathcal{S}\|_1 + \frac{\mu}{2}\|P_\Omega(\mathcal{Z})\|_F^2 \quad \text{s.t.} \quad \mathcal{L} + \mathcal{S} + \mathcal{Z} = \mathcal{M}. \quad (13)$
The Alternating Direction Method of Multipliers (ADMM), an extension of the ALM algorithm, can be used to solve the tensor recovery problem defined in Equation (13). Given $(\mathcal{S}^k, \mathcal{Z}^k, \mathcal{Y}^k)$, the ADMM generates the new iterates via the following scheme:

$\mathcal{L}^{k+1} = \arg\min_{\mathcal{L}}\ \|\mathcal{L}\|_* + \frac{\beta}{2}\big\|\mathcal{L} + \mathcal{S}^k + \mathcal{Z}^k - \mathcal{M} + \mathcal{Y}^k/\beta\big\|_F^2,$
$\mathcal{S}^{k+1} = \arg\min_{\mathcal{S}}\ \lambda\|\mathcal{S}\|_1 + \frac{\beta}{2}\big\|\mathcal{L}^{k+1} + \mathcal{S} + \mathcal{Z}^k - \mathcal{M} + \mathcal{Y}^k/\beta\big\|_F^2,$
$\mathcal{Z}^{k+1} = \arg\min_{\mathcal{Z}}\ \frac{\mu}{2}\|P_\Omega(\mathcal{Z})\|_F^2 + \frac{\beta}{2}\big\|\mathcal{L}^{k+1} + \mathcal{S}^{k+1} + \mathcal{Z} - \mathcal{M} + \mathcal{Y}^k/\beta\big\|_F^2,$
$\mathcal{Y}^{k+1} = \mathcal{Y}^k + \beta\big(\mathcal{L}^{k+1} + \mathcal{S}^{k+1} + \mathcal{Z}^{k+1} - \mathcal{M}\big),$

where $\mathcal{Y}$ is the Lagrange multiplier and $\beta > 0$ is the penalty parameter. The $\mathcal{L}$-subproblem is solved by the singular value shrinkage operator of Lemma 2 (applied in the Fourier domain of the t-SVD), the $\mathcal{S}$-subproblem by the soft shrinkage operator of Lemma 1, and the $\mathcal{Z}$-subproblem in closed form.
See Algorithm 1 for the optimization details.
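The scheme above translates almost line-for-line into numpy. Below is a hedged sketch of Algorithm 1 under our reconstruction of Equation (13); the function and parameter names (`rlrta_admm`, `lam`, `mu`, `beta`) are ours, the updates follow the closed-form solutions of Lemmas 1 and 2, and the loop also implements the stopping rule discussed in Section 3.1:

```python
import numpy as np

def t_svt(T, tau):
    """Tensor singular value thresholding in the Fourier domain (t-SVD)."""
    F = np.fft.fft(T, axis=2)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(F[:, :, k], full_matrices=False)
        F[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(F, axis=2))

def rlrta_admm(M, mask, lam, mu, beta, tol=1e-6, max_iter=500):
    """Sketch of the ADMM scheme for Equation (13); M is a float 3-way array,
    `mask` is True on the observed set Omega."""
    L = np.zeros_like(M); S = np.zeros_like(M)
    Z = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        L_old, S_old = L, S
        # L-step: tensor SVT (Lemma 2 in the Fourier domain), threshold 1/beta.
        L = t_svt(M - S - Z - Y / beta, 1.0 / beta)
        # S-step: entrywise soft shrinkage (Lemma 1), threshold lam/beta.
        V = M - L - Z - Y / beta
        S = np.sign(V) * np.maximum(np.abs(V) - lam / beta, 0.0)
        # Z-step: closed form; on Omega the two quadratics are blended.
        W = M - L - S - Y / beta
        Z = np.where(mask, beta / (mu + beta) * W, W)
        # Dual update.
        Y = Y + beta * (L + S + Z - M)
        # Stopping criterion: successive change plus constraint violation.
        err = max(np.linalg.norm(L - L_old), np.linalg.norm(S - S_old),
                  np.linalg.norm(L + S + Z - M))
        if err / max(1.0, np.linalg.norm(M)) < tol:
            break
    return L, S
```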
3.1. Stopping Criterion
It can be easily verified from the optimality conditions of the subproblems that the distance of the iterates generated by the proposed ADMM algorithm to the solution can be bounded in terms of the successive differences $\|\mathcal{L}^{k+1} - \mathcal{L}^{k}\|_F$ and $\|\mathcal{S}^{k+1} - \mathcal{S}^{k}\|_F$ together with the constraint violation $\|\mathcal{L}^{k+1} + \mathcal{S}^{k+1} + \mathcal{Z}^{k+1} - \mathcal{M}\|_F$. Thus, a straightforward stopping criterion for Algorithm 1 is:

$\max\big\{ \|\mathcal{L}^{k+1} - \mathcal{L}^{k}\|_F,\ \|\mathcal{S}^{k+1} - \mathcal{S}^{k}\|_F,\ \|\mathcal{L}^{k+1} + \mathcal{S}^{k+1} + \mathcal{Z}^{k+1} - \mathcal{M}\|_F \big\} \le \epsilon.$

Algorithm 1. Optimization framework for the problem defined in Equation (13).
Here $\epsilon$ is a small positive tolerance, e.g., $10^{-6}$.
3.2. Convergence Analysis
In this subsection, we mainly analyze the convergence of ADMM for solving the problem defined in Equation (13). We denote $f_1(\mathcal{L}) = \|\mathcal{L}\|_*$, $f_2(\mathcal{S}) = \lambda\|\mathcal{S}\|_1$, and $f_3(\mathcal{Z}) = \frac{\mu}{2}\|P_\Omega(\mathcal{Z})\|_F^2$. $f_3$ is strongly convex, while $f_1$ and $f_2$ are convex terms but may not be strongly convex. The problem defined in Equation (13) can be reformulated as

$\min_{x_1, x_2, x_3}\ f_1(x_1) + f_2(x_2) + f_3(x_3) \quad \text{s.t.} \quad A_1 x_1 + A_2 x_2 + A_3 x_3 = b. \quad (18)$
Definition 1. (Convex and Strongly Convex) Let $f: \mathbb{R}^n \to (-\infty, +\infty]$. If the domain of $f$, denoted by $\operatorname{dom} f = \{x : f(x) < +\infty\}$, is not empty, then $f$ is said to be proper. If for any $x, y \in \operatorname{dom} f$ and $t \in [0, 1]$ we always have $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$, then $f$ is convex. Furthermore, $f$ is strongly convex with modulus $\sigma > 0$ if

$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{\sigma}{2}\, t (1-t) \|x - y\|^2$

for all $x, y \in \operatorname{dom} f$ and $t \in [0, 1]$.
Cai et al. and Li et al. have proved the convergence of the extended Alternating Direction Method of Multipliers (e-ADMM) with only one strongly convex function for the case m = 3 (three blocks).
Assumption 1. In Equation (18), $f_1$ and $f_2$ are convex, and $f_3$ is strongly convex with modulus $\sigma$.
Assumption 2. The optimal solution set for the problem defined in Equation (18) is nonempty, i.e., there exists a point $(x_1^*, x_2^*, x_3^*)$, together with a multiplier $y^*$, satisfying the KKT optimality conditions of Equation (18).
Theorem 1. Assume that Assumption 1 and Assumption 2 hold. Let $\{(x_1^k, x_2^k, x_3^k)\}$ be the sequence generated by Algorithm 1 for solving the problem defined in Equation (18). If the penalty parameter $\beta$ lies in the admissible range determined by the strong convexity modulus $\sigma$, the limit point of $\{(x_1^k, x_2^k, x_3^k)\}$ is an optimal solution to Equation (18). Moreover, the objective function converges to the optimal value and the constraint violation converges to zero, i.e.,

$\lim_{k \to \infty} \big( f_1(x_1^k) + f_2(x_2^k) + f_3(x_3^k) \big) = f^*, \qquad \lim_{k \to \infty} \big\| A_1 x_1^k + A_2 x_2^k + A_3 x_3^k - b \big\| = 0,$

where $f^*$ denotes the optimal objective value for the problem defined in Equation (18). In our specific application, such a choice of $\beta$ sufficiently ensures the convergence.
3.3. Parameter Choice
In our optimization framework given in Equation (13), there are three parameters: $\lambda$, $\mu$ and $\beta$. As mentioned by Lu et al., $\lambda$ does not need any tuning and can be set to $\lambda = 1/\sqrt{n_{(1)} n_3}$, where $n_{(1)} = \max(n_1, n_2)$. Besides, the value of the penalty parameter $\beta$ is limited to the range required by Theorem 1 to ensure the convergence of our algorithm. Thus, the value of $\mu$ is important for the performance of our algorithm. For simplicity, we consider the case where all entries are observed and $\mathcal{M}$ is only degraded by the Gaussian noise $\mathcal{N}_0$ without the sparse noise $\mathcal{S}_0$, that is:

$\min_{\mathcal{L}}\ \|\mathcal{L}\|_* + \frac{\mu}{2}\|\mathcal{L} - \mathcal{M}\|_F^2. \quad (26)$

The solution of Equation (26) is equal to $\mathcal{M}$ but with its singular values shifted towards zero by soft thresholding at level $1/\mu$. The threshold should be set large enough to remove the noise (i.e., to keep the variance low), and small enough to avoid over-shrinking the original tensor $\mathcal{L}_0$ (i.e., to keep the bias low). For the matrix case (i.e., $d = 2$), Candes et al. have deduced the proper value of the threshold, as shown in the following theorem.
Theorem 2. Suppose that the Gaussian noise term $N \in \mathbb{R}^{n_1 \times n_2}$ has entries that are i.i.d. normally distributed with variance $\sigma^2$. Then, for $\tau = (\sqrt{n_1} + \sqrt{n_2})\,\sigma$, we have

$\|N\|_2 \le \tau$

with high probability. Then, the soft-thresholding level should dominate the spectral norm of the noise, $1/\mu = (\sqrt{n_1} + \sqrt{n_2})\,\sigma$. That is, $\mu = 1/\big((\sqrt{n_1} + \sqrt{n_2})\,\sigma\big)$.
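A small helper collecting these defaults (a sketch; the $\lambda$ value follows Lu et al., while the threshold is the matrix-case value of Theorem 2 — how it carries over to the tensor case is our assumption, not a statement proved here):

```python
import numpy as np

def default_params(n1, n2, n3, sigma):
    """Hypothetical parameter defaults for an n1 x n2 x n3 observation with
    Gaussian noise of standard deviation sigma."""
    lam = 1.0 / np.sqrt(max(n1, n2) * n3)      # balances low rank vs. sparsity
    tau = (np.sqrt(n1) + np.sqrt(n2)) * sigma  # dominates ||N||_2 w.h.p. (d = 2)
    return lam, tau
```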
Based on this conclusion, we derive conditions under which the convex program defined in Equation (13) accurately recovers the low-rank component from corrupted observations. Our derivations are summarized in the following main result.
Main Result 1. Assume that the low-rank tensor $\mathcal{L}_0$ obeys the standard tensor incoherence conditions and that the support set $\Omega$ of $\mathcal{S}_0$ is uniformly distributed among all sets of cardinality $m$. Then, there exist universal constants $c_1, c_2 > 0$ such that with probability at least $1 - c_1 (n_{(1)} n_3)^{-c_2}$, the pair $(\mathcal{L}_0, \mathcal{S}_0)$ is the unique minimizer of the problem defined in Equation (13). The values of $\lambda$ and $\mu$ can be determined as $\lambda = 1/\sqrt{n_{(1)} n_3}$ and $\mu$ as in Theorem 2. At the same time, the tubal rank of $\mathcal{L}_0$ and the number of non-zero entries of $\mathcal{S}_0$ should satisfy

$\operatorname{rank}_t(\mathcal{L}_0) \le \frac{\rho_r\, n_{(2)}}{\big(\log(n_{(1)} n_3)\big)^2} \quad \text{and} \quad m \le \rho_s\, n_1 n_2 n_3,$

where $n_{(1)} = \max(n_1, n_2)$ and $n_{(2)} = \min(n_1, n_2)$. $\rho_r$ and $\rho_s$ are positive constants (with $\rho_r$ depending on the incoherence parameter). The value of the penalty parameter $\beta$ should be within the range required by Theorem 1 to ensure the convergence.
4. Experiments on Synthetic Data
In this section, we conduct experiments on synthetic data to corroborate our algorithm. We investigate the ability of the proposed Robust Low-Rank Tensor Approximation (RLRTA) algorithm to recover low-rank tensors of various tubal ranks from sparse noise of varying sparsity and random Gaussian noise of varying intensity.
4.1. Exact Recovery from Varying Sparsity of $\mathcal{S}_0$
We first verify the recovery performance of our algorithm for different sparsity levels of $\mathcal{S}_0$. Similar to the settings in the TRPCA literature, we consider tensors of size $n \times n \times n$ with varying dimension $n$. We generate a tensor $\mathcal{L}_0 = \mathcal{P} * \mathcal{Q}$, where the entries of $\mathcal{P} \in \mathbb{R}^{n \times r \times n}$ and $\mathcal{Q} \in \mathbb{R}^{r \times n \times n}$ are independently sampled from a uniform distribution. The support set $\Omega$ (with size $m$) of the sparse component is chosen uniformly at random. For all $(i, j, k) \in \Omega$, let $[\mathcal{S}_0]_{ijk} = [\mathcal{B}]_{ijk}$, where $\mathcal{B}$ is a tensor with independent Bernoulli $\pm 1$ entries. For $(i, j, k) \in \Omega$, this can be expressed mathematically as

$[\mathcal{S}_0]_{ijk} = \begin{cases} 1 & \text{w.p. } 1/2, \\ -1 & \text{w.p. } 1/2, \end{cases} \quad (27)$

where w.p. abbreviates "with probability". We test two scenarios with different choices of the tensor dimension $n$ and the tubal rank $r$.
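A hedged sketch of this data generation (the specific values of `n`, `r` and `rho_s` below are hypothetical stand-ins for the elided settings, and the uniform interval $[0, 1)$ is our assumption):

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x r x n3) and B (r x n2 x n3) via the FFT."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ik...,kj...->ij...', Af, Bf)  # slice-wise matrix products
    return np.real(np.fft.ifft(Cf, axis=2))

# Hypothetical instance.
rng = np.random.default_rng(0)
n, r, rho_s = 50, 5, 0.1
P = rng.uniform(size=(n, r, n))
Q = rng.uniform(size=(r, n, n))
L0 = t_product(P, Q)                        # low-rank: tubal rank <= r
m = int(rho_s * n * n * n)                  # size of the support Omega
idx = rng.choice(n * n * n, size=m, replace=False)
S0 = np.zeros(n * n * n)
S0[idx] = rng.choice([-1.0, 1.0], size=m)   # Bernoulli +-1 signs, Eq. (27)
S0 = S0.reshape(n, n, n)
```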
The Gaussian noise in each frontal slice is generated independently of the other slices, i.e.,

$[\mathcal{N}_0]_{ijk} \sim \mathcal{N}(0, \sigma_k^2) \quad \text{for the } k\text{-th frontal slice},$

where the variance values $\sigma_k^2$ are randomly selected from 0 to 0.1. In this subsection, our task is to recover $\mathcal{L}_0$ from the noisy observation $\mathcal{M}$ with $\mathcal{S}_0$ of varying sparsity. Table 1 and Table 2 show the recovery results of the algorithms RLRTA and TRPCA. They show that RLRTA better recovers the low-rank component $\mathcal{L}_0$ under sparse components $\mathcal{S}_0$ of different sparsity levels.

Table 1. Correct recovery for random problems of varying sparsity using RLRTA.

Table 2. Correct recovery for random problems of varying sparsity using TRPCA.
4.2. Exact Recovery with $\mathcal{N}_0$ of Varying Intensity
Now we examine the recovery performance under Gaussian noise of varying variance. The generation of $\mathcal{L}_0$ is the same as in Section 4.1. The sparse component $\mathcal{S}_0$ has fixed sparsity. For simplicity, we assume that $\mathcal{N}_0$ is white Gaussian noise, that is,

$[\mathcal{N}_0]_{ijk} \sim \mathcal{N}(0, \sigma^2)\ \text{i.i.d.}, \quad (29)$

where $\sigma^2$ is the common noise variance. The noise variance values are 0.02, 0.04, 0.06, 0.08 and 0.1, respectively. Table 3 and Table 4 show the recovery results of the algorithms RLRTA and TRPCA. They show that RLRTA better recovers the low-rank component $\mathcal{L}_0$ under Gaussian noise $\mathcal{N}_0$ of different intensities.
4.3. Phase Transition in Rank and Sparsity
Now we examine the recovery performance with varying tubal rank of $\mathcal{L}_0$ and varying sparsity of $\mathcal{S}_0$. Similar to the settings above, we consider two sizes of $\mathcal{L}_0$.

Table 3. Correct recovery for random problems of varying intensity using RLRTA.

Table 4. Correct recovery for random problems of varying intensity using TRPCA.

We generate $\mathcal{L}_0 = \mathcal{P} * \mathcal{Q}$, where the entries of $\mathcal{P}$ and $\mathcal{Q}$ are independently sampled from a uniform distribution. For $\mathcal{S}_0$, we still consider a Bernoulli model for its support and random signs as in Equation (27). The variance values in each frontal slice are randomly selected from 0 to 0.1, and the mean variance values are both set to 0.05.
We set the tubal rank $r$ and the sparsity parameter $\rho_s$ to all choices in prescribed grids. For each $(r, \rho_s)$-pair, we simulate 10 test instances and declare a trial to be successful if the recovered $\hat{\mathcal{L}}$ attains a sufficiently small relative error with respect to $\mathcal{L}_0$.

Figure 1. Correct recovery for varying rank and sparsity for RLRTA and TRPCA. Fraction of correct recoveries, as a function of the tubal rank of $\mathcal{L}_0$ and the sparsity of $\mathcal{S}_0$.

Figure 1 plots the fraction of correct recoveries for each pair (black = 0% and white = 100%). It can be seen that there is a large region in which the recovery is correct.
4.4. Phase Transition in Rank and Entry-Wise Noise Intensity
Now we examine the recovery performance with varying tubal rank of $\mathcal{L}_0$ and varying intensity of the noise $\mathcal{N}_0$. We still consider two sizes of $\mathcal{L}_0$. We generate $\mathcal{L}_0 = \mathcal{P} * \mathcal{Q}$, where the entries of $\mathcal{P}$ and $\mathcal{Q}$ are independently sampled from a uniform distribution. For $\mathcal{S}_0$, we still consider a Bernoulli model for its support and random signs as in Equation (27), with the sparsity parameter $\rho_s$ fixed at 0.1. The generation of $\mathcal{N}_0$ is similar to Equation (29).

Figure 2. Correct recovery for varying rank and noise variance for RLRTA and TRPCA. Fraction of correct recoveries, as a function of the tubal rank of $\mathcal{L}_0$ and the variance of $\mathcal{N}_0$.

We set the tubal rank $r$ to all choices in a prescribed grid, and the noise variance ranges over a prescribed grid as well. For each $(r, \sigma^2)$-pair, we simulate 10 test instances and declare a trial to be successful if the recovered $\hat{\mathcal{L}}$ attains a sufficiently small relative error with respect to $\mathcal{L}_0$. Figure 2 plots the fraction of correct recoveries for each pair (black = 0% and white = 100%). It can be seen that there is a large region in which the recovery is correct.
5. Conclusions

This work verifies the ability of convex optimization to recover low-rank tensors corrupted by both impulse and Gaussian noise. The problem is tackled by integrating the tensor nuclear norm, the $\ell_1$-norm and a least squares term in a unified convex relaxation framework. Parameters are selected to balance the low-rank component, the sparse component and the Gaussian-noise term. Besides, the convergence of the proposed algorithm is discussed. Numerical experiments have been conducted to demonstrate the efficacy of the proposed denoising approach.
Acknowledgements

The authors would like to thank Canyi Lu for providing the code for the TRPCA algorithm.