(1. College of Mathematics and Statistics, Henan University, Kaifeng 475000, China; 2. College of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471023, China; 3. College of Software, Henan University, Kaifeng 475000, China)
Abstract: The task of dividing corrupted data into their respective subspaces can be well illustrated, both theoretically and numerically, by recovering the low-rank and sparse-column components of a given matrix. Generally, it can be characterized as a convex minimization problem involving the matrix nuclear norm and the ℓ_{2,1}-norm. However, solving the resulting problem is challenging due to the non-smoothness of the objective function. One of the earliest solvers is a 3-block alternating direction method of multipliers (ADMM) which updates each variable in a Gauss-Seidel manner. In this paper, we present three variants of ADMM for the 3-block separable minimization problem. More precisely, whenever one variable is derived, the resulting problem can be regarded as a convex minimization with 2 blocks, and can be solved immediately using the standard ADMM. If the inner iteration loops only once, the iterative scheme reduces to the ADMM with updates in a Gauss-Seidel manner. If the solution from the inner iteration is assumed to be exact, the convergence can be deduced easily from results in the literature. Performance comparisons with a couple of recently designed solvers illustrate that the proposed methods are effective and competitive.
Keywords: Convex optimization; Variational inequality problem; Alternating direction method of multipliers; Low-rank representation; Subspace recovery
Given a set of corrupted data samples drawn from a union of linear subspaces, the goal of the subspace recovery problem is to segment all samples into their respective subspaces and correct the possible errors simultaneously. The problem has recently attracted much attention because of its wide applications in the fields of pattern analysis, signal processing, data mining, etc.
Mathematically, the problem can be characterized as the convex minimization problem
  min_{Z,E}  ‖Z‖_* + λ‖E‖_{2,1}   s.t.   X = AZ + E,                  (1.1)
where λ > 0 is a positive weighting parameter; A ∈ R^{m×n} is a dictionary of full column rank; ‖·‖_* is the nuclear norm (also called the trace norm or Ky Fan norm [12]), defined as the sum of all singular values; and ‖·‖_{2,1} is the ℓ_{2,1}-mixed norm, defined as the sum of the ℓ_2-norms of the columns of a matrix. The nuclear norm is the best convex approximation of the rank function over the unit ball of matrices under the spectral norm [12]. The ℓ_{2,1}-norm is imposed on E, the column-sparse component of X, which reflects the fact that some data samples are corrupted while the others remain clean. Given a minimizer (Z^*, E^*) of problem (1.1), the original data can be reconstructed by setting X − E^* (or AZ^*).
Additionally, the minimizer Z^* is called the lowest-rank representation of the data X with respect to the dictionary A.
Problem (1.1) is convex since it is separately convex in each of its terms. However, the non-smoothness of the nuclear norm and the ℓ_{2,1}-norm makes it a challenging task to minimize. On the one hand, the problem can be easily recast as a semi-definite programming problem and solved by solvers such as [15] and [13]. On the other hand, it falls into the framework of the alternating direction method of multipliers (ADMM), which is widely used in a variety of practical fields, such as image processing [2,6,11], compressive sensing [17,19], matrix completion [4], matrix decomposition [9,14,16], nuclear norm minimization [18], and others. The earliest approach [9] reformulated problem (1.1) by adding an auxiliary variable, and minimized the corresponding augmented Lagrangian function with respect to each variable in a Gauss-Seidel manner. Another approach [16] solved (1.1) by using a linearization technique. More precisely, with one variable fixed, it linearized the subproblem to ensure that a closed-form solution is easily derived.
In this paper, unlike the aforementioned algorithms, we propose three variants of ADMM for problem (1.1). In the first variant, we transform (1.1) into an equivalent formulation by adding a new variable J. Firstly, by fixing two variables, we minimize the corresponding augmented Lagrangian function to produce the temporary value of one variable. Secondly, fixing this variable at its latest value, we treat the resulting subproblem as a new convex optimization problem with fixed Lagrangian multipliers. Thus, it falls into the framework of the classic ADMM again. It is experimentally shown that the number of inner loops greatly influences the whole performance of the algorithm. Meanwhile, the method reduces to the standard 3-block ADMM when the inner loop runs only once. Moreover, we design two other alternative versions of ADMM from different observations. The convergence of each proposed algorithm is analyzed under the assumption that the subproblem is solved exactly. Numerical experiments indicate that the proposed algorithms are promising and competitive with the recent solvers SLAL and LRR.
The rest of this paper is organized as follows. In section 2, some notations and preliminaries used later are provided; a couple of recent algorithms are quickly reviewed; the motivation and iterative framework of the new algorithms are also included. In section 3, the convergence of the first version of the algorithm is established. In section 4, another variant of ADMM from a different observation, together with its convergence, is presented. In section 5, some numerical results which show the efficiency of the proposed algorithms are reported; performance comparisons with other solvers are also included. Finally, in section 6, the paper is concluded with some remarks.
In this subsection, we summarize the notation used in this paper. Matrices are denoted by uppercase letters and vectors by lowercase letters. Given a matrix X, its i-th row and j-th column are denoted by [X]_{i,:} and [X]_{:,j}, respectively, and x_{i,j} is its (i,j)-th component. The ℓ_{2,1}-norm, the ℓ_∞-norm, and the Frobenius norm of a matrix are defined respectively by
  ‖X‖_{2,1} = Σ_j ‖[X]_{:,j}‖_2,   ‖X‖_∞ = max_{i,j} |x_{i,j}|,   ‖X‖_F = ( Σ_{i,j} x_{i,j}^2 )^{1/2}.
For any two matrices X, Y ∈ R^{n×t}, the standard trace inner product is defined as ⟨X, Y⟩ = tr(X^T Y), so that ‖X‖_F^2 = ⟨X, X⟩. For a symmetric and positive definite matrix M ∈ R^{n×n}, we define ‖X‖_M^2 = ⟨X, MX⟩. The symbol (·)^T denotes the transpose of a vector or a matrix.
Now, we list two important results which are very useful in constructing our algorithms.
Theorem 2.1. [1,10] Given Y ∈ R^{m×n} of rank r, let
  Y = U Σ V^T,   Σ = diag({σ_i}_{1≤i≤r}),
be the singular value decomposition (SVD) of Y. For each μ > 0, we let
  D_μ(Y) = U diag({σ_i − μ}_+) V^T,
where {·}_+ = max(0, ·). It is shown that D_μ(Y) obeys
  D_μ(Y) = arg min_X { μ‖X‖_* + (1/2)‖X − Y‖_F^2 }.
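As an illustration, the operator D_μ can be realized in a few lines of NumPy; the function name svt below is ours, and the sketch simply soft-thresholds the singular values as stated in Theorem 2.1.

```python
import numpy as np

def svt(Y, mu):
    """Singular value shrinkage D_mu(Y): soft-threshold the singular values of Y by mu."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # scale the columns of U by the shrunken singular values, then recombine
    return (U * np.maximum(s - mu, 0.0)) @ Vt
```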
Theorem 2.2. [5] Let Y ∈ R^{m×n} be a given matrix, and let S_μ(Y) be the optimal solution of
  min_X { μ‖X‖_{2,1} + (1/2)‖X − Y‖_F^2 };
then the i-th column of S_μ(Y) is
  [S_μ(Y)]_{:,i} = ( (‖[Y]_{:,i}‖_2 − μ)/‖[Y]_{:,i}‖_2 ) [Y]_{:,i}  if ‖[Y]_{:,i}‖_2 > μ,   and  [S_μ(Y)]_{:,i} = 0  otherwise.
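Likewise, a minimal NumPy sketch of the column-wise shrinkage operator S_μ (the helper name col_shrink is ours) reads as follows.

```python
import numpy as np

def col_shrink(Y, mu):
    """Column-wise shrinkage S_mu(Y): scale each column y by (||y||_2 - mu)_+ / ||y||_2."""
    out = np.zeros_like(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=0)          # l2-norm of each column
    keep = norms > mu                          # columns that survive the shrinkage
    out[:, keep] = Y[:, keep] * (1.0 - mu / norms[keep])
    return out
```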
This subsection is devoted to reviewing a couple of existing algorithms. The augmented Lagrangian function of (1.1) is
  L_1(E, Z, Λ) = ‖Z‖_* + λ‖E‖_{2,1} + ⟨Λ, X − AZ − E⟩ + (μ/2)‖X − AZ − E‖_F^2,          (2.2)
where Λ ∈ R^{m×n} is the Lagrangian multiplier and μ > 0 is a penalty parameter. For fixed (E^k, Z^k, Λ^k), the next triplet (E^{k+1}, Z^{k+1}, Λ^{k+1}) can be generated via
  E^{k+1} = arg min_E  L_1(E, Z^k, Λ^k),                                                 (2.3a)
  Z^{k+1} = arg min_Z  L_1(E^{k+1}, Z, Λ^k),                                             (2.3b)
  Λ^{k+1} = Λ^k + μ(X − AZ^{k+1} − E^{k+1}).                                             (2.3c)
For subproblem (2.3a), it can be easily deduced from Theorem 2.2 that
  E^{k+1} = S_{λ/μ}( X − AZ^k + Λ^k/μ ).                                                 (2.4)
On the other hand, fixing the latest E^{k+1}, the subproblem (2.3b) with respect to Z can be characterized as
  Z^{k+1} = arg min_Z  ‖Z‖_* + (μ/2)‖X − AZ − E^{k+1} + Λ^k/μ‖_F^2.                      (2.5)
For a general dictionary matrix A, the closed-form solution of (2.5) is not easily derived. SLAL [16] linearizes the quadratic function and adds a proximal point term, which ensures that the solution can be obtained explicitly.
In a different way, another solver, LRR [9], adds a new variable J ∈ R^{n×n} to model (1.1) and converts it to the following equivalent form:
  min_{E,Z,J}  ‖J‖_* + λ‖E‖_{2,1}   s.t.   X = AZ + E,  Z = J.                           (2.6)
The augmented Lagrangian function of (2.6) is
  L_2(E, Z, J, Λ, Γ) = ‖J‖_* + λ‖E‖_{2,1} + ⟨Λ, X − AZ − E⟩ + ⟨Γ, Z − J⟩ + (μ/2)( ‖X − AZ − E‖_F^2 + ‖Z − J‖_F^2 ),   (2.7)
where Λ ∈ R^{m×n} and Γ ∈ R^{n×n} are the Lagrangian multipliers. LRR minimizes L_2(E, Z, J, Λ, Γ) firstly with respect to E, then with respect to Z, and finally with respect to J, fixing the other variables at their latest values. More precisely, given (E^k, Z^k, J^k), the new iterate (E^{k+1}, Z^{k+1}, J^{k+1}) is generated by
  E^{k+1} = arg min_E  L_2(E, Z^k, J^k, Λ^k, Γ^k),
  Z^{k+1} = arg min_Z  L_2(E^{k+1}, Z, J^k, Λ^k, Γ^k),
  J^{k+1} = arg min_J  L_2(E^{k+1}, Z^{k+1}, J, Λ^k, Γ^k),
  Λ^{k+1} = Λ^k + μ(X − AZ^{k+1} − E^{k+1}),
  Γ^{k+1} = Γ^k + μ(Z^{k+1} − J^{k+1}).                                                  (2.8)
The attractive feature of the above iterative scheme is that each variable admits a closed-form solution.
In this subsection, we turn our attention to constructing a new version of ADMM, named here the nested minimization algorithm. Given (E^k, Z^k, J^k, Λ^k, Γ^k), the next E^{k+1} is derived by
  E^{k+1} = arg min_E  L_2(E, Z^k, J^k, Λ^k, Γ^k).                                       (2.9)
If Z and J are grouped as one variable, then for fixed E^{k+1} it is easy to deduce that
  (Z^{k+1}, J^{k+1}) = arg min_{Z,J}  L_2(E^{k+1}, Z, J, Λ^k, Γ^k).
Hence, (Z^{k+1}, J^{k+1}) can also be considered as the solution of the following minimization problem, obtained by the standard Lagrangian function method but with the multipliers fixed at Λ^k and Γ^k:
Fortunately, the favorable structure of both the objective function and the constraint makes the resulting problem also fall into the framework of the classic ADMM.
For given (E^{k+1}, Z^k, J^k), we initialize the inner iterates from (Z^k, J^k). For the fixed multipliers, the next pair of inner iterates can be attained by the following alternating scheme:
Firstly, the subproblem (2.13a) is equivalent to
Clearly, (2.14) is a quadratic programming problem with respect to the Z-variable and can be further expressed as
Secondly, the solution of subproblem (2.13b) with respect to the J-variable can be described as
In summary, the algorithm, named the Nested Minimization Method (NMM_v1), can be described as follows.
Algorithm 2.1 (NMM_v1).
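For illustration only, the nested structure of NMM_v1 can be sketched in NumPy as follows. The sketch assumes the augmented Lagrangian (2.7), a quadratic inner Z-step, and a singular-value-thresholding inner J-step with the outer multipliers held fixed; penalty updating and stopping tests are omitted, the helpers svt and col_shrink are the operators sketched after Theorems 2.1 and 2.2, and all names are illustrative rather than the exact algorithm box.

```python
import numpy as np

def nmm_v1_sketch(X, A, lam, mu, n_inner=4, n_outer=100):
    """Illustrative sketch of the nested minimization scheme (one possible realization)."""
    m, n = X.shape
    d = A.shape[1]
    Z = np.zeros((d, n)); J = np.zeros((d, n)); E = np.zeros((m, n))
    Lam = np.zeros((m, n)); Gam = np.zeros((d, n))
    AtA_I = A.T @ A + np.eye(d)                  # system matrix of the quadratic Z-step
    for _ in range(n_outer):
        # E-step: column-wise shrinkage (Theorem 2.2)
        E = col_shrink(X - A @ Z + Lam / mu, lam / mu)
        # inner loop on (Z, J) with Lam, Gam fixed
        for _ in range(n_inner):
            rhs = A.T @ (X - E + Lam / mu) + J - Gam / mu
            Z = np.linalg.solve(AtA_I, rhs)      # quadratic Z-step
            J = svt(Z + Gam / mu, 1.0 / mu)      # nuclear-norm J-step (Theorem 2.1)
        # outer multiplier updates
        Lam = Lam + mu * (X - A @ Z - E)
        Gam = Gam + mu * (Z - J)
    return Z, J, E
```

With n_inner = 1 the inner loop performs a single Gauss-Seidel sweep over Z and J, which is exactly the situation discussed in Remark 2.1 below.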
Remark 2.1. If the inner iteration runs only once without achieving convergence, then the method reduces to the iterative form (2.8), where each variable is updated in a Gauss-Seidel manner. Owing to the fact that the exact solution is not achieved when only one inner step is performed, the direct 3-block ADMM is not necessarily globally convergent (see [3]).
Remark 2.2. The optimality condition of (2.6) (or (1.1)) can be characterized by finding the solution (E^*, Z^*, J^*) ∈ R^{m×n} × R^{n×n} × R^{n×n} and the Lagrangian multipliers Λ^* and Γ^* satisfying the Karush-Kuhn-Tucker system
At each iteration, the triple (E^{k+1}, Z^{k+1}, J^{k+1}) generated by NMM_v1 satisfies
Comparing the optimality conditions (2.18a)-(2.18e) with (2.19a)-(2.19e), it is clearly observed that the whole iteration process can be terminated once Λ^{k+1} − Λ^k, Γ^{k+1} − Γ^k, and Z^{k+1} − Z^k are all small enough. In other words, for a positive constant ε > 0, the stopping criterion should be
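One concrete instance of such a test, stated here only as an assumption consistent with the componentwise measure used in Section 5, is
  max{ ‖Λ^{k+1} − Λ^k‖_∞, ‖Γ^{k+1} − Γ^k‖_∞, ‖Z^{k+1} − Z^k‖_∞ } ≤ ε.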
From optimization theory, it is clear that the variables can be reordered by minimizing firstly with respect to J, then with respect to Z, and finally with respect to E, fixing the other variables at their latest values. More precisely, given (J^k, Z^k, E^k, Λ^k, Γ^k), the next iterate (J^{k+1}, Z^{k+1}, E^{k+1}, Λ^{k+1}, Γ^{k+1}) can be generated via the following scheme, which we name the Nested Minimization Method, version two (NMM_v2).
Algorithm 2.2 (NMM_v2).
This section is dedicated to establishing the global convergence of algorithm NMM_v1. The convergence of the second version, NMM_v2, can be analyzed in a similar way; hence, we omit it here. Throughout this paper, we make the following assumptions.
Assumption 3.1. There exists a tuple (E^*, Z^*, J^*, Λ^*, Γ^*) ∈ R^{m×n} × R^{n×n} × R^{n×n} × R^{m×n} × R^{n×n} satisfying the Karush-Kuhn-Tucker system (2.18). Besides the above assumption, we also make the following assumption on Algorithm NMM_v1.
Assumption 3.2. The pair {(Z^{k+1}, J^{k+1})} is the exact solution of the resulting convex minimization (2.12).
For convenience, we set
where I is an identity matrix and 0 is the zero matrix whose elements are all zero. Using these symbols, problem (2.6) is thus transformed into
Let Ω = R^{m×n} × R^{n×n} × R^{n×n} × R^{(m+n)×n}. As a result, solving (3.2) is equivalent to finding a point W^* ∈ Ω satisfying the following variational inequality problem
Using the notation in (3.1), the augmented Lagrangian function in (2.7) can be rewritten as
Moreover, it is not difficult to deduce that the subproblem (2.9) on E is equivalent to
It is also easy to see that the subproblem (2.11) on the variables Z and J is identical with
Finally, the compact form of (2.16) and (2.17) is
The subproblem (3.5) can be reformulated as a variational inequality. That is, find E^{k+1} such that
Similarly, problem (3.6) is equivalent to finding Z^{k+1} and J^{k+1} such that
By (3.7), it holds that
Using the above equality, (3.8) can be rewritten as
In a similar way, (3.9) is reformulated as
For the sake of simplicity, we denote
Combining (3.10) and (3.11) yields
Furthermore, combining this with (3.7), it holds that
Recalling the definition of W and letting
then the inequality (3.13) is equivalent to
Let
To establish the desired convergence theorem, we firstly list some useful lemmas.
Lemma 3.1. Suppose that Assumptions 3.1 and 3.2 hold. Let {W^{k+1}} = {(E^{k+1}, Z^{k+1}, J^{k+1}, Λ^{k+1}, Γ^{k+1})} be generated by Algorithm 2.1. Then we have
Proof. Setting W = W^* in (3.14), we obtain
By the monotonicity of the operator Φ, it is easy to see that
The first inequality is due to the monotonicity of Φ and the second one comes from (3.4) by recalling the definitions of W, F, and Φ. Hence, the claim of this lemma is derived.
Using the above lemma, it is easy to obtain the following result.
Lemma 3.2. Suppose that Assumptions 3.1 and 3.2 hold. Let {W^{k+1}} = {(E^{k+1}, Z^{k+1}, J^{k+1}, Λ^{k+1}, Γ^{k+1})} be generated by Algorithm 2.1. Then we have
and
Proof. By using the above notation, we have
Since (3.11) is true for any k, we can get
Setting Z = Z^{k+1} in (3.11) and Z = Z^k in (3.15), respectively, and adding both sides of the resulting inequalities, we have
which shows the statement of this lemma.
It is not difficult to deduce that both lemmas indicate the following fact.
Lemma 3.3. Suppose that Assumptions 3.1 and 3.2 hold. Let the sequence {(E^{k+1}, Z^{k+1}, J^{k+1}, Λ^{k+1}, Γ^{k+1})} be generated by Algorithm 2.1. Then we have
For any matrices X, Y and any symmetric positive definite matrix M, define
Theorem 3.1. Suppose that Assumptions 3.1 and 3.2 hold. Let the sequence {(E^{k+1}, Z^{k+1}, J^{k+1}, Λ^{k+1}, Γ^{k+1})} be generated by Algorithm 2.1. Then we have
Proof. We have
The proof is completed.
The theorem shows that the generated sequence is bounded, and
which is essential for the convergence of the proposed method. Recalling the definitions above, it also holds that
To end this section, we state the desired convergence result of our proposed algorithm.
Theorem 3.2. Suppose that Assumptions 3.1 and 3.2 hold. Let {(E^k, Z^k, J^k, Λ^k, Γ^k)} be the sequence generated by Algorithm 2.1 from any initial point. Then the sequence converges to (E^*, Z^*, J^*, Λ^*, Γ^*), where (E^*, Z^*, J^*) is a solution of the equivalent model (3.2).
Proof. It follows from (3.16) and (3.17) that there exists a subsequence indexed by {k_j} such that Z^{k_j} → Z^*. Additionally, since
and Z^k − Z^{k−1} → 0 and Λ^k − Λ^{k−1} → 0, it follows that
It follows from (2.16) that
or equivalently,
Taking limits on both sides yields
Similarly, it follows from (2.17) that
Taking limits on both sides, we obtain
Moreover, by (2.19a), we get
Taking limits on both sides along {k_j} in the above inequality and noting (3.18), we get
Similarly, from (2.19b) and (2.19c), we obtain, respectively,
and
Noting that (3.18)-(3.22) are exactly the optimality conditions (2.18a)-(2.18e), we conclude that (E^*, Z^*, J^*) is a solution of problem (3.2).
This section is devoted to developing another version of the nested minimization algorithm for solving problem (1.1) from a different observation. Reconsidering the original model and its augmented Lagrangian function (2.2), it is clear that E^{k+1} and Z^{k+1} are derived by (2.4) and (2.5), respectively. Setting H^{k+1} = X − E^{k+1} + Λ^k/μ, (2.5) is reformulated as
  Z^{k+1} = arg min_Z  ‖Z‖_* + (μ/2)‖AZ − H^{k+1}‖_F^2,                                  (4.1)
which indicates that Z^{k+1} is the Z-part of a minimizer of the following optimization problem with auxiliary variable J:
  min_{Z,J}  ‖J‖_* + (μ/2)‖AZ − H^{k+1}‖_F^2   s.t.   J − Z = 0.                          (4.2)
Since the objective function and the constraint are both separable, it falls into the framework of the classic ADMM again. The augmented Lagrangian function of (4.2) is
where Γ ∈ R^{n×n} is a Lagrangian multiplier. Starting from given initial values, the ADMM generates the next pair of iterates by
Simple computation yields that
and
In short, the algorithm, named the Nested Minimization Method, version three (NMM_v3), can be stated as follows.
Algorithm 4.1 (NMM_v3).
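For illustration, the inner ADMM used to (approximately) solve the Z-subproblem (4.2) can be sketched as follows, assuming that the same penalty μ is reused for the inner splitting; the helper svt is the operator sketched after Theorem 2.1, and all names are illustrative.

```python
import numpy as np

def nmm_v3_z_step(A, H, mu, n_inner=4):
    """Illustrative inner ADMM for min_Z ||Z||_* + (mu/2)||A Z - H||_F^2 via the splitting J = Z."""
    d, n = A.shape[1], H.shape[1]
    Z = np.zeros((d, n)); J = np.zeros((d, n)); Gam = np.zeros((d, n))
    AtA_I = A.T @ A + np.eye(d)
    for _ in range(n_inner):
        Z = np.linalg.solve(AtA_I, A.T @ H + J - Gam / mu)   # quadratic Z-step
        J = svt(Z + Gam / mu, 1.0 / mu)                      # nuclear-norm J-step
        Gam = Gam + mu * (Z - J)                             # inner multiplier update
    return Z
```

In the outer loop, H is rebuilt from the latest E^{k+1} and Λ^k, which reflects the fact that the multiplier of the constraint J − Z = 0 lives only inside this subproblem.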
Remark 4.1. Compared with Algorithm 2.1, the significant difference between the two algorithms lies in the updating of the multiplier Γ^k related to the constraint J − Z = 0. In Algorithm 2.1 this multiplier is updated in the outer iteration, because the auxiliary variable J is added to the original model (1.1) in order to obtain (2.6); in Algorithm 4.1 it is updated in the inner loop, because model (4.2) is used only as a subproblem for deriving the next Z^{k+1}.
Remark 4.2. Similarly to Remark 2.2, the optimality condition of (1.1) can be characterized by finding the solution (E^*, Z^*) ∈ R^{m×n} × R^{n×n} and the corresponding Lagrangian multiplier Λ^* such that
At each iteration, the triple (E^{k+1}, Z^{k+1}, Λ^{k+1}) generated by NMM_v3 satisfies
or, equivalently
which indicates, for sufficiently small ε > 0, that the algorithm should be stopped when
As in the previous section, we make the following assumption to ensure that the algorithm converges globally.
Assumption 4.1. The pair {(Z^{k+1}, J^{k+1})} is the exact solution of the resulting convex minimization (4.2).
We can clearly see that the inner iterations, with the variables Z and J, and the outer iterations, with E and Z, are both classic ADMM. Hence, the convergence of this type of method is available in the literature. Let
It can be proved similarly that the sequence {Y^k} generated by Algorithm 4.1 is contractive.
Theorem 4.1. Suppose that Assumptions 3.1 and 4.1 hold. Let the sequence {(E^{k+1}, Z^{k+1}, Λ^{k+1})} be generated by Algorithm 4.1. Then we have
To end this section, we state the desired convergence theorem without proof.
Theorem 4.2. Suppose that Assumptions 3.1 and 4.1 hold. Let {(E^k, Z^k, Λ^k)} be the sequence generated by Algorithm 4.1 from any initial point. Then every limit point of {(E^k, Z^k)} is an optimal solution of problem (1.1).
In this section, we present two classes of numerical experiments. In the first class, we test the algorithms with different numbers of inner loops to verify their efficiency and stability. In the second class, we test against a couple of recent solvers, SLAL and LRR, to show that the proposed algorithms are very competitive. All experiments are performed under the Windows 7 operating system and Matlab 7.8 (2009a), running on a Lenovo laptop with an Intel dual-core CPU at 2.5 GHz and 4 GB of memory.
In the first class of experiments, we test the proposed algorithms with different numbers of inner steps on synthetic data. The data are created similarly to [9,16]. The data sets are constructed from five independent subspaces whose bases are generated by U_{i+1} = T U_i, 1 ≤ i ≤ 4, where T denotes a random rotation and U_1 is a random orthogonal matrix of dimension 100×4. Hence, each subspace has rank 4 and the data have an ambient dimension of 100. From each subspace, 40 data vectors are sampled via X_i = U_i Q_i, with Q_i being a 4×40 independent and identically distributed N(0,1) matrix. In summary, the whole data matrix is formed by stacking these samples columnwise and has rank r = 20. In this test, a fraction (Fr = 20%) of the data vectors are grossly corrupted by large noise while the others are kept noiseless. If the i-th column vector is chosen to be corrupted, its components are generated by adding Gaussian noise with zero mean and a prescribed standard deviation; hence, each corrupted column is the sum of its noiseless value and the added noise.
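A sketch of this data-generation process is given below; the noise level applied to the corrupted columns is illustrative only and is not the exact standard deviation used in the experiments.

```python
import numpy as np

def make_synthetic_data(seed=0, dim=100, sub_rank=4, n_sub=5, per_sub=40, corrupt_frac=0.2):
    """Illustrative construction of five rank-4 subspaces related by a random rotation,
    with a fraction of the columns grossly corrupted by additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    T, _ = np.linalg.qr(rng.standard_normal((dim, dim)))             # random orthogonal matrix acting as T
    bases = [np.linalg.qr(rng.standard_normal((dim, sub_rank)))[0]]  # U_1
    for _ in range(n_sub - 1):
        bases.append(T @ bases[-1])                                  # U_{i+1} = T U_i
    X0 = np.hstack([U @ rng.standard_normal((sub_rank, per_sub)) for U in bases])
    X = X0.copy()
    n_cols = X.shape[1]
    corrupted = rng.choice(n_cols, int(corrupt_frac * n_cols), replace=False)
    for j in corrupted:
        sigma = 0.3 * np.linalg.norm(X0[:, j]) / np.sqrt(dim)        # illustrative noise level
        X[:, j] += sigma * rng.standard_normal(dim)
    return X, X0, corrupted
```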
As usual, the dictionary A is chosen as X in this test, i.e., A = X. With the given noisy data X, our goal is to derive the block-diagonal affinity matrix Z^* and recover the low-rank matrix by setting X^* = AZ^*, or equivalently X^* = X − E^*. To attain better performance, the values of the penalty parameter μ are taken as a nondecreasing sequence 1e−6 ≤ μ_i ≤ 1e+10 with the relationship μ_{i+1} = ρ μ_i and ρ = 1.1. Moreover, the weighting parameter is chosen as λ = 0.1, which always achieves satisfactory solutions, as verified in preparatory experiments. All tested algorithms start at the zero matrix and terminate when the changes of two consecutive iterations are sufficiently small, i.e.,
where ‖·‖_∞ denotes the maximum absolute value of the components of a matrix, and ε is a tolerance fixed in all the following tests. To specifically illustrate the performance of each algorithm, we present two comparison results in terms of the number of iterations and the running time as the number of inner steps varies from 1 to 10 in Figure 1.
Fig. 1 Comparison of NMM_v1, NMM_v2, and NMM_v3 in terms of the number of iterations (left) and the CPU time required (right) as the number of inner steps varies (x-axis).
As can be seen from Figure 1, the number of iterations required by algorithms NMM_v2 and NMM_v3 decreases dramatically at the beginning and only slightly once the permitted number of inner iterations exceeds 5. It can also be observed that NMM_v3 needs fewer iterations but more computing time than NMM_v2. The reason is that each new inner iterate requires a full singular value decomposition (SVD), which may be the main computational burden of the inner iterative process. Another surprising observation is that the number of iterations required by NMM_v1 remains unchanged regardless of the number of inner iterations. This is because the new Z and J are obtained in only one step, owing to the special constraint Z − J = 0.
To further verify the efficiency of algorithms NMM_v2 and NMM_v3, we test them against the solvers LRR and SLAL for performance comparison with different percentages of grossly corrupted data. The Matlab package of LRR is available at http://sites.google.com/site/guangcanliu/. In running LRR and SLAL, we set all parameters to their default values except for λ = 0.1, which is the best choice for this data setting according to extensive preparatory experiments. The noisy data are created in the same way as in the previous experiment. In this test, the initial points, the stopping criterion, and all parameter values are the same as in the previous test. Meanwhile, the quality of the restoration X^* is measured by the recovery error with respect to the original noiseless data matrix. Moreover, for algorithms NMM_v2 and NMM_v3, we fix the number of inner steps at 4 to balance the number of iterations and the computing time. The numerical results, including the number of iterations (Iter), the CPU time required (Time), and the recovery error (Error), are listed in Table 1.
Table 1 Comparison results of NMM_v2 and NMM_v3 with LRR and SLAL.
It can be seen from Table 1 that, for all the tested cases, each algorithm obtains comparable recovery errors. It is further observed that, compared with NMM_v3, NMM_v2 requires more iterations but the least CPU time. Moreover, both NMM_v2 and NMM_v3 require fewer iterations than LRR, which indicates that more inner loops may decrease the total number of outer iterations. This observation experimentally verifies that the proposed approaches can accelerate the convergence of LRR. We now turn our attention to the state-of-the-art solver SLAL. We clearly see that SLAL is the fastest among the tested solvers. However, when the number of corrupted samples is relatively small (less than 60 percent), SLAL needs more iterations. From these limited performance comparisons, we conclude that our proposed algorithms perform quite well and are competitive with the well-known codes LRR and SLAL.
In this paper, we have proposed, analyzed, and tested three variants of ADMM for solving the non-smooth convex minimization problem involving the matrix ℓ_{2,1}-norm and the nuclear norm. The problem mainly appears in the fields of pattern analysis, signal processing, and data mining, and is used to find and exploit the low-dimensional structure of given high-dimensional noisy data. The earliest solver, LRR, reformulated the problem into an equivalent model by adding a new variable and a new constraint, and derived the value of each variable alternately. Using problem (1.1) as an example, this paper showed that once one variable is obtained, the other two variables can be grouped together and then minimized alternately by the standard ADMM. For the variants NMM_v2 and NMM_v3, we numerically illustrated that, as the number of inner steps increases, both algorithms converge faster and faster in terms of outer iterations.
There is no doubt that when the inner process runs only once without achieving convergence, all the proposed methods reduce to LRR; this is the main theoretical contribution of this paper. Unfortunately, the number of iterations generated by NMM_v1 remains unchanged regardless of the number of inner steps. We believe that the exact solutions for Z and J are already produced even when the inner loop runs only once. Moreover, we have performed performance comparisons with the solvers LRR and SLAL from the recent literature. The results showed that both NMM_v2 and NMM_v3 require fewer iterations to obtain reconstructions of similar quality. To conclude the paper, we hope that our methods and their further modifications can produce further applications to problems in the relevant areas of pattern analysis, signal processing, data mining, and others.
Acknowledgements
We are grateful to the reviewers for their valuable suggestions and comments.