Batch image alignment via subspace recovery based on alternative sparsity pursuit

2017-09-15 08:59:51XianhuiLinZhuLiangYuZhenghuiGuJunZhangandZhaoquanCai

Computational Visual Media 2017年3期

Xianhui Lin,Zhu Liang Yu,Zhenghui Gu,Jun Zhang,and Zhaoquan Cai()

c○The Author(s)2017.This article is published with open access at Springerlink.com

Xianhui Lin1,Zhu Liang Yu1,Zhenghui Gu1,Jun Zhang2,and Zhaoquan Cai3()

c○The Author(s)2017.This article is published with open access at Springerlink.com

The problem ofrobust alignment ofbatches of images can be formulated as a low-rank matrix optimization problem,relying on the similarity of well-aligned images. Going further,observing that the images to be aligned are sampled from a union of low-rank subspaces,we propose a new method based on subspace recovery techniques to provide more robust and accurate alignment.The proposed method seeks a set of domain transformations which are applied to the unaligned images so that the resulting images are made as similar as possible.The resulting optimization problem can be linearized as a series of convex optimization problems which can be solved by alternative sparsity pursuit techniques.Compared to existing methods like robust alignment by sparse and low-rank models,the proposed method can more eff ectively solve the batch image alignment problem, and extract more similar structures from the misaligned images.

image alignment;subspace recovery; sparse representation; convex optimization;image similarity

1 Introduction

With the rapid development of the Internet technologies,a huge amount of visual data canbe found online.These increasing data have the potential for information mining,but also raises some tough issues for preprocessing.In many image data sets,misalignment of images is a common problem in many computer vision and machine learning applications. To deal with this batch image alignment problem,one possible solution is to seek a group of transformations to adjust the unaligned images according to similarity or other measures[1,2].A problem is that such methods are not robust enough to handle corruption or illumination variation which often occur in realworld applications.

Clearly, if one finds a group of optimal transformations and applies them to the unaligned images,the resulting aligned images will be very similar.If these images are vectorized and arranged as columns of a matrix,the constructed matrix will ideally be of low column rank.Since partial corruption or occlusion will affect the low-rank property,a method called robust alignment by sparse and low-rank decomposition(RASL)[3]was proposed to handle these issues,based on low-rank models. These have recently shown strength in many fields such as signal recovery and dimension reduction[4]. The core of a low-rank model is that high-dimensional data,such as images and video sequences,are drawn from low-dimensional structures which lie in low-rank subspaces[5].This idea is applied to batch image alignment by treating the images as samples from the low-rank subspaces.

Since linear subspaces are embedded in a highdimensional space[6],it is possible to seek the underlying structures for a batch of images by subspace recovery. However,in practice,highdimensionaldata are seldom drawn from a single lowrank subspace–it is more reasonable to expect thathigh-dimensional data are drawn from several lowrank subspaces rather than just from one.Based on this idea,we propose a method that considers the unaligned images to be lying in a union of lowrank subspaces. Specifically,each aligned image is sampled from one of the union of subspaces, and can be represented as a linear combination of other images in the same subspace[7]. Further consideration ofthe sparse modelin linear subspaces and high-dimensionaldata analysis[6,8],leads us to modelthe subspace recovery problem using a sparse representation.

In summary,in this paper,we propose a new method for batch image alignment based on seeking a set of optimal transformations via a subspace recovery technique.The proposed method is formulated as an optimization problem which can be approximately solved by linearization and alternative sparsity pursuit. After obtaining the optimal solution,we can recover the underlying structures of a batch of images to deal with misalignment,and remove partial corruption and occlusion.

2 Problem formulation

In this section,we formulate the problem of batch image alignment by modeling unaligned images and sparse errors. The aim is to search for a set of transformations and to recover the low-dimensional structures embedded in high-dimensionalspace.

2.1 Unaligned image model

Given a set of unaligned images I1,...,Inof the same object,we assume that they can be transformed to similar images which are well-aligned by a set of domain transformationsτ1,...,τn.Stacking the transformed images as vectors,we can construct the matrix:

where I0i=Ii?τifor i=1,...,n is a well-aligned version of image i and the operator?denotes the transformation applied to produce it.Pixel(x,y)of the transformed image I0iis given by

Since the aligned images are similar, they can be treated as samples from a union of low-dimensional subspaces.Assuming a suffi cient sampling density,each image can be represented as a linear combination of the other images from the same subspace[9].As shown in Fig.1,compared with the dimension of the entire union of subspaces (i.e.,severalsubspaces of a high-dimensionalspace), the dimension of a single subspace is so small that the representation ofeach image is sparse[10].Thus, we could model that:

where W ∈Rn×nis a sparse coeffi cient matrix and A∈Rm×nis a self-represented matrix.We may then formulate the batch image alignment problem as

where‖·‖0represents the?0-norm which counts the number of nonzero entries of the matrix W.

2.2 Sparse error model

In general,partial corruption and occlusion may exist which will disrupt the low-dimensional subspaces. Since such errors usually occur in a small region of an image and have arbitrarily large magnitudes(especially for face images),these errors can be modelled as sparse errors[11]. In order to separate them from the well-aligned images,we modify Eq.(4)to

where E∈Rm×nis the sparse error matrix.

Our objective is to reconstruct A distributed over a union of low-rank subspaces and to handle the influence of sparse errors.

3 Solution via iterative linearization and alternative sparsity pursuit

In this section,we exploit an iterative scheme[3] to obtain a practical solution to the batch image alignment minimisation problem in Eq.(5).

3.1 Convex relaxation

The optimization problem in Eq.(5)is nonconvex and NP-hard because of the?0-norm.Fortunately, sparse representation and compressed sensing theory shows that it can be approximately solved by replacing the?0-norm by the?1-norm[4,12,13]. Doing so,Eq.(5)becomes:

where‖·‖F(xiàn)represents the Frobenius norm andεis the tolerable noise level.

3.2 Problem linearization

The nonlinearity of the constraint D?τ=A+ E makes the solution of Eq.(6)intractable. In practice,we assume that the change produced by τis small enough that we can linearize the current estimateτto approximate the constraint. Each transformationτi(an affi ne transformation,etc.) can be represented by a vector of p parameters[14], yieldingτ=[τ1|···|τn]∈ Rp×n. Specifically,if initial transformationsτare known,we can changedenotes the Jacobian of the i th image with regard to the transformation τi,andωidenotes the standard basis for Rn.This allows the problem to be relaxed to the following convex optimization problem in which we seek the optimal A,W,E,Δτ:

In order to obtain the approximate solution to Eq.(6),we repeatedly linearize about the current estimate ofτand solve a series of optimization problems using Eq.(8).In other words,we seek a small change inτin each iteration,to gradually approximate the correct transformations. In this way,we can obtain approximate transformations[15, 16]to recover the underlying subspaces. The detailed iterative linearization procedure to solve the batch image alignment problem is summarized in Algorithm 1.Iteration stops when the diff erence between the current objective function and the previous one meets a predefined stopping criterion.

3.3 Solution for inner loop by alternative sparsity pursuit

In the linearized image alignment problem,a key step is to find the solution to the convex optimization subproblem in Eq.(8)in Step 3,the inner loop of Algorithm 1. The recently developed alternating direction method(ADM)and linearized alternating direction method(LADM)can be applied to solve such problems quickly and effectively[17,18].Before using the ADMand LADM,the augmented Lagrange multiplier(ALM)method[19]is applied to the original problem.Firstly,we define:

Then the augmented Lagrangian function for Eq.(8)is

where Y is a Lagrange multiplier matrix,〈·,·〉denotes the inner product operation,andμis a positive penalty parameter.

In the ADM method,the unknowns in the augmented Lagrangian function are iteratively minimized one by one: in other words,the sparsity of W and E are pursued alternatively until convergence[7].In this case,the iterations are given

Hence,the solution to Ak+1after one iteration is given by

Secondly,when updating W and E,considering the constraints in Eq.(8)that A=D?τ+the augmented

Lagrangian function can be rewritten as

where?W=I?W.By linearizing the quadratic terms in Eqs.(15)and(16),we can obtain the approximate solutions for W and E as

whereη1≥ ‖A‖22andη2≥ ‖?W‖22guarantee the solution generated by LADM converges to a KKT (Karush–Kuhn–Tucker)point of Eq.(8)[20].Γα(·) is a soft-thresholding operator defined as

where sgn(·)represents the sign function.When Γα(x)operates on a matrix,it acts element-wise.

Finally,the solution toΔτin Eq.(11)is easily obtained as

where J?idenotes the Moore–Penrose pseudoinverse ofThe Lagrange multiplier matrix Y and penalty parameterμare updated following Eq.(12).The complete procedure for the inner loop of Algorithm 1 using alternative sparsity pursuit is summarized in Algorithm 2.

4 Experimental results and discussion

In this section,we verify the proposed method on severaldata sets,including face images,handwritten digits,and video sequences. In all experiments, we select the target regions from unaligned images manually,or by using object detectors(such as a face detector).These target regions are preprocessed to a uniform size,and used as the original unaligned images forming the input to our algorithm.

4.1 Robustness to sparse errors

In an experiment on images of a dummy head which contains sparse errors including corruption and occlusion[3],the correctness and robustness of the proposed method are illustrated in Fig.2.In this experiment,the input images are the target regions to align.After alignment,we achieve well-aligned images,and reconstruct the underlying structures shown in Figs.2(b)and 2(c).The average of the original,the aligned and the reconstructed imagesare shown in Fig.2(e).These results demonstrate that the set oftransformations found can successfully dealwith misalignments.Moreover,the sparse errors can be separated by recovering the underlying structures from the union of subspaces. In this experiment,since we do not know which subspace each image belongs to,we cannot arrange the images from the same subspaces together in the data matrix.This leads to a different structure for the estimated coeffi cient matrix ?W in this experiment,shown in Fig.3,and the ideal structure in Fig.1.However,each column of this coeffi cient matrix still has many very small elements which reveals the sparsity of the self-representation of the reconstructed images,and supports the reasoning behind the proposed model.

4.2 Face image alignment

To further verify the effi cacy ofthe proposed method, we carried out an experiment on more challenging natural face images from the Labeled Faces in the Wild(LFW)database[21].These are real-world face images with uncontrolled misalignments,under varying illumination.About 35 face images of each person were used in the experiment.The unaligned face regions were used as input images.As shown in Fig.4,clearer average faces were obtained afteralignment with the proposed method.

Algorithm 2 Alternative sparsity pursuit(inner loop) Input:A0∈Rm×n,W0∈Rn×n,E0∈Rm×n, Δτ0∈Rp×n,Ji∈Rm×p,for i=1,...,n,λ,ρ,η1,η2while not converged do Step 1:update A: Ak+1=D?τ+μkYk; Step 2:update W: W k+1=Γ1 n P JiΔτkωiωTi?Ek?1i=1 ?k+1(Ak+1?Wk?Yk/μk)? W k+ATη1; W iik+1=0; μkη1 Step 3:update E: ?Wk+1=I?Wk+1; E k+1=Γλ? ? E k+(Ak+1?Wk+1?Yk/μk)?WTk+1η2; Step 4:updateΔτ: Δτk+1= μkη2 n P ?A′k+1+Ek+1?D?τ+Yk/μk?ωωTi; i=1 J?i Step 5:update Y andμ: Yk+1=Yk+μkf(Ak+1,Wk+1,Ek+1,Δτk+1); μk+1=ρμk. end while Output:solution?A,?E,?W,Δ?τto Eq.(8).

Fig.2 Representative result with a batch of images of a dummy head.(a)Original unaligned images with partial corruption and occlusion,D. (b)Images aligned by a set of transformations generated by the proposed algorithm,D?τ. (c)Reconstructed underlying structures A,aligned images adjusted for sparse errors. (d)Sparse errors E removed from D?τ:E=D?τ?A.(e)Averages of all components in D,D?τ,and A respectively,showing the accuracy of the proposed method.

Fig.3 Estimated coeffi cient matrix ?W in the dummy head experiment.

Since we have no ground truth for these images, we evaluate the experimentalresults according to the similarity of the images:after alignment,the images should be more similar.We thus measure image similarity using peak signal to noise ratio(PSNR) and structural similarity index(SSIM)[22,23]. The mean PSNR and SSIM values for images of each subject are shown in Fig.4,while the mean PSNR and SSIM values for all subjects in the LFW database are given in Tables 1 and 2.They show that the proposed method can reconstruct more similar and general structures from the high-dimensional data than the RASL method[3].These results show the strength ofthe proposed method for batch image alignment.

We also validated the proposed method using real face images from a video sequence.The video sequence consists of140 frames of AlGore talking[3]. Selecting one from every 7 frames,20 sampled images from the video and their results after alignment are shown in Fig.5.In Figs.5(b)and 5(c),the proposed method successfully aligns the speaker. The estimated coeffi cient matrix of this experiment is shown in Fig.5(d);it is similar to the ideal one in Fig.1. This result shows that the proposed method works well with video sequence data.Since adjacent frames from a video sequence are quitecorrelated,they are drawn from the same subspace with high probability.In contrast,if a frame is far from the current one,the structure of its subspace may differ.This estimated coeffi cient matrix further demonstrates the rationality of using a model based on a union of subspaces in this task.The results in Tables 3 and 4 show that the proposed method is better at processing video sequences than RASL.

Table 1 Mean PSNRs on LWF (Unit:dB)

Table 2 Mean SSIMs on LFW

We can conclude that the proposed method outperforms RASL based on the results ofthe above experiments.The RASL method models data using robust principal component analysis(RPCA)[4], which assumes that data are drawn from a single subspace[5].Unlike RASL,the proposed method reconstructs data from a union of subspaces,which enables it to describe the structure of the data more accurately,leading to better results.

4.3 Handwritten digit image alignment

A further kind of data set was used to verify the proposed method. It comprises handwritten digits,which are widely used in machine learning algorithms[24].The images of handwritten digits were taken from the MNIST database[24].We experimented on 100 images of the digit“3”.Results achieved by the proposed method and RASL are shown in Fig.6 and Tables 5 and 6.These results again allow us to conclude that the proposed method leads to better image alignment results.

Table 3 Mean PSNRs on video example(Unit:dB)

Table 4 Mean SSIMs on video example

Table 5 Mean PSNRs for handwritten digits (Unit:dB)

Table 6 Mean SSIMs on handwritten digits

Fig.4 Experimental results using real face images.(a)Averages of original unaligned face images of 20 persons from LFW.(b)Mean PSNR of original images and images aligned by two methods,for each person.(c)Average faces using aligned images for each person.(d)Mean PSNR of original images and images reconstructed by two methods,for each person.(e)Average faces using reconstructed images for each person.(f)Mean SSIM of original images and images aligned by two methods,for each person.

Fig.5 Experimental results for real face images from a video sequence.(a)20 original frames of unaligned faces selected from a 140-frame video.(b)Frames after alignment.(c)Reconstructed underlying structures.(d)Coeffi cient matrix.

Fig.6 Experimental results for handwritten“3”digits.(a)100 original images from MNIST.(b)and(d)were generated by RASL,(c)and (e)by our proposed method.The red circles mark some obvious diff erences between two method,which support that conclusion that our proposed method is more accurate.

5 Conclusions

In this paper,a new method for batch image alignment has been proposed which can handle sparse errors.Several experiments have verified the robustness of the proposed method,as well as its effectivity and superiority.Compared to existing methods,the proposed method is better at extracting the generalunderlying structures from high-dimensionaldata with misalignment and sparse errors.It could readily be extended to deal with 3D structures or much higher-dimensionaldata;this will be studied in our further work.

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61573150,61573152,61370185,61403085, and 51275094),and Guangzhou Project Nos. 201604016113 and 201604046018.

[1]Frey, B.J.; Jojic, N.Transformation-invariant clustering using the EMalgorithm.IEEE Transactions on Pattern Analysis and Machine Intelligence Vol.25, No.1,1–17,2003.

[2]Pluim,J.P.W.;Maintz,J.B.A.;Viergever,M. A.Mutual-information-based registration of medical images:A survey.IEEE Transactions on Medical Imaging Vol.22,No.8,986–1004,2003.

[3]Peng,Y.;Ganesh,A.;Wright,J.;Xu,W.;Ma, Y.RASL:Robust alignment by sparse and lowrank decomposition for linearly correlated images. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol.34,No.11,2233–2246,2012.

[4]Cand`es,E.J.;Li,X.;Ma,Y.;Wright,J.Robust principal component analysis?Journal of the ACM Vol.58,No.3,Article No.11,2011.

[5]Liu,G.;Lin,Z.;Yan,S.;Sun,J.;Yu,Y.;Ma, Y.Robust recovery of subspace structures by lowrank representation.IEEE Transactions on Pattern Analysis and Machine Intelligence Vol.35,No.1,171–184,2013.

[6]Elhamifar,E.;Vidal,R.Sparse subspace clustering. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2790–2797,2009.

[7]Bian,X.;Krim,H.BI-sparsity pursuit for robust subspace recovery.In: Proceedings of the IEEE International Conference on Image Processing,3535–3539,2015.

[8]Rubinstein,R.; Faktor,T.;Elad,M.K-SVD dictionary-learning for the analysis sparse model.In: Proceedings of the IEEE International Conference on Acoustics,Speech and Signal Processing,5405–5408, 2012.

[9]Bian,X.;Krim,H.Robust subspace recovery via bi-sparsity pursuit.arXiv preprint arXiv:1403.8067, 2014.

[10]Elad, M.Sparse and redundant representation modeling—What next? IEEE Signal Processing Letters Vol.19,No.12,922–928,2012.

[11]Wright,J.;Yang,A.Y.;Ganesh,A.;Sastry,S.S.;Ma, Y.Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol.31,No.2,210–227,2009.

[12]Cand`es,E.J.;Romberg,J.K.;Tao,T.Stable signal recovery from incomplete and inaccurate measurements.Communications on Pure and Applied Mathematics Vol.59,No.8,1207–1223,2006.

[13]Donoho,D.L.For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution.Communications on Pure and Applied Mathematics Vol.59,No.6,797–829,2006.

[14]Ma,Y.;Soatto,S.;Kosecka,J.;Sastry,S.S.An Invitation to 3-D Vision:From Images to Geometric Models,Volume 26.Springer Science&Business Media,2012.

[15]Baker,S.;Matthews,I.Lucas–Kanade 20 years on:A unifying framework.International Journal of Computer Vision Vol.56,No.3,221–255,2004.

[16]Vedaldi,A.;Guidi,G.;Soatto,S.Joint data alignment up to(lossy)transformations.In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,1–8,2008.

[17]Boyd,S.;Parikh,N.;Chu,E.;Peleato,B.;Eckstein, J.Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends R○ in Machine Learning Vol. 3,No.1,1–122,2011.

[18]Liu,G.;Lin,Z.;Yu,Y.Robust subspace segmentation by low-rank representation.In:Proceedings of the 27th International Conference on Machine Learning, 663–670,2010.

[19]Rockafellar,R.T.Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM Journal on Control Vol.12,No.2,268–285, 1974.

[20]Lin,Z.;Liu,R.;Su,Z.Linearized alternating direction method with adaptive penalty for low-rank representation.In:Proceedings of the Advances in Neural Information Processing Systems 24,612–620, 2011.

[21]Huang,G.B.;Ramesh,M.;Berg,T.;Learned-Miller, E.Labeled faces in the wild:A database for studying face recognition in unconstrained environments. Technical Report 07-49,University of Massachusetts, Amherst,2007.

[22]Hore,A.;Ziou,D.Image quality metrics:PSNR vs.SSIM.In:Proceedings of the 20th International Conference on Pattern Recognition,2366–2369,2010.

[23]Wang,Z.;Bovik,A.C.;Sheikh,H.R.;Simoncelli, E.P.Image quality assessment:From error visibility to structural similarity.IEEE Transactions on Image Processing Vol.13,No.4,600–612,2004.

[24]LeCun,Y.;Cortes,C.;Burges,C.J.C.The MNIST database of handwritten digits.2010.Available at http://yann.lecun.com/exdb/mnist.

Zhu Liang Yu received his B.S.E.E. and M.S.E.E.degrees,both in electronic engineering,from Nanjing University of Aeronautics and Astronautics, China,in 1995 and 1998,respectively, and his Ph.D.degree from Nanyang Technological University,Singapore,in 2006.He joined the Center for Signal Processing,Nanyang Technological University,in 2000 as a research engineer,then had been a group leader from 2001.In 2008,he joined the College of Automation Science and Engineering,South China University of Technology and was promoted to full professor in 2010.His research interests include signal processing,pattern recognition, machine learning and their applications in communications, biomedical engineering,robotics,etc.

Zhenghui Gu received her Ph.D. degree from Nanyang Technological University in 2003.From 2002 to 2008, she was with the Institute for Infocomm Research,Singapore. She joined the College of Automation Science and Engineering,South China University of Technology,in 2009 as an associate professor. She was promoted to full professor in 2015. Her research interests include signalprocessing and pattern recognition.

Jun Zhang received his bachelor and master degrees in computer science from Xiangtan University,Xiangtan,China, in 2002 and 2005,respectively,and his doctor degree in pattern recognition and intelligent systems from South China University of Technology,China,in 2012.He was a postdoctoral research fellow in electronic engineering with the University of South California,Los Angeles,USA,from 2015 to 2016. He is currently an associate professor with the School of Information Engineering,Guangdong University of Technology,Guangzhou,China,where he has been the head ofthe Electrical Engineering Department since March 2016. His current research interests include compressive sensing and biomedical signal processing.

Zhaoquan Cai was born in 1970. He received his bachelor degree from South China University of Technology, Guangzhou,and master degree from Huazhong University of Science and Technology,Wuhan,China.He is now a professor in the School of Computer Science,Huizhou University,and also a member of CCF.His research interests include computer networks,intelligent computing,and databases.

Open Access The articles published in this journal are distributed under the terms of the Creative Commons Attribution 4.0 International License(http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use,distribution,and reproduction in any medium,provided you give appropriate credit to the original author(s)and the source,provide a link to the Creative Commons license,and indicate if changes were made.

Other papers from this open access journalare available free of charge from http://www.springer.com/journal/41095. To submit a manuscript,please go to https://www. editorialmanager.com/cvmj.

his bachelor degree in automation from South China Agricultural University, Guangzhou, China,in 2014.He is now a master candidate supervised by Prof.Zhu Liang Yu in the College of Automation Science and Engineering,South China University of Technology.His research interests include machine learning and computer vision.

1 College of Automation Science and Engineering,South China University of Technology,Guangzhou,China.E-mail:X.Lin,xhlin129@163.com;Z.L.Yu,zlyu@scut. edu.cn;Z.Gu,zhgu@scut.edu.cn.

2 School of Information Engineering, Guangdong University of Technology,Guangzhou,China.E-mail: zhangjun7907@hotmail.com.

3 School of Computer Science,Huizhou University, Huizhou,China.E-mail:caizhaoquan@139.com().

Manuscript

2016-12-31;accepted:2017-02-22

Computational Visual Media2017年3期

Computational Visual Media的其它文章: Practical automatic background substitution for live video; Adaptive stray-light compensation in dynamic multi-projection mapping; Robust tracking-by-detection using a selection and completion mechanism; Optimized evacuation route based on crowd simulation; Joint head pose and facial landmark regression from depth images; Variational reconstruction using subdivision surfaces with continuous sharpness control