
      Attention-Aware Network with Latent Semantic Analysis for Clothing Invariant Gait Recognition

      2019-11-25 10:22:24
      Computers, Materials & Continua, 2019, Issue 9

      Hefei Ling, Jia Wu, Ping Li and Jialie Shen

      Abstract: Gait recognition is a complicated task because co-factors such as carrying conditions, clothing, viewpoints, and surfaces change the appearance of gait to a greater or lesser degree. Among those co-factors, clothing is the most challenging one in the area. Conventional methods proposed for clothing-invariant gait recognition show that the body parts and the underlying relationships among them are important for gait recognition. The attention mechanism performs well at highlighting discriminative regions. Meanwhile, latent semantic analysis is known for its ability to capture latent semantic variables that represent the underlying attributes and relationships in the raw input. Thus, we propose a new CNN-based method that leverages the advantages of latent semantic analysis and the attention mechanism. Based on the discriminative features extracted by the attention module and the latent semantic analysis module respectively, a multi-modal fusion method is proposed to fuse those features, chosen for its high fault tolerance at the decision level. Experiments on the most challenging clothing variation dataset, OU-ISIR Treadmill dataset B, show that our method outperforms other state-of-the-art gait approaches.

      Keywords: Gait recognition, latent semantic analysis, attention mechanism, attention-aware neural network, clothing-invariant, feature fusion.

      1 Introduction

      In recent years, developing intelligent algorithms for modeling biometric traits has played an increasingly important role in human identification. Most static traits, such as fingerprint and iris, are already used in practice, but they are limited by distance and require interaction with the subject [Bouchrika, Carter and Nixon (2016)]. Compared with these biometric features, gait is a coarse, motion-based feature, so gait recognition is robust to low resolution. It can be captured at long distances without the cooperation of subjects. At the same time, the number of cameras installed in public places is increasing explosively, which makes gait recognition feasible for crime surveillance and prevention.

      However, there are still many challenges in applying gait recognition in real life. Robust and discriminative features are important for human identification because of the existence of covariates (e.g., carrying condition, camera viewpoint, clothing, variation in walking speed, walking surface and so on). In most appearance-based gait recognition methods [Wu, Huang, Wang et al. (2016)], variation in clothing and carrying condition affects the performance of gait recognition drastically. These co-factors bring the same problem to clothing-invariant gait recognition: they greatly change the appearance of subjects. It has therefore become a hotspot for researchers.

      To tackle the variation in appearance caused by clothing, a wide range of methods have been proposed in recent years (for a recent review, see [Lee, Belkhatir and Sanei (2014)]). Most conventional approaches use hand-crafted features to represent clothing-invariant human gait. For example, Shariful et al. [Shariful, Islam, Akter et al. (2014)] proposed the random window subspace method (RWSM), which splits the raw input into small window chunks to obtain the gait segmentation and the contribution of each body part for clothing-invariant gait recognition. Guan et al. [Guan, Li and Hu (2012)] proposed a random subspace method (RSM) based on computing a full hypothesis space; the method randomly chooses subspaces for classification. Hossain et al. [Hossain, Makihara, Wang et al. (2010)] proposed part-based gait identification under substantial clothing variations, which exploits the discrimination capability as a matching weight for each part and controls the weights adaptively based on the distribution of distances between the probe and all the galleries. Rokanujjaman et al. [Rokanujjaman, Islam, Hossain et al. (2015)] proposed an effective parts definition approach based on the contribution of each row as rows are merged in order from bottom to top. It shows that some rows have positive effects and some rows have negative effects on gait recognition. Based on the positive and negative bias, they defined the three most effective body parts and two redundant body parts. Discarding the two redundant parts and considering only the three effective body parts improves the performance of gait recognition. In fact, the pipeline of most conventional methods for clothing-invariant gait recognition first divides the body into components and then learns the weights of the features from the different components. The performance of these methods is unsatisfactory because of the inevitable errors in extracting local features by traditional means. Nevertheless, they show the importance of local information and of the relationships among the parts.

      Besides those conventional approaches, the deep learning approach [Yeoh, Aguirre and Tanaka (2017)] automatically learns clothing-invariant gait features directly from raw data. Convolutional neural networks make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies), so they perform well in object recognition and are applied in many fields. Zhou et al. [Zhou, Liang, Li et al. (2018)] use a deep learning method for road traffic sign recognition. CNN-based approaches outperform those conventional methods in many aspects and capture features from the raw input more easily. At the same time, the aforementioned conventional methods show that latent attributes and local features from limbs are important in clothing-invariant gait recognition. A more effective method based on convolutional neural networks, one that combines the strengths of CNN-based methods with the insights of conventional methods, is therefore needed.

      Attention networks [Zhao, Wu, Feng et al. (2017)] and latent semantic features [Li and Guo (2014)] play important roles in the field of computer vision. An attention network learns to pay more attention to important local parts of images, and latent semantic analysis (LSA) is known for its ability to capture latent semantic features. Many recent studies report better results than the earlier classification network [Krizhevsky, Sutskever and Hinton (2012)] by applying the attention mechanism and LSA. They perform well in a variety of applications such as scene classification [Li and Guo (2014)], natural language processing [Fei, Cai-Hong, Wang et al. (2015)] and so on.

      Inspired by the excellent performance of the attention mechanism and latent semantic analysis, we employ latent semantic features to help analyze the contribution of different parts of images and to capture the latent relationships between features and classification results. The attention-aware network captures more discriminative features, which highlight the important regions of subjects. In this paper, we combine the advantages of the attention mechanism and LSA and design a new CNN-based method to address the problem of clothing-invariant gait recognition.

      We summarize the contributions of our work as follows:

      Firstly, we propose a specific CNN-based method for clothing-invariant gait recognition. The method automatically learns to combine features extracted from low-level input with latent semantic features from middle-level features, which yields a good representation for clothing-invariant gait recognition.

      Secondly, we evaluate our method on the most challenging clothing variation dataset, OU-ISIR Treadmill dataset B, which includes different clothing conditions, and it achieves better performance than other state-of-the-art methods.

      The remainder of the paper is organized as follows: related work on the attention mechanism, latent semantic analysis and gait recognition is introduced in Section 2. Section 3 describes how the CNN, latent semantic analysis and attention are combined and how they work. Experimental results are shown in Section 4. Finally, we give a conclusion in Section 5.

      2 Related work

      Approaches to gait recognition can be classified into two categories: model-based methods [Shariful, Islam, Akter et al. (2014); Guan, Li and Hu (2012); Shen, Pang, Tao et al. (2010)] and model-free methods [Wu, Huang, Wang et al. (2016)]. Model-based methods are typically conventional methods built from static features of the shape of the human body and from components that reflect the dynamic features of a gait cycle; they concentrate on modeling the structure of the human body. Model-free methods extract gait features from the raw input without considering the structure of subjects; they focus on the shape of the silhouette rather than fitting it to a chosen model. Our method combines the structure of the human body with a model-free method, so it can remedy the dependence of model-free approaches on clothing variation by means of the attention mechanism and latent semantic analysis.

      The attention mechanism [Wang, Jiang, Qian et al. (2017)] is designed to highlight discriminative features for various kinds of tasks, including image classification [Cao, Liu, Yang et al. (2016)], semantic segmentation [Chen, Yi, Jiang et al. (2016)], image question answering [Yang, He, Gao et al. (2016)], image captioning [Mnih, Heess, Graves et al. (2014)] and so on. The attention mechanism is effective in understanding images, since it adaptively focuses on related regions of the images when deep networks are trained with spatially related labels, capturing the underlying relations of labels and providing spatial regularization for the results. To some extent, the attention mechanism is similar to the conventional methods for clothing-invariant gait recognition, but it highlights the salient features automatically. Besides the attention mechanism, there is an effective method for extracting the underlying attributes of subjects: LSA learns latent features for gait recognition, which are important and complement the spatial features from the attention-aware network.

      LSA is a topic-model technique in natural language processing for improving information retrieval. It was first introduced by Deerwester et al. in 1988 [Deerwester (1988)] and further improved in 1990 [Deerwester (2010)]. Recently, the idea of latent semantic representation learning has been used in the computer vision community. Zhiwu Lu proposed a novel latent semantic learning method for extracting high-level latent semantics from a large vocabulary of abundant mid-level features [Lu and Peng (2013)] for human action recognition. Bergamo et al. [Bergamo, Torresani and Fitzgibbon (2011)] applied a compact code learning method to object categorization, which uses a set of latent binary indicator variables as the intermediate representation of images. In image retrieval and object detection, latent semantic learning can also be used to extract high-level latent semantic features. Latent semantic analysis thus extracts latent features that are not given explicitly, and combining the features from an improved CNN-based model with the attention mechanism and latent semantic analysis can improve the performance of our task: clothing-invariant gait recognition.

      3 Methodology

      We propose a convolutional neural network for clothing-invariant gait recognition, which utilizes an attention model for adaptive weights of different parts and latent semantic analysis for learning latent semantic features. The framework of our latent-attention compositional network (LACN) is illustrated in Fig. 1. The input to our method is the gait energy image (GEI) [Man and Bhanu (2005)], which is the average silhouette over one walking cycle of gait. GEI is the most common input for both traditional and CNN-based methods. Samples and corresponding GEIs from the dataset under different clothing combinations are illustrated in Fig. 2. LACN consists of two main components: one combines the attention mechanism with latent semantic analysis for multi-level feature extraction; the other is multi-modal fusion, which fuses the features from the different feature extraction modules.
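The GEI definition above (the average silhouette over one walking cycle) can be illustrated with a minimal sketch. It assumes the silhouettes are already aligned, equally sized binary frames stored as nested lists; the function name `gait_energy_image` and the toy frames are hypothetical and are not taken from the paper.

```python
def gait_energy_image(silhouettes):
    """Average equally sized binary silhouettes (lists of rows) pixel by pixel."""
    if not silhouettes:
        raise ValueError("need at least one silhouette frame")
    n_frames = len(silhouettes)
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    gei = [[0.0] * width for _ in range(height)]
    for frame in silhouettes:
        for r in range(height):
            for c in range(width):
                # Each frame contributes 1/n_frames of its pixel value.
                gei[r][c] += frame[r][c] / n_frames
    return gei


# Toy "gait cycle" of two 2x2 frames (illustrative values only).
cycle = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
]
gei = gait_energy_image(cycle)
```

Pixels that are foreground in every frame keep the value 1, while pixels that move during the cycle receive intermediate values, which is how a GEI encodes motion in a single image.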

      Figure 1: The pipeline of our network. The base network is the same as the CNN-based method [Yeoh, Aguirre and Tanaka (2017)], which is composed of three convolutional layers with kernel sizes of 7×7, 5×5 and 3×3 respectively. After the feature maps are captured, the attention module learns a soft mask and obtains new features from the base network. In the latent semantic module, we divide the features from the base network into a fixed number of components, obtain latent variables for the corresponding components, and then calculate their relationship with the final gait labels. Finally, we fuse the features from the two modules using a convolutional layer with kernel size 1×1 to get discriminative and robust features

      Figure 2: Samples of images from different kinds of clothing variations of OU-ISIR dataset B and the corresponding GEIs [Makihara, Mannami and Tsuji (2012)]

      The attention model pays attention to the high-level representation of the whole input. It is constructed as a two-branch convolutional neural network. Latent semantic analysis is used for extracting middle-level features that are ignored at the high level. Finally, the feature fusion strategy combines the features from different levels. The details are discussed in the next three subsections (3.1, 3.2 and 3.3).

      3.1 Latent semantic analysis

      The conventional methods for clothing-invariant gait recognition motivate this module: dividing the input GEI into small fixed subspaces and obtaining latent variables from those subspaces is an effective way to get more discriminative features. We therefore employ a form of latent semantic analysis, a patch-based latent semantic learning model, to obtain latent semantic features.

      In this module, $t$ labeled images are given, where $X_i$ denotes the $i$-th image and $Y_i$ is the label of that image. We aim to learn a model from $X_i$ to $Y_i$. The first step is to divide the input GEI into non-overlapping patches. The patches form the low-level features of the input GEI, and the features from these patches are regarded as latent variables $Z_{ji}$. To predict the result from those latent variables, we treat each $Z_{ji}$ as a latent high-level visual feature and obtain the gait label by summarizing the high-level visual features inferred from their corresponding patches.

      The latent variables are predicted from the input GEI, so in theory they can also represent discriminative high-level features for the target gait labels. Under this assumption, we formulate the two stages of the prediction problem as the following unified optimization over the loss function.

      Figure 3: The procedure of the latent semantic analysis

      where the function $f(\cdot)$ predicts the gait labels from the latent variables of the whole image, and $W$ denotes the model parameters of the prediction function. The latent variable $Z_{ji}$ is computed by the latent feature extractor, formulated as Eq. (2).

      The process of extracting latent semantic features and obtaining the final result from those latent variables is demonstrated in Fig. 3. The functions $f(\cdot)$ and $g(\cdot)$ are linear functions, given in Eqs. (3) and (4) respectively.

      From those fixed patches, the latent variables of the corresponding patches are calculated, and the performance of the prediction function is improved at the same time.
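The two-stage structure described in this subsection (non-overlapping patches, a linear extractor $g(\cdot)$ producing latent variables, and a linear predictor $f(\cdot)$ over all latent variables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are hand-picked rather than learned, $g$ is shared across patches, and the names `split_into_patches`, `linear` and `predict_scores` are hypothetical.

```python
def split_into_patches(image, patch_h, patch_w):
    """Split a 2-D list into non-overlapping patches, each flattened row-major."""
    height, width = len(image), len(image[0])
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patches.append([image[r][c]
                            for r in range(top, top + patch_h)
                            for c in range(left, left + patch_w)])
    return patches


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_scores(image, patch_h, patch_w, g_w, g_b, f_w, f_b):
    # Stage 1: g maps every patch to its latent variables
    # (sharing g across patches is an assumption of this sketch).
    latents = [linear(g_w, g_b, patch)
               for patch in split_into_patches(image, patch_h, patch_w)]
    # Stage 2: f maps the concatenated latent variables to class scores.
    z = [value for latent in latents for value in latent]
    return linear(f_w, f_b, z)


# Toy 2x4 "GEI" split into two 2x2 patches, one latent variable per patch.
image = [[1, 0, 0, 1],
         [1, 0, 0, 1]]
g_w, g_b = [[1, 0, 1, 0]], [0]          # hand-picked weights for g
f_w, f_b = [[1, 0], [0, 1]], [0, 0]     # hand-picked weights for f
scores = predict_scores(image, 2, 2, g_w, g_b, f_w, f_b)
```

In a trained model both weight sets would be fitted jointly by minimizing the loss over the labeled images, so that the latent variables are shaped by the final prediction task.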

      3.2 Attention model for adaptive weights of features

      Attention maps highlight discriminative regions of different parts of the human body. The attention network performs selection over the feature maps through a soft mask that contains a weight for every dimension of the features. As shown in Fig. 1, we design an attention-aware structure to capture specific regions of the GEI. The attention model has two chunks: one learns a soft mask for the feature maps from the base network, while the other, the main chunk, extracts the features automatically. The soft mask highlights the regions of the corresponding part and plays an important role in making the features robust.

      Feature maps from the main chunk for the input GEI are defined as Eq. (7).

      where $I$ is the input data (GEI). To obtain a result better than the original features $X$, the second stage refines the attention maps $A$ by modifying all previous predictions, where $\theta_{att}$ denotes the parameters learnt by the attention module. The attention module consists of two layers (the first layer has 512 filters with kernel size 1×1 and the second is a sigmoid layer).

      The values of the attention maps range from 0 to 1 and represent how important the original features are. The final outputs $F$ are formulated as,

      From the formulation, the attention map works as a discriminative feature selector that selects from the original features $X$, and the attention maps adaptively capture the salient features. The loss for the attention module is:

      where $L_{att}$ denotes the loss function of the confidence maps from the attention-aware network; it is the cross-entropy loss.

      We emphasize that the attention model calculates soft weights for the feature maps of subjects, and it allows the gradient of the loss function to be back-propagated through it. The output $A$ of the attention module is in effect a mask for the corresponding feature map $F$, which adaptively highlights the important components of subjects. As Fig. 4 shows, the attention module highlights the limbs and head of subjects, which are discriminative parts in the problem of clothing-invariant gait recognition.
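The soft-mask idea in this subsection (a 1×1 convolution followed by a sigmoid, whose output in the range 0 to 1 selects from the original features) can be sketched as below. The sketch assumes that the output is the element-wise product of the mask $A$ and the features $X$, which is one reading of the description of the attention map as a feature selector; the exact formulation is not reproduced here. The toy tensor has 2 channels instead of the 512 filters mentioned in the text, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(features, weights, bias):
    """1x1 convolution: a linear map over channels applied at every position.

    features: H x W x C_in nested lists; weights: C_out x C_in; bias: C_out.
    """
    return [[[sum(w * v for w, v in zip(row, pixel)) + b
              for row, b in zip(weights, bias)]
             for pixel in line]
            for line in features]


def attention_output(features, weights, bias):
    """Return the soft mask A and the masked features A * X (assumed form)."""
    logits = conv1x1(features, weights, bias)
    mask = [[[sigmoid(v) for v in pixel] for pixel in line] for line in logits]
    masked = [[[a * x for a, x in zip(mask_px, feat_px)]
               for mask_px, feat_px in zip(mask_line, feat_line)]
              for mask_line, feat_line in zip(mask, features)]
    return mask, masked


# Toy 1x2 feature map with 2 channels; zero weights give a uniform mask of 0.5.
features = [[[1.0, 2.0], [0.0, -1.0]]]
weights, bias = [[0, 0], [0, 0]], [0, 0]
mask, masked = attention_output(features, weights, bias)
```

Because the sigmoid and the product are both differentiable, gradients flow through the mask, which matches the remark above that the loss can be back-propagated through the attention model.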

      3.3 Feature fusion and classification

      To fuse the features from the attention network and from latent semantic analysis and obtain better performance than either module alone, we join the two kinds of features. Here we introduce how we obtain the new features and calculate the final result from them. The features from the attention-aware network, $f_{att}$, and from latent semantic analysis, $f_{latent}$, are multi-modal features. After concatenating $f_{att}$ and $f_{latent}$ along the channel dimension, we get the final features $f_{fin}$ and employ a convolutional layer with kernel size 1×1 to get higher-level features $f_{mix}$ from the two kinds of features. After feature extraction, we use the features $f_{mix}$ to calculate the similarity of individual subjects using the Euclidean distance.

      where $d(P_i, G_i)$ is the distance between the images from the gallery and the probe, and $N$ is the size of the feature vectors. The smaller the value of $d$, the higher the probability that the given pair matches; the probe is assigned to the subject with the highest similarity in the gallery.
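The matching rule described here (Euclidean distance between probe and gallery feature vectors of size $N$, with the smallest distance taken as the match) can be sketched as follows. The feature vectors stand in for the fused features; the subject identifiers, the toy values and the names `euclidean` and `identify` are hypothetical.

```python
import math


def euclidean(p, g):
    """Euclidean distance between two feature vectors of equal size N."""
    if len(p) != len(g):
        raise ValueError("feature vectors must have the same size")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, g)))


def identify(probe, gallery):
    """Return the gallery subject whose features are closest to the probe.

    gallery: dict mapping subject id -> feature vector.
    """
    return min(gallery, key=lambda subject: euclidean(probe, gallery[subject]))


# Toy gallery with two subjects and 2-D features (illustrative values only).
gallery = {"s01": [0.0, 0.0], "s02": [3.0, 4.0]}
probe = [2.5, 4.0]
match = identify(probe, gallery)
```

Rank-1 accuracy over a probe set is then the fraction of probes whose nearest gallery entry carries the correct subject identity.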

      Figure 4: Example images illustrating that different features have different corresponding attention masks in our network. As can be seen in the figure, the attention chunk highlights the limbs and head of the human body, which are robust to the changes in appearance caused by clothing variation

      Figure 5: Samples of the evaluation set. The image on the left (a), in the normal clothing type, is used as the gallery image; the images on the right (b)-(i) are the probe set with different clothing combinations

      4 Experiments

      4.1 Database description

      The proposed method is evaluated on OU-ISIR Treadmill dataset B [Makihara, Mannami and Tsuji (2012)]. OU-ISIR Treadmill dataset B is a large gait dataset for evaluating gait methods in the presence of variations in clothing. It includes 68 subjects with up to 32 types of clothing combinations. Tab. 1 shows the 15 different types of clothes used in constructing the dataset. Tab. 2 shows the clothing combinations based on the 15 different types of clothes. In the most common setup, the dataset is split into three parts: a training set, a probe set and a gallery set. The training set contains 446 samples of 20 subjects drawn from all types of clothing combination; the gallery set contains 48 sequences of 48 different subjects in the normal clothing type; and the probe set consists of the remaining clothing types of these 48 subjects, excluding the samples in the gallery set. The total number of sequences in the remaining clothing types of these 48 subjects is 856. However, this setup is not suitable for deep learning approaches: one reason is that the clothing types in the training set are not complete across all kinds of clothing types; the other is that 446 sequences are not enough training input for deep learning approaches. To capture discriminative features from the varying clothing types, all 32 clothing types and enough data are necessary for training. So, in our work, the whole dataset is divided into two parts: one is used to train the model and the other is used for evaluation, in a proportion of 80/20 respectively. The subjects in the two subsets do not overlap. Sequences in the normal clothing type from all subjects in the evaluation subset are used as the gallery set, and the probe set is composed of the rest of the evaluation data. Samples from the gallery and probe sets are illustrated in Fig. 5.
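The evaluation protocol described above (an 80/20 split with non-overlapping subjects, normal-clothing sequences of the evaluation subjects as gallery, the rest as probe) can be sketched as follows. The sketch assumes the 80/20 proportion is applied to subjects, which the text does not state explicitly; the sample layout, the `"normal"` label, the seed and the function name `split_by_subject` are hypothetical.

```python
import random


def split_by_subject(samples, train_fraction=0.8, normal_type="normal", seed=0):
    """Split (subject_id, clothing_type, sequence) tuples without subject overlap."""
    subjects = sorted({sample[0] for sample in samples})
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(subjects)
    n_train = round(train_fraction * len(subjects))
    train_ids = set(subjects[:n_train])

    train = [s for s in samples if s[0] in train_ids]
    evaluation = [s for s in samples if s[0] not in train_ids]
    # Gallery: normal clothing only; probe: every other clothing type.
    gallery = [s for s in evaluation if s[1] == normal_type]
    probe = [s for s in evaluation if s[1] != normal_type]
    return train, gallery, probe


# Toy data: 10 subjects, two clothing types each, sequences omitted (None).
samples = [(f"s{i:02d}", clothing, None)
           for i in range(10)
           for clothing in ("normal", "type_2")]
train, gallery, probe = split_by_subject(samples)
```

Splitting by subject rather than by sequence keeps every evaluation identity unseen during training, so the reported accuracy reflects generalization to new people rather than to new clothing on known people.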

      Table 1: List of clothes used in OU-ISIR Treadmill dataset B [Makihara, Mannami and Tsuji (2012)]

      Table 2: Different clothing combinations used in the OU-ISIR B dataset [Makihara, Mannami and Tsuji (2012)] (Abbreviation: Clothes type ID)

      4.2 Performance evaluation

      1) Performance analysis with clothing variation effects

      Figure 6: Performance of our method and state-of-the-art CNN-based methods on OU-ISIR Treadmill dataset B under the 32 different clothing combinations

      To demonstrate the effectiveness of our method, we conduct experiments on OU-ISIR Treadmill dataset B. The results for the two kinds of features extracted by the two modules and for the final features are illustrated in Fig. 6. From the results, we can observe that there are four levels of difficulty among the clothing combinations in OU-ISIR Treadmill dataset B. In experiments 1-4 (Exp. 1-4), the CNN-based method [Yeoh, Aguirre and Tanaka (2017)] is the base network of our proposed method. The attention module and the latent semantic analysis module each perform better than the CNN-based method for most clothing types. Moreover, our proposed method, which combines the two modules, outperforms each module on its own and also shows better results than the CNN-based method, especially for clothes types 4 (regular pants and half shirt) and M (baggy pants). This indicates that the two levels of features complement each other.

      2) Comparison with state-of-the-art methods

      In this experiment, we evaluate our method on the test set of the dataset and calculate the average accuracy. Tab. 3 summarizes the comparison between the hand-crafted methods [Shariful, Islam, Akter et al. (2014); Guan, Li and Hu (2012)], the CNN-based method [Yeoh, Aguirre and Tanaka (2017)] and our method. It shows that our method achieves better performance than the state-of-the-art methods.

      Table 3: List of clothes used in OU-ISIR Treadmill dataset B [Makihara, Mannami and Tsuji (2012)]

      5 Conclusion

      In this paper, we combine latent semantic analysis and the attention mechanism for clothing-invariant gait recognition to get robust and discriminative features end-to-end, and fuse them into a higher-level representation that improves the performance of gait recognition. The proposed method not only makes use of the advantages of the CNN-based method, which learns high-level features from raw input data, but also highlights the important regions of subjects. Local information is emphasized by the attention mechanism in our method. At the same time, latent semantic variables play an essential role in our method. More latent variables are not always better; here we chose 30 variables after comparing gait recognition performance. The results also show that our method outperforms the state-of-the-art methods.

      In future work, we will take additional sequential information into consideration. Although GEI is the most popular representation for gait, it loses spatial and sequential information to some extent. To make use of sequential information, the raw input can be a cycle of silhouettes or raw images, so a network for extracting sequential information is suitable for clothing-invariant gait recognition. An attention-based long short-term memory network (LSTM) [Greff, Srivastava, Koutnik et al. (2017)] is the next step of our future work.

      Acknowledgement: This work was supported in part by the Natural Science Foundation of China under Grant U1536203, in part by the National Key Research and Development Program of China (2016QY01W0200), and in part by the Major Scientific and Technological Project of Hubei Province (2018AAA068).
