
    Domain-Invariant Similarity Activation Map Contrastive Learning for Retrieval-Based Long-Term Visual Localization

Hanjiang Hu, Hesheng Wang, Zhe Liu, and Weidong Chen

    IEEE/CAA Journal of Automatica Sinica, 2022, Issue 2

Abstract—Visual localization is a crucial component in the applications of mobile robots and autonomous driving. Image retrieval is an efficient and effective technique in image-based localization methods. Due to the drastic variability of environmental conditions, e.g., illumination changes, retrieval-based visual localization is severely affected and becomes a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. Then, a novel gradient-weighted similarity activation mapping (Grad-SAM) loss is incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the contrastive learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models trained with and without the Grad-SAM loss. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMU-Seasons dataset. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban part of the CMU-Seasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines at medium or high precision, especially under challenging environments with illumination variance, vegetation, and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.

I. INTRODUCTION

VISUAL localization is an essential problem in visual perception for autonomous driving and mobile robots [1]–[3], and it is low-cost and efficient compared with localization methods based on the global positioning system (GPS) or light detection and ranging (LiDAR). Image retrieval, i.e., recognizing the most similar place in the database for each query image [4]–[6], is a convenient and effective technique for image-based localization. It serves place recognition for loop closure and provides an initial pose for finer 6-DoF camera pose regression [7], [8] in relocalization for simultaneous localization and mapping (SLAM).

However, the drastic perceptual changes caused by long-term variation of environmental conditions, e.g., changing seasons, illumination, and weather, cast serious challenges on image-based localization in long-term outdoor self-driving scenarios [9]. Traditional feature descriptors (SIFT, BRIEF, ORB, BRISK, etc.) can only be used for image matching between scenes without significant appearance changes because they rely on raw image pixels. With convolutional neural networks (CNNs) making remarkable progress in computer vision and autonomous driving [10], learning-based methods have gained significant attention owing to the robustness of deep features against changing environments for place recognition and retrieval [11]–[13].

Contrastive learning, also known as deep metric learning, is an important technique for image recognition tasks [14]–[16]; it aims to learn metrics and latent representations under which similar images lie closer together. Compared to face recognition, supervised learning for place recognition [13], [17] suffers from the difficulty of deciding which clip of images should be grouped into the same place within a sequence of continuous images. Moreover, supervised contrastive learning methods for outdoor place recognition [18], [19] need numerous paired samples for model training due to heterogeneously entangled scenes with multiple environmental conditions, which is costly and inefficient. Additionally, considering that feature maps with salient areas are used in the explanation of CNNs for classification tasks [20]–[22], retrieval-based localization could be addressed through such attentive or contextual information [23], [24]. However, these methods have no direct access to the similarity of the extracted features, so they are not appropriate for high-precision localization.

To address these issues, we first propose an unsupervised and implicitly content-disentangled representation learning approach, formulated through probabilistic modeling, to obtain domain-invariant features (DIF) based on multi-domain image translation with a feature consistency loss (FCL). For retrieval with high accuracy, a novel gradient-weighted similarity activation mapping (Grad-SAM) loss is introduced into the training framework, inspired by [20]–[22]. Furthermore, a novel unsupervised adaptive triplet loss is incorporated in the pipeline to promote the training of FCL or Grad-SAM, and a two-stage test pipeline is implemented in a coarse-to-fine manner for performance compensation and improvement. We further investigate the localization and place recognition performance of the proposed method by conducting extensive experiments on both the CMU-Seasons dataset and the RobotCar-Seasons dataset. Compared to state-of-the-art image-based baselines, our method presents competitive results at medium and high precision. In the real-site experiment, the proposed two-stage method is validated to be simultaneously time-efficient and effective. An example of image retrieval is shown in Fig. 1. Our contributions are summarized as follows:

1) A domain-invariant feature learning framework is proposed based on a multi-domain image-to-image translation architecture with feature consistency loss, and it is statistically formulated as a probabilistic model of image disentanglement.

    2) A new Grad-SAM loss is proposed inside the framework to leverage the localizing information of the feature map for high-accuracy retrieval.

    3) A novel adaptive triplet loss is introduced for self-supervised contrastive learning with FCL or Grad-SAM, yielding an effective two-stage retrieval pipeline from coarse to fine.

    4) The effectiveness of the proposed approach is validated on the CMU-Seasons and RobotCar-Seasons datasets for visual localization through extensive experiments. Our results are on par with state-of-the-art image retrieval-based localization baselines at medium and high precision. The time-efficiency and practical applicability of the approach are also shown through a real-site experiment.

The rest of this paper is organized as follows. Section II presents the related work on place recognition and representation learning for image retrieval. Section III presents the formulation of the domain-invariant feature learning model with FCL. Section IV introduces the adaptive triplet loss and the two-stage retrieval pipeline with the Grad-SAM loss. Section V shows the experimental results on visual localization benchmarks. Finally, in Section VI we draw our conclusions and present some suggestions for future work.

II. RELATED WORK

A. Place Recognition and Localization

Outdoor visual place recognition has been studied for many years for visual localization in autonomous driving and loop closure detection in SLAM, where the most similar images are retrieved from a key-frame database for query images. Traditional feature descriptors have been used in classical robotic applications [25], [26] and are aggregated for image retrieval and matching [27]–[30], which has successfully addressed most cases of loop closure detection in visual SLAM [31] without significant environmental changes. VLAD [32] is the most successful hand-crafted feature for place recognition and has been extended to different versions. NetVLAD [4] extracts deep features through a VLAD-like network architecture. DenseVLAD [6] presents impressive results by extracting multi-scale SIFT descriptors for aggregation under drastic perceptual variance. To reduce the false positive rate of single-feature-based methods, sequence-based place recognition [33], [34] has been proposed for real-time loop closure in SLAM.

Fig. 1. On the first row, column (a) shows a query image under the Overcast + Mixed Foliage condition and column (b) shows the retrieved image under the Sunny + No Foliage condition. On the second row, the gradient-weighted similarity activation maps are shown for the above images. The activation map visualizes the salient area of the image which contributes most to matching and retrieval across different environments.

Since convolutional neural networks (CNNs) have successfully addressed many tasks in computer vision [35], long-term visual place recognition and localization have developed significantly with the aid of CNNs [4], [13], [36]. Some solutions to the change of appearance are based on image translation [37]–[40], where images are transferred across different domains by generative adversarial networks (GANs) [41], [42]. Porav et al. [43] first translate query images to the database domain through CycleGAN [44] and retrieve target images through hand-crafted descriptors. ToDayGAN [45] similarly translates night images to day images and uses DenseVLAD for retrieval. Jenicek and Chum [36] propose to use a U-Net to obtain photometrically normalized images and find deep embeddings for retrieval. However, the generalization ability of translation-based methods is limited, because the accuracy of retrieval at the image level largely depends on the quality of the translated image, compared to retrieval with latent features.

Some other recent work follows the pipeline of learning robust deep representations through neural networks together with semantic [46], [47], geometric [48], [49], or context-aware information [23], [24], [50], [51], etc. Although these models can perform image retrieval at the feature level, the representation features are trained with the aid of auxiliary information, which is costly to obtain in most cases. With the least human effort for auxiliary perception information, and inspired by the class activation map [20]–[22] in the visual explanation of CNNs, we introduce the notion of activation maps into representation learning for fine place recognition; its necessity and advantage lie in implementing retrieval in the latent feature space with self-supervised attentive information, without any human effort or laborious annotation.

B. Disentangled Representation

Latent representations reveal the feature vectors in the latent space which determine the distribution of samples. Therefore, it is essential to find latent disentangled representations to analyze the attributes of the data distribution. A similar application is the latent factor model (LFM) in recommender systems [52]–[54], where the latent factor contributes to the preference of specific users. In the field of style transfer or image translation [37], [55], deep representations of images are modeled according to the variations of data which depend on different factors across domains [56], [57], e.g., disentangled content and style representations. Supervised approaches [58], [59] learn class-specific representations through labeled data, and many works have appeared that learn disentangled representations in unsupervised manners [60], [61]. Recently, fully- and partially-shared representations of the latent space have been investigated for unsupervised image-to-image translation [39], [40]. Inspired by these methods, where the content code is shared across all the domains while the style code is domain-specific, our domain-invariant representation learning is probabilistically formulated and modeled as an extended and modified version of CycleGAN [44] or ComboGAN [38].

For the application of representation learning to place recognition under changing environments, where each environmental condition corresponds to one domain style and the images share similar scene content across different environments, it is appropriate to make the disentangled-representation assumption for this problem. Recent works on condition-invariant deep representation learning [5], [62]–[64] in long-term changing environments mainly rely on variance removal or other auxiliary information introduced in Section II-A. Reference [17] removes the dimensions related to the changing condition through PCA for the deep embeddings in the latent space of a classification model. Reference [12] separates the condition-invariant representation from VLAD features with GANs across multiple domains. Reference [65] filters the distracting feature maps in the shallow CNN layers but matches with deep features in deeper layers to improve condition- and viewpoint-invariance [66] using image pairs. Compared to these two-stage or supervised methods, we adopt domain-invariant feature learning methods [63], [64], which possess the advantages of direct, low-cost, and efficient learning.

C. Contrastive Learning

Contrastive learning, a.k.a. deep metric learning [14], [67], stems from distance metric learning [68], [69] in machine learning but extracts deep features through deep neural networks, i.e., it learns appropriate embeddings and metrics for effective discrimination between similar sample pairs and dissimilar sample pairs. With the help of neural networks, deep metric learning typically utilizes siamese networks [70], [71] or triplet networks [72], [73], which make the embeddings of the same category closer than those of different categories given labeled triplet input samples, for face recognition, person re-identification, etc.

Coming to long-term place recognition and visual localization, many works have recently used supervised learning together with siamese networks and the triplet loss [18], [62]. To avoid the vanishing gradient caused by small distances of negative pairs in the triplet loss form of [14], reference [15] proposes another form of triplet loss. Because supervised learning requires hard-annotated data, Radenović et al. [19] propose to leverage the geometry of 3D models from structure-from-motion (SfM) for triplet learning in an automated manner. But SfM is offline and costly, so it is not suitable for end-to-end training. Instead, we employ an unsupervised triplet training technique adapted to the DIFL framework [63], so that domain-invariant and scene-specific representations can be trained in an unsupervised and end-to-end way efficiently.

III. FORMULATION OF DOMAIN-INVARIANT FEATURE LEARNING

A. Problem Assumptions

Our approach to long-term visual place localization and recognition is modeled in the setting of multi-domain unsupervised image-to-image translation, where all query and database images are captured from multiple identical sequences across environments. Images in different environmental conditions belong to the corresponding domains. Let the total number of domains be denoted as N, and let two different domains be randomly sampled from {1,...,N} for each translation iteration, e.g., i, j ∈ {1,...,N}, i ≠ j. Let x_i ∈ X_i and x_j ∈ X_j represent images from these two domains. For the multi-domain image-to-image translation task [38], the goal is to find all conditional distributions p(x_i|x_j), ∀ i ≠ j, i, j ∈ {1,...,N}, with known marginal distributions p(x_i), p(x_j) and translated conditional distributions p(x_{j→i}|x_j), p(x_{i→j}|x_i). Since different domains correspond to different environmental conditions, we suppose the conditional distribution p(x_i|x_j) is unimodal and deterministic, in contrast to the multimodal distribution across only two domains in [40]. As N increases to infinity and becomes continuous, the multi-domain translation model covers more domains and can be regarded as a generalized multi-modal translation with limited domains.

Fig. 2. The architecture overview for model training from domain i to j and the general image retrieval pipeline at test time. The involved losses include the GAN loss, cycle loss, feature loss, SAM loss, and triplet loss. Note that the SAM loss is only used for the fine model, and the encoder, decoder, and discriminator are specific to each domain.

Like the shared-latent-space assumption in recent unsupervised image-to-image translation methods [39], [40], the content representation c is shared across different domains while the style latent variable s_i belongs to each specific domain. An image of one domain, x_i ∈ X_i, is generated from the prior distributions of content and style, x_i = G_i(s_i, c), and the content and style are independent of each other. Since the conditional distribution p(x_i|x_j) is deterministic, the style variable is only embodied in the generator of the specific domain, i.e., the domain-specific generator G_i absorbs s_i and maps the shared content code c to x_i. Under such assumptions, our method could be regarded as implicitly partially shared, although only the content latent code is explicitly shared across multiple domains with corresponding generators. Following the previous work [40], we further assume that the domain-specific decoder functions G_i for the shared content code are deterministic and that their inverse encoder functions E_i = G_i^{-1} exist. Our goal of domain-invariant representation learning is to find the underlying decoders G_i and encoders E_i for all the environmental domains through neural networks, so that the domain-invariant latent code c can be extracted for any given image sample x_i through c = E_i(x_i). The overall architecture based on these assumptions is shown in Fig. 2, and its details are introduced in Sections III and IV.

B. Model Architecture

We adopt the multi-domain image-to-image translation architecture [38], which is an expansion of CycleGAN [44] from two domains to multiple domains. The generator networks in the framework are decoupled into domain-specific pairs of encoders E_i and decoders G_i for any domain i. The encoder is the first half of the generator while the decoder is the second half for each domain. For image translation across multiple domains, the encoders and decoders can be combined at random like building blocks. The discriminators D_i are also domain-specific for domain i and are optimized in adversarial training as well. The detailed architectures of the encoder, decoder, and discriminator for each domain are the same as in ComboGAN [38]. Note that [63] applies the ComboGAN architecture to image retrieval with the feature consistency loss, resulting in an effective self-supervised retrieval-based localization method. However, in this section we further formulate the architecture in a probabilistic framework, combining multi-domain image translation and domain-invariant representation learning.

For images of similar sequences under different environments, first suppose domains i, j are selected randomly and the corresponding images are denoted as x_i, x_j. The basic DIFL framework is shown in Fig. 3, including the GAN loss, cycle consistency loss, and feature consistency loss. For the image translation pass from domain i to domain j, the latent feature is first encoded by encoder E_i and then decoded by decoder G_j. The translated image goes back through encoder E_j and decoder G_i to compute the cycle consistency loss (1) [44]. Also, the translated image goes through the discriminator D_j to compute the adversarial loss (2) [41]. The pass from domain j to domain i is similar.

The adversarial loss (2) makes the translated image x_{i→j} indistinguishable from the real image x_j and brings the distribution of translated images close to the distribution of real images.

The cycle consistency loss (1) originates from CycleGAN [44], which has been proved to infer a deterministic translation [40] and is suitable for representation learning through image translation among multiple domains. For the pure multi-domain image translation task, i.e., ComboGAN [38], the total ComboGAN loss (3) contains only the adversarial loss and the cycle consistency loss.
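To make the translation pass concrete, the following minimal sketch (not the authors' code) computes the cycle-consistency and adversarial losses for one direction i → j. It assumes per-domain encoders E[i], decoders G[i], and discriminators D[j] are ordinary PyTorch modules, and the LSGAN-style generator objective is an assumption.

    import torch
    import torch.nn.functional as F

    def combogan_losses(x_i, E, G, D, i, j):
        z_i = E[i](x_i)                      # latent content code of the real image
        x_i2j = G[j](z_i)                    # image translated from domain i to j
        z_i2j = E[j](x_i2j)
        x_i2j2i = G[i](z_i2j)                # translate back to domain i

        cycle_loss = F.l1_loss(x_i2j2i, x_i)             # cycle consistency loss (1)
        adv_loss = torch.mean((D[j](x_i2j) - 1.0) ** 2)  # adversarial (GAN) loss (2)
        return cycle_loss, adv_loss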

Fig. 3. Network architecture for image translation from domain i to j. Constrained by the GAN loss, cycle loss, and feature loss, the latent feature code is the domain-invariant representation. The discriminator D_j yields the GAN loss through adversarial training, given the real image of domain j and the translated image from domain i to j.

Since every domain owns a set of encoder, decoder, and discriminator, the total architecture is complicated; it can be modeled as a probabilistic graph if all the encoders and decoders are regarded as conditional probability distributions. Supposing that the optimality of the ComboGAN loss (3) is reached, the complex forward propagation during training can be simplified and the representation embedding can be analyzed.

Without loss of generality, images x_{i,m}, x_{j,n} are selected from image sequences x_i, x_j, i ≠ j, where m, n denote the places in the shared image sequences and are only related to the content of the images. According to the assumptions in Section III-A, m, n determine the shared domain-invariant content latent code c across different domains. For the translation from image x_{i,m} to domain j, we have

The latent code z_{i,m} implies the relationship of domain i and the content of image m from (4). Due to the adversarial loss (2), the translated image x_{i→j,m} has the same distribution as image x_{j,n}, i.e., x_{i→j,m}, x_{j,n} ~ p(x_j). For the reconstructed image from (7), the cycle consistency loss (1) constrains it to match the original image x_{i,m}.

From (4) and (5), we have

    which indicates that x_{i→j,m} and i are independent if the optimality of the adversarial loss (2) is reached, and z_{i→j,m} and i are also independent from (6). Similarly, z_{i,m} and j are independent for any j ≠ i. Combining (5), (6) and (4), (7), we can find the relationship between z_{i,m} and z_{i→j,m} and the weak form of the inverse constraint on encoders and decoders below:

When the optimality of the original ComboGAN loss (3) is reached, for any i ≠ j, the latent codes z_{i,m} and z_{i→j,m} are not related to j and i, respectively, which is consistent with the proposition that the cycle consistency loss cannot infer shared-latent learning [39]. Consequently, the representation embeddings are not domain-invariant and not appropriate for image retrieval, and the underlying inverse encoders and decoders have not been found through the vanilla ComboGAN image translation model.

C. Feature Consistency Loss

To obtain the shared latent feature across different domains, unlike [39], we use an additional loss exerted on the latent space, called the feature consistency loss, as proposed in [63]. Under the above assumptions, for an image x_i from domain i it is formulated as

As a result, the domain-invariant feature [63] can be extracted by combining all the weighted losses together:
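A minimal sketch (not the authors' implementation) of how the feature consistency loss of (12) can be computed for one direction i → j under the assumptions above; the per-domain encoder/decoder modules and the L2 metric are illustrative choices.

    import torch.nn.functional as F

    def feature_consistency_loss(x_i, E, G, i, j):
        z_i = E[i](x_i)            # content code of the real image in domain i
        x_i2j = G[j](z_i)          # image translated to domain j
        z_i2j = E[j](x_i2j)        # content code of the translated image
        return F.mse_loss(z_i2j, z_i)   # pull the two latent codes together (Eq. (12))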

Here we give the theoretical analysis for the FCL. Supposing the optimality of the DIF loss (13) is reached, (4)–(11) are still satisfied. Additionally, because of the feature consistency loss (12), based on (4), (6), (10), we have

Since z_{i→j,m} and i are independent (as discussed in the previous section), z_{i,m} and i are independent for any domain i from (14), which indicates that the latent feature is well shared across multiple domains and represents the content latent code for any image from any domain. Furthermore, the trained encoders and decoders are inverse to each other, and the goal of finding the underlying encoders E_i and decoders G_i stated in Section III-A is reached. It is therefore appropriate to use the content latent code as the image representation across different environmental conditions.

IV. COARSE-TO-FINE RETRIEVAL-BASED LOCALIZATION

A. Gradient-Weighted Similarity Activation Mapping Loss

The original domain-invariant feature (13) cannot exploit the contextual or localizing information of the content latent feature map; as a result, the performance of place recognition at high accuracy is limited. To this end, we propose a novel gradient-weighted similarity activation mapping loss for the shared latent feature to fully discover the weighted similar areas for high-accuracy retrieval.

Inspired by CAM [20], Grad-CAM [21], and Grad-CAM++ [22] in the visual explanation of convolutional neural networks for classification, we assume that the place recognition task can be regarded as an extension of multi-class image classification with infinitely many target classes, where each database image represents a single target class for each query image during the retrieval process. Then, for each query image, the similarity to each database image is treated as the score before the softmax, or the class probability, in a multi-classification task, and the one with the largest similarity is the retrieved result, which is analogous to the classification result with the largest probability.

Ideally, suppose the identical content latent feature maps from domains i, j, i.e., z_{i,m}, z_{j,m}, have the shape n × h × w, where the identical content index m is omitted for brevity. First, the mean value of the cosine similarity over the height and width dimensions is calculated as follows:

Y is the similarity score between z_i and z_j. Following the definition of Grad-CAM [21], we obtain the similarity activation weight and map:

Equations (17) and (18) are the mathematical formulation of the proposed Grad-SAM, where the activation map is aggregated from the gradient-weighted feature maps, retaining the localizing information of the deep feature map. In order to feed only the positively activated areas into training, we apply a ReLU function to obtain the final activation map L_{i,j} or L_{j,i}.

In particular, as shown in Fig. 4, inside the unsupervised DIFL architecture the content latent codes z_{i,m}, z_{j,n} are drawn from the same distribution but z_{i,m} ≠ z_{j,n} for the unpaired case m ≠ n. The similarity activation maps L_{i,m}, L_{i→j,m} can be visualized by resizing them to the original image size, as in Fig. 4. According to the FCL loss (12), z_{i,m} and z_{i→j,m} tend to be identical, which means that computing the similarity between them, and hence the SAM loss, is meaningful. Therefore, the self-supervised Grad-SAM loss for domain i can be formulated as follows, based on (16)–(18):

where z_{i,m} and z_{i→j,m} are substituted for z_i and z_j in (16)–(18), and L_{i,m} and L_{i→j,m} are short for L_{i,j,m} and L_{i→j,i,m} derived from (17) and (18).
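The following sketch illustrates, under the assumptions above rather than as the authors' implementation, the similarity score of (16), the gradient-weighted activation map of (17) and (18), and a SAM loss between the real and translated feature maps; the feature maps are assumed to have shape (n, h, w) and to be part of the autograd graph, and the L1 distance used for the loss is an assumption.

    import torch
    import torch.nn.functional as F

    def similarity_score(z_a, z_b):
        # Eq. (16): mean cosine similarity over the height and width dimensions
        return F.cosine_similarity(z_a, z_b, dim=0).mean()

    def grad_sam(z_a, z_b):
        # z_a, z_b: content feature maps of shape (n, h, w) that require gradients
        score = similarity_score(z_a, z_b)
        grads = torch.autograd.grad(score, z_a, create_graph=True)[0]   # dY/dA
        weights = grads.mean(dim=(1, 2), keepdim=True)                  # Eq. (17): channel weights
        return F.relu((weights * z_a).sum(dim=0))                       # Eq. (18): activation map

    def grad_sam_loss(z_i, z_i2j):
        # Self-supervised SAM loss between the maps of the real and translated codes
        return F.l1_loss(grad_sam(z_i, z_i2j), grad_sam(z_i2j, z_i))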

Fig. 4. The illustration of one branch of the SAM loss from domain i to j. The real image in domain i is first translated to a fake image in domain j, and the gradient of the similarity with respect to each feature map can be calculated, denoted by the red dashed lines. Then the activation map is the sum of the feature maps weighted by the gradients, shown as the color-gradient line from red to black, and the SAM loss can be calculated in a self-supervised manner. Note that the notations L_{i,m} and L_{i→j,m} here are short for L_{i,j,m} and L_{i→j,i,m} derived from (17) and (18).

B. Adaptive Triplet Loss

Although domain-invariant feature learning is achieved through the feature consistency loss (12), and the Grad-SAM loss (19) enables finer retrieval with salient localizing information on the latent feature map, it is difficult to distinguish different latent content codes using domain-invariant features without explicit metric learning. As the distance between latent features with the same content decreases due to the feature consistency loss (12) and Grad-SAM loss (19), the distance between latent features of different contents may be forced to shrink as well, resulting in mismatched retrievals for test images in long-term visual localization.

To this end, we propose a novel adaptive triplet loss based on the feature consistency loss (12) and Grad-SAM loss (19) to improve the contrastive learning of the latent representation inside the self-supervised DIFL framework. Suppose unpaired images x_{i,m}, x_{j,n} are selected from domains i, j, i ≠ j, where m, n denote the contents of the images. Note that, to keep the training pipeline unsupervised, one of the selected images is horizontally flipped while the other is not, so that m ≠ n is assured for the negative pair. The choice of which input image to flip is random, and the flipping also functions as data augmentation, since the flipped images follow the distribution of the original images. Details can be found in Section V-A. For the self-supervised contrastive learning, the positively paired samples are not given but generated from the framework via (4)–(6) and (16)–(18), i.e., z_{i,m}, z_{i→j,m} and L_{i,m}, L_{i→j,m}. For the negatively paired samples, since images under the same environmental condition tend to be closer than those under different conditions, the stricter constraint is implemented for negative pairs consisting of the translated image and the other real image, which are under the same environment but at different places, i.e., z_{i→j,m}, z_{j,n} and L_{i→j,m}, L_{j,n}.

Moreover, in order to improve the efficiency of the triplet loss for representation learning during the late iterations, the negative pair with the least distance between the original and the translated one is automatically selected as the hard negative pair from a group of random negative candidates Z_{j,n} or L_{j,n}, as shown in (20) and (21). The adaptive triplet loss is then calculated with these hard negative pairs, without any supervision or extra prior information.

We adopt the basic form of triplet loss from [15], but the margin depends on the feature consistency loss (12) or Grad-SAM loss (19), adapting to the representation learning of (12) or (19). The adaptive triplet losses for the FCL and SAM are illustrated in Figs. 5 and 6. The adaptive triplet losses for the FCL and Grad-SAM for domain i are shown below:

where the hyperparameters m_f, m_s are the margins, i.e., the values by which the distance of negative pairs should exceed the distance of the self-generated positive pairs when the image translation is well trained, i.e., p(x_{i→j,m}) = p(x_{j,m}). However, a constant margin interferes with joint model training with the FCL or Grad-SAM loss, so we propose a self-adaptive term, which is the exponential function of the negative FCL loss or Grad-SAM loss weighted by α_f or α_s.

With the adaptive triplet loss (22) or (23), at the beginning of model training the exponential adaptive term is close to 0, so the triplet loss term does not affect the FCL (12) or Grad-SAM (19). But as training proceeds, the triplet loss gradually dominates the model training since the exponential adaptive term grows closer to 1.
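A minimal sketch of one branch of the adaptive triplet loss, assuming a standard hinge-style triplet form; the exact functional form of [15] and the way the exponential adaptive term enters the margin are simplified interpretations of (20)–(23), and all names are illustrative.

    import torch

    def adaptive_triplet_loss(z_anchor, z_pos, z_neg_candidates, base_loss,
                              margin=5.0, alpha=2.0):
        # Positive pair: translated code and the original code of the same content.
        d_pos = torch.norm(z_anchor - z_pos)
        # Hard negative mining (Eqs. (20), (21)): keep the closest candidate from
        # the same environment but a different place.
        d_neg = torch.stack([torch.norm(z_anchor - z_n)
                             for z_n in z_neg_candidates]).min()
        # Adaptive margin: exp(-alpha * loss) is near 0 early in training and
        # approaches 1 as the FCL / Grad-SAM loss vanishes (Eqs. (22), (23)).
        adaptive_margin = margin * torch.exp(-alpha * base_loss.detach())
        return torch.relu(d_pos - d_neg + adaptive_margin)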

C. Coarse-to-Fine Image Retrieval Pipeline

Different applications have different requirements for coarse- or high-precision localization, e.g., loop closure and relocalization in SLAM and 3D reconstruction. As shown in Section III-C, the feature consistency loss, together with the cycle consistency loss and GAN loss in image-to-image translation, yields the domain-invariant representation learning architecture, in which the latent feature is independent of the multiple environmental domains so that it can be used for image representation and retrieval across different environments. The Grad-SAM loss in Section IV-A is incorporated into this basic architecture for the purpose of learning salient areas and attentive information from the original latent feature, which is important for high-precision retrieval. The adaptive triplet loss in Section IV-B balances the self-supervised representation learning and the feature consistency loss, which improves the retrieval results, as shown in the ablation studies in Section V-D.

Fig. 5. The illustration of the one-branch adaptive triplet loss for the FCL from domain i to j. The inputs of the loss are the encoded latent features of the real images in domains i, j and of the translated image i → j, resulting in the negative pairs marked with the red dashed box and the positive pairs marked with the green dashed box. Note that the positive pairs only differ in the environment, while the place is the only difference for the negative pairs.

Fig. 6. The illustration of the one-branch adaptive triplet loss for Grad-SAM from domain i to j. The inputs of the loss are the similarity activation maps of the real images in domains i, j and of the translated image i → j. The negative pairs are bounded with the red dashed box while the positive pairs are bounded with the green dashed box. Note that the activation maps in domain i from the two pairs are slightly different.

For image retrieval, we adopt the coarse-to-fine strategy to fully leverage models trained with different settings for different specific purposes. The DIFL model with the FCL (12) and triplet loss (22) aims to find the database retrieval for the query image using general domain-invariant features and yields better localization performance within larger error thresholds, as shown in Section V-D; this provides a good initial range of retrieved candidates and can be used as the coarse retrieval.

TABLE I ABLATION STUDY ON DIFFERENT STRATEGIES AND LOSS TERMS

The total loss for training the coarse-retrieval model is shown below:

where λ_cyc, λ_FCL, and λ_Triplet_FCL are the hyperparameters weighting the different loss terms.

Furthermore, to obtain finer retrieval results, we incorporate the Grad-SAM loss (19) with its triplet loss (23) into the coarse-retrieval model, which fully exploits the localizing information of the feature map and promotes high-accuracy retrieval across different conditions, as shown in Table I. However, according to Section V-D, the low-precision localization accuracy of the fine-retrieval model is lower than that of the coarse-retrieval model, which shows why the initial coarse retrieval is necessary. The total loss for training the finer model is shown below:

where λ_cyc, λ_FCL, λ_SAM, λ_Triplet_SAM, and λ_Triplet_FCL are the hyperparameters for each loss term.
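As a summary, the two weighted objectives of (24) and (25) can be sketched as below. This is only an illustration: the GAN and cycle terms of (3) are assumed to be included, and the constant weights are the maximum values reported later in Section V-B, whereas in training they are increased linearly from 0.

    def coarse_total_loss(l_gan, l_cyc, l_fcl, l_trip_fcl):
        # Eq. (24): coarse-retrieval model
        return l_gan + 10.0 * l_cyc + 0.1 * l_fcl + 1.0 * l_trip_fcl

    def fine_total_loss(l_gan, l_cyc, l_fcl, l_sam, l_trip_sam, l_trip_fcl):
        # Eq. (25): fine-retrieval model, adding the SAM-related terms
        return (l_gan + 10.0 * l_cyc + 0.1 * l_fcl
                + 1000.0 * l_sam + 1.0 * l_trip_sam + 1.0 * l_trip_fcl)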

Once the coarse and fine models are trained, the test pipeline consists of coarse retrieval followed by finer retrieval. The 6-DoF poses of the database images are given, and the goal is to find the poses of the query images. We first pre-encode each database image under the reference environment into feature maps through the coarse model offline, forming the database of coarse features. At test time, for every query image, we extract the feature map using the coarse encoder of the corresponding domain and retrieve the top-N most similar entries from the pre-encoded coarse features in the database. The N candidates are then encoded through the fine model to obtain their secondary feature maps, and the query image is also encoded through the fine model to obtain the query feature. The most similar one among the N candidates is retrieved as the final result for localization. Although the coarse-to-fine strategy may not obtain the globally most similar retrieval in some cases, it increases the accuracy within the coarse error threshold (Section V-D) compared to using only the single fine model, which is beneficial to pose regression for relocalization. It may also benefit from the filtered coarse candidates in some cases, as in Table I, improving the medium-precision results. The 6-DoF pose of the query image is taken to be that of the finally retrieved database image.
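The two-stage test pipeline can be sketched as follows. This is a minimal illustration under the assumptions above, not the released implementation; all names (coarse_encoder, fine_encoder, db_imgs, db_coarse_feats) are illustrative, and the coarse metric of (16) and the flattened cosine similarity for the fine stage follow Section V-B.

    import torch
    import torch.nn.functional as F

    def coarse_to_fine_retrieve(query_img, db_imgs, db_coarse_feats,
                                coarse_encoder, fine_encoder, top_n=3):
        q_coarse = coarse_encoder(query_img)                          # (n, h, w)
        # Coarse stage: mean cosine similarity over the spatial dimensions (Eq. (16))
        coarse_scores = torch.stack([
            F.cosine_similarity(q_coarse, f, dim=0).mean() for f in db_coarse_feats])
        candidates = torch.topk(coarse_scores, top_n).indices.tolist()

        # Fine stage: cosine similarity of the flattened secondary features
        q_fine = fine_encoder(query_img).flatten()
        fine_scores = torch.stack([
            F.cosine_similarity(q_fine, fine_encoder(db_imgs[i]).flatten(), dim=0)
            for i in candidates])
        return candidates[fine_scores.argmax().item()]                # final retrieval index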

V. EXPERIMENTAL RESULTS

We conduct a series of experiments on the CMU-Seasons dataset and validate the effectiveness of the coarse-to-fine pipeline with the proposed FCL loss, Grad-SAM loss, and adaptive triplet loss. With the model trained only on the urban part of the CMU-Seasons dataset in an unsupervised manner, we compare our results with several image-based localization baselines on the unseen suburban and park parts of the CMU-Seasons dataset and on the RobotCar-Seasons dataset, showing an advantage in scenes with massive vegetation and robustness to large illumination changes. To prove the practical validity and applicability for mobile robotics, we have also conducted real-site field experiments under different environments with more viewing angles, using a mobile robot equipped with a camera and RTK-GPS. We run these experiments on two NVIDIA 2080Ti cards with 64 GB RAM under Ubuntu 18.04. Our source code and pre-trained models are available at https://github.com/HanjiangHu/DISAM.

A. Experimental Setup

The first series of experiments is conducted on the CMU-Seasons dataset [9], which is derived from the CMU Visual Localization dataset [75]. It was recorded by a vehicle with left-side and right-side cameras over a year, along a route roughly 9 kilometers long in Pittsburgh, U.S. The environmental changes of seasons, illumination, and especially foliage make this dataset very challenging. Reference [9] benchmarks the dataset and provides the ground-truth camera poses only for the reference database images, adding new condition categories and area divisions of the original dataset as well. There are 31 250 images in 7 slices for the urban area, 13 736 images in 3 slices for the suburban area, and 30 349 images in 7 slices for the park area. Each area has one reference condition and eleven query environmental conditions. The condition of the database is Sunny + No Foliage, and the conditions of the query images can be any weather intersected with a vegetation condition, e.g., Overcast + Mixed Foliage. Since the training images contain both left-side and right-side views, the horizontal flipping operation is reasonable and acceptable for the unsupervised generation of negative pairs and for data augmentation, as introduced in Section IV-B.

The second series of experiments is conducted on the RobotCar-Seasons dataset [9], derived from the Oxford RobotCar dataset [76]. The images were captured with three Point Grey Grasshopper2 cameras mounted on the left, rear, and right of the vehicle, along a 10 km route under changing weather, season, and illumination across a year in Oxford, U.K. It contains 6954 triplets of database images under the overcast condition, 3100 triplets of day-time query images under 7 conditions, and 878 triplets of night-time images under 2 conditions. In this experiment we only test the rear images with the model pre-trained on the urban part of the CMU-Seasons dataset to validate the generalization ability of our approach. Considering that not all conditions of the RobotCar dataset have exactly corresponding conditions in CMU-Seasons, we choose the pre-trained models under the conditions with the most similar descriptions and dates from the CMU-Seasons dataset for all the conditions in the RobotCar dataset, as listed in Table II. Note that for the conditions which are not included in CMU-Seasons, we use the pre-trained model under the reference condition, Overcast + Mixed Foliage, for the sake of fairness.

TABLE II CONDITION CORRESPONDENCE FOR THE ROBOTCAR DATASET

The images are scaled to 286×286 and randomly cropped to 256×256 during training, but directly scaled to 256×256 during testing, leading to a feature map of shape 256×64×64. We follow the evaluation protocol introduced in [9], i.e., the percentage of correctly localized query images. Since we focus on high and medium precision, the pose error thresholds are (0.25 m, 2°) and (0.5 m, 5°), while the coarse-precision (low-precision) threshold (5 m, 10°) is omitted except in the ablation study. We compare against several image-based localization methods, FAB-MAP [74], DIFL-FCL [63], NetVLAD [4], and DenseVLAD [6], which are the best-performing image-based localization baselines.
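For clarity, the correctness check implied by the (0.25 m, 2°) and (0.5 m, 5°) thresholds can be written as below; this is a sketch assuming 4×4 homogeneous pose matrices, not code from the benchmark.

    import numpy as np

    def pose_correct(T_est, T_gt, max_trans_m, max_rot_deg):
        dt = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])                 # translation error (m)
        R_err = T_est[:3, :3].T @ T_gt[:3, :3]
        cos_angle = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
        d_deg = np.degrees(np.arccos(cos_angle))                        # rotation error (deg)
        return dt <= max_trans_m and d_deg <= max_rot_deg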

B. Evaluation on the CMU-Seasons Dataset

Following the transfer learning strategy for DIFL in [63], we fine-tune the pre-trained models of [63] at epoch 300, which were trained only with the cycle consistency loss and GAN loss on all images of the CMU-Seasons dataset for the pure image translation task. Then, for the representation learning task, the model is fine-tuned with images from the urban area in an unsupervised manner, without paired images across conditions. After adding the other loss terms in (24) or (25), we continue training until epoch 600, with a learning rate linearly decreasing from 0.0002 to 0. The model is then trained in the same manner until epoch 1200, split into stages of 300 epochs. In order to speed up and stabilize the training process with the triplet loss, we use random negative pairs from epoch 300 to epoch 600 for fundamental representation learning and adopt hard negative pairs from epoch 600 onward, as described in Section IV-B. We choose the hard negative pair from 10 negative sample pairs in each iteration.

For training the coarse-retrieval model, the weight hyperparameters are set to maxima of λ_cyc = 10, λ_FCL = 0.1, and λ_Triplet_FCL = 1, all increasing linearly from 0 as training proceeds to balance the multi-task framework. Similarly, for training the fine-retrieval model, we set λ_cyc = 10, λ_FCL = 0.1, λ_SAM = 1000, λ_Triplet_SAM = 1, and λ_Triplet_FCL = 1 with a similar training strategy. The fine model uses both the L2 and cosine similarity metrics for the FCL terms, while only the L2 metric is used for the FCL terms in the coarse model. For the adaptive triplet loss, we set m_f = 5, α_f = 2 in the triplet FCL loss (22) and m_s = 0.1, α_s = 1000 in the triplet SAM loss (23). During the two-stage retrieval, the number of coarse candidates top-N is set to 3, which makes it both efficient and effective. In the two-stage retrieval pipeline, we use the mean value of the cosine similarity over the height and width dimensions as the metric for the coarse retrieval, as shown in (16). For the fine retrieval, we use the standard cosine similarity of the flattened secondary features owing to the salient information in the feature map.

Our final results are compared with the baselines in Table III, which shows that ours outperform the baseline methods for high- and medium-precision localization, (0.25 m, 2°) and (0.5 m, 5°), in the park and suburban areas; this demonstrates strong generalization ability because the model is only trained on the urban area. The medium-precision localization in the urban area is affected by numerous dynamic objects. We further compare the performance on the different foliage categories from [9], Foliage and Mixed Foliage, with the reference database under No Foliage, which is the most challenging problem on this dataset. The results are shown in Table IV, from which we can see that our results are better than the baselines under different foliage conditions for localization with medium and high precision. To investigate the performance under different weather conditions, we compare the models with the baselines on the Overcast, Cloudy, and Low Sun conditions with the reference database under Sunny in Table V, which covers almost all the weather conditions. It can be seen that our results are the best at medium and high accuracy under most of the weather conditions. The Cloudy weather contains plenty of clouds in the sky, which introduces some noise into the activation map for fine retrieval with reference to the clear sky under Sunny; the clouds can be regarded as a kind of dynamic object.

TABLE III RESULTS COMPARISON TO BASELINES ON THE CMU-SEASONS DATASET

TABLE IV COMPARISON WITH BASELINES ON FOLIAGE CONDITIONS (REFERENCE IS NO FOLIAGE)

From the results across different areas, vegetation, and weather, it can be seen that the finer retrieval boosts the results of the coarse retrieval. Moreover, the coarse-to-fine retrieval strategy gives better performance than the fine-only method in some cases, showing the significance and effectiveness of the two-stage strategy for high- and medium-precision localization. A reasonable explanation for the good performance under different foliage and weather conditions is that the latent content code is robust and invariant to changing vegetation and illumination. All the results (including ours) are taken from the official benchmark website of long-term visual localization [9]. Some fine-retrieval results are shown in Fig. 7, where the activation maps provide the localizing information of the feature maps; the salient areas mostly lie around the edges or adjacent parts of different instance patches due to the gradient-based activation.

TABLE V COMPARISON WITH BASELINES ON WEATHER CONDITIONS (REFERENCE IS SUNNY)

C. Evaluation on the RobotCar Dataset

In order to further validate the generalization ability of our proposed method to unseen scenarios, we directly use the models pre-trained on the urban area of CMU-Seasons to test on the RobotCar dataset, using the corresponding CMU-Seasons condition for every query condition of RobotCar according to Table II. Since the database images greatly outnumber the query images under each condition, the two-stage strategy is skipped for practicality and efficiency, and only the coarse-only and fine-only models are tested. The metric for both coarse and fine retrieval is the mean value of the cosine similarity over the height and width dimensions, as shown in (16).

The comparison results are shown in Table VI, where we can see that our method outperforms the other baseline methods under the Night and Night-rain conditions. Note that the model we use for the night-time retrieval is the same as that for the database, because night-time images are not included in the training set; this shows the effectiveness of the representation learning in the latent space of the autoencoder-structured model. Since the images under the Night and Night-rain conditions have too little contextual or localizing information to yield correct similarity activation maps, the coarse model performs better than the finer model.

Our results under all the Day conditions are the best in terms of high-precision performance, showing strong generalization ability to unknown scenarios and environments by attaining satisfactory retrieval-based localization results. All the results (including ours) are also taken from the official benchmark website of long-term visual localization [9]. Some day-time results are shown in Fig. 8, covering all the environments which have similar counterparts among the models pre-trained on the CMU-Seasons dataset.

D. Ablation Study

Fig. 7. Results on the CMU-Seasons dataset. For each set of images in (a) to (e), the top left is the query image while the top right is the database image under the condition of Sunny + No Foliage. The query images of sets (a) to (e) are under the conditions of Low Sun + Mixed Foliage, Overcast + Mixed Foliage, Low Sun + Snow, Low Sun + Foliage, and Sunny + Foliage, respectively. The visualizations of the similarity activation maps are on the bottom row for all the query and database RGB images.

TABLE VI RESULTS COMPARISON TO BASELINES ON THE ROBOTCAR DATASET

For the further ablation study in Table I, we implement different strategies (Coarse-only, Fine-only, and Coarse-to-fine) and different loss terms (FCL, Triplet FCL, SAM, and Triplet SAM) during model training, and test them on the CMU-Seasons dataset. The only difference between Coarse-only and Fine-only is whether the model is trained with SAM or not, while the coarse-to-fine strategy follows the two-stage procedure in Section IV-C. It can be seen that the Coarse-only models perform best in low-precision localization, which makes them suitable for providing rough candidates for the subsequent finer retrieval. With the incorporation of the SAM-related losses, the medium- and high-precision accuracies increase while the low-precision one decreases. Coarse-to-fine combines the advantages of Coarse-only and Fine-only, improving the low-precision localization of the fine models as well as the medium- and high-precision localization of the coarse models simultaneously, which shows the effectiveness and significance of the two-stage strategy in overcoming both weaknesses. Furthermore, thanks to the high-quality candidates provided by the Coarse-only model, some medium-precision results of Coarse-to-fine on the last row are the best and the others are extremely close to the best, which shows the promising performance of the two-stage strategy.

From the first two rows of Coarse-only and Fine-only in Table I, the Flipped Negative and Hard Negative samples are shown to be necessary and beneficial to the final results, especially the flipping operation for data augmentation. On the third and fourth rows, DIFL with FCL performs better than vanilla ComboGAN (3), which indicates that the FCL helps extract the domain-invariant feature. Thanks to the effective self-supervised triplet loss with hard negative pairs, the performance with Triplet FCL or Triplet SAM is significantly improved compared with the results on the fourth or ninth row, respectively. To validate the effectiveness of the Adaptive Margin in the triplet loss, we compare the results of Constant Margin and Adaptive Margin, which show that the model with the adaptive margin gives better results than that with the constant margin for both Triplet FCL and Triplet SAM. The last row of the Fine-only strategy shows that the hybrid adaptive triplet losses of both FCL and SAM are beneficial to the fine retrieval. Note that the training and testing settings for Table I are internally consistent, but differ slightly from the experimental settings in [63] in several aspects, such as the training epochs, the retrieval metrics, and the choice of pre-trained models for testing. Also, the adaptive margin for the triplet loss is partially influenced by the hard negative samples: a smaller distance of the negative pairs means a relatively smaller margin to the positive pairs, which reduces the positive distance, and the adaptive margin consequently increases.

Fig. 8. Results on the RobotCar dataset. For each set of images in (a) to (e), the top left is the day-time query image while the top right is the database image under the Overcast condition. The query images of sets (a) to (e) are under the conditions of Dawn, Overcast-summer, Overcast-winter, Snow, and Sun, respectively. The visualizations of the similarity activation maps are on the bottom row for all the query and database RGB images.

E. Real-Site Experiment

For the real-site experiment, the dataset is collected using an AGV equipped with a ZED stereo camera and RTK-GPS; the mobile robot is shown in Fig. 9(a). The route we choose runs around the lawn beside the school building on campus, is around 2 km long, and is shown in Fig. 9(b). We collect data under different environments, covering weather, daytime, and illumination changes, and classify them as Sunny, Overcast, and Night, respectively. There are 12 typical places used as key frames, each with 25 different viewing angles for challenging localization, marked as red circles in Fig. 9(b); this compensates for the single driving perspective of both the CMU-Seasons and Oxford RobotCar datasets. There are 300 images for each environment, and some samples of the dataset are shown in Fig. 10. The same places along the routes are mainly within a distance of 5 m, which provides the 25 ground-truth images of place recognition from the GPS data.

Fig. 9. Image (a) shows the mobile robot used to collect the dataset, with RTK-GPS and a ZED stereo camera. Image (b) shows the routes of the dataset under changing environments, illustrated by differently colored lines; the red circles indicate the 12 typical places for recognition and retrieval with different perspectives.

Since all three environments were recorded during autumn, we use the CMU-Seasons pre-trained models under Low Sun + Mixed Foliage for Sunny, Overcast + Mixed Foliage for Overcast, and Cloudy + Mixed Foliage for Night in the experiments. The three place recognition experiments are: query images under Sunny with the database under Overcast, query images under Sunny with the database under Night, and query images under Night with the database under Overcast.

Fig. 10. Dataset images for the real-site experiment. Column (a) is Sunny, column (b) is Overcast, and column (c) is Night. The first and last two rows show the changing perspectives, which give more image candidates for the typical places.

For each query image, we retrieve the top-N candidates (N from 1 to 25) from the database and calculate the average recall rate to evaluate the performance of the coarse-only and fine-only methods. For the coarse-to-fine method, the coarse-only model first retrieves the top-2N candidates (2N from 2 to 50), and then the fine-only model retrieves the finer top-N candidates (N from 1 to 25) from them; the average recall is calculated over all query images.
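A minimal sketch of the average recall@N used here, assuming each query has a set of ground-truth place indices derived from the GPS data; the data structures and names are illustrative.

    def average_recall_at_n(retrievals, ground_truth, n):
        # retrievals: {query_id: ranked list of database indices}
        # ground_truth: {query_id: set of correct database indices}
        hits = 0
        for query_id, ranked in retrievals.items():
            if any(idx in ground_truth[query_id] for idx in ranked[:n]):
                hits += 1
        return hits / len(retrievals)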

As shown in Fig. 11, the three proposed methods are validated under three different retrieval environment settings. From the results, it can be seen that the coarse-only method performs better than the fine-only method in large-scale place recognition, which is consistent with the coarse-precision results on CMU-Seasons in Table I. Besides, the coarse-to-fine strategy clearly improves the performance of both the coarse-only and fine-only methods, which shows the effectiveness and applicability of the two-stage method. The coarse-to-fine performance within the top-5 recall is limited by the performance of the fine model, and it improves as the number of database retrieval candidates (N) increases.

Since time consumption is important for place recognition in robotic applications, we have measured the time cost of the three proposed methods in the real-site experiment. As shown in Table VII, the average inference time is the time to extract the feature representation through the encoder, while the average retrieval time is the time to retrieve the top 25 out of 300 database candidates through brute-force search. Comparing the three methods, it can be seen that although the inference time of coarse-to-fine is almost the sum of those of coarse-only and fine-only, it is short enough for representation extraction. For brute-force retrieval, the time of coarse-to-fine is slightly larger than that of the coarse-only and fine-only methods because the second, finer retrieval stage only finds the top 25 out of 50 coarse candidates, which costs much less time. Note that the retrieval time could be significantly reduced through other search schemes, such as a KD-tree instead of brute-force search, but these techniques are beyond the focus of this work, so Table VII only gives a relative time comparison of the three proposed strategies, validating the time-efficiency and effectiveness of the two-stage method.

VI. CONCLUSION

In this work, we have formulated a domain-invariant feature learning architecture for long-term retrieval-based localization with a feature consistency loss (FCL). Then a novel loss based on the gradient-weighted similarity activation map (Grad-SAM) is proposed to improve high-precision performance. The adaptive triplet loss based on the FCL loss or Grad-SAM loss is incorporated into the framework to form the coarse and fine retrieval methods, resulting in the coarse-to-fine testing pipeline. Our proposed method is compared with several state-of-the-art image-based localization baselines on the CMU-Seasons and RobotCar-Seasons datasets, where our results outperform the baseline methods for image retrieval in medium- and high-precision localization under challenging environments. The real-site experiment further validates the efficiency and effectiveness of the proposed method. However, one concern about our method is that its performance in dynamic scenes is weak compared to other image-based methods, which could be addressed in the future by adding semantic information to enhance robustness to dynamic objects. Another direction is a unified model for robust visual localization in which the front-end network collaborates better with the representation learning.

Fig. 11. Results of the real-site experiment. (a) is the result of Sunny query images with the database of Overcast; (b) is the result of Sunny query images with the database of Night; (c) is the result of Night query images with the database of Overcast.

TABLE VII TIME CONSUMPTION OF DIFFERENT METHODS

    ACKNOWLEDGMENT

The authors would like to thank Zhijian Qiao from the Department of Automation at Shanghai Jiao Tong University for his contribution to the real-site experiments, including the collection of the dataset and the comparison experiments.
