
    A Weakly-Supervised Crowd Density Estimation Method Based on Two-Stage Linear Feature Calibration

IEEE/CAA Journal of Automatica Sinica, 2024, Issue 4 (published 2024-04-15)

Yong-Chao Li, Rui-Sheng Jia, Ying-Xiang Hu, and Hong-Mei Sun

Abstract—In crowd density estimation datasets, annotating crowd locations is extremely laborious, and the location annotations are not used in the evaluation metrics. In this paper, we aim to reduce the annotation cost of crowd datasets and propose a crowd density estimation method based on weakly-supervised learning that, in the absence of crowd position supervision, directly regresses the crowd count by using only the number of pedestrians in each image as supervision. For this purpose, we design a new training method that exploits the correlation between global and local image features through incremental learning. Specifically, we design a parent-child network (PC-Net) focusing on the global and local image respectively, and propose a linear feature calibration structure to train the two networks simultaneously: the child network learns feature transfer factors and feature bias weights, which are used to linearly calibrate the features extracted by the parent network, so that local features hidden in the crowd images improve the convergence of the network. In addition, we use a pyramid vision transformer as the backbone of PC-Net to extract crowd features at different levels, and design a global-local feature loss function (L2), combined with a crowd counting loss (LC), to enhance the network's sensitivity to crowd features during training, which effectively improves the accuracy of crowd density estimation. The experimental results show that PC-Net significantly reduces the gap between fully-supervised and weakly-supervised crowd density estimation, and outperforms the comparison methods on five datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF and JHU-CROWD++.

I. INTRODUCTION

WITH the growth of the global population and of human social activities, large crowds often gather in public places, which poses serious hidden dangers to public safety. Accurately estimating crowd density has therefore become an important research topic in the field of public safety. To train a robust and reliable network for accurate crowd density estimation, most existing methods use a fully-supervised or semi-supervised training scheme, in which the network model is trained on ground truth generated by manual annotation. This requires a great deal of manpower and material and financial resources, and in large-scale dense crowd images, interference factors such as low resolution, object occlusion, and scale changes make it difficult to label each pedestrian in the crowd. Determining how to trade off crowd density estimation accuracy against dataset labeling cost, saving annotation effort without losing counting accuracy, therefore becomes a challenge.

Crowd density estimation methods obtain the number of people by extracting crowd information from the image. Existing training schemes are mainly fully-supervised [1]-[25] or semi-supervised [26]-[39]. Fully-supervised methods obtain the ground truth by manually labeling each pedestrian in the image and then train the network model on it; although they achieve high performance, labeling every person requires significant manpower and material and financial resources. The ground truth for semi-supervised methods falls into two types: all pedestrians are labeled in some images, or some pedestrians are labeled in all images. These methods approach fully-supervised performance and show good robustness, but they still require crowd annotations, and their training process is cumbersome. Moreover, both fully-supervised and semi-supervised methods are limited by the dataset: when the crowd distribution changes, for example with a change in shooting perspective or in the spatial distribution of the crowd, the ground truth obtained under the current labeling scheme must be re-labeled, and the labeled locations are not used to evaluate counting performance during testing. This means that per-pedestrian location labels are redundant. To reduce the cost of manual labeling, weakly-supervised training methods have been proposed; their main difference from fully-supervised and semi-supervised methods is that they require no manual annotation of crowd location information at all, whereas the latter require annotation of all or part of the crowd locations. In fact, without the demand for locations, crowd numbers can be obtained in other, more economical ways. For instance, for an already collected dataset, the crowd number can be obtained by gathering environmental information, e.g., detecting disturbances in spaces or estimating the number of moving people. Chan et al. [40] segment the scene by crowd motion and estimate the crowd number from the area of the segmented regions. To collect a novel counting dataset, sensor technology can provide the crowd number in constrained scenes, such as mobile crowd sensing [41]. Moreover, Sheng et al. [42] propose a GPS-less, energy-efficient sensing scheduling to acquire the crowd number more economically. On the other hand, several approaches [43]-[46] show that there is no tight bond between the estimated crowd number and the locations. The weakly-supervised labels used in this paper were all obtained from already collected datasets, using only the count labels and dropping the location labels.

Although such weakly-supervised methods save dataset labeling cost, the ensuing problem is that, lacking crowd location information as a training label, the network does not know what pedestrians look like at the beginning of training and only learns their characteristics after several iterations. This reduces the network's sensitivity to crowd features, slows convergence considerably, and substantially weakens the model's ability to fit features, which hurts the accuracy of crowd density estimation. Therefore, the weakly-supervised approach of simply removing crowd location information saves labeling cost but limits network performance and does not fundamentally solve the problem.

To solve the above problem, inspired by optimal iterative learning control methods [47]-[49], reaction-diffusion neural networks [50] and latent factor analysis models [51]-[54], we reconsider the training approach of the crowd density estimation model while keeping weakly-supervised data labels, i.e., we use only the number of pedestrians in the image as supervision. To compensate for the missing crowd location information and to improve the convergence speed and feature fitting ability of the network, we design a novel and effective training method: a parent-child network with the same parameters learns different features in the crowd, and a linear transformation then corrects the location information of the features extracted by the parent network using hidden features learned by the child network, accelerating the network's adaptation to the features. Our training method significantly improves convergence speed, and the resulting network performance and counting accuracy are close to those of the fully-supervised method. Since the parent network has the same parameters as the child network, the increase in the number of parameters of the parent-child model over the parent network alone is very small, and is well within the acceptable range given the improved performance. Accordingly, this paper designs a weakly-supervised crowd density estimation method that trains the network by correlating global and local image features to improve model performance. The main contributions of this paper are as follows:

1) We design a weakly-supervised crowd density estimation method that uses only the crowd count as supervision, without location labels. It omits the manual location labeling work without losing crowd density estimation performance, and greatly reduces the cost of network training compared to existing fully-supervised methods.

2) We design a novel and effective training approach: a parent-child network trained by incremental learning, in which a linear feature calibration structure uses transfer factors and bias weights to enhance the network's adaptability to hidden features. It improves the performance of weakly-supervised learning methods, and we verify its effectiveness on this task.

3) We design a loss function that adds the error between parent network features and child network features (L2) to the error between the ground truth and predicted counts (LC), and use gradient descent to optimize the features extracted by the parent-child network, accelerating the convergence of training and improving the accuracy of crowd density estimation.

II. RELATED WORK

1) Fully-Supervised/Semi-Supervised Crowd Density Estimation Methods: With the development of big data, machine learning, and convolutional neural networks [55]-[61], a large number of convolutional neural network (CNN)-based crowd density estimation methods have been proposed. Basic CNNs were the first to be applied to crowd density estimation, e.g., CNN-boosting [1] and Wang et al. [2]; these networks use only basic CNN layers (convolutional, pooling, and fully connected) and require no additional feature information, making them simple and easy to implement, but their counting accuracy is low. Multi-column CNNs were subsequently widely used, e.g., MCNN [3], MBTTBF [4], Multi-scale-CNN [5], CP-CNN [6], and DADNet [7]; these networks use different columns to capture multi-scale information, but the information captured by different columns is redundant and wastes training resources. To avoid this redundancy, single-column CNNs were applied to crowd density estimation, e.g., CSRNet [8], SANet [9], SPN [10], CMSM [11], TEDnet [12], and IA-MFFCN [13]. These networks deploy a single deeper CNN instead of the bloated multi-column architecture, do not increase network complexity, and train more efficiently, so they have received extensive attention. However, with the development of density map-based methods, background noise in the image seriously obscures the detailed information of the crowd distribution, and filtering out background noise to highlight crowd location information has become a challenge.

Attention mechanisms have therefore been widely introduced into crowd density estimation tasks. They can supplement the features extracted by the backbone or head network by encoding distant dependencies or heterogeneous interactions to highlight head positions. ADCrowdNet designs an attention map generation structure [14]; the attentional neural field (ANF) uses local and global self-attention to capture long-range dependencies [15]; the attention guided feature pyramid network (AP-FPN) [16] adaptively combines high-level and low-level features to generate high-quality density maps with accurate spatial location information; the multi-scale feature pyramid network (MFP-Net) designs a feature pyramid fusion module with convolution kernels of different depths and scales [17], expanding the receptive field of the CNN and improving training speed; PDANet uses a feature pyramid to extract crowd features at different scales to improve counting accuracy [18]; and SPN uses a scale pyramid network to effectively capture multi-scale crowd characteristics [10] and obtain more comprehensive crowd information. Meanwhile, researchers have attempted to transfer Transformer models from natural language processing to crowd density estimation [19]-[23], [62]-[66]. The Transformer uses self-attention to capture global dependencies between input and output; its advantage is that it is not limited to local interactions, can mine long-distance dependencies, and can compute in parallel, learning the most appropriate inductive bias for a given task objective. It thus captures the global context of the image and models the dependencies among global features, which addresses the limited receptive field of CNNs, especially under the uneven scales present in dense crowds. In 2020, Dosovitskiy et al. [19] proposed the vision transformer (ViT), an image classification model based entirely on the self-attention mechanism and the first work to replace convolution with a Transformer. In 2021, Sun et al. [24] demonstrated the importance of global contextual information in crowd density estimation. Also in 2021, TDCrowd [25] combined ViT with density maps to estimate crowd counts, alleviating background noise interference and improving the accuracy of crowd density estimation.

    However, the aforementioned CNN or ViT methods require a large number of labels for training, and labeling the crowd density estimation dataset is a laborious task.

2) Weakly-Supervised Crowd Density Estimation Methods: To reduce labeling cost, several weakly-supervised crowd density estimation methods have been developed. In these methods, no crowd location information is labeled; image-level count labels serve as the weakly-supervised training signal. In 2016, Borstel et al. [37] proposed a weakly-supervised density estimation method based on the Gaussian process, using the number of objects as the training label; however, this method partitions the image, so the same target can be repeated across partitions, causing the estimated count to exceed the actual one. In 2019, Ma et al. [38] proposed a weakly-supervised density estimation method using a Bayesian loss, which computes expectations over the probability density map estimated by the network and regresses the crowd count, improving counting efficiency in the weakly-supervised setting. Also in 2019, Sam et al. [36] designed an autoencoder that trains the network in a weakly-supervised way, updating only a small number of parameters during training, in an attempt to approach unsupervised crowd density estimation. In 2020, Yang et al. [39] proposed a network based on soft-label ranking, which strengthens the supervision of crowd size on top of the original crowd density estimation network. Also in 2020, Sam et al. [29] proposed a weakly-supervised training method that matches the statistics of the label distribution without using image-level location labels. To ease overfitting, Wang et al. [27] in 2019 explored the generation of synthetic crowd images to reduce the annotation burden and alleviate overfitting. With the application of ViT to crowd density estimation, TransCrowd [21] in 2021 applied ViT to this task for the first time and proposed a weakly-supervised counting method that greatly improved accuracy in the weakly-supervised mode, but its simple model structure limited feature extraction.

Compared with previous weakly-supervised methods, we propose a weakly-supervised method based on linear calibration of parent-child network features, which effectively reduces labeling cost during training while maintaining state-of-the-art performance, achieving an optimal trade-off between crowd density estimation accuracy and dataset labeling cost.

III. PROPOSED METHOD

    A. Overview of the Network Architecture

To improve the convergence speed of the network under weakly-supervised training, we propose a parent-child network (PC-Net). It exploits the correlation between global and local image features to enhance the network's fitting ability, incrementally learning and continuously applying linear corrections to the extracted features. The proposed PC-Net structure is shown in Fig. 1; it achieves a good balance between accuracy and training cost. Specifically, PC-Net is divided into two parts, the Parent network and the Child network, which share the same backbone. We design a pyramid vision transformer as the feature extraction backbone to extract crowd features at different levels. During training, the Parent network learns crowd features from global images, while the Child network learns feature transfer factors and feature bias weights from local images. The crowd features learned by the Parent network are then corrected by a linear calibration structure, yielding a feature map with richer and more accurate global contextual information. Meanwhile, the Parent and Child networks are updated by gradient descent with different losses to improve the accuracy of the crowd density estimation. Finally, a 1 × 1 convolutional layer outputs the final density map. The following sections describe our framework in detail.

    B. Backbone Network

In PC-Net, the main network is divided into two parts, Parent-Net and Child-Net. To enable incremental learning with linear correction of the crowd features, Parent-Net and Child-Net have the same network structure. To handle the scale variability in crowd images, we design a pyramid vision transformer backbone that extracts crowd features at different levels, as shown in Fig. 1, while multi-scale windows restrict the computation of the vision transformer's self-attention to non-overlapping local regions, which improves computational efficiency. Since the vision transformer cannot directly process 2D images, a preprocessing step converts 2D images into 1D sequences of image blocks before they are fed into the pyramid vision transformer. The image preprocessing and the structure of the pyramid vision transformer are described below.

    1) Image Partition

Before the image is input into the pyramid vision transformer, the 2D image is converted into a 1D sequence of image blocks. To improve computational efficiency, the input image is divided into N × N fixed windows, the image inside each window is divided into fixed-size image blocks, and self-attention is computed within each window, as shown in Fig. 2. A minimal sketch of this partition is given below.
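As a hedged illustration of this step, the following PyTorch sketch splits an image into N × N windows and flattens each window into a sequence of K × K patch tokens. The function name, shapes, and interface are ours, not the paper's implementation.

```python
import torch

def window_partition(img, num_windows, patch_size):
    """Split a 2D image into non-overlapping windows, then into flat patch
    sequences (a sketch of the partition described above; names are ours).

    img: tensor of shape (C, H, W); num_windows: N; patch_size: K.
    Returns: (N*N, patches_per_window, K*K*C) patch sequences.
    """
    C, H, W = img.shape
    win_h, win_w = H // num_windows, W // num_windows
    # Carve the image into an N x N grid of windows: (C, N, N, win_h, win_w)
    windows = img.unfold(1, win_h, win_h).unfold(2, win_w, win_w)
    windows = windows.permute(1, 2, 0, 3, 4).contiguous()   # (N, N, C, win_h, win_w)
    # Within each window, carve out K x K patches and flatten them to 1D tokens
    p = patch_size
    patches = windows.unfold(3, p, p).unfold(4, p, p)       # (N, N, C, h/p, w/p, p, p)
    patches = patches.permute(0, 1, 3, 4, 2, 5, 6).contiguous()
    return patches.view(num_windows * num_windows, -1, C * p * p)

# Example: a 768x768 RGB image, N = 4 windows per side, 4x4 patches (Layer 1)
seq = window_partition(torch.randn(3, 768, 768), num_windows=4, patch_size=4)
print(seq.shape)  # torch.Size([16, 2304, 48])
```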

2) Pyramid Vision Transformer

Fig. 2. The process of the image partition.

Fig. 3. The structure of the pyramid vision transformer. The feature map of each layer is first partitioned to convert the 2D image into a 1D sequence, and the processed 1D sequence is then reshaped to generate 2D features.

When extracting multi-scale crowd features, a multi-layer pyramid vision transformer structure is used. Between layers, the scale of the feature map is controlled by a progressive shrinking strategy. At the same time, the multi-scale window scheme restricts the self-attention computation to non-overlapping local windows and expands the window layer by layer through cross-window connections, which improves computational efficiency. This paper designs a three-layer transformer-encoder structure, as shown in Fig. 3.

Specifically, the size of the input image is $H \times W \times 3$, the output feature map $F_i$ of Layer $i$ has size $H_i \times W_i \times C_i$, and the image patches in Layer $i$ have size $K_i \times K_i \times 3$, where $K_1 = 4$, $K_2 = 2$, $K_3 = 2$. The number of windows in Layer $i$ is $N_i \times N_i$, where $N_1 = 4$, $N_2 = 2$, $N_3 = 1$, and each window of Layer $i$ contains the corresponding grid of image patches. The image patches are linearly projected into a 1D sequence with embedded position information; after the transformer-encoder extracts features, the feature sequence is rearranged into a feature map, where $C_i$ is smaller than $C_{i-1}$. The transformer-encoder of Layer $i$ comprises $L_i$ layers of the twin multi-head self-attention mechanism (TMSA) and multilayer perceptron (MLP), where $L_1 = 2$, $L_2 = 6$, $L_3 = 2$, and each layer is processed by layer normalization (LN) and residual connection. Before TMSA and MLP, LN normalizes the feature sequence, which makes training more stable and effectively avoids gradient vanishing or explosion. Residual connections after TMSA and MLP superimpose the processed features on the features before processing, avoiding the degradation of matrix weights in the network. For layer $l$, the calculation process is as follows:

$$\hat{z}^{l} = \mathrm{TMSA}(\mathrm{LN}(z^{l-1})) + z^{l-1}, \qquad z^{l} = \mathrm{MLP}(\mathrm{LN}(\hat{z}^{l})) + \hat{z}^{l}$$

MSA contains $m$ self-attention (SA) modules. Each independent SA takes an input sequence $X$ and computes its query ($Q$), key ($K$) and value ($V$) as follows:

$$Q = XW_Q,\quad K = XW_K,\quad V = XW_V,\qquad \mathrm{SA}(X) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V$$

In the formula, $W_Q$, $W_K$ and $W_V$ are learnable matrices, and the outputs of the $m$ self-attention modules are connected in series (concatenated), which can be expressed as

$$\mathrm{MSA}(X) = \mathrm{Concat}\left(\mathrm{SA}_1(X), \ldots, \mathrm{SA}_m(X)\right)W_O$$

where $W_O$ is a learnable output projection.

MLP contains two linear layers with the Gaussian error linear unit (GELU) activation function. This paper uses the GELU of the standard normal distribution, as shown in (8)

$$\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] \tag{8}$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.

The first linear layer expands the dimension of the feature sequence from $D$ to $4D$, and the second linear layer shrinks it from $4D$ back to $D$. A sketch of one such encoder layer follows.
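For concreteness, here is a minimal PyTorch sketch of one such pre-LN encoder layer, with LN before the attention and MLP and residual connections after both, as described above. Standard multi-head attention stands in for the paper's TMSA module, whose exact internals are not specified here; class and variable names are ours.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One pre-LN transformer-encoder layer: LN before the attention and MLP,
    residual connections after both (a sketch; plain multi-head attention
    stands in for TMSA)."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        # MLP: expand D -> 4D, GELU, then shrink 4D -> D, as in the text
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                                   # x: (batch, seq, dim)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]   # residual after attention
        x = x + self.mlp(self.ln2(x))                       # residual after MLP
        return x

blk = EncoderBlock(dim=96, num_heads=4)
print(blk(torch.randn(2, 256, 96)).shape)  # torch.Size([2, 256, 96])
```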

    C. Linear Feature Calibration

To improve the convergence speed and feature fitting ability of the weakly-supervised crowd counting method during training, we propose a linear feature calibration structure. To achieve feature calibration and transfer between Parent-Net and Child-Net, we consider the feature parameters of Parent-Net and Child-Net to belong to the same linear space $V^n$ ($n$ is the number of feature channels). Each channel feature in Child-Net can then be obtained from the corresponding channel feature in Parent-Net by a linear transformation. Fig. 4 shows how the Child-Net feature parameters are transferred from Parent-Net by linear calibration.

Fig. 4. The process of the linear feature calibration.

In Fig. 4, we define the channel features in Parent-Net as $F_P \in \mathbb{R}^{h \times w \times n}$ ($h$, $w$, $n$ represent the length, width, and number of channels of the features, respectively), the feature transfer factors as $\alpha \in \mathbb{R}^{1 \times 1 \times n}$, and the feature bias weights as $\beta \in \mathbb{R}^{1 \times 1 \times n}$, so the process of linear feature calibration can be expressed as

$$F_C = \alpha \otimes F_P + \beta \tag{9}$$

where $\otimes$ denotes channel-wise multiplication.
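A minimal PyTorch sketch of this channel-wise calibration follows. In PC-Net the transfer factors α and bias weights β are learned by the Child network; here they are held as plain learnable parameters purely for illustration, and the class name is ours.

```python
import torch
import torch.nn as nn

class LinearFeatureCalibration(nn.Module):
    """Channel-wise linear calibration of Parent-Net features, per Eq. (9):
    F_C = alpha * F_P + beta, with one transfer factor and one bias weight
    per channel (a sketch; in PC-Net these come from the Child network)."""
    def __init__(self, num_channels):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, num_channels, 1, 1))   # transfer factors
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))   # bias weights

    def forward(self, f_parent):                # f_parent: (batch, n, h, w)
        return self.alpha * f_parent + self.beta

calib = LinearFeatureCalibration(num_channels=256)
print(calib(torch.randn(2, 256, 24, 24)).shape)  # torch.Size([2, 256, 24, 24])
```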

    D. Loss Function

To further strengthen the proposed method, we make full use of the correlation between local and global crowd features to train the network and improve the accuracy of crowd density estimation. A comprehensive loss function is designed, consisting of the $L_C$ loss and the $L_2$ loss, as shown in (10)

$$L = \phi L_C + (1 - \phi) L_2 \tag{10}$$

where $\phi \in [0,1]$ is the counting loss weight that balances the two terms (studied in Section V-E).

In the formula, $L_C$ is the counting loss between the PC-Net estimated count and the ground truth, and $L_2$ is the MSE loss between the PC-Net predicted density map and the Parent-Net predicted density map. The $L_C$ counting loss can be expressed as

$$L_C = \frac{1}{N}\sum_{i=1}^{N}\left|F_Y(X_i,\theta) - Y_i\right| \tag{11}$$

In the formula, $N$ denotes the number of images in the training set, $F_Y(X_i,\theta)$ denotes the estimated count for image $X_i$ ($i = 1,\ldots,N$), $\theta$ denotes a set of learnable parameters, and $Y_i$ denotes the true count for image $X_i$. The $L_2$ loss can be expressed as

$$L_2 = \frac{1}{N}\sum_{i=1}^{N}\left\|Z(X_i,\theta) - Z_P(X_i,\theta)\right\|_2^2 \tag{12}$$

In the formula, $N$ denotes the number of images in the training set, $X_i$ is the $i$th input image, $\theta$ denotes a set of learnable parameters, $Z(X_i,\theta)$ denotes the prediction of PC-Net, and $Z_P(X_i,\theta)$ denotes the prediction of Parent-Net.
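The sketch below shows how such a combined loss could be computed. The L1 form of the counting loss and the single weighting factor φ between the two terms are our assumptions, as is the function name.

```python
import torch

def pc_net_loss(pred_count, gt_count, pred_density, parent_density, phi=0.6):
    """Combined loss, a sketch of Eq. (10): a counting loss L_C between the
    estimated and true counts plus the MSE loss L_2 between the PC-Net and
    Parent-Net density maps. The L1 counting loss and phi weighting are
    assumptions, not the paper's confirmed formulation."""
    l_c = torch.mean(torch.abs(pred_count - gt_count))       # counting loss (L1 assumed)
    l_2 = torch.mean((pred_density - parent_density) ** 2)   # global-local feature loss
    return phi * l_c + (1.0 - phi) * l_2
```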

    E. Crowd Density Map Generation

The crowd features extracted by PC-Net contain the location information of each pedestrian. We use the focal inverse distance transform (FIDT) [67] to process the features and generate a visualized crowd density map. Specifically, if there are $Z$ pedestrian feature points in an image, the feature images are processed as follows:

$$P(x,y) = \min_{(x',y') \in Z} \sqrt{(x-x')^2 + (y-y')^2} \tag{13}$$

In (13), $Z$ denotes the set of all crowd feature points, and for any feature point $(x,y)$, the Euclidean distance $P(x,y)$ to its nearest feature point $(x',y')$ is calculated. Since the distances between feature points vary greatly, direct distance regression is difficult, so an inverse function is used for regression, as shown in (14)

$$I = \frac{1}{P(x,y)^{\,\left(A \cdot P(x,y) + B\right)} + C} \tag{14}$$

where $I$ is the FIDT result and $C$ is an additional constant, usually set to 1, that avoids division by zero. $P(x,y)$ is raised to an exponent to slow the decay of the crowd head information, and $I$ is displayed to generate a visual crowd density map. Finally, the predicted crowd count is obtained by 2D integration (summation) of the generated density map. In the experiments, $A = 0.02$ and $B = 0.75$ were set.
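The following NumPy/SciPy sketch reproduces this FIDT processing under the stated constants (A = 0.02, B = 0.75, C = 1); the function name and interface are ours.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fidt_map(points, shape, A=0.02, B=0.75, C=1.0):
    """Focal inverse distance transform, a sketch of Eqs. (13)-(14):
    P(x, y) is the Euclidean distance to the nearest annotated head point,
    and I = 1 / (P**(A*P + B) + C).

    points: iterable of (row, col) head coordinates; shape: (H, W).
    """
    mask = np.ones(shape, dtype=bool)
    for r, c in points:
        mask[int(r), int(c)] = False          # zero at each head point
    # Distance from every pixel to its nearest head point, Eq. (13)
    P = distance_transform_edt(mask)
    # Focal inverse transform, Eq. (14); C avoids division by zero at heads
    return 1.0 / (np.power(P, A * P + B) + C)

density = fidt_map([(100, 120), (300, 400)], shape=(768, 768))
print(density.max())  # 1.0 at the annotated head locations
```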

IV. EXPERIMENTS

    A. Training Process

In the training phase, one iteration updates the parameters of two models. As shown in Fig. 1, the data are first fed into Parent-Net for training, and the global feature $F_P$ is optimized by gradient descent, as follows:

$$F_P \leftarrow F_P - \varepsilon \frac{\partial L_C}{\partial F_P} \tag{15}$$

In the formula, $\varepsilon$ denotes the learning rate of Parent-Net, and $L_C$ is the counting loss between the Parent-Net estimated count and the ground truth. Second, we use the linear feature calibration structure to transfer $F_P$ channel-by-channel into Child-Net to obtain $F_C$, as shown in (9). Since the transfer factor $\alpha$ and the bias weight $\beta$ used in linear feature calibration must be learned by Child-Net, we feed the local image data into Child-Net and optimize $F_C$ by gradient descent, as follows:

$$F_C \leftarrow F_C - \mu \frac{\partial Loss}{\partial F_C} \tag{16}$$

In the formula, $\mu$ denotes the learning rate of Child-Net, and $Loss$ is the value of the comprehensive loss function designed in this paper. In the testing phase, we use the best-performing model to make inferences on the test set. A sketch of one training iteration is given below.
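The following PyTorch-style sketch outlines one such iteration under our reading of the procedure: Parent-Net is updated with the counting loss, then Child-Net (through the linear feature calibration) is updated with the combined loss. All module names, argument names, and the φ weighting are hypothetical stand-ins, not the paper's released code.

```python
import torch

def train_one_iteration(parent_net, child_net, calib, global_imgs, local_imgs,
                        gt_counts, opt_parent, opt_child, phi=0.6):
    """One training iteration updating both models (a sketch of Eqs. (15)
    and (16); names and shapes are assumptions)."""
    # Step 1: Parent-Net on global images, optimized with the counting loss L_C
    parent_density = parent_net(global_imgs)
    parent_count = parent_density.sum(dim=(1, 2, 3))
    l_c_parent = torch.mean(torch.abs(parent_count - gt_counts))
    opt_parent.zero_grad()
    l_c_parent.backward()
    opt_parent.step()

    # Step 2: transfer the parent features via linear feature calibration and
    # optimize Child-Net with the combined loss (counting + feature MSE)
    with torch.no_grad():
        parent_density = parent_net(global_imgs)       # fixed target for L_2
    f_c = calib(child_net(local_imgs))                 # calibrated child prediction
    child_count = f_c.sum(dim=(1, 2, 3))
    loss = (phi * torch.mean(torch.abs(child_count - gt_counts))
            + (1.0 - phi) * torch.mean((f_c - parent_density) ** 2))
    opt_child.zero_grad()
    loss.backward()
    opt_child.step()
    return l_c_parent.item(), loss.item()
```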

    B. Training Hyper-Parameter Settings

During training we use the Adam optimizer; Batch_size is set to 16; the learning rate ε of Parent-Net and μ of Child-Net are both initialized to 0.0001 and halved every 50 epochs; the GELU activation function is used to improve training speed and effectively avoid gradient vanishing and explosion; and L2 regularization of 0.0001 is used to avoid over-fitting. Since the images in the dataset have different resolutions, all images are resized to 768 × 768. The experimental environment is shown in Table I. A sketch of this configuration follows.
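A minimal PyTorch sketch of the stated optimizer and schedule, with a placeholder module standing in for PC-Net:

```python
import torch

# Adam, initial learning rate 1e-4, halved every 50 epochs,
# with L2 regularization (weight decay) of 1e-4, as stated above.
model = torch.nn.Conv2d(3, 1, 1)   # placeholder for PC-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(200):
    # ... one epoch of training with Batch_size = 16 ...
    scheduler.step()               # halve the learning rate every 50 epochs
```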

    TABLE I EXPERIMENTAL ENVIRONMENT (TABLE I INTRODUCES THE EXPERIMENTAL ENVIRONMENT PARAMETERS FROM THE ASPECTS OF SYSTEM, FRAME, LANGUAGE, CPU, GPU AND RAM)

    C. Datasets

In this work, extensive experiments are conducted on five crowd datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, UCF_QNRF and JHU-CROWD++. Unlike fully-supervised methods, only count-level labels are used as supervision during training. A representative crowd image from each dataset is shown in Fig. 5; the images in each dataset exhibit different degrees of uneven crowd scale variation.

1) ShanghaiTech [3]: It has 1198 crowd images with a total of 330 165 people. The dataset contains two parts, A and B. Part A includes 482 highly crowded images, of which 300 form the training set and the remaining 182 the testing set; Part B includes 716 relatively sparse images, of which 400 form the training set and the remaining 316 the testing set.

2) UCF_CC_50 [68]: It has 50 crowd images with different resolutions and viewing angles. The number of pedestrians per image varies from 94 to 4543, with an average of 1280. Due to the limited number of images and the large span of counts, five-fold cross-validation is used on this dataset.

3) UCF_QNRF [69]: It has 1535 crowd images with a total of 1 251 642 people, of which 1201 form the training set and the remaining 334 the test set. The number of pedestrians per image varies from 49 to 12 865, with an average of 815.

Fig. 5. Crowd images from the five crowd datasets. (a) From the ShanghaiTech Part A dataset; (b) From the ShanghaiTech Part B dataset; (c) From the UCF_CC_50 dataset; (d) From the UCF_QNRF dataset; (e) From the JHU-CROWD++ dataset.

4) JHU-CROWD++ [70]: It is an unconstrained dataset with 4372 images collected under various weather conditions such as rain and snow, containing 2722 training images, 500 validation images, and 1600 testing images. The dataset contains 1.5 million annotations at both the image level and the head level. The total number of people per image ranges from 0 to 25 791.

    D. Evaluation Metric

In this paper, we use the mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE) as evaluation metrics for PC-Net. MAE is the average absolute difference between the target and estimated counts, i.e., the average L1 loss; its value is little affected by outliers, making it a robust measure of algorithm performance. MSE is the average squared difference between the target and estimated counts, i.e., the average L2 loss, which penalizes larger errors; it magnifies the effect of large deviations, making it easier to distinguish models that produce large errors. MAPE measures the relative error between the estimated and actual values, which makes it easier to compare algorithms across datasets; expressing error as a percentage is convenient, intuitive, and easy to explain, and MAPE avoids the "mean squared error inflation" to which MSE is prone when the dataset contains outliers. In summary, MAE, MSE, and MAPE together demonstrate both the robustness and the accuracy of PC-Net. They are calculated as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|C_i - \hat{C}_i\right|,\quad \mathrm{MSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i - \hat{C}_i\right)^2},\quad \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\frac{\left|C_i - \hat{C}_i\right|}{C_i}$$

In the formulas, $N$ represents the number of test images, $C_i$ the actual number of people in the $i$th image, and $\hat{C}_i$ the estimated number of people in the $i$th image. Smaller values of MAE, MSE and MAPE indicate a smaller error between the estimated and actual counts, and hence a better result.
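A small NumPy sketch of these three metrics (using the root form of MSE, as is conventional in crowd counting; the function name is ours):

```python
import numpy as np

def evaluate(gt, est):
    """MAE, MSE (root form), and MAPE over N test images; gt and est are
    arrays of true and estimated counts."""
    gt, est = np.asarray(gt, float), np.asarray(est, float)
    mae = np.mean(np.abs(gt - est))
    mse = np.sqrt(np.mean((gt - est) ** 2))
    mape = 100.0 * np.mean(np.abs(gt - est) / gt)   # assumes gt > 0
    return mae, mse, mape

print(evaluate([100, 250, 40], [90, 260, 50]))  # (10.0, 10.0, 13.0)
```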

    E. Experiment 1: Comparisons With State-of-the-Art Methods

The ShanghaiTech dataset is a crowded, multi-scale dataset used to verify the counting performance of PC-Net; experiments are performed on it and compared with state-of-the-art methods, with MAE, MSE and MAPE reported in Table II. The UCF_CC_50 dataset includes 50 grayscale images with different resolutions and viewing angles, a very challenging dataset with varied crowd scenes and a limited number of images; five-fold cross-validation is therefore performed to maximize the use of samples. The dataset is randomly divided into 5 equal parts of 10 images each; four parts are used for training and the remaining one for testing, for a total of five training and testing rounds, and the average error is taken as the final result, compared with state-of-the-art methods in Table II. The UCF_QNRF dataset is also a crowded, multi-scale dataset collected from three different sources covering various scenes around the world; its total numbers of images and people far exceed the first three datasets, and comparisons with state-of-the-art methods are given in Table II. JHU-CROWD++ is a very large dataset containing crowd images under various complex weather conditions; comparisons with state-of-the-art methods are likewise given in Table II.

1) Performance on the ShanghaiTech Dataset: PC-Net is compared with state-of-the-art methods, with results shown in Table II, where we divide the methods into two groups. The first group comprises fully-supervised methods, which use both location and count information as supervision; the second group comprises weakly-supervised methods, which use only count information. According to Table II, PC-Net is very competitive with the first group: although its MAE, MSE, and MAPE are not optimal, it beats most of the fully-supervised methods such as GL and LW-Count, largely closing the counting-performance gap between weakly-supervised and fully-supervised methods at a much lower labeling cost. Its advantage over the second group is more obvious, as its MAE, MSE and MAPE are better than those of existing weakly-supervised methods: on Part A, MAE, MSE and MAPE improve by 11.2%, 14.8% and 12.5%, respectively, and on Part B by 21.5%, 35.4% and 22.2%, respectively. This demonstrates that PC-Net, trained with linear feature calibration, achieves the best density estimation performance under a weakly-supervised training mode. Figs. 6(a) and 6(b) show some visualization results of PC-Net on the Part A and Part B datasets.

TABLE II COMPARISON OF PC-NET AND THE STATE-OF-THE-ART METHODS ON THE SHANGHAITECH, UCF_CC_50, UCF_QNRF AND JHU-CROWD++ DATASETS. L DENOTES THAT THE TRAINING LABEL CONTAINS LOCATION INFORMATION, AND C DENOTES THAT THE TRAINING LABEL CONTAINS COUNT INFORMATION. RED AND BLUE INDICATE THE FIRST- AND SECOND-BEST PERFORMANCES, RESPECTIVELY

It can be seen that PC-Net performs well on both datasets, generating accurately distributed, high-resolution density maps whose predictions are close to the true values. Comparing Figs. 6(a) and 6(b), the ShanghaiTech Part A dataset is extremely crowded with little change in crowd scale, while Part B is relatively sparse with large changes in crowd scale, indicating that PC-Net fits different degrees of crowd scale variation well. The third column of Fig. 6 gives the heat map of the Parent-Net output, with red boxes marking obvious misidentifications or omissions. Extracting crowd features with Parent-Net alone easily misidentifies crowd features; the crowd feature correction and transfer process, by contrast, corrects the crowd location information well, further compensating for the missing location labels under the weakly-supervised counting method and improving counting accuracy.

2) Performance on the UCF_CC_50 Dataset: According to Table II, under weakly-supervised training PC-Net outperforms the other weakly-supervised methods of the second group on the UCF_CC_50 dataset, with MAE, MSE and MAPE improving by 38.8%, 43.7% and 46.3%, respectively, which proves its superiority. Compared with the first group, however, PC-Net has obvious shortcomings, probably because the data in this dataset are limited and the image counts span a large range; the predictions are not stable enough, and a small number of images show large errors, degrading the performance of the method. Fig. 7 shows some visualization results of PC-Net on the UCF_CC_50 dataset.

Fig. 6. Visualization results of the density maps on (a) ShanghaiTech Part A and (b) ShanghaiTech Part B, respectively.

The second column of Fig. 7 gives the crowd density maps generated by PC-Net. PC-Net makes good predictions and generates accurate density maps in crowded scenes with variable scales, and the generated maps show different sparsity for crowds of different scales, but the estimates carry some error relative to the true values; the first set of images, for example, is among the few with large errors in our tests, possibly because low image brightness affects the counting performance of the network. To further evaluate the visualized density images, we manually labeled several samples with crowd locations and visualized them, as shown in the third column of Fig. 7. An additional set of evaluation metrics, structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), was used to compare the generated density maps with the labeled ones, compensating for the limitations of one-dimensional metrics such as MAE and MSE. The experimental results show that PC-Net fits the crowd location information well; although there are some location errors, they are within the acceptable range. In summary, PC-Net's counting performance is slightly insufficient for extremely crowded scenes, so more training data are needed to improve its accuracy on extremely crowded datasets.

Fig. 7. Visualization results of the density maps on UCF_CC_50.

3) Performance on the UCF_QNRF Dataset: According to Table II, compared with the second group of methods in the weakly-supervised mode, the MAE, MSE and MAPE of PC-Net improve by 12.8%, 11.6% and 13.4%, respectively, a significant improvement in prediction. PC-Net achieves optimal counting accuracy on this dataset and shows excellent robustness. Compared with the first group, PC-Net also outperforms some fully-supervised methods, such as L2R and TEDnet, further narrowing the gap between weakly-supervised and fully-supervised training; against the most advanced crowd density estimation methods, PC-Net greatly reduces the labeling cost of the dataset, although its performance is slightly worse. Fig. 8 shows some visualization results of PC-Net on the UCF_QNRF dataset.

Fig. 8. Visualization results of the density maps on UCF_QNRF.

In the first image of Fig. 8, PC-Net fits crowds of different scales well and generates an accurate, high-resolution density map, reflecting its ability to handle drastic changes in crowd scale. PC-Net also generates an accurate density map for the denser crowd in the second image, but the estimate carries some error relative to the true value; this image is among the few with large errors in our tests, probably because differences in lighting interfere with counting accuracy, and further training is needed to improve the model's robustness and eliminate such errors.

4) Performance on the JHU-CROWD++ Dataset: According to Table II, PC-Net has a clear advantage over both the first and second groups of methods, and it is superior to weakly-supervised methods such as the advanced TransCrowd. In addition, compared with fully-supervised methods such as MCNN and CSRNet, the counting accuracy of PC-Net improves significantly on this dataset, and MAE, MSE and MAPE all achieve the second-best performance, which proves the effectiveness of our method. Fig. 9 shows some visualization results of PC-Net on the JHU-CROWD++ dataset, including crowd density maps on rainy and snowy days; PC-Net handles crowd images under deteriorating weather conditions well.

    F. Experiment 2: Actual Experiment

To test the performance of PC-Net in practical applications, we conducted experiments in several real scenarios. To ensure applicability and universality, images taken by cameras on campuses, in subway stations and on city roads were randomly selected as the test set. The test set contains 400 images spanning more than 10 scenes, each containing between 0 and 2000 people, all at a resolution of 768 × 768; these data generally exhibit uneven scales, background noise and other common factors that affect the accuracy of crowd density estimation. We conducted multiple groups of experiments and took the average value as the test result; the experimental results are shown in Table III, and Fig. 10 shows some visualizations of the actual experiment.

Fig. 9. Visualization results of the density maps on JHU-CROWD++.

    TABLE III COMPARISON OF PC-NET AND THE OTHER METHODS ON THE RANDOM DATASET

Fig. 10. Visualization results of the density maps from the actual experiment.

The test results of PC-Net on this unfamiliar dataset still outperform the compared algorithms, with MAE, MSE, and MAPE all optimal. We randomly selected visualization results from four scenes; PC-Net adapts reasonably well to scenes it has never seen, generating accurate, high-resolution crowd density maps whose predicted densities are within an acceptable error range of the real ones. However, the multi-scene test reveals that PC-Net's transfer across scenes is slightly insufficient: in the third and fourth groups of images, the crowd density error is noticeably larger, mainly due to PC-Net's poor adaptability to those scenes. PC-Net therefore needs more training samples and multi-scene testing to adjust its model parameters and increase its adaptability to multiple scenes.

V. DISCUSSION

    A. Study of Training Hyper-Parameter Settings

In network training, the choice of initial hyper-parameters is crucial. Good settings help avoid gradient vanishing or explosion, let the network learn the data features more quickly and accurately, and improve the training effect and generalization ability of the model. To determine the optimal initialization, we studied the effects of different Batch_size values, learning rates, activation functions and optimizers on the performance of PC-Net on ShanghaiTech Part A. The experimental results are shown in Fig. 11.

Fig. 11. Visualization results of the study of different initialization hyper-parameter settings. (a) MAE values for different Batch_size; (b) MAE values for different learning rates; (c) MAE values for different activation functions; (d) MAE values for different optimizers.

PC-Net is more sensitive to Batch_size and learning rate during training. As the Batch_size increases, the parallelism of the GPU is better utilized, speeding up training; but a larger Batch_size requires more memory and may lead to overfitting, because the model is more likely to memorize the training batches rather than learn the overall features of the input data. On balance, we set the Batch_size to 16. Given the complexity of the crowd density estimation task and the depth of PC-Net, we set a small initial learning rate to avoid unstable or divergent training; the experiments show that optimal model performance is achieved with an initial learning rate of 0.0001. The experiments also show that PC-Net is less sensitive to the activation function and the optimizer. We compared five activation functions (GELU, Sigmoid, ReLU, Tanh, Softmax) and three optimizers (SGD, Adam, Momentum); PC-Net achieves optimal results with GELU as the activation function and Adam as the optimizer. In summary, we set the Batch_size to 16 and the initial learning rate to 0.0001, and use GELU and the Adam optimizer from the beginning of training.

    B. Study of Backbone Network

CNNs have a small receptive field, which limits the range over which a network can extract global features. CNN-based methods are therefore good at extracting local crowd information over small regions but insufficient for extracting global crowd information from the whole image, making it difficult for them to establish global context features. ViT, by contrast, captures long-range context dependencies and has a global receptive field, remedying this deficiency of CNNs. We computed effective receptive fields for both a VGG network and a ViT. Specifically, we measure the effective receptive field of a layer as the absolute value of the gradient of the center location of the feature map with respect to the input, averaged across all channels in each map over 16 randomly selected images. The results are shown in Fig. 12; a sketch of this measurement is given after the figure caption.

Fig. 12. Visualization results of the effective receptive fields for VGG and ViT.
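The following PyTorch sketch shows how such a gradient-based effective receptive field could be measured with a forward hook; the helper name and interface are ours, not the paper's code.

```python
import torch

def effective_receptive_field(model, layer, input_size=224, n_images=16):
    """Gradient-based ERF, as described above: the absolute gradient of the
    feature map's center location w.r.t. the input, averaged over channels
    and over randomly selected images. `layer` is any module inside `model`
    whose output we probe."""
    feats = []
    handle = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    erf = torch.zeros(input_size, input_size)
    for _ in range(n_images):
        feats.clear()
        x = torch.randn(1, 3, input_size, input_size, requires_grad=True)
        model(x)
        f = feats[0]                                  # (1, C, h, w)
        center = f[0, :, f.shape[2] // 2, f.shape[3] // 2].sum()
        center.backward()
        erf += x.grad.abs().mean(dim=1)[0]            # average over input channels
    handle.remove()
    return erf / n_images

# Example with a torchvision VGG16, probing an intermediate conv layer:
# from torchvision.models import vgg16
# m = vgg16(); print(effective_receptive_field(m.features, m.features[16]).shape)
```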

We observe that the lower-layer effective receptive fields of ViT are indeed larger than those of VGG; while VGG's effective receptive fields grow gradually, ViT's become much more global midway through the network. ViT receptive fields also show a strong dependence on their center patch due to strong residual connections. Overall, VGG effective receptive fields are highly local and grow gradually, whereas ViT effective receptive fields shift from local to global. To further verify the performance of the pyramid vision transformer, we conducted an ablation study replacing it with the first 10 layers of VGG16 as the backbone of PC-Net, keeping the other structures the same; the results are shown in Table IV.

As can be seen, the pyramid vision transformer performs significantly better than VGG. On the Part A dataset, MAE, MSE and MAPE improve by 5.6%, 2.2% and 6.3%, respectively; on Part B, by 23.2%, 27.3% and 24.1%; on UCF_CC_50, by 10.6%, 9.8% and 14.9%; on UCF_QNRF, by 13.8%, 11.0% and 15.3%; and on JHU-CROWD++, by 23.7%, 26.4% and 27.6%. This further proves the superiority of PC-Net's performance.

    TABLE IV RESULTS OF BACKBONE NETWORK ABLATION STUDY

    TABLE V RESULTS OF PYRAMID VISION TRANSFORMER ABLATION STUDY

    C. Study of Pyramid Vision Transformer

The pyramid vision transformer proposed in this paper consists of three layers of ViT. To verify this choice, ablation experiments were conducted on the five datasets, keeping the other structures the same, to test the performance of the pyramid vision transformer under different configurations. The results are shown in Table V, where L* denotes the number of ViT layers in the pyramid vision transformer.

As can be seen, the performance of PC-Net improves as the first three ViT layers are stacked in the pyramid vision transformer, but with a 4th layer the performance is almost the same as with 3 layers, and some metrics even decrease; with a 5th layer, performance starts to degrade rapidly. We believe that as network depth increases, the gradients in backpropagation may become very small (gradient vanishing) or very large (gradient exploding), making training difficult and convergence impossible. Moreover, as depth increases, the number of parameters grows rapidly, which can over-fit the network, preventing generalization to new datasets and reducing the network's generalization ability. Taking these factors into account, we set the number of ViT layers in the pyramid vision transformer to 3.

    D. Study of Linear Feature Calibration

In this paper, we propose a new training method that uses linear feature calibration to train the network through incremental learning, exploiting the correlation between global and local image features. To verify its effectiveness, we tested the convergence speed of the network under different supervision methods on the ShanghaiTech dataset; the results are shown in Fig. 13.

Fig. 13. Convergence speed of networks under different supervision methods. The abscissa is the training epoch, and the ordinate is the loss value during training. The three training methods use the same backbone network, which is the backbone proposed in this paper.

Here, the "weakly-supervised" training method means that, instead of our linear feature calibration structure, a channel attention fusion approach is used, weighting and fusing the features extracted by Parent-Net and Child-Net. The convergence speed and fitting ability of our proposed training method are clearly better than those of this "weakly-supervised" method. However, compared with the fully-supervised method, the convergence stability of PC-Net during training is poorer. The reason is the uncertainty in the sample labels, which increases the learning difficulty of the model; the model may be affected by noise and learn wrong features, resulting in overfitting or underfitting and unstable convergence. A nonlinear feature correction process could be tried to increase the stability of training.

E. Study of Loss Function

The loss function is very important in network training, and different loss functions greatly affect the regression performance of the model. We therefore design a comprehensive loss function and adjust the relative weight of $L_2$ and $L_C$ with the loss weight $\phi$. To obtain the optimal loss function, experiments were conducted on the ShanghaiTech and UCF_QNRF datasets, and the value of $\phi$ was varied. The results are shown in Fig. 14.

As can be seen, different loss weights affect the performance of the network model: MAE and MSE first decrease and then increase with $\phi$, which justifies the two-part loss function. The optimal MAE and MSE are obtained at $\phi = 0.6$, proving that the comprehensive loss function improves network performance.

    F. Study of Network Parameters

    To analyze the parameter complexity and time complexity of PC-Net, we compared MAE, Params, and inference time on the ShanghaiTech dataset, and the experimental results are shown in Table VI.

As can be seen, the advantage of PC-Net is that its weakly-supervised training reduces training cost, and its MAE and density estimation performance are good; however, its parameter count is slightly larger and its inference time longer. The performance of PC-Net therefore suffers, and density estimation accuracy decreases, when it is deployed on devices with limited computational resources, such as embedded devices. In future work, we will consider a lightweight variant of PC-Net: analyzing the parameter bottleneck layers, finding the parts of the network that consume the most time and computation, and compressing them to ease the training and application of the network.

VI. CONCLUSION AND FUTURE WORK

Fig. 14. MAE and MSE on the ShanghaiTech Part A and UCF_QNRF datasets under different counting loss weights.

TABLE VI COMPARISON OF THE PARAMS, MAE AND RUNNING TIME OF PC-NET AND OTHER METHODS ON THE SHANGHAITECH DATASET

In this paper, an effective weakly-supervised crowd density estimation method is proposed, with a novel training method that achieves an optimal balance between training cost and counting performance. The network consists of a pair of parent-child networks and a linear feature calibration structure. Specifically, the parent network extracts the crowd features, the child network extracts the feature transfer factors and bias weights, and the features are calibrated by the linear feature calibration structure to improve the convergence speed and fitting ability of the network. In addition, a pyramid vision transformer is used as the backbone of PC-Net to address the uneven scales in the crowd, while the spatial correlation and crowd sensitivity of the density map are enhanced by the global-local feature loss and the counting loss.

In future work, we will study a crowd counting and localization method based on PC-Net that not only achieves better person localization and counting accuracy, but also has fewer parameters and is more stable during training.
