
ST-SIGMA: Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

    Yang Fang|Bei Luo|Ting Zhao|Dong He|Bingbing Jiang|Qilie Liu

1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China

2 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China

3 School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

4 School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China

Abstract Scene perception and trajectory forecasting are two fundamental challenges that are crucial to a safe and reliable autonomous driving (AD) system. However, most proposed methods aim at addressing only one of the two challenges with a single model. To tackle this dilemma, this paper proposes spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting (ST-SIGMA), an efficient end-to-end method to jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents within a unified framework. ST-SIGMA adopts a trident encoder-decoder architecture to learn scene semantics and agent interaction information on bird's-eye view (BEV) maps simultaneously. Specifically, an iterative aggregation network is first employed as the scene semantic encoder (SSE) to learn diverse scene information. To preserve the dynamic interactions of traffic agents, ST-SIGMA further exploits a spatio-temporal graph network as the graph interaction encoder. Meanwhile, a simple yet efficient feature fusion method is designed to fuse semantic and interaction features into a unified feature space as the input to a novel hierarchical aggregation decoder for downstream prediction tasks. Extensive experiments on the nuScenes data set demonstrate that the proposed ST-SIGMA achieves significant improvements over state-of-the-art (SOTA) methods in terms of scene perception and trajectory forecasting, respectively. The proposed approach also outperforms SOTA in terms of model generalisation and robustness and is therefore more feasible for deployment in real-world AD scenarios.

KEYWORDS feature fusion, graph interaction, hierarchical aggregation, scene perception, scene semantics, trajectory forecasting

1 | INTRODUCTION

In recent years, autonomous driving (AD) has made great progress, and its practical application value is becoming increasingly prominent. However, there are still open challenges [1, 2] within current AD, for example, scene perception and trajectory forecasting, which are important for perceiving the surrounding environment of the ego-agent and predicting the future trajectories of neighbouring traffic agents given sensory data and past motion states [3]. Specifically, scene perception enables the ego-agent to sense its surroundings and avoid collisions, while trajectory forecasting aims to optimise path planning. Recently, some works [4, 5] attempt to address these two problems within a single framework, which must handle two heterogeneous types of information: scene semantics and interaction relations. However, this task is challenging due to the following two main factors. First, the coexistence of multi-category moving agents (i.e. pedestrians, cars, cyclists etc.) in an AD environment hampers the ability of LiDAR-only approaches to perceive different shapes and of camera-only approaches to predict motion states. Second, the ego-vehicle faces multimodal interactions with surrounding traffic agents, termed spatial interaction. Furthermore, the future motion trends of traffic agents depend largely on their previous motion states, called temporal interaction. Since complex spatio-temporal interactions are intertwined, a single-model network often fails to explicitly model the interactions. Most early methods of scene perception rely on object detection and tracking pipelines but cannot identify objects unseen in the training set, and therefore they cannot accurately perceive unseen traffic agents in scenes.

Based on the above observations, this paper employs a bird's-eye view (BEV) and an occupancy grid map (OGM) to represent the surrounding environment and the traffic agents' motion states. Figure 1 illustrates three perception pipelines with BEV maps for AD. Figure 1a depicts instance-level prediction [6], which focusses on the 2D detection bounding box without motion estimation [7]. Figure 1b illustrates the method proposed by Wu et al. [8], which performs joint scene perception and motion prediction. Figure 1c demonstrates the prediction results of our proposed ST-SIGMA. Compared with (a) and (b), ST-SIGMA can simultaneously perform both instance-level and pixel-level detection as well as dense motion prediction. Benefiting from the fusion of graph interaction features with scene semantic features and the hierarchical aggregation decoder (HAD), our model achieves better perceptual and predictive performance than SOTA methods, as shown in our prediction results.

Most trajectory forecasting methods [9, 10] in AD deal with the forecasting task by breaking it down into three subproblems [11], that is, detection, tracking, and forecasting. For detection, they use advanced 2D [12, 13] or 3D [14] detectors to perceive the surrounding traffic objects [15]. For tracking, 2D or 3D visual object tracking methods [16, 17] fulfil the data association, which is essential for generating motion trajectories of seen traffic agents. For forecasting, based on the past trajectories obtained by Multiple Object Tracking (MOT), a temporal model is built for trajectory forecasting. However, this step-by-step manner suffers from several inherent deficiencies. First, the 'barrel effect' restricts the performance of trajectory forecasting to that of detection and tracking. Second, each step has significant computational complexity, and this detection-association-forecasting ideology inevitably increases time consumption, including training and inference overhead, making detection-tracking-forecasting pipelines difficult to run in real time.

In addition, the input modalities for trajectory forecasting models are diverse. Luo et al. [4] model trajectory forecasting problems by taking LiDAR-only data [18] as the network input. Ivanovic and Pavone [19] use a recurrent sequence model and a variational deep generative model to generate a distribution of future trajectories for each agent [20]. They rely solely on the past motion information of agents to model future motion states, which lacks context information.

FIGURE 1 (a) Shows the predicted results of a general instance-level detector, (b) is the output of MotionNet, and (c) denotes the predicted results of the proposed ST-SIGMA. The left images are the initial bird's-eye view (BEV) representations, the middle ones are the corresponding ground-truth maps, and the right images are the outputs of the three different methods, respectively. The arrows indicate future motion predictions of the foreground grid cells, and the different colours represent different agent categories. The orientation and the length of the arrows represent the direction and distance of the agents' movement. The central area of the BEV map represents the location of the ego-agent

To remedy the above-mentioned issues, this paper proposes a unified scene perception and graph interaction encoding framework, ST-SIGMA, consisting of the scene semantic encoder (SSE), graph interaction encoder (GIE), and HAD. SSE takes multi-sweep LiDAR data in BEV as the input to extract high-level semantic features, and GIE utilises agents' previous state information to encode graph-structured interactive relations between neighbouring traffic agents. Then, both output features are propagated into HAD for pixel-level prediction tasks. Notably, our model captures features of both the scene and the interaction to compensate for the deficiencies in prior work.

In summary, the contributions of this paper are threefold:

· An iterative aggregation network is developed as the SSE, which iteratively aggregates shallow and deep features to preserve as much spatial and semantic information as possible for multi-task dense predictions.

· An attention-based feature fusion method is designed to efficiently fuse the semantic and interaction encoding features to facilitate multimodal feature learning.

· The proposed ST-SIGMA framework can learn scene semantics and graph interaction features in a unified framework for pixel-level scene perception and trajectory forecasting.

The rest of this paper is organised as follows: Related work is discussed in Section 2. Details of the proposed scene perception and trajectory forecasting framework are presented in Section 3. The experimental results and analysis on the nuScenes [21] data set are presented in Section 4. Finally, Section 5 draws conclusions and outlines the future work of this paper.

2 | RELATED WORK

This section revisits some of the key works that have been proposed for scene perception, graph interaction representation, and trajectory prediction, respectively. We also illustrate the similarities and differences between the proposed ST-SIGMA and other works.

2.1 | Scene perception

The canonical scene perception task targets the identification of the location and class of objects. According to the input modality, this task is categorised into 2D object detection, 3D object detection, and multimodal object detection. 2D object detection includes two-stage methods [22], single-stage methods [23], and transformer-based methods [24]. With the increasing adoption of LiDAR in AD, 3D object detection has recently gained increasing attention. The voxel-based method voxelises irregular point clouds into 2D/3D compact grids and then adopts 2D/3D Convolutional Neural Networks (CNNs) [6, 25]. The point-based method leverages permutation-invariant operators to abstract features from raw points [26, 27]. Due to the ambiguity of a single modality, fusion-based object detection is emerging to address this drawback [28, 29]. Specifically, our method follows the pipeline of multimodal object detection, where we leverage LiDAR point clouds and graph interaction data to perform both pixel-level categorisation and instance-level object detection.

2.2 | Graph interaction representation

Graph convolutional networks (GCNs) [30] can model the dependencies between graph nodes and propagate neighbouring information to each node. They are receiving more attention in many vision tasks [31, 32]. ST-GCN [33] constructs a sequence of skeleton graphs for action recognition, where each graph node corresponds to a joint of the human body. Refs. [34, 35] use ST-GCN to learn interaction features from past human trajectories and then design a time-extrapolator CNN for future trajectory generation. Weng et al. [9] propose a graph neural network-based feature interaction mechanism that is applied to a unified MOT and forecasting framework to improve socially aware feature learning. However, none of these trajectory prediction methods fully consider the direct or indirect interactions between traffic participants. Instead, the proposed ST-SIGMA explicitly explores the interaction relationships between multiple traffic agents in AD scenes by leveraging ST-GCN as a GIE.

2.3 | Trajectory forecasting

Significant progress has been made in trajectory forecasting based on different data modalities. Fast and Furious [4] develops a joint detection, tracking, and trajectory forecasting framework by encoding an OGM over multiple LiDAR frames. MotionNet [8] and Fang et al. [36] propose a novel spatio-temporal pyramid network (STPN) to jointly perform pixel-level perception and trajectory prediction. Besides, parallelised tracking and prediction [9] utilises the past motion states of traffic agents in a top-down grid map as the model input, and recurrent neural networks are then applied to extract and aggregate temporal features as the interaction representation for motion prediction. With the development of high-definition map (HDM) data, HDM-based trajectory forecasting methods are receiving more attention. GOHOME [37] exploits graph representations of the HDM with sparse projection to provide a heatmap output that depicts the probability distribution of an ego-agent's future position. THOMAS [38] leverages hierarchical and sparse image generation for multi-agent future heatmap estimation. Unlike these works, this paper utilises the fusion of scene and interaction features for pixel-level trajectory forecasting.

3 | PROPOSED METHOD

We aim to perform simultaneous multi-agent scene perception and trajectory forecasting in the 2D space of BEV. First, the BEV representation of the point cloud is fed to the SSE network for scene semantics encoding. Meanwhile, multi-agent past trajectories are fed to the GIE network for interactive graph encoding as well. Then, the HAD network further fuses the features from SSE and GIE to perform scene perception and trajectory forecasting. The overall ST-SIGMA framework is illustrated in Figure 2. Section 3.1 describes the BEV representation process. Section 3.2 presents details about scene information manipulation and the semantic encoder building process. Section 3.3 introduces motion state data preparation and explains how the graph interaction is modelled given agents' past trajectories. Section 3.4 gives the deep aggregation decoder construction and configuration and elaborates on the rationale of the network design.

3.1 | BEV representation

FIGURE 2 The proposed scene perception and trajectory prediction framework, ST-SIGMA, consists of three essential modules: SSE, GIE, and HAD. Each of them plays its own unique role in scene semantics encoding, graph interaction encoding, and feature fusion; please see the corresponding sections for details. GIE, graph interaction encoder; HAD, hierarchical aggregation decoder; SSE, scene semantic encoder; ST-GCN, spatio-temporal graph convolution network; ST-SIGMA, Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

To make the raw point cloud from the LiDAR sensor structured and compatible with the network input, we need to transfer raw point clouds into BEV maps. Concretely, the origin of the coordinate system for each point cloud sweep changes over time due to the movement of the LiDAR mounted on the ego-agent, which leads to implausible motion estimation. To alleviate this issue, following Ref. [8], we first synchronise all the past point frames to the current coordinate system for coordinate alignment. Then we specify the range of the scene region from the raw 3D point cloud at timestamp $t$, denoted as $M_t \in \mathbb{R}^{L_c \times W_c \times H_c}$, where $L_c$, $W_c$ and $H_c$ denote the length, width, and height, respectively. The origin is located at the position of the ego-agent $P_0 = [x_0, y_0, z_0]$ by specifying $P_0$ as the origin of the synchronised coordinate system. The valid range of the scene $M_t$ w.r.t. $P_0$ is determined by $L_c$, $W_c$, $H_c$ and $H_0$, where $H_0$ is the vertical distance from the LiDAR sensor to the ground. Thereafter, $M_t$ is voxelised with a predefined voxel size $[\delta_l, \delta_w, \delta_h]$ into a discretised grid map $I_t$ with a size of $L_c/\delta_l \times W_c/\delta_w \times H_c/\delta_h$. We simply encode $I_t$ as a binary BEV map. In particular, a voxel occupied by the point cloud is assigned a value of 1; otherwise, it is assigned -1. The binary map $I_t$ is taken as the input to the SSE network for scene semantics encoding.
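A minimal sketch of this binary voxelisation step, assuming a NumPy array of synchronised points and the scene range and voxel size reported in Section 4.2 (function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def points_to_binary_bev(points,
                         x_range=(-32.0, 32.0),
                         y_range=(-32.0, 32.0),
                         z_range=(-3.0, 2.0),
                         voxel_size=(0.25, 0.25, 0.4)):
    """Voxelise one synchronised LiDAR sweep of shape (N, 3) into a binary BEV grid.

    Occupied voxels are set to 1 and empty voxels to -1, following the
    encoding described in Section 3.1.
    """
    lo = np.array([x_range[0], y_range[0], z_range[0]])
    hi = np.array([x_range[1], y_range[1], z_range[1]])
    size = np.array(voxel_size)

    # Keep only the points that fall inside the specified scene range M_t.
    mask = np.all((points >= lo) & (points < hi), axis=1)
    pts = points[mask]

    # Metric coordinates -> integer voxel indices.
    idx = np.floor((pts - lo) / size).astype(np.int64)

    # Grid shape, e.g. (256, 256, 13) for the settings above.
    shape = np.ceil((hi - lo) / size - 1e-6).astype(np.int64)
    idx = np.minimum(idx, shape - 1)

    bev = -np.ones(shape, dtype=np.float32)
    bev[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return bev
```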

3.2 | Scene semantic encoder

In the STPN of the baseline, we observe that the pyramidal connections are linear, and the shallower layers' features are not sufficiently aggregated to mitigate their inherent semantic weaknesses. Instead of STPN, we propose SSE for progressive spatio-temporal feature aggregation inspired by Ref. [39]. Concretely, SSE takes a set of sequential BEV maps $s = \{s_t \mid t = -T, \ldots, -1\} \in \mathbb{R}^{T \times Z \times X \times Y}$ as the input, where $T$ is the number of BEV maps, and $Z$, $X$ and $Y$ denote the number of channels (height), the $X$-axis and the $Y$-axis of each map. The network consists of five spatio-temporal convolution (STC) blocks. The first STC block, STC-0, extends the height channels from $Z$ to $C$ while preserving the map resolution. The remaining four blocks double the number of height channels and reduce the resolution by a factor of two at each stage to model high-level semantic features. The output scene semantic feature maps $S \in \mathbb{R}^{16C \times X/16 \times Y/16}$ are then fused with the graph interaction features $G$ (detailed in Section 3.3) to compose the semantic and interactive aggregations as the final encoding features, which are further propagated into the decoder network with multiple heads. At the same time, to preserve as much spatial information of the low-level features as possible for better resolving the pixel-level information required by fine-grained output tasks, we apply iterative aggregation connections, which are compositional and non-linear, so that earlier aggregated layers pass through more aggregations. The iterative aggregation function $I$ aggregates the shallow and deep features from a series of layers $\{l_1, \ldots, l_n\}$, which is formulated as follows:
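A plausible reconstruction of Equation (1), assuming the standard iterative deep aggregation recursion of Ref. [39] (the node function $A$ and the layer features $l_i$ follow the notation of the surrounding text):

$$I(l_1, l_2, \ldots, l_n) = \begin{cases} l_1, & n = 1 \\ I\big(A(l_1, l_2), l_3, \ldots, l_n\big), & \text{otherwise} \end{cases} \quad (1)$$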

where $A$ is the aggregation node. Our SSE network has four iteration levels and six aggregation nodes. Each node $A$ accepts two input branches: the first branch is from feature maps that share the same resolution as the aggregated features; the second branch is the feature maps down-sampled from the first input branch. For dimension and resolution adaptation, the features of the second branch are fed into a projection block $P$ and an up-sampling block $U$. The projection function $P$ projects the feature-map channels of stages 2-6 to 16, and the up-sampling function $U$ then up-samples the projected feature maps to the same resolution as the features of stage 1 (256 × 256). The up-sampled features are taken as input to the aggregation node $A$ in Equation (1). The SSE network is shown in Figure 3.

3.3 | Graph interaction encoder

Besides scene semantic information, the interaction relationships between agents will influence their future trajectories in the BEV map. In general, the past motion states of an agent directly or indirectly affect its own and its neighbours' future trajectories. Therefore, we extend the GCN to the ST-GCN to encode the graph interaction between agents, as shown in Figure 4.

To emphasise the difference from the original GCN, we first revisit the preliminary knowledge of the graph convolution network (GCN), and the original GCN is formulated as follows:
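A plausible reconstruction of Equation (2), assuming the spatial graph convolution form used by ST-GCN-style models [33, 34], with $B(v_i)$ denoting the neighbour set of node $v_i$:

$$v_i^{(l+1)} = \sigma\!\left( \frac{1}{\Theta} \sum_{v_j \in B(v_i)} \mathbf{P}\big(v_i^{(l)}, v_j^{(l)}\big)\, \mathbf{W}^{(l)} \right) \quad (2)$$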

where $\mathbf{P}$ is the sampling function aggregating the information of the neighbours around $v_i$, and the superscript $(l)$ denotes layer $l$. $\mathbf{W}$ denotes the learnable parameters of the graph model, and $\Theta$, the cardinality of the neighbour set, serves as a normalisation term.

Given the motion state vector $v_i^t$ for agent $i$ at timestamp $t$, it serves as the initial input of the first layer (layer 0) of GIE. Here, a set of sequential motion state vectors $v_i^t$ from timestamp $-T$ to $-1$ composes the input of GIE, denoted as $Q$, where $M$ is the number of agents and $D$ is the feature dimension of each graph node. General graph convolution networks consider only a single-frame graph representation $G_t = (V_t, E_t)$, where $V_t = \{v_i^t \mid i = 1, \ldots, M\}$ is the set of vertices of the graph, and $E_t = \{e_{ij}^t\}$ is the set of edges between the $i$-th and $j$-th vertices. We assume that there is an edge between $v_i^t$ and $v_j^t$ if the $L_2$ distance between them is less than a predefined threshold $d$. To better encode the mutual interaction between neighbouring agents, each edge is weighted by a kernel function $k_{ij}^t$, defined as follows:
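A plausible reconstruction of Equation (3), assuming the inverse-distance kernel that the ablation in Section 4.4.3 ultimately adopts ($k_5$):

$$k_{ij}^{t} = \begin{cases} \dfrac{1}{\lVert v_i^t - v_j^t \rVert_2}, & e_{ij}^t = 1 \\[4pt] 0, & \text{otherwise} \end{cases} \quad (3)$$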

where $\lVert \cdot \rVert$ is the $L_2$ distance in BEV between the $i$-th and $j$-th vertices, which means that $e_{ij}^t = 1$ if $v_i^t$ and $v_j^t$ are connected, and $e_{ij}^t = 0$ otherwise. All $k_{ij}^t$ form the adjacency matrix $A_t$ of graph $G_t$. To ensure a proper GCN, the adjacency matrix $A_t$ is normalised with the identity matrix $I$ and the degree matrix $\Lambda$ as follows:
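A plausible reconstruction of Equation (4), assuming the symmetric normalisation commonly used with GCNs [30, 34] (the arrow denotes in-place normalisation of $A_t$):

$$A_t \leftarrow \Lambda_t^{-\frac{1}{2}} \big(A_t + I\big)\, \Lambda_t^{-\frac{1}{2}} \quad (4)$$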

Similar to Refs. [33, 34], ST-GCNs define a new spatio-temporal graph $G$ consisting of sequential subgraphs $G = \{G_t \mid t = -T, \ldots, -1\}$, where all $G_t$ share the same topology but the vertex attributes $v^t$ vary with the timestamp $t$. Thus, the graph is defined as $G = (V, E)$, in which $V = \{v_i \mid i = 1, \ldots, M\}$ and $E$ is the corresponding edge set. It means that each vertex of $G$ consists of the temporal aggregation of spatial node attributes; in this way, the temporal information is naturally embedded in graph $G$. In addition, the normalised adjacency matrix $A$ is the stack of $\{A_{-T}, \ldots, A_{-1}\}$, in which $A_t$ is calculated as in Equation (4). The vertex representation of layer $l$ at timestamp $t$ is denoted as $V_t^{(l)}$, and $V^{(l)}$ is the temporal stack of $\{V_t^{(l)} \mid t = -T, \ldots, -1\}$. Our final ST-GCN layer is formulated as follows:
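A plausible reconstruction of Equation (5), assuming the standard ST-GCN layer update applied with the normalised adjacency $A$ from Equation (4):

$$V^{(l+1)} = \sigma\big( A \, V^{(l)} \, \mathbf{W}^{(l)} \big) \quad (5)$$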

where $\mathbf{W}^{(l)}$ denotes the learnable parameters at layer $l$. After $N$ layers of graph convolution, the ST-GCN outputs the final graph feature maps $V^{(N)}$, denoted as the graph interaction embedding $G$, which is further fused with the scene semantics encoding $S$ and then taken as input to the HAD network for dense prediction tasks.
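A minimal PyTorch-style sketch of one such spatio-temporal graph convolution layer, assuming node features shaped (batch, D, T, M) and a per-frame normalised adjacency shaped (T, M, M); module and tensor names are illustrative:

```python
import torch
import torch.nn as nn

class STGCNLayer(nn.Module):
    """One graph convolution over a temporal stack of frame graphs."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # A 1x1 convolution plays the role of the learnable weights W^(l).
        self.w = nn.Conv2d(d_in, d_out, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, v, a):
        # v: (B, D_in, T, M) node features, a: (T, M, M) normalised adjacency.
        v = self.w(v)                               # (B, D_out, T, M)
        v = torch.einsum('bctm,tmn->bctn', v, a)    # per-frame spatial message passing
        return self.act(v)

# Example: 20 agents, 5 past frames, 8-dimensional motion states.
v = torch.randn(1, 8, 5, 20)
a = torch.softmax(torch.randn(5, 20, 20), dim=-1)   # stand-in for a normalised adjacency
g = STGCNLayer(8, 32)(v, a)                          # (1, 32, 5, 20)
```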

3.4 | Hierarchical aggregation decoder

Given the scene semantic feature maps $S$ output by SSE and the graph interaction feature maps $G$ output by GIE, the proposed hierarchical aggregation decoder network takes $S$ and $G$ as input for simultaneous multi-layer feature aggregation and heterogeneous feature fusion. Specifically, $S$ is a stack of the multi-scale feature maps $\{s^{(1)}, \ldots, s^{(N)}\}$, where $s^{(n)}$ is the output of the $n$-th aggregation stage with resolution $[X/2^{n-1}, Y/2^{n-1}]$ and $[X, Y]$ is the original size of the BEV map. There are five stages of aggregated feature maps; the fifth-stage features $s^{(5)}$ are directly output by the STC-4 block and are further fused with $G$, while the feature maps of stages one to four, $\{s^{(n)}\}_{n=1}^{4}$, are produced by the iterative aggregation operation defined in Equation (1). As for $G$, the original input of GIE is the $T$-sequential stack of the agents' motion state matrix $Q$. For model suitability, we first transform the dimensional order of $Q$ to $\hat{Q} \in \mathbb{R}^{D \times T \times M}$. Then, the ST-GCN feature maps $V^{(N)}$ are produced by the ST-GCN network, given the graph vertices $\hat{Q}$ and the adjacency matrix $A \in \mathbb{R}^{T \times M \times M}$, and $V^{(N)}$ is fed into a graph residual block, the graph representation aggregator (GRA), composed of a stack of residual convolution operations, to obtain the final graph interaction feature maps $G$ with spatial resolution $X/16 \times Y/16$. Thus, $s^{(5)}$ and $G$ share the same feature resolution and can be seamlessly fused [40] after applying the ST-GCN and GRA networks. We design three fusion approaches for heterogeneous feature fusion, as shown in Figure 5, and conduct an ablation study on their effectiveness in Section 4.4.3. The core of the ST-SIGMA framework lies in the HAD network, shown in Figure 3, which plays a vital role in two aspects of the dense prediction tasks. First, it enables us to neatly fuse multimodal feature maps even with diverse dimensions, resolutions, and channels. Second, the HAD network can iteratively and progressively aggregate features from shallow to deep to learn a deeper, higher, and finer-resolution decoder with fewer parameters and better accuracy.
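A minimal sketch of the three fusion variants compared in Section 4.4.3, assuming $S$ and $G$ have already been brought to the same shape (B, C, H, W); module and parameter names are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse scene semantics S and graph interaction features G."""

    def __init__(self, channels, mode='att'):
        super().__init__()
        self.mode = mode
        if mode == 'concat':
            # (a) channel-wise concatenation followed by a 1x1 reduction.
            self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        elif mode == 'att':
            # (c) attention weights computed from the concatenated features.
            self.att = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

    def forward(self, s, g):
        if self.mode == 'concat':      # (a) concatenation-based fusion
            return self.reduce(torch.cat([s, g], dim=1))
        if self.mode == 'add':         # (b) element-wise addition
            return s + g
        w = self.att(torch.cat([s, g], dim=1))
        return s + w * g               # (c) attention-augmented addition
```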

FIGURE 3 Hierarchical aggregation decoder (HAD) and scene semantic encoder (SSE) architectures. The SSE is responsible for scene semantic feature extraction. The HAD fuses the features from the SSE and the graph interaction encoder (GIE) and then performs dense predictions. STC, spatio-temporal convolution

FIGURE 4 Graph interaction encoder (GIE), which is implemented with the spatio-temporal graph convolution network (ST-GCN). Generally, the ST-GCN is the temporal stack of spatial graph representations. Please refer to Section 3.3 for more details

FIGURE 5 Three feature fusion approaches are used in our method, and C is the concatenation operation. (a) Is the concatenation-based fusion, (b) is the addition-based fusion, and (c) is the hybrid fusion method, which includes concatenation, addition, and attention operations. GIE, graph interaction encoder; SSE, scene semantic encoder

3.5 | Loss function

The proposed HAD network generates the final feature maps $F \in \mathbb{R}^{Z \times X \times Y}$, where $Z$, $X$ and $Y$ are the channel, width, and height of the feature maps, followed by three prediction heads for object detection, pixel-level categorisation, and trajectory forecasting, respectively. The feature maps $F$ are first passed through a bottleneck conv2d layer for feature adaptation. Each task is supervised by one of the following three loss functions.

For the object detection loss, we apply the cross-entropy (CE) loss for box classification, which is defined as follows:
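A plausible reconstruction of Equation (6), assuming the standard binary cross-entropy form over the $N$ candidate boxes:

$$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\big] \quad (6)$$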

where $y_i$ denotes the ground-truth label of the $i$-th sample, $y_i = 1$ means the foreground, and $p_i$ is the probability of belonging to the foreground predicted by the learnt model. For bounding box regression, we employ a linear combination of the $\ell_1$-norm loss and the generalised IoU loss $L_{GIoU}$ [41]. The final regression loss can be formulated as follows:
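A plausible reconstruction of Equation (7), combining the two terms exactly as described below:

$$L_{reg} = \sum_{i} \mathbf{1}_{\{y_i = 1\}} \big[\, \lambda_{IoU} \, L_{GIoU}(b_i, b) + \lambda_{1} \, \lVert b_i - b \rVert_1 \,\big] \quad (7)$$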

where $\mathbf{1}_{\{y_i = 1\}}$ is an indicator function for the positive samples, $b_i$ is the $i$-th predicted bounding box, and $b$ is the ground-truth bounding box. $\lambda_{IoU}$ and $\lambda_{1}$ are the regularisation parameters.

For the pixel-level categorisation loss, we employ the focal loss (FL) [42] to handle the class imbalance issue, which is defined as follows:
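A plausible reconstruction of Equation (8), assuming the standard focal loss with focusing parameter $\gamma$ and balancing weight $\alpha$:

$$L_{flc} = -\frac{1}{N} \sum_{i=1}^{N} \alpha \, (1 - p_i^t)^{\gamma} \log p_i^t \quad (8)$$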

where $p_i^t = p_i$ if $y_i = 1$, and $p_i^t = 1 - p_i$ otherwise; $p_i$ is the predicted probability that the $i$-th pixel belongs to the foreground category, and $y_i$ is the ground-truth category label; interested readers can refer to the related literature [42] for more details.

For the trajectory forecasting loss, following the analysis in Ref. [45], we employ a weighted smooth $\ell_1$ loss function $L_{tf}$ for trajectory forecasting, where the weight setting follows that of the categorisation loss. However, this loss only guarantees the global normalisation of training and cannot guarantee local spatio-temporal consistency. Therefore, we additionally adopt a spatio-temporal consistency loss to augment spatio-temporal consistency learning, which is defined as follows:
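A plausible reconstruction of Equation (9), assuming a consistency term in which displacements inside each instance box are pulled together (spatial term) and the average instance motion is kept smooth across adjacent timestamps (temporal term); $x_{i,j}^{t}$ denotes the predicted displacement at cell $(i, j)$ and $\bar{x}_{b_k}^{t}$ the average displacement of instance $b_k$ (both symbols are assumed):

$$L_{stc} = \beta_{sc} \sum_{k} \sum_{(i,j),\,(i',j') \in b_k} \big\lVert x_{i,j}^{t} - x_{i',j'}^{t} \big\rVert + \beta_{tc} \sum_{k} \big\lVert \bar{x}_{b_k}^{t} - \bar{x}_{b_k}^{t - \Delta t} \big\rVert \quad (9)$$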

In Equation (9), $\lVert \cdot \rVert$ is the smooth $\ell_1$ loss, $b_k$ is the object instance with index $k$, and the two displacement terms (each in $\mathbb{R}^2$) are the predicted displacement values at positions $(i, j)$ and $(i', j')$ at time $t$, respectively. It assumes that the motion states of all pixels within an instance box should be very close to each other without much jitter, referred to as spatial consistency. Similar to spatial consistency, the predicted motion state of each agent, denoted as the average movement of all its included pixels, should be smooth without large displacement changes during a short time duration $\Delta t$, where $K$ is the number of cells, and $\beta_{tc}$ and $\beta_{sc}$ represent the weight parameters of the temporal- and spatial-consistency losses, respectively.

The overall loss function of the ST-SIGMA model is the weighted sum of the above multi-task losses, which is defined as follows:
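A plausible reconstruction of Equation (10), assuming a weighted sum of the four task losses named below:

$$L = \pi_1 L_{cls} + \pi_2 L_{reg} + \pi_3 L_{flc} + \pi_4 L_{stc} \quad (10)$$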

where $L_{cls}$ and $L_{reg}$ are computed as the instance detection loss, $L_{flc}$ and $L_{stc}$ are the pixel-level categorisation and trajectory forecasting losses, and $\pi_i$ are the trade-off parameters for balancing the multi-task learning.

4 | EXPERIMENTS AND ANALYSIS

In this section, we evaluate the performance of the proposed ST-SIGMA method on the nuScenes data set. First, we give an introduction to the data set. Second, the implementation details and the evaluation criteria are presented. Then, we give the details of the experimental analysis and compare the proposed method with existing SOTA methods. Finally, we demonstrate the effectiveness and efficiency of each module through comprehensive ablation studies.

4.1 | Training and test data set

We use the nuScenes data set for all our experiments and evaluation. The nuScenes data set provides a full suite of sensor data for autonomous vehicles, including six cameras, one LiDAR, five millimetre-wave radars, as well as a Global Positioning System and an Inertial Measurement Unit. It contains 1000 scenes with annotated samples, each of which is sampled at 20 Hz and covers various driving scenarios. Because nuScenes only provides 3D bounding boxes for point cloud object detection and does not provide motion or trajectory information, we obtain the motion states between two adjacent frames by calculating the displacement vectors of the corresponding point clouds within the labelled bounding boxes based on their X-Y coordinate values and their displacement relative to the box centre positions. For point clouds outside the bounding boxes, such as the background, road, and roadside buildings, the movement values are set to zero. At the same time, we crop the point clouds and set the range on the x-axis to 32 m for the positive and negative directions, respectively, and the same range on the y-axis. On the z-axis, taking into account that the LiDAR sensor is mounted on top of the vehicle, the negative direction is set to 3 m and the positive direction to 2 m.

4.2 | Implementation details

The scene range $M_t$ is set as $[-32, 32] \times [-32, 32] \times [-3, 2]$ m³, and $M_t$ is then discretised with the voxel size $[0.25, 0.25, 0.4]$ into a grid map $I_t$ of size $[256, 256, 13]$. We use five temporal frames of synchronised point clouds as the SSE network input, with tensor size $5 \times 13 \times 256 \times 256$. We define five categories for instance-level classification and pixel-level categorisation prediction, that is, vehicle, pedestrian, bicycle, background, and others. The 'others' category includes all the remaining objects in nuScenes to handle possible unseen objects beyond the data used in our paper. For the GIE network, we use an 8-dimensional motion vector for each traffic agent, in which each quantity contains x-axis and y-axis components. For the spatio-temporal graph $G$, we construct the network input with size $5 \times 8 \times 20 \times 20$, generated from the same five temporal frames as the input of SSE. The SSE encoder outputs the scene semantic features $S$ and GIE outputs the graph interaction features $G$, both of which share the same size $32 \times 16 \times 16$. Then, we apply three feature fusion approaches to fuse them, that is, channel-wise concatenation, channel-wise addition, and attention-augmented addition, and we further verify their effectiveness in the ablation studies. The visualisation of the results predicted by ST-SIGMA is shown in Figure 6.

4.3 | Evaluation criterion

For trajectory forecasting, we calculate the relative displacements of the corresponding point clouds in adjacent frames. Meanwhile, all grid cells within the BEV map are classified into three groups according to their moving speed, that is, static, slow (speed ≤ 5 m/s), and fast (speed > 5 m/s). In each group, the average $L_2$-norm distance between the estimated displacement and the ground-truth displacement, that is, the average displacement error (ADE), is calculated as in Equation (11):
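A plausible reconstruction of Equation (11), with $\hat{y}_n^t$ and $y_n^t$ denoting the predicted and ground-truth trajectories referred to below (the symbols are assumed):

$$ADE = \frac{1}{N \, T_{future}} \sum_{n=1}^{N} \sum_{t=1}^{T_{future}} \big\lVert \hat{y}_n^t - y_n^t \big\rVert_2 \quad (11)$$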

In Equation (11), $N$ represents the number of traffic agents, and the predicted and true trajectories of the $n$-th traffic agent at timestamp $t$ are compared, where $t = 1, \ldots, T_{future}$. Equation (12) computes the Overall cell category Accuracy (OA) of all cells, formulated as follows:
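A plausible reconstruction of Equation (12), given the definitions of CC and AC below:

$$OA = \frac{CC}{AC} \quad (12)$$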

where Correctly Classified cells (CC) represents the number of correctly classified cells, and AC represents the total number of cells. Equation (13) calculates the Mean Category Accuracy (MCA), which indicates the average category accuracy:
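A plausible reconstruction of Equation (13), averaging the per-category accuracies listed below:

$$MCA = \frac{1}{5} \big[\, CA(Bg) + CA(Vehicle) + CA(Ped) + CA(Bike) + CA(Others) \,\big] \quad (13)$$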

where CA(Bg) represents the classification accuracy of the background, CA(Vehicle) that of vehicles, CA(Ped) that of pedestrians, CA(Bike) that of bicycles, and CA(Others) that of other unseen traffic objects.

4.4 | Experimental analysis

FIGURE 6 The visualisation of the results predicted by each head, that is, the predicted pixel categorisation (using different colours to represent different categories), bounding boxes, and the trajectories of traffic agents

We evaluate the performance of our proposed ST-SIGMA method and compare it with other SOTA perception and trajectory forecasting methods. Specifically, for pixel-level categorisation and trajectory forecasting, we compare with the following methods: the static model, which only considers the static environment; FlowNet3D [43], HPLFlowNet [44], Neural scene flow prior [46] and recurrent closest point [47], which estimate the scene flow between adjacent point cloud frames with a linear dynamics assumption; PointRCNN [27], which combines the PointRCNN detector and a Kalman filter for bounding box prediction and trajectory forecasting of objects in the BEV representation; LSTM-Encoder-Decoder [45], which estimates the multi-frame OGMs by using the same prediction head as ST-SIGMA while preserving its backbone network; and MotionNet [8], which adopts LiDAR point clouds and the STPN framework for scene perception and motion prediction. It is noteworthy that our ST-SIGMA is inspired by the baseline [8], but with three novel contributions. First, we discard the STPN backbone and replace it with the iterative aggregation network, which can better propagate low-level features to high-level stages for multi-scale feature aggregation. Second, we employ a GIE to further learn the interactive relations between traffic agents. Third, we introduce an additional instance-box prediction head for instance-level object detection, which can directly boost pixel categorisation performance by capturing higher-level semantic information from traffic scenes. For evaluating the performance of the bounding box prediction, we compare our method with SOTA detectors, including PointPillars [6], PointPainting [29], shape signature networks [48], Class-balanced grouping and sampling [49], and CenterPoint [14]. Moreover, we further compare the complexity of the proposed ST-SIGMA with different scene perception and trajectory forecasting methods.

4.4.1 | Quantitative analysis

To better evaluate the performance of the proposed method in different traffic scenarios, we first divide all the grid cells into three groups by moving speed: static (velocity = 0 m/s), slow (velocity ≤ 5 m/s), and fast (velocity > 5 m/s). As shown in Tables 1-5, ST-SIGMA-Baseline adopts the STPN backbone network proposed by the baseline method to extract scene semantic features from the BEV representation and is additionally equipped with GIE. ST-SIGMA employs the iterative aggregation network instead of STPN, and includes SSE, GIE, HAD, and the attention-based fusion.

From Table 1, we can see that for static cells, the static model achieves the best trajectory forecasting performance, but its ADE increases with the moving speed of the grid cells, which means the static model only performs well under extreme conditions and is not suitable for real-world, dynamic AD scenes. A similar phenomenon happens to FlowNet3D [43], HPLFlowNet [44], and PointRCNN [27]; all of them achieve good prediction results for static cells, but their performance drops dramatically for moving objects. Unlike the above methods, MotionNet [8] has fairly stable performance and shows excellent results for slowly moving objects; it outperforms all other methods, including ours, when the moving velocity is ≤ 5 m/s. Instead, our proposed ST-SIGMA method yields the best performance for fast-moving objects. Specifically, it outperforms MotionNet by 0.0217 in terms of ADE and 0.0637 in terms of median displacement error, respectively. This superiority is attributed to the graph interaction features, which can effectively model dynamic interactive relations, especially for fast-moving traffic agents with velocity > 5 m/s. In addition, we can draw the conclusion that each component (SSE, HAD, and the feature fusion approaches) plays a positive role in performance improvement. Notably, ST-SIGMA with the attention-based fusion approach achieves the best trajectory forecasting performance, surpassing the baseline by 2.3%; please refer to Tables 1 and 5 for more details.

For evaluating pixel categorisation performance, we apply two types of loss functions: CE and FL. As shown in Table 2, ST-SIGMA+ALL+CE denotes ST-SIGMA with the CE loss, and ST-SIGMA+ALL+FL denotes ST-SIGMA with the FL function. We compare the proposed method with PointRCNN, LSTM-Encoder-Decoder, and MotionNet. Table 2 demonstrates that across all five categories, the proposed ST-SIGMA achieves the highest accuracy for four foreground categories, that is, car, pedestrian, bike, and others. We consider MotionNet as the baseline method; with CE, ST-SIGMA increases categorisation accuracy by 1.5% compared to the baseline. When applying FL, it further improves over the baseline by 1.8%. For the pedestrian category with a small sample size, our method achieves a significant performance improvement. Specifically, it increases the categorisation accuracy from 77.2 to 83.8, which outperforms the baseline method by 6.6%. For the 'others' foreground category, whose categorical information is not specified in this paper, ST-SIGMA still achieves much better accuracy than the baseline, with more than a 10% performance improvement. The MCA of ST-SIGMA is 75.5, which is 4.4% higher than that of the baseline. Notably, there is no obvious performance improvement for the background category or the overall cell categorisation. Our analysis is that for most point cloud frames, the point clouds belonging to the background category far outnumber the foreground point clouds. Once we use multi-frame BEV maps to obtain temporally aggregated scene features, these aggregated features can facilitate the detection of foreground objects with a small number of points, but for the background category with a large number of points, the iterative aggregation network may cause over-fitting issues due to over-aggregation of background features. We will address this issue in a future study.

TABLE 1 Comparison of trajectory forecasting performance between our method and some state-of-the-art methods

    TABLE 2 The comparison of pixel‐level categorisation performance

    TABLE 3 The complexity comparison of the proposed ST‐SIGMA and some state‐of‐the‐art scene perception and trajectory forecasting methods

Besides the dense pixel-wise predictions, we additionally add instance-level detection for object bounding box prediction, as shown in Table 4. We compare our performance with the most commonly used 3D object detectors. Since our detector is based on the aggregated BEV maps and only performs 2D detection, the performance of ST-SIGMA is far inferior to most SOTA 3D detectors, such as CenterPoint. However, employing this detection function helps to disambiguate the pixel categorisation process by providing both a spatial consistency constraint and additional semantic information. To further evaluate the efficiency of the proposed method, we compare the complexity of ST-SIGMA with the following methods: PointPillars [6], PointRCNN [27], FlowNet3D [43] and MotionNet [8]. Table 3 shows that the complexity of the proposed ST-SIGMA is about 10% higher than that of the baseline method. We attribute the complexity growth to the use of multimodal data input and the iterative network architecture.

4.4.2 | Qualitative analysis

To qualitatively evaluate the proposed ST-SIGMA method, we visualise the predicted results of pixel categorisation and trajectory forecasting in Figures 7 and 8 and compare them with the corresponding ground-truth maps and the predicted results of the baseline method. For the convenience of the qualitative analysis, we first give a detailed description of the elements and attributes in the following figures.

Specifically, in Figure 7a, there are five different colours of point clouds; different colours denote the different categories of traffic agents, for example, blue points represent the background, purple points denote vehicles (cars or trucks), black points denote pedestrians, green points denote bicycles, and red points denote other foreground categories. In Figure 7b, besides the point clouds, there are five different colours of arrows, where each colour represents the same category information as the points, and the arrows represent the future trajectory of each point. Concretely, the length of the arrow indicates the moving distance, and the direction of the arrow indicates the moving orientation. Figure 7a shows the comparison between the ground-truth maps, the predicted results of our ST-SIGMA, and the baseline method. In the pixel categorisation results, fewer pixels are misclassified by ST-SIGMA than by the baseline, as shown in the regions enclosed by the circles. This demonstrates that the categorisation performance of the proposed method is better than that of the baseline. As for the trajectory forecasting results in Figure 7b, it is clear that the points at the bottom of the map are misclassified by the baseline, whereas ST-SIGMA gives robust categorisation results thanks to the additional instance detection function. More visualisations of trajectory forecasting results are shown in Figure 8, where each row represents the comparison between the ground truth, the predicted results of ST-SIGMA, and the baseline in a traffic scene. In the first row, the points belonging to the 'others' category at the top of the map are misclassified as vehicles by the baseline. Instead, the proposed ST-SIGMA can accurately categorise them. However, in the second row, for traffic agents that are too close to or too far from the ego-vehicle, our method occasionally produces false detections, which indicates that our method has lower robustness than the baseline method in these traffic scenarios.

    TABLE 4 The comparison of object detection performance

FIGURE 7 The pixel-level categorisation prediction results (first row) and the trajectory forecasting results (second row), from left to right: the ground-truth category map, the predicted results of ST-SIGMA, and the predicted results of the baseline method. ST-SIGMA, Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

    TABLE 5 Ablation study about the effect of each component on the ST‐SIGMA framework

4.4.3 | Ablation study

In this section, we examine the validity and effectiveness of each proposed component through comprehensive ablation studies. First, we focus on the proposed ST-SIGMA framework, which consists of four key components: SSE, GIE, HAD, and the feature fusion approaches (Con, Add, and Att). Table 5 shows that ST-SIGMA-Baseline adopts the STPN backbone network from the baseline to model BEV maps and is additionally equipped with GIE. In Table 5, the second row replaces STPN with the iterative aggregation network for scene semantics encoding and leaves the rest unchanged. The third row further applies the HAD network for hierarchical multimodal feature aggregation. The fourth to sixth rows apply the Concatenation-based, element-wise Addition-based, and Attention-based fusion approaches, respectively. As shown in Figure 5, (a) is the Concatenation fusion (Con), (b) is the element-wise Addition fusion (Add), and (c) is the Attention-based fusion (Att), respectively. We can also draw the conclusion that each component plays a positive role in the performance improvement of ST-SIGMA. If we take MotionNet as the baseline, we can see that simply adding interaction encoding alone does not improve the trajectory forecasting accuracy. ST-SIGMA with the attention-based fusion approach obtains the best performance, surpassing MotionNet by 2.3%.

FIGURE 8 The pixel-level categorisation prediction of three selected scenes, from left to right: the ground-truth category map, the predicted results of ST-SIGMA, and the predicted results of the baseline. ST-SIGMA, Spatio-temporal semantics and interaction graph aggregation for multi-agent perception and trajectory forecasting

Furthermore, we analyse the effect of different weighted adjacency matrix kernel functions, which define the mutual influence between vertices and provide prior knowledge of the social relations among traffic agents. Since the graph interactions have a large impact on the future trajectories of agents, we specifically analyse the influence of the different kernel functions $k_1$, $k_2$, $k_3$, $k_4$ and $k_5$ in Equation (14) on the trajectory forecasting performance (ADE) of multiple traffic agents; the results are shown in Table 6.
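A plausible reconstruction of Equation (14), assuming the five kernel forms described below ($\epsilon$ is the residual term of $k_3$, and $\sigma$ is an assumed bandwidth of the Gaussian radial basis function $k_4$):

$$k_1 = 1, \quad k_2 = \lVert v_i^t - v_j^t \rVert_2, \quad k_3 = \frac{1}{\lVert v_i^t - v_j^t \rVert_2 + \epsilon}, \quad k_4 = \exp\!\left( -\frac{\lVert v_i^t - v_j^t \rVert_2^2}{\sigma^2} \right), \quad k_5 = \begin{cases} \dfrac{1}{\lVert v_i^t - v_j^t \rVert_2}, & v_i^t \neq v_j^t \\[4pt] 0, & \text{otherwise} \end{cases} \quad (14)$$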

In Equation (14), we set $k_1 = 1$ as the baseline in the weighted adjacency matrix. The second kernel function $k_2$ is defined as the Euclidean distance ($L_2$ norm) between traffic agents to simulate their influence on each other. The third kernel $k_3$ is defined as the inverse of the $L_2$ norm with a residual parameter $\epsilon$ added to the denominator, making sure the denominator is not equal to 0. The fourth kernel function $k_4$ is calculated using the Gaussian radial basis function [50]. The fifth kernel $k_5$ is also defined as the inverse of the $L_2$ norm, but unlike $k_3$, we set $k_5 = 0$ in the case of $v_i^t = v_j^t$, where $v_i^t$ and $v_j^t$ represent the $i$-th and $j$-th agents at timestamp $t$. It indicates that two traffic agents are considered to be the same object when they are at the same location. The performances of these kernel functions are shown in Table 6. Through the ablation experiments, we can see that the best performance is produced by $k_5$, since the future motion of traffic agents is more sensitive to the influence of other similar objects physically close to them. Therefore, this study computes similarity measurements for traffic agents to address this issue; without this condition, the relationship between traffic agents in the model cannot be correctly represented. Accordingly, this paper uses $k_5$ to define the adjacency matrix in all experiments.

    TABLE 6 Ablation analysis about the influence of different kernel functions on trajectory forecasting performance

In addition, we also try to find the optimal number of input BEV frames for trajectory forecasting performance. To achieve this goal, we draw the curves in Figure 9 to show the relationship between the number of input frames and the average displacement errors of trajectory forecasting. It is observed that when the number of input frames increases from 1 to 5, all ADEs consistently decrease. However, the time and space complexity increase significantly with the number of input frames, so we need to make a trade-off between the number of input frames and the model performance. As shown in Figure 9, when the number of frames exceeds 5, the model performance gradually saturates and even decreases for fast-moving traffic agents. Hence, we set the number of input frames to 5 for all experiments.

5 | CONCLUSIONS

This paper proposes ST-SIGMA, a unified framework for simultaneous scene perception and trajectory forecasting. The proposed method can jointly and accurately perceive the AD environment and forecast the trajectories of the surrounding traffic agents, which is crucial for real-world AD systems. Experimental results show that the proposed ST-SIGMA framework outperforms the SOTA method with a 4.4% higher MCA for pixel categorisation and a 2.3% lower ADE for trajectory forecasting. In future work, we intend to utilise multimodal data fusion (LiDAR, camera, HD-Map) and adopt traffic rules to further improve the performance of scene perception and trajectory forecasting while maintaining an acceptable complexity cost.

FIGURE 9 Ablation analysis of the effect of the number of input frames on the trajectory forecasting performance. Considering the accuracy-efficiency trade-off, we chose 5 as the optimal number of input frames for all experimental settings. ADE, average displacement error

    ACKNOWLEDGEMENTS

This work was supported in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJQN202100634 and No. KJZD-K201900605), the National Natural Science Foundation of China (No. 62006065), and the Basic and Advanced Research Projects of CSTC (No. cstc2019jcyj-zdxmX0008).

    CONFLICT OF INTEREST

The authors declare that they have no conflict of interest regarding this work.

    DATA AVAILABILITY STATEMENT

    Research data are not shared.

    ORCID

Yang Fang https://orcid.org/0000-0001-6705-4757
