
    A case study of 3D RTM-TTI algorithm on multicore and many-core platforms①

    High Technology Letters, 2017, Issue 2

    Zhang Xiuxia (張秀霞)②***, Tan Guangming*, Chen Mingyu*, Yao Erlin*

    (*State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P.R.China) (**University of Chinese Academy of Sciences, Beijing 100049, P.R.China)

    3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This problem is addressed by providing parallel solutions for 3D RTM-TTI on multicore and many-core platforms. After data-parallel and memory optimization, the hot spot function of 3D RTM-TTI gains a 35.99X speedup on two Intel Xeon CPUs, an 89.75X speedup on one Intel Xeon Phi, and an 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is stencil-based, the approaches also benefit a wide range of stencil applications.

    3D RTM-TTI, Intel Xeon Phi, NVIDIA K20 GPU, stencil computing, many-core, multicore, seismic imaging

    0 Introduction

    3D reverse time migration in tilted transverse isotropy (3D RTM-TTI) is the most precise model used in complex seismic imaging, which remains challenging due to technology complexity, stability, computational cost and the difficulty of estimating anisotropic parameters for TTI media[1,2]. The reverse time migration (RTM) model was first introduced in 1983 by Baysal[3]. The 3D RTM-TTI model is more recent[1,2,4] and is much more precise and intricate in complex seismic imaging. Normally, RTM-TTI needs thousands of iterations to obtain image data at a given precision. On our practical medium-scale dataset, it takes around 606 minutes to iterate 1024 times with five processes on Intel Xeon processors. It will cost even more when dealing with larger datasets or iterating more times to obtain more accurate results in future experiments. This enormous computing time prevents 3D RTM-TTI from being widely used in industry.

    The limitations of current VLSI technology, resulting in the memory wall, power wall and ILP wall, and the desire to transform the ever-increasing number of transistors dictated by Moore's Law into faster computers, have led most hardware manufacturers to design multicore processors and specialized hardware accelerators. In the last few years, specialized hardware accelerators such as the Cell B.E.[5] and general-purpose graphics processing units (GPGPUs)[6] have attracted the interest of developers of scientific computing libraries. More recently, the Intel Xeon Phi[7] has also emerged in the Graph500 rankings. These accelerators feature high energy efficiency and a high performance-price ratio. Our work addresses the enormous computing time of 3D RTM-TTI by utilizing them.

    The core computation of the RTM model is a combination of three basic stencil calculations: the x-stencil, y-stencil and z-stencil, as explained later. Although existing stencil optimization methods could be adopted on GPU and CPU, it is more compelling than ever to design a more efficient parallel RTM-TTI by considering the relationships among these stencils. Besides, there is little performance optimization research on the Intel Xeon Phi, so fundamental work on the Xeon Phi is needed to identify the similarities and differences among the three platforms.

    In this paper, the implementation and optimization of the 3D RTM-TTI algorithm on CPUs, the Intel Xeon Phi and the GPU are presented, considering both architectural features and algorithm characteristics. By taking the algorithm characteristics into account, a task partitioning method with low data coupling is designed. Considering architectural features, a series of optimization methods is adopted, explicitly or implicitly, to reduce high-latency memory accesses and the number of memory accesses. On the CPU and Xeon Phi, we start from parallelization with multi-threading and vectorization; kernel memory access is then optimized by cache blocking, huge pages and loop splitting. On the GPU, considering the GPU memory hierarchy, a new 1-pass algorithm is devised to reduce computation and global memory access. The main contributions of this paper can be summarized as follows:

    1. The complex 3D RTM-TTI algorithm is systematically implemented and evaluated on three different platforms: CPU, GPU and Xeon Phi. To the best of our knowledge, this is the first time 3D RTM-TTI has been implemented and evaluated on all three platforms at the same time.

    2. With deliberate optimizations, 3D RTM-TTI obtains a considerable speedup, which makes it practical in industry.

    3. The optimization methods are evaluated quantitatively, which may guide other developers and offers software-level insight into the architectures. By analyzing the process of designing the parallel codes, some general guidance on writing and optimizing parallel programs on the Xeon Phi, GPUs and CPUs is given.

    The rest of the paper is organized as follows: An overview of the algorithm and platforms is given in Section 1. Sections 2 and 3 highlight the optimization strategies used in the experiments on the CPU and Xeon Phi, and on the GPU, respectively. In Section 4, the experimental results and their analysis are presented. Related work is discussed in Section 5. Finally, Section 6 concludes the paper.

    1 Background

    To make this paper self-contained, a brief introduction to the 3D RTM-TTI algorithm is given first; then the architectures and programming models of the Intel MIC and the NVIDIA K20 GPU are described.

    1.1 Sequential algorithm

    The RTM model is a reverse engineering process. The main technique in seismic imaging is to generate acoustic waves and record the earth's response at some distance from the source. RTM models the propagation of waves in the earth with the two-way wave equation, once from the source and once from the receiver. The acoustic isotropic wave can be written as partial differential equations[8]. Fig.1 shows the overall 3D RTM-TTI algorithm, which is composed of a shot loop, a nested iteration loop and a nested grid loop. Inside an iteration, it computes the forward and backward propagation wave fields, boundary processing and cross-correlation. Timing profiles show that most of the computation time of the 3D RTM-TTI algorithm is spent in the wave field updating step. Fig.2 shows the main wave updating operations within RTM after discretization of the partial differential equations. The wave updating function is composed of derivative computations; like most finite-difference computations, they belong to stencil computing. Three base stencils are combined to form the xy, yz and xz stencils, as Fig.3 shows. Each cell in the wave field needs a 9×9×9 cube to be updated, as Fig.4 shows. All three stencils have overlapping memory accesses.
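    The x-, y- and z-stencils differ only in the direction they step through memory. A minimal C++ sketch below shows how one 9-point kernel can serve all three axes by changing the stride; the coefficients are the standard 8th-order central-difference ones, used here for illustration since the paper does not list its own, and the name `deriv9` is hypothetical.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Standard 8th-order central-difference coefficients (illustrative;
    // the paper's actual coefficients are not given).
    static const double C[4] = {4.0/5.0, -1.0/5.0, 4.0/105.0, -1.0/280.0};

    // 9-point first derivative along one axis. The same kernel acts as the
    // x-, y- or z-stencil by changing `stride`: 1 for x, nx for y, and
    // nx*ny for z in a row-major nx*ny*nz grid.
    double deriv9(const std::vector<double>& f, std::size_t i,
                  std::size_t stride, double h) {
        double d = 0.0;
        for (int k = 1; k <= 4; ++k)
            d += C[k-1] * (f[i + k*stride] - f[i - k*stride]);
        return d / h;
    }
    ```

    Applying such a kernel along two different axes in sequence yields the cross-derivatives (dxy, dyz, dxz); the 9×9×9 footprint of Fig.4 is the union of the three 9-point axes.
    
    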

    1.2 Architecture of Xeon Phi

    Xeon Phi (also called MIC)[7] is a brand name given to a family of Intel many-core architectures. Knights Corner is the codename of Intel's second-generation many-core architecture, which comprises up to 61 processor cores connected by a high-performance on-die bidirectional interconnect. Each core supports four hardware threads; each thread replicates some of the architectural state, including registers, which makes switching between hardware threads very fast. In addition to the IA cores, there are 8 memory controllers supporting up to 16 GDDR5 channels delivering up to 5.5GT/s. Each MIC core has two in-order pipelines, one scalar and one vector, and 32 registers of 512-bit width. Programs on the Phi can run both natively, like on a CPU, and in offload mode, like on a GPU.

    1.3 Kepler GPU architecture

    An NVIDIA GPU[6] is presented as a set of multiprocessors, each equipped with its own CUDA cores and shared memory (a user-managed cache). Kepler is the codename of the GPU microarchitecture developed by NVIDIA as the successor to Fermi. It has 13 to 15 SMX units; for the K20, the number of SMX units is 13. All multiprocessors have access to global device memory. Memory latency is hidden by executing thousands of threads concurrently. Register and shared memory resources are partitioned among the currently executing threads, so context switching between threads is free.

    Fig.3 One wave field point updating

    Fig.4 Stencil in a cubic

    2 Implementation and optimization on Intel Xeon Phi and CPU

    Optimizing RTM on the Intel Xeon Phi and the CPU is similar due to their similar programming models, so the optimization methods for these two platforms are presented together in detail in this section.

    2.1 Parallelization

    2.1.1 Multi-threading

    The Intel Threading Building Blocks (TBB) thread library is used to parallelize the 3D RTM-TTI code on the CPU and Xeon Phi. Since the grid size is much larger than the number of threads, the task is partitioned into 3D sub-cubes. Fig.5 demonstrates the TBB template for 3D task partitioning, with task size (bx, by, bz). On the CPU and Xeon Phi platforms, each thread computes the derivatives in its sub-cube. An automatic tuning technique is used to search for the best number of threads. For the RTM application, the optimal number of threads on the Xeon Phi is 120; the best thread count on an Intel Xeon CPU NUMA node is 12.
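    The 3D sub-cube partition can be sketched as follows. This plain C++ enumeration (with illustrative `Block`/`partition3d` names, not the paper's code) stands in for the paper's TBB parallel loop over a 3D range: each task covers one (bx, by, bz) sub-cube, and boundary tasks are clipped to the grid.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // One task: a (bx,by,bz) sub-cube, clipped at the grid boundary.
    struct Block { std::size_t x0, y0, z0, x1, y1, z1; };

    std::vector<Block> partition3d(std::size_t nx, std::size_t ny, std::size_t nz,
                                   std::size_t bx, std::size_t by, std::size_t bz) {
        std::vector<Block> tasks;
        for (std::size_t z = 0; z < nz; z += bz)
            for (std::size_t y = 0; y < ny; y += by)
                for (std::size_t x = 0; x < nx; x += bx)
                    tasks.push_back({x, y, z,
                                     std::min(x + bx, nx),
                                     std::min(y + by, ny),
                                     std::min(z + bz, nz)});
        return tasks;  // each task would be handed to one worker thread
    }
    ```

    Because TBB is task-based, there can be many more tasks than threads, which is what later makes cache blocking easy: the task size doubles as the cache tile size.
    
    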

    2.1.2 Instruction level parallel: SIMDization

    One of the most remarkable features of the Xeon Phi is its vector computing unit. The vector length is 512 bits, twice the CPU's 256-bit AVX vectors. One Xeon Phi vector instruction can compute 512/8/4 = 16 single-precision floats at once. Vector instructions are used by unrolling the innermost loop and applying the #pragma simd directive.
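    A small sketch of the lane arithmetic and of an innermost loop shaped for vectorization; `#pragma omp simd` is used here as a portable stand-in for the Intel-specific `#pragma simd`, and `axpy` is an illustrative kernel, not the paper's wave update.

    ```cpp
    #include <cassert>
    #include <cstddef>

    // A 512-bit vector register holds 512 bits / 8 bits-per-byte
    // / 4 bytes-per-float = 16 single-precision lanes.
    constexpr int kVectorBits = 512;
    constexpr int kLanes = kVectorBits / 8 / int(sizeof(float));  // 16

    // Illustrative innermost loop with no cross-iteration dependences,
    // so the compiler is free to vectorize it.
    void axpy(float a, const float* x, float* y, int n) {
        #pragma omp simd
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }
    ```
    
    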

    2.2 Memory optimization

    2.2.1 Cache blocking

    Cache blocking is a standard technique for improving cache reuse, because it reduces the memory bandwidth requirement of an algorithm. The dataset on a single computing node in our application is 4.6GB, whereas the cache size of the CPU and Xeon Phi processors is limited to a few MBs. The fact that higher performance can be achieved for smaller datasets that fit into cache memory suggests a divide-and-conquer strategy for larger problems. Cache blocking is an effective way to improve locality: it increases spatial locality, i.e. referencing nearby memory addresses consecutively, and reduces the effective memory access time by keeping blocks of future array references in the cache for reuse. Since the total data used is far beyond cache capacity and the memory access is non-contiguous, cache misses are otherwise unavoidable. Cache blocking is easy to implement on the basis of our previous parallel TBB implementation: TBB is a task-based thread library, so each thread can execute several tasks and a parallel program can have more tasks than threads. The task size (bx, by, bz) is adjusted to a small cube that can be covered by the L2 cache.
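    The tiling can be sketched as below, with a simple 7-point average standing in for the real wave update; the names (`sweep_blocked`, `cell`) and the tile sizes are illustrative, not the paper's code.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <vector>

    using Grid = std::vector<double>;

    inline std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                           std::size_t nx, std::size_t ny) {
        return (z * ny + y) * nx + x;
    }

    // A 7-point average stands in for the real wave update (illustrative).
    inline double cell(const Grid& in, std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny) {
        return (in[idx(x-1,y,z,nx,ny)] + in[idx(x+1,y,z,nx,ny)] +
                in[idx(x,y-1,z,nx,ny)] + in[idx(x,y+1,z,nx,ny)] +
                in[idx(x,y,z-1,nx,ny)] + in[idx(x,y,z+1,nx,ny)]) / 6.0;
    }

    // Cache-blocked sweep: the grid is visited tile by tile so each
    // (bx,by,bz) tile's working set stays resident in L2 while updated.
    void sweep_blocked(Grid& out, const Grid& in,
                       std::size_t nx, std::size_t ny, std::size_t nz,
                       std::size_t bx, std::size_t by, std::size_t bz) {
        for (std::size_t z0 = 1; z0 < nz-1; z0 += bz)
        for (std::size_t y0 = 1; y0 < ny-1; y0 += by)
        for (std::size_t x0 = 1; x0 < nx-1; x0 += bx)
            for (std::size_t z = z0; z < std::min(z0+bz, nz-1); ++z)
            for (std::size_t y = y0; y < std::min(y0+by, ny-1); ++y)
            for (std::size_t x = x0; x < std::min(x0+bx, nx-1); ++x)
                out[idx(x,y,z,nx,ny)] = cell(in, x, y, z, nx, ny);
    }
    ```

    Blocking only reorders the traversal, so the result is identical to an unblocked sweep; only the cache behavior changes.
    
    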

    2.2.2 Loop splitting

    Loop splitting, or loop fission, is a simple approach that breaks a loop into two or more smaller loops. It is especially useful for reducing the cache pressure of a kernel, which can translate into better occupancy and overall performance improvement. If multiple operations inside a loop body rely on different inputs and these operations are independent, loop splitting can be applied. The splitting leads to smaller loop bodies and hence reduces register pressure. The data flows of P and Q are quite decoupled, so it is better to split them to reduce cache stress and iterate over the datasets P and Q separately.
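    A minimal illustration of the transformation, with made-up elementwise updates in place of the real P/Q wave-field updates; the point is only that two independent updates sharing one loop can be separated without changing the result.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Fused loop: every iteration streams both P and Q through the cache.
    // The scalar bodies are illustrative; the real ones are the independent
    // P and Q wave-field updates.
    void update_fused(std::vector<double>& P, std::vector<double>& Q,
                      double a, double b) {
        for (std::size_t i = 0; i < P.size(); ++i) {
            P[i] = a * P[i] + 1.0;
            Q[i] = b * Q[i] - 1.0;
        }
    }

    // Split (loop fission): each smaller loop streams only one array,
    // halving the per-loop working set and easing register pressure.
    void update_split(std::vector<double>& P, std::vector<double>& Q,
                      double a, double b) {
        for (std::size_t i = 0; i < P.size(); ++i) P[i] = a * P[i] + 1.0;
        for (std::size_t i = 0; i < Q.size(); ++i) Q[i] = b * Q[i] - 1.0;
    }
    ```
    
    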

    2.2.3 Huge page table

    Since TLB misses are expensive, TLB hit rates can be improved by mapping large contiguous physical memory regions with a small number of pages, so fewer TLB entries are required to cover larger virtual address ranges. A reduced page table size also means reduced memory management overhead. To use larger page sizes for shared memory, huge pages must be enabled, which also locks these pages in physical memory. The total memory used is 4.67GB, so more than one million 4kB pages would be needed, which exceeds what the L1 and L2 TLBs can hold. By observing the algorithm, it is found that P and Q are used many times, so huge pages are allocated for them; regular 4kB pages and huge pages are used together. The method is simple: first, interact with the OS by writing the desired page count into the proc directory, and reserve enough huge pages. Then use the mmap function to map huge page files into process memory.
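    A hedged sketch of the allocation path on Linux. The paper reserves pages via /proc and maps huge-page files; this simplified variant instead requests an anonymous MAP_HUGETLB mapping and falls back to regular 4 kB pages when no huge pages are reserved, so it runs either way.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <sys/mman.h>

    // Map `bytes` of anonymous memory, preferring huge pages.
    // Simplified stand-in for the paper's file-backed hugetlbfs mapping.
    void* alloc_maybe_huge(std::size_t bytes) {
    #ifdef MAP_HUGETLB
        // Succeeds only when huge pages are reserved and `bytes` is a
        // multiple of the huge page size (2 MB on x86-64).
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p != MAP_FAILED) return p;
    #endif
        // Fall back to regular 4 kB pages.
        void* q = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return q == MAP_FAILED ? nullptr : q;
    }
    ```

    Reserving the pool beforehand would look like `echo 2048 > /proc/sys/vm/nr_hugepages` (run as root); without that step the fallback path is taken.
    
    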

    3 Implementation and optimizations on GPU

    3.1 GPU implementation

    The process of RTM is to compute a series of derivatives and combine them to update the wave fields P and Q. In the GPU implementation, there are several separate kernels, one per derivative. Without loss of generality, an example of how to compute dxy in parallel is given. The output of this step is a 3D grid of dxy, and the task partition is based on this result: each thread computes nz points, each block computes a bx·by panel, and the blocks together cover the whole grid.

    3.2 Computation reduction and 1-pass algorithm optimization

    Fig.3 shows several kinds of derivatives. The traditional 2-pass computation first computes the first-order derivatives dx, dy, dz, and then computes dxy, dyz, dxz from them. This method brings additional global reads, global writes and storage space. A method to reduce global memory access is devised by using shared memory and registers: the 1-pass algorithm. As in the 2-pass algorithm, each thread computes a z-direction column of dxy. The first-order result of the xy-panel is stored in shared memory, and register double buffering is used to reduce shared memory reads. Fig.6 shows a snapshot of the register buffering.
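    The 2-pass versus 1-pass trade-off can be sketched in 2D with simple central differences (function names are illustrative): the 1-pass version fuses the two first-order differences so the intermediate dx grid never touches memory; on the GPU that intermediate would live in shared memory and registers instead.

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    using G = std::vector<double>;
    inline std::size_t at(std::size_t x, std::size_t y, std::size_t nx) {
        return y * nx + x;
    }

    // 2-pass: materialize the x-derivative in a temporary grid, then take
    // its y-derivative, costing one extra global write and read per point.
    G dxy_2pass(const G& f, std::size_t nx, std::size_t ny) {
        G dx(f.size(), 0.0), out(f.size(), 0.0);
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 1; x < nx-1; ++x)
                dx[at(x,y,nx)] = 0.5 * (f[at(x+1,y,nx)] - f[at(x-1,y,nx)]);
        for (std::size_t y = 1; y < ny-1; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                out[at(x,y,nx)] = 0.5 * (dx[at(x,y+1,nx)] - dx[at(x,y-1,nx)]);
        return out;
    }

    // 1-pass: fuse both differences; the intermediate values stay in
    // local variables (registers/shared memory on the GPU).
    G dxy_1pass(const G& f, std::size_t nx, std::size_t ny) {
        G out(f.size(), 0.0);
        for (std::size_t y = 1; y < ny-1; ++y)
            for (std::size_t x = 1; x < nx-1; ++x) {
                double up = 0.5 * (f[at(x+1,y+1,nx)] - f[at(x-1,y+1,nx)]);
                double dn = 0.5 * (f[at(x+1,y-1,nx)] - f[at(x-1,y-1,nx)]);
                out[at(x,y,nx)] = 0.5 * (up - dn);
            }
        return out;
    }
    ```

    Both variants compute the same cross-derivative at interior points; the 1-pass form trades the temporary grid for a little recomputation, exactly the exchange the paper makes on the GPU.
    
    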

    Fig.6 1-pass computing window snapshot

    4 Evaluation

    4.1 Experiment setup

    The experiment is conducted on three platforms. The main parameters are listed in Table 1. The input of RTM is single pulse data with grid dimension of 512×312×301. The algorithm iterates 1000 times. The time in this section is the average time of one iteration.

    Table 1 Architecture parameters

    4.2 Overall performance

    Fig.7 shows the performance comparison of the three platforms. Our optimized 3D RTM-TTI gains considerable speedup: the hotspot function of 3D RTM-TTI gains a 35.99X speedup on two Intel Xeon CPUs, an 89.75X speedup on one Intel Xeon Phi, and an 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. Our work makes RTM-TTI practical in industry. The results also clearly show that accelerators handle the 3D RTM-TTI algorithm better than traditional CPUs: the hotspot function gains around a 2.5X speedup on the GPU and the Xeon Phi over two CPUs. On one hand, because the data dependencies in the RTM algorithm are decoupled, plenty of parallelism can be exploited, and accelerators have more cores, more threads and wider vector instructions: the Xeon Phi has 60 computing cores and 512-bit vector instructions, and the Tesla K20 GPU has 2496 cores, so accelerators are good at data-parallel computing. On the other hand, RTM is a memory-bound application, and accelerators like the Xeon Phi and the GPU have 7X and 5X more theoretical memory bandwidth than the CPU, as shown in Table 1.

    Fig.7 Performance evaluations of three platforms

    4.3 Performance analysis

    On the CPU, the wave updating function gains a 35.99X speedup compared with the single-thread CPU baseline: 20.12X comes from multi-threading and vector-instruction parallelism, and 1.96X comes from memory optimizations such as cache blocking, loop splitting and huge page configuration, as Figs 8 and 9 show.

    Fig.10 and Fig.11 show the parallelism and memory optimization performance of the Xeon Phi, respectively. RTM gains 13.81X from the 512-bit vector instructions on the Phi. From Table 1, the ideal speedup for single-precision SIMD on the Xeon Phi is 16X, so the SIMD gain is near the ideal limit; the gap is due to cache misses, which stall the pipeline. Multi-threading on the Xeon Phi gains a 40.13X speedup across its 60 cores, so the Xeon Phi has very good scalability in multi-threading and wide vector instructions. RTM gains 2.08X from cache blocking, because cache blocking reduces the cache miss rate and provides good memory locality, which benefits both SIMD and multi-threading. RTM gains 1.44X by using huge pages to reduce the L2 TLB miss rate, and 1.69X from loop splitting, which reduces cache pressure. When compared on the same platform, a 2806.13X speedup is gained over the single-thread Xeon Phi baseline: 554.53X from multi-threading and vector-instruction parallelism, and 5.06X from memory optimization. The Intel Phi is thus more sensitive to data locality, as more of its speedup comes from explicit memory optimization.

    Fig.8 Parallelism evaluation on CPU (MT:multi-threading, Vec: vectorization)

    Fig.9 Memory optimization on CPU (Ca: cache blocking, Sp:splitting)

    As Fig.12 shows, RTM gains a 1.23X speedup by using the 1-pass algorithm on the GPU, and a 1.20X speedup by using texture memory in the 1-pass algorithm. In total, the hot spot function gains a 2.33X speedup compared with the baseline parallel GPU implementation. Thread block and grid selection is very important to application performance, and making full use of fast memory, such as shared memory and texture memory, benefits the application greatly. Explicit data locality plays an important role in application performance on the GPU.

    Fig.10 Parallelization on Phi

    Fig.11 Memory optimization on Phi (HP:huge page)

    Fig.12 Memory optimization on GPU evaluation

    5 Related work

    Araya-Polo, et al.[9] assessed the RTM algorithm on three kinds of accelerators, the IBM Cell/B.E., GPU and FPGA, and suggested a wish list for programming models and architecture design. However, they only listed some optimization methods and did not evaluate their impact on RTM performance quantitatively. Their paper predates the Intel Xeon Phi, so Xeon Phi performance is not included. In this paper, much more popular platforms are chosen and each optimization method is evaluated quantitatively. Heinecke, et al.[10] discussed the performance of regression and classification algorithms for data mining problems on the Intel Xeon Phi and GPGPUs, and demonstrated that the Intel Xeon Phi was better at sparse problems than the GPU with fewer optimizations and porting efforts. Micikevicius[11] optimized RTM on the GPU and demonstrated considerable speedups. Our work differs in that the model in his paper uses the average derivative method, while our model is 3D RTM-TTI, which is more compelling.

    6 Conclusion and Future work

    In this paper, we addressed the enormously time-consuming but important seismic imaging application 3D RTM-TTI with parallel solutions, and presented our optimization experience on three platforms: CPU, GPU and Xeon Phi. To the best of our knowledge, this is the first simultaneous implementation and evaluation of 3D RTM-TTI on these three platforms. Our optimized 3D RTM-TTI gains considerable speedup. Optimization on the Intel Xeon Phi architecture is similar to the CPU due to the similar x86 architecture and programming model; thread parallelization, vectorization and explicit memory locality are particularly critical for this architecture to achieve high performance. Vector instructions play an important role on the Xeon Phi, and loop dependences must be eliminated in order to use them, otherwise performance is penalized. On the GPU, memory optimizations should be made explicit, such as using shared memory, constant memory, etc., and bank conflicts should be avoided to obtain higher practical bandwidth. In the future, we will evaluate our distributed 3D RTM-TTI algorithm and analyze its communication.

    [ 1] Alkhalifah T. An acoustic wave equation for anisotropic media. Geophysics, 2000, 65(4):1239-1250

    [ 2] Zhang H, Zhang Y. Reverse time migration in 3D heterogeneous TTI media. In: Proceedings of the 78th Society of Exploration Geophysicists Annual International Meeting, Las Vegas, USA, 2008. 2196-2200

    [ 3] Baysal E, Kosloff D D, Sherwood J W. Reverse time migration. Geophysics, 1983, 48(11):1514-1524

    [ 4] Zhou H, Zhang G, Bloor B. An anisotropic acoustic wave equation for modeling and migration in 2D TTI media. In: Proceedings of the 76th Society of Exploration Geophysicists Annual International Meeting, San Antonio, USA, 2006. 194-198

    [ 5] Gschwind M, Hofstee H P, Flachs B, et al. Synergistic processing in Cell's multicore architecture. IEEE Micro, 2006, 26(2):10-24

    [ 6] NVIDIA Corporation. NVIDIA's next generation CUDA compute architecture: Fermi. http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf, White Paper, 2009

    [ 7] Intel Corporation. Intel Xeon Phi coprocessor system software developers guide. https://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-system-software-developers-guide.html, White Paper, 2014

    [ 8] Micikevicius P. 3D finite difference computation on GPUs using CUDA. In: Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, Washington, D.C., USA, 2009. 79-84

    [ 9] Araya-Polo M, Cabezas J, Hanzich M, et al. Assessing accelerator-based HPC reverse time migration. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1):147-162

    [10] Heinecke A, Klemm M, Bungartz H J. From GPGPU to many-core: NVIDIA Fermi and Intel many integrated core architecture. Computing in Science & Engineering, 2012, 14(2):78-83

    [11] Zhou H, Ortigosa F, Lesage A C, et al. 3D reverse-time migration with hybrid finite difference pseudo spectral method. In: Proceedings of the 78th Society of Exploration Geophysicists Annual Meeting, Las Vegas, USA, 2008. 2257-2261

    Zhang Xiuxia, born in 1987, is a Ph.D candidate at Institute of Computing Technology, Chinese Academy of Sciences. Her research includes parallel computing, compiler and deep learning.

    10.3772/j.issn.1006-6748.2017.02.010

    ①Supported by the National Natural Science Foundation of China (No. 61432018).

    ②To whom correspondence should be addressed. E-mail: zhangxiuxia@ict.ac.cn

    Received on Apr. 16, 2016
