
    Double DQN Method For Botnet Traffic Detection System

    2024-05-25 14:39:56 · Yutao Hu, Yuntao Zhao, Yongxin Feng and Xiangyu Ma
    Computers, Materials & Continua, 2024, Issue 4

    Yutao Hu, Yuntao Zhao*, Yongxin Feng and Xiangyu Ma

    1 School of Information Science and Engineering, Shenyang Ligong University, Shenyang, 110159, China

    2 Graduate School, Shenyang Ligong University, Shenyang, 110159, China

    ABSTRACT In the face of the increasingly severe botnet problem on the Internet, how to effectively detect botnet traffic in real time has become a critical problem. Although the existing deep Q-network (DQN) algorithm in deep reinforcement learning can solve the problem of real-time updating, its predicted Q-values are consistently higher than the actual values. In botnet traffic detection, the DQN model performs well on the training set, where its prediction accuracy is high; however, its accuracy declines on the test set, and it cannot adjust its prediction strategy in time based on new data samples. On a new dataset, its accuracy declines even more significantly. Therefore, this paper proposes a botnet traffic detection system based on double DQN (DDQN). Two Q-values are designed to adjust the model in policy and action, respectively, to achieve real-time model updates and to improve the universality and robustness of the model across different datasets. Experiments show that, compared with the DQN model, the Q-values of DDQN are not overestimated, and the detection model improves the accuracy and precision of botnet traffic detection. Moreover, when using botnet datasets other than the test set, the accuracy and precision of the DDQN model remain higher than those of DQN.

    KEYWORDS DQN; DDQN; deep reinforcement learning; botnet detection; feature classification

    1 Introduction

    With the rapid development of technology, the Internet environment has become increasingly severe. Starting from Mixter’s introduction of Distributed Denial-of-Service (DDoS) in the “TFN” toolkit on Internet Relay Chat (IRC) in 1999, which released the first botnet plugin [1], to today’s everyday use of HTTPS and Peer-to-Peer (P2P) as invasion methods for botnet traffic [2], the threat posed by botnets to hosts and websites is becoming more significant. Attackers implant malicious software or viruses into computers or Internet of Things (IoT) devices through various means, such as launching DDoS attacks, propagating malware, phishing, and click fraud against critical targets, invading and infecting target hosts to turn them into “botnet nodes,” and then using these nodes to continue attacking other targets, forming botnet families large and small. Users are unaware that their hosts have been infected. After constructing a botnet, attackers simultaneously direct the infected “botnet” nodes to send large amounts of data packets or requests to target servers, occupying the servers’ processing capacity and bandwidth resources, thereby preventing normal users from accessing the server and forcing the server to shut down due to resource exhaustion.

    In June and July 2022, China experienced 26,000 cases of cybersecurity breach events [3]. According to Kaspersky’s 2022 security report, China accounted for 17.70% of the world’s Trojan and botnet invasions [4]. In addition, in March 2022, Toyota suffered a system shutdown after a supplier was attacked by botnet software, while in May, its Asian subsidiary, Nikkei Group, located in Singapore, was also attacked by botnet software. That same month, SpiceJet, an Indian airline, was also assaulted by botnet software, resulting in hundreds of passengers being stranded at the country’s airports. Some super-large botnet families contain tens of thousands of infected hosts, while others are small botnet families with fewer than one hundred infected hosts [5]. Due to the P2P end-to-end data transmission mode, it is difficult to determine the location of the controller’s host when tracing a botnet, so the only option is to continuously improve detection capabilities to prevent hosts from being infected. This problem poses a significant threat to global cybersecurity.

    In the early stages of Internet development, detecting IRC botnets was relatively easy due to the open nature of the IRC protocol. The earliest detection methods were based on traffic filtering rules such as port, protocol, and source address. Still, this method could only detect known attack traffic and not new types of attacks, such as those using P2P protocols for decentralization and conducting large-scale distributed attacks. In a botnet constructed this way, botnet hosts communicate both through a centralized server and over a P2P network. To detect this type of botnet traffic, special techniques are required. One standard method is based on statistical analysis and machine learning algorithms, analyzing and identifying traffic through feature extraction and classification algorithms. This method can detect P2P botnet traffic as well as new, unknown attacks.

    As technology continues to develop, detection methods and capabilities are constantly improving. Modern detection methods include feature analysis, data mining, behavior analysis, machine learning, deep learning, and other technologies. This article combines feature engineering and extraction from machine learning classifiers, the high-dimensional inputs of deep learning, and the real-time learning capability of reinforcement learning to create a detection model for botnet traffic. By identifying and classifying features of the dataset, suitable features are selected and used to train the detection model to recognize all types of attacks and detect botnet traffic.

    2 Structure

    As botnet technology continues to evolve, detection techniques have constantly improved based on the characteristics of botnet traffic. During the early days of IRC chat rooms, Zeidanloo et al. developed a signature-based detection method that matches known malicious code samples to identify infected botnets, in response to attack methods such as remote execution vulnerabilities and password cracking. Additionally, analysis of computer system behavior, such as network traffic and process behavior, can detect abnormal behavior but may generate some false positives [6]. Zhao et al. grouped data packets sharing the same IP five-tuple (source IP address, destination IP address, source port number, destination port number, protocol identifier) into a data stream using TCP/UDP protocols, then partitioned the data stream using time windows to achieve botnet detection [7]. After conducting a feature importance analysis, Hijawi et al. detected Android botnet URL feature sets, resulting in a significant interception of botnet traffic packets [8].

    As machine learning began to gain traction in various fields, Joshi proposed combining support vector machines (SVM) and a random forest classifier (RFC) to classify the N-BaIoT dataset, effectively separating abnormal traffic from regular traffic, although accuracy still needed improvement [9]. Afterward, numerous teams and researchers utilized various machine learning algorithms for detection, such as decision trees, K-Nearest Neighbor (KNN), multi-layer perceptron (MLP), naive Bayes, random forest, SVM, and gradient boosting on datasets partitioned by time window, and their models achieved high accuracy rates for botnet detection [10–18].

    With the rise of deep learning, neural networks that evolved from the MLP have gradually become the foundation for constructing complex models. Models such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks are built upon this foundation. Torres et al. proposed using RNNs and achieved an astonishing 98% accuracy in detecting a small number of data samples using 5-fold cross-validation [19]. Pektas et al. employed Deep Neural Networks (DNNs) to model network traffic and detect botnet traffic [20]. Hussain et al. utilized the ResNet-18 residual network to detect scanning activities and DDoS attacks during the zero-day phase of botnets [21]. Chen addressed network degradation issues using a residual spatiotemporal network and designed the Res-1DCNN-LSTM model by combining LSTM and CNN to detect botnet traffic data samples [22]. Qi combined CNN and LSTM, merging spatial and temporal features for identification, and improved accuracy by using small convolutional kernels and the GELU activation function [23].

    Furthermore, they augmented and enhanced the dataset using Generative Adversarial Networks (GANs) and employed transfer learning to use the discriminator as a novel botnet detector, enabling excellent performance even with imbalanced datasets. Qi utilized the CFR algorithm for feature selection and employed a model structure that combines BiGRU and CNN to enhance accuracy and computational efficiency. The residual module and ELU activation function were used to strengthen the model’s ability to extract low-frequency attack features, and an attention mechanism was introduced to handle essential features. This detection method achieved multi-classification accuracies of 99.77% and 99.24% on the CIC-IDS2017 and TON_IoT datasets, with binary classification accuracies of 99.82% and 99.62% [24]. Xue et al. integrated deep learning with the Internet of Things (IoT) and designed a bidirectional LSTM-RNN model to detect botnet activities in user IoT devices and networks [25].

    In 2013, Volodymyr and the DeepMind team introduced the Deep Reinforcement Learning (DRL) concept. They combined the Q-learning algorithm of reinforcement learning with the high-dimensional capacity of deep learning, resulting in the proposed DQN algorithm. In 2015, they demonstrated the capabilities of their model built using the DQN algorithm by surpassing some human players and achieving high scores in Atari games [26]. Building upon this work, the Hado team introduced the concept of DDQN, utilizing two neural networks to estimate Q-values and address the issue of overestimation [27]. Due to its real-time updating strategy, deep reinforcement learning has been extensively applied in fields such as autonomous driving and autopilot systems. Zuo et al. employed DDQN for object detection, enabling rapid identification of objects within a brief timeframe [28]. In network security, Wen utilized deep reinforcement learning to generate adversarial samples for evading JavaScript malware detection models, thereby evading malicious code detection for antivirus purposes [29].

    With the emergence of GANs, the approach to detection has shifted from passive defense to proactive offense. McDermott approached botnet detection from the perspective of GANs, generating samples to continually challenge the detector and provide better guidance, thereby enhancing the model’s ability to detect unfamiliar samples [30]. This approach has paved the way for subsequent botnet detection. Venturi et al. created an adversarial dataset using GANs to evade traditional machine learning detectors. Experimental results showed that this dataset achieved a detection evasion rate of 84% against traditional detectors such as RF and SVM [31].

    Beyond that, the successful combination of DQN and GAN has elevated botnet detection to new heights. Wu et al. proposed a framework based on deep reinforcement learning that effectively generates malicious traffic by automatically adding perturbations to samples to deceive detection models. Experimental results showed a significant improvement in the evasion rate, while also helping the detection model identify vulnerabilities and enhance its robustness [32]. The Apruzzese team introduced a deep reinforcement learning framework that protects botnet detectors from adversarial attacks. It was validated on various machine learning algorithms and public datasets, demonstrating significant improvements compared to existing techniques [33]. The Randhawa team presented a deep reinforcement learning (DRL) GAN model called RELEVAGAN, in which DRL is utilized to automatically modify the content of the generator, allowing it to bypass the discriminator’s detection. This forces the discriminator to undergo more complex training, significantly enhancing its detection capability compared to models without DRL [34].

    3 Methodology

    This paper proposes a OneR-DDQN model constructed using the machine learning OneR classifier and the deep reinforcement learning DDQN algorithm. The aim is to maintain high predictive capability across different botnet datasets. The entire process is divided into four steps: merging and preprocessing the datasets; conducting feature engineering with machine learning to select relevant features; building the DDQN model; and running comparison experiments between the built DDQN model and the DQN model to compare the performance of the two algorithms on different datasets, using an existing deep learning LSTM detection model as an objective reference. This paper also explores the impact of omitting OneR-based feature engineering during model training. The experimental procedure and model framework are illustrated in Fig. 1.

    3.1 Dataset

    The datasets used in this study are from the CIC-IDS series provided by the Canadian Institute for Cybersecurity. The datasets include CIC-IDS2017, CIC-DoS2017, CSE-CIC-IDS2018, and CIC-DDoS2019. The NumPy library was employed to merge labels with the same label values and filter out certain data samples. The resulting merged dataset is referred to as the CIC-Collection, and the corresponding label values obtained are presented in Table 1.

    Table 1: “Label” labels in the CIC-Collection dataset

    Each label is associated with the features from the four datasets. Using the Pandas library for data reading, there are a total of 9,167,581 records with 79-dimensional features, as shown in Table 2.

    Table 2: Features of the CIC-Collection dataset

    However, not all features are relevant or informative. The dataset contains certain features that are completely irrelevant for prediction, referred to as pollution features, which need to be discarded. Following the ranking of feature pollution severity, this paper used the drop function to remove nine features, as shown in Table 3.

    Table 3: The deleted 9-dimensional features that contaminate the dataset

    There are also some features, such as Active Std and Idle Std, that can affect the dataset. However, after batching the data into smaller subsets, the degree of pollution from these features is not significant or noticeable, so they are retained.

    During the process of identifying polluted features, this study discovered 11 features with a predictive capacity of zero. These features have no meaningful value in any classification model, and some even have None as their feature values. These features were also discarded using the drop function, as shown in Table 4.

    Table 4: The deleted 11-dimensional meaningless features

    Figure 1: Flow diagram of botnet traffic detection based on DDQN algorithm

    Through the three steps described above, this study obtained a clean sample dataset (CIC-Collection) combining the four datasets. The dataset contains detailed labels for individual attacks and their corresponding broader attack categories. There are no None values, and no duplicate samples exist. The dataset comprises 9,167,581 records, each with 59-dimensional features, and is stored in the Parquet file format.
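    As an illustrative sketch (not the authors' exact code), the preprocessing described above can be expressed in Python with the Pandas library; the file paths and the lists of dropped feature names are placeholders for the entries in Tables 3 and 4.

    import pandas as pd

    def build_cic_collection(csv_paths, polluted_features, meaningless_features):
        """Merge the four CIC datasets, drop the features in Tables 3 and 4, and clean up."""
        frames = [pd.read_csv(p) for p in csv_paths]
        df = pd.concat(frames, ignore_index=True)
        df = df.drop(columns=list(polluted_features) + list(meaningless_features))
        df = df.dropna().drop_duplicates()          # no None values, no duplicate samples
        return df

    # Example (paths and feature lists are placeholders):
    # df = build_cic_collection(["cicids2017.csv", "cicdos2017.csv",
    #                            "csecicids2018.csv", "cicddos2019.csv"],
    #                           polluted_features, meaningless_features)
    # df.to_parquet("cic_collection.parquet")       # store in the Parquet file format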

    Like CSV files, Parquet files can be read directly in Python. However, Parquet files offer smaller compressed storage space and more efficient reading than CSV files. The PyArrow library is a Python library designed for efficient columnar data access, which can read CSV files, Parquet files, and other dataset file formats.
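    As a small usage example (the file name is illustrative), the merged dataset can be loaded with PyArrow and converted to a Pandas data frame:

    import pyarrow.parquet as pq

    table = pq.read_table("cic_collection.parquet")   # read the Parquet file
    df = table.to_pandas()                            # convert to a Pandas DataFrame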

    An example of the data samples in the dataset, read using the PyArrow library, is presented in Table 5.

    Table 5: Data sample of CIC-Collection botnet dataset

    The process of mixing and preprocessing the specific datasets is shown in Fig. 2.

    Figure 2: Dataset mixing and features preprocessing

    3.2 Feature Selection of OneR Classifiers

    After obtaining the preprocessed dataset, it is necessary to select the most suitable features for training the detection model. Otherwise, the model’s performance may vary depending on the features used. If the selected features are irrelevant to the detection task, training the model on them will naturally result in poor detection capability. Therefore, this study introduces the OneR algorithm, a machine learning technique, for feature selection.

    The OneR algorithm was initially proposed by Holte in 1993 and has since been applied in various fields, including medical diagnosis, credit scoring, and text classification. The OneR classifier is a simple yet effective algorithm for constructing rule-based classification models. It works by selecting a single feature from the dataset and determining the most frequently occurring class for each unique value of that feature. This process generates a set of rules that can be used to predict the class of new instances from their feature values [35].

    The OneR algorithm is highly efficient in computation and suitable for large datasets with many features. Additionally, it can provide valuable insights into the most significant features of a specific classification problem, aiding in feature selection and interpretation.

    The algorithm of the OneR classifier is as follows:

    Before performing classification with the OneR classifier, certain features must be handled: the OneR classifier cannot judge string-type features, and using strings for feature classification would lead it to counterproductive judgments. It is therefore necessary to discard features that cannot be used for classification, such as features of the “type” kind. After this preprocessing step, the dataset is reduced to 44 dimensions. This reduces feature dimensionality and noise and allows the OneR classifier to focus on making reasonable judgments about the remaining features. At the same time, because the dataset contains 9.17 million records, it is well suited to the OneR classifier, which handles large numbers of data samples. Compared with other classifiers, using the OneR classifier can effectively avoid overfitting.

    Then, the remaining features are input into the OneR classifier, which has a machine learning decision tree classifier at its core. The decision tree has a depth of one layer and uses the Gini criterion.

    Next, the dataset is divided into a training set and a testing set at an 80%/20% ratio, allowing the OneR classifier to complete its judgment after a brief training session. After fitting the training set, predictions are made on both the testing and training sets using the prediction function. The evaluation metric is then calculated by comparing the predicted results with the true labels. If the metric value is greater than 0.5, the feature name, metric value, and prediction result of that feature are saved; these values are stored in a list and converted into a data frame. The evaluation metric, specifically the roc_auc_score, is calculated directly using the metrics module of the sklearn library. Based on this metric (roc_auc_score greater than 0.5), features with significant discriminatory power are selected. The average prediction results on the testing and training sets are computed for these valuable features, resulting in an overall prediction result vector. The ROC curve parameters are then calculated from the overall prediction results and the true training set labels. The ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. The ROC curve is computed from the true positive rate and false positive rate, defined as follows:
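    In standard form, with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives, the two rates are

    \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}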

    TPR (True Positive Rate) represents the proportion of positive samples correctly identified, and FPR (False Positive Rate) represents the proportion of negative samples incorrectly predicted as positive. The area under the ROC curve (AUC score) is widely recognized as a standard for evaluating the overall performance of classifiers. The AUC value ranges from 0 to 1, with higher values indicating better performance. The random-selection benchmark is an AUC score of 0.5, and features with an AUC score greater than 0.5 are considered more informative for classification. The flowchart of the OneR classifier process is depicted in Fig. 3.
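    As an illustrative Python sketch of this per-feature selection procedure, assuming scikit-learn's DecisionTreeClassifier (depth 1, Gini criterion) as the OneR core and roc_auc_score as the evaluation metric; the function and variable names are illustrative:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_auc_score

    def oner_feature_scores(X, y, threshold=0.5):
        """Score each feature with a one-level decision tree (binary target) and keep AUC > threshold."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=0)            # 80% / 20% split
        kept = []
        for col in X.columns:
            stump = DecisionTreeClassifier(max_depth=1, criterion="gini")
            stump.fit(X_train[[col]], y_train)
            score = roc_auc_score(y_test, stump.predict_proba(X_test[[col]])[:, 1])
            if score > threshold:                           # keep informative features only
                kept.append((col, score))
        return pd.DataFrame(kept, columns=["feature", "auc"]).sort_values(
            "auc", ascending=False)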

    Figure 3: The working principle of feature selection for OneR classifier

    After constructing the OneR classifier, the study proceeds by inputting the samples, excluding Benign, into the classifier for each of the seven attack methods. The ROC curves are then generated based on the best-performing feature for each attack, as shown in Fig. 4.

    The OneR classifier produces near-perfect ROC curves for attacks such as Bruteforce and Botnet, indicating its ability to accurately identify most samples in these attack categories without additional learning. However, its effectiveness diminishes when dealing with Infiltration attacks, for which it exhibits no discerning capability. The study then uses the metrics module of the sklearn library to compute and obtain the top-ranked two-dimensional features for each attack, resulting in 14 features, as shown in Table 6. From these features, the most effective ones are selected as inputs for the detection model.

    Table 6: AUC score for filtering features of various attack categories

    Figure 4: ROC curve of OneR classifier in botnet traffic

    In the subsequent detection model training, feature selection is crucial to achieving high performance. Since features 17, 47, and 48 each appear more than once among the features selected by the classifier, each is counted only once to avoid adding the same feature repeatedly, which would otherwise vary the number of inputs used in the model. Feature 8 (Bwd IAT Max) shows poor performance; it would occupy an input slot without contributing good judgment results. These four entries are therefore excluded from the 14 selected features. The exclusion of the other unselected features does not mean that botnet traffic cannot be judged with them, only that their ability to distinguish botnet traffic is weaker than that of the selected 10-dimensional features. The remaining 10-dimensional features are chosen as the input for training the model, as shown in Table 7.

    Table 7: Features for later detection model training

    The entire feature selection process is illustrated in Fig. 5.

    The OneR classifier has now completed feature selection, and these 10-dimensional features will be fed into the detection model as input values. Through training and the memory of its neural network, the detection model will learn how to judge whether traffic belongs to botnet traffic based on these features. Although unselected feature values might also provide some learning capability for the detection model, the directionality and detection capability provided by the features classified by OneR are stronger.

    3.3 DDQN Detection Model

    From Q-learning, which records feedback values as Q-values from the earliest prediction results to train the model; to the DQN algorithm, which uses neural networks and constantly updates a target network to obtain continuously updated Q-values; to DDQN, which sets double Q-values to reduce Q-value overestimation, the differences among these three reinforcement learning methods make the construction of their detection models progressively more involved.

    Since DDQN extends DQN, which is itself an extension of Q-learning, this study considers models based on Q-learning, DQN, and DDQN in turn. Because Q-learning cannot handle high-dimensional features (discussed in Section 3.3.1), this article only compares the capabilities of the DQN and DDQN models. In addition, 9.17 million records are an enormous burden for a computer. Therefore, this article randomly selects 20,000 records at a time and divides them into training and testing sets at an 80%/20% ratio for the learning and training of the detection models. The DQN and DDQN models are trained using the same training set, and their performance is compared using the same test set. In addition, to test the models’ adaptability to unfamiliar environments, different datasets are used for evaluation. Among the four original datasets, the original CIC-IDS2017 dataset has more complex data; hence, it poses a greater challenge to the detection model than the other three. For further comparative testing, this dataset is input into the models for detection and evaluation.

    3.3.1 Q-learning Model

    Q-learning is the foundational algorithm for both DQN and DDQN. It is a value-based reinforcement learning algorithm used to solve problems formulated as Markov Decision Processes (MDPs). In an MDP, an agent interacts with an environment over discrete time steps. At each time step, the agent observes the current state of the environment and takes an action. The environment then transitions to a new state and gives the agent a reward signal. By estimating the value or utility of each state, the agent can make decisions that maximize the long-term cumulative reward. Furthermore, the MDP helps the agent learn to make optimal decisions based on the given rewards and transition probabilities, and Q-learning follows this reward mechanism. The ultimate goal of Q-learning is to choose the optimal policy at each prediction, maximizing the accumulation of Q-values and rewards. At each step, Q-learning maintains a state-action value function, denoted as Q(s_t, a_t), representing the expected cumulative return when taking a specific action in a given state. The optimal decision sequence in the MDP can be derived by solving the Bellman equation, resulting in the Q(s, a) state-action value function shown in Eq. (2).
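    Following the standard discounted-return formulation described below, Eq. (2) can be written as

    Q(s_t, a_t) = \mathbb{E}\left[ r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \mid s_t, a_t \right]   (2)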

    In the given equation, r_{t+1} + γr_{t+2} + γ²r_{t+3} + ... represents the total discounted reward starting from time step t. γ is the discount factor, which ranges between 0 and 1. A higher value of γ places greater weight on the long-term value of future states, while a lower value of γ focuses more on immediate rewards. Therefore, a higher γ value is chosen in the early stages to encourage the model to explore longer-term rewards. As the number of epochs increases, γ gradually decreases, leading to a gradual decrease in the model’s learning ability and eventual convergence. Based on this, the Q-value update formula can be derived, as shown in Eq. (3).
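    Consistent with the temporal-difference update described below, Eq. (3) takes the standard form

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s_t, a_t) \right]   (3)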

    In the equation, s_t represents the current state, a_t represents the action taken in the current state, r represents the immediate reward obtained after taking action a_t, s′ represents the next state reached after taking the action, and α is the learning rate parameter. This formula uses temporal-difference learning to update the Q-value: it selects the maximum Q(s′, a′) over the next state s′, multiplies it by γ, and then adds the actual reward value r to obtain the target value of Q(s_t, a_t).

    Due to the large sample size and the excessive number of features in the dataset, and because the Q-learning algorithm only handles the interaction between one or two features and the environment, once the dimensionality expands significantly the Q-learning algorithm is likely to experience the “Curse of Dimensionality” phenomenon, which can lead to abysmal results. Therefore, in this experiment, Q-learning is not used to build a detection model, as it cannot handle high-dimensional feature inputs. However, Q-learning can learn the optimal policy through exploration without prior knowledge of the environment, which is exactly what is needed here; DQN and DDQN are extensions and enhancements built on Q-learning.

    3.3.2 DQN Model

    In order to overcome the “Curse of Dimensionality,” experts in reinforcement learning integrated the advantages of deep learning into it. First, a deep neural network can help reinforcement learning better learn many complex problems. Second, through the neural network, reinforcement learning can accept and learn from higher feature dimensions, thus alleviating the “Curse of Dimensionality.” DQN was the first algorithm in deep reinforcement learning and is an extension of the Q-learning algorithm in the context of deep learning; it is a model that successfully combines reinforcement learning and deep learning. DQN utilizes neural networks to approximate the Q-value function, enabling learning in high-dimensional and complex state spaces while possessing some generalization capability. In addition, DQN advances the Q-table into a Q-network, which can hold more sample content for the intelligent agent to learn from.

    The network architecture used in DQN consists of three convolutional layers followed by two fully connected layers. The Q-value function is denoted as Q(s, a; θ), and updating the network essentially means updating its weight parameters, denoted as θ. Once θ is determined, the neural network’s parameters are also determined. Compared to the Q-table used in Q-learning, the Q-network contains more information and more useful insights for making judgments.

    The ε-greedy strategy is vital to the exploration-exploitation trade-off in deep reinforcement learning. It balances exploring new actions and exploiting current knowledge to maximize cumulative rewards. Initially, when the agent has limited knowledge about the environment, exploration is given more weight (a higher ε value) to ensure a broader exploration of the state-action space. As the agent learns and accumulates more experience, the exploitation component becomes more dominant (a lower ε value), exploiting the learned Q-values to maximize rewards. The DQN algorithm in this paper utilizes the ε-greedy strategy, which differs from the purely greedy approach of always selecting the action with the maximum value function, a_t = argmax_a Q(s_t, a; θ). It incorporates a parameter ε: with probability ε, the algorithm randomly selects an action from all possible actions, and with probability 1 − ε, it selects a_t = argmax_a Q(s_t, a; θ) based on the maximum value from the Q-network. As the number of iterations increases, ε decreases, causing the model to increasingly favor selecting actions based on the maximum value from the Q-network. This approach preserves the exploration characteristic of reinforcement learning while still exploiting the maximum value function.
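    As a minimal Python sketch of this selection rule, assuming the Q-values for the current state are available as a NumPy array (the function name is illustrative):

    import numpy as np

    def epsilon_greedy(q_values, epsilon, rng=np.random.default_rng()):
        """Pick a random action with probability epsilon, otherwise the greedy action."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))   # explore: uniform random action
        return int(np.argmax(q_values))               # exploit: action with the largest Q-value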

    The DQN algorithm starts by providing an environment state s to the agent, which then uses the value-function network to obtain all Q(s, a) values associated with state s. The agent decides by employing the ε-greedy strategy to select an action a. The environment responds to the action with a reward value r and the next environment state s′. This completes one epoch. The parameters of the value-function network are then updated based on the reward value r, and the process proceeds to the next epoch. This cycle continues until a converged value-function network is trained, concluding the loop. The workings of one epoch are depicted in Fig. 6.

    At the algorithm level, DQN extends the Q-learning algorithm. However, since DQN approximates the Q-values using a neural network, determining the θ weights of the neural network becomes crucial to the entire process. In this study, the gradient descent algorithm is employed to minimize the loss function, continuously adjusting the θ weights. The loss function is given by Eq. (4).
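    Consistent with the target-network description that follows, the loss at iteration i takes the standard DQN form

    L_i(\theta_i) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right)^2 \right]   (4)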

    Figure 6: DQN neural network structure

    In the equation, θ_i represents the parameters of the Q-network at the i-th iteration, and θ_i^- represents the parameters of the target network at the i-th iteration. Q(s, a; θ_i) represents the Q-value function for taking action a in the current network state and is used to evaluate the Q-value generated by the current state-action pair, while r + γ max_{a′} Q(s′, a′; θ_i^-) represents the output of the target network. By introducing the target network, the target Q-values remain unchanged for a certain period of time, reducing the correlation between the current Q-values and the target Q-values to some extent and thereby improving the stability of the algorithm. Gradient descent is then performed on θ, as shown in Eq. (5).
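    A standard form of this gradient, consistent with the loss above, is

    \nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right]   (5)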

    After every N iterations of gradient descent, the parameters of the current network are copied to the target network, which eventually yields the converged Q-value function.

    In the botnet traffic detection model, to facilitate agent learning, the agent receives a reward value r when it selects an action to classify the traffic type. The reward function is defined as follows:
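    Based on the values described below (the exact notation is assumed), the reward can be written piecewise as

    r = \begin{cases} +5 & \text{correct classification} \\ -5 & \text{incorrect classification} \\ -0.1 & \text{per 1 ms of delay before a decision} \end{cases}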

    When the agent makes a correct selection, a positive reward of 5 points is given. When it makes an incorrect selection, a negative reward of -5 points is given. Additionally, to prevent the agent from indefinitely avoiding a decision, a penalty of 0.1 points is deducted every 1 ms. This approach improves the model’s accuracy while also enhancing its decision-making speed.

    In addition, DQN introduces the replay memory mechanism. Traditional Q-learning algorithms interact and improve based on the current policy, discarding the samples generated by each interaction after learning from them. This approach not only uses data samples inefficiently but also leads to correlated training data. If the model assumes correlations between data samples that do not exist, its decision-making ability will be significantly compromised. Therefore, in DQN, a quadruple (s_t, a_t, r_t, s_{t+1}) is added to the experience replay memory as a data sample. During training, the model learns from batches consisting of both new data and randomly sampled old data from the replay memory in order to adjust its policy and update the Q-network parameters. Samples are stored sequentially in the experience replay memory according to their time order; if the memory is full, new samples overwrite the oldest ones. The experience replay memory uniformly and randomly samples a batch from the cached samples for the model to learn from. This ensures that each training batch typically comes from multiple interaction sequences, reducing data correlations and improving sample utilization.
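    A minimal Python sketch of such a replay memory, assuming fixed capacity with oldest-first overwriting and uniform random sampling (class and method names are illustrative), is:

    import random
    from collections import deque

    class ReplayMemory:
        """Fixed-size buffer of (state, action, reward, next_state) transitions."""
        def __init__(self, capacity):
            self.buffer = deque(maxlen=capacity)    # oldest samples are overwritten when full

        def push(self, state, action, reward, next_state):
            self.buffer.append((state, action, reward, next_state))

        def sample(self, batch_size):
            return random.sample(self.buffer, batch_size)   # uniform random mini-batch

        def __len__(self):
            return len(self.buffer)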

    After the model is constructed, 5000 randomly selected samples from the preprocessed botnet dataset are input for training until convergence. The entire working principle of the DQN botnet traffic detection model is illustrated in Fig. 7.

    Figure 7: DQN botnet detection model

    The overall pseudocode of the DQN detection model is as follows:
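    As an illustrative Python sketch (not the authors' exact pseudocode), the training loop described above can be outlined as follows; env, q_net, and target_net are placeholders for the traffic-classification environment and two PyTorch networks of identical architecture, and the hyperparameter values are illustrative.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    def train_dqn(env, q_net, target_net, n_actions, episodes=200, gamma=0.9,
                  lr=1e-3, epsilon=1.0, epsilon_min=0.05, epsilon_decay=0.995,
                  batch_size=64, capacity=10000, target_sync=100):
        """Illustrative DQN loop: epsilon-greedy interaction, replay memory, target sync."""
        optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
        memory = deque(maxlen=capacity)                   # experience replay buffer
        target_net.load_state_dict(q_net.state_dict())    # start with identical weights
        step = 0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.randrange(n_actions)
                else:
                    with torch.no_grad():
                        action = int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())
                next_state, reward, done = env.step(action)   # reward: +5 / -5 / time penalty
                memory.append((state, action, reward, next_state))
                state = next_state
                # learn from a random mini-batch of stored transitions
                if len(memory) >= batch_size:
                    s, a, r, s2 = zip(*random.sample(memory, batch_size))
                    s = torch.as_tensor(s, dtype=torch.float32)
                    a = torch.as_tensor(a, dtype=torch.int64)
                    r = torch.as_tensor(r, dtype=torch.float32)
                    s2 = torch.as_tensor(s2, dtype=torch.float32)
                    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                    with torch.no_grad():
                        target = r + gamma * target_net(s2).max(dim=1).values   # Eq. (4) target
                    loss = nn.functional.mse_loss(q_sa, target)
                    optimizer.zero_grad(); loss.backward(); optimizer.step()
                step += 1
                if step % target_sync == 0:               # copy weights to the target network
                    target_net.load_state_dict(q_net.state_dict())
            epsilon = max(epsilon_min, epsilon * epsilon_decay)   # decay exploration over time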

    At this point, the construction of the botnet traffic detection model based on the DQN algorithm is complete, and it will be compared with the DDQN model constructed next.

    3.3.3 DDQN Model

    The DQN algorithm tends to overestimate Q-values, resulting in detection performance on the test set that is worse than on the training set despite high Q-values. Examining the algorithms of Q-learning and DQN, it can be observed that the max operation is used in Eqs. (3)–(5), which idealizes the selection and evaluation of an action value. However, when function approximation and noise are present, this operation consistently yields overestimated results. To address this issue, the DDQN algorithm decouples the two Q-value functions, updating them randomly and using each other’s experience to update the network weights θ and θ⁻. By separating action selection from action evaluation, DDQN avoids the overestimation problem. The formula is shown as Eq. (7).
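    Consistent with the description below, where the first network selects the action and the second evaluates it, Eq. (7) takes the standard DDQN form

    Y_t = r + \gamma \, Q\!\left( s', \arg\max_{a} Q(s', a; \theta); \theta^{-} \right)   (7)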

    In this equation, DDQN uses the maximum Q-value from the first network to select an action and then employs the second network to evaluate the selected action. This effectively avoids inaccurate Q-value estimation caused by noise and function approximation. DDQN still uses a greedy policy to learn the estimated Q-values but evaluates that policy using the second set of weight parameters, θ⁻. The exchange of θ and θ⁻ is performed continuously to update the model. Compared to DQN, DDQN introduces relatively minor changes; however, it effectively reduces the impact of noise on DQN, giving the botnet traffic monitoring model more accurate detection capability and preventing premature convergence during training.

    Since the early stages of the algorithm are consistent with DQN, this paper adapts the DQN pseudocode for the implementation. The code is as follows:
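    As an illustrative sketch (not the authors' exact code), the only change relative to the DQN loop above is how the learning target is computed: the online network selects the action and the target network evaluates it, following Eq. (7). Assuming PyTorch tensors as in the earlier sketch:

    import torch

    def ddqn_target(q_net, target_net, r, s2, gamma):
        """Double DQN target: select the action with q_net, evaluate it with target_net."""
        with torch.no_grad():
            best_actions = q_net(s2).argmax(dim=1, keepdim=True)           # action selection
            evaluated = target_net(s2).gather(1, best_actions).squeeze(1)  # action evaluation
        return r + gamma * evaluated                                       # Eq. (7)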

    The working principle of DDQN is shown in Fig. 8.

    Figure 8: DDQN botnet detection model

    With that,the construction of the botnet traffic detection based on the DDQN algorithm is complete.

    4 Analysis of Experimental Results

    4.1 Data

    The dataset was processed in Section 3.1. A random sample of 5000 data records was selected, in proportion to the class distribution, for the experiments. The quantities and labels of the selected dataset are shown in Table 8.

    Table 8: Randomly selected botnet dataset samples

    4.2 Model Assessment

    In this study, an additional set of 5000 samples was selected from the dataset as the test set, using the same method described earlier. Two approaches were employed for feature selection: one using the OneR classifier and the other randomly selecting 10-dimensional features. These features were then separately input into the DQN and DDQN detection models. The Q-values of DDQN and DQN were compared, followed by a comparison of their precision and accuracy. Based on these observations, the conclusions of the experiment were drawn. The formulas for calculating accuracy and precision are as follows:
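    In standard form, with the quantities defined below, these are

    \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (8)

    \mathrm{Precision} = \frac{TP}{TP + FP}   (9)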

    In Eqs. (8) and (9), TP (true positive) refers to the positive samples correctly predicted as positive by the model. TN (true negative) refers to the negative samples correctly predicted as negative. FP (false positive) refers to the negative samples incorrectly predicted as positive. FN (false negative) refers to the positive samples incorrectly predicted as negative.

    4.2.1 Model Comparison with OneR Classifiers in CIC-Collection Test Set

    In this experiment, the 10 features selected by the OneR classifier, as described in Section 3.2, are used for training. The comparison of Q-values between the models is shown in Fig. 9.

    Figure 9: Comparison of Q-values

    It can be observed that, compared to DQN, DDQN does not change its policy as frequently, resulting in less fluctuation in Q-values. Although DDQN does not reach DQN’s highest score (4.3241), it also avoids DQN’s lowest score (-3.1224). Overall, DDQN improves the stability of the model’s predictions.

    Next, the LSTM detection model is introduced and compared with the two models to observe their judgment capabilities. The comparison of the accuracy and precision of the three models is shown in Fig. 10. Since all values for the three models are above 0.5, the ordinate range of the line chart in Fig. 10 is set between 0.5 and 1.

    From the figure, it can be seen that although DDQN initially had lower accuracy than DQN on the test set, owing to its dual Q-value characteristic its accuracy after convergence was significantly higher, and DDQN completely outperformed DQN in accuracy. From these two evaluation indicators, it can be concluded that DDQN has significantly stronger stability and adaptability than DQN. The difference between the two models and LSTM is not significant, and in terms of precision, DDQN and DQN are better than the LSTM detection model after convergence.

    Figure 10: Comparison of accuracy and precision between DQN detection model and DDQN detection model

    However, the test set alone cannot directly reflect the advantages of the deep reinforcement learning DDQN and DQN detection models over the deep learning LSTM detection model, nor can it fully reflect the superiority of the DDQN detection model over the DQN detection model, because the test set is ultimately narrow and limited. Therefore, in the next experiment, the CIC-IDS2017 dataset mentioned in Section 3.3.1, which has more data samples and is more complex and confusing, is used to compare the detection results of the three models again and draw conclusions.

    4.2.2 Model Comparison with OneR Classifiers in CIC-IDS2017 Datasets

    In this experiment, the features provided by the OneR classifier in Section 3.2 are used directly for training and testing, with the aim of verifying the detection ability of the three models on an unfamiliar dataset under the same batch of features. The comparison of Q-values between the models is shown in Fig. 11.

    It can be observed that despite the change of dataset, the trend of the Q-value curve is consistent with the previous test: it does not exceed the best case of DQN, but neither does it fall below DQN’s worst case. DDQN still performs more stably than DQN.

    Next, the LSTM detection model is again introduced and compared with the two models to observe their judgment capabilities. The comparison of the accuracy and precision of the three models is shown in Fig. 12.

    It can be observed that after changing the dataset, the accuracy and precision of all three models decreased, with LSTM’s accuracy dropping to 80.42% and DQN’s dropping to 83.94%. Although DDQN’s accuracy also decreased, it still maintained 90.23%. In terms of precision, the DDQN detection model is likewise better than the DQN and LSTM models. Therefore, it can be concluded that deep reinforcement learning has stronger adaptability than deep learning, and because DDQN uses double Q-values, its adaptability and stability are greatly improved over the original DQN.

    Figure 11: Comparison of Q-values between DDQN and DQN

    5 Conclusion

    This paper proposes a deep reinforcement learning method based on Double DQN for botnet traffic detection. Compared to a DQN model built in the same way, the DDQN model improves the stability of traffic detection by decoupling action selection from action evaluation, which allows it to maintain good detection capability in various environments. Experimental results demonstrate that feature selection using the OneR classifier enhances the detection capability of the model. Additionally, the stability of the DDQN model enables it to adapt better to variations across different datasets. Regardless of whether feature selection is performed with the OneR classifier, the detection model constructed with DDQN outperforms the DQN detection model in both accuracy and precision. However, due to the parameter exchange between the target network and the main network in DDQN, the model is more complex, requiring greater time and space costs than DQN. Future research may explore other deep reinforcement learning algorithms that reduce these time and space costs. Furthermore, besides the OneR classifier, other classifiers can be explored to select features and train improved botnet traffic detection models.

    Acknowledgement: The authors gratefully acknowledge the anonymous reviewers for their valuable comments. It is precisely these suggestions that have made this study more complete and professional.

    Funding Statement: This study was supported by the Liaoning Province Applied Basic Research Program, 2023JH2/101600038.

    Author Contributions: Yutao Hu was responsible for writing the article and, together with Xiangyu Ma, completed the entire experimental part of the article. Yuntao Zhao checked the feasibility and standardization of the article and communicated with the journal editor as the corresponding author. Yongxin Feng approved this study and provided contacts and financial support.

    Availability of Data and Materials: Not applicable.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
