
    Optimal control strategy for COVID-19 concerning both life and economy based on deep reinforcement learning

    2021-12-22 06:48:20
    Chinese Physics B, 2021, Issue 12

    Wei Deng(鄧為), Guoyuan Qi(齊國元), and Xinchen Yu(蔚昕晨)

    1 Tianjin Key Laboratory of Advanced Technology in Electrical Engineering and Energy, School of Control Science and Engineering, Tiangong University, Tianjin 300387, China

    2 School of Mechanical Engineering, Tiangong University, Tianjin 300387, China

    Keywords: COVID-19, SIHR model, deep reinforcement learning, DQN, secondary outbreak, economy

    1. Introduction

    As of April 13, 2021, the number of diagnosed cases of COVID-19 worldwide had reached 137 941 696, and at least 2 967 745 individuals had died from the virus since the first report in December 2019.[1] According to the research,[2] the new coronavirus is highly contagious, has a relatively low case fatality rate, and has a long asymptomatic infection period. Infected individuals in the incubation period can infect healthy people without showing any symptoms.[3] Therefore, the most effective measures to prevent the rapid spread of COVID-19 are nucleic acid detection, isolation, and travel tracing.[4] However, extreme blockade measures have disastrous consequences for the economy. The quarantine policy may be an effective short-term measure, but an indefinite quarantine before a vaccine is released and put on the market on a large scale will prevent billions of people in the world from earning income, especially in countries with a more vulnerable economy, leading to an increase in the mortality rate of low-income people,[5] especially children.[6]

    Dynamic and mathematical models that simulate the spread of diseases can guide government policymakers in mitigating the detrimental consequences of an epidemic.[7] Many researchers have analyzed and predicted the spread of the epidemic by adopting improved Susceptible-Exposed-Infectious-Recovered (SEIR) models.[8–11] Fang et al. simulated the transmission of COVID-19 and the impact of quarantine measures on the epidemic.[8] Mandal et al. established the Susceptible-Exposed-Quarantined-Infectious-Recovered (SEQIR) model in Ref. [9], and formulated reliable epidemic prevention and control measures through optimal control methods. Huang et al. studied the consequences of relaxing control measures in Spain.[10] Yu et al. proposed the SIHR model, in which the parameters were designed as piecewise functions of lockdown time, and studied the possible secondary outbreaks after India loosened control.[11] Wang et al. proposed a novel epidemic model based on two-layered multiplex networks to explore the influence of positive and negative preventive information on epidemic propagation.[12] Huang et al. proposed a new vaccination update rule on complex networks to discuss the role of vaccine efficacy in vaccination behavior.[13] Rong et al. studied the dependence of the basic reproduction number on the model parameters.[14] Cui et al. studied individuals' effective preventive measures against epidemics through reinforcement learning.[15] Tong et al. adopted agent-based simulation to assess disease-prevention measures during pandemics.[16] Some researchers have also adopted machine learning to predict the spread of COVID-19, but have not considered epidemic control.[17–19]

    In the literature above and in the latest research on COVID-19, the economy is not considered in SEIR-type models. Under the economic pressure caused by strict quarantine measures, some countries have pursued a balance between epidemic prevention and control and economic recovery. To accurately predict the spread of COVID-19 and evaluate consequences beyond the epidemic itself, the model must consider how quarantine measures may affect the economy.[20–24] However, to the best of our knowledge, there has been no model that addresses both protecting people's lives and the economic development that affects people's welfare. We can regard the control of the epidemic and the development of the economy as an optimal control problem.

    Deep reinforcement learning (RL) is a machine learning technique that combines the perception ability of deep learning with the decision-making ability of RL. Compared with traditional decision-making optimization algorithms, RL can realize model-free self-learning of high-dimensional mappings from state to action. RL is widely used in self-driving, optimal scheduling, path planning, and other fields to solve optimal control problems.[25–27] Mnih et al.[28] introduced the Deep Q-Network (DQN), which combines deep neural networks with RL. The DQN is an effective method of deep RL. Compared with traditional RL, the DQN can effectively improve learning efficiency in situations where the state space is too large or the environment is unknown. Balancing the restraint of the COVID-19 epidemic against economic development is a problem of decision-making and policy optimization. Therefore, choosing the advanced DQN method to derive an optimal policy is of great value and necessity. At present, most research on COVID-19 has been devoted to analyzing and predicting the development trend of the pandemic. In our literature search, however, we have not found an optimal strategy for economic development and epidemic prevention and control obtained using deep RL.

    In this paper, the SIHR model is adopted to simulate the spread of the epidemic, aiming to study the development of COVID-19 at different stages. The contribution and innovation of this paper are as follows.

    (i) An economic model affected by epidemic isolation measures is established. The development of the epidemic can be roughly divided into five stages, according to the government’s response measures and the trend of newly diagnosed cases. The effective reproduction number and the eigenvalues at the equilibrium point are introduced to verify the effectiveness of the model.

    (ii) Based on the deep reinforcement learning method of DQN, the blocking policy that maximizes the economy while keeping the number of infections as low as possible is studied. The abilities of different countries to resist economic risks are simulated by adjusting the reward coefficient. From this, the optimal control policy of different countries is formulated.

    The remainder of this paper is organized as follows. In Section 2, the deep RL based on the DQN is introduced. In Section 3, a training experiment of deep RL based on the SIHR-based compartment model is designed. Section 4 studies the optimal policy in different conditions and adopts the optimal policy at different time points. In Section 5, a summary is made.

    2. Deep reinforcement learning

    Deep RL is a machine learning technique that combines the perception ability of deep learning with the decision-making ability of reinforcement learning.[29] Figure 1 shows the general framework of deep RL. The deep neural network obtains target observation information from the environment and provides state information. The RL takes environmental feedback as input and returns a policy that maximizes the time-discounted expected future rewards.

    Fig.1. Deep reinforcement learning framework.

    2.1. Markov decision process

    The government's policy on COVID-19 can be approximately modeled as a Markov decision process (MDP). In a Markov process, we assume that the government does not fully understand its situation or what measures should be taken next. It only considers its current state and takes an action leading to a new state. The MDP usually consists of four parts: $O$ (observation state space), $A$ (set of possible actions), $P$ (transition probabilities), and $V$ (set of reward values). At the state $o_t$, the government takes the action $a_t$ and transfers from the current state to the next state $o_{t+1}$ with probability $p$. Finally, the government gets a reward $v_t$ for its action. This process can go on, or it can stop at a terminating state.

    The strategy $\pi$ represents the probability distribution over the actions $A$ in each state of $O$. The goal of RL is to find an optimal economy-life balanced strategy $\pi^*$ that maximizes the cumulative reward $V^\pi$ through continuous interaction with the environment

    As the system environment changes, the method of calculating cumulative rewards also changes. In round tasks such as formulating a policy over a period of time, we usually use the $T$-step cumulative reward,

    where $E_\pi[\cdot]$ is the expectation under the strategy $\pi$.

    2.2. Calculation of value function

    For a strategy, the value function predicts the cumulative reward that the government policy will obtain in the future based on the current state, which brings great convenience to RL. For the $T$-step cumulative reward, given the current state $o$ and action $a$, the state-action value function is the long-term reward expectation generated under the guidance of the strategy $\pi$, which can be defined as

    From this, we can get the Bellman equation

    We can see that the state-action value function can be expressed in a recursive form.

    For all state-action pairs, there is an optimal strategy $\pi^*$ that obtains the maximum expected return value. The strategy $\pi^*$ is called the optimal strategy that can balance economic recovery and epidemic prevention and control, and its state-action value function can be defined as

    The Bellman equation changes to
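    For reference, the quantities described in Sections 2.1 and 2.2 can be written in one common textbook form, sketched below. This is an assumption rather than the authors' own displays: it uses the averaged $T$-step convention, with $P^{a}_{o\to o'}$ denoting the transition probability and $V^{a}_{o\to o'}$ the reward received when action $a$ moves the system from $o$ to $o'$.

```latex
\[
\begin{aligned}
\pi^{*} &= \arg\max_{\pi} V^{\pi}, \\
V^{\pi}_{T}(o) &= E_{\pi}\Big[\tfrac{1}{T}\sum_{t=1}^{T} v_t \,\Big|\, o_0 = o\Big], \\
Q^{\pi}_{T}(o,a) &= \sum_{o'} P^{a}_{o\to o'}
  \Big(\tfrac{1}{T}\,V^{a}_{o\to o'} + \tfrac{T-1}{T}\,V^{\pi}_{T-1}(o')\Big), \\
Q^{*}_{T}(o,a) &= \sum_{o'} P^{a}_{o\to o'}
  \Big(\tfrac{1}{T}\,V^{a}_{o\to o'} + \tfrac{T-1}{T}\,\max_{a'} Q^{*}_{T-1}(o',a')\Big).
\end{aligned}
\]
```

    The last two lines show the recursion: the value over $T$ steps is built from the immediate reward and the value over the remaining $T-1$ steps.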

    2.3. Deep Q network algorithm

    When the state space of the environment is vast, or the model is unknown, it is too costly for the government to obtain the value function using state transition functions or tables. It is necessary to approximate the value function through a nonlinear function approximator such as a deep neural network. This nonlinear function approximator can effectively store the experience accumulated by the government in adopting different policies. Equation (7) shows the updating process of the $Q$ function in table format,

    The DQN algorithm uses a deep neural network to approximate the $Q$ function, and equation (8) shows the updating process of its value function,

    where $\alpha$ is the learning rate, and $w$ is the weight of the neural network.
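    As an illustration of the table-format update, the sketch below applies one step of the standard Q-learning rule. It is an assumption that this standard rule matches the authors' update; the function name `q_update`, the discount factor, the learning-rate value, and the state labels are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, lr=0.1, discount=0.9):
    """One tabular Q-learning step: move Q(state, action) toward the TD target.

    lr plays the role of the learning rate; discount is an assumed value.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + discount * best_next
    q[(state, action)] += lr * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)           # unseen (state, action) pairs start at 0
actions = ["l1", "l2", "l3", "l4"]     # the four isolation rates, used as labels
new_value = q_update(q_table, "s0", "l1", reward=1.0,
                     next_state="s1", actions=actions)
```

    A DQN replaces the table `q_table` with a neural network and turns the same temporal-difference error into a loss that is minimized over the network weights.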

    When training the neural network, we use the mean square error to define the error function

    To get the maximum $Q$ value, we use the stochastic gradient descent method to update the parameters. We get the optimal strategy based on

    In the DQN training process, selecting and evaluating actions with the same target value network leads to overestimation of the $Q$ value during learning, which causes larger errors in the result. Double DQN uses two groups of neural networks with the same structure but different parameters. The online network is used to select the action corresponding to the maximum $Q$ value, and the target network is used to evaluate the $Q$ value of that action. The target formula is as follows:

    By using two sets of neural networks, double DQN separates action selection from strategy evaluation. In this way, we can estimate the $Q$ value more accurately and improve the speed of convergence.
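    The separation of action selection from evaluation can be sketched in a few lines. This is a minimal illustration of the standard double DQN target, not the authors' implementation; the function name, the discount factor, and the example $Q$ values are hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, discount=0.9, done=False):
    """Select the next action with the online network, evaluate it with the target network.

    next_q_online and next_q_target are the Q values of the next state
    predicted by the online and target networks (one entry per action).
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + discount * next_q_target[best_action]


online = [0.2, 0.9, 0.4, 0.1]   # hypothetical online-network outputs
target = [0.3, 0.5, 0.8, 0.2]   # hypothetical target-network outputs
y = double_dqn_target(1.0, online, target)
```

    In this example the online network picks the second action, so the target uses 0.5 from the target network instead of its own maximum 0.8, which is how the overestimation is reduced.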

    3. System model and scene construction

    3.1. Epidemic model and economic model

    The SIR dynamic model was first used in 1927 for studying the Black Death.[30] SIR-like models have been widely adopted to simulate the spread of various infectious diseases. To simulate the spread of COVID-19 in different stages, we adopt the SIHR model[11] and add the isolation rate related to government quarantine measures. On this basis, we also establish an economic model affected by the quarantine measures. The following assumptions are needed:

    i) The community population is a closed system.

    ii) Everyone in the population is susceptible.

    iii) All the infected individuals enter the hospital for treatment.

    iv) No one in the population is vaccinated.

    v) The impact of virus mutation on the transmission rate is ignored.

    The total population $N$ is composed of the susceptible individuals $S$, the infected individuals $I$ (latent individuals and those capable of spreading the coronavirus), the hospitalized individuals $H$ (patients diagnosed by the hospital), the recovered individuals $R$ (immune to the coronavirus), and the dead individuals $D$. A schematic description of the model is depicted in Fig. 2.

    Fig.2. Flow diagram of the dynamic system of COVID-19.

    Some susceptible individuals $S$ become infected through contact with the infectious individuals $I$ (inflow to $I$), and the transmission rate $\alpha(t)$ indicates the probability that an infector transmits the disease to a susceptible individual. $l$ represents the isolation rate, which reflects the measures mandated by the government and how well people in the closed region comply with them. The higher $l$ is, the weaker the isolation, and $l=0$ means the infectious route is completely cut off. $N$ is the total population and $N=S+I+H+R+D$. Due to limited diagnostic resources, only a portion of the infected can be diagnosed, so $\beta(t)$ indicates the probability of diagnosis. After being diagnosed, the patients are almost entirely isolated, so they do not transmit the disease to others. The diagnosed individuals receive treatment, and their number decreases through the recovery rate $\gamma(t)$ and the mortality rate $d(t)$ caused by the disease; the recovered individuals are not infected again if they have developed immunity. The cumulative diagnosed cases can be expressed by $C(t)=H(t)+R(t)+D(t)$. The following equations summarize the spread-prevention-infection dynamics model:

    where $l$, $\alpha$, $\beta(t)$, $\gamma(t)$, and $d(t)$ respectively represent the rates of isolation, transmission, diagnosis, cure, and death based on the infectious disease model. $\beta(t)$, $\gamma(t)$, and $d(t)$ are designed as Sigmoid cumulative functions $1/(1+e^{k(t-\tau)})$ composed of $k$, $\tau$, and $t$ in different stages. $k$ is usually positive in $\beta(t)$ and $\gamma(t)$ and negative in $d(t)$, which means that $\beta(t)$ and $\gamma(t)$ increase as $t$ increases, while $d(t)$ does the opposite. The parameter settings above were given in Ref. [11].
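    The compartment flows described above can be sketched as a daily update. The mass-action form used here (new infections $l\alpha SI/N$, diagnoses $\beta I$, recoveries $\gamma H$, deaths $dH$) is an assumption read off the prose, not a copy of the authors' equations. The rates are held constant for brevity, and the values of `gamma` and `d` are hypothetical.

```python
def sihr_step(state, l, alpha, beta, gamma, d, dt=1.0):
    """Advance (S, I, H, R, D) by one Euler step of the assumed SIHR dynamics."""
    S, I, H, R, D = state
    N = S + I + H + R + D
    new_infected = l * alpha * S * I / N * dt   # S -> I, scaled by the isolation rate l
    new_diagnosed = beta * I * dt               # I -> H
    new_recovered = gamma * H * dt              # H -> R
    new_dead = d * H * dt                       # H -> D
    return (
        S - new_infected,
        I + new_infected - new_diagnosed,
        H + new_diagnosed - new_recovered - new_dead,
        R + new_recovered,
        D + new_dead,
    )


# Initial condition taken from the example in Section 3.2.
state = (65_499_400.0, 500.0, 100.0, 0.0, 0.0)
for day in range(30):
    state = sihr_step(state, l=0.4, alpha=0.5, beta=0.10, gamma=0.05, d=0.01)
```

    Every flow leaves one compartment and enters another, so the total population stays constant, consistent with the closed-system assumption.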

    In the economic model, the population's production is affected by the lockdown measures. Unlike economic indicators such as gross domestic product (GDP), we only consider the wealth created by individuals, not the economic growth brought about by consumption. In our simulation, the population can be divided into two types: those whose productivity is highly damaged by quarantine and those whose productivity is less damaged. The total economic output is the sum of the outputs of all the individuals in the environment minus the medical expenses for treating patients. Individuals who are not isolated have normal productivity, isolated individuals lose a high percentage of their productivity (represented by $\eta$), dead individuals have no productivity, and hospitalized individuals have no productivity and pay for treatment. The following economic output $G$ per capita is proposed:

    where $\eta$ and $\mu$ represent the reduced productivity per capita and the average treatment expense, respectively.

    3.2. Indicators of controllability and stability of spread

    In terms of the controllability of the epidemic, the basic reproduction number ($R_0$) measures the probability of the disease being transmitted through a naive population in the initial stage (Rong et al., 2020). A real-time indicator of the spread risk and the controllability of the spread is the effective reproduction number $R_e(t)$.[31] In Eq. (12), $R_e(t)$ can be expressed as

    where $-\Delta S(t)$ and $\Delta C(t)$ represent the net newly infectious individuals and the net newly diagnosed infections.

    From the perspective of the stability of the SIHR model, we solve the equilibrium point of the model (12) as $(S^*,0,0,R^*,D^*,C^*)$, where $S^*$, $R^*$, $D^*$, $C^*$ can be any positive numbers less than $N$ that satisfy $N=S^*+R^*+D^*$ and $C^*=R^*+D^*$. Under the premise of considering the stability of the epidemic, we can modify the model (12) as

    and assume that $X=(S\ I\ H\ R\ D)$. Now equation (15) can be expressed as

    where $B$ represents the $5\times5$ matrix on the right-hand side of Eq. (15). The characteristic equation of $B$ at the equilibrium point can be expressed as

    Then we can obtain the following eigenvalues:

    Here we observe that $\lambda_1<0$, and we give a specific example to analyze the role of $\lambda_2$ and $R_e(t)$ in the spread of the epidemic. Suppose a closed area has 65 500 000 people, 500 unquarantined virus carriers, 100 diagnosed cases, and no deaths or recovered cases in the outbreak stage. Figure 3 shows the simulated results with fixed $\beta(t)=0.10$, $\alpha=0.5$, and varying $l$.

    From Fig. 3, we can observe that the newly diagnosed cases $\Delta C(t)$ show a single wave, while the corresponding $\lambda_2$ and $R_e(t)$ decline. Moreover, $\lambda_2>0$ and $R_e(t)>1$ indicate that the newly infected cases increase and exceed the newly diagnosed cases, which means that the risk of spread of the pandemic exists temporarily and the system (15) diverges. The bigger $\lambda_2$ and $R_e(t)$ are, the faster $\Delta C(t)$ grows. Conversely, $\lambda_2<0$ and $R_e(t)<1$ indicate the decline of $\Delta C(t)$, which means the epidemic is under control and the system will finally be stable. The smaller $\lambda_2$ and $R_e(t)$ are, the faster $\Delta C(t)$ declines. It is worth noting that in the case of $\lambda_2\equiv0$ and $R_e(t)\equiv1$, $\Delta C(t)$ is a constant, which also means the infected individuals $I$ will not increase further. Therefore, $\lambda_2$ and $R_e(t)$ accurately depict the stability and controllability of the system (15) and of the pandemic, and further support the effectiveness of the SIHR model. These results also indicate that the spread of the epidemic can be effectively influenced by the quarantine measures $l$, which is conducive to the establishment of the reward function.
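    The link between the two indicators can be checked numerically. The sketch below assumes the mass-action reading of the model, under which $R_e = l\alpha S/(N\beta)$ and $\lambda_2 = l\alpha S/N - \beta$; these closed forms are assumptions derived from that reading, and the three values of $l$ are hypothetical choices, not those used for Fig. 3.

```python
def spread_indicators(l, alpha, beta, S, N):
    """Return (R_e, lambda_2) under the assumed mass-action SIHR form."""
    infections_per_carrier = l * alpha * S / N
    r_e = infections_per_carrier / beta        # new infections per new diagnosis
    lambda_2 = infections_per_carrier - beta   # growth rate of I near the equilibrium
    return r_e, lambda_2


N = 65_500_000
S = N - 500 - 100   # population minus carriers and diagnosed cases
for l in (0.1, 0.2, 0.4):
    r_e, lam2 = spread_indicators(l, alpha=0.5, beta=0.10, S=S, N=N)
    print(f"l={l}: R_e={r_e:.3f}, lambda_2={lam2:+.4f}")
```

    Under these assumptions $R_e>1$ exactly when $\lambda_2>0$, which is why the two indicators cross their thresholds together.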

    Fig. 3. Simulated results with varying $l$ and fixed $\beta(t)=0.10$, $\alpha=0.5$: (a) the newly diagnosed cases, (b) effective reproduction number, (c) eigenvalues of the equilibrium point.

    3.3. Preconditions for RL training

    Due to the constraints of physical conditions, the degree of public cooperation, system time lag, and other factors, the following assumptions must be considered:

    (I) The government needs to formulate a long-term quarantine policy; it can change the isolation measures only after at least $N$ days.

    (II) The government needs to implement different quarantine measures to deal with the changing situation of the epidemic. The quarantine rates $l_1$, $l_2$, $l_3$, $l_4$ represent the quarantine measures after the gradual unblocking in the state of emergency.

    (III) The system is updated in days. The numbers of diagnosed cases, deaths, and recovered cases change with time $t$, and the smallest unit of time $t$ is a day.
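    Assumption (I) can be enforced with a small wrapper around the agent's choice. The sketch below is one possible way to do it, not the authors' code; the function name is hypothetical, the isolation rates 0.2 and 0.4 are placeholders, and the 20-day holding period is the value quoted in Section 4.1.

```python
def constrained_action(requested, current, days_held, min_hold_days):
    """Keep the current isolation rate until it has been in force for min_hold_days.

    Returns the rate actually applied today and the number of days it has been held.
    """
    if requested != current and days_held >= min_hold_days:
        return requested, 1          # the switch is allowed: restart the counter
    return current, days_held + 1    # otherwise the current rate stays in force


action, held = 0.2, 0
history = []
for day in range(25):
    # The agent asks for a looser rate every day; the wrapper delays the switch.
    action, held = constrained_action(0.4, action, held, min_hold_days=20)
    history.append(action)
```

    In this run the requested rate only takes effect on day 21, after the initial rate has been held for 20 full days.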

    3.4. Space and reward function

    When selecting state-space parameters, adding a highly informative piece of state information improves performance far more than other work does. Similarly, irrelevant interference information has a counterproductive effect. The impact of dead individuals and recovered individuals on the epidemic is minimal, so they are not used as state-space parameters. The state-space parameters include the susceptible individuals $S$, infectious individuals $I$, hospitalized individuals $H$, and time $t$. The observation state space is expressed as

    The action space includes the isolation rates $l_1$, $l_2$, $l_3$, $l_4$ corresponding to isolation measures of different strengths, with $l_1<l_2<l_3<l_4$.

    Besides, the isolation rate $l$ represents the action $a$ that the government can perform, which means $a=l$.

    The reward function penalizes an increase in the number of diagnosed cases and rewards the cumulative economic output. $v^-(s_t,a_t)$ ensures that the epidemic can be controlled, and $v^+(s_t,a_t)$ ensures the maximization of the cumulative gross production value. The reward function is expressed as

    where $v^+(s_t,a_t)=G$, $v^-(s_t,a_t)=\Delta C/N$, and $\varphi$ is the reward coefficient, representing the government's emphasis on the economy.
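    A minimal sketch of such a reward is given below. It assumes that the two terms are combined as the economic term minus the coefficient times the infection term; this combination is an assumption consistent with the later statement that a smaller coefficient favors the economy, and the numerical inputs are hypothetical.

```python
def reward(economic_output, new_diagnosed, population, phi):
    """Assumed form: v = v_plus - phi * v_minus, with v_plus = G and v_minus = dC / N."""
    v_plus = economic_output                 # per-capita economic output G
    v_minus = new_diagnosed / population     # newly diagnosed share of the population
    return v_plus - phi * v_minus


# Same day, two different weights on the infection penalty (hypothetical values).
r_low = reward(0.8, 5000, 65_500_000, phi=100.0)
r_high = reward(0.8, 5000, 65_500_000, phi=1000.0)
```

    With a larger coefficient the same number of new diagnoses costs more reward, so a trained agent is pushed toward stricter isolation.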

    3.5. Economy-life optimal algorithm

    In this paper, we propose a short-term economy-life optimal algorithm based on deep RL, and its overall flow is shown in Fig. 4.

    I) The original COVID-19 data are divided into several different stages to fit the SIHR model. The model is then used to provide training data for RL, which can simulate the development of the epidemic under different government policies. The better the model fits, the more reference value the optimal policy has.

    II) The optimal policy derived from RL is mainly affected by the reward function. The reward coefficient $\varphi$ represents the government's emphasis on the economy. Therefore, the optimal strategy for different countries can be formulated by adjusting $\varphi$.

    Fig.4. Algorithm flow chart.

    4. Experimental results and analysis

    Most countries are still suffering from the epidemic. COVID-19 is far from over until a vaccine is successfully developed and put on the market on a large scale. Therefore, it is significant to adopt deep RL to study the economic-epidemic balance policies of different countries. According to the government's response measures and the trend of newly diagnosed cases, the COVID-19 epidemic can be roughly divided into five stages:

    Stage I: Outbreak stage. At the beginning of the epidemic, the government ignored the severity of the epidemic. The number of new diagnoses increased rapidly.

    Stage II: Lockdown stage. The government implemented a strict isolation policy. The number of new diagnoses peaked and began to decline.

    Stage III: Gradual unblocking stage. The number of new diagnoses decreased further. The government began to unblock the city to gradually recover the economy.

    Stage IV: Buffer stage. During this stage, the number of infections remained at a low level, but there was still a risk of an outbreak.

    Stage V: Second or third outbreak stage. The number of new diagnoses increased again after reaching the bottom, and the epidemic broke out again.

    The stages of the epidemic in different countries are shown in Fig. 5. As shown in Fig. 5, China and Iceland entered the buffer stage early, and there has been no secondary outbreak. After entering the controllable stage of the epidemic, most European countries experienced a second outbreak.

    Fig.5. Stage of COVID-19 in different countries.

    Table 1. Fitting results of parameters.

    Fig. 6. Fitting curve and reported data, (a) cumulative confirmed cases, (b) newly diagnosed cases, (c) cumulative cured cases, and (d)cumulative dead cases.

    We notice that the Italian data are very representative. Therefore, we use data from different stages in Italy as the training data for the RL. Here, we fit the parameters $l$, $\alpha$, $\beta(t)$, $\gamma(t)$, and $d(t)$ by using the least-squares functions fmincon and lsqnonlin of MATLAB.[14] The Italian government began to vaccinate the people on December 27, 2020, and the number of vaccinated people (2 doses) reached 4 055 458 (6.8% of the population) by April 13, 2021.[32] To avoid the influence of vaccinated individuals, twenty sets of data from February 22 to November 10 in Italy are used to fit the model. Figure 6 shows the fitting curve and the reported data. The model parameters obtained by fitting the reported data are shown in Table 1.

    From Fig. 6, we can see that the development of the epidemic in Italy can also be roughly divided into the above five stages. The fitting results are excellent, and the curve fits the real data. As shown in Table 1, the transmission rate $\alpha$ is usually fixed in different stages of an epidemic, and only changes during the second or third outbreak stage. The quarantine rate $l$ represents the intensity of the government's policy in response to the epidemic. $l$ is different at each stage, but within a round of the epidemic it first declines and then rises. This phenomenon shows that the government always locks down cities when the epidemic is severe and releases the lockdown to restore the economy after the epidemic eases. The diagnosis rate $\beta(t)$ and the cure rate $\gamma(t)$ increase over time, while the mortality rate does the opposite. It is noted that in the second round of the epidemic, although $l$ is nearly unchanged and $\alpha$ is significantly lower than in the previous stage, a second outbreak still occurred. This phenomenon is due to the relaxation of vigilance by the government and the public during the second outbreak stage, resulting in a significant decrease in the diagnosis rate compared to the previous stage. Hidden virus carriers were not isolated, which led to a second outbreak.

    We adopted the coefficient of determination $R^2$ to evaluate the goodness of the fitting results;[11] the closer $R^2$ is to 1, the better the fitting results. $R^2$ can be expressed as

    where $y_i$, $\hat{y}_i$, and $\bar{y}$ represent the value of the reported data, the fitted value, and the average value of the reported data in Italy from February 22 to November 10. Table 2 shows the $R^2$ of the cumulative diagnosed cases, daily diagnosed cases, recovered cases, and dead cases. The mean of $R^2$ at different stages reached more than 0.84, and most values of $R^2$ reached more than 0.9 or even 0.99. These results indicate that our model fits the real data well, which is conducive to the training process of deep reinforcement learning and to arriving at an effective scheme.
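    The coefficient of determination can be computed directly from the reported and fitted series. The sketch below uses the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name and the sample numbers are illustrative only.

```python
def r_squared(reported, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_reported = sum(reported) / len(reported)
    ss_res = sum((y - f) ** 2 for y, f in zip(reported, fitted))
    ss_tot = sum((y - mean_reported) ** 2 for y in reported)
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # fitted equals reported
rough = r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])     # fitted deviates at both ends
```

    A perfect fit gives a value of 1, and the value drops as the residuals grow relative to the spread of the reported data.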

    Table 2. Goodness of fitting results.

    4.1. Control strategy during outbreak stage

    On the premise of controlling the spread of the epidemic, recovering the economy as much as possible has become a concern for the governments of many countries. We take the first day of the Italian government's lockdown (March 10) as the starting point and the following 90 days as one round, and assume that the government can take new quarantine measures only after at least 20 days. Based on the TensorFlow framework, a fully connected neural network with a three-part structure (input, hidden, and output layers) is designed as the $Q$-value network of the DQN. The input layer is a 5-dimensional feature tensor, including the susceptible individuals $S$, infectious individuals $I$, hospitalized individuals $H$, time $t$, and action $a$. The hidden part consists of five layers, each with 20 neuron nodes. The output layer is a 4-dimensional tensor, which represents the $Q$ values of the different actions ($l_1$, $l_2$, $l_3$, $l_4$). The memory buffer capacity is 10 000, and the random batch size is 64.
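    The memory buffer mentioned above can be sketched with the standard library alone. This is an illustrative implementation, not the authors' code: the class name `ReplayBuffer` and the transition layout are hypothetical, while the capacity of 10 000 and the batch size of 64 are the values stated in the text.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience memory; the oldest transitions are dropped first."""

    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random minibatch, drawn without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer()
for step in range(12_000):
    # Dummy transitions; a real state would hold (S, I, H, t).
    buffer.push((step,), 0, 0.0, (step + 1,), False)
batch = buffer.sample()
```

    Sampling uniformly from past transitions breaks the day-to-day correlation of an epidemic trajectory, which is the usual reason a DQN is trained from such a buffer.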

    We use the $\varepsilon$-greedy strategy to train the agent,[22] which helps the government obtain a better strategy. The agent randomly explores actions $a$ in the initial stage, and gets the corresponding reward value $v$ after performing the action $a$ to update the $Q$ value of Eq. (3). As the training progresses, it gradually replaces random exploration with network predictions. The agent selects the action $a=l$ with the maximum $Q$ value of Eq. (6) in the output layer of the neural network and sends it to the SIHR model of Eq. (12) as the isolation rate at the next moment.
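    The shift from random exploration to network predictions can be sketched as follows. The linear annealing schedule, its start and end values, and the numerical isolation rates are assumptions made for illustration; the text does not specify them.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the argmax of q_values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(episode, start=1.0, end=0.05, decay_episodes=5000):
    """Linearly anneal epsilon from start to end over decay_episodes (assumed schedule)."""
    fraction = min(episode / decay_episodes, 1.0)
    return start + fraction * (end - start)


isolation_rates = [0.1, 0.2, 0.3, 0.4]   # hypothetical values for l1..l4
action_index = epsilon_greedy([0.3, 1.2, 0.7, 0.1], epsilon=0.0)
chosen_rate = isolation_rates[action_index]
```

    With `epsilon` at zero the choice is purely greedy, so the index of the largest $Q$ value is returned and mapped to the isolation rate passed on to the epidemic model.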

    Figure 7 shows the agent's performance after 6500 episodes of training (each episode covers the 90 days after the initial date). In each episode, the agent made 90 action choices and updated the parameters in the neural network. The abscissa represents the number of training episodes, and the ordinate represents the rewards obtained by the agent in each episode. The result shows that as the number of training rounds increases, the rewards obtained by the agent converge and become steady, which indicates that the agent has already acquired some intelligent features.

    Figure 8 shows the optimal control strategy and epidemic development trend obtained by the agent after 20 000 episodes of training. We provide four isolation rates, as shown in Fig. 8(a), corresponding to the government's isolation measures in different periods. The agent decides which isolation rate to select according to its training. Figures 8(b), 8(c), and 8(d) then show the newly diagnosed cases, the cumulative diagnosed cases, and the economic output after the government took quarantine measures using the deep RL.

    Fig. 8. Impact of government's control after March 11: (a) isolation rate based on isolation measure, (b) newly diagnosed cases, (c) cumulative diagnosed cases, and (d) economic output compared to the pre-epidemic period.

    Fig.7. Training process.

    Figure 8(a) shows the optimal selection obtained by the deep RL training. From Figs. 8(a) and 8(b), in the outbreak stage when the newly diagnosed cases are increasing, the agent tends to adopt the most stringent isolation measures in the early stage of the epidemic, since it selects $l_1$, the smallest isolation rate, at that time. After the epidemic is basically controlled, the agent recommends gradually lifting the lockdown measures to recover the economy. In the unblocking process, the isolation rate rises from $l_1$ to $l_2$, then skips $l_3$ and rises directly to $l_4$. The decrease in the number of newly diagnosed patients slowed down, and after the second release the number of newly diagnosed people rose slightly. However, as the government stepped up virus detection measures, the number of newly diagnosed people continued to decrease. In Fig. 8(d), we can see that as the government gradually relaxes the isolation measures, the economic growth rate also increases.

    These results indicate that the government should immediately adopt the most severe isolation measures in response to a rapidly spreading epidemic. In the process of gradual unblocking, the time and degree of unblocking not only affect the speed of economic recovery, but also determine whether there will be a second outbreak in the future. After accumulating experience through thousands of training episodes, the RL can formulate effective prevention and control strategies for the epidemic.

    4.2. Control strategy in different situations in outbreak stage

    Considering the differences in the industrial structure and economic risk resistance of different countries, overly strict isolation measures may bring greater risks to economically vulnerable countries. Therefore, the epidemic prevention and control policy should be adapted to the conditions of each country.

    The government's concern for the economy is directly expressed by the reward coefficient $\varphi$ in the reward function. The reward coefficient affects the weight of the economy in the reward function: the smaller $\varphi$ is, the more the policy favors the economy. We take three different reward coefficients $\varphi_1$, $\varphi_2$, $\varphi_3$ ($\varphi_1<\varphi_2<\varphi_3$).

    Figure 9(a) compares the isolation rates corresponding to the optimal control scheme under different values of $\varphi$. The smaller $\varphi$ is, the more emphasis is placed on recovering the economy, and the earlier the first unblocking and the gradual unblocking take place. In the case of $\varphi=\varphi_3$, the degree of unblocking is more conservative. Figures 9(b) and 9(c) show the trends of the new diagnoses and total diagnoses under different strategies. Compared with the reward coefficient $\varphi_3$, the final cumulative diagnosed cases for $\varphi_1$ and $\varphi_2$ increased by 107.47% and 6.67%, respectively, and the cumulative dead cases increased by 9.59% and 0.65%, respectively. As $\varphi$ decreases, the isolation measures become more relaxed, so the newly diagnosed cases and the cumulative diagnosed cases increase. A second outbreak occurred for $\varphi=\varphi_1$, which is the consequence of striving to recover the economy in the short term. Figure 9(d) shows the trend of economic output under different strategies. Compared with the reward coefficient $\varphi_3$, the final economic output for $\varphi_1$ and $\varphi_2$ increased by 8.17% and 18.95%, respectively. With the decrease of $\varphi$, the average isolation rate decreases, which means more people are engaged in production activities, and the cumulative gross product value increases. Table 3 compares the specific data. Compared with $\varphi_2$, the final economic output for $\varphi_1$ is not much higher, and it comes at the huge price of many more hospitalized people and deaths. Part of the reason is that the second outbreak led to more diagnosed cases and medical expenses.

    Fig. 9. Impact of government's control after March 11 for different $\varphi$: (a) isolation rate based on isolation measure, (b) newly diagnosed cases, (c) cumulative diagnosed cases, and (d) economic output compared to the pre-epidemic period.

    Table 3. Comparison of data of different reward coefficients on March 11.

    These results show that the epidemic control strategies given by the trained agent differ with the reward coefficient ?. The smaller ? is, the weaker the country’s ability to resist economic risks, and the more weight the short-term economy receives when policies are formulated. Unblocking then comes earlier and is more intense, which leads to an increase in diagnosed cases and even to a second outbreak. However, economically biased policies can only reduce economic losses in the short term. In the long term, looser policies lead to more diagnosed cases and deaths, and a higher probability of recurrence prolongs the epidemic, which delays the economic recovery.
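The trade-off described above can be illustrated with a deliberately simplified sketch. It is not the reward function or the DQN agent used in this study: the action set `ISOLATION_LEVELS`, the linear economic gain, the quadratic case penalty, and the parameter `weight` (standing in for a reward coefficient that multiplies the epidemic penalty) are all assumptions made only for illustration. The sketch picks the single-step action with the highest toy reward, which is enough to show how a smaller coefficient favors looser isolation.

```python
# Toy illustration only: hypothetical one-step reward, not the paper's DQN reward.
ISOLATION_LEVELS = [0.0, 0.3, 0.6, 0.9]  # assumed discrete isolation rates (actions)


def toy_reward(isolation, weight):
    """Economic gain minus a weighted epidemic penalty (both forms are assumptions)."""
    active = 1.0 - isolation       # share of the population still in production
    economic_gain = active         # assumed: output proportional to the active share
    case_penalty = active ** 2     # assumed: new cases scale with contacts squared
    return economic_gain - weight * case_penalty


def best_isolation(weight):
    """Return the isolation level with the highest toy reward for a given weight."""
    return max(ISOLATION_LEVELS, key=lambda iso: toy_reward(iso, weight))


if __name__ == "__main__":
    for w in (0.2, 1.0, 5.0):
        print(f"weight={w}: preferred isolation rate = {best_isolation(w)}")
```

In this toy setting, a small weight selects no isolation, while larger weights select progressively stricter isolation, which mirrors the qualitative trend reported for the different reward coefficients.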

    The above policies have one thing in common: the government implemented lockdown measures in the early stage of the outbreak to avoid the heavy medical expenses and deaths caused by a growing number of diagnosed cases. Now assume instead that, to maintain the economy, the government did not lock down the city and adopted only the minimal quarantine measure l2 within 90 days after March 11. Figure 10 compares the economic growth curve of this policy with that of the optimal strategy recommended by the agent. Although loose quarantine measures achieve rapid economic growth in the short term, medical expenditures rise as the numbers of diagnosed cases and deaths increase further. The growth of the economy slowed and reached an inflection point on April 24, which means that, because the carrying capacity of the medical system is not considered, most of the population in the environment had been diagnosed and hospitalized. They were unable to work, the medical expenses exceeded the economic output of the whole society, and the economy began to shrink. After the epidemic was basically controlled, the economy under the negative policy began to grow again, but significantly more slowly than under the optimal strategy. Compared with the optimal policy given by the deep RL, the economic output decreased by 37.8% under the negative policy in which the government adopted the minimal quarantine measure l2.

    Fig. 10. Economic output curve under different control strategies.

    The above results indicate that, whether the goal is to ensure economic growth or to control the spread of the epidemic, the strictest isolation measures should be taken during the outbreak stage, when the newly diagnosed cases are increasing rapidly. Once the epidemic is under control, gradual unblocking will help the economy recover.

    4.3. Public policy in different time points of the second outbreak stage

    Due to the economic pressure caused by the long-term lockdown, European countries gradually lifted their lockdowns after the epidemic was basically under control. However, some virus carriers remained in the environment. The epidemic is far from over until a vaccine is successfully developed and put on the market on a large scale. As time passed, the newly diagnosed cases in most European countries, including Italy, began to rebound, and the epidemic entered the second or even the third outbreak stage.

    The conclusions reached for the first outbreak stage still apply to the second or third outbreak stage, and specific government policies can be given after the RL training. In this section, we set September 26, October 6, and October 16 as the starting dates on which the government adopts isolation measures, in order to study how the timing of the control strategy within the second outbreak stage affects the epidemic and the economy. Figure 11 shows that, although the starting date has little effect on the epidemic’s duration, the sooner control measures are taken, the fewer the cumulative diagnosed cases and cumulative deaths, and the higher the total economic output. Table 4 compares the specific data.

    In Fig. 11(a), after the government implemented lockdown measures, the growth of the cumulative diagnosed cases slowed and the total eventually stabilized. In Figs. 11(a) and 11(b), compared with a lockdown starting on September 26, the final cumulative diagnosed cases for October 6 and October 16 increased by 17.54% and 64.37%, respectively, and the cumulative deaths increased by 4.94% and 15.81%, respectively. According to Fig. 11(c), the later the government takes lockdown measures, the greater the economic loss, even though it obtains more economic growth in the early stage. The reason is that the later the government locks down the country, the more infections and hospitalizations there are in the environment and the later the unblocking can begin, which leads to larger economic losses.
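The qualitative effect of the lockdown date can be reproduced with a much simpler model than the one used in this study. The sketch below is a plain discrete-time SIR model, not the improved SIHR model, and every number in it (the transmission rates `beta_open` and `beta_locked`, the recovery rate `gamma`, the initial infected fraction `i0`, and the lockdown days 20, 30, and 40) is a hypothetical value chosen only to mimic three starting dates spaced 10 days apart.

```python
# Toy illustration only: a basic SIR model with assumed parameters,
# not the SIHR model or the fitted Italian data from the paper.
def cumulative_infected(lockdown_day, days=300, beta_open=0.3,
                        beta_locked=0.05, gamma=0.1, i0=1e-4):
    """Fraction of the population ever infected when lockdown starts on lockdown_day."""
    s, i = 1.0 - i0, i0
    for day in range(days):
        # Transmission drops once the (permanent) lockdown is in force.
        beta = beta_locked if day >= lockdown_day else beta_open
        new_infections = beta * s * i
        s -= new_infections
        i += new_infections - gamma * i
    return 1.0 - s


if __name__ == "__main__":
    for start in (20, 30, 40):
        share = cumulative_infected(start)
        print(f"lockdown on day {start}: cumulative infected share = {share:.4f}")
```

Under these assumed parameters, each 10-day delay leaves a larger share of the population infected by the end of the simulation, in line with the ordering of the three starting dates discussed above.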

    The results indicate that if the government takes effective prevention and control measures in time during the second outbreak, it can reduce the number of people infected and ensure continued economic growth. Although the policy of balancing the economy and the epidemic has controlled the spread of the epidemic, the virus carriers in the population have not completely disappeared. If the government relaxes inspections or the public’s awareness of epidemic prevention declines, the epidemic may break out again. Therefore, the government should strengthen personal nucleic acid testing and establish a case tracing mechanism to increase the diagnosis rate.

    Fig. 11. Impact of the same reward coefficient on: (a) cumulative diagnosed cases, (b) cumulative dead cases, and (c) economic output compared to the pre-epidemic period.

    Table 4. Comparison of data after adopting optimal policy at different dates.

    5. Conclusion

    At present, the global COVID-19 epidemic is still severe. More and more countries have experienced second or even third outbreaks. The epidemic is far from over until a vaccine is successfully developed and put on the market on a large scale. How to ensure economic development as far as possible while controlling the spread of the epidemic has become a major problem for many countries. In this research, we improved the SIHR model to simulate the spread of COVID-19 in Italy at different stages, and used the coefficient of determination R² to evaluate the goodness of fit. On this basis, we established an economic model affected by the quarantine measures. We used the effective reproduction number and the eigenvalues at the equilibrium point of the model as indicators of the controllability and stability of the model. We adopted the DQN-based deep reinforcement learning method and introduced the cumulative diagnoses and the cumulative gross production value into the reward function as rewards and punishments. After adequate training, a policy balancing economy and life at different stages of the epidemic was formulated.
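For readers who want to reproduce the goodness-of-fit check, the coefficient of determination can be computed as R² = 1 − SS_res/SS_tot. The sketch below is a generic implementation of that standard definition; the function name `r_squared` and the sample values are illustrative and are not taken from the fitted data of this study.

```python
# Generic coefficient of determination; names and sample data are illustrative.
def r_squared(observed, fitted):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(observed) != len(fitted) or len(observed) == 0:
        raise ValueError("observed and fitted must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))   # residual sum of squares
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)        # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    reported = [100.0, 180.0, 320.0, 560.0]    # made-up case counts
    simulated = [110.0, 170.0, 330.0, 550.0]   # made-up model output
    print(f"R^2 = {r_squared(reported, simulated):.4f}")
```

A value close to 1 indicates that the simulated curve explains most of the variance in the reported data, whereas a model that only predicts the mean yields 0.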

    The research results show that our model and scheme are effective. To control the spread of the epidemic, the government should adopt the most stringent blockade measure l1 during the outbreak stage, and the time t for unblocking should be determined by the country’s ability to resist economic risks. These results also suggest that optimal policies may differ between countries, depending on the level of disease spread and the ability to resist economic risk ?. For example, in countries with more vulnerable economies and a lower transmission rate α, the consequences of the disease may be less severe than in other countries. In contrast, blockade policies may cause an economic crisis that leaves many people unemployed and struggling to make a living. In the second outbreak stage, the sooner the lockdown measures are taken, the smaller the losses caused by the epidemic will be. Although the economic output G will suffer in the short term, it will benefit in the long term.

    The research is applicable not only to Italy but also provides a reference for other countries when formulating policies. Similarly, deep reinforcement learning can be applied to different models. The closer the model is to the real world, the more accurate the optimal strategy given by deep reinforcement learning will be.

    Data availability statement

    The data that support the findings of this study are available within the article [and its supplementary material].
