
    A Lyapunov characterization of robust policy optimization

    Control Theory and Technology, 2023, Issue 3

    Leilei Cui·Zhong-Ping Jiang

    Abstract In this paper, we study the robustness property of policy optimization (particularly the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix associated with an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated via the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.

    Keywords Policy optimization · Policy iteration (PI) · Input-to-state stability (ISS) · Lyapunov's direct method

    1 Introduction

    Through reinforcement learning (RL) techniques, agents can iteratively minimize a specific cost function by interacting continuously with an unknown environment. Policy optimization is fundamental for the development of RL algorithms, as introduced in [1]. Policy optimization first parameterizes the control policy, and then the performance of the control policy is iteratively improved by updating the parameters along the gradient descent direction of the given cost function. Since the linear quadratic regulator (LQR) problem is tractable and widely applied in many engineering fields, it provides an ideal benchmark for the theoretical analysis of policy optimization. For the LQR problem, the control policy is parameterized by a control gain matrix, and the gradient of the quadratic cost with respect to the control gain is associated with a Lyapunov matrix equation. Based on these results, various policy optimization algorithms, including vanilla gradient descent, natural gradient descent and Gauss–Newton methods, have been developed in [2–5]. Compared with other policy optimization algorithms with a linear convergence rate, the control policies generated by the Gauss–Newton method converge quadratically to the optimal solution.

    It is noticed that the Gauss–Newton method with a step size of 1/2 coincides with the policy iteration (PI) algorithm [6, 7], which is an important iterative algorithm in RL and adaptive/approximate dynamic programming (ADP) [1, 8, 9]. From the perspective of PI, the Lyapunov matrix equation for computing the gradient can be considered as policy evaluation. The update of the policy along the gradient direction can be interpreted as policy improvement. The steps of policy evaluation and policy improvement are iterated in turn to find the optimal solution of the LQR problem. Various PI algorithms have been proposed for important classes of linear/nonlinear/time-delay/time-varying systems for optimal stabilization and output tracking [10–14]. In addition, PI has been successfully applied to sensory motor control [15] and autonomous driving [16, 17].

    The convergence of the PI algorithm is ensured under the assumption that accurate knowledge of the system model is available. However, in reality, either the system model obtained by system identification [18] is used for the PI algorithm, or the PI algorithm is directly implemented through a data-driven approach using input-state data [10, 19–22]. Consequently, the PI algorithm can hardly be implemented exactly, due to modeling errors, inaccurate state estimation, measurement noise, and unknown system disturbances. The robustness of the PI algorithm to such unavoidable noise is therefore an important property to be investigated, and it lays a foundation for better understanding RL algorithms. There are several challenges in studying the robustness of the PI algorithm. Firstly, the nonlinearity of the PI algorithm makes it hard to analyze the convergence property. Secondly, it is difficult to quantify the influence of noise, since noise may destroy the monotonic property of the PI algorithm or even result in a destabilizing controller.

    In this paper, we study the robustness of the PI algorithm in the presence of noise. The contributions are summarized as follows. Firstly, by viewing the PI algorithm as a nonlinear system and invoking the concept of input-to-state stability (ISS) [23], particularly the small-disturbance ISS [24, 25], we investigate the robustness of the PI algorithm under the influence of noise. It is demonstrated that, when subject to noise, the control policies generated by the PI algorithm eventually converge to a small neighborhood of the optimal solution of the LQR problem as long as the noise is sufficiently small. Different from [24, 25], where the analysis is trajectory-based, we directly utilize Lyapunov's direct method to analyze the convergence of the PI algorithm under disturbances. As a result, an explicit expression of the upper bound on the noise is provided. The size of the neighborhood in which the control policies ultimately stay is shown to be a quadratic function of the noise. Secondly, by utilizing Willems' fundamental lemma, a learning-based PI algorithm is proposed. Compared with the conventional learning-based control approach, where the exploratory control input is hard to design such that the persistent excitation condition is satisfied [24], the persistently exciting exploratory signal of the proposed method can be easily designed by checking the rank condition of a Hankel matrix related to the exploration signal. Finally, based on the small-disturbance ISS property of the PI algorithm, we demonstrate that the proposed learning-based PI algorithm is robust to state measurement noise and unknown system disturbances.

    The remaining contents of the paper are organized as follows. Section 2 reviews the LQR problem and the celebrated PI algorithm. In Sect. 3, the small-disturbance ISS property of the PI algorithm is studied. Section 4 proposes a learning-based PI algorithm and analyzes its robustness. Several numerical examples are given in Sect. 5, followed by some concluding remarks in Sect. 6.

    2 Preliminaries and problem formulation

    2.1 Policy iteration for discrete-time LQR

    The discrete-time linear time-invariant (LTI) system is represented as

    x_{k+1} = A x_k + B u_k,  k ∈ Z_+,  (1)

    where x_k ∈ R^n and u_k ∈ R^m are the state and control input, respectively; A and B are system matrices with compatible dimensions.

    Assumption 1 The pair (A, B) is controllable.

    Under Assumption 1, the discrete-time LQR problem is to minimize the following accumulated quadratic cost

    J_d(x_0, u) = Σ_{k=0}^∞ (x_k^T Q x_k + u_k^T R u_k),  (2)

    where Q = Q^T ≻ 0 and R = R^T ≻ 0. The optimal controller of the discrete-time LQR is

    u_k = -L* x_k,  L* = (R + B^T V* B)^{-1} B^T V* A,  (3)

    where V* = (V*)^T ≻ 0 is the unique solution to the following discrete-time algebraic Riccati equation (ARE)

    V* = Q + A^T V* A - A^T V* B (R + B^T V* B)^{-1} B^T V* A.  (4)

    For a stabilizing control gain L ∈ R^{m×n}, the corresponding cost in (2) is J_d(x_0, -Lx) = x_0^T V_L x_0, where V_L = (V_L)^T ≻ 0 is the unique solution of the following Lyapunov equation

    V_L = Q + L^T R L + (A - BL)^T V_L (A - BL),  (5)

    and the function G(·) : S^n → S^{n+m} is defined as

    The discrete-time PI algorithm was developed by [7] to iteratively solve the discrete-time LQR problem. Given an initial stabilizing control gain L_0, the discrete-time PI algorithm is represented as:

    Procedure 1 (Exact PI for discrete-time LQR)

    1. Policy evaluation: get G(V_i) by solving

    2. Policy improvement: get the improved policy by
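
    For readers who wish to experiment with Procedure 1, the following is a minimal numerical sketch of its classical model-based form: policy evaluation solves a discrete Lyapunov equation for the current gain, and policy improvement applies the Gauss–Newton update L_{i+1} = (R + B^T V_i B)^{-1} B^T V_i A. The helper name, the example system, and the SciPy calls are illustrative assumptions, not code from the paper.

    ```python
    import numpy as np
    from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov


    def exact_pi_discrete(A, B, Q, R, L0, iters=15):
        """Exact policy iteration (Procedure 1) for discrete-time LQR.

        Policy evaluation solves V = (A - B L)^T V (A - B L) + Q + L^T R L for
        the current stabilizing gain L; policy improvement then sets
        L <- (R + B^T V B)^{-1} B^T V A.
        """
        L = L0
        for _ in range(iters):
            Acl = A - B @ L
            V = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)   # policy evaluation
            L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)     # policy improvement
        return L, V


    # Illustrative usage: a random Schur-stable A, so that L0 = 0 is stabilizing.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    A = 0.9 * A / np.max(np.abs(np.linalg.eigvals(A)))
    B = rng.standard_normal((3, 1))
    Q, R = np.eye(3), np.eye(1)

    L_pi, V_pi = exact_pi_discrete(A, B, Q, R, L0=np.zeros((1, 3)))
    V_star = solve_discrete_are(A, B, Q, R)
    print(np.linalg.norm(V_pi - V_star))   # near machine precision after a few steps
    ```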

    The monotonic convergence property of the discrete-time PI is shown in the following lemma.

    2.2 Policy iteration for continuous-time LQR

    Consider the continuous-time LTI system

    ẋ(t) = A x(t) + B u(t),  x(0) = x_0,  (9)

    where x(t) ∈ R^n is the state; u(t) ∈ R^m is the control input; x_0 is the initial state; A and B are constant matrices with compatible dimensions. The cost of system (9) is

    J_c(x_0, u) = ∫_0^∞ (x^T(t) Q x(t) + u^T(t) R u(t)) dt.  (10)

    Under Assumption 1, the classical continuous-time LQR aims at computing the optimal control policy as a function of the current state such that J_c(x_0, u) is minimized. The optimal control policy is

    u*(x) = -K* x,  K* = R^{-1} B^T P*,  (11)

    where P* = (P*)^T ≻ 0 is the unique solution of the continuous-time ARE [26]:

    A^T P* + P* A + Q - P* B R^{-1} B^T P* = 0.  (12)

    For a stabilizing control gain K ∈ R^{m×n}, the corresponding cost in (10) is J_c(x_0, -Kx) = x_0^T P_K x_0, where P_K = (P_K)^T ≻ 0 is the unique solution of the following Lyapunov equation

    (A - BK)^T P_K + P_K (A - BK) + Q + K^T R K = 0,  (13)

    and the function M(·) : S^n → S^{n+m} is defined as

    Given an initial stabilizing control gain K_0, the celebrated continuous-time PI developed in [6] iteratively solves the continuous-time LQR problem. The continuous-time PI algorithm is represented as:

    Procedure 2 (Exact PI for continuous-time LQR)

    1. Policy evaluation: get M(P_i) by solving

    2. Policy improvement: get the improved policy by
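
    Analogously, a minimal sketch of Procedure 2 in its classical model-based form (Kleinman's algorithm) is given below; the example system and the SciPy calls are illustrative assumptions, not taken from the paper.

    ```python
    import numpy as np
    from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov


    def exact_pi_continuous(A, B, Q, R, K0, iters=15):
        """Exact policy iteration (Procedure 2) for continuous-time LQR.

        Policy evaluation solves (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
        for the current stabilizing gain K; policy improvement sets
        K <- R^{-1} B^T P.
        """
        K, Rinv = K0, np.linalg.inv(R)
        for _ in range(iters):
            Acl = A - B @ K
            P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))  # evaluation
            K = Rinv @ B.T @ P                                        # improvement
        return K, P


    # Illustrative usage: A is Hurwitz, so K0 = 0 is an admissible initial gain.
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)

    K_pi, P_pi = exact_pi_continuous(A, B, Q, R, K0=np.zeros((1, 2)))
    print(np.linalg.norm(P_pi - solve_continuous_are(A, B, Q, R)))  # ~1e-12
    ```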

    Given an initial stabilizing control gain K_0, by iteratively solving (15) and (16), P_i monotonically converges to P* and (A - BK_i) is Hurwitz, which is formally presented in the following lemma.

    2.3 Problem formulation

    For the discrete-time and continuous-time PI algorithms, accurate knowledge of the model (A, B) is required for the implementation. The convergence of the PI algorithms in Lemmas 1 and 2 is based on the assumption that the accurate system model is attainable. However, in reality, system uncertainties are unavoidable, and the PI algorithms cannot be implemented exactly. Therefore, in this paper, we investigate the following problem.

    Problem 1 When the policy evaluation and improvement steps of the PI algorithms are subject to noise, will the convergence properties in Lemmas 1 and 2 still hold?

    3 Robustness analysis of policy iteration

    In this section, we formally introduce the inexact PI algorithms for the discrete-time and continuous-time LQR in the presence of noise. By invoking the concept of input-to-state stability [23], it is rigorously shown that the optimized control policies converge to a neighborhood of the optimal control policy, and that the size of the neighborhood depends on the magnitude of the noise.

    3.1 Robustness analysis of discrete-time policy iteration

    According to the exact discrete-time PI algorithm in(7)and(8), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately,and the inexact PI algorithm is as follows.

    Procedure 3 (Inexact PI for discrete-time LQR)

    1. Inexact policy evaluation: get Ĝ_i ∈ S^{n+m} as an approximation of G(V̂_i), where V̂_i is the solution of

    2. Inexact policy improvement: get the improved policy by

    Remark 1 The noise ΔG_i can be caused by various factors. For example, in data-driven control [24], the matrix G(V̂_i) is identified from the collected input-state data. Since noise possibly pollutes the collected data, Ĝ_i, instead of G(V̂_i), is obtained. Other factors that may cause ΔG_i include inaccurate system identification, the residual error of numerically solving the Lyapunov equation, and the approximate values of Q and R in inverse optimal control in the absence of exact knowledge of the cost function.

    Next, by considering the inexact PI as a nonlinear dynamical system with the state V̂_i, we analyze its robustness to the noise ΔG_i by Lyapunov's direct method, in the sense of small-disturbance ISS. For any stabilizing control gain L, define the candidate Lyapunov function as

    V_d(V_L) = Tr(V_L) - Tr(V*),  (20)

    where V_L = (V_L)^T ≻ 0 is the solution of (5). Since V_L ⪰ V* (obtained by Lemma 1), we have

    Remark 2 Since J_d(x_0, -Lx) = x_0^T V_L x_0, when x_0 ~ N(0, I_n), E_{x_0} J_d(x_0, -Lx) = Tr(V_L). Hence, the candidate Lyapunov function in (20) can be considered as the difference between the value function of the controller u = -Lx and the optimal value function.

    For any h > 0, define a sublevel set L_h = {L ∈ R^{m×n} | (A - BL) is Schur, V_d(V_L) ≤ h}. Since V_L is continuous with respect to the stabilizing control gain L, it readily follows that L_h is compact. Before the main theorem about the robustness of Procedure 3, we introduce the following instrumental lemma, which provides an upper bound on V_d(V_L).

    Lemma 3 For any stabilizing control gain L, let L' = (R + B^T V_L B)^{-1} B^T V_L A and E_L = (L' - L)^T (R + B^T V_L B)(L' - L). Then,

    where

    Proof We can rewrite (4) as


    In addition,it follows from(5)that

    Subtracting(24)from(25)yields

    Since (A - BL*) is Schur, it follows from [27, Theorem 5.D6] that

    Taking the trace of(27)and using the main result of[28],we have

    Hence,the proof is completed.■

    Lemma 4 For any L ∈ L_h,

    Proof Since (A - BL) is Schur, it follows from (5) and [27, Theorem 5.D6] that

    Hence,(29)readily follows from(31).■

    The following lemma shows that the sublevel set L_h is invariant as long as the noise ΔL_i is sufficiently small.

    Proof Induction is applied to prove the statement. When i = 0, L̂_0 ∈ L_h. Suppose L̂_i ∈ L_h; then, by [27, Theorem 5.D6], we have V̂_i ≻ 0. We can rewrite (17) as

    Considering(19),we have

    In addition,it follows from(19)that

    Plugging(33)and(34)into(32),and completing the squares,we have

    where

    If

    it is guaranteed that

    Writing down (17) at the (i+1)th iteration, and subtracting it from (35), we have

    Following[27,Theorem 5.D6],we have

    Taking the trace of(40),and using Lemma 3 and the result in[28]yield

    It follows from(41),Lemma 4 and[28]that

    where b_2 is defined as

    Hence,if

    it is guaranteed that

    Now, by Lyapunov's direct method and by viewing Procedure 3 as a discrete-time nonlinear system with the state V̂_i, it is shown that V̂_i converges to a small neighborhood of the optimal solution as long as the noise is sufficiently small.

    Lemma 6 For any h > 0 and L̂_0 ∈ L_h, if

    there exist a K-function ρ(·) and a KL-function κ(·,·) such that

    Proof Repeating (42) for i, i-1, ..., 0, we have

    Considering(21),it follows that

    The proof is thus completed.■

    The small-disturbance ISS property of Procedure 3 is shown in the following theorem.

    Consequently,

    3.2 Robustness analysis of continuous-time policy iteration

    According to the exact PI for continuous-time LQR in(15) and (16), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately,and the inexact PI is as follows.

    Procedure 4 (Inexact PI for continuous-time LQR)

    1. Inexact policy evaluation: get M̂_i ∈ S^{n+m} as an approximation of M(P̂_i), where P̂_i is the solution of

    2. Inexact policy improvement: get the updated control gain by

    For any stabilizing control gain K, define the candidate Lyapunov function as

    where P_K = (P_K)^T ≻ 0 is the solution of (13), i.e.

    Since P_K ⪰ P*, we have

    For any h > 0, define the sublevel set K_h = {K ∈ R^{m×n} | (A - BK) is Hurwitz, V_c(P_K) ≤ h}. Since P_K is continuous with respect to the stabilizing control gain K, the sublevel set K_h is compact.

    The following lemmas are instrumental for the proof of the main theorem.

    Proof The Taylor expansion of e^{Dt} is

    Hence,

    Pick a v ∈ R^n which satisfies . Then,

    Hence,the lemma follows readily from(63).■

    The following lemma presents an upper bound on the Lyapunov function V_c(P_K).

    Lemma 8 For any stabilizing control gain K, let K' = R^{-1} B^T P_K, where P_K = (P_K)^T ≻ 0 is the solution of (58), and E_K = (K' - K)^T R (K' - K). Then,

    Proof Rewrite the ARE (12) as

    Furthermore,(58)is rewritten as

    Subtracting(65)from(66)yields

    Considering K' = R^{-1} B^T P_K and completing the squares in (67), we have

    Since (A - BK*) is Hurwitz, by (68) and [27, equation (5.18)], we have

    Taking the trace of (69) and considering the cyclic property of the trace and [28], we have (64). ■

    Lemma 9 For any K ∈ K_h,

    Proof Since A - BK is Hurwitz, it follows from (58) that

    Taking the trace of(71),and considering[28],we have

    The proof is hence completed.■

    Writing down (54) for the (i+1)th iteration, and subtracting it from (75), we have

    Since (A - BK̂_{i+1}) is Hurwitz (e(h) ≤ e_1(h)), it follows from [27, equation (5.18)] and (78) that

    Taking the trace of(79),and considering[28]and Lemma 9,we have

    It follows from(80)and Lemmas 7 and 8 that

    Taking the expression of K̂_{i+1} into consideration, we have

    Plugging(82)into(81),it follows that if

    we have

    Proof It follows from Lemma 10, (81) and (82) that, for any i ∈ Z_+,

    Repeating (86) for i, i-1, ..., 1, 0, we have

    By(59),we have

    Hence,(85)follows readily.■

    With the aforementioned lemmas, we are ready to propose the main result on the robustness of the inexact PI for continuous-time LQR.

    Proof Suppose K̂_i ∈ K_h. If

    (ΔM_{uu,i} + M_{uu}(P̂_i)) is invertible. It follows from (56) and

    Remark 3 Compared with [24, 25], in Theorems 1 and 2, explicit expressions of the upper bounds on the small disturbance are given, such that at each iteration the generated control gain is stabilizing and contained in the sublevel set L_h or K_h. In addition, it is observed from (48) and (88) that the control gains generated by the inexact PI algorithms ultimately converge to a neighborhood of the optimal solution, and the size of the neighborhood is proportional to a quadratic form of the noise.

    4 Learning-based policy iteration

    In this section, based on the robustness property of the inexact PI in Procedure 3, we develop a learning-based PI algorithm. Only the input-state trajectory data measured from the system is required for the algorithm.

    4.1 Algorithm development

    For a signal u_{[0,N-1]} = [u_0, u_1, ..., u_{N-1}], its Hankel matrix of depth l is represented as

    Definition 1 An input signal u_{[0,N-1]} is persistently exciting (PE) of order l if the Hankel matrix H_l(u_{[0,N-1]}) has full row rank.
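
    As a quick illustration of Definition 1, the sketch below builds a block Hankel matrix of a sampled input and checks the full-row-rank (PE) condition numerically; the function names and the random test signal are illustrative assumptions.

    ```python
    import numpy as np


    def block_hankel(u, depth):
        """Depth-l block Hankel matrix H_l(u_[0,N-1]) of an m-dimensional signal.

        `u` has shape (N, m); column i of the result stacks u_i, ..., u_{i+l-1},
        giving an (l*m) x (N - l + 1) matrix.
        """
        N, m = u.shape
        cols = N - depth + 1
        return np.vstack([u[j:j + cols].T for j in range(depth)])


    def is_persistently_exciting(u, order):
        """Definition 1: u_[0,N-1] is PE of order l iff H_l(u) has full row rank."""
        H = block_hankel(u, order)
        return np.linalg.matrix_rank(H) == H.shape[0]


    # A random exploration signal is PE of modest order with high probability.
    rng = np.random.default_rng(1)
    u = rng.standard_normal((100, 1))        # N = 100 scalar inputs
    print(is_persistently_exciting(u, 5))    # expected: True
    ```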

    Lemma 12 [31] Let an input signal u_{[0,N-1]} be PE of order l + n. Then, the state trajectory x_{[0,N-1]} sampled from system (1) driven by the input u_{[0,N-1]} satisfies

    Given the input-state data u_{[0,N-1]} and x_{[0,N]} sampled from (1), we will design a learning-based PI algorithm such that accurate knowledge of the system matrices is not required. For any time indices 0 ≤ k_1, k_2 ≤ N - 1 and V ∈ S^n, along the state trajectory of (1), we have

    It follows from(96)that

    Assumption 2 The exploration signal u_{[0,N-1]} is PE of order n + 1.

    Under Assumption 2 and according to Lemma 12, z_{[0,N-1]} has full row rank. As a result, for any fixed V ∈ S^n, (98) admits a unique solution

    where Λ is a data-dependent matrix defined as

    Therefore, given any V ∈ S^n, Θ(V) can be directly computed from (99) without knowing the system matrices A and B.
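
    Equations (98)–(100) are not reproduced here, so the sketch below uses one natural data-driven construction consistent with the surrounding text: with z_k = [x_k^T, u_k^T]^T, Λ is taken as the least-squares solution of x_{k+1} = Λ^T z_k (so Λ^T recovers [A B] from noise-free PE data), and Θ(V) = Λ V Λ^T is then formed from data alone. Treat the formulas, names, and block partition as assumptions rather than the paper's exact definitions.

    ```python
    import numpy as np


    def data_dependent_lambda(x, u):
        """Least-squares data-dependent matrix: Lambda^T fits x_{k+1} = Lambda^T z_k.

        `x` has shape (N+1, n) and `u` has shape (N, m); with noise-free data and
        a PE input, Lambda^T equals [A  B].
        """
        Z = np.hstack([x[:-1], u]).T                   # z_[0,N-1], shape (n+m, N)
        Xp = x[1:].T                                   # x_[1,N],   shape (n, N)
        return np.linalg.solve(Z @ Z.T, Z @ Xp.T)      # shape (n+m, n)


    def theta(V, Lam):
        """Theta(V) = Lambda V Lambda^T, computed without knowing A and B."""
        return Lam @ V @ Lam.T


    def improved_gain(V, Lam, R, n):
        """Model-free policy improvement, assuming the partition
        Theta = [[Theta_xx, Theta_xu], [Theta_ux, Theta_uu]] with the first
        n rows/columns associated with the state."""
        T = theta(V, Lam)
        Theta_ux, Theta_uu = T[n:, :n], T[n:, n:]
        return np.linalg.solve(R + Theta_uu, Theta_ux)
    ```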

    By(97),we can rewrite(7)as

    Plugging (99) into (101) yields (102). The learning-based PI is represented in the following procedure.

    Procedure 5 (Learning-based PI for discrete-time LQR)

    1.Learning-based policy evaluation

    2.Learning-based policy improvement

    It should be noticed that due to(99),Procedure 5 is equivalent to Procedure 1.

    4.2 Robustness analysis

    In the previous subsection, we assumed that accurate data from the system can be obtained. In reality, measurement noise and unknown system disturbances are inevitable. Therefore, the input-state data are sampled from the following linear system with unknown system disturbance and measurement noise

    where w_k ~ N(0, Σ_w) and v_k ~ N(0, Σ_v) are independent and identically distributed random noises. Let ž_k = [y_k^T, u_k^T]^T and suppose there are in total S trajectories of system (104) which start from the same initial state and are driven by the same exploration input u_{[0,N-1]}. Averaging the collected data over the S trajectories, we have

    Then,the data-dependent matrix is constructed as

    By the strong law of large numbers, the following limits hold almost surely

    Recall that z_{[0,N-1]}, x_{[1,N-1]}, and Λ are the data collected from system (1) with the same initial state and exploration input as (104). Since S is finite, the difference between Λ and Λ̌^S is unavoidable, and hence,

    Procedure 6 (Learning-based PI using noisy data)

    1.Learning-based policy evaluation using noisy data

    2.Learning-based policy improvement using noisy data

    In Procedure 6, the symbol "check" (ˇ) is used to denote the variables of the learning-based PI using noisy data. In addition, let V̂_i denote the result of the accurate evaluation of Ľ_i, i.e., V̂_i is the solution of (109) with Θ̌(V̌_i) replaced by Θ(V̂_i). V̌_i = V̂_i + ΔV_i is the solution of (109), and ΔV_i is the policy evaluation error induced by the noise ΔΛ. In the following, the superscript S is omitted to simplify the notation. Based on the robustness analysis in the previous section, we analyze the robustness of the learning-based PI to the noise ΔΛ.
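
    The trajectory-averaging step described above (collecting S noisy trajectories with the same initial state and exploration input and averaging them) can be mimicked numerically as follows; this is a minimal sketch assuming full state measurement corrupted by Gaussian noise, with all names and noise levels illustrative.

    ```python
    import numpy as np


    def averaged_trajectory(A, B, x0, u_seq, S, sigma_w=0.1, sigma_v=0.1, seed=0):
        """Average S noisy trajectories of system (104), all starting from the
        same initial state and driven by the same exploration input.

        By the strong law of large numbers, the averaged measurements converge
        almost surely to the noise-free trajectory of (1) as S grows.
        """
        rng = np.random.default_rng(seed)
        n, N = A.shape[0], u_seq.shape[0]
        y_bar = np.zeros((N + 1, n))
        for _ in range(S):
            x = x0.copy()
            y_bar[0] += x + sigma_v * rng.standard_normal(n)          # y_0 = x_0 + v_0
            for k in range(N):
                w = sigma_w * rng.standard_normal(n)                  # disturbance w_k
                x = A @ x + B @ u_seq[k] + w
                y_bar[k + 1] += x + sigma_v * rng.standard_normal(n)  # y_k = x_k + v_k
        return y_bar / S
    ```

    The averaged measurements can then be fed to the same data-dependent construction used in the noise-free case to form Λ̌^S.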

    For any stabilizing control gain L, let V̌_L = V_L + ΔV be the solution of the learning-based policy evaluation with the noisy data-dependent matrix Λ̌, i.e.

    The following lemma guarantees that (111) has a unique solution (V_L + ΔV) = (V_L + ΔV)^T ≻ 0.

    Lemma 13If

    then (Λ + ΔΛ)^T [I_n, -L^T]^T is a Schur matrix.

    Proof Recall that V_L = (V_L)^T ≻ 0 is the solution of (5) associated with the stabilizing control gain L. By (99), (5) is equivalent to the following equation

    Since Q ≻ 0, by [30, Lemma 3.9], Λ^T [I_n, -L^T]^T is Schur.

    When Λ is disturbed by ΔΛ, we can rewrite (113) as

    When (112) holds, we have
    By [30, Lemma 3.9], (114) and (115), (Λ + ΔΛ)^T [I_n, -L^T]^T is Schur. ■

    The following lemma implies that the policy evaluation error ΔV is small as long as ΔΛ is sufficiently small.

    Lemma 14 For any h > 0, L ∈ L_h, and ΔΛ satisfying (112), we have

    Proof According to [32, Theorems 2.6 and 4.1], we have

    Since Tr(V_L) ≤ h + Tr(V*), it follows from (102) and [27, Theorem 5.D6] that

    Taking the trace of both sides of(118)and utilizing[28],we have

    Plugging(119)into(117)yields(116).■

    The following lemma tells us that ΔΘ is small if ΔV and ΔΛ are small enough.

    Lemma 15 Let Θ̌(V̌_L) = Λ̌ V̌_L Λ̌^T and ΔΘ(V_L) = Θ̌(V̌_L) - Θ(V_L); then,

    Proof By the expressions of Θ̌(V̌_L) and Θ(V_L), we have

    Hence,(120)is obtained by(121)and the triangle inequality.

    The following lemma ensures that ΔL converges to zero as ΔΘ tends to zero.

    Lemma 16 Let ΔL = (R + Θ̌_uu(V̌_L))^{-1} Θ̌_ux(V̌_L) - (R + Θ_uu(V_L))^{-1} Θ_ux(V_L). Then,

    Proof From the expression of ΔL, we have

    Therefore,(122)readily follows from(123).■

    Given the aforementioned lemmas,we are ready to show the main result on the robustness of the learning-based PI algorithm in Procedure 6.

    Proof At each iteration of Procedure 6, if Λ is not disturbed by noise, i.e., ΔΛ = 0, the policy improvement is (103). Due to the influence of ΔΛ, the control gain is updated by (110), which can be rewritten as

    where ΔL_{i+1} is

    5 Numerical simulation

    In this section, we illustrate the proposed theoretical results with a benchmark example known as the cart-pole system [33]. The parameters of the cart-pole system are: m_c = 1 kg (mass of the cart), m_p = 0.1 kg (mass of the pendulum), l_p = 0.5 m (distance from the center of mass of the pendulum to the pivot), and g_c = 9.8 m/s² (gravitational acceleration). By linearizing the system around the equilibrium, the system is

    By discretizing it through the Euler method with a step size of 0.01 s, we have

    The weighting matrices of the cost (2) are Q = 10I_4 and R = 1. The initial stabilizing gain to start the policy iteration algorithm is
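
    The linearized and discretized system matrices, as well as the initial gain, are not reproduced in this extraction. The sketch below shows one common linearization of the frictionless cart-pole about the upright equilibrium (state ordering: cart position, cart velocity, pole angle, pole angular rate), followed by the Euler discretization described above; the matrices should be read as illustrative, not as the paper's exact values.

    ```python
    import numpy as np

    # Cart-pole parameters from Section 5.
    mc, mp, lp, g = 1.0, 0.1, 0.5, 9.8

    # One common linearization about the upright equilibrium (illustrative).
    A = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, -mp * g / mc, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, (mc + mp) * g / (mc * lp), 0.0]])
    B = np.array([[0.0], [1.0 / mc], [0.0], [-1.0 / (mc * lp)]])

    # Euler discretization with step size 0.01 s.
    dt = 0.01
    Ad, Bd = np.eye(4) + dt * A, dt * B

    # Weighting matrices of the quadratic cost.
    Q, R = 10.0 * np.eye(4), np.eye(1)
    ```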

    5.1 Robustness test of the inexact policy iteration

    We test the robustness of the inexact PI for discrete-time systems in Procedure 3. At each iteration, each element of ΔG_i is sampled from a standard Gaussian distribution, and then its spectral norm is scaled to 0.2. During the iterations, the relative errors of the control gain L̂_i and cost matrix V̂_i are shown in Fig. 1. The control gain and cost matrix are close to the optimal solution at the 5th iteration. It is observed that, even under the influence of disturbances at each iteration, the inexact PI in Procedure 3 can still approach the optimal solution. This is consistent with the ISS property of Procedure 3 in Theorem 1.
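
    The perturbation used in this test can be reproduced along the following lines; the symmetrization and the helper name are assumptions (the paper only states that the entries are standard Gaussian and that the spectral norm is rescaled to 0.2).

    ```python
    import numpy as np


    def sampled_perturbation(size, spectral_norm=0.2, rng=None):
        """Sample a symmetric perturbation (e.g., Delta_G_i in S^{n+m}) with
        standard Gaussian entries and rescale its spectral norm to `spectral_norm`."""
        rng = rng or np.random.default_rng()
        D = rng.standard_normal((size, size))
        D = 0.5 * (D + D.T)                            # keep the perturbation symmetric
        return spectral_norm * D / np.linalg.norm(D, 2)

    # At each iteration of the inexact PI, such a perturbation is added to the
    # matrix used in the policy-improvement step before the gain is updated.
    ```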

    In addition, the robustness of Procedure 4 is tested. At each iteration, ΔM_i is randomly sampled with a norm of 0.2. Under the influence of ΔM_i, the evolution of the control gain K̂_i and cost matrix P̂_i is shown in Fig. 2. Under the noise ΔM_i, the algorithm cannot converge exactly to the optimal solution. However, with the small-disturbance ISS property, the inexact PI can still converge to a neighborhood of the optimal solution, which is consistent with Theorem 2.

    5.2 Robustness test of the learning-based policy iteration

    The robustness of the learning-based PI in Procedure 6 is tested for system (104) with both system disturbance and measurement noise. The variances of the system disturbance and measurement noise are Σ_w = 0.01 I_n and Σ_v = 0.01 I_n. One trajectory is sampled from the solution of (104), and the length of the sampled trajectory is N = 100, i.e., 100 data points collected from (104) are used to construct the data-dependent matrix Λ̌^S. Compared with the matrix Λ in (100b), where the data are collected from the system without unknown system disturbance and measurement noise, Λ̌^S is directly constructed from the noisy data. Therefore, at each iteration of the learning-based PI, ΔΛ^S introduces the disturbances. The evolution of the control gain and cost matrix is shown in Fig. 3. It is observed that, with the noisy data, the control gain and the cost matrix obtained by Procedure 6 converge to an approximation of the optimal solution. This coincides with the result in Theorem 3.

    Fig. 1 Robustness test of Procedure 3 when ‖ΔG‖∞ = 0.2

    Fig. 2 Robustness test of Procedure 4 when ‖ΔM‖∞ = 0.2

    Fig. 3 Robustness test of Procedure 6 when the noisy data are applied for the construction of Λ̌

    6 Conclusion

    In this paper, we have studied the robustness property of policy optimization in the presence of disturbances at each iteration. Using ISS Lyapunov techniques, it is demonstrated that the PI ultimately converges to a small neighborhood of the optimal solution as long as the disturbance is sufficiently small, and a quantifiable bound on the admissible disturbance is provided. Based on the ISS property and Willems' fundamental lemma, a learning-based PI algorithm is proposed, for which the persistent excitation of the exploratory signal can be easily guaranteed. Numerical simulation examples are provided to illustrate the theoretical results.

    Data availability The data that support the findings of this study are available from the corresponding author, L. Cui, upon reasonable request.
