
      An Iterative Method for Optimal Feedback Control and Generalized HJB Equation

IEEE/CAA Journal of Automatica Sinica, 2018, Issue 5

      Xuesong Chen and Xin Chen

Abstract—In this paper, a new iterative method is proposed to solve the generalized Hamilton-Jacobi-Bellman (GHJB) equation by successively approximating it. First, the GHJB equation is converted to an algebraic equation involving a vector norm, which is essentially a set of simultaneous nonlinear equations in the case of dynamic systems. Then, the proposed algorithm solves the GHJB equation numerically for points near the origin by considering the linearization of the nonlinear equations under a good initial control guess. Finally, the procedure is proved to converge to the optimal stabilizing solution with respect to the iteration variable. In addition, it is shown that the result is a closed-loop control based on this iterative approach. Illustrative examples show that the updated control laws converge to the optimal control for nonlinear systems.

      I.INTRODUCTION

THE optimal control of nonlinear systems is one of the most challenging and difficult subjects in control theory. A large number of theoretical results on nonlinear optimal control problems have been reported in the past few decades [1]-[9]. The dynamic programming algorithm is widely regarded as the most comprehensive method for finding optimal feedback controllers for generic nonlinear systems. However, the main drawback of dynamic programming methods today is the computational complexity required to describe the value function, which grows exponentially with the dimension of its domain. It is well known that the continuous-time nonlinear optimal control problem depends on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a nonlinear partial differential equation (PDE). Even in simple cases, the HJB equation may not have global analytic solutions. Various methods have been proposed in the literature for computing numerical solutions to the HJB equation; see for example Aguilar et al. [10], Cacace et al. [11], Govindarajan et al. [12], Markman et al. [13], Sakamoto et al. [8], Smears et al. [14], and the references therein.

It is known that the generalized Hamilton-Jacobi-Bellman (GHJB) equation is linear and easier to solve than the HJB equation, but to the best knowledge of the authors, no general solution for the GHJB equation has been demonstrated. Beard et al. [15] presented a series of polynomial functions as basis functions to solve the approximate GHJB equation; however, this method requires the computation of a large number of integrals. Galerkin's spectral approximation approach is proposed in [16] to find approximate but close solutions to the GHJB equation at each iteration step. The reader is also referred to Markman et al. [13], Smears et al. [14], Saridis et al. [17], Aguilar et al. [10], and Gong et al. [18] for more details and different perspectives. Although many articles have discussed the solution of the HJB equation for continuous-time systems, there is currently very little work available on iterative solution approaches for the GHJB equation.

In this paper, we propose a new iterative method to find the approximate solution to the GHJB equation, which is associated with the optimal feedback control of nonlinear systems. The idea of this iterative algorithm for the GHJB equation is based on Beard's work [16]. Our approach is designed to obtain a general computational solution for the GHJB equation. We first convert the GHJB equation to a simple algebraic equation with a vector norm, which is essentially a set of nonlinear equations. Then we propose a procedure to compute the solution of the GHJB equation by considering the linearization of the nonlinear equations under a good initial control guess. The stability and convergence of the proposed scheme are proved.

The paper is organized as follows. The problem description is presented in Section II. The main result of this paper is derived in Section III, i.e., the iterative algorithm for the GHJB equation and the detailed mathematical proofs and justifications of the proposed approach. Numerical examples are provided in Section IV. Finally, a brief conclusion is given in Section V.

      II.PROBLEM STATEMENT

      Consider the following continuous-time affine nonlinear system:
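In the affine form presupposed by the definitions below, the dynamics read

ẋ(t) = f(x(t)) + g(x(t)) u(x(t)),    x(0) = x0                                  (1)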

where x ∈ Ω ⊂ R^n is the system state, f(0) = 0, x0 ∈ Ω ⊂ R^n is the initial state, u ∈ R^m is the control input, and f: Ω → R^n, g: Ω → R^{n×m}, u: Ω → R^m. Assume that the system is Lipschitz continuous on a set Ω that contains the origin.

The optimal control problem under consideration is to find a state feedback control law u(x) = u*(x) such that the system (1) is closed-loop asymptotically stable and the following infinite horizon cost function is minimized:
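In a form consistent with the control-energy term ‖u‖² used in the iteration below, the cost reads

J(x0, u) = ∫_0^∞ [ l(x(t)) + ‖u(x(t))‖² ] dt                                    (2)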

where l: Ω → R is called the state penalty function, and it is a positive-definite function on Ω. Typically, l is a quadratic weighting of the state, i.e., l = x^T Q x, where Q is a positive-definite matrix.

It is well known that the optimal control can be expressed directly in feedback form in terms of a value function V that satisfies the HJB equation.
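In the form standard for this class of problems (and consistent with the control update used in Algorithm 1 below), these read

u*(x) = -(1/2) g^T(x) ∇V*(x)                                                    (3)

∇V*^T(x) f(x) + l(x) - (1/4) ∇V*^T(x) g(x) g^T(x) ∇V*(x) = 0,    V*(0) = 0.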

Although the solution to the nonlinear optimal control problem has been well known since the early 1960s [3], relatively few control designs explicitly use a feedback function of the form given in (3). The primary difficulty lies in solving the HJB equation, for which general closed-form solutions do not exist; the HJB equation is a nonlinear PDE that in general cannot be solved analytically. To obtain its approximate solution, in Saridis [17] the HJB equation was successively approximated by a GHJB equation, written as GHJB(V,u) = 0.
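In the form standard in the GHJB literature [16], [17], this equation reads

GHJB(V, u) := ∇V^T(x) [ f(x) + g(x) u(x) ] + l(x) + ‖u(x)‖² = 0                 (4)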

      with
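the boundary condition and the successively improved control law, written here in the form used later in Algorithm 1:

V(0) = 0,    u(i+1)(x) = -(1/2) g^T(x) ∇V(i)(x).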

The cost of u(i+1) is given by the solution of the equation GHJB(V(i), u(i)) = 0. Since the GHJB equation is linear, it is theoretically easier to solve than the nonlinear HJB equation.

      III.THE MAIN RESULTS

In many numerical methods for HJB equations, which are typically solved backward in time, the discretization is based on spatial causality and the computation is explicit in time. The value of the solution function V at a grid point is computed at an earlier time using the known value of the function at neighboring grid points at a later time. This coupling usually comes from the discretization of the spatial derivatives. In order to mitigate the curse of dimensionality, a new iterative method using a simple algebraic equation with a vector norm is proposed.

      A.Iterative Method for the GHJB Equation

It is obvious that the GHJB equation (4) can be rewritten as

      with

Let F(i) = (f + g u(i))^T, and define the norm of x = (x1, x2, ..., xn)^T as follows:

then (7) can be rewritten as

      where

Then, we obtain the feedback control from (8):
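Collecting the preceding steps in one place (a reconstruction consistent with Algorithm 1 below): write p = ∇V^T as a row vector, let F(i) = (f + g u(i))^T, and use the Euclidean norm ‖x‖ = (x1² + x2² + ... + xn²)^(1/2). The GHJB equation (4) then becomes the algebraic relation

p(i) (f + g u(i)) = -[ l + ‖u(i)‖² ],

whose solution aligned with F(i), i.e., p(i) = α F(i), is

p(i) = -[( l + ‖u(i)‖² ) / ‖f + g u(i)‖²] (f + g u(i))^T,

from which the improved feedback control is obtained as u(i+1) = -(1/2) g^T (p(i))^T.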

The method proposed in this paper may be implemented by applying the following procedure to the system (1).

In Algorithm 1, p is a function of the state x; that is to say, p(x) belongs to an infinite-dimensional function space. The norm ‖·‖p used in the stopping criterion is defined as follows:

Algorithm 1: Iterative Algorithm Based on the GHJB Equation
u(i) := GHJB(f, g, l, u(1))
IF ( ‖p(i) - p(i-1)‖p < ε )
    Solve numerically: u* = u(i)
ELSE
    Restriction: u(i+1) = -(1/2) g^T (p(i))^T
    Computation: p(i+1) = -[( l + ‖u(i+1)‖² ) / ‖f + g u(i+1)‖²] (f + g u(i+1))^T
    Recursion: i ← i + 1
ENDIF
RETURN u(i).
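For concreteness, a minimal numerical sketch of one way to run this iteration pointwise at a fixed state x is given below. The function names f_fun, g_fun, l_fun, the tolerance, and the scalar test system are hypothetical placeholders introduced only for illustration; they are not the systems studied in Section IV.

```python
import numpy as np

def ghjb_step(f_fun, g_fun, l_fun, u, x):
    # One Restriction/Computation pass of Algorithm 1, evaluated pointwise at x.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    F = f_fun(x) + g_fun(x) @ u                  # f + g u (an n-vector)
    p = -((l_fun(x) + u @ u) / (F @ F)) * F      # Computation: p aligned with f + g u
    u_next = -0.5 * g_fun(x).T @ p               # Restriction: u = -(1/2) g^T p^T
    return u_next, p

# Hypothetical scalar test system (not the example of Section IV-A):
#   x' = -x^3 + u,  l(x) = x^2,  initial admissible control u(1)(x) = -x.
f_fun = lambda x: -x**3
g_fun = lambda x: np.array([[1.0]])
l_fun = lambda x: float(x @ x)

x = np.array([2.0])                              # state at which the update is evaluated
u = -x.copy()                                    # initial stabilizing guess u(1)
p_prev = None
for i in range(100):
    u, p = ghjb_step(f_fun, g_fun, l_fun, u, x)
    if p_prev is not None and np.linalg.norm(p - p_prev) < 1e-8:
        break                                    # stopping criterion ||p(i) - p(i-1)|| < eps
    p_prev = p
print(i + 1, u, p)                               # u settles near -0.246 for this toy system
```

The same loop structure, with f_fun, g_fun and l_fun replaced by the actual system data and the update evaluated over a grid of states rather than a single point, mirrors the Restriction/Computation/Recursion steps of Algorithm 1.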

      B.Initial Control Guess

To apply the method introduced in this paper, we must choose an initial stabilizing control u(1) for the system (1) and an estimate of the stability region of u(1). A good estimate for u(1) can be found by considering the linearization of (11).

If q ∈ R^n is an equilibrium point for system (1), and F(i) is differentiable at q, then its derivative is given by J_F(i)(q). In this case, the linear map described by J_F(i)(q) is the best linear approximation of F(i) near the point q, in the sense that
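Concretely, this is the standard first-order Taylor statement

F(i)(x) = F(i)(q) + J_F(i)(q) (x - q) + o(‖x - q‖)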

for x close to q, where o is the little-o notation (for x → q) and ‖x - q‖ is the distance between x and q.

      C.Stability Analysis

In this subsection, we conduct the stability and convergence analysis of the proposed scheme. First, we define the pre-Hamiltonian function for some control u(x) and an associated function V(x):
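In the notation above, and in the form standard for this class of problems, the pre-Hamiltonian is

H(x, u, V) = ∇V^T(x) [ f(x) + g(x) u(x) ] + l(x) + ‖u(x)‖².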

Lemma 1: The optimal control law u* and the optimal value function V*(x), if they exist, satisfy (4) and

This lemma is proved in [17]; it gives a sufficient condition for the optimal control solution of the nonlinear system.

There is a bulk of literature devoted to the problem of designing stable regulators for nonlinear systems. The most important and popular tool is Lyapunov's method. To use Lyapunov's method, the designer first proposes a control and then tries to find a Lyapunov function for the closed-loop system. A Lyapunov function is a generalized energy function of the states, and is usually suggested by the physics of the problem. It is often possible to find a stabilizing control for a particular system. Now we show that the value function V1 is a Lyapunov function for the system (f, g, u(2)).

Theorem 1: Assume Ω is a compact set and u(1) ∈ Au(Ω) is an arbitrary admissible control for the system (1) on Ω. If there exists a positive-definite, continuously differentiable function V1(x,t) on [t0, ∞) × R^n satisfying the GHJB equation

      associated with

then for all t ∈ [t0, ∞), system (1) has bounded trajectories over [t0, ∞), and the origin is a stable equilibrium point for system (1).

Proof: Since V1(x,t) is a continuously differentiable function, we take the derivative of V1(x,t) along the trajectories of the closed-loop system (f, g, u(2)).
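In the standard form, this derivative reads

V̇1(x,t) = ∂V1(x,t)/∂t + ∇V1^T(x,t) [ f(x) + g(x) u(2)(x) ].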

      Substituting (17) into (16), we can get

This establishes the boundedness of the trajectories of system (1) over [t0, ∞). Under the assumptions of the theorem, V1(x,t) is a Lyapunov function with V̇1(x,t) < 0, and the origin is a stable equilibrium point for system (1). ■

      D.Convergence Results

In this subsection, it will be shown that Algorithm 1 converges to the optimal control when it exists. It is clear that the following equation is easily obtained from (7) and (8):

Linearizing equation (19) according to (13), we obtain
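a fixed-point update of the affine form assumed in the remainder of this subsection,

u(i+1) = T(u(i)) = A u(i) + B,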

where T is a linear operator. In its general form, the classical method of successive approximation applies to equations of the form u = T(u). A solution u to such an equation is said to be a fixed point of the transformation T, since T leaves u invariant. To find a fixed point by successive approximation, we begin with an initial trial vector u(1) and compute u(2) = T(u(1)). Continuing in this manner iteratively, we compute successive vectors u(i+1) = T(u(i)). Under appropriate conditions the sequence {u(i)} converges to a solution of the original equation.
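As a small self-contained illustration of this successive-approximation scheme (with a hypothetical contraction T chosen only for the demonstration, not one derived from the GHJB equation):

```python
import numpy as np

def successive_approximation(T, u1, tol=1e-10, max_iter=1000):
    # Iterate u(i+1) = T(u(i)) until successive iterates stop changing.
    u = u1
    for i in range(1, max_iter + 1):
        u_next = T(u)
        if np.max(np.abs(np.atleast_1d(u_next - u))) < tol:
            return u_next, i
        u = u_next
    return u, max_iter

# T(u) = 0.5*cos(u) is a contraction on R (|T'(u)| <= 0.5 < 1), so the iteration
# converges to the unique fixed point of u = 0.5*cos(u), roughly 0.450.
u_star, iters = successive_approximation(lambda u: 0.5 * np.cos(u), u1=0.0)
print(u_star, iters)
```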

Definition 1: Let S be a subset of a normed space X and let T be a transformation mapping S into S. Then T is said to be a contraction mapping if there is an α, 0 ≤ α < 1, such that ‖T(u(1)) - T(u(2))‖ ≤ α ‖u(1) - u(2)‖ for all u(1), u(2) ∈ S.

Note, for example, that a transformation having ‖T′(u)‖ ≤ α < 1 on a convex set S is a contraction mapping since, by the mean value inequality, ‖T(u(1)) - T(u(2))‖ ≤ sup_{u∈S} ‖T′(u)‖ · ‖u(1) - u(2)‖ ≤ α ‖u(1) - u(2)‖.

Lemma 2: If T is a contraction mapping on a closed subset S of a Banach space, there is a unique vector u* ∈ S satisfying u* = T(u*). Furthermore, u* can be obtained by the method of successive approximation starting from an arbitrary initial vector in S.

Proof: Select an arbitrary element u(1) ∈ S. Define the sequence {u(i)} by the formula u(i+1) = T(u(i)). Then ‖u(i+1) - u(i)‖ = ‖T(u(i)) - T(u(i-1))‖ ≤ α ‖u(i) - u(i-1)‖. Therefore,
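iterating this estimate down to i = 1 gives the standard geometric bound

‖u(i+1) - u(i)‖ ≤ α^(i-1) ‖u(2) - u(1)‖.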

      It follows that
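for any m > n, by the triangle inequality and the geometric series,

‖u(m) - u(n)‖ ≤ Σ_{i=n}^{m-1} ‖u(i+1) - u(i)‖ ≤ [α^(n-1) / (1 - α)] ‖u(2) - u(1)‖,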

and hence we conclude that {u(i)} is a Cauchy sequence. Since S is a closed subset of a complete space, there is an element u* ∈ S such that u(i) → u*.

We now show that u* = T(u*). We have
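for every n, by the triangle inequality and the contraction property,

‖u* - T(u*)‖ ≤ ‖u* - u(n+1)‖ + ‖T(u(n)) - T(u*)‖ ≤ ‖u* - u(n+1)‖ + α ‖u(n) - u*‖.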

By appropriate choice of n, the right-hand side of the above inequality can be made arbitrarily small. Thus ‖u* - T(u*)‖ = 0.

It remains only to show that u* is unique. Assume that u* and ũ* are both fixed points. Then
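‖u* - ũ*‖ = ‖T(u*) - T(ũ*)‖ ≤ α ‖u* - ũ*‖,

which, since α < 1, forces ‖u* - ũ*‖ = 0.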

Thus u* = ũ*. ■

Defining T(u) = Au + B, the problem is equivalent to that of finding a fixed point of T. Furthermore, the method of successive approximation proposed above for this problem is equivalent to ordinary successive approximation applied to T. Thus it is sufficient to show that T is a contraction mapping with respect to some norm on n-dimensional space. For the mapping T defined above we have
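the standard affine-map estimate

‖T(u) - T(v)‖ = ‖A(u - v)‖ ≤ ‖A‖ ‖u - v‖,

so T is a contraction whenever ‖A‖ < 1 in the chosen norm.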

      The basic idea of successive approximation and contraction mappings can be modified in several ways to produce convergence theorems for a number of different situations.We consider one such modification below.

Theorem 2: Let T be a continuous mapping from a closed subset S of a Banach space into S, and suppose that T^n is a contraction mapping for some positive integer n. Then T has a unique fixed point in S, which can be found by successive approximation.

Proof: Let u(1) be arbitrarily selected from S. Define the sequence {u(i)} by

Now since T^n is a contraction, it follows by Lemma 2 that the subsequence {u(ik)} converges to an element u* ∈ S which is a fixed point of T^n. We show that u* is the unique fixed point of T.

By the continuity of T, the element T(u*) can be obtained by applying T^n successively to T(u(1)). Therefore, we have u* = lim_{k→∞} T^(ik)(u(1)) and

      Hence,again using the continuity of T,

where α < 1. Thus u* = T(u*). If u* and v* are both fixed points, then

and hence u* = v*. Thus the u* found by successive approximation is the unique fixed point of T. ■

      IV.ILLUSTRATIVE EXAMPLES

In this section, we show how the proposed method is used to obtain a control law that improves the closed-loop performance of the original control.

      A.One Dimensional Nonlinear System

      A first-order nonlinear system is described by the state equation

      with initial condition x(0)=x0.It is desired to find u(t),t∈ (0,∞),that minimizes the performance measure

Fig. 1. A continuous stabilizing control u(1) with different initial states.

Fig. 2. A continuous stabilizing control u(2) at different times.

      Assume a linear control to start with

it is clear that the system is stable when a > 0. Now select a feedback initial control

then the system (24) under controller u(1) is stable for different initial values, as shown in Fig. 1. Substituting u(1) into (7), we have

      The above yields

Then the system (24) under controller u(2) is stable over the time horizon shown in Fig. 2.

To continue the iterative algorithm, we repeat the preceding steps using this revised value. If we select x = 2, then we can obtain the values p(i) (i = 1, 2, ...); for example, p(1)(2) = -8.000, p(2)(2) = -2.500, p(3)(2) = -0.4120, p(4)(2) = -0.2594, p(5)(2) = -0.2552, p(6)(2) = -0.2550. It is obvious that

Eventually the iterative procedure should converge to the optimal control history u*. The preceding example indicates the steps involved in carrying out one iteration of Algorithm 1. Let us use this algorithm to determine the optimal trajectory and control for a continuous stirred-tank chemical reactor.

      B.Two Dimensional Nonlinear System

The state equations for a continuous stirred-tank chemical reactor are given below from [1]. The flow of a coolant through a coil inserted in the reactor is used to control the first-order, irreversible exothermic reaction taking place in the reactor. x1(t) = T(t) is the deviation from the steady-state temperature and x2(t) = C(t) is the deviation from the steady-state concentration. u(t) is the normalized control variable, representing the effect of coolant flow on the chemical reaction. The state equations are

with the boundary condition x0 = [0.05, 0]^T. The performance measure to be minimized is

Fig. 3. (a) Optimal control and trajectory; (b) performance measure reduction by Algorithm 1; (c) performance measure reduction by Beard's method in [16].

indicating that the desired objective is to maintain the temperature and concentration close to their steady-state values without expending large amounts of control effort. R is a weighting factor that we shall select arbitrarily as 0.1. To ensure that a monotonically decreasing sequence of performance indices was generated, each trial control was required to provide a smaller performance measure than the preceding control. This was accomplished by re-generating any trial control that increased the performance measure.

With t ∈ [0, 0.78] and an initial value equal to 1.0, the value of the performance measure as a function of the number of iterations is shown in Fig. 3(b). The results compared with Beard's method are shown in Fig. 3(c); in this figure, J denotes the performance measure obtained by Beard's method in [16]. It is seen from Fig. 3(b) and Fig. 3(c) that the optimal feedback control obtained by our method is better than that of Beard's method. Moreover, the performance measure under our approach can be decreased further by strengthening the stopping criterion, at the cost of additional computation.

To illustrate the effects of different initial parameters for the control history, several additional solutions were obtained. The results of these computer runs are summarized in Table I.

      TABLE I SUMMARY OF ALGORITHM 1 WITH DIFFERENT PARAMETERS

The value of the performance measure J as a function of the number of iterations, with different initial controls (u(1) = 1.0, 0) and different stopping criteria (ε = 10^-6, 10^-7), is shown in Table I. To ensure that a monotonically decreasing sequence of performance indices was generated, each initial control was required to provide a smaller stopping criterion than the preceding initial control. This was accomplished by choosing ε and re-generating any trial control that increased the performance measure. It can be seen from Table I that when the initial control is equal to zero and the stopping criterion is 10^-6, the number of iterations is reduced significantly; however, the performance measure shows only a very slight reduction. This type of progress is typical of the iterative method.

Remark 1: To conclude our discussion of Algorithm 1, let us summarize the important features of the algorithm. First of all, a nominal control history u(1) must be selected to begin the numerical procedure. In selecting the nominal control we use whatever physical insight we have about the problem. Secondly, the current control u(i) and the corresponding state trajectory x(i) are stored. If storage must be conserved, the state values needed to determine p(i) can be obtained by reintegrating the state equations with the costate equations. If this is done, x(i) does not need to be stored; however, the computation time will increase. Generating the required state values in this manner may make the results of the backward integration more accurate, because the piecewise-constant approximation for x(i) need not be used. Finally, the proposed method is generally characterized by ease of starting: the initial guess for the control is not usually crucial. On the other hand, the method has a tendency to converge slowly as a minimum is approached.

      C.Robot Arm Problem

The robot arm problem is taken from [1]. In this formulation the arm of the robot is a rigid bar of length L that protrudes a distance ρ from the origin to the gripping end and sticks out a distance L - ρ in the opposite direction. If the pivot point of the arm is the origin of a spherical coordinate system, then the problem can be phrased in terms of the length ρ of the arm from the pivot, the horizontal and vertical angles (θ, φ) from the horizontal plane, the controls (uρ, uθ, uφ), and the final time tf. Bounds on the length and angles are

      and for the controls

      The equations of motion for the robot arm are

      where I is the moment of inertia,defined by

      The boundary conditions are

      Fig.4. Variables x1,...,x6 for the robot arm problem.

      Fig.5. Control variables uρ,uθ,uφ for the robot arm problem.

This model ignores the fact that the spherical coordinate reference frame is a non-inertial frame and should have terms for Coriolis and centrifugal forces. Let x1 = ρ, x2 = ρ̇, x3 = θ, x4 = θ̇, x5 = φ, x6 = φ̇; then the optimal control problem for this robot arm is stated as follows. Find the controls (uρ, uθ, uφ) that minimize the cost functional

      subject to the dynamic constraints

      The control inequality constraints and the boundary conditions are shown in the above statement.

Fig. 4 shows the variables x1, ..., x6 for the robot arm as functions of time. The control variables uρ, uθ, uφ for the robot arm as functions of time are shown in Fig. 5. Note that the functions x1, x3, x5 for the robot arm are continuously differentiable, but since the second derivatives are directly proportional to the controls, the second derivatives are only piecewise continuous.

      V.CONCLUSION

In this paper, we proposed a new iterative numerical technique to solve the GHJB equation effectively. There is a need for numerical methods that approximate solutions to the special types of equations that arise in nonlinear optimal control. We also showed that the resulting controls are in feedback form and stabilize the closed-loop system. The procedure is proved to converge to the optimal solution of the GHJB equation with respect to the iteration variable.
