
Stochastic Optimal Control

Robert Stengel, Optimal Control and Estimation, MAE 546, Princeton University, 2012

- Nonlinear systems with random inputs and perfect measurements
- Nonlinear systems with random inputs and imperfect measurements
- Certainty equivalence and separation
- Stochastic neighboring-optimal control
- Linear-quadratic-Gaussian (LQG) control

Nonlinear Systems with Random Inputs and Perfect Measurements


Inputs and initial conditions are uncertain, but the state can be measured without error:

z(t) = x(t)

\dot{x}(t) = f[x(t), u(t), w(t), t]

E[x(0)] = x(0);  E{ [x(0) - x(0)][x(0) - x(0)]^T } = 0

E[w(t)] = 0;  E[w(t) w^T(\tau)] = W(t) \delta(t - \tau)

Copyright 2012 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE546.html http://www.princeton.edu/~stengel/OptConEst.html

Assume that random disturbance effects are small and additive

\dot{x}(t) = f[x(t), u(t), t] + L(t) w(t)
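As a concrete illustration, a minimal simulation sketch in Python (not from the lecture; the function names and the Euler-Maruyama discretization are assumptions). White noise with spectral density W(t) is approximated over a step dt by a Gaussian sample of covariance W/dt, so its integrated effect over the step is correct.

import numpy as np

def simulate(f, L, W, x0, u, t0, tf, dt=1e-3, seed=0):
    # Euler-Maruyama integration of x_dot = f(x, u, t) + L w(t),
    # with w(t) sampled as N(0, W/dt) on each step.
    rng = np.random.default_rng(seed)
    ts = np.arange(t0, tf, dt)
    xs = [np.asarray(x0, dtype=float)]
    for t in ts[:-1]:
        w = rng.multivariate_normal(np.zeros(W.shape[0]), W / dt)
        xs.append(xs[-1] + (f(xs[-1], u(t), t) + L @ w) * dt)
    return ts, np.stack(xs)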

Cost Must Be an Expected Value


The deterministic cost function cannot be minimized, because:
- the disturbance effect on the state cannot be predicted
- the state and control have become random variables

Stochastic Euler-Lagrange Equations?

- There is no single optimal trajectory
- Expected values of the Euler-Lagrange necessary conditions may not be well defined

min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)] dt

1) E[\lambda(t_f)] = E{ \partial\phi[x(t_f)]/\partial x }^T

2) E[\dot{\lambda}(t)] = -E{ \partial H[x(t), u(t), \lambda(t), t]/\partial x }^T

3) E{ \partial H[x(t), u(t), \lambda(t), t]/\partial u } = 0

However, the expected value of a deterministic cost function can be minimized:

min_{u(t)} J = E{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)] dt }

Stochastic Value Function for a Nonlinear System

- A Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved
- Base the optimization on the Principle of Optimality

Optimal expected value function at t_1:

V*(t_1) = E{ \phi[x*(t_f)] + \int_{t_1}^{t_f} L[x*(\tau), u*(\tau)] d\tau }
        = min_u E{ \phi[x*(t_f)] + \int_{t_1}^{t_f} L[x*(\tau), u(\tau)] d\tau }

Rate of Change of the Value Function


Total time derivative of V*:

dV*/dt |_{t = t_1} = -E{ L[x*(t_1), u*(t_1)] }

x(t) and u(t) can be known precisely (perfect measurements); therefore

dV*/dt |_{t = t_1} = -L[x*(t_1), u*(t_1)]

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

dV*/dt = E[ \partial V*/\partial t + (\partial V*/\partial x) \dot{x} ]

Expand the increment \Delta V* to second degree:

\Delta V* = E{ (\partial V*/\partial t) \Delta t + (\partial V*/\partial x) \dot{x} \Delta t
            + (1/2) \dot{x}^T (\partial^2 V*/\partial x^2) \dot{x} (\Delta t)^2 + ... }
         = E{ (\partial V*/\partial t) \Delta t + (\partial V*/\partial x)[f(.) + L w(.)] \Delta t
            + (1/2) [f(.) + L w(.)]^T (\partial^2 V*/\partial x^2) [f(.) + L w(.)] (\Delta t)^2 + ... }

Introduction of the Trace

The trace of a matrix product is invariant under cyclic permutation, and a scalar quadratic form equals its own trace, dim[Tr(.)] = 1 x 1:

Tr(ABC) = Tr(CAB) = Tr(BCA)
Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T)

Divide \Delta V* by \Delta t (one factor of \Delta t remains on the second-degree term) and apply the trace identity:

dV*/dt ≈ E{ \partial V*/\partial t + (\partial V*/\partial x)[f(.) + L w(.)]
          + (1/2) Tr[ (\partial^2 V*/\partial x^2)(f(.) + L w(.))(f(.) + L w(.))^T ] \Delta t }

Toward the Stochastic HJB Equation

Because x(t) and u(t) can be measured, the deterministic terms come out of the expectation:

dV*/dt = E{ \partial V*/\partial t + (\partial V*/\partial x)[f(.) + L w(.)]
            + (1/2) Tr[ (\partial^2 V*/\partial x^2)(f(.) + L w(.))(f(.) + L w(.))^T ] \Delta t }
       = \partial V*/\partial t + (\partial V*/\partial x) f(.)
            + E{ (\partial V*/\partial x) L w(.) + (1/2) Tr[ (\partial^2 V*/\partial x^2)(f(.) + L w(.))(f(.) + L w(.))^T ] \Delta t }

The disturbance is assumed to be zero-mean white noise:

E[w(t)] = 0;  E[w(t) w^T(\tau)] = W(t) \delta(t - \tau)

so E[(\partial V*/\partial x) L w(.)] = 0, while in the white-noise limit E[w(.) w^T(.)] \Delta t -> W(t) and the E[f(.) f^T(.)] \Delta t term vanishes. The uncertain disturbance input can only increase the rate of change of the value function:

dV*/dt = \partial V*/\partial t + (\partial V*/\partial x) f(.)
         + lim_{\Delta t -> 0} (1/2) Tr{ (\partial^2 V*/\partial x^2)[ E(f(.) f^T(.)) \Delta t + L E(w(.) w^T(.)) L^T \Delta t ] }
       = \partial V*/\partial t (t) + (\partial V*/\partial x)(t) f(.) + (1/2) Tr[ (\partial^2 V*/\partial x^2)(t) L(t) W(t) L^T(t) ]

Stochastic Principle of Optimality (Perfect Measurements)

dV*/dt = \partial V*/\partial t (t) + (\partial V*/\partial x)(t) f(.) + (1/2) Tr[ (\partial^2 V*/\partial x^2)(t) L(t) W(t) L^T(t) ]

- Substitute for the total derivative: dV*/dt = -L(x*, u*)
- Solve for the partial derivative, \partial V*/\partial t
- Stochastic HJB equation:

-\partial V*(t)/\partial t = min_u E{ L[x*(t), u(t), t] + (\partial V*/\partial x)(t) f[x*(t), u(t), t]
                                      + (1/2) Tr[ (\partial^2 V*/\partial x^2)(t) L(t) W(t) L^T(t) ] }

Boundary (terminal) condition: V*(t_f) = E[\phi(t_f)]

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

-\partial V*(t)/\partial t = min_u E{ L[x*(t), u(t), t] + (\partial V*/\partial x)(t) f[x*(t), u(t), t]
                                      + (1/2) Tr[ (\partial^2 V*/\partial x^2)(t) L(t) W(t) L^T(t) ] }

- Control has no effect on the disturbance input
- The criterion for optimality is the same as for the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0)

The Information Set, I

Sigma algebra (Wikipedia definitions):
- The collection of sets over which a measure is defined
- The collection of events that can be assigned probabilities
- A measurable space

Information Sets and Expected Cost

Information available at the current time, t_1:
- All measurements from the initial time, t_o
- All control commands from the initial time

I[t_o, t_1] = { z[t_o, t_1], u[t_o, t_1] }

Plus the available model structure, parameters, and statistics:

I[t_o, t_1] = { z[t_o, t_1], u[t_o, t_1], f(.), Q, R, ... }

A Derived Information Set, I_D

Measurements may be directly useful, e.g., for
- Displays
- Simple feedback control

... or they may require processing, e.g.,
- Transformation
- Estimation

Example of a derived information set: the history of the mean and covariance from a state estimator

I_D[t_o, t_1] = { \hat{x}[t_o, t_1], P[t_o, t_1], u[t_o, t_1] }

Additional Derived Information Sets

Markov derived information set: the most current mean and covariance from a state estimator

I_MD(t_1) = { \hat{x}(t_1), P(t_1), u(t_1) }

Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function

I_MM(t_1) = { [\hat{x}_A(t_1), P_A(t_1), u(t_1), Pr(H_A)], [\hat{x}_B(t_1), P_B(t_1), u(t_1), Pr(H_B)], ... }

Required and Available Information Sets for Optimal Control

- Optimal control requires propagation of information back from the final time
- Hence, it requires the entire information set, extending from t_o to t_f: I[t_o, t_f]
- Separate the information set into knowable and predictable parts:

I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]

- Knowable information has been received
- Predictable information is to come

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

E[ x(t) | I_D ] = \hat{x}(t)

E{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D } = P(t)

... where the conditional expected values are obtained from a Kalman-Bucy filter
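For the linear-Gaussian case, a minimal Euler-discretized Kalman-Bucy filter step might look as follows (a sketch under assumed notation z = H x + n, with measurement-noise spectral density N; not code from the course):

import numpy as np

def kalman_bucy_step(x_hat, P, z, u, F, G, H, L, W, N, dt):
    # Filter gain K = P H^T N^-1
    K = P @ H.T @ np.linalg.inv(N)
    # Conditional mean:  x_hat_dot = F x_hat + G u + K (z - H x_hat)
    x_hat = x_hat + (F @ x_hat + G @ u + K @ (z - H @ x_hat)) * dt
    # Conditional covariance: P_dot = F P + P F^T + L W L^T - K N K^T
    P = P + (F @ P + P @ F.T + L @ W @ L.T - K @ N @ K.T) * dt
    return x_hat, P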

Dependence of the Stochastic Cost Function on the Information Set

J = (1/2) E{ E[ Tr( S(t_f) x(t_f) x^T(t_f) ) | I_D ]
    + \int_{t_o}^{t_f} E[ Tr( Q x(t) x^T(t) ) ] dt + \int_{t_o}^{t_f} E[ Tr( R u(t) u^T(t) ) ] dt }

Expand the state covariance:

P(t) = E{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D }
     = E{ x(t) x^T(t) - \hat{x}(t) x^T(t) - x(t) \hat{x}^T(t) + \hat{x}(t) \hat{x}^T(t) | I_D }

E[ \hat{x}(t) x^T(t) | I_D ] = E[ x(t) \hat{x}^T(t) | I_D ] = \hat{x}(t) \hat{x}^T(t)

P(t) = E[ x(t) x^T(t) | I_D ] - \hat{x}(t) \hat{x}^T(t),
or E[ x(t) x^T(t) | I_D ] = P(t) + \hat{x}(t) \hat{x}^T(t)

Certainty-Equivalent and Stochastic Incremental Costs

J = (1/2) E{ Tr( S(t_f)[ P(t_f) + \hat{x}(t_f) \hat{x}^T(t_f) ] )
    + \int_{t_o}^{t_f} Tr( Q[ P(t) + \hat{x}(t) \hat{x}^T(t) ] ) dt + \int_{t_o}^{t_f} Tr( R u(t) u^T(t) ) dt }
  ≜ J_CE + J_S

The cost function has two parts:

- Certainty-equivalent cost:

J_CE = (1/2) E{ Tr( S(t_f) \hat{x}(t_f) \hat{x}^T(t_f) ) + \int_{t_o}^{t_f} Tr( Q \hat{x}(t) \hat{x}^T(t) ) dt
       + \int_{t_o}^{t_f} Tr( R u(t) u^T(t) ) dt }

- Stochastic increment cost:

J_S = (1/2) E{ Tr( S(t_f) P(t_f) ) + \int_{t_o}^{t_f} Tr( Q P(t) ) dt }
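A small Monte Carlo check (illustrative values) of the expansion E[x x^T | I_D] = P + \hat{x} \hat{x}^T that produces the J_CE / J_S split:

import numpy as np

rng = np.random.default_rng(2)
x_hat = np.array([1.0, -2.0])
P = np.array([[0.5, 0.1], [0.1, 0.3]])
xs = rng.multivariate_normal(x_hat, P, size=200_000)
E_xxT = (xs[:, :, None] @ xs[:, None, :]).mean(axis=0)   # sample E[x x^T]
print(np.allclose(E_xxT, P + np.outer(x_hat, x_hat), atol=0.05))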

Expected Cost of the Trajectory

Optimized cost function:

V*(t_o) ≜ J*(t_f) = E{ \phi[x*(t_f)] + \int_{t_o}^{t_f} L[x*(\tau), u*(\tau)] d\tau }

Law of total expectation:

E(J*) = E( J* | I[t_o, t_1] ) Pr{ I[t_o, t_1] } + E( J* | I[t_1, t_f] ) Pr{ I[t_1, t_f] }
      = E[ E( J* | I ) ]

Because the past is established at t_1, Pr{ I[t_o, t_1] } = 1:

E(J*) = E( J* | I[t_o, t_1] )[1] + E( J* | I[t_1, t_f] ) Pr{ I[t_1, t_f] }
      = E( J* | I[t_o, t_1] ) + E( J* | I[t_1, t_f] ) Pr{ I[t_1, t_f] }

- For planning or post-trajectory analysis, one can assume that the entire information set is available
- For real-time control, t_1 < t_f, and future information can only be predicted
- If the separation property applies (TBD), the future conditioning effect can be predicted
- If not, the future conditioning effect can only be approximated

Separation Property and Certainty Equivalence

Separation Property:
- The Optimal Control Law and the Optimal Estimation Law can be derived separately
- Their derivations are strictly independent

Certainty Equivalence Property (the separation property plus ...):
- The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same
- The Optimal Estimation Law can be derived separately
- Linear-quadratic-Gaussian (LQG) control is certainty-equivalent

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition

Immune Response Example

Optimal open-loop drug therapy (control). Assumptions:
- Initial condition known without error
- No disturbance

[Figure: open-loop optimal control for a lethal initial condition]

Immune Response Example with Optimal Feedback Control

Optimal closed-loop therapy. Assumptions:
- Small error in the initial condition
- Small disturbance
- Perfect measurement of the state

Stochastic optimal closed-loop therapy. Assumptions:
- Small error in the initial condition
- Small disturbance
- Imperfect measurement
- Certainty equivalence applies to the perturbation control

[Figure: open- and closed-loop optimal control for a 150%-lethal initial condition]

Immune Response with Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)

[Figure: responses with a low-bandwidth estimator (|W| < |N|) and a high-bandwidth estimator (|W| > |N|)]

Immune Response to Random Disturbance with Stochastic Neighboring-Optimal Control

Disturbance due to:
- Re-infection
- Sequestered pockets of pathogen

With noisy measurements, closed-loop therapy is robust ... but not robust enough: organ death occurs in one case, where the initial control is too sluggish to prevent divergence; quick initial control prevents divergence. The probability of satisfactory therapy can be maximized by stochastic redesign of the controller.

Stochastic Linear-Quadratic Optimal Control

Apply the stochastic principle of optimality to the linear-quadratic (LQ) problem.

Quadratic value function:

V(t_o) = E{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)] d\tau }
       = (1/2) E{ x^T(t_f) S(t_f) x(t_f)
                  + \int_{t_o}^{t_f} [x^T(t)  u^T(t)] [ Q(t), M(t); M^T(t), R(t) ] [x(t); u(t)] dt }

Linear dynamic constraint:

\dot{x}(t) = F(t) x(t) + G(t) u(t) + L(t) w(t)

Components of the LQ Value Function

The quadratic value function has two parts:

V(t) = (1/2) x^T(t) S(t) x(t) + v(t)

- Certainty-equivalent value function:

V_CE(t) ≜ (1/2) x^T(t) S(t) x(t)

- Stochastic value function increment:

v(t) = (1/2) \int_t^{t_f} Tr[ S(\tau) L(\tau) W(\tau) L^T(\tau) ] d\tau

Value Function Gradient and Hessian

For the certainty-equivalent value function,

V_CE(t) ≜ (1/2) x^T(t) S(t) x(t)

- Gradient with respect to the state:

\partial V/\partial x (t) = x^T(t) S(t)

- Hessian with respect to the state:

\partial^2 V/\partial x^2 (t) = S(t)

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

-\partial V*/\partial t = min_u E{ (1/2)( x*^T Q x* + 2 x*^T M u + u^T R u )
                                   + x*^T S ( F x* + G u ) + (1/2) Tr( S L W L^T ) }
                        = min_u [ (1/2)( x*^T Q x* + 2 x*^T M u + u^T R u )
                                   + x*^T S ( F x* + G u ) + (1/2) Tr( S L W L^T ) ]

Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

\partial( \partial V*/\partial t )/\partial u = 0 = x^T M + u^T R + x^T S G

Solve for u, obtaining the feedback control law:

u(t) = -R^{-1}(t) [ G^T(t) S(t) + M^T(t) ] x(t) ≜ -C(t) x(t)

Terminal condition:

V(t_f) = (1/2) x^T(t_f) S(t_f) x(t_f)
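A direct computation of the feedback gain matrix (a sketch with assumed example values, solving R C = G^T S + M^T rather than inverting R):

import numpy as np

def lq_gain(R, G, S, M=None):
    # C = R^-1 (G^T S + M^T); M defaults to zero (no cross-weighting)
    if M is None:
        M = np.zeros((G.shape[0], R.shape[0]))
    return np.linalg.solve(R, G.T @ S + M.T)

G = np.array([[0.0], [1.0]])
S = np.array([[2.0, 0.3], [0.3, 1.0]])
R = np.array([[1.0]])
C = lq_gain(R, G, S)      # control law: u = -C x
print(C)                  # [[0.3, 1.0]]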

Matrix Riccati Equation for the LQ Optimal Control Law

u(t) = -R^{-1}(t) [ G^T(t) S(t) + M^T(t) ] x(t) ≜ -C(t) x(t)

Substitute the optimal control law into the HJB equation:

(1/2) x^T \dot{S} x + \dot{v} = (1/2) x^T [ -Q + M R^{-1} M^T - ( F - G R^{-1} M^T )^T S
                                - S ( F - G R^{-1} M^T ) + S G R^{-1} G^T S ] x - (1/2) Tr( S L W L^T )

Matching the terms quadratic in x, the matrix Riccati equation provides S(t):

\dot{S}(t) = -[ Q(t) - M(t) R^{-1}(t) M^T(t) ] - [ F(t) - G(t) R^{-1}(t) M^T(t) ]^T S(t)
             - S(t) [ F(t) - G(t) R^{-1}(t) M^T(t) ] + S(t) G(t) R^{-1}(t) G^T(t) S(t),   S(t_f) = \phi_{xx}(t_f)

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

The stochastic value function increases the cost due to the disturbance:

-\dot{v}(t) = (1/2) Tr[ S(t) L(t) W(t) L^T(t) ],   v(t_f) = 0

... however, its calculation is independent of the Riccati equation.
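A sketch of the backward integration (time-invariant matrices and simple Euler stepping assumed; not the course's code). It returns S(t_o) and the stochastic increment v(t_o) = (1/2) \int_{t_o}^{t_f} Tr( S L W L^T ) dt:

import numpy as np

def riccati_and_increment(F, G, Q, R, Sf, L, W, tf, dt=1e-3, M=None):
    M = np.zeros((F.shape[0], R.shape[0])) if M is None else M
    S, v = Sf.copy(), 0.0
    for _ in range(int(tf / dt)):                 # step backward from t_f
        Fm = F - G @ np.linalg.solve(R, M.T)      # F - G R^-1 M^T
        Sdot = -(Q - M @ np.linalg.solve(R, M.T)) - Fm.T @ S - S @ Fm \
               + S @ G @ np.linalg.solve(R, G.T @ S)
        S = S - Sdot * dt                         # S(t - dt) = S(t) - Sdot dt
        v += 0.5 * np.trace(S @ L @ W @ L.T) * dt # accumulate v
    return S, v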

Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

J = (1/2) Tr{ E[ x^T(t_f) S(t_f) x(t_f) ]
    + E \int_{t_o}^{t_f} [x^T(t)  u^T(t)] [ Q(t), 0; 0, R(t) ] [x(t); u(t)] dt }
  = (1/2) Tr{ S(t_f) E[ x(t_f) x^T(t_f) ] + \int_{t_o}^{t_f} ( Q(t) E[ x(t) x^T(t) ] + R(t) E[ u(t) u^T(t) ] ) dt }

or

J = (1/2) Tr{ S(t_f) P(t_f) + \int_{t_o}^{t_f} [ Q(t) P(t) + R(t) U(t) ] dt }

where

P(t) ≜ E[ x(t) x^T(t) ];  U(t) ≜ E[ u(t) u^T(t) ]

Optimal Control Covariance

Optimal control vector:

u(t) = -C(t) x(t)

Optimal control covariance:

U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t)
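A quick Monte Carlo check (illustrative values) that u = -C x implies U = C P C^T:

import numpy as np

rng = np.random.default_rng(3)
C = np.array([[0.8, -0.2]])
P = np.array([[1.0, 0.2], [0.2, 0.5]])
xs = rng.multivariate_normal(np.zeros(2), P, size=200_000)
us = -(C @ xs.T).T                            # control samples
U_mc = (us.T @ us) / len(us)                  # sample E[u u^T]
print(np.allclose(U_mc, C @ P @ C.T, atol=0.01))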

Revise Cost to Reflect State and Adjoint Covariance Dynamics

Integration by parts:

S(t) P(t) |_{t_o}^{t_f} = \int_{t_o}^{t_f} [ \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt

S(t_f) P(t_f) = S(t_o) P(t_o) + \int_{t_o}^{t_f} [ \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt

Rewrite the cost function to incorporate the initial cost:

J = (1/2) Tr{ S(t_o) P(t_o) + \int_{t_o}^{t_f} [ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt }

Evolution of State and Adjoint Covariance Matrices (No Control)

u(t) = 0;  U(t) = 0

State covariance response to the random disturbance:

\dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t),   P(t_o) given

Adjoint covariance response to the terminal cost:

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t),   S(t_f) given
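A sketch of the open-loop covariance propagation (constant F, L, W and Euler stepping assumed):

import numpy as np

def propagate_covariance(F, L, W, P0, tf, dt=1e-3):
    # P_dot = F P + P F^T + L W L^T, integrated forward from P(t_o)
    P = P0.copy()
    for _ in range(int(tf / dt)):
        P = P + (F @ P + P @ F.T + L @ W @ L.T) * dt
    return P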

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to the random disturbance:

\dot{P}(t) = [ F(t) - G(t) C(t) ] P(t) + P(t) [ F(t) - G(t) C(t) ]^T + L(t) W(t) L^T(t)

Adjoint covariance response to the terminal cost:

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t)

Total Cost With and Without Control

With no control (dependent on S(t), independent of P(t)):

J_no-control = (1/2) Tr{ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t) dt }

With optimal control, the equation for the cost is the same:

J_optimal-control = (1/2) Tr{ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t) dt }

... but the evolutions of S(t), and hence the values of S(t_o), are different in each case.
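A sketch of the no-control cost evaluation (constant matrices and Euler quadrature assumed), integrating the adjoint equation backward while accumulating the disturbance integral:

import numpy as np

def total_cost_no_control(F, Q, Sf, L, W, P0, tf, dt=1e-3):
    S = Sf.copy()
    integral = np.zeros_like(S)
    for _ in range(int(tf / dt)):                # backward from t_f to t_o
        integral += S @ L @ W @ L.T * dt         # accumulate S L W L^T
        S = S + (F.T @ S + S @ F + Q) * dt       # step S_dot = -F^T S - S F - Q backward
    return 0.5 * np.trace(S @ P0 + integral)     # S is now S(t_o)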

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material

Dual Control (Fel'dbaum, 1965)

- Nonlinear system
- Uncertain system parameters to be estimated
- Parameter estimation can be aided by test inputs
- Estimation and control calculations are coupled and necessarily recursive
- Approach: minimize the value function with three increments: nominal, cautious, and probing control

min_u V* = min_u ( V*_nominal + V*_cautious + V*_probing )

Adaptive Critic Controller

The nonlinear control law, c, takes the general form

u(t) = c[ x(t), a, y*(t) ]

x(t): state;  a: parameters of the operating point;  y*(t): command input

On-line adaptive critic controller:
- Nonlinear control law (action network)
- Criticizes non-optimal performance via a critic network
- Adapts control gains to improve performance
- Adapts the cost model to improve its estimate

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

- Initially, c[x, a, y*] is unknown
- Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points
- Scheduling variable: a

u(t) = C_F(a) y*(t) + C_B(a) \Delta x(t) + C_I(a) \int \Delta y(t) dt ≜ c[ x(t), a, y*(t) ]

Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

u(t) = NN_F[ y*(t), a(t) ] + NN_B[ x(t), a(t) ] + NN_I[ \int \Delta y(t) dt, a(t) ] = c[ x(t), a, y*(t) ]

Initial Neural Control Law

- Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at the n operating points (see the sketch below)
- Interpolation and gain scheduling via the neural networks
- One node per operating point in each neural network
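A minimal sketch of the idea (hypothetical layer shapes and names; the algebraic weight-selection procedure itself is not shown): a one-hidden-layer sigmoidal network standing in for one scheduled gain term, e.g. NN_B[x, a]:

import numpy as np

def nn_feedback(x, a, V, b, W_out):
    # u_B = W_out * sigmoid(V [x; a] + b); with one hidden node per
    # operating point, the weights can be chosen to reproduce C_B(a_i) x
    # exactly at each design point a_i.
    z = np.concatenate([x, np.atleast_1d(a)])
    return W_out @ np.tanh(V @ z + b)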

On-line Optimization of the Adaptive Critic Neural Network Controller

The critic adapts neural network weights to improve performance using approximate dynamic programming.

Heuristic Dynamic Programming Adaptive Critic

- Dual Heuristic Programming (DHP) adaptive critic for the receding-horizon optimization problem
- Critic and Action (i.e., Control) networks adapted concurrently
- LQ-PI cost function applied to the nonlinear problem
- Modified resilient backpropagation for neural network training

Action Network On-line Training

Train the action network, NN_A, at time t, holding the critic parameters fixed.

[Figure: action-network training loop. Blocks: NN_A with inputs x_a(t), a(t); aircraft model; transition matrices and state prediction; utility function derivatives; critic network NN_C; target generation for the NN_A target optimality condition.]

Value function recurrence:

V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]

NN_A target optimality condition:

\partial V/\partial u = \partial L/\partial u + (\partial V/\partial x)(\partial x/\partial u) = 0

The critic network supplies the cost gradient:

\partial V[x_a(t)]/\partial x_a(t) = NN_C[ x_a(t), a(t) ]
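A conceptual sketch of one action-network target computation (all function names hypothetical; the course's implementation is not shown). The residual below is driven to zero by training NN_A:

import numpy as np

def action_target_residual(x, u, step_model, dL_du, dx_du, critic):
    # Optimality condition: dV/du = dL/du + (dV/dx)(dx/du) = 0
    x_next = step_model(x, u)                 # one-step state prediction
    lam = critic(x_next)                      # dV/dx supplied by NN_C
    return dL_du(x, u) + dx_du(x, u).T @ lam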

Critic Network On-line Training

Train the critic network, NN_C, at time t, holding the action parameters fixed.

[Figure: critic-network training loop. Blocks: NN_A with inputs x_a(t), a(t); aircraft model; transition matrices and state prediction; utility function derivatives; old critic NN_C(old); target generation for the NN_C target cost gradient.]
