
Aequationes Mathematicae 17 (1978) 1-18, University of Waterloo
Birkhäuser Verlag, Basel

Expository papers

Functional equations in dynamic programming


RICHARD BELLMAN and E. STANLEY LEE

1. Introduction
The basic form of the functional equation of dynamic programming is

f(p) = max_q [H(p, q, f(T(p, q)))],  (1)

where p and q represent the state and decision vectors, respectively, T represents the transformation of the process, and f(p) represents the optimal return function with initial state p. This functional equation can be studied in several ways: with respect to the type of processes giving rise to (1), with respect to the precise form of (1), or with respect to the computational aspects of (1). In this survey article, the equation will be treated according to the different types of processes.

In addition to the optimization problems of dynamic programming shown in (1), the dynamic programming concept can also be used to solve various types of boundary value problems arising in engineering and the physical sciences. The dynamic programming concept without optimization is known as invariant imbedding. The resulting functional equation of invariant imbedding is very similar to (1), except for the absence of the maximization operation.

Dynamic programming involves a completely different approach to formulating a problem: instead of considering only a single problem with a fixed duration, the dynamic programming approach is to consider a family of problems, with the duration of the process ranging from zero to the duration of the original problem. In order to treat these processes of different durations, the corresponding initial conditions must also be calculated and interpolated carefully. The advantages and drawbacks of dynamic programming are precisely due to these differences. For a more detailed discussion of these computational problems, see references [1]-[3] and [7]-[11].

Some of the most noticeable advantages of dynamic programming are its adaptive nature and its computational flexibility. By the use of the functional equation of dynamic programming, we can ignore whatever happened in the past and consider only the present and future, based on the current state of the problem. Furthermore, because of this computational flexibility, the functional equation technique can be applied in different ways, in combination with the classical techniques or separately, with different objectives in mind. Also, many of the difficulties met in optimization by the classical techniques, such as relative extrema, inequality constraints, tabulated or discrete functions, nondifferentiable functions and so on, can be avoided with appropriate computational techniques. Obviously, it is impossible to cover all the functional equations arising in dynamic programming or to list all the references. In this survey article, only some of the important processes will be discussed, with emphasis on recent developments.

AMS (1970) subject classification: Primary 39A15. Secondary 49C10, 49C15.

This research was supported by the National Science Foundation under Grant No. MPS-74-15650 and the Energy Research and Development Administration (ERDA) under Contract No. E(04-3)113, Project 19.

Received August 26, 1975 and, in revised form, April 20, 1976.

2. Allocation processes
To begin our discussion, let us consider processes arising in economics, operations research and engineering. The basic question is that of using resources of various types in efficient ways. This problem is ideally suited to the dynamic programming approach. To illustrate the approach, consider two types of resources, present in quantities x and y, and let x_i and y_i, respectively, be the quantities of these resources allocated to the i-th activity. Define the return function as

g_i(x_i, y_i) = the return from the i-th activity due to allocations x_i and y_i.

The resulting equation of dynamic programming can be represented by [1-3]

f_N(x, y) = max_{x_N, y_N} [ g_N(x_N, y_N) + f_{N-1}(x - x_N, y - y_N) ],  (2)

where f_N(x, y) is the optimal return function for an N-stage process with quantities x and y to allocate among the N activities, i = 1, 2, ..., N. Comparing with Equation (1-1), the quantities x and y to be allocated are the state variables, and the amounts allocated at stage N, x_N and y_N, are the decision variables.
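For concreteness, recurrence (2) can be computed directly when the quantities are integer-valued. The sketch below assumes illustrative concave return functions g_i (square-root returns of our own choosing, not taken from the paper) and checks the result against brute-force enumeration of all feasible allocations:

```python
from functools import lru_cache
import itertools
import math

# Illustrative stage returns g_i(x_i, y_i); any tabulated functions
# would do -- these particular ones are not from the paper.
def g(i, xi, yi):
    return math.sqrt((i + 1) * xi) + math.sqrt(yi)

def f(N, x, y):
    """Optimal N-stage return for integer quantities x, y, computed by
    recurrence (2): f_N(x,y) = max_{x_N,y_N} [g_N + f_{N-1}(x-x_N, y-y_N)]."""
    @lru_cache(maxsize=None)
    def rec(n, x, y):
        if n == 0:
            return 0.0          # f_0 = 0: nothing left to allocate
        return max(g(n - 1, xn, yn) + rec(n - 1, x - xn, y - yn)
                   for xn in range(x + 1) for yn in range(y + 1))
    return rec(N, x, y)

# Brute-force check: enumerate every feasible allocation directly.
def brute(N, x, y):
    best = 0.0
    for xs in itertools.product(range(x + 1), repeat=N):
        if sum(xs) > x:
            continue
        for ys in itertools.product(range(y + 1), repeat=N):
            if sum(ys) > y:
                continue
            best = max(best, sum(g(i, xs[i], ys[i]) for i in range(N)))
    return best

print(f(3, 4, 4), brute(3, 4, 4))   # the two values should coincide
```

The memoized table has one entry per (stage, x, y) triple, each requiring a maximization over all (x_N, y_N) pairs; the growth of this table with the number of state variables is exactly the dimensionality problem discussed next.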


Because of the imbedding nature of this functional equation, the computational effort increases exponentially with the number of state variables. By computational effort, we mean both the fast-memory requirement and the computation time. Even with the most recent computers, computer memory is limited; thus, we have the dimensionality problem. There are many ways to reduce this memory problem; some of these techniques are the use of Lagrange multipliers, polynomial approximation, spline approximation and quasilinearization [11]. Equation (2) represents only a simple example of allocation problems. Various complicated allocation problems arising in practice, such as reliability problems, transportation problems and so on, can also be treated by dynamic programming. The essential feature is to view the static allocation process as a dynamic process for dynamic programming treatment.

3. Optimal routing, inventory and scheduling


Most of the routing, inventory and scheduling problems can be treated as dynamic problems and are thus naturally suited to the functional equation of dynamic programming. A simple smoothing process can be described as follows: A supply depot is required, at a preassigned set of times, to meet a set of known demands for services or supplies. If the demand is not met, a penalty is incurred. On the other hand, if the depot is overstocked, another type of penalty is levied. Because the cost changes with the level of services and supplies, and also because the set of demands fluctuates greatly over time, the problem is nontrivial and the stock level must be adjusted to minimize the total cost. Let r_1, r_2, ..., r_N be a preassigned sequence of demands, where r_k is the demand at the k-th stage. Let
x_k = the capability of the system at the k-th stage,  (1)

k = 1, 2, ..., N, where x_0 = c is a fixed initial level. In this example, let us assume that it is required that

x_k ≥ r_k,  k = 1, 2, ..., N.  (2)

In other words, we insist that the demand always be met. Let us then introduce two cost functions

φ_k(x_k - r_k) = the cost incurred at the k-th stage if x_k > r_k,  (3)

ψ_k(x_k - x_{k-1}) = the cost incurred at the k-th stage if x_k ≠ x_{k-1}.


This latter function measures the cost involved in changing the supply or service level. The total cost incurred due to a choice of levels x_1, x_2, ..., x_N is given by

C(x_1, x_2, ..., x_N) = Σ_{k=1}^{N} [ φ_k(x_k - r_k) + ψ_k(x_k - x_{k-1}) ].  (4)

Our objective is to choose the x_k, k = 1, 2, ..., N, subject to the condition x_k ≥ r_k, so as to minimize this function. In order to treat this minimization problem by means of functional equation techniques, we imbed it within the family of problems requiring the minimization of the function

C_R = Σ_{k=R}^{N} [ φ_k(x_k - r_k) + ψ_k(x_k - x_{k-1}) ],  (5)

over the region defined by x_k ≥ r_k, k = R, R+1, ..., N, with x_{R-1} = c, for R = 1, 2, ..., N. Let us define

f_R(c) = min_{x_k} C_R,  R = 1, 2, ..., N,  (6)

where the minimum is taken over the x_k-region defined above. Then

f_N(c) = min_{x_N ≥ r_N} [ φ_N(x_N - r_N) + ψ_N(x_N - c) ],  (7)

a readily determined function. The usual argument yields the recurrence relation [1-3]

f_R(c) = min_{x_R ≥ r_R} [ φ_R(x_R - r_R) + ψ_R(x_R - c) + f_{R+1}(x_R) ],  (8)

for R = 1, 2, ..., N-1. We thus have a simple algorithm for obtaining the computational solution of the optimization problem. Equation (8) is, again, a particular form of Equation (1-1), with c as the state variable and x_R as the decision variable.

Optimal replacement, inventory and scheduling problems can be treated in essentially the same way. Consider the following simple inventory problem: We assume that orders for further supplies are made at each of a finite set of times,


and immediately fulfilled. After the order has been made and filled, there is a demand made for the item. This demand is satisfied as far as possible, with any excess of demand over supply leading to a penalty cost. We suppose that the following functions are known:

(a) φ(s) ds = the probability that the demand will lie between s and s + ds.
(b) k(z) = the cost of ordering z items to increase the stock level.
(c) p(z) = the penalty cost incurred when the excess of demand over supply is z.

To simplify the situation, let us assume that these functions are independent of time. Our aim is to determine an ordering policy which minimizes the expected cost of carrying out an N-stage process. Let us then introduce the function

f_N(x) = the expected cost of an N-stage process, starting with a stock level x and using an optimal ordering policy.

Let us suppose that we order at the first stage a quantity y - x, to bring the level up to y. Then the expected cost is given by the function

k(y - x) + ∫_y^∞ p(s - y) φ(s) ds.  (9)

Hence,

f_1(x) = min_{y ≥ x} [ k(y - x) + ∫_y^∞ p(s - y) φ(s) ds ].  (10)

The usual argument yields the recurrence relation [1-3]

f_n(x) = min_{y ≥ x} [ k(y - x) + ∫_y^∞ p(s - y) φ(s) ds + f_{n-1}(0) ∫_y^∞ φ(s) ds + ∫_0^y f_{n-1}(y - s) φ(s) ds ],  n ≥ 2,  (11)

upon an enumeration of the various possibilities corresponding to the different cases of an excess of demand over supply, and of supply over demand.

As can be seen from the above examples, various forms of inventory, scheduling, smoothing and routing problems can be treated as dynamic problems, and dynamic programming is naturally suited for solving them.

4. Calculus of variations

Dynamic programming is ideally suited for treating problems in the calculus of variations. In fact, the functional equation approach of dynamic programming


yields the fundamental classical results of the calculus of variations and the Hamilton-Jacobi theory. Furthermore, because of the completely different concept underlying the functional equation approach, many of the shortcomings encountered in the calculus of variations, such as the nonlinear two-point boundary-value problem, inequality constraints and the solution of linear problems, can be overcome by the dynamic programming formulation. To illustrate the approach, consider the following calculus of variations problem: Find that function z(t) such that the function x(t), given by the differential equation

dx/dt = f(x, z)  (1)

and the initial condition

x(0) = c,  (2)

maximizes the integral

J(z) = ∫_0^{tf} h(x, z) dt.  (3)

The function x(t) represents the state of the system and is known as the state variable, and the function z(t) is the control or decision variable. If this problem is treated by the classical calculus of variations, the result will be a two-point boundary-value problem. Here it will be treated by the dynamic programming approach. To use the invariant imbedding concept, observe that when the maximum of (3) has been obtained, the integral is a function only of the initial condition c and the duration of the process tf. Thus, we wish to imbed the original problem, with particular values of c and tf, within a family of processes in which c and tf are parameters. Define

g(c, tf) = the maximum value of J, where the starting state of the process is c and the total duration is tf.  (4)

Thus

g(c, tf) = max_{z[0, tf]} J(z) = max_{z[0, tf]} ∫_0^{tf} h(x, z) dt.


The maximization is executed by choosing the proper values of z over the interval [0, tf]. The function g will be referred to as the optimum return, and J, which in general is not the optimum or maximum value, will be called the return or nominal return. The control variable z(t) is also known as a policy; the optimum value of z(t) is the optimal policy. Let us perturb the duration of the process and consider the original process, with duration from t = 0 to t = tf, and a neighboring process, with duration from t = Δ to t = tf. However, instead of relating these two processes as has been done in the previous sections, we shall employ a different approach, using the properties of g(c, tf) and the additive property of the integral. The original process can be regarded as composed of two different processes: the first process has a duration of t = 0 to t = Δ, and the second, or neighboring, process has a duration of t = Δ to t = tf. We may write

g(c, tf) = max_{z[0, tf]} ∫_0^{tf} h(x, z) dt
         = max_{z[0, Δ]} max_{z[Δ, tf]} [ ∫_0^Δ h(x, z) dt + ∫_Δ^{tf} h(x, z) dt ]
         = max_{z[0, Δ]} { ∫_0^Δ h(x, z) dt + max_{z[Δ, tf]} ∫_Δ^{tf} h(x, z) dt }.  (5)

The second term represents the maximum return from the second process. Obviously, the starting state for this second process is

c + ∫_0^Δ f(x, z) dt,  (6)

which is obtained from (1). From the definition of g and Eq. (4), we can see that the maximum return from the second process is

g( c + ∫_0^Δ f(x, z) dt, tf - Δ ) = max_{z[Δ, tf]} ∫_Δ^{tf} h(x, z) dt.  (7)

Substituting (7) into (5), we have

g(c, tf) = max_{z[0, Δ]} { ∫_0^Δ h(x, z) dt + g( c + ∫_0^Δ f(x, z) dt, tf - Δ ) }.  (8)


The terms under the integrals may be approximated by

∫_0^Δ h(x, z) dt ≈ h(c, z(0)) Δ,
∫_0^Δ f(x, z) dt ≈ f(c, z(0)) Δ,  (9)

with terms involving Δ² and higher orders of Δ omitted. Equation (8) now becomes

g(c, tf) = max_{z[0, Δ]} [ h(c, z(0)) Δ + g( c + f(c, z(0)) Δ, tf - Δ ) ].  (10)

Using Taylor's series, we obtain

g( c + f(c, z(0)) Δ, tf - Δ ) = g(c, tf) + f(c, z(0)) Δ ∂g(c, tf)/∂c - Δ ∂g(c, tf)/∂tf + o(Δ).  (11)

Equation (10) becomes

g(c, tf) = max_{z[0, Δ]} [ h(c, z(0)) Δ + g(c, tf) + f(c, z(0)) Δ ∂g(c, tf)/∂c - Δ ∂g(c, tf)/∂tf + o(Δ) ].  (12)

Since g(c, tf) is independent of the choice of z, we may take it outside of the maximization sign. Thus,

0 = max_{z[0, Δ]} [ h(c, z(0)) Δ + f(c, z(0)) Δ ∂g(c, tf)/∂c - Δ ∂g(c, tf)/∂tf + o(Δ) ].  (13)

Dividing by Δ and letting Δ → 0, equation (13) becomes

∂g(c, tf)/∂tf = max_y [ h(c, y) + f(c, y) ∂g(c, tf)/∂c ],  (14)


where y = z(0). The initial condition is

g(c, 0) = 0.  (15)

Equation (14) is the desired relationship. This equation should be compared with the functional equations of invariant imbedding, such as (7.13); note that expression (7.14) also refers to a process of zero duration. Although Eq. (14) has been obtained analytically, it can also be obtained, by purely verbal arguments, from a basic property of an optimal policy known as the principle of optimality [1]. This principle forms the cornerstone of dynamic programming and is possessed by most multistage decision processes. For simplicity, we have obtained the dynamic programming equations for a system with only one state variable; similar equations can be obtained for multi-dimensional problems.

Notice that the present approach avoids not only the two-point boundary-value difficulties, but also many other difficulties usually associated with the calculus-of-variations approach. This is due to the fact that the maximum in (14) can be obtained not only by means of calculus, but also by search techniques incorporated into the numerical solution schemes for (14). With a proper search technique, one can avoid the difficulties of handling inequality constraints and of deciding whether a true maximum has been obtained. In addition, the dynamic programming approach can also handle unusual functions such as nonanalytic functions. Problems in which the control variable appears linearly can also be treated by dynamic programming, if we remember that constraints must be present in order for a linear problem to have an optimum.

As a price for overcoming these difficulties, one encounters other forms of problems. Obviously, Eq. (14) cannot be solved easily if the dimension of c is large. Although Eq. (14) can be solved by using the difference equation before the limit is taken, the dimensionality difficulty remains for large dimensions of c. This dimensionality difficulty severely limits the number of state variables that can be handled by this approach. The dynamic programming technique has been used to treat various optimization problems; however, detailed discussions will not be given here. The reader can consult any of the references cited earlier for more details.
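As a numerical illustration of recurrence (10) and equation (14), consider the linear-quadratic example h(x, z) = -(x² + z²)/2, f(x, z) = z (our choice of example, not the paper's), for which (14) has the closed-form solution g(c, tf) = -tanh(tf) c²/2. A minimal sketch of the grid iteration, with linear interpolation in c:

```python
import math
import numpy as np

# Linear-quadratic illustration (our choice, not from the paper):
#   maximize J = ∫ -(x^2 + z^2)/2 dt subject to dx/dt = z, x(0) = c,
# for which the HJB equation (14) has the exact solution
#   g(c, tf) = -tanh(tf) * c**2 / 2.
dt = 0.01
steps = 100                       # total duration tf = 1.0
c = np.linspace(-4.0, 4.0, 801)   # grid of states c
ys = np.linspace(-4.0, 4.0, 161)  # candidate decisions y = z(0)
g = np.zeros_like(c)              # g(c, 0) = 0, condition (15)

for _ in range(steps):
    # Recurrence (10): g(c, t+dt) = max_y [h(c,y) dt + g(c + f(c,y) dt, t)],
    # evaluated for every grid point c and every candidate y at once.
    candidates = [-(c**2 + y**2) / 2 * dt + np.interp(c + y * dt, c, g)
                  for y in ys]
    g = np.max(candidates, axis=0)

numeric = np.interp(1.0, c, g)
exact = -math.tanh(1.0) / 2
print(numeric, exact)   # the two should agree to a few per cent
```

The maximization over y here is a pure grid search, which is exactly the point made above: no differentiability of h or f is required, and inequality constraints on the decision would simply restrict the candidate set ys.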

5. Markov decision processes

The functional equation of dynamic programming forms the backbone in the formulation and solution of Markov decision processes. Various monographs have


been written on this subject (see, for example, [5] and [6]), and it is obviously impossible in this review to cover the subject in any significant manner. Consider a sequential stochastic decision process which is governed by the transition matrix P(q) = (p_ij(q)), where

p_ij(q) = the probability that the system is in state j at time t + 1, given that it was in state i at time t, assuming that a decision policy q is used, i, j = 1, 2, ..., N.

The state of the system can be represented by the transformation equation

x_{t+1}(j) = Σ_{i=1}^{N} p_ij x_t(i),  j = 1, 2, ..., N,  (1)

with the initial condition

x_0(i) = c_i,  (2)

where

x_t(i) = the probability that the system is in state i at time t, i = 1, 2, ..., N.

We have assumed that the system at any particular time is in one of a finite number of states, i = 1, 2, ..., N. There is a change of state involved at each stage. Associated with this change of state is a return which is a function of the initial and terminal states and of the decision. Let

R(q) = (r_ij(q))  (3)

represent the return matrix. The problem is that of choosing the sequence of decisions which will maximize the expected return obtained from an n-stage process, given the initial state of the system. Using the functional equation technique, let us define the optimal return as

f_n(i) = the expected return obtained from an n-stage process, starting in state i and using an optimal policy.


The functional equation of dynamic programming is

f_n(i) = max_q Σ_{j=1}^{N} p_ij(q) ( r_ij(q) + f_{n-1}(j) ),  (4)

with i = 1, 2, ..., N, n = 1, 2, ..., and f_0(i) = 0.

One of the computational difficulties in solving the above functional equation is the necessity of storing the matrices p_ij(q) and r_ij(q). If N exceeds a value such as 1,000, the storage requirements become excessive. However, the asymptotic behavior of the functions x_t(i) as t → ∞ can frequently be used to avoid this difficulty; even if this asymptotic behavior does not exist, approximations can still be based on this fictitious infinite-process behavior. Many processes occurring in daily life can be treated as Markov decision processes: for example, the optimal choice of a daily commuting route from home to office, the optimal operation of a taxicab, scheduling and replacement problems, multi-stage games, and so on.
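Equation (4) can be sketched for a two-state, two-action process (the transition and return matrices below are illustrative, not from the paper), with a brute-force check over all sequences of per-state decision rules:

```python
import itertools
import numpy as np

# Toy 2-state, 2-action Markov decision process (numbers invented
# for illustration).
P = np.array([[[0.8, 0.2], [0.2, 0.8]],   # P[q][i][j] = p_ij(q)
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[[5.0, 1.0], [0.0, 2.0]],   # R[q][i][j] = r_ij(q)
              [[3.0, 4.0], [1.0, 6.0]]])

def value_iteration(n_stages):
    """Equation (4): f_n(i) = max_q sum_j p_ij(q) (r_ij(q) + f_{n-1}(j))."""
    f = np.zeros(2)                        # f_0(i) = 0
    for _ in range(n_stages):
        # Q[q, i] = expected one-stage return plus continuation value
        Q = np.einsum('qij,qij->qi', P, R) + P @ f
        f = Q.max(axis=0)
    return f

# Brute-force check: enumerate every sequence of per-state decision rules.
def brute(n_stages):
    best = np.full(2, -np.inf)
    rules = list(itertools.product([0, 1], repeat=2))   # state -> action
    for seq in itertools.product(rules, repeat=n_stages):
        f = np.zeros(2)
        for rule in seq:
            f = np.array([P[rule[i], i] @ (R[rule[i], i] + f)
                          for i in range(2)])
        best = np.maximum(best, f)
    return best

assert np.allclose(value_iteration(3), brute(3))
print(value_iteration(3))
```

The brute-force loop grows as (A^S)^n in the number of actions A, states S and stages n, while the recursion (4) is linear in n; this is the computational content of the functional equation.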

6. Fuzzy systems
The functional equation of dynamic programming is ideally suited to the solution of fuzzy problems. Fuzziness should not be confused with randomness. Much of the decision-making in the real world takes place in a fuzzy environment, in which the goals, the constraints and the consequences of possible actions are not known precisely. The theory of fuzzy sets is discussed in detail in the literature (see, for example, [14]-[18]); only the dynamic programming application of fuzzy sets will be discussed here. We shall begin our discussion with multistage decision-making in a fuzzy environment. For simplicity we shall assume that the system under control, A, is a time-invariant finite-state deterministic system in which the state x_t at time t, t = 0, 1, 2, ..., ranges over a finite set X = {σ_1, ..., σ_m}, and the input u_t ranges over a finite set U = {α_1, ..., α_n}. The temporal evolution of A is described by the state equation

x_{t+1} = f(x_t, u_t),  t = 0, 1, 2, ...,  (1)

in which f is a given function from X × U to X. Thus, f(x_t, u_t) represents the successor state of x_t for input u_t. Note that if f is a random function, then A is a stochastic system whose state at time t + 1 is a probability distribution over X,


P(x_{t+1} | x_t, u_t), conditioned on x_t and u_t. Analogously, if f is a fuzzy function, then A is a fuzzy system whose state at time t + 1 is a fuzzy set conditioned on x_t and u_t, characterized by a membership function of the form μ(x_{t+1} | x_t, u_t). Since we will not be concerned with such systems in the sequel, it will be understood that f is nonfuzzy unless explicitly stated to the contrary. We assume that at each time t the input is subjected to a fuzzy constraint C^t, which is a fuzzy set in U characterized by a membership function μ_{C^t}(u_t). Furthermore, we assume that the goal is a fuzzy set G^N in X, characterized by a membership function μ_{G^N}(x_N), where N is the time of termination of the process. These assumptions are common to most of the problems considered in the sequel.

Let us define the problem more precisely. The system is assumed to be characterized by (1), with f a given nonrandom function. The termination time N is assumed to be fixed and specified. The initial state x_0 is assumed to be given. The problem is to find a maximizing decision. The decision, viewed as a decomposable fuzzy set in U × U × ... × U, may be expressed at once as

D = C^0 ∩ C^1 ∩ ... ∩ C^{N-1} ∩ Ḡ^N,  (2)

where Ḡ^N is the fuzzy set in U × U × ... × U which induces G^N in X. More explicitly, in terms of membership functions, we have

μ_D(u_0, ..., u_{N-1}) = μ_{C^0}(u_0) ∧ ... ∧ μ_{C^{N-1}}(u_{N-1}) ∧ μ_{G^N}(x_N),  (3)

where x_N is expressible as a function of x_0 and u_0, ..., u_{N-1} through the iteration of (1). Our problem, then, is to find a sequence of inputs u_0, ..., u_{N-1} which maximizes μ_D as given by (3). As is usually the case in multi-stage processes, it is expedient to express the solution in the form
u_t = π_t(x_t),  t = 0, 1, 2, ..., N-1,

where π_t is a policy function. We can then employ dynamic programming to obtain both the π_t and a maximizing decision u_0^M, ..., u_{N-1}^M. More specifically, using (2) and (1), we can write

μ_D(u_0^M, ..., u_{N-1}^M) = max_{u_0, ..., u_{N-2}} max_{u_{N-1}} ( μ_{C^0}(u_0) ∧ ... ∧ μ_{C^{N-2}}(u_{N-2}) ∧ μ_{C^{N-1}}(u_{N-1}) ∧ μ_{G^N}(f(x_{N-1}, u_{N-1})) ).  (4)


Now, if γ is a constant and g is any function of u_{N-1}, we have the identity

max_{u_{N-1}} ( γ ∧ g(u_{N-1}) ) = γ ∧ max_{u_{N-1}} g(u_{N-1}).

Consequently, (4) may be rewritten as

μ_D(u_0^M, ..., u_{N-1}^M) = max_{u_0, ..., u_{N-2}} ( μ_{C^0}(u_0) ∧ ... ∧ μ_{C^{N-2}}(u_{N-2}) ∧ μ_{G^{N-1}}(x_{N-1}) ),  (5)

where

μ_{G^{N-1}}(x_{N-1}) = max_{u_{N-1}} ( μ_{C^{N-1}}(u_{N-1}) ∧ μ_{G^N}(f(x_{N-1}, u_{N-1})) )  (6)

may be regarded as the membership function of a fuzzy goal at time t = N - 1 which is induced by the given goal G^N at time t = N. On repeating this backward iteration, which is a simple instance of dynamic programming, we obtain the set of recurrence equations

μ_{G^{N-ν}}(x_{N-ν}) = max_{u_{N-ν}} ( μ_{C^{N-ν}}(u_{N-ν}) ∧ μ_{G^{N-ν+1}}(x_{N-ν+1}) ),
x_{N-ν+1} = f(x_{N-ν}, u_{N-ν}),  ν = 1, ..., N,  (7)

which yield the solution to the problem. Thus, a maximizing decision u_0^M, ..., u_{N-1}^M is given by the successive maximizing values of u_{N-ν} in (7), with u_{N-ν}^M defined as a function of x_{N-ν}, ν = 1, ..., N.

The above is only one example of the application of functional equations to multistage decision-making in a fuzzy environment; the amount of work required for further investigation is considerable. Some of the facets of the functional theory of decision-making in a fuzzy environment that require further investigation are: the execution of fuzzy decisions; the way in which the goals and the constraints must be combined when they are of unequal importance or are interdependent; the control of fuzzy systems and the implementation of fuzzy algorithms; the notion of fuzzy feedback and its effect on decision-making; the control of systems in which the fuzzy environment is partially defined by exemplification; and decision-making in mixed environments, that is, in environments in which the imprecision stems from both randomness and fuzziness.
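The backward recursion (7) can be sketched for a small finite system; the transition table and all membership grades below are illustrative values of our own choosing:

```python
import itertools

# Small fuzzy decision problem (all numbers are illustrative).
X = [0, 1, 2]          # states
U = [0, 1]             # inputs
N = 3                  # termination time

# Deterministic transition table, x_{t+1} = f(x_t, u_t) -- equation (1)
f = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 0, (2, 0): 2, (2, 1): 1}

mu_C = [{0: 1.0, 1: 0.6}, {0: 0.8, 1: 1.0}, {0: 0.9, 1: 0.7}]  # constraints C^t
mu_G = {0: 0.2, 1: 0.5, 2: 1.0}                                # goal G^N

def solve():
    """Recursion (7): mu_{G^{t}}(x) = max_u ( mu_{C^t}(u) ^ mu_{G^{t+1}}(f(x,u)) )."""
    goal = dict(mu_G)
    policies = []
    for t in range(N - 1, -1, -1):
        policy, new_goal = {}, {}
        for x in X:
            best_u = max(U, key=lambda u: min(mu_C[t][u], goal[f[x, u]]))
            policy[x] = best_u
            new_goal[x] = min(mu_C[t][best_u], goal[f[x, best_u]])
        policies.insert(0, policy)
        goal = new_goal
    return goal, policies   # goal[x0] = grade of the maximizing decision from x0

goal, policies = solve()

# Brute force over all input sequences, evaluating equation (3) directly.
def brute(x0):
    best = 0.0
    for us in itertools.product(U, repeat=N):
        x, grade = x0, 1.0
        for t, u in enumerate(us):
            grade = min(grade, mu_C[t][u])
            x = f[x, u]
        best = max(best, min(grade, mu_G[x]))
    return best

assert all(abs(goal[x0] - brute(x0)) < 1e-12 for x0 in X)
print(goal)
```

The interchange of min and max that justifies the recursion is exactly the identity stated before equation (5).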


7. Invariant imbedding and nonlinear boundary value problems


In many applications in engineering and the physical sciences there occur two-point or multipoint nonlinear boundary value problems. The functional equation approach can be used effectively to overcome the stability problem in solving these nonlinear boundary value problems. To illustrate the approach, consider the nonlinear two-point boundary value problem

dx/dt = f(x, y, t),
dy/dt = g(x, y, t),  (1)

with boundary conditions

x(0) = c,  y(tf) = 0,  (2)

with 0 ≤ t ≤ tf. In order to avoid the various computational difficulties in solving the above boundary-value problem, we shall convert it into an initial-value problem. In other words, the missing initial condition y(0) will be obtained by using the invariant imbedding concept. To do this, consider the problem with the more general boundary conditions

x(a) = c,  y(tf) = 0,  (3)

where a ≤ t ≤ tf and a is the starting value of the independent variable t. However, it should be kept in mind that a also controls the duration of the process. If a assumes different values from zero to tf, say a = 0, Δ, 2Δ, ..., then there is a family of problems. Each member of this family has a different starting value a and is represented by Eqs. (1) and (3). Let us consider obtaining the missing initial condition y(a) for this family of problems. The idea is that neighboring processes are related to each other; it may be possible to obtain the missing initial condition y(0) for the original problem by examining the relationships between neighboring processes. Notice that the missing initial condition y(a) for this family of processes is not only a function of the starting point a of the process, but also a function of the


starting state or the given initial condition c. Define

r(c, a) = the missing initial condition for the system represented by (1) and (3), where the process begins at t = a with x(a) = c.

Obviously,

y(a) = r(c, a).  (4)

Notice that x(a) and y(a) represent the starting state of the process. We shall consider r as the dependent variable, and c and a as the independent variables; an expression for r in terms of c and a will be obtained. Considering the neighboring process with starting value a + Δ, the missing initial condition of this neighboring process can be related to y(a) by the use of Taylor's series:

y(a + Δ) = y(a) + y'(a) Δ + o(Δ),  (5)

where o(Δ) represents higher-order terms, that is, terms involving powers of Δ higher than the first. At the starting value a, Eq. (1) becomes

x'(a) = f(x(a), y(a), a) = f(c, r(c, a), a),  (6a)
y'(a) = g(x(a), y(a), a) = g(c, r(c, a), a).  (6b)

Substituting (6b) and (4) into (5), we obtain

y(a + Δ) = r(c, a) + g(c, r(c, a), a) Δ + o(Δ).  (7)

On the other hand, the following expression can be obtained for this missing initial condition y(a + Δ) from Eq. (4):

y(a + Δ) = r(x(a + Δ), a + Δ).  (8)

Again, the expression x(a + Δ) can be related to its neighboring process x(a) = c by Taylor's series:

x(a + Δ) = x(a) + x'(a) Δ + o(Δ) = c + f(c, r(c, a), a) Δ + o(Δ).  (9)

Thus, Eq. (8) becomes

y(a + Δ) = r( c + f(c, r(c, a), a) Δ + o(Δ), a + Δ ).  (10)


Equating Eqs. (7) and (10), we obtain the desired relation

r(c, a) + g(c, r(c, a), a) Δ = r( c + f(c, r(c, a), a) Δ, a + Δ ),  (11)

omitting the terms involving powers of Δ higher than the first. The difference equation (11) can be used directly to obtain the missing initial conditions r(c, a). Alternatively, a partial differential equation can be obtained from (11). Expanding the right-hand side of (11) by Taylor's series, we obtain

r( c + f(c, r(c, a), a) Δ, a + Δ ) = r(c, a) + f(c, r(c, a), a) Δ ∂r(c, a)/∂c + Δ ∂r(c, a)/∂a + o(Δ).  (12)

In the limit as Δ tends to zero, the following first-order quasilinear partial differential equation is obtained from (11) and (12):

f(c, r(c, a), a) ∂r(c, a)/∂c + ∂r(c, a)/∂a = g(c, r(c, a), a).  (13)

From (3) and (4), it can be seen that

r(c, tf) = 0.  (14)

Thus, the missing initial conditions r(c, a) for the family of processes, with starting values of the independent variable a ranging from zero to tf, can be obtained by solving the system (13) and (14). The above approach can be used effectively to solve unstable boundary value problems. Furthermore, this approach, combined with other techniques such as quasilinearization, splines and nonlinear summability, can be used to solve problems of reasonably large dimension.
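The difference equation (11) can be marched directly from the known condition (14) at a = tf down to a = 0. The sketch below uses the linear test problem f(x, y, t) = y, g(x, y, t) = -x (our choice of example, not the paper's), for which the exact answer r(c, a) = c·tan(tf - a) is available for comparison:

```python
import math
import numpy as np

# Linear test problem (our choice): dx/dt = y, dy/dt = -x,
# x(a) = c, y(tf) = 0.  Here f(x, y, t) = y and g(x, y, t) = -x, and
# the exact missing initial condition is r(c, a) = c * tan(tf - a).
tf = 1.0
da = 0.001
cg = np.linspace(-6.0, 6.0, 1201)   # grid of initial states c
r = np.zeros_like(cg)               # r(c, tf) = 0, condition (14)

# March a downward from tf to 0 using difference equation (11):
#   r(c, a) = r(c + f(c, r, a)*da, a + da) - g(c, r, a)*da,
# with r on the right-hand side approximated by the previous level.
nsteps = int(round(tf / da))
for _ in range(nsteps):
    fval = r            # f(c, r, a) = r
    gval = -cg          # g(c, r, a) = -c
    r = np.interp(cg + fval * da, cg, r) - gval * da

numeric = np.interp(1.0, cg, r)
print(numeric, math.tan(tf))   # should agree closely
```

Each sweep is an initial-value computation, which is why this route sidesteps the instability of shooting methods on the original two-point problem; the c-grid must be wide enough that the shifted queries c + f·da stay in the region of interest.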

8. Dynamic programming and the numerical solution of partial differential equations


The dynamic programming functional equations can also be used for solving complicated boundary value problems in partial differential equations. This approach forms a powerful technique for solving linear elliptic and parabolic


partial differential equations over regular and irregular regions. Nonlinear equations can also be solved by the combined use of quasilinearization, splines, nonlinear summability and dynamic programming. The interested reader can consult the book by Bellman and Angel [10].

9. Discussion
In this short survey article, we have tried to illustrate the various applications of the functional equations of dynamic programming, ranging from simple optimization problems to calculus-of-variations problems, to stochastic problems and, finally, to the numerical solution of differential equations. No attempt was made to cover all the applications of the functional equations; indeed, it would be impossible even to cover only the important ones, and new applications are still being discovered. The reader can refer to the books listed at the end of this paper. Some of the other applications which are not discussed are identification and nonlinear filtering, communication theory, multistage games and simulation, inverse problems, feedback control, transportation problems, and so on. Our purpose has been to illustrate the typical applications of the functional equation and to select a representative process for each. This seems to be the only way to show the versatility of dynamic programming and, at the same time, to obtain a reasonably short survey article.

REFERENCES

[1] BELLMAN, R., Dynamic programming. Princeton University Press, Princeton, N.J., 1957.
[2] BELLMAN, R., Adaptive control processes: A guided tour. Princeton University Press, Princeton, N.J., 1961.
[3] BELLMAN, R. and DREYFUS, S., Applied dynamic programming. Princeton University Press, Princeton, N.J., 1962.
[4] ARROW, K. J., KARLIN, S. and SCARF, H., Studies in the mathematical theory of inventory and production. Stanford University Press, Stanford, 1958.
[5] MINE, H. and OSAKI, S., Markovian decision processes. American Elsevier, New York, 1970.
[6] HOWARD, R. A., Dynamic programming and Markov processes. Wiley, New York, 1960.
[7] BELLMAN, R., Introduction to the mathematical theory of control processes, Volumes I and II. Academic Press, New York, 1967, 1971.
[8] DREYFUS, S., Dynamic programming and the calculus of variations. Academic Press, New York, 1965.
[9] ARIS, R., Discrete dynamic programming. Blaisdell, New York, 1964.
[10] BELLMAN, R. and ANGEL, E., Dynamic programming and partial differential equations. Academic Press, New York, 1972.
[11] LEE, E. S., Quasilinearization and invariant imbedding. Academic Press, New York, 1968.


[12] BELLMAN, R. and WING, G. M., An introduction to invariant imbedding. Wiley, New York, 1975.
[13] WING, G. M., An introduction to transport theory. Wiley, New York, 1962.
[14] BELLMAN, R. and ZADEH, L. A., Decision-making in a fuzzy environment. Management Science 17 (1970), B-141-B-164.
[15] BELLMAN, R., A note on cluster analysis and dynamic programming. Mathematical Biosciences 19 (1973), 311-312.
[16] CHANG, S. S. L., Fuzzy dynamic programming and the decision-making process. In Proceedings of the 3rd Princeton Conference on Information Sciences, 1969, pp. 200-203.
[17] ZADEH, L. A., Fuzzy sets. Information and Control 8 (1965), 338-353.
[18] ZADEH, L. A., Fuzzy algorithms. Information and Control 12 (1968), 94-102.

University of Southern California, Los Angeles, California 90007, U.S.A.

and

Kansas State University, Manhattan, Kansas 66502, U.S.A.
