DYNAMIC PROGRAMMING
Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions.
It is a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of "the" dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.
João Miguel da Costa Sousa / Alexandra Moutinho
Prototype example: the stagecoach problem
A fortune seeker wants to go from Missouri (A) to California (J) in the mid-19th century.
The journey has 4 stages.
The cost is that of the life insurance policy for a specific route; the lowest cost is equivalent to the safest trip.
Costs
The cost of going from state i to state j is denoted c_ij.
Solving the problem
Problem: which route minimizes the total cost of the policy?
Note that the greedy approach does not work.
  The greedy solution A → B → F → I → J has a total cost of 13.
  However, e.g., A → D → F is cheaper than A → B → F.
Other possibility: trial and error. This is too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It starts from the last stage of the problem and enlarges it one stage at a time.
Formulation
Decision variables x_n (n = 1, 2, 3, 4) are the immediate destination at stage n.
f_n(s, x_n) is the total cost of the best overall policy for the remaining stages.
Here the current state is s, ready to start stage n, and x_n is selected as the immediate destination.
x_n* minimizes f_n(s, x_n), and f_n*(s) is the minimum value of f_n(s, x_n):
$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$
where
$$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$
Dynamic programming successively finds f_4*(s), f_3*(s), f_2*(s) and finally f_1*(A).
Solution procedure
When n = 4, the route is determined by the current state s (H or I) and the final destination J.
Since f_4*(s) = f_4(s, J) = c_{s,J}, the solution for n = 4 is:
  s    f_4*(s)   x_4*
  H    3         J
  I    4         J
Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs c_{F,H} = 6 or c_{F,I} = 3.
Choosing H, the minimum additional cost is f_4*(H) = 3, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7.
This is smaller, so I is the optimal choice for state F.
Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3, where f_3(s, x_3) = c_{s x_3} + f_4*(x_3):
  s    x_3 = H   x_3 = I   f_3*(s)   x_3*
  E    4         8         4         H
  F    9         7         7         I
  G    6         7         6         H
Stage n = 2
In this case, f_2(s, x_2) = c_{s x_2} + f_3*(x_2).
Starting with node C as an example, and making similar calculations for the two other possible states, s = B and s = D, results in the table for n = 2:
  s    x_2 = E   x_2 = F   x_2 = G   f_2*(s)   x_2*
  B    11        11        12        11        E or F
  C    7         9         10        7         E
  D    8         8         11        8         E or F
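Stage n = 1
There is just one possible starting state: A.
  x_1 = B: f_1(A, B) = c_{A,B} + f_2*(B) = 2 + 11 = 13.
  x_1 = C: f_1(A, C) = c_{A,C} + f_2*(C) = 4 + 7 = 11 (optimal).
  x_1 = D: f_1(A, D) = c_{A,D} + f_2*(D) = 3 + 8 = 11 (optimal).
Results in the table, where f_1(s, x_1) = c_{s x_1} + f_2*(x_1):
  s    x_1 = B   x_1 = C   x_1 = D   f_1*(s)   x_1*
  A    13        11        11        11        C or D
Optimal solution
There are three optimal solutions, all with f_1*(A) = 11; each is obtained by following the optimal decisions x_n* from A through the stage tables.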
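The backward recursion of the stagecoach example can be sketched in a few lines of Python. This is only an illustrative sketch: the link costs in `COSTS` are the usual textbook values for this example and are an assumption here, since the cost table itself is not reproduced in the text (they do agree with the costs quoted in the stage calculations, e.g. c_{A,B} = 2 and c_{F,H} = 6). The names `solve_stagecoach`, `optimal_routes` and `greedy_route` are hypothetical helpers.

```python
# Link costs c[i][j]. ASSUMED textbook values; the slide's own cost table is not in the text.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# Possible states at the beginning of stages 1..4.
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I"]]


def solve_stagecoach(costs, stages, destination="J"):
    """Backward recursion f_n*(s) = min over x_n of c[s][x_n] + f_{n+1}*(x_n)."""
    f_star = {destination: 0}  # no cost remains once the destination is reached
    best = {}                  # best[s] = all optimal immediate destinations from s
    for states in reversed(stages):
        for s in states:
            totals = {x: c + f_star[x] for x, c in costs[s].items()}
            f_star[s] = min(totals.values())
            best[s] = [x for x, t in totals.items() if t == f_star[s]]
    return f_star, best


def optimal_routes(best, start="A", destination="J"):
    """List every route that follows optimal decisions from start to destination."""
    if start == destination:
        return [[destination]]
    return [[start] + tail
            for x in best[start]
            for tail in optimal_routes(best, x, destination)]


def greedy_route(costs, start="A", destination="J"):
    """Always take the cheapest next link (the approach that does not work)."""
    route, total, s = [start], 0, start
    while s != destination:
        x = min(costs[s], key=costs[s].get)
        total += costs[s][x]
        route.append(x)
        s = x
    return route, total


f_star, best = solve_stagecoach(COSTS, STAGES)
print("f1*(A) =", f_star["A"])
print("optimal routes:", optimal_routes(best))
print("greedy route:", greedy_route(COSTS))
```

Under these assumed costs the recursion gives f_1*(A) = 11 with three optimal routes, while the greedy route costs 13, matching the figures quoted above.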
Characteristics of DP
Dynamic programming problems require making a sequence of interrelated decisions.
1. The problem can be divided into stages, with a policy decision required at each stage.
   Example: 4 stages and a life insurance policy to choose.
2. Each stage has a number of states associated with the beginning of that stage.
   States are the possible conditions in which the system might be.
   Example: states are the possible territories where the fortune seeker could be located.
3. The policy decision transforms the current state into a state associated with the beginning of the next stage.
   Example: the fortune seeker's decision led him from his current state to the next state on his journey.
   DP problems can be interpreted in terms of networks: each node corresponds to a state.
   The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
   In most cases, the objective corresponds to finding the shortest or the longest path.
4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.
   Example: the solution procedure constructed a table for each stage n that prescribed the optimal decision x_n* for each possible state s.
   In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision).
   This is useful for sensitivity analysis.
5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
   The optimal immediate decision depends only on the current state and not on how it was reached: this is the principle of optimality for DP.
   Example: at any state, the optimal insurance policy is independent of how the fortune seeker got there.
   Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.
6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship that identifies the optimal policy for stage n, given the optimal policy for stage n + 1, is available.
   Example: the recursive relationship was
   $$f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}$$
   The recursive relationship differs somewhat among dynamic programming problems.
   Notation:
     N = number of stages.
     n = label for the current stage (n = 1, 2, ..., N).
     s_n = current state for stage n.
     x_n = decision variable for stage n.
     x_n* = optimal value of x_n (given s_n).
     f_n(s_n, x_n) = contribution of stages n, n + 1, ..., N to the objective function if the system starts in state s_n at stage n, the immediate decision is x_n, and optimal decisions are made thereafter.
     f_n*(s_n) = f_n(s_n, x_n*)
   Recursive relationship:
   $$f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \} \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}$$
   where f_n(s_n, x_n) is written in terms of s_n, x_n, f_{n+1}*(s_{n+1}), and probably some measure of the immediate contribution of x_n to the objective function.
8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
   It stops when the optimal policy starting at the initial stage is found; the optimal policy for the entire problem is then known.
   Example: the tables for the stages show this procedure.
   For DP problems, a table such as the following would be obtained for each stage (n = N, N - 1, ..., 1):
     s_n    f_n(s_n, x_n) for each x_n    f_n*(s_n)    x_n*
Deterministic dynamic programming
Deterministic problems: the state at the next stage is completely determined by the state and the policy decision at the current stage.
Example: distributing medical teams
The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.
Thousands of additional person-years of life:
  Medical teams   Country 1   Country 2   Country 3
  0               0           0           0
  1               45          20          50
  2               70          45          70
  3               90          75          80
  4               105         110         100
  5               120         150         130
Formulation of the problem
The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).
x_n is the number of teams to allocate to stage n.
States to be considered
What are the states? What changes from one stage to another?
s_n = number of medical teams still available for the remaining countries (n, ..., 3).
Thus: s_1 = 5, s_2 = 5 - x_1 = s_1 - x_1, s_3 = s_2 - x_2.
Overall problem
p_i(x_i): measure of performance from allocating x_i medical teams to country i.
$$\text{Maximize} \sum_{i=1}^{3} p_i(x_i), \quad \text{subject to} \quad \sum_{i=1}^{3} x_i = 5,$$
and x_i are nonnegative integers.
Policy
Recursive relationship relating the functions:
$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \}, \quad n = 1, 2$$
$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)$$
Solution procedure, stage n = 3
For the last stage, n = 3, the values of p_3(x_3) are the last column of the table. Here, x_3* = s_3 and f_3*(s_3) = p_3(s_3).
n = 3:
  s_3   f_3*(s_3)   x_3*
  0     0           0
  1     50          1
  2     70          2
  3     80          3
  4     100         4
  5     130         5
Stage n = 2
Here f_2(s_2, x_2) = p_2(x_2) + f_3*(s_2 - x_2).
Example for state s_2 = 2: x_2 = 0 gives 0 + 70 = 70, x_2 = 1 gives 20 + 50 = 70, and x_2 = 2 gives 45 + 0 = 45, so f_2*(2) = 70 with x_2* = 0 or 1.
Similar calculations can be made for the other values of s_2:
n = 2:
  s_2   x_2 = 0   x_2 = 1   x_2 = 2   x_2 = 3   x_2 = 4   x_2 = 5   f_2*(s_2)   x_2*
  0     0                                                             0           0
  1     50        20                                                  50          0
  2     70        70        45                                        70          0 or 1
  3     80        90        95        75                              95          2
  4     100       100       115       125       110                   125         3
  5     130       120       125       145       160       150         160         4
Stage n = 1
The only state is the starting state s_1 = 5, with f_1(s_1, x_1) = p_1(x_1) + f_2*(s_1 - x_1):
n = 1:
  s_1   x_1 = 0   x_1 = 1   x_1 = 2   x_1 = 3   x_1 = 4   x_1 = 5   f_1*(s_1)   x_1*
  5     160       170       165       160       155       120       170         1
Optimal policy decision
x_1* = 1 leaves s_2 = 4, so x_2* = 3; this leaves s_3 = 1, so x_3* = 1. The optimal allocation is 1, 3 and 1 teams to countries 1, 2 and 3, for a total of 170 thousand additional person-years of life.
Distribution of effort problem
One kind of resource is allocated to a number of activities. Objective: determine how to distribute the effort (resource) among the activities most effectively.
DP involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated by DP. Only additivity (or its analogue for a product of terms) is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (why?).
Formulation of distribution of effort
Stage n = activity n (n = 1, 2, ..., N).
x_n = amount of resource allocated to activity n.
State s_n = amount of resource still available for allocation to the remaining activities (n, ..., N).
When the system starts at stage n in state s_n, choice x_n results in the next state at stage n + 1 being s_{n+1} = s_n - x_n:
  Stage n, state s_n  --(x_n)-->  Stage n + 1, state s_n - x_n
Example: distributing scientists to research teams
3 teams are solving an engineering problem to safely fly people to Mars.
2 extra scientists are available to reduce the probability of failure.
Probability of failure:
  New scientists   Team 1   Team 2   Team 3
  0                0.40     0.60     0.80
  1                0.20     0.40     0.50
  2                0.15     0.20     0.30
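The same distribution-of-effort recursion works when stage contributions are multiplied rather than added. The sketch below assumes that the objective is to minimize the probability that all three teams fail, i.e., the product of the three failure probabilities; that objective, the function name `assign` and the first-minimizer tie rule are assumptions for illustration, not statements from the slides.

```python
# fail[n][x] = probability that team n fails when it receives x new scientists.
FAIL = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}


def assign(fail, total):
    """Backward recursion f_n*(s) = min over x of fail_n(x) * f_{n+1}*(s - x)."""
    n_stages = len(fail)
    f_next = [1.0] * (total + 1)  # empty product beyond the last team
    policy = {}
    for n in range(n_stages, 0, -1):
        f_cur, x_best = [], []
        for s in range(total + 1):
            values = [fail[n][x] * f_next[s - x] for x in range(s + 1)]
            f_cur.append(min(values))
            x_best.append(values.index(f_cur[-1]))  # first minimizer; ties are possible
        policy[n] = x_best
        f_next = f_cur
    # Forward pass: read off the assignment from the initial state.
    s, plan = total, []
    for n in range(1, n_stages + 1):
        x = policy[n][s]
        plan.append(x)
        s -= x
    return f_next[total], plan


prob, plan = assign(FAIL, 2)
print(round(prob, 4), plan)
```

Only the combining operation (product instead of sum) and the boundary value (1 instead of 0) change relative to the additive case.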
Continuous dynamic programming
Previous examples had a discrete state variable s_n at each stage.
They have all been reversible; the solution procedure could have moved backward or forward stage by stage.
The next example is continuous. As s_n can take any value in certain intervals, the solutions f_n*(s_n) and x_n* must be expressed as functions of s_n.
Stages in the next example correspond to time periods, so the solution must proceed backwards.
Example: scheduling jobs
The company Local Job Shop needs to schedule its employment levels due to seasonal fluctuations.
  Machine operators are difficult to hire and costly to train.
  Peak-season payroll should not be maintained afterwards.
  Overtime work on a regular basis should be avoided.
Minimum requirements in the near future:
  Season         Spring   Summer   Autumn   Winter   Spring
  Requirements   255      220      240      200      255
Example: scheduling jobs (cont.)
Employment above the level in the table costs $2,000 per person per season.
The total cost of changing the level of employment from one season to the next is $200 times the square of the difference in employment levels.
Fractional levels are possible due to part-time employees.
Formulation
From the data, maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are the stages.
One cycle of four seasons, where stage 1 is summer and stage 4 is spring (known employment).
x_n = employment level for stage n (n = 1, 2, 3, 4); x_4 = 255.
r_n = minimum employment requirement for stage n: r_1 = 220, r_2 = 240, r_3 = 200, r_4 = 255. Thus:
$$r_n \le x_n \le 255$$
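Formulation
The cost for stage n is 200(x_n - x_{n-1})^2 + 2000(x_n - r_n), and the state is the employment level of the previous season, s_n = x_{n-1} (with x_0 = x_4 = 255):
$$\text{minimize} \sum_{i=1}^{4} \left[ 200 (x_i - x_{i-1})^2 + 2000 (x_i - r_i) \right],$$
subject to r_i ≤ x_i ≤ 255, for i = 1, 2, 3, 4.
Data
  n   r_n   Feasible x_n       Possible s_n = x_{n-1}   Cost
  1   220   220 ≤ x_1 ≤ 255    s_1 = 255                200(x_1 - 255)^2 + 2000(x_1 - 220)
  2   240   240 ≤ x_2 ≤ 255    220 ≤ s_2 ≤ 255          200(x_2 - x_1)^2 + 2000(x_2 - 240)
  3   200   200 ≤ x_3 ≤ 255    240 ≤ s_3 ≤ 255          200(x_3 - x_2)^2 + 2000(x_3 - 200)
  4   255   x_4 = 255          200 ≤ s_4 ≤ 255          200(255 - x_3)^2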
Recursive relationship:
$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \{ 200 (x_n - s_n)^2 + 2000 (x_n - r_n) + f_{n+1}^*(x_n) \}$$
Solution procedure
Stage 4: the solution is known to be x_4* = 255.
  s_4                f_4*(s_4)           x_4*
  200 ≤ s_4 ≤ 255    200(255 - s_4)^2    255
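Although this example is solved analytically, the recursive relationship can be cross-checked numerically by restricting employment levels to a fine grid. The sketch below is a discretized approximation, not the calculus-based procedure of the slides; the grid step of 0.5 and the names `grid` and `schedule` are assumptions chosen for illustration.

```python
R = [220, 240, 200, 255]  # minimum requirements r_1..r_4 (summer, autumn, winter, spring)
PEAK = 255                # spring employment level, also the upper bound on x_n


def grid(lo, hi, step):
    """Evenly spaced candidate levels from lo to hi inclusive."""
    count = int(round((hi - lo) / step))
    return [lo + k * step for k in range(count + 1)]


def schedule(r=R, peak=PEAK, step=0.5):
    """Grid version of f_n*(s) = min 200(x-s)^2 + 2000(x-r_n) + f_{n+1}*(x)."""
    levels = grid(min(r), peak, step)
    f_next = {s: 0.0 for s in levels}  # no cost beyond the last stage
    policy = []
    for n in range(len(r) - 1, -1, -1):  # stages 4, 3, 2, 1 (index n is 0-based)
        feasible = [x for x in levels if x >= r[n]]
        f_cur, x_cur = {}, {}
        for s in levels:
            candidates = [
                (200 * (x - s) ** 2 + 2000 * (x - r[n]) + f_next[x], x)
                for x in feasible
            ]
            f_cur[s], x_cur[s] = min(candidates)  # lowest cost, then lowest x on ties
        policy.insert(0, x_cur)
        f_next = f_cur
    # Forward pass from s_1 = peak; each decision becomes the next state.
    s, plan = peak, []
    for x_of in policy:
        s = x_of[s]
        plan.append(s)
    return f_next[peak], plan


total_cost, plan = schedule()
print(total_cost, plan)
```

With this grid the computed plan and total cost can be compared against the total cost of $185,000 reported for this example; a coarser grid would only give an approximation.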
Stage 3: for 240 ≤ s_3 ≤ 255,
$$f_3^*(s_3) = \min_{200 \le x_3 \le 255} f_3(s_3, x_3) = \min_{200 \le x_3 \le 255} \{ 200 (x_3 - s_3)^2 + 2000 (x_3 - 200) + 200 (255 - x_3)^2 \}$$
Graphical solution for f_3*(s_3): (figure not reproduced)
Calculus solution for f_3*(s_3)
Using calculus:
$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400 (x_3 - s_3) + 2000 - 400 (255 - x_3) = 400 (2 x_3 - s_3 - 250) = 0$$
$$x_3^* = \frac{s_3 + 250}{2}$$
Does this guarantee a minimum? The second derivative with respect to x_3 is 800 > 0, and for 240 ≤ s_3 ≤ 255 the value of x_3* lies between 245 and 252.5, inside the feasible interval.
  s_3                f_3*(s_3)                                               x_3*
  240 ≤ s_3 ≤ 255    50(250 - s_3)^2 + 50(260 - s_3)^2 + 1000(s_3 - 150)     (s_3 + 250)/2
Stage 2
Solved in a similar fashion, with
$$f_2(s_2, x_2) = 200 (x_2 - s_2)^2 + 2000 (x_2 - r_2) + f_3^*(x_2)$$
for 220 ≤ s_2 ≤ 255 (possible values) and 240 ≤ x_2 ≤ 255 (feasible values).
Solving ∂/∂x_2 [f_2(s_2, x_2)] = 0 yields
$$x_2 = \frac{2 s_2 + 240}{3}$$
Stage 2 and Stage 1
The solution has to be feasible for 220 ≤ s_2 ≤ 255 (i.e., 240 ≤ x_2 ≤ 255 for 220 ≤ s_2 ≤ 255)!
x_2* = (2 s_2 + 240)/3 is only feasible for 240 ≤ s_2 ≤ 255.
When 220 ≤ s_2 ≤ 240, we need to solve for the feasible value of x_2 that minimizes f_2(s_2, x_2).
For s_2 < 240, ∂/∂x_2 f_2(s_2, x_2) > 0 for 240 ≤ x_2 ≤ 255, so x_2* = 240.
  s_2                f_2*(s_2)                                                                   x_2*
  220 ≤ s_2 ≤ 240    200(240 - s_2)^2 + 115,000                                                  240
  240 ≤ s_2 ≤ 255    (200/9)[(240 - s_2)^2 + (255 - s_2)^2 + (270 - s_2)^2] + 2000(s_2 - 195)    (2 s_2 + 240)/3
Stage 1: the procedure is similar. (How?)
  s_1    f_1*(s_1)   x_1*
  255    185,000     247.5
Solution: x_1* = 247.5, x_2* = 245, x_3* = 247.5, x_4* = 255, with a total cost of $185,000. (Why?)
Deterministic continuous problem
Consider the following nonlinear programming problem:
$$\text{Maximize } Z = x_1^2 x_2, \quad \text{subject to} \quad x_1^2 + x_2 \le 2.$$
(There are no nonnegativity constraints.)
Use dynamic programming to solve this problem.
Probabilistic dynamic programming
The state at the next stage is not completely determined by the state and policy decision at the current stage.
There is a probability distribution for determining the next state (see figure).
  S = number of possible states at stage n + 1.
  The system goes to state i (i = 1, 2, ..., S) with probability p_i, given state s_n and decision x_n at stage n.
  C_i = contribution of stage n to the objective function.
If the figure is expanded to all possible states and decisions at all stages, it is a decision tree.
The relation between f_n(s_n, x_n) and f_{n+1}*(s_{n+1}) depends upon the form of the overall objective function.
Example: minimize the expected sum of the contributions from individual stages.
f_n(s_n, x_n) is the minimum expected sum from stage n onward, given state s_n and policy decision x_n at stage n:
$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$
with
$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})$$
Example: determining reject allowances
The Hit-and-Miss Manufacturing Company received an order to supply 1 item of a particular type.
The customer has specified stringent quality requirements.
  The manufacturer may have to produce more than one item to obtain one that is acceptable. The number of extra items is the reject allowance.
  The probability that an item is acceptable is 1/2, and the probability that it is defective is 1/2.
  The number of acceptable items in a lot of size L has a binomial distribution: the probability of no acceptable items is (1/2)^L.
  Setup cost = $300, cost per item = $100. Maximum number of production runs = 3. Cost of no acceptable item after 3 runs = $1,600.
Formulation
Objective: determine the policy regarding the lot size (1 + reject allowance) for the required production run(s) that minimizes the total expected cost.
Stage n = production run n (n = 1, 2, 3),
x_n = lot size for stage n,
State s_n = number of acceptable items still needed (1 or 0) at the beginning of stage n.
At stage 1, state s_1 = 1.
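f_n(s_n, x_n) = total expected cost for stages n, ..., 3 if optimal decisions are made in the subsequent stages:
$$f_n^*(s_n) = \min_{x_n = 0, 1, \ldots} f_n(s_n, x_n)$$
f_n*(0) = 0.
The monetary unit is $100. The contribution to cost from stage n is [K(x_n) + x_n], with
$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$
Note that f_4*(1) = 16.
Recursive relationship:
$$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \left\{ K(x_n) + x_n + \left( \tfrac{1}{2} \right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$
Solution procedure
n = 3: f_3(1, x_3) = K(x_3) + x_3 + 16 (1/2)^{x_3}
  s_3   x_3 = 0   x_3 = 1   x_3 = 2   x_3 = 3   x_3 = 4   x_3 = 5   f_3*(s_3)   x_3*
  0     0                                                             0           0
  1     16        12        9         8         8         8.5       8           3 or 4
n = 2: f_2(1, x_2) = K(x_2) + x_2 + (1/2)^{x_2} f_3*(1)
  s_2   x_2 = 0   x_2 = 1   x_2 = 2   x_2 = 3   x_2 = 4   f_2*(s_2)   x_2*
  0     0                                                   0           0
  1     8         8         7         7         7.5       7           2 or 3
n = 1: f_1(1, x_1) = K(x_1) + x_1 + (1/2)^{x_1} f_2*(1)
  s_1   x_1 = 0   x_1 = 1   x_1 = 2   x_1 = 3   x_1 = 4   f_1*(s_1)   x_1*
  1     7         7.5       6.75      6.875     7.44      6.75        2
Optimal solution: produce 2 items in the first run; if none is acceptable, produce 2 or 3 items in the second run; if none is acceptable, produce 3 or 4 items in the third run. The total expected cost is 6.75, i.e., $675.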
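The reject-allowance recursion is small enough to verify directly. The sketch below follows the recursive relationship for state s_n = 1 in units of $100; the function name `reject_allowance` and the cap `max_lot` on the lot sizes considered are assumptions for illustration (the recursion itself allows x_n = 0, 1, 2, ... without bound, but the cost grows again for large lots).

```python
def setup_cost(x):
    """K(x): setup cost in units of $100, incurred only if something is produced."""
    return 0 if x == 0 else 3


def reject_allowance(runs=3, penalty=16, max_lot=8):
    """Backward recursion f_n*(1) = min over x of K(x) + x + (1/2)^x * f_{n+1}*(1)."""
    f_next = penalty  # f_4*(1): cost of having no acceptable item after the last run
    tables = {}
    for n in range(runs, 0, -1):
        costs = {x: setup_cost(x) + x + 0.5 ** x * f_next for x in range(max_lot + 1)}
        f_star = min(costs.values())
        x_star = [x for x, c in costs.items() if c == f_star]  # all optimal lot sizes
        tables[n] = (f_star, x_star)
        f_next = f_star
    return tables


tables = reject_allowance()
for n in sorted(tables):
    print(n, tables[n])
```

All quantities here are sums of powers of 1/2, so the floating-point comparisons used to collect tied lot sizes are exact in this particular instance; a general implementation would compare with a tolerance.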
Probabilistic problem
An enterprising young statistician believes that she has developed a system for winning a popular Las Vegas game. Her colleagues do not believe that her system works, so they have made a large bet with her that if she starts with three chips, she will not have at least five chips after three plays of the game. Each play of the game involves betting any desired number of available chips and then either winning or losing this number of chips. The statistician believes that her system will give her a probability of 2/3 of winning a given play of the game.
Assuming the statistician is correct, use dynamic programming to determine her optimal policy regarding how many chips to bet (if any) at each of the three plays of the game. The decision at each play should take into account the results of earlier plays. The objective is to maximize the probability of winning her bet with her colleagues.
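One way to set this problem up, offered as a sketch rather than as the intended solution, is to take the play number as the stage, the number of chips in hand as the state, and the number of chips bet as the decision, with f_n*(s) the maximum probability of finishing with at least five chips. The function name `best_bets` and its parameters are hypothetical; exact fractions are used so that ties between bets are detected reliably.

```python
from fractions import Fraction


def best_bets(start=3, target=5, plays=3, p_win=Fraction(2, 3)):
    """Backward recursion f_n*(s) = max over x of (1-p) f_{n+1}*(s-x) + p f_{n+1}*(s+x)."""
    max_chips = start * 2 ** plays  # chips can at most double at every play
    # After the last play the bet with the colleagues is won iff s >= target.
    f_next = {s: Fraction(int(s >= target)) for s in range(max_chips + 1)}
    policy = {}
    for n in range(plays, 0, -1):
        reachable = start * 2 ** (n - 1)  # most chips she can hold before play n
        f_cur, x_cur = {}, {}
        for s in range(reachable + 1):
            values = {
                x: (1 - p_win) * f_next[s - x] + p_win * f_next[s + x]
                for x in range(s + 1)
            }
            f_cur[s] = max(values.values())
            x_cur[s] = [x for x, v in values.items() if v == f_cur[s]]  # all optimal bets
        policy[n] = x_cur
        f_next = f_cur
    return f_next[start], policy


prob, policy = best_bets()
print(prob, policy[1][3])
```

The dictionary `policy[n][s]` lists every optimal bet for each play and chip count, which is the kind of policy prescription "for every possible circumstance" that the problem statement asks for.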