
Dynamic Programming

Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions.
It provides a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of "the" dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.
João Miguel da Costa Sousa / Alexandra Moutinho

Prototype example

Stagecoach problem
A fortune seeker wants to go from Missouri (A) to California (J) in the mid-19th century.
The journey has 4 stages.
The cost is the cost of the life insurance policy for a specific route; the lowest cost is equivalent to the safest trip.

Costs
The cost $c_{ij}$ of going from state $i$ to state $j$ is given in a table (figure not recovered from the slides).

Solving the problem

Problem: which route minimizes the total cost of the policy?
Note that the greedy approach does not work.
The greedy solution A → B → F → I → J has a total cost of 13.
However, e.g., A → D → F is cheaper than A → B → F.
Other possibility: trial and error. Too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It starts from the last stage of the problem and enlarges it one stage at a time.

Formulation

Decision variables $x_n$ ($n = 1, 2, 3, 4$) are the immediate destination of stage $n$.
The route is A → $x_1$ → $x_2$ → $x_3$ → $x_4$, where $x_4$ = J.
$f_n(s, x_n)$ is the total cost of the best overall policy for the remaining stages, given that the current state is $s$, ready to start stage $n$, and $x_n$ is selected as the immediate destination.
$x_n^*$ minimizes $f_n(s, x_n)$, and $f_n^*(s)$ is the minimum value of $f_n(s, x_n)$:
$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$
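The greedy and exhaustive strategies mentioned above can be compared directly in a few lines. This is a minimal sketch, not the authors' code. The cost table itself was lost in extraction, so the `COST` values below are an assumption: they are the values of the standard textbook version of the stagecoach problem, and they agree with every cost quoted in these slides. The function names are illustrative.

```python
from itertools import product

# Edge costs c_ij. ASSUMPTION: taken from the standard textbook stagecoach
# example, because the slide's own cost table did not survive extraction.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def route_cost(route):
    """Total cost of a route given as a sequence of states."""
    return sum(COST[a][b] for a, b in zip(route, route[1:]))


def greedy_route(start="A", goal="J"):
    """Always take the cheapest next link (ties broken alphabetically)."""
    route = [start]
    while route[-1] != goal:
        options = COST[route[-1]]
        route.append(min(sorted(options), key=options.get))
    return route


def all_routes():
    """Exhaustive enumeration of every route from A to J."""
    for b, e, h in product("BCD", "EFG", "HI"):
        yield ["A", b, e, h, "J"]


greedy = greedy_route()
best = min(all_routes(), key=route_cost)
n_routes = sum(1 for _ in all_routes())
print("greedy:", " -> ".join(greedy), "cost", route_cost(greedy))
print("best of", n_routes, "routes:", " -> ".join(best), "cost", route_cost(best))
```

Under the assumed costs, the greedy route costs 13 while the cheapest of the 18 routes costs 11. Enumeration is still feasible here, but the number of routes grows multiplicatively with the number of stages.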

Formulation

where
$f_n(s, x_n)$ = immediate cost (stage $n$) + minimum future cost (stages $n + 1$ onward)
$= c_{s x_n} + f_{n+1}^*(x_n)$
The value of $c_{s x_n}$ is given by $c_{ij}$, where $i = s$ (current state) and $j = x_n$ (immediate destination).
Objective: find $f_1^*(A)$ and the corresponding route.
Dynamic programming finds successively $f_4^*(s)$, $f_3^*(s)$, $f_2^*(s)$ and finally $f_1^*(A)$.

Solution procedure

When $n = 4$, the route is determined by its current state $s$ (H or I) and its final destination J.
Since $f_4^*(s) = f_4(s, J) = c_{sJ}$, the solution for $n = 4$ is:

s    f4*(s)    x4*
H    3         J
I    4         J

Stage n = 3

This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs $c_{F,H} = 6$ or $c_{F,I} = 3$.
Choosing H, the minimum additional cost is $f_4^*(H) = 3$, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7.
This is smaller, so I is the optimal choice for state F.

Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3, where $f_3(s, x_3) = c_{s x_3} + f_4^*(x_3)$ (cells whose values were not recovered from the slides are left blank):

s    x3 = H    x3 = I    f3*(s)    x3*
E                        4
F    9         7         7         I
G                        6

Stage n = 2

In this case, $f_2(s, x_2) = c_{s x_2} + f_3^*(x_2)$.
Example for node C:
$x_2 = E$: $f_2(C, E) = c_{C,E} + f_3^*(E) = 3 + 4 = 7$ (optimal)
$x_2 = F$: $f_2(C, F) = c_{C,F} + f_3^*(F) = 2 + 7 = 9$.
$x_2 = G$: $f_2(C, G) = c_{C,G} + f_3^*(G) = 4 + 6 = 10$.

Similar calculations can be made for the two other possible states, s = B and s = D, resulting in the table for n = 2:

s    x2 = E    x2 = F    x2 = G    f2*(s)    x2*
B    11        11        12        11        E or F
C    7         9         10        7         E
D    8         8         11        8         E or F

Stage n = 1

There is just one possible starting state: A.
$x_1 = B$: $f_1(A, B) = c_{A,B} + f_2^*(B) = 2 + 11 = 13$.
$x_1 = C$: $f_1(A, C) = c_{A,C} + f_2^*(C) = 4 + 7 = 11$ (optimal)
$x_1 = D$: $f_1(A, D) = c_{A,D} + f_2^*(D) = 3 + 8 = 11$ (optimal)

Results in the table, with $f_1(s, x_1) = c_{s x_1} + f_2^*(x_1)$:

s    x1 = B    x1 = C    x1 = D    f1*(s)    x1*
A    13        11        11        11        C or D

Optimal solution

There are three optimal solutions, all with $f_1^*(A) = 11$ (the routes were shown in a figure on the slide).
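The stage-by-stage tables can be reproduced with a short backward recursion. The sketch below is illustrative only. The `COST` table is an assumption (the values of the standard textbook stagecoach example, since the slide's table was lost in extraction); it is consistent with every cost and every $f_n^*$ value quoted in the slides. `solve_stagecoach` and `optimal_routes` are hypothetical helper names.

```python
# Edge costs c_ij. ASSUMPTION: standard textbook stagecoach values; the
# slide's own cost table did not survive extraction.
COST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}
# States at the beginning of stages 1..4.
STAGES = [["A"], ["B", "C", "D"], ["E", "F", "G"], ["H", "I"]]


def solve_stagecoach():
    """Backward recursion f_n*(s) = min over x_n of c_{s,x_n} + f_{n+1}*(x_n)."""
    f_star = {"J": 0}  # nothing left to pay once the destination is reached
    x_star = {}
    for states in reversed(STAGES):  # stage 4 first, stage 1 last
        for s in states:
            f = {x: c + f_star[x] for x, c in COST[s].items()}
            f_star[s] = min(f.values())
            x_star[s] = sorted(x for x, v in f.items() if v == f_star[s])
    return f_star, x_star


def optimal_routes(x_star, s="A"):
    """Follow every optimal decision forward to list all optimal routes."""
    if s == "J":
        return [["J"]]
    return [[s] + tail for x in x_star[s] for tail in optimal_routes(x_star, x)]


f_star, x_star = solve_stagecoach()
print("f1*(A) =", f_star["A"])
for route in optimal_routes(x_star):
    print(" -> ".join(route))
```

With the assumed costs this prints a minimum cost of 11 and three routes, matching the count stated on the slide. Keeping all tied decisions in `x_star` is what allows every optimal route to be listed, not just one.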

Characteristics of DP

1. The problem can be divided into stages, with a policy decision required at each stage.
Example: 4 stages and a life insurance policy to choose at each one.
Dynamic programming problems require making a sequence of interrelated decisions.

2. Each stage has a number of states associated with the beginning of that stage.
Example: the states are the possible territories where the fortune seeker could be located.
States are the possible conditions in which the system might be.

3. The policy decision transforms the current state to a state associated with the beginning of the next stage.
Example: the fortune seeker's decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
In most cases, the objective corresponds to finding the shortest or the longest path.

4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.
Example: the solution procedure constructed a table for each stage $n$ that prescribed the optimal decision, $x_n^*$, for each possible state $s$.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.

5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
The optimal immediate decision depends only on the current state and not on how it was reached: this is the principle of optimality for DP.
Example: at any state, the insurance policy is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.

Characteristics of DP

6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.

7. A recursive relationship that identifies the optimal policy for stage $n$, given the optimal policy for stage $n + 1$, is available.
Example: the recursive relationship was
$f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}$
The recursive relationship differs somewhat among dynamic programming problems.

7. (cont.) Notation:
$N$ = number of stages.
$n$ = label for the current stage ($n = 1, 2, \ldots, N$).
$s_n$ = current state for stage $n$.
$x_n$ = decision variable for stage $n$.
$x_n^*$ = optimal value of $x_n$ (given $s_n$).
$f_n(s_n, x_n)$ = contribution of stages $n, n + 1, \ldots, N$ to the objective function if the system starts in state $s_n$ at stage $n$, the immediate decision is $x_n$, and optimal decisions are made thereafter.
$f_n^*(s_n) = f_n(s_n, x_n^*)$

7. (cont.) Recursive relationship:
$f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \}$ or $f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}$
where $f_n(s_n, x_n)$ is written in terms of $s_n$, $x_n$, $f_{n+1}^*(s_{n+1})$, and probably some measure of the immediate contribution of $x_n$ to the objective function.

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
It stops when the optimal policy starting at the initial stage is found. At that point the optimal policy for the entire problem is known.
Example: the tables for the stages show this procedure.

8. (cont.) For DP problems, a table such as the following would be obtained for each stage ($n = N, N - 1, \ldots, 1$):

s_n    f_n(s_n, x_n) for each x_n    f_n*(s_n)    x_n*
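The eight characteristics translate almost line by line into a generic backward-induction routine. The following is a sketch under stated assumptions, not a reference implementation: it assumes additive contributions and a deterministic transition, and the function name, its parameters and the tiny two-stage reward table are all hypothetical, introduced only for illustration.

```python
def backward_induction(n_stages, states, decisions, transition, contribution,
                       terminal_value, sense=min):
    """Generic deterministic DP with additive stage contributions.

    states(n)             -> iterable of states at the start of stage n (1-based)
    decisions(n, s)       -> iterable of feasible decisions x_n in state s
    transition(n, s, x)   -> state at the start of stage n + 1
    contribution(n, s, x) -> immediate contribution of decision x
    terminal_value(s)     -> value of ending in state s after the last stage
    sense                 -> min or max
    Returns {n: (f_star, x_star)}, one table per stage.
    """
    tables = {}
    f_next = terminal_value
    for n in range(n_stages, 0, -1):  # start at the end, move backward
        f_star, x_star = {}, {}
        for s in states(n):
            values = {x: contribution(n, s, x) + f_next(transition(n, s, x))
                      for x in decisions(n, s)}
            f_star[s] = sense(values.values())
            x_star[s] = [x for x, v in values.items() if v == f_star[s]]
        tables[n] = (f_star, x_star)
        f_next = lambda s, table=f_star: table[s]  # f_{n}* feeds stage n - 1
    return tables


# Hypothetical toy data: split 2 units of a resource over 2 activities.
rewards = {1: [0, 3, 4], 2: [0, 2, 6]}
tables = backward_induction(
    n_stages=2,
    states=lambda n: [2] if n == 1 else [0, 1, 2],
    decisions=lambda n, s: range(s + 1),
    transition=lambda n, s, x: s - x,
    contribution=lambda n, s, x: rewards[n][x],
    terminal_value=lambda s: 0,
    sense=max,
)
print(tables[1])
```

Each entry of `tables` corresponds to one per-stage table of the kind described in item 8: the optimal value and the optimal decision(s) for every state.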

Deterministic dynamic programming

Deterministic problems: the state at the next stage is completely determined by the state and policy decision at the current stage.
Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.
Set of states: may be discrete or continuous, or a state vector. Decision variables can also be discrete or continuous.

Example: distributing medical teams

The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.

Thousands of additional person-years of life
Medical teams    Country 1    Country 2    Country 3
0                0            0            0
1                45           20           50
2                70           45           70
3                90           75           80
4                105          110          100
5                120          150          130

Formulation of the problem

The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).
$x_n$ is the number of teams to allocate to stage (country) $n$.
What are the states? What changes from one stage to another?
$s_n$ = number of medical teams still available for the remaining countries ($n, \ldots, 3$).
Thus: $s_1 = 5$, $s_2 = 5 - x_1 = s_1 - x_1$, $s_3 = s_2 - x_2$.

Overall problem

$p_i(x_i)$: measure of performance from allocating $x_i$ medical teams to country $i$.
Maximize $\sum_{i=1}^{3} p_i(x_i)$,
subject to $\sum_{i=1}^{3} x_i = 5$,
and $x_i$ are nonnegative integers.

Policy

Recursive relationship relating the functions:
$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \}$, for $n = 1, 2$
$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)$

Solution procedure, stage n = 3

For the last stage, n = 3, the values of $p_3(x_3)$ are the last column of the table. Here, $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$.

n = 3:
s3    f3*(s3)    x3*
0     0          0
1     50         1
2     70         2
3     80         3
4     100        4
5     130        5

Stage n = 2

Here, finding $x_2^*$ requires calculating $f_2(s_2, x_2)$ for the values $x_2 = 0, 1, \ldots, s_2$. Example for $s_2 = 2$:
$x_2 = 0$: $f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70$
$x_2 = 1$: $f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70$
$x_2 = 2$: $f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45$
Stage n = 2

Similar calculations can be made for the other values of $s_2$:

n = 2: $f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2)$
s2    x2 = 0    x2 = 1    x2 = 2    x2 = 3    x2 = 4    x2 = 5    f2*(s2)    x2*
0     0                                                           0          0
1     50        20                                                50         0
2     70        70        45                                      70         0 or 1
3     80        90        95        75                            95         2
4     100       100       115       125       110                 125        3
5     130       120       125       145       160       150       160        4

Stage n = 1

The only state is the starting state $s_1 = 5$:

n = 1: $f_1(s_1, x_1) = p_1(x_1) + f_2^*(s_1 - x_1)$
s1    x1 = 0    x1 = 1    x1 = 2    x1 = 3    x1 = 4    x1 = 5    f1*(s1)    x1*
5     160       170       165       160       155       120       170        1

Optimal policy decision

Tracing the tables forward: $x_1^* = 1$ gives $s_2 = 4$, then $x_2^* = 3$ gives $s_3 = 1$, and finally $x_3^* = 1$, for a total of $f_1^*(5) = 170$ thousand additional person-years of life.
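The three stage tables of the medical teams example can be regenerated with a short script. This is a minimal sketch using the performance table from the slides; the function name `solve_allocation` and the convention $f_4^*(s) = 0$ (leftover teams add nothing) are assumptions made for the sketch.

```python
# p[i][x] = thousands of additional person-years of life when country i
# receives x medical teams (values from the table in the slides).
P = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
TEAMS = 5


def solve_allocation(p=P, total=TEAMS):
    """Backward recursion f_n*(s) = max over x of p_n(x) + f_{n+1}*(s - x)."""
    n_stages = len(p)
    f_next = [0] * (total + 1)  # assumed boundary condition f_{N+1}*(s) = 0
    f_tables, x_tables = {}, {}
    for n in range(n_stages, 0, -1):
        f_star, x_star = [], []
        for s in range(total + 1):
            values = [p[n][x] + f_next[s - x] for x in range(s + 1)]
            best = max(values)
            f_star.append(best)
            x_star.append([x for x, v in enumerate(values) if v == best])
        f_tables[n], x_tables[n] = f_star, x_star
        f_next = f_star
    # Forward pass: read one optimal plan off the tables.
    s, plan = total, []
    for n in range(1, n_stages + 1):
        x = x_tables[n][s][0]
        plan.append(x)
        s -= x
    return f_tables, x_tables, plan


f_tables, x_tables, plan = solve_allocation()
print("f1*(5) =", f_tables[1][5], "plan =", plan)
```

The script reports 170 with the plan [1, 3, 1], in agreement with the tables. Because the recursion stores a decision for every state, the plan for any other number of teams (for example, if one team became unavailable) can be read off without solving again.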

Distribution of effort problem

One kind of resource is allocated to a number of activities. Objective: how to distribute the effort (resource) among the activities most effectively.
DP involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated by DP. Only additivity (or its analogue for a product of terms) is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (why?).

Formulation of distribution of effort

Stage $n$ = activity $n$ ($n = 1, 2, \ldots, N$).
$x_n$ = amount of resource allocated to activity $n$.
State $s_n$ = amount of resource still available for allocation to the remaining activities ($n, \ldots, N$).
When the system starts at stage $n$ in state $s_n$, choice $x_n$ results in the next state at stage $n + 1$ being $s_{n+1} = s_n - x_n$:

Stage:    n                n + 1
State:    s_n   --x_n-->   s_n - x_n

Example: distributing scientists to research teams

3 teams are solving an engineering problem to safely fly people to Mars.
2 extra scientists can be assigned to reduce the probability of failure.

Probability of failure
New scientists    Team 1    Team 2    Team 3
0                 0.40      0.60      0.80
1                 0.20      0.40      0.50
2                 0.15      0.20      0.30
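The scientists example has the same structure as the medical teams problem, except that stage contributions combine by product rather than by sum. The sketch below assumes, as in the usual treatment of this example, that the objective is to minimize the probability that all three teams fail, i.e. the product of the three failure probabilities; that objective and the function name are assumptions, since the slide shows only the data table.

```python
# FAIL[team][x] = probability of failure of a team that receives x new
# scientists (values from the table in the slides).
FAIL = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
SCIENTISTS = 2


def solve_scientists(fail=FAIL, total=SCIENTISTS):
    """Minimize the product of failure probabilities (assumed objective)."""
    n_stages = len(fail)
    f_next = [1.0] * (total + 1)  # empty product after the last team
    x_tables = {}
    for n in range(n_stages, 0, -1):
        f_star, x_star = [], []
        for s in range(total + 1):
            values = [fail[n][x] * f_next[s - x] for x in range(s + 1)]
            best = min(values)
            f_star.append(best)
            x_star.append(values.index(best))  # first minimizer
        x_tables[n] = x_star
        f_next = f_star
    # Forward pass to recover the allocation.
    s, plan = total, []
    for n in range(1, n_stages + 1):
        plan.append(x_tables[n][s])
        s -= x_tables[n][s]
    return f_next[total], plan


probability, plan = solve_scientists()
print(f"overall failure probability = {probability:.3f}, plan = {plan}")
```

Only two lines differ from an additive allocation: the boundary value is 1 instead of 0, and the recursion multiplies instead of adding. This is the "analogous for product of terms" form of additivity mentioned for distribution of effort problems.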

Continuous dynamic programming

The previous examples had a discrete state variable $s_n$ at each stage.
They have all been reversible; the solution procedure could have moved backward or forward stage by stage.
The next example is continuous. As $s_n$ can take any value in certain intervals, the solutions $f_n^*(s_n)$ and $x_n^*$ must be expressed as functions of $s_n$.
The stages in the next example correspond to time periods, so the solution must proceed backwards.

Example: scheduling jobs

The company Local Job Shop needs to schedule its employment levels because of seasonal fluctuations.
Machine operators are difficult to hire and costly to train.
The peak-season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.

Minimum requirements in the near future:
Season          Spring    Summer    Autumn    Winter    Spring
Requirements    255       220       240       200       255

Employment above the level in the table costs $2,000 per person per season.
The total cost of changing the level of employment from one season to the next is $200 times the square of the difference in employment levels.
Fractional levels are possible because of part-time employees.

Formulation

From the data, maximum employment should be 255 (spring). It is necessary to find the level of employment for the other seasons. Seasons are stages.
One cycle of four seasons, where stage 1 is summer and stage 4 is spring (known employment).
$x_n$ = employment level for stage $n$ ($n = 1, 2, 3, 4$); $x_4 = 255$.
$r_n$ = minimum employment requirement for stage $n$: $r_1 = 220$, $r_2 = 240$, $r_3 = 200$, $r_4 = 255$. Thus:
$r_n \le x_n \le 255$

Formulation

Cost for stage $n$ = $200(x_n - x_{n-1})^2 + 2000(x_n - r_n)$
State $s_n$: employment in the preceding season, $s_n = x_{n-1}$ (for $n = 1$: $s_1 = x_0 = x_4 = 255$).
Problem: choose $x_1$, $x_2$ and $x_3$ so as to
minimize $\sum_{i=1}^{4} [200(x_i - x_{i-1})^2 + 2000(x_i - r_i)]$,
subject to $r_i \le x_i \le 255$, for $i = 1, 2, 3, 4$.

Data

n    r_n    Feasible x_n         Possible s_n = x_{n-1}    Cost
1    220    220 <= x1 <= 255     s1 = 255                  200(x1 - 255)^2 + 2000(x1 - 220)
2    240    240 <= x2 <= 255     220 <= s2 <= 255          200(x2 - x1)^2 + 2000(x2 - 240)
3    200    200 <= x3 <= 255     240 <= s3 <= 255          200(x3 - x2)^2 + 2000(x3 - 200)
4    255    x4 = 255             200 <= s4 <= 255          200(255 - x3)^2

Formulation

Recursive relationship:
$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \}$

Basic structure of the problem: (figure not recovered from the slides)

Solution procedure

Stage 4: the solution is known to be $x_4^* = 255$.

s4                  f4*(s4)            x4*
200 <= s4 <= 255    200(255 - s4)^2    255

Solution procedure

Stage 3: for $240 \le s_3 \le 255$:
$f_3^*(s_3) = \min_{200 \le x_3 \le 255} \{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \}$
$= \min_{200 \le x_3 \le 255} \{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \}$

Graphical solution for $f_3^*(s_3)$: (figure not recovered from the slides)

Calculus solution for $f_3^*(s_3)$

Using calculus:
$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2 x_3 - s_3 - 250) = 0$
$x_3^* = \frac{s_3 + 250}{2}$
Does this guarantee a minimum? Yes: the second derivative is 800 > 0, and for $240 \le s_3 \le 255$ the value $x_3^*$ lies between 245 and 252.5, inside the feasible range.

s3                  f3*(s3)                                              x3*
240 <= s3 <= 255    50(250 - s3)^2 + 50(260 - s3)^2 + 1000(s3 - 150)     (s3 + 250)/2

Stage 2

Solved in a similar fashion, with
$f_2(s_2, x_2) = 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2)$
$= 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)$
for $220 \le s_2 \le 255$ (possible values) and $240 \le x_2 \le 255$ (feasible values).
Solving $\frac{\partial}{\partial x_2} f_2(s_2, x_2) = 0$ yields $x_2 = \frac{2 s_2 + 240}{3}$.
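The closed-form results for stage 3 and the stationary point for stage 2 can be sanity-checked numerically. The sketch below is only a cross-check, not part of the original solution method: it minimizes each stage function over a grid with step 0.25 (an arbitrary choice that happens to contain the exact minimizers for the sample states used) and compares with the formulas. All function names are illustrative.

```python
def f3(s3, x3):
    """Stage 3 cost plus the known stage 4 cost 200(255 - x3)^2."""
    return 200 * (x3 - s3) ** 2 + 2000 * (x3 - 200) + 200 * (255 - x3) ** 2


def f3_star(s3):
    """Closed form obtained by substituting x3* = (s3 + 250) / 2."""
    return 50 * (250 - s3) ** 2 + 50 * (260 - s3) ** 2 + 1000 * (s3 - 150)


def f2(s2, x2):
    """Stage 2 cost plus the optimal cost-to-go f3*(x2)."""
    return 200 * (x2 - s2) ** 2 + 2000 * (x2 - 240) + f3_star(x2)


def grid_argmin(func, lo, hi, step=0.25):
    """Minimizer of func over lo, lo + step, ..., hi (brute force)."""
    n = int(round((hi - lo) / step))
    return min((lo + k * step for k in range(n + 1)), key=func)


stage3_checks = []
for s3 in (240, 245, 250, 255):
    x3 = grid_argmin(lambda x: f3(s3, x), 200, 255)
    stage3_checks.append((s3, x3, f3(s3, x3), f3_star(s3)))
    print(f"s3={s3}: grid x3*={x3}, formula x3*={(s3 + 250) / 2}")

stage2_checks = []
for s2 in (240, 246, 255):
    x2 = grid_argmin(lambda x: f2(s2, x), 240, 255)
    stage2_checks.append((s2, x2))
    print(f"s2={s2}: grid x2*={x2}, formula x2*={(2 * s2 + 240) / 3}")
```

For these sample states the grid minimizers coincide with $(s_3 + 250)/2$ and $(2 s_2 + 240)/3$. The stage 2 samples were deliberately taken from $s_2 \ge 240$, where the stationary point lies inside the feasible range for $x_2$.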

Stage 2

The solution has to be feasible for $220 \le s_2 \le 255$ (i.e., $240 \le x_2 \le 255$ for $220 \le s_2 \le 255$)!
$x_2^* = \frac{2 s_2 + 240}{3}$ is only feasible for $240 \le s_2 \le 255$.
We need to solve for the feasible value of $x_2$ that minimizes $f_2(s_2, x_2)$ when $220 \le s_2 < 240$.
For $s_2 < 240$, $\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0$ for $240 \le x_2 \le 255$, so $x_2^* = 240$.

s2                  f2*(s2)                                                                   x2*
220 <= s2 <= 240    200(240 - s2)^2 + 115000                                                  240
240 <= s2 <= 255    (200/9)[(240 - s2)^2 + (255 - s2)^2 + (270 - s2)^2] + 2000(s2 - 195)      (2 s2 + 240)/3

Stage 2 and Stage 1

Stage 1: the procedure is similar.

s1     f1*(s1)    x1*
255    185000     247.5

How?

Solution:
$x_1^* = 247.5$, $x_2^* = 245$, $x_3^* = 247.5$, $x_4^* = 255$
Total cost of $185,000.
Why?

Deterministic continuous problem

Consider the following nonlinear programming problem:
Maximize $Z = x_1^2 x_2$,
subject to $x_1^2 + x_2 \le 2$.
(There are no nonnegativity constraints.)
Use dynamic programming to solve this problem.

Probabilistic dynamic programming

The state at the next stage is not completely determined by the state and policy decision at the current stage.
There is a probability distribution for determining the next state (see the figure "Basic structure").
$S$ = number of possible states at stage $n + 1$.
The system goes to state $i$ ($i = 1, 2, \ldots, S$) with probability $p_i$, given state $s_n$ and decision $x_n$ at stage $n$.
$C_i$ = contribution of stage $n$ to the objective function if the system goes to state $i$.
If the figure is expanded to all possible states and decisions at all stages, it is a decision tree.

Basic structure: (figure not recovered from the slides)

Probabilistic dynamic programming

The relation between $f_n(s_n, x_n)$ and $f_{n+1}^*(s_{n+1})$ depends upon the form of the overall objective function.
Example: minimize the expected sum of the contributions from the individual stages.
$f_n(s_n, x_n)$ is the minimum expected sum from stage $n$ onward, given state $s_n$ and policy decision $x_n$ at stage $n$:
$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i [C_i + f_{n+1}^*(i)]$
with
$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})$
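The expected-sum recursion differs from the deterministic case only in that each decision leads to a list of weighted outcomes. A minimal sketch follows, assuming a minimization objective as in the example above; the function name, the `outcomes` callback and the two-decision toy problem are hypothetical and serve only to illustrate the formula.

```python
def probabilistic_backward_induction(n_stages, states, decisions, outcomes,
                                     terminal_value):
    """Minimize the expected sum of stage contributions.

    outcomes(n, s, x) yields tuples (p_i, C_i, i): with probability p_i the
    stage contributes C_i and the system moves to state i.
    """
    tables = {}
    f_next = terminal_value
    for n in range(n_stages, 0, -1):
        f_star, x_star = {}, {}
        for s in states(n):
            expected = {
                x: sum(p * (c + f_next(i)) for p, c, i in outcomes(n, s, x))
                for x in decisions(n, s)
            }
            f_star[s] = min(expected.values())
            x_star[s] = [x for x, v in expected.items() if v == f_star[s]]
        tables[n] = (f_star, x_star)
        f_next = lambda i, table=f_star: table[i]
    return tables


# Hypothetical toy problem: one state, two decisions, two stages.
def toy_outcomes(n, s, x):
    if x == "safe":
        return [(1.0, 5, "s")]              # certain cost of 5
    return [(0.5, 0, "s"), (0.5, 8, "s")]   # risky: cost 0 or 8


tables = probabilistic_backward_induction(
    n_stages=2,
    states=lambda n: ["s"],
    decisions=lambda n, s: ["safe", "risky"],
    outcomes=toy_outcomes,
    terminal_value=lambda i: 0.0,
)
print(tables)
```

In the toy problem the risky decision has the lower expected cost at both stages (4 per stage against 5), so the expected total is 8.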

Example: determining reject allowances

The Hit-and-Miss Manufacturing Company received an order to supply 1 item of a particular type.
The customer has specified stringent quality requirements.
The manufacturer has to produce more than one item to achieve one acceptable item. The number of extra items is the reject allowance.
The probability that an item is acceptable (or defective) is 1/2.
The number of acceptable items in a lot of size $L$ has a binomial distribution: the probability of no acceptable items is $(1/2)^L$.
Setup cost = $300, cost per item = $100. Maximum number of production runs = 3. Cost of no acceptable item after 3 runs = $1,600.

Formulation

Objective: determine the policy regarding the lot size (1 + reject allowance) for the required production run(s) that minimizes the total expected cost.
Stage $n$ = production run $n$ ($n = 1, 2, 3$),
$x_n$ = lot size for stage $n$,
State $s_n$ = number of acceptable items still needed (1 or 0) at the beginning of stage $n$.
At stage 1, state $s_1 = 1$.

Basic structure of the problem: (figure not recovered from the slides)

Formulation

$f_n(s_n, x_n)$ = total expected cost for stages $n, \ldots, 3$ if optimal decisions are made thereafter, with
$f_n^*(s_n) = \min_{x_n = 0, 1, \ldots} f_n(s_n, x_n)$
and $f_n^*(0) = 0$.
The monetary unit is 100 dollars. The contribution to cost from stage $n$ is $[K(x_n) + x_n]$, with
$K(x_n) = 0$ if $x_n = 0$, and $K(x_n) = 3$ if $x_n > 0$.
Recursive relationship:
$f_n^*(1) = \min_{x_n = 0, 1, 2, \ldots} \{ K(x_n) + x_n + (1/2)^{x_n} f_{n+1}^*(1) \}$, for $n = 1, 2, 3$
Note that $f_4^*(1) = 16$.

Solution procedure

n = 3: $f_3(1, x_3) = K(x_3) + x_3 + (1/2)^{x_3} \cdot 16$
s3    x3 = 0    x3 = 1    x3 = 2    x3 = 3    x3 = 4    x3 = 5    f3*(s3)    x3*
0     0                                                           0          0
1     16        12        9         8         8         8.5       8          3 or 4

n = 2: $f_2(1, x_2) = K(x_2) + x_2 + (1/2)^{x_2} f_3^*(1)$
s2    x2 = 0    x2 = 1    x2 = 2    x2 = 3    x2 = 4    f2*(s2)    x2*
0     0                                                 0          0
1     8         8         7         7         7.5       7          2 or 3

n = 1: $f_1(1, x_1) = K(x_1) + x_1 + (1/2)^{x_1} f_2^*(1)$
s1    x1 = 0    x1 = 1    x1 = 2    x1 = 3    x1 = 4    f1*(s1)    x1*
1     7         7.5       6.75      6.875     7.44      6.75       2

Optimal solution?
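The three tables for the reject allowance example follow mechanically from the recursive relationship. The sketch below is illustrative: the constant and function names are made up, and the cap `max_lot=10` on the lot size is an assumption added so that the minimization is over a finite range (the expected cost only grows for larger lots).

```python
SETUP = 3      # setup cost K, in units of 100 dollars
MAX_RUNS = 3   # maximum number of production runs
PENALTY = 16   # f4*(1): cost of having no acceptable item after 3 runs


def setup_cost(x):
    """K(x): setup cost is incurred only if something is produced."""
    return SETUP if x > 0 else 0


def solve_reject_allowance(max_lot=10):
    """f_n*(1) = min over x of K(x) + x + (1/2)^x * f_{n+1}*(1)."""
    f_next = PENALTY
    tables = {}
    for n in range(MAX_RUNS, 0, -1):
        values = [setup_cost(x) + x + 0.5 ** x * f_next
                  for x in range(max_lot + 1)]
        best = min(values)
        x_star = [x for x, v in enumerate(values) if v == best]
        tables[n] = {"f_star": best, "x_star": x_star, "values": values}
        f_next = best
    return tables


tables = solve_reject_allowance()
for n in (3, 2, 1):
    row = tables[n]
    print(f"n={n}: f*={row['f_star']}, x*={row['x_star']}, "
          f"first values={row['values'][:5]}")
```

The output reproduces the table entries: $f_3^*(1) = 8$ with lot size 3 or 4, $f_2^*(1) = 7$ with lot size 2 or 3, and $f_1^*(1) = 6.75$ with lot size 2, i.e. an expected total cost of 675 dollars.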

Probabilistic problem

An enterprising young statistician believes that she has developed a system for winning a popular Las Vegas game. Her colleagues do not believe that her system works, so they have made a large bet with her that if she starts with three chips, she will not have at least five chips after three plays of the game. Each play of the game involves betting any desired number of available chips and then either winning or losing this number of chips. The statistician believes that her system will give her a probability of 2/3 of winning a given play of the game.
Assuming the statistician is correct, use dynamic programming to determine her optimal policy regarding how many chips to bet (if any) at each of the three plays of the game. The decision at each play should take into account the results of earlier plays. The objective is to maximize the probability of winning her bet with her colleagues.
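One way to set this exercise up is with stage $n$ = play $n$, state $s_n$ = chips in hand, and decision $x_n$ = chips to bet, maximizing the probability of ending with at least five chips. That formulation is an assumption of this sketch, as are the names used; exact fractions avoid rounding issues when comparing tied decisions.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)        # probability of winning a single play
TARGET, PLAYS, START = 5, 3, 3


@lru_cache(maxsize=None)
def best(n, s):
    """Return (f_n*(s), optimal bets) when play n starts with s chips.

    f_n*(s) is the maximum probability of holding at least TARGET chips
    after the last play.
    """
    if n > PLAYS:  # game over: the bet is either won or lost
        return Fraction(int(s >= TARGET)), ()
    values = {
        x: P_WIN * best(n + 1, s + x)[0] + (1 - P_WIN) * best(n + 1, s - x)[0]
        for x in range(s + 1)
    }
    top = max(values.values())
    return top, tuple(x for x, v in values.items() if v == top)


probability, first_bets = best(1, START)
print("f1*(3) =", probability, "optimal first bet(s):", first_bets)
```

Under this formulation the sketch gives a winning probability of 20/27 with a first bet of one chip. Calling `best(2, s)` and `best(3, s)` for the reachable states gives the rest of the policy, which depends on the outcomes of the earlier plays.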