
CHAPTER 10
Comparing Alternative System Configurations


10.1 Introduction
10.2 Confidence Intervals for the Difference Between the Expected Responses of Two Systems
    10.2.1 A Paired-t Confidence Interval
    10.2.2 A Modified Two-Sample-t Confidence Interval
    10.2.3 Contrasting the Two Methods
    10.2.4 Comparisons Based on Steady-State Measures of Performance
10.3 Confidence Intervals for Comparing More than Two Systems
    10.3.1 Comparisons with a Standard
    10.3.2 All Pairwise Comparisons
    10.3.3 Multiple Comparisons with the Best
10.4 Ranking and Selection
    10.4.1 Selecting the Best of k Systems
    10.4.2 Selecting a Subset of Size m Containing the Best of k Systems
    10.4.3 Selecting the m Best of k Systems
    10.4.4 Additional Problems and Methods

10.1 Introduction

Many (probably most) simulation projects involve more than one system or
configuration:
Change the number of machines in some workcenters
Alternative job-dispatch policies (FIFO, SPT, etc.)
Alternative reorder points and order quantities in an inventory model

Possible Statistical Goals

With k = 2 alternatives:
Test H0: μ1 = μ2, or maybe H0: μ1 ≥ μ2
Confidence interval for μ1 − μ2

With k > 2 alternatives:
Test H0: μ1 = μ2 = ... = μk (ANOVA)
Simultaneous confidence intervals for various combinations of μ_j1 − μ_j2
Pick the best of the k alternatives
Pick a subset of size m < k that contains the best alternative
Pick the m best (unranked) of the alternatives
Other kinds of selection and ranking goals

Given: Which alternative systems/configurations are of interest
(Deciding this is part of formal experimental design)

Most statistical procedures assume the ability to collect IID observations from each
alternative that are unbiased for the desired performance measures
Terminating: No problem; just make replications
Steady-state:
Replication/deletion
Batch means
Algorithms for regenerative, standardized-time-series methods
Motivation

Don't ever just make one run of each alternative and eyeball the results

Compare:

M/M/1 queue with arrival rate 1 and one fast server with mean service time
0.9 minutes (alternative 1)

M/M/2 queue with arrival rate 1 and two slow servers with mean service
time 1.8 minutes each (alternative 2)

Single queue in each case (queueing diagrams in text)

Performance measure: μi = expected average delay in queue of the first 100 customers for alternative i, with empty-and-idle initial conditions

The truth: μ1 = 4.13 > 3.70 = μ2, so system 2 is better
Study:

One run of each model

Get average delay X_i from alternative i

Pick the system with the smallest X_i

P(pick system 1 (the wrong answer)) ≈ 0.52

Difficulty: Randomness in output
Idea:

Replicate each alternative some number n of times

X_ij = average delay from the jth replication of alternative i

Base decision on X̄_i(n) = (1/n) Σ_{j=1}^{n} X_ij, the average of the n replications for alternative i

Results:

n     P(wrong answer)
1     0.52
5     0.43
10    0.38
20    0.34

Dot plots of 100 experiments for each value of n: (figure in text)

Clearly, need methods to assess the uncertainty and give statistical bounds or guarantees for conclusions and decisions
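To see where numbers like these come from, here is a minimal Monte Carlo sketch in Python; the helper names (mmc_avg_delay, estimate_p_wrong) are ours, not from the text, and the estimates will wobble with the seed but should roughly reproduce the table above.

```python
import random

def mmc_avg_delay(lam, mu, c, n_cust, rng):
    # Average delay in queue of the first n_cust customers of an M/M/c
    # queue (FCFS, empty-and-idle start): each arrival takes the
    # earliest-free of the c servers.
    t, free, total = 0.0, [0.0] * c, 0.0
    for _ in range(n_cust):
        t += rng.expovariate(lam)                  # next arrival time
        s = min(range(c), key=free.__getitem__)    # earliest-free server
        start = max(t, free[s])                    # service start time
        total += start - t                         # delay in queue
        free[s] = start + rng.expovariate(mu)      # server busy until then
    return total / n_cust

def estimate_p_wrong(n, trials=1000, seed=42):
    # P(pick system 1) when the choice is based on averages of n
    # replications; system 2 (two slow servers) is truly better.
    rng, wrong = random.Random(seed), 0
    for _ in range(trials):
        x1 = sum(mmc_avg_delay(1.0, 1 / 0.9, 1, 100, rng) for _ in range(n)) / n
        x2 = sum(mmc_avg_delay(1.0, 1 / 1.8, 2, 100, rng) for _ in range(n)) / n
        wrong += x1 < x2                           # chose system 1: wrong
    return wrong / trials

# for n in (1, 5, 10, 20): print(n, estimate_p_wrong(n))
```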
10.2 Confidence Intervals for the Difference Between the Expected Responses of Two Systems


Two alternative simulated systems (i = 1, 2), μi = expected performance measure from system i

Take sample of ni observations (replications, batches, ...) from system i

X_ij = observation j from system i

Want: confidence interval on ζ = μ1 − μ2

If interval misses 0, conclude there is a statistical difference between the systems

Which way? Can't say (would need a one-sided interval)

Is the difference practically significant? Must use judgment in context.

Confidence interval better than hypothesis test

If a difference exists, the interval measures its magnitude, while a test does not

Two slightly different methods for making the intervals and designing the runs:

Paired t
Two-sample t

10.2.1 A Paired-t Confidence Interval


Assume n1 = n2 (= n, say)

For a fixed j, X_1j and X_2j need not be independent
Very important: variance reduction via common random numbers

Let Z_j = X_1j − X_2j, so E(Z_j) = ζ

Reduce to a one-sample problem using the Z_j's

Get sample mean Z̄(n) and sample variance S_Z²(n) of the Z_j's

Confidence interval:

$$\bar{Z}(n) \pm t_{n-1,\,1-\alpha/2}\,\sqrt{S_Z^2(n)/n}$$
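As a concrete sketch (assuming SciPy is available for the t quantile; paired_t_ci is our name, not from the text):

```python
import math
import statistics
from scipy import stats

def paired_t_ci(z, alpha=0.10):
    # 100(1 - alpha)% paired-t CI for zeta = mu1 - mu2, where
    # z contains the differences Z_j = X_1j - X_2j.
    n = len(z)
    zbar = statistics.mean(z)                  # Z-bar(n)
    s2 = statistics.variance(z)                # S_Z^2(n), n - 1 divisor
    hw = stats.t.ppf(1 - alpha / 2, n - 1) * math.sqrt(s2 / n)
    return zbar - hw, zbar + hw
```

Applied to the five Z_j's of the inventory example on the next slide, this should reproduce the interval [1.65, 8.31] from Z̄(5) = 4.98 and S_Z²(5) = 2.44.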

Example:

Two different inventory systems, output performance measure is average total cost per month, ni = 5 replications of each system, used separate (independent) sampling across the systems (data in text)

Z̄(5) = 4.98, S_Z²(5) = 2.44, 90% confidence interval is [1.65, 8.31]

Confidence interval misses 0, so we'd reject H0: μ1 = μ2 at level α = 0.10

Robustness (coverage):

Above assumes normality of the Z_j's
Since the Z_j's are themselves differences, not an unreasonable assumption
Alternative: nonparametric methods (don't assume any distribution)

Possibility:

Sequential sampling to reduce width of this interval until it misses 0
Get statistical significance, can then judge practical significance
Since observations are being paired, this forces the same number of
observations from each of the two systems, which might not be the most
efficient allocation of computing resources
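A minimal sketch of that sequential idea, reusing paired_t_ci from above; gen_pair is a hypothetical function returning one paired observation (X_1j, X_2j) per call. Note that naive sequential stopping degrades the nominal confidence level somewhat.

```python
def sample_until_decisive(gen_pair, alpha=0.10, n_min=10, n_max=100_000):
    # Add paired observations until the paired-t interval excludes 0
    # (statistical significance) or the sampling budget is exhausted.
    z = []
    while len(z) < n_max:
        x1, x2 = gen_pair()
        z.append(x1 - x2)
        if len(z) >= n_min:
            lo, hi = paired_t_ci(z, alpha)
            if lo > 0 or hi < 0:               # interval misses 0
                break
    return z
```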

10.2.2 A Modified Two-Sample-t Confidence Interval

Can allow n1 ≠ n2 (e.g., sample more from the noisier alternative)

For a fixed j, X_1j and X_2j must be independent
Eliminates common random numbers (big drawback)

Get sample means, sample variances from each system separately (no pairing)

Confidence interval:

$$\bar{X}_1(n_1) - \bar{X}_2(n_2)\ \pm\ t_{\hat{f},\,1-\alpha/2}\,\sqrt{\frac{S_1^2(n_1)}{n_1} + \frac{S_2^2(n_2)}{n_2}}$$

Uses estimated degrees of freedom

$$\hat{f} = \frac{\left[S_1^2(n_1)/n_1 + S_2^2(n_2)/n_2\right]^2}{\dfrac{\left[S_1^2(n_1)/n_1\right]^2}{n_1 - 1} + \dfrac{\left[S_2^2(n_2)/n_2\right]^2}{n_2 - 1}} \quad \text{(due to Welch, 1938)}$$

This is an old problem: comparing two systems with unequal and unknown
variances, called the Behrens-Fisher problem

Inventory example: Since sampling was independent across the two systems, can use two-sample-t as well
Same point estimator; S1²(5) = 4.00, S2²(5) = 3.76, f̂ = 7.99; interpolating in the t table gives t_{7.99, 0.95} = 1.860, for a 90% confidence interval of [2.66, 7.30]
Misses zero, so we conclude again that there is a statistically significant (at the 0.10 level) difference between the means

Procedure can also be used for model validation: system 1 is the real-world
system with observed data, system 2 is our simulation model

Sequential procedures exist to control the confidence-interval half-width; they increase n1 and n2 differently depending on the variances of the two systems, and minimize the final total sample size n1 + n2
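A sketch of the Welch interval (again assuming SciPy; welch_ci is our name). Note that scipy.stats.t.ppf accepts the fractional degrees of freedom directly, so no table interpolation is needed:

```python
import math
import statistics
from scipy import stats

def welch_ci(x1, x2, alpha=0.10):
    # Modified two-sample-t CI for mu1 - mu2 with independent samples
    # of possibly unequal sizes and the estimated d.f. f-hat.
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1          # S_1^2(n1)/n1
    v2 = statistics.variance(x2) / n2          # S_2^2(n2)/n2
    f_hat = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = statistics.mean(x1) - statistics.mean(x2)
    hw = stats.t.ppf(1 - alpha / 2, f_hat) * math.sqrt(v1 + v2)
    return diff - hw, diff + hw
```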


10.2.3 Contrasting the Two Methods


Neither dominates in terms of coverage or smallness

In the inventory example, with the same (independent-across-models) data,
Paired t: [1.65, 8.31]
Two-sample t: [2.66, 7.30]

Choice usually made depending on the situation

              Good                             Bad
Paired t      Can use common random numbers    Lose half the degrees of freedom
              Simple                           Requires n1 = n2
Two-sample t  Keep all the degrees of freedom  Cannot use common random numbers
              Can have n1 ≠ n2                 A bit complicated

With most simulation software, it usually takes explicit action to ensure that
different random numbers are used to simulate the different alternatives

The default is that the same random-number stream(s) are re-used, starting from
the same place
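A sketch of what that explicit action looks like, assuming each model is written as a function taking its own random-number generator (names are ours): seeding both alternatives identically on replication j yields common random numbers, which pairs naturally with the paired-t interval.

```python
import random

def crn_differences(model1, model2, n, base_seed=12345):
    # Replication j of BOTH alternatives gets an identically seeded
    # generator, so X_1j and X_2j are positively correlated and
    # Var(Z_j) = Var(X_1j - X_2j) is (usually) reduced.
    z = []
    for j in range(n):
        x1 = model1(random.Random(base_seed + j))
        x2 = model2(random.Random(base_seed + j))  # same seed as model1
        z.append(x1 - x2)
    return z
```

CRN helps most when the runs are synchronized, i.e., the two models use their random numbers for the same purposes (arrivals, services, and so on).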

10.2.4 Comparisons Based on Steady-State Measures of Performance


Ingredient required for the above procedures, and for most statistical procedures:
Observations that are IID and unbiased for the parameter(s) of interest
Not terribly non-normal (have distributions that are at least reasonably
symmetric)

With terminating simulations, this is easy
Observations are summary output performance measures across IID
replications

With steady-state simulations, though, must attempt to manufacture such IID
unbiased observations; two alternatives:

Replication-deletion approach
Identify warmup periods for the alternatives (could be of different warmup
lengths for different systems)
Then replicate
Example in text

Batch means
If warmup period is long
Make one long run of each system, identify sufficiently long batch lengths
(could be different batch sizes for different systems)
Use batch means as the basic observations
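A minimal sketch of the batch-means step (helper name ours); choosing the warmup and batch length is the hard part and is covered elsewhere in the text:

```python
def batch_means(ys, warmup, n_batches):
    # Drop the warmup observations, split the rest into contiguous
    # batches, and return the batch means, which serve as the
    # approximately IID, approximately unbiased basic observations.
    kept = ys[warmup:]
    b = len(kept) // n_batches                 # batch length
    return [sum(kept[i * b:(i + 1) * b]) / b for i in range(n_batches)]
```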

10.3 Confidence Intervals for Comparing More than Two Systems

Have k ≥ 2 systems, μi = expected performance measure from system i

Take sample of ni observations (replications, batches, ...) from system i

X_ij = observation j from system i

Want: confidence intervals on selected differences μ_j1 − μ_j2

Making c > 1 intervals, so for overall confidence level 1 − α, must make each interval at level 1 − α/c (Bonferroni inequality)

If c is large:
Individual confidence intervals can become quite wide for fixed sample sizes,
or
Need large sample sizes to meet confidence-interval-smallness criteria in
sequential-sampling procedures

In general, could use paired-t or two-sample-t methods to make the individual
intervals, depending on the situation

Could do sequential sampling to sharpen comparisons

Possibility of apparent contradictions
Looks like μ1 = μ2 and μ2 = μ3
But μ1 and μ3 look significantly different from each other
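A sketch of the Bonferroni adjustment, reusing paired_t_ci from Section 10.2.1, so it assumes paired data with equal sample sizes (names are ours):

```python
def bonferroni_intervals(samples, pairs, alpha=0.10):
    # c simultaneous intervals, each at individual level 1 - alpha/c,
    # hold jointly with overall confidence >= 1 - alpha (Bonferroni).
    c = len(pairs)
    out = {}
    for i1, i2 in pairs:
        z = [a - b for a, b in zip(samples[i1], samples[i2])]
        out[(i1, i2)] = paired_t_ci(z, alpha / c)
    return out
```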

10.3.1 Comparisons with a Standard

One of the systems (say system 1) is the standard (maybe the existing configuration)
Compare all other systems to the standard
Want intervals on μ2 − μ1, μ3 − μ1, ..., μk − μ1
Number of confidence intervals is c = k − 1
Depending on the situation, could use either the paired-t or two-sample-t approaches to make the confidence intervals on each pair of means

Example:
Five different inventory systems, output performance measure is average total cost per month, ni = 5 replications of each system, used separate (independent) sampling across the five systems

Overall confidence level 0.90 (α = 0.10), number of intervals is c = k − 1 = 4, so make individual 97.5% confidence intervals on μi − μ1 for i = 2, 3, 4, 5 (* denotes a significant difference; results table in text)

The two methods may lead to different conclusions (system 2), and neither dominates in terms of confidence-interval smallness

As before, consider validity of intervals, possibility of common random numbers,
and using replication-deletion or batch means for steady-state simulations
10.3.2 All Pairwise Comparisons

Want confidence intervals on μ_i1 − μ_i2 for all i1 and i2 between 1 and k, with i1 < i2:

μ2 − μ1, μ3 − μ1, ..., μk − μ1
μ3 − μ2, ..., μk − μ2
...
μk − μ_{k−1}

Number of confidence intervals is c = k(k − 1)/2

Above five-alternative inventory example: (results table in text)

Two approaches may disagree on the significance of a difference
Neither approach dominates in terms of precision
Have an apparent contradiction with the Welch two-sample-t intervals:
μ1 = μ3 and μ2 = μ3, but μ1 ≠ μ2
Such apparent contradictions are less likely with smaller intervals

As before, consider validity of intervals, possibility of common random numbers,
and using replication-deletion or batch means for steady-state simulations
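Enumerating all c = k(k − 1)/2 pairs is mechanical; a short sketch reusing bonferroni_intervals from above:

```python
from itertools import combinations

def all_pairwise_intervals(samples, alpha=0.10):
    # All pairs i1 < i2; each interval at level 1 - alpha/c with
    # c = k(k - 1)/2, by the Bonferroni inequality.
    pairs = list(combinations(range(len(samples)), 2))
    return bonferroni_intervals(samples, pairs, alpha)
```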

10.3.3 Multiple Comparisons with the Best


Compare the mean of each alternative with the mean of the best (smallest or largest,
depending on context) of all the other alternatives, even though we do not know
which of the others is really best

Called multiple comparisons with the best (MCB)

Assuming bigger is better, want k simultaneous confidence intervals on μi − max_{l≠i} μl for i = 1, 2, ..., k (if smaller is better, replace max with min)

Text contains references for procedures, including sequential methods, steady-state
analysis, and incorporating common random numbers

Usually yields confidence intervals that are smaller than those using Bonferroni
inequality

Closely related to selection goals discussed below

10.4 Ranking and Selection


More ambitious goal than comparison via confidence intervals

In general, want to select one (or several) systems as best or rank some or all of
the systems

Still assume that the possible system configurations are given; we're not (yet) freely looking for good configurations

Must be very specific and clear about what the goals are

Looking at means or some other kind of performance measure?

How sure?

How much do we care about making small mistakes?

Statistical guarantees?

Terminating vs. steady-state?

Independent sampling vs. common random numbers?

Same setup, notation as before

10.4.1 Selecting the Best of k Systems

Want to select one of the k alternatives as the best (assume smaller is better)
Realize we can't be sure that the selected system is the one with the smallest μi

Specify correct-selection probability P* (like 0.90 or 0.95)
Also specify indifference zone d*:
If the best mean and the next-best mean differ by more than d*, we really do want to select the best one
We don't care if the selected system is not the one with the smallest mean, provided its mean is no more than d* worse than the best

Procedure (two-stage sampling):
Get n0 observations from each system
Compute first-stage sample means X̄_i^(1)(n0) and sample variances S_i²(n0) from each system i separately
Compute the final sample size N_i for each system i as

$$N_i = \max\left\{ n_0 + 1,\ \left\lceil \frac{h_1^2 S_i^2(n_0)}{(d^*)^2} \right\rceil \right\}$$

(values of h1 are tabled in the text)
Make N_i − n0 more observations on each system, compute the second-stage means X̄_i^(2)(N_i − n0)
Compute weights

$$W_{i1} = \frac{n_0}{N_i}\left[ 1 + \sqrt{\,1 - \frac{N_i}{n_0}\left( 1 - \frac{(N_i - n_0)(d^*)^2}{h_1^2 S_i^2(n_0)} \right)} \right], \qquad W_{i2} = 1 - W_{i1}$$

Compute the weighted sample means

$$\tilde{X}_i(N_i) = W_{i1}\,\bar{X}_i^{(1)}(n_0) + W_{i2}\,\bar{X}_i^{(2)}(N_i - n_0)$$

and pick the system with the smallest X̃_i(N_i)
(If bigger is better, pick the system with the biggest X̃_i(N_i))

Issues:
Assumes normal observations (appears robust to violating this)
Requires independent sampling (no common random numbers)
Picking P* big or d* small can lead to a lot of simulating
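Pulling the steps together, here is a sketch of the two-stage procedure in Python (select_best is our name; systems[i] is assumed to be a function returning one IID observation per call, and h1 must be looked up in the text's Table 10.11 for the chosen P*, k, and n0):

```python
import math
import statistics

def select_best(systems, n0, h1, d_star, rng, smaller_is_better=True):
    # Two-stage indifference-zone selection of the best of k systems.
    xtilde = []
    for sample in systems:
        first = [sample(rng) for _ in range(n0)]             # stage 1
        x1 = statistics.mean(first)
        s2 = statistics.variance(first)                      # S_i^2(n0)
        n_i = max(n0 + 1, math.ceil(h1 ** 2 * s2 / d_star ** 2))
        x2 = statistics.mean([sample(rng) for _ in range(n_i - n0)])  # stage 2
        inner = 1 - (n_i / n0) * (1 - (n_i - n0) * d_star ** 2 / (h1 ** 2 * s2))
        w1 = (n0 / n_i) * (1 + math.sqrt(max(inner, 0.0)))   # guard roundoff
        xtilde.append(w1 * x1 + (1 - w1) * x2)               # weighted mean
    pick = min if smaller_is_better else max
    return pick(range(len(systems)), key=xtilde.__getitem__)
```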
Example:

Earlier k = 5 inventory models, simulated independently across alternatives

Output = expected average total cost per month, so smaller is better

Set P* = 0.90, d* = 1

Took n0 = 20 initial replications of each system

From Table 10.11, h1 = 2.747 (results table in text)
Conclude that configuration 2 is the best (smallest mean)

Note that the procedure calls for larger samples for those systems with larger
first-stage variance estimates

Validity of This Procedure (Sketch)

Weights Wi1 and Wi2 were derived so that

$$T_i = \frac{\tilde{X}_i(N_i) - \mu_i}{d^*/h_1}, \quad i = 1, 2, \ldots, k$$

will be IID t with n0 − 1 d.f.

Suppose that the alternatives are numbered such that alternative i_l has the lth smallest mean, so μ_{i_1} ≤ μ_{i_2} ≤ ... ≤ μ_{i_k}, and we want to select a system with mean μ_{i_1}, assuming that μ_{i_2} − μ_{i_1} ≥ d* (which we do assume)

Then correct selection (CS) will occur if and only if X̃_{i_1}(N_{i_1}) is the smallest of the X̃_i(N_i)'s

Let f and F respectively be the density and CDF of the t distribution with n0 − 1 d.f. Then

$$\begin{aligned}
P(CS) &= P\left(\tilde{X}_{i_1}(N_{i_1}) < \tilde{X}_{i_l}(N_{i_l}),\ l = 2, 3, \ldots, k\right)\\
&= P\left(\frac{\tilde{X}_{i_1}(N_{i_1}) - \mu_{i_1}}{d^*/h_1} < \frac{\tilde{X}_{i_l}(N_{i_l}) - \mu_{i_l}}{d^*/h_1} + \frac{\mu_{i_l} - \mu_{i_1}}{d^*/h_1},\ l = 2, 3, \ldots, k\right)\\
&= P\left(T_{i_1} < T_{i_l} + \frac{\mu_{i_l} - \mu_{i_1}}{d^*/h_1},\ l = 2, 3, \ldots, k\right)\\
&= \int_{-\infty}^{\infty} \prod_{l=2}^{k} F\!\left(t + \frac{\mu_{i_l} - \mu_{i_1}}{d^*/h_1}\right) f(t)\,dt \quad \text{(condition on } T_{i_1} = t\text{)}
\end{aligned}$$

Since μ_{i_2} − μ_{i_1} ≥ d*, we know that μ_{i_l} − μ_{i_1} ≥ d* for l = 2, 3, ..., k. Since F is monotone increasing, we can replace the numerator of the fraction in its argument by d* and get

$$P(CS) \geq \int_{-\infty}^{\infty} \left[F(t + h_1)\right]^{k-1} f(t)\,dt \qquad (*)$$

Note that equality holds in (*) if μ_{i_1} + d* = μ_{i_2} = μ_{i_3} = ... = μ_{i_k}, called the least favorable configuration of the means

The procedure is made valid by setting the right-hand side of (*) to P* and solving numerically for h1, the values of which are tabled as a function of P* and k
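As a numerical check, the right-hand side of (*) can be evaluated directly (a sketch assuming SciPy); with h1 = 2.747, k = 5, and n0 = 20 from the example, the value should come out near P* = 0.90 if that is indeed the tabled h1:

```python
import numpy as np
from scipy import stats, integrate

def pcs_lower_bound(h1, k, n0):
    # Right-hand side of (*): P(CS) at the least favorable configuration,
    # with f and F the t density/CDF with n0 - 1 d.f.
    t_dist = stats.t(n0 - 1)
    value, _ = integrate.quad(
        lambda t: t_dist.cdf(t + h1) ** (k - 1) * t_dist.pdf(t),
        -np.inf, np.inf)
    return value

# pcs_lower_bound(2.747, k=5, n0=20)  -> approximately 0.90
```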

10.4.2 Selecting a Subset of Size m Containing the Best of k Systems

Select a subset (unordered) of specified size m (< k) that contains the best system
Initial screening of obviously inferior alternatives

Use similar indifference-zone idea
If the difference between the best and second-best system is at least d*, we want
a procedure such that with probability at least P*, the selected subset will
contain a system with the best response
If the difference between the best and second-best system is less than d*, then with probability at least P*, the selected subset will contain a system whose mean is no more than d* worse than the best

Uses similar two-stage sampling
Take a first stage of n0 independent replications, compute sample means and sample variances
Compute final required sample sizes for each system by a formula similar to the one used earlier, except use h2 (Table 10.12) rather than h1 (Table 10.11)
Make required additional replications, compute second-stage sample means, weights for the two stages, and final weighted sample means
Select the systems with the m best (smallest or largest) values of the final weighted sample means

Example with the five inventory systems (k = 5, m = 3, P* = 0.90, d* = 1, n0 = 20)
From Table 10.12, h2 = 1.243 (smaller than the earlier h1 = 2.747); results table in text

Selected subset consists of configurations 1, 2, and 3 (unordered)
Note that the sample sizes required are smaller for this more modest goal

10.4.3 Selecting the m Best of k Systems

Select a subset (unordered) of specified size m (< k) with the m best configurations
Provide several good options in case the best is unacceptable for other
reasons
Not saying that the configurations in the selected subset are ranked or ordered in
any way
Correct Selection means the unordered set of the m selected systems has means that are the same as those of the unordered set of the m best systems

Use similar indifference-zone idea
If the difference between the mth-best and (m + 1)st-best systems is at least d*, we want a procedure such that with probability at least P*, the means of the selected subset are equal to the means of the m best configurations
If the difference between the mth-best and (m + 1)st-best systems is less than d*, then with probability at least P*, the means of the selected subset will each be no more than d* worse than the mth-best mean

Uses similar two-stage sampling
Take a first stage of n0 independent replications, compute sample means and sample variances
Compute final required sample sizes for each system by a formula similar to the one used earlier, except use h3 (Table 10.13) rather than h1 (Table 10.11)
Make required additional replications, compute second-stage sample means, weights for the two stages, and final weighted sample means
Select the systems with the m best (smallest or largest) values of the final weighted sample means
Example with the five inventory systems (k = 5, m = 3, P* = 0.90, d* = 1, n0 = 20)

From Table 10.13, h3 = 3.016 (larger than the earlier h1 = 2.747); results table in text

Selected subset consists of configurations 1, 2, and 3 (unordered)

10.4.4 Additional Problems and Methods

(See text for references on the topics mentioned below)

Select a subset of unspecified size
Perhaps set maximum size of subset (restricted subset selection)
Can select fewer if there are obviously inferior ones (efficient screening)
Since size of selected subset is random, indifference zone can be set to 0

Sequential (instead of two-stage) sampling for improved efficiency
Protects against a poor first-stage variance estimate (especially one that is too big)
Reduces required sample size by as much as 75%
Keep recalculating means and variances after each new sample: more computational work, but offset by a decreased number of simulation replications

Criteria other than expectations
Sometimes expectations don't tell the real story:
Inventory system, two alternative policies, measure is profit
Policy A: profit = 1000 with probability 0.001, and 0 with probability 0.999, so E(profit) = 1
Policy B: profit = 0.999 with probability 1, so E(profit) = 0.999
Thus, policy A is better from the standpoint of expectation
But policy B is better than policy A 999 times out of 1000 (B gives profit of 0.999 vs. A's profit of 0)
Select the system most likely to result in good performance
Nonparametric multinomial procedures

Correlation between alternatives: valid use of common random numbers

Correlation within a model: one-run approaches for steady-state ranking/selection
