
Personal Notes on Simulation Methodology

Azim Houshyar, Ph.D.


ANALYSIS OF SIMULATION DATA
Comparison and Evaluation of Alternative System Designs
Introduction
In most manufacturing processes, management is faced with making decisions on
competing system designs or alternative operating policies, and must choose
among the alternatives.
Because making such decisions without a detailed analysis of the suitability of each
alternative is not acceptable, you may want to use simulation to compare alternatives
before implementation. Remember that simulation is a powerful tool for answering
what-if questions.
In this section, we discuss statistical analyses of the output from several different
simulation models that might represent competing system designs or alternative
operating policies. This is a very important subject, since the real utility of simulation lies
in comparing such alternatives before implementation. Appropriate statistical
methods are therefore essential if we are to avoid making serious errors.
As an example, assume that we want to compare an M/M/1 queue, in which customers
arrive at a rate of 1 per minute and are served with a mean service time of 0.9 minute,
with a comparable M/M/2 queue, in which customers arrive at a rate of 1 per minute and
are served by one of two servers with a mean service time of 1.8 minutes. Even though
queueing theory tells us that customers wait less, on average, in the second system, a
single simulation run of each system can easily mislead us.
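To see why, consider the following minimal sketch (the function name, parameters, and seeds are illustrative; a FCFS discipline is assumed). Both systems have utilization 0.9, so individual runs are highly variable:

```python
import heapq, random, statistics

def simulate_mmc(c, mean_service, n_customers, arrival_rate=1.0, seed=1):
    """Simulate a FCFS M/M/c queue; return the average delay in queue."""
    rng = random.Random(seed)
    free_at = [0.0] * c                 # times at which each server frees up
    heapq.heapify(free_at)
    t = 0.0
    delays = []
    for _ in range(n_customers):
        t += rng.expovariate(arrival_rate)      # next arrival time
        earliest = heapq.heappop(free_at)       # first server to become free
        start = max(t, earliest)                # service starts when both ready
        delays.append(start - t)
        heapq.heappush(free_at, start + rng.expovariate(1.0 / mean_service))
    return statistics.mean(delays)

# One run of each system; repeating with other seeds can reverse the ranking.
print(simulate_mmc(c=1, mean_service=0.9, n_customers=1000, seed=7))
print(simulate_mmc(c=2, mean_service=1.8, n_customers=1000, seed=8))
```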
There are some very useful statistical tools for this purpose; one example is a
confidence interval on the difference between two means.
As an example, assume that your company is considering the purchase of new
equipment that is believed to improve throughput. You are asked to run two simulations
to compare the performance of the existing system with that of the new system to see if
there is a significant difference. How would you respond to the management's inquiry?


Comparison of Two System Designs


Forming a confidence interval for the difference in the two expectations is a better
comparison than a hypothesis test of whether the observed difference is significantly
different from zero: the interval conveys the magnitude and precision of the difference,
not just an accept-or-reject decision. In most manufacturing processes the mean and
standard deviation of a process are unknown, and we have to estimate them from
sample observations.
Assume that you want to compare the difference in the performance of two processes.
Since simulation output data are stochastic, comparing the two systems on the basis of
only one run of each disregards that randomness and is a misuse of the simulation
methodology.
For i = 1, 2, let {Xi1, Xi2, ..., Xi,ni} be a sample of ni IID observations from system i, and
let μi = E(Xij) be the expected response of interest. We want to construct a confidence
interval for ζ = μ1 − μ2.
A Paired-t Confidence Interval: If the two configurations are simulated with the
same number of replications, say n, we can pair the observations X1j and X2j from the
j-th replication and calculate Dj = X1j − X2j, the difference between the two outputs on
the j-th replication.
The Dj's are IID random variables with E(Dj) = ζ. Therefore, the sample mean of the
Dj's and the estimated variance of that sample mean are:
D̄(n) = Σ Dj / n
S²(n) = Σ [Dj − D̄(n)]² / (n − 1)
V̂ar[D̄(n)] = S²(n) / n = Σ [Dj − D̄(n)]² / [n·(n − 1)]
where the summations run from j = 1 to n.
If the random variables Dj are normally distributed, then the 100(1−α)% confidence
interval for the difference in means μ1 − μ2 is:
D̄(n) − t(n−1, 1−α/2) · √V̂ar[D̄(n)] < μ1 − μ2 < D̄(n) + t(n−1, 1−α/2) · √V̂ar[D̄(n)]
where t(n−1, 1−α/2) is the upper 1−α/2 critical point of the t-distribution with (n−1)
degrees of freedom. Even if the Dj's are not normally distributed, increasing the number
of replications makes the interval approximately valid by the central limit theorem.
The importance of this approach is that there is no need to assume that X1j and X2j are
independent, and certainly no need to assume equality of their variances. In fact,
positively correlating the two random variables helps reduce the variance and tightens
the confidence interval. To positively correlate them, we can use the same streams of
random numbers when simulating both configurations.
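As a concrete illustration, here is a minimal sketch of the paired-t interval (the function name and the sample data are illustrative, not from the original notes):

```python
import numpy as np
from scipy import stats

def paired_t_ci(x1, x2, alpha=0.05):
    """Paired-t confidence interval for E(X1) - E(X2).

    x1[j] and x2[j] must come from the same replication j (e.g. driven by
    common random numbers); pairing is what allows them to be dependent.
    """
    d = np.asarray(x1, float) - np.asarray(x2, float)   # D_j = X_1j - X_2j
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)                     # sqrt of S^2(n)/n
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return d.mean() - t_crit * se, d.mean() + t_crit * se

# Example with made-up replication averages:
x1 = [8.4, 7.9, 9.1, 8.7, 8.2]   # e.g. average delays, system 1
x2 = [7.6, 7.5, 8.8, 8.1, 7.7]   # same replications, system 2
print(paired_t_ci(x1, x2))        # if the CI excludes 0, the systems differ
```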


Allowing positive correlation between X1j and X2j can be of great importance, since it
leads to a reduction of Var(Dj) and thus a narrower confidence interval. We will see that
the method of common random numbers can induce this positive correlation between
observations on the two systems. Note that the Xij's are random variables defined over
an entire replication; for example, Xij might be the average of the 100 delays observed
on the j-th replication, not the delay of an individual customer.
A Modified Two-Sample-t Confidence Interval: This method does not pair up the
observations from the two systems, but it does require that the X1j's be independent of
the X2j's. However, n1 and n2 can now be different. Assume we make n1 replications of
the first configuration and n2 replications of the second configuration. Then we have
two independent random variables:
a) X1 with unknown mean μ1 and unknown variance σ1²
b) X2 with unknown mean μ2 and unknown variance σ2²

If it is reasonable to assume that both variances are approximately equal, we can use
classical statistics to find a 100(1−α)% confidence interval on the difference in means,
μ1 − μ2.
Denote the mean and variance of the n1 replications of the first system by X̄1 and S1²,
and the mean and variance of the n2 replications of the second system by X̄2 and S2².
Then, if it is reasonable to assume that σ1² is approximately equal to σ2², the pooled
estimator SD² is a good estimate of the common variance σ²:
SD² = [(n1 − 1)·S1² + (n2 − 1)·S2²] / (n1 + n2 − 2)
Moreover, the statistic T is t-distributed with (n1 + n2 − 2) degrees of freedom (dof):
T = [(X̄1 − X̄2) − (μ1 − μ2)] / {SD·[1/n1 + 1/n2]^0.5}
A 100(1−α)% two-sided C.I. on the difference in means, μ1 − μ2, is:
(X̄1 − X̄2) − t(n1+n2−2, 1−α/2)·SD·[1/n1 + 1/n2]^0.5 < μ1 − μ2 < (X̄1 − X̄2) + t(n1+n2−2, 1−α/2)·SD·[1/n1 + 1/n2]^0.5
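A minimal sketch of this interval (the function name is illustrative; SD² is the pooled variance defined above):

```python
import numpy as np
from scipy import stats

def pooled_t_ci(x1, x2, alpha=0.05):
    """Pooled-variance two-sample-t CI for mu1 - mu2.

    Requires the two samples to be independent (so no common random
    numbers across systems) and assumes sigma1^2 ~ sigma2^2.
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    sd2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    half = stats.t.ppf(1 - alpha / 2, n1 + n2 - 2) * np.sqrt(sd2 * (1/n1 + 1/n2))
    diff = x1.mean() - x2.mean()
    return diff - half, diff + half
```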
The choice of either the paired-t or the modified approach will usually be made
according to the situation. Note that the basic ingredient for most comparison
techniques is a sample of IID observations with expectation equal to the performance
measure on which the comparison is to be made. This is easily done for terminating
simulations, because such observations come naturally by simply replicating the
simulation some number of times.
But what if we want to compare two (or more) systems on the basis of a steady-state
measure of performance? Here we can no longer simply replicate the models, since
initialization effects may bias the output.


There are ways to solve this problem. For instance, if the warm-up period is long, we
might use batch means on each alternative system to obtain approximately IID,
unbiased observations; to keep the correlation between batches negligible, we must
take care to define the batches appropriately.

Comparison of Several System Designs


The Bonferroni inequality implies that if we want to make some number, say c, of
confidence-interval statements, then we should construct each separate interval at
level 1 − α/c, so that the overall confidence level associated with all c intervals
covering their targets will be at least 1 − α.
Although there are many goals for comparing k systems, we will focus on the following
two procedures:
1. Comparisons with a standard
2. All pairwise comparisons.
Comparisons with a standard: Suppose that one of the model variants is a standard,
perhaps representing the existing system or policy. If we call the standard system 1
and the other variants systems 2, 3, ..., k, the goal is to construct k−1 confidence
intervals for the k−1 differences μ2 − μ1, μ3 − μ1, ..., μk − μ1, with overall confidence level 1 − α.
We are making c = k−1 individual intervals, so each should be constructed at level
1 − α/(k−1). Note that the Bonferroni inequality is quite general: it doesn't matter how the
individual confidence intervals are formed, they need not result from the same number
of replications, nor must they be independent.
All pairwise comparisons: If we want to compare each system with every other system
to detect and quantify any significant pairwise differences, one approach is to form
confidence intervals for the differences μi2 − μi1 for all i1 and i2 between 1 and k,
with i1 < i2. We will have c = k(k−1)/2 individual intervals, so each must be made at level
1 − α/c in order to have a confidence level of at least 1 − α for all the intervals together.
A sketch covering both goals follows.
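Here is a minimal sketch of Bonferroni-corrected intervals for both goals (the function name is illustrative; Welch-type intervals are one choice among many, since the Bonferroni inequality only requires that each interval be at level 1 − α/c, however it is formed):

```python
import numpy as np
from scipy import stats

def bonferroni_cis(samples, alpha=0.05, standard=None):
    """Welch-type CIs for differences in means, Bonferroni-corrected.

    samples: list of arrays of replication results, one per system.
    If `standard` is an index, compare every other system with it (c = k-1);
    otherwise form all pairwise comparisons (c = k(k-1)/2).
    """
    k = len(samples)
    if standard is not None:
        pairs = [(standard, i) for i in range(k) if i != standard]
    else:
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    c = len(pairs)
    out = {}
    for i, j in pairs:
        xi, xj = np.asarray(samples[i], float), np.asarray(samples[j], float)
        vi, vj = xi.var(ddof=1) / len(xi), xj.var(ddof=1) / len(xj)
        se = np.sqrt(vi + vj)
        df = se**4 / (vi**2 / (len(xi) - 1) + vj**2 / (len(xj) - 1))  # Welch dof
        t = stats.t.ppf(1 - alpha / (2 * c), df)   # level 1 - alpha/c per interval
        diff = xj.mean() - xi.mean()
        out[(i, j)] = (diff - t * se, diff + t * se)
    return out
```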


Ranking and selection of one of the k systems as the best: Let Xij be the
random variable of interest from the j-th replication of the i-th system, and let μi = E(Xij).
Assume that the Xij's are all independent of each other, i.e., the replications for a given
alternative are independent, and the runs for different alternatives are also made
independently. For example, Xij could be the average total cost per month for the j-th
replication of policy i.
Let μi_l be the l-th smallest of the μi's, so that μi1 ≤ μi2 ≤ ... ≤ μik. Our goal is to select the
system with the smallest expected response, μi1. Let CS denote this event of correct
selection. Note that if μi1 and μi2 are actually very close together, we might not care if we
erroneously choose system i2, so we want a method that avoids making a large number
of replications to resolve this unimportant difference.
The exact problem formulation is that we want P(CS) ≥ P* whenever μi2 − μi1 ≥ d*, where
P* and d* are specified by the analyst. Consider two-stage sampling from each of the
k systems. In the first stage, we make a fixed number of replications of each system,
then use the resulting variance estimates to determine how many more replications of
each system are necessary in the second stage of sampling in order to reach a
decision.
It must be assumed that the Xij's are normally distributed, but we do not have to assume
that the values of σi² = Var(Xij) are known, nor that the σi² are the same for
different i's.
In the first-stage sampling, we make n0 ≥ 2 replications of each of the k systems and
define the first-stage sample means and variances as follows:
X̄i(1)(n0) = Σ Xij / n0
Si²(n0) = Σ [Xij − X̄i(1)(n0)]² / (n0 − 1) where both summations run from j = 1 to n0.
We then make Ni − n0 more replications of system i (i = 1, 2, ..., k) and obtain the
second-stage sample means:
X̄i(2)(Ni − n0) = Σ Xij / (Ni − n0) where the summation runs from j = n0 + 1 to Ni.
Finally, we define the weighted sample mean as:
X̄i(Ni) = wi1·X̄i(1)(n0) + wi2·X̄i(2)(Ni − n0)
The last step is to select the system with the smallest X̄i(Ni). The formulas for the total
sample size Ni needed for system i and the weights wi1 and wi2 can be found in Law and
Kelton, page 597.
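The mechanics of the procedure can be sketched as follows. This is a sketch under our reading of the Dudewicz and Dalal two-stage procedure as presented in Law and Kelton; the constant h1 depends on k, P*, and n0 and must be looked up in their tables, so the value used below is purely illustrative:

```python
import math
import numpy as np

def total_sample_size(s2, n0, h1, dstar):
    """N_i = max(n0 + 1, ceil(h1^2 * S_i^2(n0) / d*^2))."""
    return max(n0 + 1, math.ceil(h1**2 * s2 / dstar**2))

def weighted_sample_mean(stage1, stage2, n0, Ni, s2, h1, dstar):
    """Combine the two stage means with weights w_i1 and w_i2 = 1 - w_i1."""
    w1 = (n0 / Ni) * (1 + math.sqrt(
        1 - (Ni / n0) * (1 - (Ni - n0) * dstar**2 / (h1**2 * s2))))
    return w1 * np.mean(stage1) + (1 - w1) * np.mean(stage2)

# First stage for one system (illustrative numbers; h1 = 2.9 is a placeholder,
# not a value taken from the tables).
n0, h1, dstar = 20, 2.9, 0.5
stage1 = np.random.default_rng(1).normal(10.0, 1.0, n0)
s2 = stage1.var(ddof=1)
Ni = total_sample_size(s2, n0, h1, dstar)
print(Ni - n0)   # additional replications needed in the second stage
```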


Statistical Methods for Estimating the Effect of Design Alternatives


In a previous section we assumed that the various configurations were externally
designated as the only feasible choices, and we set out to determine the best
configuration. In the design of experiments, however, there is little prior guidance as to
which model specifications may lead to optimal system performance. For instance, we
may be interested in determining which of possibly many parameters and assumptions
have the greatest effect on the system's effectiveness.
Experimental design helps us decide, before the runs are made, which configurations
to simulate so that the desired knowledge is extracted with minimal effort on data
collection and simulation runs. To learn more about the systematic design of
experiments, we need to be familiar with some additional statistical procedures.
Many standard experimental designs have little or no importance for simulation
experiments because they were developed for physical experiments in which the
experimenter lacks complete control over the experiment, or is unable to collect data for
certain conditions of factors.
In design of experiments, we want to design the strategic data collection procedure so
as to perform only the informative experiments. For this purpose, we need to define
some of the experimental-design terminology such as: factors, responses, and levels.
Factor: Input parameters and structural assumptions of a model that will be changed in
the course of the simulation study. Factors can be either quantitative or qualitative. A
quantitative factor is one whose levels can be measured on a numerical scale,
while a qualitative factor represents structural assumptions that are not naturally
quantified. WIP, number of machines, mean inter-arrival time, reorder point, and
processing time are examples of quantitative factors, whereas queuing policy,
ordering policy, and maintenance policy are examples of qualitative factors.

Response: Output performance measures or quantities to be measured. Throughput,
makespan, machine utilization, and delay in queue are some examples of responses.

Levels: The different settings or values used for the factors. A combination of factors,
all at specified levels, is called a treatment.

Because simulation is conducted in a completely controlled environment, in which the
simulation analyst controls the sources of random variation, it is possible to replicate a
simulation model under identical conditions.
There are several experimental designs used for estimating the effects of the factors,
including:
1. the single-factor completely randomized experimental design
2. the factorial design with two factors.


In a model with only one factor, the design of experiments is simple: run n replications
at each level of the factor, and use the concepts of the previous section to determine
whether there is a significant difference between levels of the factor under
consideration. Even if the number of factors is two, the techniques for comparing
means can be used accordingly. But if the number of factors is three or more, we need
to know more about factorial design. In a factorial design, we want to study the actual
impact of these factors on the system's responses.
When there are numerous factors of interest in an experiment, a factorial design should
be used. These are designs in which factors are varied together; that is, in each
replication of the experiment, all combinations of the levels of the factors are examined.
A factorial design is a strategic plan for gaining information about the impact of the
factors on the response. The design specifies how many runs of the simulation are to
be performed, and what level or value of each of the factors is to be used for each run.
So, the factorial design provides more than a way to compare pre-specified alternatives;
it also provides a strategy for determining which alternatives should be compared.
Single-Factor Completely Randomized Experimental Design (SFCR): When there is
only one factor having some number of levels, say k, the experiment is called a
single-factor experiment. The effect of level j of the factor is denoted τj.
If the runs of the model at each level, and across the different levels of the factor, are
based on independent streams of random numbers, the design is called a completely
randomized design. Note that this condition implies that correlated sampling (the use of
common random numbers) is not used across factor levels. Note also that the numbers
of replications at the various levels of the factor do not have to be equal.
The statistical model for the analysis of the SFCR experimental design with k treatment
levels is:
Yrj = μ + τj + εrj,   r = 1, 2, ..., Rj and j = 1, 2, ..., k
where:
Yrj is observation r of the response variable at level j of the factor;
μ is the overall mean effect;
τj is the effect due to level j of the factor;
εrj is the random error in observation r at level j, assumed to be N(0, σ²);
Rj is the number of observations made at level j;
k is the number of levels of the factor under study.

We will look at the model in which μ and the τj are assumed to be fixed and to satisfy
Σ τj = 0 (the summation running from j = 1 to k). This is called the fixed-effects model.
If the factor levels τj cannot be fixed but are instead chosen at random from some
population that is assumed to be normally distributed, then the resulting model is
a random-effects model.


The initial analysis of a single-factor, fixed-effects, completely randomized experiment
consists of a statistical test of the hypothesis:
H0: τj = 0 for all j = 1, 2, ..., k
That is, the factor has no statistically significant effect on the response variable. The
applicable statistical test is a one-way analysis of variance (ANOVA). The test consists
of computing an F-statistic and comparing its value to an appropriate critical value.
The layout used for the ANOVA is as follows:

Replication r    Level 1    Level 2    ...    Level j    ...    Level k
1                Y11        Y12        ...    Y1j        ...    Y1k
2                Y21        Y22        ...    Y2j        ...    Y2k
...              ...        ...               ...               ...
Rj               YR1,1      YR2,2      ...    YRj,j      ...    YRk,k
Totals           T01        T02        ...    T0j        ...    T0k    (grand total T00)
Means            Ȳ01        Ȳ02        ...    Ȳ0j        ...    Ȳ0k    (grand mean Ȳ00)

The variation of the response variable Yrj about the overall sample mean Ȳ00 can be
written as:
Yrj − Ȳ00 = (Ȳ0j − Ȳ00) + (Yrj − Ȳ0j)
where (Ȳ0j − Ȳ00) is the deviation of a treatment mean from the grand mean and
(Yrj − Ȳ0j) is the deviation of the response from the treatment mean at its level.
Squaring and summing over all observations gives:
Σj Σr (Yrj − Ȳ00)² = Σj Rj·(Ȳ0j − Ȳ00)² + Σj Σr (Yrj − Ȳ0j)²
SSTotal = SSTreat + SSError
where j runs from 1 to k and r runs from 1 to Rj. Note:
1. If the assumption of a common variance is correct, then MSE = SSE/(R − k), where
R = Σ Rj is the total number of observations, is an unbiased estimator of the variance σ²
of the response variable Y; that is, E(MSE) = σ².
2. If H0 is true, then MSTreat = SSTreat/(k − 1) is also an unbiased estimator of σ².
3. In any case, MSE and MSTreat are statistically independent when the data are
normally distributed.


When H0 is true, SSTreat/σ² and SSError/σ² have chi-square distributions with (k−1) and
(R−k) degrees of freedom, respectively. Therefore, the test statistic for testing the H0
hypothesis is:
F = MSTreat / MSE = [SSTreat/(k−1)] / [SSError/(R−k)]
When H0 is true, this test statistic has an F-distribution with (k−1) and (R−k) degrees of
freedom. The ANOVA test of the hypothesis H0 is:
Reject H0 if F > F(1−α, k−1, R−k)
Fail to reject H0 if F ≤ F(1−α, k−1, R−k)
Note that if the test indicates a statistically significant effect due to the factor, the
analyst may be interested in a 100(1−α)% confidence interval for (μ + τj), given by:
Ȳ0j ± t(1−α/2, R−k) · [MSE/Rj]^0.5
Even though there are numerous software packages available for ANOVA, the
following table can be used for manual calculation and is common to almost all
software applications.

Source of Variation    Sum of squares    d.o.f.    Mean squares              F
Treatment              SSTreat           k−1       MSTreat = SSTreat/(k−1)   MSTreat/MSE
Error                  SSE               R−k       MSE = SSE/(R−k)
Total                  SSTotal           R−1
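The whole table can be computed in a few lines; here is a minimal sketch (the function name and data layout are illustrative; scipy.stats.f_oneway returns the same F-statistic):

```python
import numpy as np
from scipy import stats

def one_way_anova(levels, alpha=0.05):
    """One-way fixed-effects ANOVA; `levels` is a list of samples, one per level."""
    k = len(levels)
    R = sum(len(y) for y in levels)                 # total observations
    grand = np.mean(np.concatenate(levels))         # grand mean
    ss_treat = sum(len(y) * (np.mean(y) - grand) ** 2 for y in levels)
    ss_error = sum(((np.asarray(y) - np.mean(y)) ** 2).sum() for y in levels)
    f = (ss_treat / (k - 1)) / (ss_error / (R - k))
    f_crit = stats.f.ppf(1 - alpha, k - 1, R - k)
    return f, f_crit, f > f_crit                    # last value: reject H0?
```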

Factorial Designs with Two Factors: The statistical model for the analysis of a
factorial design with two factors, including their interaction, is:
Yijr = μ + Ai + Bj + ABij + εijr
where:
Yijr is the observation of the response variable Y for replication r at level i of the
first factor (A) and level j of the second factor (B); Ai and Bj are the main effects of the
two factors; ABij is their interaction effect; and εijr is a random error term.
To conduct an ANOVA test, assume that there are a levels of factor A, b levels of
factor B, and k replications at each treatment (for a total of R = abk observations).
Then:
SSTotal = Σi Σj Σr (Yijr − Ȳ000)²
SSA = b·k·Σi (Ȳi00 − Ȳ000)²
SSB = a·k·Σj (Ȳ0j0 − Ȳ000)²
SSAB = k·Σi Σj (Ȳij0 − Ȳi00 − Ȳ0j0 + Ȳ000)²
SSE = SSTotal − SSA − SSB − SSAB


In the first summation, i goes from 1 to a, j goes from 1 to b, and r goes from 1 to
k. In the second summation, i goes from 1 to a. In the third summation, j goes from
1 to b, and in the last summation, i goes from 1 to a, and j goes from 1 to b. The
layout used for manual calculation of the ANOVA is as follows:
Source of Variation    Sum of squares    d.o.f.        Mean squares                F
Factor A               SSA               a−1           MSA = SSA/(a−1)             MSA/MSE
Factor B               SSB               b−1           MSB = SSB/(b−1)             MSB/MSE
Interaction AB         SSAB              (a−1)(b−1)    MSAB = SSAB/[(a−1)(b−1)]    MSAB/MSE
Error                  SSE               ab(k−1)       MSE = SSE/[ab(k−1)]
Total                  SSTotal           abk−1

This layout allows three hypotheses to be tested:
H01: Ai = 0 for all i
H02: Bj = 0 for all j
H03: ABij = 0 for all i and j
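The sums of squares above translate directly into code; here is a minimal sketch (the function name and data layout are illustrative):

```python
import numpy as np
from scipy import stats

def two_way_anova(y, alpha=0.05):
    """Two-factor fixed-effects ANOVA with interaction.

    y[i, j, r]: replication r at level i of factor A and level j of factor B.
    """
    a, b, k = y.shape
    g = y.mean()                                              # grand mean
    yi, yj, yij = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=2)
    ss_a = b * k * ((yi - g) ** 2).sum()
    ss_b = a * k * ((yj - g) ** 2).sum()
    ss_ab = k * ((yij - yi[:, None] - yj[None, :] + g) ** 2).sum()
    ss_e = ((y - g) ** 2).sum() - ss_a - ss_b - ss_ab
    ms_e = ss_e / (a * b * (k - 1))
    results = {}
    for name, ss, df in [("A", ss_a, a - 1), ("B", ss_b, b - 1),
                         ("AB", ss_ab, (a - 1) * (b - 1))]:
        f = (ss / df) / ms_e
        f_crit = stats.f.ppf(1 - alpha, df, a * b * (k - 1))
        results[name] = (f, f > f_crit)                       # reject H0?
    return results
```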

Metamodeling: Suppose that there is a simulation output response variable, Y, that is
related to k independent variables, say X1, X2, ..., Xk. In most cases the functional
relationship is unknown, and the analyst must select an appropriate function containing
unknown parameters and then estimate those parameters from a set of data (Y, X).
Regression analysis is one such method for estimating the parameters.
As an example, suppose that it is desired to estimate the relationship between a single
independent variable X and a dependent variable Y, and suppose that the true
relationship between Y and X is linear:
E(Y | x) = β0 + β1·x
It is further assumed that each observation of Y can be described by the model:
Y = β0 + β1·x + ε
where ε is a random error with mean zero and constant variance σ². Suppose that there
are n pairs of observations (y1, x1), ..., (yn, xn). These observations may be used to
estimate β0 and β1. In the method of least squares, β0 and β1 are estimated such that
the sum of the squares of the deviations between the observations and the regression
line is minimized.
The least-squares estimators are:
β̂1 = Σ yi·(xi − x̄) / Σ (xi − x̄)²
β̂0 = ȳ − β̂1·x̄
where x̄ = Σ xi / n and ȳ = Σ yi / n.
Testing for significance of regression is one of many hypothesis tests that can be
developed. Suppose the null hypothesis is H0: β1 = 0; then the appropriate test statistic
for significance of regression is:
T0 = β̂1 / [MSE/Sxx]^0.5
where MSE = Σ ei² / (n−2) is the mean squared error of the residuals ei, and
Sxx = Σ xi² − (Σ xi)²/n. H0 is rejected at level α if |T0| > t(n−2, 1−α/2).
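A minimal sketch of the fit and the significance test (the function name is illustrative):

```python
import numpy as np
from scipy import stats

def fit_and_test(x, y, alpha=0.05):
    """Least-squares fit of Y = b0 + b1*x and t-test of H0: beta1 = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = ((x - x.mean()) ** 2).sum()
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    mse = (resid ** 2).sum() / (n - 2)
    t0 = b1 / np.sqrt(mse / sxx)
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
    return b0, b1, t0, abs(t0) > t_crit   # True => regression is significant
```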
