
Personal Notes on Simulation Methodology Azim Houshyar, Ph.D.


Comparison and Evaluation of Alternative System Designs


In most manufacturing settings, management must choose among competing system designs or alternative operating policies.

Realizing that making decisions without a detailed analysis of the suitability of each alternative is not acceptable, you may want to use simulation to compare alternatives before implementation. Remember that simulation is a powerful tool for answering what-if questions.

In this section, we discuss statistical analyses of the output from several different simulation models that might represent competing system designs or alternative operating policies. This is a very important subject, since the real utility of simulation lies in comparing such alternatives before implementation. Therefore, appropriate statistical methods are essential, if we are to avoid making serious errors.

As an example, assume that we want to compare an M/M/1 queue, in which customers arrive at a rate of 1 per minute and are served with a mean service time of 0.9 minute, with a comparable M/M/2 queue, in which customers arrive at a rate of 1 per minute and are served by one of two servers, each with a mean service time of 1.8 minutes. Even though queueing theory tells us that customers wait less in queue in the second system, a comparison based on a single simulation run of each system will often mislead us.
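The queueing-theory claim in this example can be checked numerically. The following is a minimal sketch that uses the standard Erlang C result for the steady-state expected delay in queue of an M/M/c system; the function name `mmc_mean_delay` and its parameters are illustrative and do not come from these notes.

```python
from math import factorial

def mmc_mean_delay(arrival_rate, mean_service, servers):
    """Steady-state expected delay in queue for an M/M/c queue (Erlang C)."""
    offered_load = arrival_rate * mean_service      # a = lambda / mu
    rho = offered_load / servers                    # utilization per server
    if rho >= 1.0:
        raise ValueError("queue is unstable (utilization >= 1)")
    # Probability that the system is empty.
    head = sum(offered_load ** n / factorial(n) for n in range(servers))
    tail = offered_load ** servers / (factorial(servers) * (1.0 - rho))
    p0 = 1.0 / (head + tail)
    # Expected number waiting in queue, then Little's law gives the delay.
    lq = p0 * offered_load ** servers * rho / (factorial(servers) * (1.0 - rho) ** 2)
    return lq / arrival_rate

delay_mm1 = mmc_mean_delay(1.0, 0.9, 1)   # about 8.10 minutes
delay_mm2 = mmc_mean_delay(1.0, 1.8, 2)   # about 7.67 minutes
```

Both systems run at 90% utilization, and the steady-state delays in queue differ by less than half a minute. A difference this small is easily masked by the run-to-run variability of a short simulation, which is why a single run of each system is not a reliable basis for comparison.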

There are some very useful statistical tools that can be used for this purpose. An example of such tools is a confidence interval on the difference in two means.

As an example, assume that your company is considering the purchase of new equipment that is believed to improve throughput. You are asked to run two simulations to compare the performance of the existing system with that of the new system to see if there is a significant difference. How would you respond to the management's inquiry?

Comparison of Two System Designs

Forming a confidence interval for the difference in the two expectations is a better comparison than a hypothesis test to see whether the observed difference is significantly different from zero. It gives more information than just accepting or rejecting the hypothesis. In most manufacturing processes the mean and standard deviation of a process are unknown, and we have to estimate them using sample observations.

Assume that you want to compare the performance of two processes. Because simulation output data are stochastic, comparing the two systems on the basis of only one run of each disregards sound simulation methodology. For $i = 1, 2$, let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ be a sample of $n_i$ IID observations from system $i$, and let $\mu_i = E(X_{ij})$ be the expected response of interest. We want to construct a confidence interval for $\zeta = \mu_1 - \mu_2$.

A Paired-t Confidence Interval: When the two configurations are simulated with the same number of replications, say $n$, we can pair the observations $X_{1j}$ and $X_{2j}$ from the $j$-th replication and calculate $D_j = X_{1j} - X_{2j}$, the difference between the two outputs for the $j$-th replication.

The $D_j$'s are IID random variables with $E(D_j) = \mu_1 - \mu_2$. The sample mean of the differences, the sample variance of the $D_j$'s, and the estimated variance of the sample mean are:

$$\bar{D}(n) = \frac{1}{n} \sum_{j=1}^{n} D_j$$

$$S^2(n) = \frac{\sum_{j=1}^{n} [D_j - \bar{D}(n)]^2}{n-1}$$

$$S^2[\bar{D}(n)] = \frac{S^2(n)}{n} = \frac{\sum_{j=1}^{n} [D_j - \bar{D}(n)]^2}{n(n-1)}$$

If the random variables $D_j$ are Normally distributed, then the $100(1-\alpha)\%$ confidence interval for the difference in means $\mu_1 - \mu_2$ is:

$$\bar{D}(n) - t_{n-1,1-\alpha/2} \, S[\bar{D}(n)] < \mu_1 - \mu_2 < \bar{D}(n) + t_{n-1,1-\alpha/2} \, S[\bar{D}(n)]$$

where $t_{n-1,1-\alpha/2}$ is the upper $(1-\alpha/2)$ critical point for a t-distribution with $(n-1)$ degrees of freedom. Even if the random variables $D_j$ are not Normally distributed, the central limit theorem implies that the coverage of this interval approaches $1-\alpha$ as the number of replications increases.
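The paired-t calculation above can be sketched in a few lines. This is an illustrative example only: the function name `paired_t_interval` and the data are made up, and the critical value is assumed to be looked up from a t-table and passed in as `t_crit`.

```python
from math import sqrt

def paired_t_interval(x1, x2, t_crit):
    """Paired-t confidence interval for mu_1 - mu_2.

    x1[j] and x2[j] are the outputs of replication j for the two systems;
    t_crit is the t critical value with n-1 degrees of freedom (from a table).
    """
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need the same number (at least 2) of replications")
    n = len(x1)
    diffs = [a - b for a, b in zip(x1, x2)]
    d_bar = sum(diffs) / n
    s2 = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)   # sample variance of D_j
    half_width = t_crit * sqrt(s2 / n)                    # t * S[D-bar(n)]
    return d_bar - half_width, d_bar + half_width

# Hypothetical replication averages for the two systems.
x1 = [10.0, 12.0, 9.0, 11.0, 13.0]
x2 = [8.0, 11.0, 7.0, 10.0, 9.0]
low, high = paired_t_interval(x1, x2, t_crit=2.776)   # t with 4 dof, 0.975 point
```

With these made-up data the interval is roughly (0.48, 3.52). Because it does not contain zero, the difference between the two systems would be judged significant at the 5% level.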

The advantage of this approach is that it does not require $X_{1j}$ and $X_{2j}$ to be independent, and it does not require their variances to be equal. In fact, positively correlating the two random variables reduces the variance of $D_j$ and tightens the confidence interval. To induce this positive correlation, we can use the same streams of random numbers for simulating both configurations.

Allowing positive correlation between $X_{1j}$ and $X_{2j}$ can be of great importance. Since $Var(D_j) = Var(X_{1j}) + Var(X_{2j}) - 2\,Cov(X_{1j}, X_{2j})$, a positive covariance reduces $Var(D_j)$ and thus gives a narrower confidence interval. We will see that the method of common random numbers can induce this positive correlation between observations on the two systems. Note that the $X_{ij}$'s are random variables defined over an entire replication. For example, $X_{ij}$ might be the average of the 100 delays on the $j$-th replication; it is not the delay of an individual customer.
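The effect of common random numbers can be illustrated with the M/M/1 versus M/M/2 example. The sketch below is an assumption-laden illustration, not the method of any particular simulation package: the function names, the seed scheme, and the choice of 200 replications of 100 customers are all made up for this example. Each replication returns the average delay in queue, and the variance of the difference is estimated once with independent seeds and once with the same seeds for both systems.

```python
import random
import statistics

def average_delay(servers, mean_service, arrival_seed, service_seed, customers=100):
    """Average delay in queue of the first `customers` customers of a FIFO M/M/c queue."""
    arrivals = random.Random(arrival_seed)    # dedicated stream for interarrival times
    services = random.Random(service_seed)    # dedicated stream for service times
    free_at = [0.0] * servers                 # time at which each server next becomes idle
    clock = 0.0
    total_delay = 0.0
    for _ in range(customers):
        clock += arrivals.expovariate(1.0)                  # arrival rate of 1 per minute
        k = min(range(servers), key=lambda s: free_at[s])   # earliest available server
        start = max(clock, free_at[k])
        total_delay += start - clock
        free_at[k] = start + services.expovariate(1.0 / mean_service)
    return total_delay / customers

def variance_of_differences(replications, common):
    """Sample variance of D_j = X_1j - X_2j over the replications."""
    diffs = []
    for j in range(replications):
        x1 = average_delay(1, 0.9, 2 * j, 2 * j + 1)
        offset = 0 if common else 100_000     # different seeds give independent sampling
        x2 = average_delay(2, 1.8, offset + 2 * j, offset + 2 * j + 1)
        diffs.append(x1 - x2)
    return statistics.variance(diffs)

var_independent = variance_of_differences(200, common=False)
var_crn = variance_of_differences(200, common=True)
```

Using separate streams for arrivals and services keeps the two systems synchronized: the same customer arrives at the same time in both systems and has a proportionally scaled service time. Under these assumptions `var_crn` should come out much smaller than `var_independent`.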

A Modified Two-Sample-t Confidence Interval: This method does not pair up the observations from the two systems, but it does require that the $X_{1j}$'s be independent of the $X_{2j}$'s. However, $n_1$ and $n_2$ can now be different. Assume we make $n_1$ replications for the first configuration and $n_2$ replications for the second configuration. Then we have two independent random variables:

a) $X_1$ with unknown mean $\mu_1$ and unknown variance $\sigma_1^2$

b) $X_2$ with unknown mean $\mu_2$ and unknown variance $\sigma_2^2$



If it is reasonable to assume that both variances are approximately equal, we can use classical statistics to find a $100(1-\alpha)\%$ confidence interval on the difference in means, $\mu_1 - \mu_2$.

Denote the mean and variance of the $n_1$ replications of the first system by $\bar{X}_1$ and $S_1^2$, and the mean and variance of the $n_2$ replications of the second system by $\bar{X}_2$ and $S_2^2$. Then, if it is reasonable to assume that $\sigma_1^2$ is approximately equal to $\sigma_2^2$, the pooled estimate $S_D^2$ is a good estimate of the common variance $\sigma^2$:

$$S_D^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

Moreover, the statistic $T$ is t-distributed with $(n_1 + n_2 - 2)$ degrees of freedom (dof):

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_D \sqrt{1/n_1 + 1/n_2}}$$

A $100(1-\alpha)\%$ two-sided C.I. on the difference in means, $\mu_1 - \mu_2$, is:

$$(\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,1-\alpha/2} \, S_D \sqrt{1/n_1 + 1/n_2} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,1-\alpha/2} \, S_D \sqrt{1/n_1 + 1/n_2}$$
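The pooled-variance interval can be sketched as follows. The function name `pooled_t_interval` and the data are hypothetical, and the t critical value is assumed to be taken from a table rather than computed.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_interval(x1, x2, t_crit):
    """Two-sample-t interval for mu_1 - mu_2, assuming equal variances.

    x1 and x2 are independent samples (sizes may differ); t_crit is the
    t critical value with n1 + n2 - 2 degrees of freedom (from a table).
    """
    n1, n2 = len(x1), len(x2)
    # Pooled estimate of the common variance.
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    diff = mean(x1) - mean(x2)
    return diff - half_width, diff + half_width, pooled_var

# Hypothetical replication averages; note the unequal sample sizes.
x1 = [10.0, 12.0, 9.0, 11.0, 13.0]
x2 = [8.0, 11.0, 7.0, 10.0]
low, high, pooled_var = pooled_t_interval(x1, x2, t_crit=2.365)   # t with 7 dof, 0.975 point
```

For these made-up data the interval contains zero, so the observed difference of 2.0 would not be judged significant at the 5% level.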

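The choice between the paired-t and the two-sample-t approach depends on the situation: the paired-t approach requires the same number of replications for both systems but allows their outputs to be correlated, whereas the two-sample-t approach allows different numbers of replications but requires the two samples to be independent. Note that the basic ingredient for most comparison techniques is a sample of IID observations with expectation equal to the performance measure on which the comparison is to be made. This is easily done for terminating simulations, because such observations come naturally by simply replicating the simulation some number of times.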

But what if we want to compare two (or more) systems on the basis of a steady-state measure of performance? Here we can no longer simply replicate the models, since initialization effects may bias the output.

There are ways to address this problem. For instance, if the warm-up period is long, we might use batch means on each alternative system to obtain approximately IID, unbiased observations. To keep the correlation between batches negligible, we must take care to define the batches appropriately, in particular by making each batch long enough.
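The batch-means idea can be sketched as follows. This is a minimal illustration with hypothetical names: `warmup` is the number of initial observations discarded, and the remaining observations of one long run are split into `num_batches` equal batches whose means are then treated as approximately IID observations.

```python
def batch_means(observations, warmup, num_batches):
    """Discard the warm-up observations, then average the rest in equal-size batches."""
    kept = observations[warmup:]
    batch_size = len(kept) // num_batches
    if batch_size == 0:
        raise ValueError("not enough observations for the requested number of batches")
    # Leftover observations at the end (if any) are ignored.
    return [
        sum(kept[b * batch_size:(b + 1) * batch_size]) / batch_size
        for b in range(num_batches)
    ]

# Toy data: 20 observations, the first 4 discarded, 4 batches of 4.
means = batch_means(list(range(20)), warmup=4, num_batches=4)
# means == [5.5, 9.5, 13.5, 17.5]
```

The resulting batch means from each system can then be fed into the confidence-interval procedures described earlier, provided the batches are long enough for the correlation between them to be negligible.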

Comparison of Several System Designs

The Bonferroni inequality implies that if we want to make some number, say $c$, of confidence-interval statements, then we should make each separate interval at level $1 - \alpha/c$, so that the overall confidence level associated with all intervals covering their targets will be at least $1 - \alpha$.

Although there are many goals for comparing k systems, we will focus on the following two procedures:

1. Comparisons with a standard

2. All pairwise comparisons.

Comparisons with a standard: Suppose that one of the model variants is a standard, perhaps representing the existing system or policy. If we call the standard system 1 and the other variants systems 2, 3, …, k, the goal is to construct $k-1$ confidence intervals for the $k-1$ differences $\mu_2 - \mu_1, \mu_3 - \mu_1, \ldots, \mu_k - \mu_1$, with overall confidence level $1 - \alpha$.

We are making $c = k-1$ individual intervals, so each should be constructed at level $1 - \alpha/(k-1)$. Note that the Bonferroni inequality is quite general: it does not matter how the individual confidence intervals are formed, they need not result from the same number of replications, and they need not be independent.

All pairwise comparisons: If we want to compare each system with every other system to detect and quantify any significant pairwise differences, then one approach is to form confidence intervals for the differences $\mu_{i_2} - \mu_{i_1}$ for all $i_1$ and $i_2$ between 1 and $k$, with $i_1 < i_2$. We will have $c = k(k-1)/2$ individual intervals, so each must be made at level $1 - \alpha/c$ in order to have a confidence level of at least $1 - \alpha$ for all the intervals together.
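The Bonferroni bookkeeping for both procedures is simple arithmetic, sketched below. The function name `bonferroni_levels` is illustrative only.

```python
def bonferroni_levels(k, alpha, pairwise):
    """Number of intervals c and the level each interval must be built at.

    pairwise=False: comparisons with a standard, c = k - 1
    pairwise=True:  all pairwise comparisons,   c = k(k-1)/2
    """
    c = k * (k - 1) // 2 if pairwise else k - 1
    return c, 1.0 - alpha / c

# Four systems, overall confidence level of at least 90%.
c_std, level_std = bonferroni_levels(4, 0.10, pairwise=False)   # 3 intervals at about 96.7%
c_all, level_all = bonferroni_levels(4, 0.10, pairwise=True)    # 6 intervals at about 98.3%
```

The individual levels rise quickly with the number of systems, so all pairwise comparisons among many systems lead to wide individual intervals unless the number of replications is increased.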

Ranking and selection of one of the k systems as being the best one: Let $X_{ij}$ be the random variable of interest from the $j$-th replication of the $i$-th system, and let $\mu_i = E(X_{ij})$. Assume that the $X_{ij}$'s are all independent of each other; i.e., the replications for a given alternative are independent, and the runs for different alternatives are also made independently. For example, $X_{ij}$ could be the average total cost per month for the $j$-th replication of policy $i$.

Let $\mu_{i_l}$ be the $l$-th smallest of the $\mu_i$'s, so that $\mu_{i_1} \le \mu_{i_2} \le \cdots \le \mu_{i_k}$. Our goal is to select a system with the smallest expected response, $\mu_{i_1}$. Let "CS" denote this event of correct selection. Note that if $\mu_{i_1}$ and $\mu_{i_2}$ are actually very close together, we might not care if we erroneously choose system $i_2$, so we want a method that avoids making a large number of replications to resolve this unimportant difference.

The exact problem formulation is that we want $P(CS) \ge P^*$ provided that $\mu_{i_2} - \mu_{i_1} \ge d^*$, where $P^*$ and $d^*$ are specified by the analyst. Consider a two-stage sampling from each of the $k$ systems. In the first stage, we make a fixed number of replications of each system, then use the resulting variance estimates to determine how many more replications from each system are necessary in the second stage of sampling in order to reach a decision.

It must be assumed that the $X_{ij}$'s are Normally distributed, but we don't have to assume that the values of $\sigma_i^2 = Var(X_{ij})$ are known, nor do we have to assume that the $\sigma_i^2$ are the same for different $i$'s.

In the first-stage sampling, we make $n_0 \ge 2$ replications of each of the $k$ systems, and define the first-stage sample means and variances as follows:

$$\bar{X}_i^{(1)}(n_0) = \frac{1}{n_0} \sum_{j=1}^{n_0} X_{ij}$$

$$S_i^2(n_0) = \frac{\sum_{j=1}^{n_0} [X_{ij} - \bar{X}_i^{(1)}(n_0)]^2}{n_0 - 1}$$

We then make $N_i - n_0$ more replications of system $i$ ($i = 1, 2, \ldots, k$) and obtain the second-stage sample means:

$$\bar{X}_i^{(2)}(N_i - n_0) = \frac{1}{N_i - n_0} \sum_{j=n_0+1}^{N_i} X_{ij}$$

Finally, we define the weighted sample mean as:

$$\tilde{X}_i(N_i) = w_{i1} \, \bar{X}_i^{(1)}(n_0) + w_{i2} \, \bar{X}_i^{(2)}(N_i - n_0)$$

The last step is to select the system with the smallest $\tilde{X}_i(N_i)$. The values of the total sample size $N_i$ needed for system $i$, and the weights $w_{i1}$ and $w_{i2}$, can be found in Law and Kelton, page 597.
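The final selection step can be sketched as below. This is only an illustration of the bookkeeping: the total sample sizes and the weights are not derived here and must be taken from the reference, so the first-stage weight `w1` is simply an input. The sketch also assumes that the two weights sum to one, so that the second-stage weight is `1 - w1`. All names and data are hypothetical.

```python
from statistics import mean

def weighted_mean(first_stage, second_stage, w1):
    """Weighted sample mean of the two stages (assumes w2 = 1 - w1)."""
    return w1 * mean(first_stage) + (1.0 - w1) * mean(second_stage)

def select_smallest(systems):
    """systems maps a name to (first-stage data, second-stage data, w1)."""
    scores = {name: weighted_mean(*data) for name, data in systems.items()}
    best = min(scores, key=scores.get)   # smallest weighted sample mean
    return best, scores

# Made-up monthly costs; the weights here are placeholders, not computed values.
systems = {
    "policy_1": ([10.0, 12.0], [11.0, 13.0, 12.0], 0.5),
    "policy_2": ([9.0, 11.0], [10.0, 9.0, 11.0], 0.4),
}
best, scores = select_smallest(systems)   # best == "policy_2"
```

In a real application the first-stage variances would be computed before any second-stage runs are made, since they determine how many additional replications each system needs.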

Statistical Methods for Estimating the Effect of Design Alternatives

In the previous section we assumed that the configurations to be compared were given in advance as the only feasible choices, and we sought to determine the best one. In experimental design, by contrast, we have little prior guidance about which model specifications lead to the best system performance. For instance, we may be interested in determining which of possibly many parameters and assumptions have the greatest effect on the system's effectiveness.

Experimental design helps us decide (before the runs are made) which configurations to simulate so that the desired knowledge is extracted with minimal effort in data collection and simulation runs. To learn more about the systematic design of experiments, we need to be familiar with some additional statistical procedures.

Many standard experimental designs have little or no importance for simulation experiments because they were developed for physical experiments in which the experimenter lacks complete control over the experiment, or is unable to collect data for certain conditions of factors.

In design of experiments, we want to design the strategic data collection procedure so as to perform only the informative experiments. For this purpose, we need to define some of the experimental-design terminology such as: factors, responses, and levels.

Factor: Input parameters and structural assumptions of a model that will be changed in the course of simulation. Factors can be either quantitative or qualitative. A quantitative factor is one whose levels can be measured on a numerical scale, while a qualitative factor represents structural assumptions that are not naturally quantified. WIP, number of machines, mean inter-arrival time, reorder point, and processing time are examples of quantitative factors, whereas queuing policy, ordering policy, and maintenance policy are examples of qualitative factors.

Response: The output performance measures of the model. Machine utilization and delay in queue are some examples of responses.

Level: Different settings or values used for the factors. A combination of factors, each at a specified level, is called a treatment.

Because simulation is conducted in a completely controlled environment, and in particular, the simulation analyst controls the sources of random variation, it is possible to replicate a simulation model under identical conditions.

There are several experimental designs used for estimating the effects of the factors, including:

single-factor completely randomized experimental design

factorial design with two factors.

In a model with only one factor, the design of experiments is simple: run n replications at each of the different levels of the factor, and use the concepts of the previous section to determine whether there is a significant difference between the levels of the factor under consideration. With two factors, the same techniques for the comparison of means can still be applied. With three or more factors, however, we need to know more about factorial designs. In a factorial design, we want to study the impact of these factors on the system's responses.

When there are numerous factors of interest in an experiment, a factorial design should be used. These are designs in which factors are varied together; that is, in each replication of the experiment, all combinations of the levels of the factors are examined.

A factorial design is a strategic plan for gaining information about the impact of the factors on the response. The design specifies how many runs of the simulation are to be performed, and what level or value of each of the factors is to be used for each run.

So, the factorial design provides more than a way to compare pre-specified alternatives; it also provides a strategy for determining which alternatives should be compared.
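Enumerating the treatments of a full factorial design is a matter of taking every combination of factor levels. The sketch below uses made-up factor names and levels purely for illustration.

```python
from itertools import product

def full_factorial(factors):
    """Return every treatment (one level per factor) as a dictionary."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]

# Hypothetical factors: two quantitative and one qualitative.
factors = {
    "num_machines": [1, 2],
    "queue_policy": ["FIFO", "SPT"],
    "mean_interarrival": [1.0, 1.5, 2.0],
}
treatments = full_factorial(factors)   # 2 * 2 * 3 = 12 treatments
```

Each dictionary in `treatments` describes one configuration to simulate. The number of treatments is the product of the numbers of levels, which grows quickly as factors are added.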

Single-Factor Completely Randomized Experimental Design (SFCR): When there is only one factor having some number of levels, say $k$, the experiment is called a single-factor experiment. The effect of level $j$ of the factor is called $\tau_j$.

If the runs of the model at each level, and across the different levels of the factor, are based on independent streams of random numbers, the design is called a completely randomized design. Note that this condition implies that correlated sampling (use of common random numbers) is not used across levels of the factor. Note also that the numbers of replications at each level of the factor do not have to be the same.

The statistical model for the analysis of the SFCR experimental design with $k$ treatment levels is:

$$Y_{rj} = \mu + \tau_j + \varepsilon_{rj}, \qquad r = 1, 2, \ldots, R_j, \qquad j = 1, 2, \ldots, k$$

$Y_{rj}$ is observation $r$ of the response variable at level $j$ of the factor. $\mu$ is the overall mean effect. $\tau_j$ is the effect due to level $j$ of the factor. $\varepsilon_{rj}$ is a random error in observation $r$ at level $j$, assumed to be $N(0, \sigma^2)$. $R_j$ is the number of observations made at level $j$. $k$ is the number of levels of the factor under study.

We will look at the model for which $\mu$ and $\tau_j$ are assumed to be fixed and to satisfy $\sum_{j=1}^{k} \tau_j = 0$. This model is called the fixed effects model.
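Under the fixed effects model, natural estimates of the overall mean and of the level effects are the grand sample mean and the deviations of the level means from it. The sketch below uses hypothetical data and an illustrative function name.

```python
def estimate_effects(levels):
    """levels[j] is the list of observations made at level j of the factor."""
    total_n = sum(len(obs) for obs in levels)
    grand_mean = sum(sum(obs) for obs in levels) / total_n      # estimate of mu
    # Estimated effect of level j: level mean minus grand mean.
    effects = [sum(obs) / len(obs) - grand_mean for obs in levels]
    return grand_mean, effects

# Made-up responses at three levels, three replications each.
grand_mean, effects = estimate_effects([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
# effects is approximately [-1.67, -0.67, 2.33]
```

With the same number of replications at every level, the estimated effects sum to zero, mirroring the constraint placed on the true effects. With unequal numbers of replications it is the replication-weighted sum of these estimates that is zero.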

If the levels of the factor are not fixed but are instead chosen at random from some population, so that the effects $\tau_j$ are random variables assumed to be Normally distributed, then the resulting model is a random effects model.

The initial analysis of a single-factor fixed-effects completely randomized experiment consists of a statistical test of the hypothesis:

$$H_0: \tau_j = 0 \qquad (j = 1, 2, \ldots, k)$$

That is, the factor has no statistically significant effect on the response variable. The applicable statistical test is a one-way analysis of variance (ANOVA). The test consists of computing an F-statistic and comparing its value to an appropriate critical value.

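The layout used for ANOVA analysis is as follows (columns are the levels $j$ of the single factor):

| Replication | Level 1 | Level 2 | … | Level j | … | Level k |
|---|---|---|---|---|---|---|
| 1 | $Y_{11}$ | $Y_{12}$ | … | $Y_{1j}$ | … | $Y_{1k}$ |
| 2 | $Y_{21}$ | $Y_{22}$ | … | $Y_{2j}$ | … | $Y_{2k}$ |
| … | … | … | … | … | … | … |
| Last | $Y_{R_1,1}$ | $Y_{R_2,2}$ | … | $Y_{R_j,j}$ | … | $Y_{R_k,k}$ |
| Totals | $T_{01}$ | $T_{02}$ | … | $T_{0j}$ | … | $T_{0k}$ |
| Means | $\bar{Y}_{01}$ | $\bar{Y}_{02}$ | … | $\bar{Y}_{0j}$ | … | $\bar{Y}_{0k}$ |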



The variation of the response variable, $Y_{rj}$, about the overall sample mean, $\bar{Y}_{00}$, can be written as:

$$Y_{rj} - \bar{Y}_{00} = (\bar{Y}_{0j} - \bar{Y}_{00}) + (Y_{rj} - \bar{Y}_{0j})$$

$(\bar{Y}_{0j} - \bar{Y}_{00})$ is due to variation of a treatment mean from the grand mean, and $(Y_{rj} - \bar{Y}_{0j})$ represents the deviation of the response from the treatment mean at its level. Squaring both sides and summing over all observations (the cross-product term sums to zero) gives:

$$\sum_{j=1}^{k} \sum_{r=1}^{R_j} (Y_{rj} - \bar{Y}_{00})^2 = \sum_{j=1}^{k} R_j (\bar{Y}_{0j} - \bar{Y}_{00})^2 + \sum_{j=1}^{k} \sum_{r=1}^{R_j} (Y_{rj} - \bar{Y}_{0j})^2$$

$$SS_{Total} = SS_{Treat} + SS_{Error}$$

Note:

1. If the assumption of a common variance is correct, then $MS_E = SS_{Error}/(R-k)$, where $R = \sum_{j=1}^{k} R_j$ is the total number of observations, is an unbiased estimate of the variance $\sigma^2$ of the response variable Y; that is, $E(MS_E) = \sigma^2$.

2. If $H_0$ is true, then $MS_{Treat} = SS_{Treat}/(k-1)$ is also an unbiased estimate of the variance $\sigma^2$.

3. In any case, $MS_E$ and $MS_{Treat}$ are statistically independent when the data are Normally distributed.

When $H_0$ is true, $SS_{Treat}/\sigma^2$ and $SS_{Error}/\sigma^2$ have $\chi^2$ distributions with $(k-1)$ and $(R-k)$ degrees of freedom, respectively. Therefore, the test statistic for testing the hypothesis $H_0$ is:

$$F = \frac{MS_{Treat}}{MS_E} = \frac{SS_{Treat}/(k-1)}{SS_{Error}/(R-k)}$$

When $H_0$ is true, this test statistic has an F-distribution with $(k-1)$ and $(R-k)$ degrees of freedom. The ANOVA test of the hypothesis $H_0$ is:

Reject $H_0$ if: $F > F_{1-\alpha,(k-1),(R-k)}$

Fail to reject $H_0$ if: $F \le F_{1-\alpha,(k-1),(R-k)}$
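The one-way ANOVA computation can be sketched directly from the sums of squares. The function name and the data are made up, and the F critical value is assumed to come from a table.

```python
def one_way_anova(levels):
    """levels[j] is the list of observations at level j; returns (SS_Treat, SS_Error, F)."""
    k = len(levels)
    total_n = sum(len(obs) for obs in levels)                 # R, total observations
    grand_mean = sum(sum(obs) for obs in levels) / total_n
    level_means = [sum(obs) / len(obs) for obs in levels]
    # Between-level variation, weighted by the number of observations per level.
    ss_treat = sum(len(obs) * (m - grand_mean) ** 2 for obs, m in zip(levels, level_means))
    # Within-level variation about each level mean.
    ss_error = sum((y - m) ** 2 for obs, m in zip(levels, level_means) for y in obs)
    f_stat = (ss_treat / (k - 1)) / (ss_error / (total_n - k))
    return ss_treat, ss_error, f_stat

# Made-up responses at k = 3 levels, 3 replications each (R = 9).
ss_treat, ss_error, f_stat = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
reject_h0 = f_stat > 5.14   # tabulated 0.95 point of F with 2 and 6 dof
```

For these data the sums of squares are 26 and 6, giving F = 13, which exceeds the tabulated critical value, so the hypothesis of no factor effect would be rejected at the 5% level.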

Note that if the test indicates a statistically significant effect due to the factor, the analyst may be interested in estimating a $100(1-\alpha)\%$ confidence interval for $(\mu + \tau_j)$ using:

$$\bar{Y}_{0j} \pm t_{R-k,1-\alpha/2} \sqrt{MS_E / R_j}$$
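The interval for the mean response at one level can be sketched as below. The names are illustrative, the inputs are made-up values, and the t critical value with R-k degrees of freedom is assumed to be looked up from a table.

```python
from math import sqrt

def level_mean_interval(level_mean, ms_error, r_j, t_crit):
    """Confidence interval for the mean response at one level of the factor.

    level_mean: sample mean at level j
    ms_error:   mean square error from the ANOVA
    r_j:        number of observations at level j
    t_crit:     t critical value with R-k degrees of freedom (from a table)
    """
    half_width = t_crit * sqrt(ms_error / r_j)
    return level_mean - half_width, level_mean + half_width

# Hypothetical values: level mean 6.0, MS_E = 1.0, 3 observations, 6 error dof.
low, high = level_mean_interval(6.0, 1.0, 3, t_crit=2.447)   # roughly (4.59, 7.41)
```

Note that the interval uses the pooled error mean square from all levels rather than the variance at level j alone, which is what the common-variance assumption permits.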

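Even though there are numerous software packages available for the ANOVA analysis, the following table can be used for manual calculation and is common to almost all software applications.

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Treatment | $SS_{Treat}$ | $k-1$ | $MS_{Treat}$ | $MS_{Treat}/MS_E$ |
| Error | $SS_{Error}$ | $R-k$ | $MS_E$ | |
| Total | $SS_{Total}$ | $R-1$ | | |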










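Factorial Designs with Two Factors: The statistical model for the analysis of the factorial designs with two factors is:

$$Y_{ijr} = \mu + A_i + B_j + AB_{ij} + \varepsilon_{ijr}$$

$Y_{ijr}$ is the observation of the response variable Y for replication $r$ of level $i$ of the first factor (A) and level $j$ of the second factor (B). $A_i$ and $B_j$ are the main effects of the two factors, $AB_{ij}$ is their interaction effect, and $\varepsilon_{ijr}$ is a random error.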

To conduct an ANOVA test, assume that there are $a$ levels of factor A, $b$ levels of factor B, and $k$ replications at each treatment level (for a total of $R = abk$ replications). Then:

$$SS_{Total} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{r=1}^{k} (Y_{ijr} - \bar{Y}_{000})^2$$

$$SS_A = b\,k \sum_{i=1}^{a} (\bar{Y}_{i00} - \bar{Y}_{000})^2$$

$$SS_B = a\,k \sum_{j=1}^{b} (\bar{Y}_{0j0} - \bar{Y}_{000})^2$$

$$SS_{AB} = k \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{Y}_{ij0} - \bar{Y}_{i00} - \bar{Y}_{0j0} + \bar{Y}_{000})^2$$

$$SS_E = SS_{Total} - SS_A - SS_B - SS_{AB}$$

Here $\bar{Y}_{ij0}$ is the mean of the $k$ replications at treatment $(i, j)$, $\bar{Y}_{i00}$ and $\bar{Y}_{0j0}$ are the means at level $i$ of factor A and level $j$ of factor B, and $\bar{Y}_{000}$ is the grand mean. The layout used for manual calculation of the ANOVA is as follows:

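| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Factor A | $SS_A$ | $a-1$ | $MS_A = SS_A/(a-1)$ | $MS_A/MS_E$ |
| Factor B | $SS_B$ | $b-1$ | $MS_B = SS_B/(b-1)$ | $MS_B/MS_E$ |
| Interaction AB | $SS_{AB}$ | $(a-1)(b-1)$ | $MS_{AB} = SS_{AB}/[(a-1)(b-1)]$ | $MS_{AB}/MS_E$ |
| Error | $SS_E$ | $ab(k-1)$ | $MS_E = SS_E/[ab(k-1)]$ | |
| Total | $SS_{Total}$ | $abk-1$ | | |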





















This layout allows three hypotheses to be tested:

$H_{01}: A_i = 0$ for all $i$

$H_{02}: B_j = 0$ for all $j$

$H_{03}: AB_{ij} = 0$ for all $i$ and $j$
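The two-factor sums of squares and the three F ratios can be sketched as follows. The function name and data are hypothetical, and the design is assumed to be balanced with the same number of replications (at least two) in every cell.

```python
def two_factor_anova(data):
    """data[i][j] is the list of k replications at level i of A and level j of B."""
    a, b, k = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / k for j in range(b)] for i in range(a)]   # cell means
    row = [sum(cell[i]) / b for i in range(a)]                           # factor A means
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]      # factor B means
    grand = sum(row) / a                                                 # grand mean
    ss_total = sum((y - grand) ** 2 for i in range(a) for j in range(b) for y in data[i][j])
    ss_a = b * k * sum((m - grand) ** 2 for m in row)
    ss_b = a * k * sum((m - grand) ** 2 for m in col)
    ss_ab = k * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = ss_total - ss_a - ss_b - ss_ab
    ms_e = ss_e / (a * b * (k - 1))
    return {
        "SS_E": ss_e,
        "F_A": (ss_a / (a - 1)) / ms_e,
        "F_B": (ss_b / (b - 1)) / ms_e,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_e,
    }

# Made-up data: a = 2, b = 2, k = 2.
result = two_factor_anova([[[1.0, 3.0], [4.0, 6.0]], [[5.0, 7.0], [6.0, 8.0]]])
# result: SS_E = 8, F_A = 9, F_B = 4, F_AB = 1
```

Each F ratio is then compared with the tabulated F critical value having the corresponding numerator degrees of freedom and ab(k-1) denominator degrees of freedom.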

Metamodeling: Suppose that there is a simulation output response variable, $Y$, that is related to $k$ independent variables, say $X_1, X_2, \ldots, X_k$. In most cases the functional relationship is unknown, and the analyst must select an appropriate function containing unknown parameters, and then estimate those parameters from a set of data (Y, X). Regression analysis is one such method for estimating the parameters.

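As an example, suppose that it is desired to estimate the relationship between a single independent variable X and a dependent variable Y, and suppose that the true relationship between Y and X is linear:

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

It is further assumed that each observation of Y can be described by the model:

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

where $\varepsilon$ is a random error with mean zero and constant variance $\sigma^2$. Suppose that there are $n$ pairs of observations $(y_1, x_1), \ldots, (y_n, x_n)$. These observations may be used to estimate $\beta_0$ and $\beta_1$. In the method of least squares, $\beta_0$ and $\beta_1$ are estimated such that the sum of the squares of the deviations between the observations and the regression line is minimized. With $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$, the estimates are:

$$\hat{\beta}_1 = \frac{\sum y_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

If the model is instead written in the centered form $Y = \beta_0' + \beta_1 (x - \bar{x}) + \varepsilon$, the intercept estimate is simply $\hat{\beta}_0' = \bar{y}$.

Testing for significance of regression is one of many hypothesis tests that can be developed. Suppose the null hypothesis is $H_0: \beta_1 = 0$; then the appropriate test statistic for significance of regression is given by:

$$T_0 = \frac{\hat{\beta}_1}{\sqrt{MS_E / S_{XX}}}$$

where $MS_E = \sum e_i^2 / (n-2)$, $e_i = y_i - \hat{y}_i$ is the $i$-th residual, and $S_{XX} = \sum x_i^2 - (\sum x_i)^2 / n$. When the errors are Normally distributed and $H_0$ is true, $T_0$ has a t-distribution with $(n-2)$ degrees of freedom.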
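The least-squares fit and the significance test can be sketched as follows. The function name and data are hypothetical; the intercept is computed for the uncentered model, that is, as the mean response minus the slope times the mean of x.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares estimates for a straight line and the t statistic for zero slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b1 = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / s_xx    # slope estimate
    b0 = y_bar - b1 * x_bar                                     # intercept estimate
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ms_e = sum(e * e for e in residuals) / (n - 2)              # residual mean square
    t0 = b1 / sqrt(ms_e / s_xx)
    return b0, b1, t0

# Made-up metamodel data: five design points and their simulated responses.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1, t0 = fit_line(xs, ys)   # b0 is about 0.09, b1 is about 1.97
```

The statistic `t0` is compared in absolute value with the t critical value having n-2 degrees of freedom (3.182 for 3 degrees of freedom at the two-sided 5% level). Here `t0` is far larger, so the regression would be judged significant.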