Course 9: Output Analysis for Simulation

END 306
Simulation
Output Analysis
Output Analysis for a Single Model

Outputs from the same simulation model vary as a result of the randomness in the input variables to the system.
The performance of the real system is denoted by $\theta$.
The output from the simulation is used as an estimator of $\theta$; this estimator is denoted by $\hat\theta$.
The precision of the estimator is measured by its standard error or by the width of its confidence interval.
The purpose of statistical output analysis is to determine the precision of the estimator, to determine the number of replications needed to obtain a desired confidence interval, or both.

Inputs are generated randomly.
Outputs are derived from the inputs, so the outputs are stochastic.
Outputs may depend on the initial conditions of the inputs and of the system.
System behavior is often a nonlinear function of the inputs.
Outputs are correlated:
Consider successive time intervals $T_i$.
If the average queue length was long in $T_i$, the starting conditions in $T_{i+1}$ will bias that interval toward a longer than average queue length.
Types of Simulations
with Respect to Output Analysis
Terminating vs. non-terminating (steady-state) simulations
Terminating or transient simulation (e.g., simulation of a bank from 9 am to 5 pm):
Runs until an end time $T_E$.
Initial conditions at $t = 0$ are specified.
A stop time $T_E$ or a stop event $E$ must be specified.
Steady-state simulation (e.g., an automated production line):
Runs continuously or over a long period.
Properties are not influenced by the initial conditions.
Stochastic Nature of Output Data
Model is an input-output transformation:
Random inputs → random outputs.
Run the simulation $n$ times (replications $r = 1, 2, \dots, n$) and observe the results.
M/G/1 example:
Batched average queue length for 3 independent replications:
There is inherent variability in a stochastic simulation, both within a single replication and across different replications.
The averages of the 3 replications, $\bar Y_{1.}, \bar Y_{2.}, \bar Y_{3.}$, can be regarded as independent observations, but the batch averages within a replication, $Y_{11}, \dots, Y_{15}$, are not.
Measures of performance
Consider the estimation of a performance parameter $\theta$ (for discrete-time data) or $\phi$ (for continuous-time data) of a simulated system.
Discrete-time data $[Y_1, Y_2, \dots, Y_n]$, with ordinary mean $\theta$ (e.g., waiting times).
Continuous-time data $\{Y(t),\ 0 \le t \le T_E\}$, with time-weighted mean $\phi$ (e.g., the number of customers waiting).
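Point estimation for discrete-time data
The point estimator
$$\hat\theta = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
is unbiased if its expected value is $\theta$ (the desired quantity), that is, if $E(\hat\theta) = \theta$.
It is biased if $E(\hat\theta) \ne \theta$.
Point estimation for continuous-time data
The point estimator
$$\hat\phi = \frac{1}{T_E}\int_0^{T_E} Y(t)\,dt$$
is, in general, biased: $E(\hat\phi) \ne \phi$.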
Point Estimator
Usually, system performance measures can be put into the common framework of $\theta$ or $\phi$.
A performance measure that does not fit this framework is the quantile or percentile:
A quantile is the performance level $\theta$ that can be achieved with probability $p$:
$$\Pr\{Y \le \theta\} = p$$
E.g., let $Y$ be the waiting time. If the 0.85 quantile of $Y$ is $\theta$, then 85% of the customers will wait at most $\theta$.
Histograms of the observed $Y$ can be used to estimate quantiles.
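Confidence-Interval Estimation
Confidence interval (C.I.): a measure of error (we are $100(1-\alpha)\%$ confident that the true mean lies in the computed interval).
$\bar Y_{i.}$: average production time on the $i$th run.
Average of the $R$ replications:
$$\bar Y_{..} = \frac{1}{R}\sum_{i=1}^{R} \bar Y_{i.}$$
$\bar Y_{..}$ is an estimator of $\theta$ and is calculated from our data.
We cannot know for certain how far $\bar Y_{..}$ is from $\theta$, but the C.I. attempts to bound that error. The more replications we make, the smaller the error in $\bar Y_{..}$ tends to be.
The sample variance and the C.I. are:
$$S^2 = \frac{1}{R-1}\sum_{i=1}^{R}\left(\bar Y_{i.} - \bar Y_{..}\right)^2, \qquad \bar Y_{..} \pm t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}}$$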
Prediction interval (P.I.): a measure of risk.
A good guess for the average cycle time on a particular day is our estimator, but it is unlikely to be exactly right.
The P.I. is designed to be wide enough to contain the actual average cycle time on any particular day with high probability. The length of the P.I. will not go to 0 as $R$ increases:
$$\bar Y_{..} \pm t_{\alpha/2,\,R-1}\,S\sqrt{1 + \frac{1}{R}}$$
As $R \to \infty$, this interval converges to $\theta \pm z_{\alpha/2}\,\sigma$.
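To see why the P.I. does not shrink the way the C.I. does, the two half-widths can be compared numerically. This is a minimal Python sketch; the function names are hypothetical, and the critical value is held fixed at 2.0 purely for illustration (in practice $t_{\alpha/2,R-1}$ also decreases slightly as $R$ grows).

```python
import math


def ci_half_width(s, r, t_crit):
    """C.I. half-width for the mean: t * S / sqrt(R)."""
    return t_crit * s / math.sqrt(r)


def pi_half_width(s, r, t_crit):
    """P.I. half-width for one new replication average: t * S * sqrt(1 + 1/R)."""
    return t_crit * s * math.sqrt(1.0 + 1.0 / r)


# Illustration only: S = 2.0 and the critical value is held fixed at 2.0
for r in (4, 16, 64, 256):
    print(r, round(ci_half_width(2.0, r, 2.0), 3), round(pi_half_width(2.0, r, 2.0), 3))
```

As $R$ grows, the C.I. half-width falls like $1/\sqrt{R}$, while the P.I. half-width levels off near $t\,S$.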
Output Analysis for Terminating Simulations
A terminating simulation runs over a simulated time interval $[0, T_E]$.
A common goal is to estimate:
$$\theta = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) \quad \text{for discrete output}$$
$$\phi = E\left(\frac{1}{T_E}\int_0^{T_E} Y(t)\,dt\right) \quad \text{for continuous output } Y(t),\ 0 \le t \le T_E$$
Independent replications are used, each run using a different random-number stream and independently chosen initial conditions.
It is important to distinguish within-replication data from across-replication data.
In a simulation run, the production times $Y_{i1}, Y_{i2}, \dots$ (the production times of parts 1, 2, ... in the $i$th replication) are within-replication data.
The average of the production times in a replication, $\bar Y_{i.}$, is across-replication data.
Across-replication data are independent of one another. If they are obtained by averaging within-replication data, they are approximately normally distributed.
Within-replication data are (most of the time) not independent of one another.
Output Analysis for Terminating Simulations
Across replications:
For example, the daily cycle-time averages (discrete-time data).
The average:
$$\bar Y_{..} = \frac{1}{R}\sum_{i=1}^{R} \bar Y_{i.}$$
The sample variance:
$$S^2 = \frac{1}{R-1}\sum_{i=1}^{R}\left(\bar Y_{i.} - \bar Y_{..}\right)^2$$
The confidence-interval half-width:
$$H = t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}}$$
Within a replication:
For example, the WIP (continuous-time data).
The average:
$$\bar Y_{i.} = \frac{1}{T_{Ei}}\int_0^{T_{Ei}} Y_i(t)\,dt$$
The sample variance:
$$S_i^2 = \frac{1}{T_{Ei}}\int_0^{T_{Ei}} \left(Y_i(t) - \bar Y_{i.}\right)^2 dt$$
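For a piecewise-constant output such as WIP, the two integrals above reduce to sums in which each value is weighted by how long it is held. A minimal Python sketch, assuming $Y_i(t)$ changes only at recorded event times; the function name and the sample path are hypothetical:

```python
def time_weighted_stats(change_times, values, t_end):
    """Time-weighted mean and variance of a piecewise-constant path Y(t) on [0, t_end].

    Y(t) = values[k] for change_times[k] <= t < change_times[k + 1];
    the last value is held until t_end.
    """
    if not change_times or change_times[0] != 0:
        raise ValueError("the path must start at t = 0")
    ends = list(change_times[1:]) + [t_end]
    durations = [end - start for start, end in zip(change_times, ends)]
    if any(dur < 0 for dur in durations):
        raise ValueError("change times must be nondecreasing and <= t_end")
    # (1/T_E) * integral of Y(t) dt: each value weighted by its holding time
    mean = sum(v * dur for v, dur in zip(values, durations)) / t_end
    # (1/T_E) * integral of (Y(t) - mean)^2 dt
    var = sum((v - mean) ** 2 * dur for v, dur in zip(values, durations)) / t_end
    return mean, var


# Hypothetical WIP path: 0 jobs on [0, 2), 2 jobs on [2, 5), 1 job on [5, 10)
mean, var = time_weighted_stats([0, 2, 5], [0, 2, 1], t_end=10)
print(round(mean, 3), round(var, 3))
```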
Output Analysis Example
For the call center, 4 replications are conducted. Estimated long-run averages per replication:
Replication | Average waiting time in queue (min) | Average number in queue
1 | 0.88 | 0.68
2 | 5.04 | 4.18
3 | 4.13 | 3.26
4 | 0.52 | 0.34
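Output Analysis Example - contd
Average waiting time in queue:
$$\bar Y_{..} = \frac{0.88 + 5.04 + 4.13 + 0.52}{4} = 2.64$$
$$S^2 = \frac{(0.88 - 2.64)^2 + (5.04 - 2.64)^2 + (4.13 - 2.64)^2 + (0.52 - 2.64)^2}{4 - 1} = 2.28^2$$
$$H = t_{0.025,\,3}\,\frac{S}{\sqrt{4}} \approx 3.62 \qquad (t_{0.025,\,3} = 3.182)$$
95% C.I.: $2.64 \pm 3.62$ min
Since the lower limit is negative and a negative average waiting time in queue is impossible, too few replications have been made!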
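The arithmetic above can be reproduced with Python's standard `statistics` module. This is a sketch: the standard library has no $t$ quantile function, so $t_{0.025,3} = 3.182$ is taken from a $t$ table and passed in, and `replication_ci` is a hypothetical helper name.

```python
import math
import statistics


def replication_ci(averages, t_crit):
    """Across-replication mean, sample std dev S, and half-width H = t * S / sqrt(R)."""
    r = len(averages)
    y_bar = statistics.fmean(averages)    # Y.. = (1/R) * sum of Y_i.
    s = statistics.stdev(averages)        # sample standard deviation (divisor R - 1)
    h = t_crit * s / math.sqrt(r)         # confidence-interval half-width
    return y_bar, s, h


# Average waiting time in queue (min) from the 4 call-center replications
waits = [0.88, 5.04, 4.13, 0.52]
y_bar, s, h = replication_ci(waits, t_crit=3.182)   # t_{0.025,3} from a t table
print(f"Y.. = {y_bar:.2f}  S = {s:.2f}  H = {h:.2f}")
print(f"95% CI: [{y_bar - h:.2f}, {y_bar + h:.2f}]")
```

The negative lower confidence limit (about $-0.98$ min) is what signals that more replications are needed.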
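C.I. with Specified Precision
The half-length $H$ of a $100(1-\alpha)\%$ confidence interval for a mean $\theta$, based on the $t$ distribution, is given by:
$$H = t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}}$$
where $R$ is the number of replications and $S^2$ is the sample variance.
Suppose that an error criterion $\epsilon$ is specified with probability $1 - \alpha$; a sufficiently large sample size should satisfy:
$$P\left(\left|\bar Y_{..} - \theta\right| < \epsilon\right) \ge 1 - \alpha$$
Assume that an initial sample of $R_0$ independent replications has been observed ($R_0 \ge 2$). Obtain an initial estimate $S_0^2$ of the population variance $\sigma^2$.
Then choose a sample size $R \ge R_0$ such that $H = t_{\alpha/2,\,R-1}\,S_0/\sqrt{R} \le \epsilon$.
Since $t_{\alpha/2,\,R-1} \ge z_{\alpha/2}$, an initial estimate of $R$ is:
$$R \ge \left(\frac{z_{\alpha/2}\,S_0}{\epsilon}\right)^2$$
$R$ is the smallest integer satisfying $R \ge R_0$ and
$$R \ge \left(\frac{t_{\alpha/2,\,R-1}\,S_0}{\epsilon}\right)^2$$
Collect $R - R_0$ additional observations.
The $100(1-\alpha)\%$ C.I. for $\theta$:
$$\bar Y_{..} \pm t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}}$$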
C.I. with Specified Precision
Call center example: estimate the agents' utilization with error criterion $\epsilon = 0.04$ and confidence coefficient $1 - \alpha = 0.95$.
An initial sample of size $R_0 = 4$ is taken, and an initial estimate of the population variance is $S_0^2 = (0.072)^2 = 0.00518$.
With $\epsilon = 0.04$ and $1 - \alpha = 0.95$, the final sample size must be at least:
$$R \ge \left(\frac{z_{0.025}\,S_0}{\epsilon}\right)^2 = \frac{1.96^2 \times 0.00518}{0.04^2} \approx 12.44$$
For the final sample size, find the smallest $R$ such that
$$R \ge \left(\frac{t_{0.025,\,R-1}\,S_0}{\epsilon}\right)^2$$
For $R = 15$: $(t_{0.025,\,14}\,S_0/\epsilon)^2 = (2.145 \times 0.072/0.04)^2 \approx 14.91 \le 15$, so $R = 15$ and $R - R_0 = 11$ additional replications are needed.
After obtaining the additional outputs, the half-width should be checked.
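The search for the smallest $R$ can be sketched in Python as follows. The helper names are hypothetical, and the critical values $t_{0.025,\,df}$ are hard-coded from a standard $t$ table as an assumption (with SciPy installed, `scipy.stats.t.ppf(0.975, df)` returns the same quantiles).

```python
import math

# Two-sided 95% critical values t_{0.025, df} from a standard t table (df = R - 1).
T_0025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
          7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
          13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
          19: 2.093, 20: 2.086}


def initial_r_estimate(s0, eps, z):
    """Normal-approximation lower bound: R >= (z * S0 / eps)^2."""
    return (z * s0 / eps) ** 2


def required_replications(s0, eps, r0, t_table):
    """Smallest R >= R0 with R >= (t_{alpha/2, R-1} * S0 / eps)^2."""
    for r in range(max(r0, 2), max(t_table) + 2):
        if r >= (t_table[r - 1] * s0 / eps) ** 2:
            return r
    raise ValueError("t table too short for this precision requirement")


s0 = math.sqrt(0.00518)                  # initial estimate from R0 = 4 replications
print(round(initial_r_estimate(s0, eps=0.04, z=1.96), 2))
r = required_replications(s0, eps=0.04, r0=4, t_table=T_0025)
print(r, "replications;", r - 4, "additional")
```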
Quantiles
A proportion or probability is treated as a special case of a mean.
When the number of independent replications $Y_1, \dots, Y_R$ is large enough that $t_{\alpha/2,\,R-1} \approx z_{\alpha/2}$, the confidence interval for a probability $p$ is often written as:
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{R-1}}$$
where $\hat p$ is the sample proportion.
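A minimal Python sketch of this approximate interval; the function name and the counts (the event observed in 40 of 50 replications) are hypothetical, and $z_{0.025} = 1.96$ is assumed:

```python
import math


def proportion_ci(successes, r, z=1.96):
    """Approximate C.I. for a probability: p_hat +/- z * sqrt(p_hat (1 - p_hat) / (R - 1))."""
    p_hat = successes / r
    h = z * math.sqrt(p_hat * (1.0 - p_hat) / (r - 1))
    return p_hat, p_hat - h, p_hat + h


# Hypothetical: the event of interest occurred in 40 of R = 50 replications
p_hat, lo, hi = proportion_ci(40, 50)
print(round(p_hat, 2), round(lo, 3), round(hi, 3))
```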
A quantile is the inverse of the probability estimation problem:
$p$ is given.
Find $\theta$ such that $\Pr(Y \le \theta) = p$.
Quantiles
The best way is to sort the outputs and use the $(R \cdot p)$th smallest value, i.e., find $\theta$ such that $100p\%$ of the data in a histogram of $Y$ is to the left of $\theta$.
Example: If we have $R = 10$ replications and we want the $p = 0.8$ quantile, first sort, then estimate $\theta$ by the $(10)(0.8) = 8$th smallest value (round if necessary).
Sorted data: 5.6, 7.1, 8.8, 8.9, 9.5, 9.7, 10.1, 12.2 (point estimate), 12.5, 12.9
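The sort-and-pick rule can be sketched in Python as below (hypothetical function name). The rank $R \cdot p$ is rounded with Python's `round`, which sends exact halves to the nearest even integer; other rounding conventions are equally defensible.

```python
def quantile_estimate(outputs, p):
    """Estimate the p-quantile by the (R * p)-th smallest output (1-based rank, rounded)."""
    data = sorted(outputs)
    k = round(len(data) * p)             # rank R * p, rounded to the nearest integer
    k = min(max(k, 1), len(data))        # keep the rank inside 1..R
    return data[k - 1]                   # 1-based rank -> 0-based index


# The same 10 outputs as above, in arbitrary order
outputs = [12.9, 5.6, 12.2, 7.1, 12.5, 8.8, 10.1, 8.9, 9.7, 9.5]
print(quantile_estimate(outputs, 0.8))   # the 8th smallest value
```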
Quantiles
Confidence interval of quantiles: an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ can be obtained by finding two values $\theta_\ell$ and $\theta_u$:
$\theta_\ell$ cuts off $100p_\ell\%$ of the histogram (the $Rp_\ell$th smallest value of the sorted data).
$\theta_u$ cuts off $100p_u\%$ of the histogram (the $Rp_u$th smallest value of the sorted data),
where
$$p_\ell = p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{R-1}}, \qquad p_u = p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{R-1}}$$
Quantiles
Example: Suppose we have $R = 1000$ replications and want to estimate the $p = 0.8$ quantile with a 95% confidence interval.
First, sort the data from smallest to largest.
Then estimate $\theta$ by the $(1000)(0.8) = 800$th smallest value; the point estimate is 212.03.
(Table: a portion of the 1000 sorted values.)
Then find the confidence interval:
$$p_\ell = 0.8 - 1.96\sqrt{\frac{0.8(1-0.8)}{1000-1}} \approx 0.78, \qquad p_u = 0.8 + 1.96\sqrt{\frac{0.8(1-0.8)}{1000-1}} \approx 0.82$$
The C.I. is bounded by the 780th and 820th smallest values:
95% C.I.: [188.96, 256.79]
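The same order-statistic idea gives the interval. A minimal Python sketch with a hypothetical function name; ranks are rounded to the nearest integer. Note that the slide rounds $p_\ell$ and $p_u$ to two decimals (0.78 and 0.82) before converting them to ranks; without that rounding, $R = 1000$ gives the 775th and 825th smallest values, a slightly wider interval.

```python
import math


def quantile_ci(outputs, p, z=1.96):
    """Point estimate and approximate C.I. for the p-quantile using order statistics."""
    data = sorted(outputs)
    r = len(data)
    h = z * math.sqrt(p * (1.0 - p) / (r - 1))

    def smallest(q):
        # the (R * q)-th smallest value, rank rounded and clipped to 1..R
        k = min(max(round(r * q), 1), r)
        return data[k - 1]

    return smallest(p), smallest(p - h), smallest(p + h)


# Hypothetical check: outputs 1, 2, ..., 1000, so each value equals its rank
print(quantile_ci(list(range(1000, 0, -1)), 0.8))
```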
Output Analysis for Steady-State Simulation
Consider a single run of a simulation model to estimate a steady-state or long-run characteristic of the system.
The single run produces observations $Y_1, Y_2, \dots$ (generally samples of an autocorrelated time series).
Performance measure (note that $n \to \infty$ in the definition):
$$\theta = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} Y_i \quad \text{(with probability 1), for a discrete measure}$$
$$\phi = \lim_{T_E\to\infty} \frac{1}{T_E}\int_0^{T_E} Y(t)\,dt \quad \text{(with probability 1), for a continuous measure}$$
These limits are independent of the initial conditions.
Output Analysis for Steady-State Simulation
The sample size is a design choice, with several considerations in mind:
Any bias in the point estimator that is due to artificial or arbitrary initial conditions (the bias can be severe if the run length is too short).
The desired precision of the point estimator.
Budget constraints on computer resources.
Notation for the estimation of $\theta$ from a discrete-time output process:
One replication (or run), output data: $Y_1, Y_2, Y_3, \dots$
Several replications, output data for replication $r$: $Y_{r1}, Y_{r2}, Y_{r3}, \dots$
Steady-State Simulation
Initialization Bias
Methods to reduce the point-estimator bias caused by artificial and unrealistic initial conditions:
Intelligent initialization.
Dividing the simulation into an initialization phase and a data-collection phase.
Intelligent initialization
Initialize the simulation in a state that is more representative of long-run conditions.
If the system exists, collect data on it and use these data to specify more nearly
typical initial conditions.
If the system can be simplified enough to make it mathematically solvable, e.g.
queueing models, solve the simplified model to find long-run expected or most
likely conditions, use that to initialize the simulation.
Divide each simulation into two phases:

An initialization phase, from time 0 to time T0.
A data-collection phase, from T0 to the stopping time T0+TE.
The choice of T0 is important:
After T0, system should be more nearly representative of steady-state behavior.
The system has reached steady state when the probability distribution of the system state is close to the steady-state probability distribution.
Initialization Bias
M/G/1 queueing example: a total of 10 independent replications.
Each replication begins in the empty and idle state.
The simulation run length of each replication was $T_0 + T_E = 15{,}000$ min.
Response variable: queue length $L_Q(t, r)$ (at time $t$ of the $r$th replication).
Batching intervals of 1,000 minutes; batch means $Y_{rj}$.
Ensemble averages, used to identify a trend in the data due to initialization bias, are the averages of the corresponding batch means across the $R$ replications:
$$\bar Y_{.j} = \frac{1}{R}\sum_{r=1}^{R} Y_{rj}$$
A plot of the ensemble averages $\bar Y_{.j}$ (and of the cumulative average $\bar Y_{..}(n, d)$) versus $1000j$, for $j = 1, 2, \dots, 15$ ($R$ replications).
Initialization Bias
Cumulative average sample mean (after deleting $d$ observations):
$$\bar Y_{..}(n, d) = \frac{1}{n-d}\sum_{j=d+1}^{n} \bar Y_{.j}$$
There is no widely accepted technique to guide how much data to delete.
Plots can be helpful.
It is apparent that downward bias is present and that this bias can be reduced by deleting one or more observations.
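Ensemble averages and the cumulative average after deletion can be computed as in the following Python sketch. The helper names and the $3 \times 5$ array of batch means are hypothetical (rows are replications, columns are batches $j$), chosen so that early batches sit below later ones, mimicking a downward initialization bias.

```python
def ensemble_averages(batch_means):
    """Y_.j: average of batch mean j across the R replications (rows = replications)."""
    r = len(batch_means)
    n = len(batch_means[0])
    return [sum(row[j] for row in batch_means) / r for j in range(n)]


def cumulative_average(ensemble, d):
    """Y_..(n, d): mean of the ensemble averages after deleting the first d batches."""
    kept = ensemble[d:]
    return sum(kept) / len(kept)


# Hypothetical batch means: R = 3 replications x n = 5 batches
y = [[1, 3, 5, 4, 5],
     [2, 4, 4, 5, 4],
     [0, 2, 5, 4, 5]]
ens = ensemble_averages(y)
for d in range(3):
    print(d, round(cumulative_average(ens, d), 3))
```

Deleting the first batches raises the cumulative average here (3.533 for $d = 0$, 4.556 for $d = 2$), which is the kind of shift the plots are meant to reveal.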
Initialization Bias
There is no widely accepted, objective, and proven technique to guide how much data to delete to reduce initialization bias to a negligible level.
Plots can at times be misleading, but they are still recommended.
Ensemble averages reveal a smoother and more precise trend as the number of replications $R$ increases.
The cumulative average becomes less variable as more data are averaged.
The more correlation present, the longer it takes for $\bar Y_{.j}$ to approach steady state.
Different performance measures may approach steady state at different rates.
Error Estimation
If $\{Y_1, \dots, Y_n\}$ are not statistically independent, then $S^2/n$ is a biased estimator of the true variance $V(\bar Y)$.
This is almost always the case when $\{Y_1, \dots, Y_n\}$ is a sequence of output observations from within a single replication (an autocorrelated sequence, or time series).
Suppose the point estimator of $\theta$ is the sample mean
$$\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
The variance of $\bar Y$ is almost impossible to estimate.
For a system with a steady state, the simulation produces an output process that is approximately covariance stationary (after the transient phase has passed).
The covariance between two random variables in the time series depends only on the lag (the number of observations between them).
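Error Estimation
For a covariance-stationary time series $\{Y_1, \dots, Y_n\}$:
The lag-$k$ autocovariance is:
$$\gamma_k = \operatorname{cov}(Y_1, Y_{1+k}) = \operatorname{cov}(Y_i, Y_{i+k})$$
The lag-$k$ autocorrelation is:
$$\rho_k = \frac{\gamma_k}{\sigma^2}$$
If a time series is covariance stationary, then the variance of $\bar Y$ is:
$$V(\bar Y) = \frac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right]$$
The expected value of the variance estimator is:
$$E\left(\frac{S^2}{n}\right) = B\,V(\bar Y), \qquad B = \frac{n/c - 1}{n - 1}$$
where $c$ is the quantity in brackets in the expression for $V(\bar Y)$.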
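The effect of autocorrelation on $V(\bar Y)$ and on the bias factor $B$ can be checked numerically. The Python sketch below assumes, purely for illustration, AR(1)-style autocorrelations $\rho_k = \phi^k$; the helper names are hypothetical.

```python
def var_of_mean(sigma2, rho):
    """V(Ybar) for a covariance-stationary series; rho[k-1] is the lag-k autocorrelation, k = 1..n-1."""
    n = len(rho) + 1
    c = 1.0 + 2.0 * sum((1.0 - k / n) * rho[k - 1] for k in range(1, n))
    return sigma2 / n * c, c


def bias_factor(n, c):
    """B = (n/c - 1) / (n - 1), so that E[S^2 / n] = B * V(Ybar)."""
    return (n / c - 1.0) / (n - 1.0)


n = 100
for phi in (0.0, 0.5, -0.5):
    rho = [phi ** k for k in range(1, n)]   # assumed AR(1)-style autocorrelations
    v, c = var_of_mean(1.0, rho)
    print(phi, round(v, 4), round(bias_factor(n, c), 3))
```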
Error Estimation
Example plots: a stationary time series $Y_i$ exhibiting positive autocorrelation; a stationary time series $Y_i$ exhibiting negative autocorrelation; a nonstationary time series with an upward trend.
Error Estimation
The expected value of the variance estimator is $E(S^2/n) = B\,V(\bar Y)$, with $B = (n/c - 1)/(n - 1)$, where $V(\bar Y)$ is the variance of $\bar Y$. Therefore:
If the $Y_i$ are independent, then $S^2/n$ is an unbiased estimator of $V(\bar Y)$ ($\rho_k = 0$, so $c = 1$ and $B = 1$).
If the autocorrelations $\rho_k$ are primarily positive, then $S^2/n$ is biased low as an estimator of $V(\bar Y)$ ($c > 1$, so $B < 1$).
If the autocorrelations $\rho_k$ are primarily negative, then $S^2/n$ is biased high as an estimator of $V(\bar Y)$ ($c < 1$, so $B > 1$).
Replication Method
Used to estimate point-estimator variability and to construct a confidence interval.
Approach: make $R$ replications, initializing and deleting from each one in the same way.
It is important to do a thorough job of investigating the initial-condition bias:
Bias is not affected by the number of replications; instead, it is affected only by deleting more data (i.e., increasing $T_0$) or extending the length of each run (i.e., increasing $T_E$).
The basic raw output data $\{Y_{rj},\ r = 1, \dots, R;\ j = 1, \dots, n\}$ are derived from:
Individual observations from within replication $r$ (e.g., the delay of customer $j$ in replication $r$).
Batch means from within replication $r$ of some number of discrete-time observations.
Batch means of a continuous-time process over time interval $j$.
Replication Method
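Each replication is regarded as a single sample for estimating $\theta$. For replication $r$:
$$\bar Y_{r.}(n, d) = \frac{1}{n-d}\sum_{j=d+1}^{n} Y_{rj}$$
The overall point estimator:
$$\bar Y_{..}(n, d) = \frac{1}{R}\sum_{r=1}^{R} \bar Y_{r.}(n, d) \quad \text{and} \quad E\left[\bar Y_{..}(n, d)\right] = \theta_{n,d}$$
If $d$ and $n$ are chosen sufficiently large, $\theta_{n,d} \approx \theta$, so $\bar Y_{..}(n, d)$ is an approximately unbiased estimator of $\theta$.
To estimate the standard error of $\bar Y_{..}$, use the sample variance and standard error:
$$S^2 = \frac{1}{R-1}\sum_{r=1}^{R}\left(\bar Y_{r.} - \bar Y_{..}\right)^2 = \frac{1}{R-1}\left(\sum_{r=1}^{R}\bar Y_{r.}^2 - R\,\bar Y_{..}^2\right) \quad \text{and} \quad \text{s.e.}(\bar Y_{..}) = \frac{S}{\sqrt{R}}$$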
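The replication-deletion estimator can be sketched in Python as follows. The data layout `y[r][j]` (one row per replication), the helper name, and the small data set are hypothetical; the critical value $t_{\alpha/2,R-1}$ is passed in from a table ($t_{0.025,2} = 4.303$ for $R = 3$).

```python
import math
import statistics


def replication_estimate(y, d, t_crit):
    """Replication-deletion estimate from y[r][j] (R replications x n observations)."""
    rep_means = [statistics.fmean(row[d:]) for row in y]   # Y_r.(n, d), one per replication
    r = len(rep_means)
    grand = statistics.fmean(rep_means)                     # Y_..(n, d)
    s = statistics.stdev(rep_means)                         # S (divisor R - 1)
    se = s / math.sqrt(r)                                   # s.e.(Y_..) = S / sqrt(R)
    return grand, se, (grand - t_crit * se, grand + t_crit * se)


# Hypothetical output: R = 3 replications, n = 4 batch means each, delete d = 1
y = [[0, 4, 6, 8],
     [1, 5, 7, 6],
     [0, 3, 5, 7]]
grand, se, (lo, hi) = replication_estimate(y, d=1, t_crit=4.303)  # t_{0.025,2}
print(round(grand, 3), round(se, 3), (round(lo, 2), round(hi, 2)))
```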
Replication Method
Length of each replication beyond the deletion point ($n - d$ should be at least 10 times $d$):
$$(n - d) > 10d$$
The number of replications $R$ should be as many as time permits, up to about 25 replications.
For a fixed total sample size $n$, as fewer data are deleted ($\downarrow d$):
The C.I. shifts: greater bias.
The standard error of $\bar Y_{..}(n, d)$ decreases: lower variance.
There is a trade-off between reducing bias and increasing variance.
Replication Method
Table 11.8 pp.416
As this table shows, with a fixed number of data points $n$, decreasing the amount of deleted data $d$:
pulls the mean downward (introduces bias);
reduces the standard error of the computed mean, $S/\sqrt{R}$, because the variance decreases as the amount of data increases.
Replication Method
M/G/1 queueing example:
Suppose $R = 10$ replications, each of total length $T_0 + T_E = 15{,}000$ minutes, starting at time 0 in the empty and idle state, initialized for $T_0 = 2{,}000$ minutes before data collection begins.
Each batch mean is the average number of customers in queue over a 1,000-minute interval.
The first two batch means are deleted ($d = 2$).
The point estimator and standard error are:
$$\bar Y_{..}(15, 2) = 8.43 \quad \text{and} \quad \text{s.e.}\left(\bar Y_{..}(15, 2)\right) = 1.59$$
The 95% C.I. for the long-run mean queue length $L_Q$ is:
$$\bar Y_{..} - t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}} \le L_Q \le \bar Y_{..} + t_{\alpha/2,\,R-1}\,\frac{S}{\sqrt{R}}$$
$$8.43 - 2.26(1.59) \le L_Q \le 8.43 + 2.26(1.59)$$
We have a high degree of confidence that the long-run mean queue length is between 4.84 and 12.02 (if $d$ and $n$ are large enough).
Sample Size
An alternative to increasing $R$ is to increase the total run length $T_0 + T_E$ within each replication.
Approach:
Increase the run length from $(T_0 + T_E)$ to $(R/R_0)(T_0 + T_E)$.
Delete the data from time 0 to time $(R/R_0)T_0$.
Advantage: any residual bias in the point estimator should be further reduced.
However, it is necessary to have saved the state of the model at time $T_0 + T_E$ and to be able to restart the model.
Sample Size
Example 11.17: In the M/G/1 queueing example of Table 11.8, $R_0 = 10$ replications were made, and deleting $d = 2$ observations gave the variance estimate $S_0^2 = 25.30$. We are asked to estimate the long-run mean queue length to within $\pm 2$ customers with 90% confidence. How many replications are needed? Instead of making the required replications from scratch, what would you suggest?
$$R \ge \left(\frac{z_{0.05}\,S_0}{\epsilon}\right)^2 = \frac{1.645^2\,(25.30)}{2^2} \approx 17.1$$
Try $R = 18$: $\left(t_{0.05,\,17}\,S_0/\epsilon\right)^2 = 1.74^2\,(25.30)/2^2 \approx 19.15 > 18$, so no.
Try $R = 19$: $\left(t_{0.05,\,18}\,S_0/\epsilon\right)^2 = 1.73^2\,(25.30)/2^2 \approx 18.93 \le 19$, so yes; 19 replications are enough.
Instead of making 9 more replications starting from time 0, we extend the existing replications from where they ended. Each replication is extended to $(R/R_0)(T_0 + T_E) = (19/10)(15{,}000) = 28{,}500$ minutes, the first $(R/R_0)T_0 = 1.9 \times 2{,}000 = 3{,}800$ minutes of every replication are deleted, and the data between minutes 3,800 and 28,500 are used.
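The run-extension arithmetic can be sketched directly. A minimal Python version (hypothetical helper name) that multiplies before dividing so that these results come out exact:

```python
def extended_run_plan(r, r0, t0, run_length):
    """Run extension instead of extra replications.

    run_length is the original T0 + TE of each of the R0 runs.
    Returns (new run length (R/R0)(T0 + TE), new deletion point (R/R0)T0).
    """
    new_length = r * run_length / r0     # multiply first, divide last
    new_t0 = r * t0 / r0
    return new_length, new_t0


new_length, new_t0 = extended_run_plan(r=19, r0=10, t0=2000, run_length=15000)
print(f"extend each run to {new_length:.0f} min; delete data before {new_t0:.0f} min")
```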
Batch Means
A lot of simulation time is wasted by deleting $d$ data points from each replication. Instead, consider making one long simulation run and deleting the data for one initial period only.
Using a single, long replication:
Problem: the data are dependent, so the usual variance estimator is biased.
Solution: batch means.
Batch means: divide the output data from one replication (after appropriate deletion) into a few large batches, and then treat the means of these batches as if they were independent.
For a continuous-time process $\{Y(t),\ T_0 \le t \le T_0 + T_E\}$, with $k$ batches of size $m = T_E/k$, the batch means are:
$$\bar Y_j = \frac{1}{m}\int_{(j-1)m}^{jm} Y(t + T_0)\,dt$$
For a discrete-time process $\{Y_i,\ i = d+1, d+2, \dots, n\}$, with $k$ batches of size $m = (n - d)/k$, the batch means are:
$$\bar Y_j = \frac{1}{m}\sum_{i=(j-1)m+1}^{jm} Y_{i+d}$$
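For discrete-time data, forming batch means after deletion can be sketched in Python as below. The helper name is hypothetical, and as an assumption the sketch uses $m = \lfloor (n-d)/k \rfloor$ and ignores leftover observations when $n - d$ is not a multiple of $k$.

```python
def batch_means(y, d, k):
    """Delete the first d observations, then average k batches of size m = (n - d) // k."""
    kept = y[d:]                      # Y_{d+1}, ..., Y_n
    m = len(kept) // k                # batch size (leftover observations are ignored)
    if m == 0:
        raise ValueError("not enough observations for k batches")
    # Batch j (j = 1..k) averages Y_{d+(j-1)m+1}, ..., Y_{d+jm}
    return [sum(kept[(j - 1) * m: j * m]) / m for j in range(1, k + 1)]


print(batch_means(list(range(1, 13)), d=2, k=5))
```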
Batch Means
$$\underbrace{Y_1, \dots, Y_d}_{\text{deleted}},\ \underbrace{Y_{d+1}, \dots, Y_{d+m}}_{\bar Y_1},\ \underbrace{Y_{d+m+1}, \dots, Y_{d+2m}}_{\bar Y_2},\ \dots,\ \underbrace{Y_{d+(k-1)m+1}, \dots, Y_{d+km}}_{\bar Y_k}$$
Starting with either continuous-time or discrete-time data, the variance of the sample mean is estimated by:
$$\frac{S^2}{k} = \frac{1}{k}\sum_{j=1}^{k}\frac{\left(\bar Y_j - \bar Y\right)^2}{k - 1} = \frac{\sum_{j=1}^{k}\bar Y_j^2 - k\bar Y^2}{k(k-1)}$$
If the batch size is sufficiently large, successive batch means will be approximately independent, and the variance estimator will be approximately unbiased.
There is no widely accepted and relatively simple method for choosing an acceptable batch size $m$. Some simulation software does it automatically.
It is important to choose batch sizes such that the lag-1 correlation between batch means is under 0.2.
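Given the $k$ batch means, the variance estimate and a lag-1 autocorrelation check can be sketched in Python. The helper names, the batch means, and the critical value $t_{0.025,9} = 2.262$ (for $k = 10$) are illustrative assumptions; the lag-1 estimator shown is one common form of the sample autocorrelation.

```python
import math
import statistics


def batch_means_ci(means, t_crit):
    """Grand mean, estimated V(Ybar) = S^2 / k, and C.I. half-width from k batch means."""
    k = len(means)
    grand = statistics.fmean(means)
    var_of_mean = statistics.variance(means) / k   # S^2 / k, with S^2 using divisor k - 1
    return grand, var_of_mean, t_crit * math.sqrt(var_of_mean)


def lag1_autocorrelation(means):
    """Sample lag-1 autocorrelation of the batch means (one common estimator)."""
    grand = statistics.fmean(means)
    num = sum((a - grand) * (b - grand) for a, b in zip(means, means[1:]))
    den = sum((x - grand) ** 2 for x in means)
    return num / den


# Hypothetical batch means from one long run, k = 10
means = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.4, 4.2, 3.7, 4.1]
grand, v, h = batch_means_ci(means, t_crit=2.262)   # t_{0.025,9}
rho1 = lag1_autocorrelation(means)
print(round(grand, 3), round(h, 3), round(rho1, 3),
      "lag-1 below 0.2" if rho1 < 0.2 else "consider larger batches")
```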
