
END 306

Simulation
Output Analysis

Output Analysis for a Single Model


Outputs from the same simulation model vary from run to run because of the
randomness in the input variables to the system.
The performance of the real system is denoted by $\theta$.
The output of the simulation is used as an estimator of $\theta$; this
estimator is denoted by $\hat{\theta}$.
The precision of the estimator is measured by its standard
error or by the width of a confidence interval.
The purpose of statistical output analysis is to estimate the
precision of the estimator, to determine the number of
replications needed to obtain a desired confidence interval, or both.

Output Analysis for a Single Model


Inputs are generated randomly:
Outputs are derived from the inputs, so the outputs are stochastic
Outputs may depend on the initial conditions of the inputs and the system
System behavior is often a nonlinear function of the inputs

Correlated outputs:
Consider consecutive time intervals $T_i$ and $T_{i+1}$:
If the average queue length was long in $T_i$, the starting conditions in $T_{i+1}$ will bias
that interval toward a longer-than-average queue length

Types of Simulations
with Respect to Output Analysis
Terminating or non-terminating (steady-state) simulations

Terminating or transient simulation (e.g., simulation of a bank from 9 am to 5 pm):

Runs until an end time $T_E$
Initial conditions at t = 0 are specified
A stop time $T_E$, or a stop event E, must be specified

Steady-state simulation (e.g., an automated production line):

Runs continuously or over a very long period
Properties are not influenced by the initial conditions

Stochastic Nature of Output Data

The model is an input-output transformation:

Random inputs -> random outputs
Run the simulation n times (replications r = 1, 2, 3, ..., n) and observe the results.
M/G/1 example:
Batched average queue length for 3 independent replications:

There is inherent variability in a stochastic simulation, both within a single replication and
across different replications.
The averages across the 3 replications, $\bar{Y}_{1\cdot}, \bar{Y}_{2\cdot}, \bar{Y}_{3\cdot}$, can be regarded as independent
observations, but the batch averages within a replication, $Y_{11}, \ldots, Y_{15}$, are not.

Measures of performance
Consider the estimation of a performance parameter, $\theta$ for
discrete-time data (or $\phi$ for continuous-time data), of a simulated system.

Discrete-time data: $[Y_1, Y_2, \ldots, Y_n]$, with ordinary mean $\theta$ (e.g., waiting
times)
Continuous-time data: $\{Y(t), 0 \le t \le T_E\}$, with time-weighted mean $\phi$ (e.g.,
number of people waiting)

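Point estimation for discrete-time data.
The point estimator:
$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
Is unbiased (the desired case) if its expected value is $\theta$, that is, if $E(\hat{\theta}) = \theta$
Is biased if $E(\hat{\theta}) \ne \theta$

Point estimation for continuous-time data.
The point estimator:
$$\hat{\phi} = \frac{1}{T_E} \int_0^{T_E} Y(t)\,dt$$
Is biased in general, where $E(\hat{\phi}) \ne \phi$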
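
The two point estimators can be sketched in a few lines of Python. This is only an illustration: the function names, the piecewise-constant representation of Y(t), and the sample data are assumptions made here, not part of the slides.

```python
def discrete_mean(observations):
    """Ordinary mean of discrete-time data Y_1, ..., Y_n."""
    return sum(observations) / len(observations)


def time_weighted_mean(change_times, values, t_end):
    """Time-weighted mean of a piecewise-constant Y(t) on [0, t_end].

    change_times[k] is the time at which Y(t) jumps to values[k];
    change_times[0] must be 0.
    """
    area = 0.0
    for k, (start, level) in enumerate(zip(change_times, values)):
        stop = change_times[k + 1] if k + 1 < len(change_times) else t_end
        area += level * (stop - start)
    return area / t_end


# Hypothetical data: three waiting times and a queue-length path on [0, 10].
waits = [2.0, 4.0, 3.0]
theta_hat = discrete_mean(waits)
# Queue length is 0 on [0, 2), 1 on [2, 5), 2 on [5, 10]: (0*2 + 1*3 + 2*5) / 10
phi_hat = time_weighted_mean([0.0, 2.0, 5.0], [0, 1, 2], 10.0)
```

The discrete estimator weights every observation equally, while the continuous one weights each level by how long the process stayed there.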

Point Estimator
Usually, system performance measures can be put into the
common framework of $\theta$ or $\phi$.
A performance measure that does not fit: a quantile or percentile:
$$\Pr\{Y \le \theta\} = p$$
A quantile is the performance level $\theta$ that is achieved with
probability p.
E.g., let Y be the waiting time. If the 0.85 quantile of Y is $\theta$, then
85% of the customers will wait $\theta$ or less.
Histograms can be used to estimate quantiles of the observed Y.

Confidence-Interval Estimation
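Confidence interval (C.I.): a measure of error (we are CI% confident that the true mean
lies within the computed interval).
$\bar{Y}_{i\cdot}$: average production time on the ith run.
Average of R replications:
$$\bar{Y}_{\cdot\cdot} = \sum_{i=1}^{R} \bar{Y}_{i\cdot} / R$$
$\bar{Y}_{\cdot\cdot}$ is an estimator of $\theta$ and is calculated
from our data.
We cannot know for certain how far $\bar{Y}_{\cdot\cdot}$ is from $\theta$, but the C.I. attempts to bound
that error. The more replications we make, the less error there is in
$\bar{Y}_{\cdot\cdot}$.
$$S^2 = \frac{1}{R-1} \sum_{i=1}^{R} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2, \qquad \bar{Y}_{\cdot\cdot} \pm t_{\alpha/2, R-1} \frac{S}{\sqrt{R}}$$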

Prediction interval (P.I.): a measure of risk.

A good guess for the average cycle time on a particular day is our estimator, but it is
unlikely to be exactly right.
The P.I. is designed to be wide enough to contain the actual average cycle time on any
particular day with high probability. The length of the P.I. will not go to 0 as R increases:
$$\bar{Y}_{\cdot\cdot} \pm t_{\alpha/2, R-1}\, S \sqrt{1 + \frac{1}{R}}$$
As R goes to infinity, this interval becomes $\theta \pm z_{\alpha/2}\,\sigma$.
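
To see why the C.I. shrinks with R while the P.I. does not, the two half-widths can be compared numerically. The sketch below is illustrative only: the standard deviation is a hypothetical value held fixed, and the critical values are copied from a standard t table rather than computed.

```python
import math

# Two-sided 95% critical values t_{0.025, R-1} copied from a standard t table,
# keyed by the number of replications R.
T_CRIT_95 = {5: 2.776, 10: 2.262, 30: 2.045, 100: 1.984}


def ci_half_width(s, r, t_crit):
    """Half-width of the confidence interval: t * S / sqrt(R)."""
    return t_crit * s / math.sqrt(r)


def pi_half_width(s, r, t_crit):
    """Half-width of the prediction interval: t * S * sqrt(1 + 1/R)."""
    return t_crit * s * math.sqrt(1.0 + 1.0 / r)


s = 2.0  # hypothetical sample standard deviation, held fixed for illustration
for r, t_crit in T_CRIT_95.items():
    print(r, round(ci_half_width(s, r, t_crit), 3), round(pi_half_width(s, r, t_crit), 3))
```

The first column of half-widths keeps falling as R grows; the second levels off near 1.96 times S.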

Output Analysis for Terminating


Simulations

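A terminating simulation runs over a simulated time interval $[0, T_E]$.

A common goal is to estimate $\theta$ or $\phi$:
$$\theta = E\left( \frac{1}{n} \sum_{i=1}^{n} Y_i \right), \quad \text{for discrete output}$$
$$\phi = E\left( \frac{1}{T_E} \int_0^{T_E} Y(t)\,dt \right), \quad \text{for continuous output } Y(t),\ 0 \le t \le T_E$$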

Independent replications are used. Each run uses a different random number
stream and independently chosen initial conditions.
It is important to distinguish within-replication data from across-replication data.
In a simulation run, the production times $Y_{i1}, Y_{i2}$ (the production times of parts 1 and 2
in the ith replication) are within-replication data.
The average of the production times in one replication, $\bar{Y}_{i\cdot}$, is across-replication data.
Across-replication data are independent of one another. If they are obtained by averaging
within-replication data, they are approximately normally distributed.
Within-replication data are usually not independent of one another.

Output Analysis for Terminating


Simulations
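Across replications:
For example: the daily cycle time averages (discrete-time data)
The average:
$$\bar{Y}_{\cdot\cdot} = \frac{1}{R} \sum_{i=1}^{R} \bar{Y}_{i\cdot}$$
The sample variance:
$$S^2 = \frac{1}{R-1} \sum_{i=1}^{R} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$
The confidence-interval half-width:
$$H = t_{\alpha/2, R-1} \frac{S}{\sqrt{R}}$$

Within a replication:
For example: the WIP (continuous-time data)
The average:
$$\bar{Y}_{i\cdot} = \frac{1}{T_{E_i}} \int_0^{T_{E_i}} Y_i(t)\,dt$$
The sample variance:
$$S_i^2 = \frac{1}{T_{E_i}} \int_0^{T_{E_i}} \left( Y_i(t) - \bar{Y}_{i\cdot} \right)^2 dt$$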

Output Analysis Example

For the call center, 4 replications are conducted. The estimated average waiting time in queue
and average number in queue for each replication:

Replication   Average waiting time in queue (min)   Average number in queue
1             0.88                                  0.68
2             5.04                                  4.18
3             4.13                                  3.26
4             0.52                                  0.34

Output Analysis Example - contd


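Average waiting time in queue:
$$\bar{Y}_{\cdot\cdot} = \frac{0.88 + 5.04 + 4.13 + 0.52}{4} = 2.64$$
$$S^2 = \frac{(0.88 - 2.64)^2 + \cdots + (0.52 - 2.64)^2}{4 - 1} = (2.28)^2$$
$$H = t_{0.025,\,4-1} \frac{S}{\sqrt{4}} = 3.63$$
95% C.I.: $2.64 \pm 3.63$ min

Since a negative average waiting time in queue is impossible, we have made
far too few replications!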
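
The arithmetic of this example can be checked with a short script. It is a sketch using only the Python standard library; the critical value t_{0.025,3} = 3.182 is copied from a standard t table rather than computed, and the variable names are chosen here for illustration.

```python
import math
import statistics

# Average waiting time in queue (min) from the 4 call-center replications.
waits = [0.88, 5.04, 4.13, 0.52]
T_CRIT = 3.182  # t_{0.025, 3}, copied from a standard t table

r = len(waits)
y_bar = statistics.mean(waits)        # across-replication average
s = statistics.stdev(waits)           # sample standard deviation (divisor R - 1)
half_width = T_CRIT * s / math.sqrt(r)

lower, upper = y_bar - half_width, y_bar + half_width
print(f"mean={y_bar:.2f}  S={s:.2f}  H={half_width:.2f}  CI=[{lower:.2f}, {upper:.2f}]")
```

The lower limit comes out negative, which is the signal that four replications are not enough for this measure.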

C.I. with Specified Precision


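The half-length H of a 100(1 - $\alpha$)% confidence interval for a mean $\theta$, based on
the t distribution, is given by:
$$H = t_{\alpha/2, R-1} \frac{S}{\sqrt{R}}$$
where R is the number of replications and $S^2$ is the sample variance.

Suppose that an error criterion $\epsilon$ is specified with probability 1 - $\alpha$; a sufficiently
large sample size should satisfy:
$$P\left( |\bar{Y}_{\cdot\cdot} - \theta| < \epsilon \right) \ge 1 - \alpha$$

Assume that an initial sample of $R_0$ independent replications has been
observed ($R_0 \ge 2$). Obtain an initial estimate $S_0^2$ of the population variance $\sigma^2$.
Then choose the sample size R such that $R \ge R_0$ and
$$H = t_{\alpha/2, R-1} \frac{S_0}{\sqrt{R}} \le \epsilon$$
Since $t_{\alpha/2, R-1} \ge z_{\alpha/2}$, an initial estimate of R is:
$$R \ge \left( \frac{z_{\alpha/2} S_0}{\epsilon} \right)^2$$
R is the smallest integer satisfying $R \ge R_0$ and
$$R \ge \left( \frac{t_{\alpha/2, R-1} S_0}{\epsilon} \right)^2$$
Collect $R - R_0$ additional observations.
The 100(1 - $\alpha$)% C.I. for $\theta$:
$$\bar{Y}_{\cdot\cdot} \pm t_{\alpha/2, R-1} \frac{S}{\sqrt{R}}$$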

C.I. with Specified Precision


Call center example: estimate the agents' utilization $\rho$.
An initial sample of size $R_0 = 4$ is taken, and an initial estimate of the population
variance is $S_0^2 = (0.072)^2 = 0.00518$.
The error criterion is $\epsilon = 0.04$ and the confidence coefficient is $1 - \alpha = 0.95$; hence,
the final sample size must be at least:
$$\left( \frac{z_{0.025} S_0}{\epsilon} \right)^2 = \frac{1.96^2 \times 0.00518}{0.04^2} = 12.44$$
For the final sample size, R must satisfy:
$$R \ge \left( \frac{t_{0.025, R-1} S_0}{\epsilon} \right)^2$$
R = 15 is the smallest integer that satisfies this inequality, so $R - R_0 = 11$ additional replications are needed.
After obtaining the additional outputs, the half-width should be checked.
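
The search for the smallest adequate R can be written as a short loop. The following is a sketch under stated assumptions: it handles only the 95% level, the function name is made up here, and the t critical values are an embedded excerpt of a standard t table (degrees of freedom 1 to 20) rather than computed values.

```python
import math
from statistics import NormalDist

# t_{0.025, df} for df = 1..20, copied from a standard t table.
T_025 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086]


def required_replications(s0_sq, eps, r0):
    """Smallest R >= R0 with R >= (t_{0.025, R-1} * S0 / eps)^2 (95% level only)."""
    z = NormalDist().inv_cdf(0.975)                      # z_{0.025}, about 1.96
    r = max(r0, math.ceil(z * z * s0_sq / eps ** 2))     # initial estimate from z
    while r - 1 <= len(T_025):
        t = T_025[r - 2]                                 # list index df - 1, with df = R - 1
        if r >= t * t * s0_sq / eps ** 2:
            return r
        r += 1
    raise ValueError("R exceeds the range of the embedded t table")


# Call center example: S0^2 = 0.00518, eps = 0.04, R0 = 4.
total = required_replications(0.00518, 0.04, 4)
additional = total - 4
```

Starting from the z-based lower bound avoids testing values of R that cannot possibly satisfy the t-based inequality.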

Quantiles
A proportion or probability is treated as a special case of a mean.
When the number of independent replications $Y_1, \ldots, Y_R$ is large
enough that $t_{\alpha/2, R-1} \approx z_{\alpha/2}$, the confidence interval for a probability p
is often written as:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{R - 1}}$$
where $\hat{p}$ is the sample proportion.

Quantile estimation is the inverse of the probability
estimation problem:
p is given
Find $\theta$ such that $\Pr(Y \le \theta) = p$

Quantiles
The best way is to sort the outputs and use the (R*p)th smallest
value, i.e., find $\theta$ such that 100p% of the data in a histogram of Y
is to the left of $\theta$.
Example: If we have R = 10 replications and we want the p = 0.8 quantile,
first sort, then estimate $\theta$ by the (10)(0.8) = 8th smallest value (round if
necessary).
Sorted data:
5.6
7.1
8.8
8.9
9.5
9.7
10.1
12.2   <- point estimate
12.5
12.9

Quantiles

Confidence interval of quantiles: an approximate 100(1 - $\alpha$)%
confidence interval for $\theta$ can be obtained by finding two values
$\theta_l$ and $\theta_u$.

$\theta_l$ cuts off $100 p_l$% of the histogram (the $R p_l$-th smallest value of the
sorted data).

$\theta_u$ cuts off $100 p_u$% of the histogram (the $R p_u$-th smallest value of the
sorted data),

where
$$p_l = p - z_{\alpha/2} \sqrt{\frac{p(1 - p)}{R - 1}}, \qquad p_u = p + z_{\alpha/2} \sqrt{\frac{p(1 - p)}{R - 1}}$$

Quantiles

Example: Suppose R = 1000 replications are made to estimate the p = 0.8 quantile
with a 95% confidence interval.
First, sort the data from smallest to largest.
Then estimate $\theta$ by the (1000)(0.8) = 800th smallest value; the
point estimate is 212.03.
(Table: a portion of the 1000 sorted values.)
Then find the confidence interval:
$$p_l = 0.8 - 1.96 \sqrt{\frac{0.8(1 - 0.8)}{1000 - 1}} = 0.78, \qquad p_u = 0.8 + 1.96 \sqrt{\frac{0.8(1 - 0.8)}{1000 - 1}} = 0.82$$
The C.I. is given by the 780th and 820th smallest values:
95% C.I.: [188.96, 256.79]
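
Both quantile steps, the point estimate and the cut-off probabilities for the interval, can be sketched as follows. The function names are illustrative, and implementing "round if necessary" with Python's built-in `round` is an assumption made here, since the slides do not say which rounding rule to use.

```python
import math
from statistics import NormalDist


def quantile_estimate(data, p):
    """Point estimate: the (R*p)-th smallest value, rounding the rank if needed."""
    ordered = sorted(data)
    rank = min(len(ordered), max(1, round(len(ordered) * p)))
    return ordered[rank - 1]


def quantile_ci_probabilities(r, p, confidence=0.95):
    """Return (p_l, p_u) = p -/+ z_{alpha/2} * sqrt(p(1-p)/(R-1))."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    delta = z * math.sqrt(p * (1.0 - p) / (r - 1))
    return p - delta, p + delta


# The 10 outputs from the R = 10 example, deliberately unsorted.
data = [9.5, 5.6, 12.9, 8.8, 7.1, 12.2, 10.1, 8.9, 12.5, 9.7]
theta_hat = quantile_estimate(data, 0.8)          # 8th smallest value
p_l, p_u = quantile_ci_probabilities(1000, 0.8)   # R = 1000 example
```

Note that the slides round $p_l$ and $p_u$ to two decimals (0.78 and 0.82) before picking the 780th and 820th values; using the unrounded probabilities would select ranks closer to 775 and 825.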

Output Analysis for Steady-State


Simulation

Consider a single run of a simulation model to estimate a steady-state or long-run characteristic of the system.
The single run produces observations $Y_1, Y_2, \ldots$ (generally the
samples of an autocorrelated time series).
Performance measure (note that n goes to infinity in the
equation):
$$\theta = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Y_i, \quad \text{for a discrete measure (with probability 1)}$$
$$\phi = \lim_{T_E \to \infty} \frac{1}{T_E} \int_0^{T_E} Y(t)\,dt, \quad \text{for a continuous measure (with probability 1)}$$
The performance measure is independent of the initial conditions.

Output Analysis for Steady-State


Simulation
The sample size is a design choice, with several considerations
in mind:
Any bias in the point estimator that is due to artificial or arbitrary initial
conditions (bias can be severe if run length is too short).
Desired precision of the point estimator.
Budget constraints on computer resources.

Notation: the estimation of $\theta$ from a discrete-time output
process.
One replication (or run), the output data: $Y_1, Y_2, Y_3, \ldots$
With several replications, the output data for replication r: $Y_{r1}, Y_{r2}, Y_{r3}, \ldots$

Steady-State Simulation
Initialization Bias

Methods to reduce the point-estimator bias caused by using artificial and
unrealistic initial conditions:
Intelligent initialization.
Divide simulation into an initialization phase and data-collection phase.

Intelligent initialization
Initialize the simulation in a state that is more representative of long-run conditions.

If the system exists, collect data on it and use these data to specify more nearly
typical initial conditions.
If the system can be simplified enough to make it mathematically solvable, e.g.
queueing models, solve the simplified model to find long-run expected or most
likely conditions, use that to initialize the simulation.

Divide each simulation into two phases:
An initialization phase, from time 0 to time $T_0$.
A data-collection phase, from $T_0$ to the stopping time $T_0 + T_E$.
The choice of $T_0$ is important:
After $T_0$, the system should be more nearly representative of steady-state behavior.
The system has reached steady state when the probability distribution of the system state is
close to the steady-state probability distribution.

Steady-State Simulation
Initialization Bias

M/G/1 queueing example: a total of 10 independent replications.

Each replication begins in the empty and idle state.
The simulation run length on each replication was $T_0 + T_E$ = 15,000 min.
Response variable: queue length, $L_Q(t, r)$ (at time t of the rth replication).
Batching intervals of 1,000 minutes give the batch means $Y_{rj}$.

Ensemble averages:
Used to identify a trend in the data due to initialization bias.
The average of the corresponding batch means across the R replications:
$$\bar{Y}_{\cdot j} = \frac{1}{R} \sum_{r=1}^{R} Y_{rj}$$
Plot the ensemble averages $\bar{Y}_{\cdot j}$ versus 1000j, for j = 1, 2, ..., 15.

Steady-State Simulation
Initialization Bias

Cumulative average sample mean (after deleting d observations):
$$\bar{Y}_{\cdot\cdot}(n, d) = \frac{1}{n - d} \sum_{j=d+1}^{n} \bar{Y}_{\cdot j}$$

There is no widely accepted technique to guide how much data to delete.
Plots can be helpful.
In the plot it is apparent that downward bias is
present and that this bias can be reduced
by deleting one or more
observations.
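
Ensemble averages and cumulative averages after deletion are simple to compute once the batch means are stored per replication. The sketch below uses made-up batch means for two replications started empty and idle; the function names and data are assumptions for illustration only.

```python
def ensemble_averages(y):
    """y[r][j] is batch mean j of replication r; returns the average over r for each j."""
    r = len(y)
    n = len(y[0])
    return [sum(rep[j] for rep in y) / r for j in range(n)]


def cumulative_average(ens, d):
    """Cumulative averages of ens[d:], i.e. after deleting the first d batch means."""
    out, total = [], 0.0
    for count, value in enumerate(ens[d:], start=1):
        total += value
        out.append(total / count)
    return out


# Hypothetical batch means for R = 2 replications, n = 4 batches each.
y = [[1.0, 4.0, 6.0, 7.0],
     [3.0, 6.0, 8.0, 7.0]]
ens = ensemble_averages(y)                       # one value per batch index j
no_deletion = cumulative_average(ens, 0)[-1]     # uses all 4 ensemble averages
delete_one = cumulative_average(ens, 1)[-1]      # drops the first, low-biased batch
```

With data that start low, as here, deleting the first batch raises the cumulative average, which is the downward-bias effect the plots are meant to reveal.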

Steady-State Simulation
Initialization Bias

There is no widely accepted, objective, and proven technique to
guide how much data to delete to reduce initialization
bias to a negligible level.
Plots can, at times, be misleading, but they are still
recommended.
Ensemble averages reveal a smoother and more precise trend as
the number of replications, R, increases.
The cumulative average becomes less variable as more data are
averaged.
The more correlation present, the longer it takes for $\bar{Y}_{\cdot j}$ to
approach steady state.
Different performance measures can approach steady state at
different rates.

Steady-State Simulation
Error Estimation
If $\{Y_1, \ldots, Y_n\}$ are not statistically independent, then $S^2/n$ is a
biased estimator of the true variance.
This is almost always the case when $\{Y_1, \ldots, Y_n\}$ is a sequence of output
observations from within a single replication (an autocorrelated sequence, or
time series).

Suppose the point estimator is the sample mean
$$\bar{Y} = \sum_{i=1}^{n} Y_i / n$$
The variance of $\bar{Y}$ is almost impossible to estimate.
A system with a steady state produces an output process that is
approximately covariance stationary (after passing the transient phase).
The covariance between two random variables in the time series
depends only on the lag (the number of observations between them).

Steady-State Simulation
Error Estimation
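For a covariance stationary time series, $\{Y_1, \ldots, Y_n\}$:
Lag-k autocovariance is:
$$\gamma_k = \mathrm{cov}(Y_1, Y_{1+k}) = \mathrm{cov}(Y_i, Y_{i+k})$$
Lag-k autocorrelation is:
$$\rho_k = \frac{\gamma_k}{\sigma^2}$$
If a time series is covariance stationary, then the variance of $\bar{Y}$ is:
$$V(\bar{Y}) = \frac{\sigma^2}{n} \left[ 1 + 2 \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k \right]$$
The expected value of the variance estimator is:
$$E\left( \frac{S^2}{n} \right) = B\,V(\bar{Y}), \quad \text{where } B = \frac{n/c - 1}{n - 1}$$
and c denotes the bracketed quantity in $V(\bar{Y})$.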

Steady-State Simulation
Error Estimation

Stationary time series $Y_i$ exhibiting positive autocorrelation.
Stationary time series $Y_i$ exhibiting negative autocorrelation.
Nonstationary time series with an upward trend.

Steady-State Simulation
Error Estimation
The expected value of the variance estimator is:
$$E\left( \frac{S^2}{n} \right) = B\,V(\bar{Y}), \quad \text{where } B = \frac{n/c - 1}{n - 1} \text{ and } V(\bar{Y}) \text{ is the variance of } \bar{Y}$$

If the $Y_i$ are independent, then $S^2/n$ is an unbiased estimator of $V(\bar{Y})$.
If the autocorrelations $\rho_k$ are primarily positive, then $S^2/n$ is biased low as
an estimator of $V(\bar{Y})$.
If the autocorrelations $\rho_k$ are primarily negative, then $S^2/n$ is biased high
as an estimator of $V(\bar{Y})$.
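
The direction of the bias can be checked numerically. The sketch below takes c to be the factor $1 + 2\sum_{k=1}^{n-1}(1 - k/n)\rho_k$ that multiplies $\sigma^2/n$ in the variance of the sample mean (the usual definition in this derivation), and uses geometrically decaying autocorrelations as a purely hypothetical example.

```python
def variance_inflation(rho, n):
    """c = 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k), for an autocorrelation function rho."""
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho(k) for k in range(1, n))


def bias_factor(rho, n):
    """B = (n/c - 1) / (n - 1), so that E[S^2/n] = B * V(Ybar)."""
    c = variance_inflation(rho, n)
    return (n / c - 1.0) / (n - 1.0)


n = 100
b_independent = bias_factor(lambda k: 0.0, n)        # c = 1, so B = 1 (unbiased)
b_positive = bias_factor(lambda k: 0.8 ** k, n)      # c > 1, so B < 1 (biased low)
b_negative = bias_factor(lambda k: (-0.5) ** k, n)   # c < 1, so B > 1 (biased high)
```

A value of B well below 1 means that the naive standard error understates the true variability, so confidence intervals built from it are too narrow.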

Steady-State Simulation
Replication Method
Used to estimate point-estimator variability and to construct a
confidence interval.
Approach: make R replications, initializing and deleting from each
one in the same way.
It is important to do a thorough job of investigating the initial-condition bias:
Bias is not affected by the number of replications; it is affected only
by deleting more data (i.e., increasing $T_0$) or extending the length of each
run (i.e., increasing $T_E$).

The basic raw output data $\{Y_{rj}, r = 1, \ldots, R; j = 1, \ldots, n\}$ can be any of the following:
An individual observation from within replication r (e.g., the delay of customer j in
replication r).
A batch mean from within replication r of some number of discrete-time
observations.
A batch mean of a continuous-time process over time interval j.

Steady-State Simulation
Replication Method
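Each replication is regarded as a single sample for estimating $\theta$.
For replication r:
$$\bar{Y}_{r\cdot}(n, d) = \frac{1}{n - d} \sum_{j=d+1}^{n} Y_{rj}$$
The overall point estimator:
$$\bar{Y}_{\cdot\cdot}(n, d) = \frac{1}{R} \sum_{r=1}^{R} \bar{Y}_{r\cdot}(n, d) \quad \text{and} \quad E[\bar{Y}_{\cdot\cdot}(n, d)] = \theta_{n,d}$$
If d and n are chosen sufficiently large, $\theta_{n,d} \approx \theta$, and
$\bar{Y}_{\cdot\cdot}(n, d)$ is an approximately unbiased estimator of $\theta$.
To estimate the standard error of $\bar{Y}_{\cdot\cdot}$, compute the sample variance and
standard error:
$$S^2 = \frac{1}{R-1} \sum_{r=1}^{R} (\bar{Y}_{r\cdot} - \bar{Y}_{\cdot\cdot})^2 = \frac{1}{R-1} \left( \sum_{r=1}^{R} \bar{Y}_{r\cdot}^2 - R \bar{Y}_{\cdot\cdot}^2 \right) \quad \text{and} \quad \mathrm{s.e.}(\bar{Y}_{\cdot\cdot}) = \frac{S}{\sqrt{R}}$$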

Steady-State Simulation
Replication Method
Length of each replication (n) beyond the deletion point (d) should be
at least 10 times d:
(n - d) > 10d
The number of replications (R) should be as many as time permits,
up to about 25 replications.
For a fixed total sample size (n), as fewer data are deleted (smaller d):
The C.I. shifts: greater bias.
The standard error of $\bar{Y}_{\cdot\cdot}(n, d)$ decreases: decreased variance.

Trade-off: reducing bias versus increasing variance.
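
The replication method reduces to truncating each replication, averaging what is left, and treating the R averages as an independent sample. A minimal sketch, with a hypothetical function name and made-up batch means:

```python
import math
import statistics


def replication_estimate(y, d):
    """y[r][j] are the n outputs of replication r; the first d of each are deleted.

    Returns (point estimate, standard error) computed across replications.
    """
    rep_means = [statistics.mean(rep[d:]) for rep in y]                 # Ybar_r.(n, d)
    y_bar = statistics.mean(rep_means)                                  # Ybar_..(n, d)
    std_err = statistics.stdev(rep_means) / math.sqrt(len(rep_means))   # S / sqrt(R)
    return y_bar, std_err


# Hypothetical batch means from R = 3 replications, n = 5, deleting d = 1.
y = [[0.0, 4.0, 6.0, 5.0, 5.0],
     [1.0, 7.0, 8.0, 9.0, 8.0],
     [0.0, 5.0, 6.0, 7.0, 6.0]]
estimate, std_err = replication_estimate(y, d=1)
```

Rerunning with a smaller d on the same data shows the trade-off directly: the estimate moves toward the biased early values while the standard error changes with the amount of data averaged.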

Steady-State Simulation
Replication Method
Table 11.8, p. 416
As this table shows, for a fixed number of data points (n), reducing the amount of deleted data
(d):
pulls the mean downward (i.e., biases it);
reduces the standard error of the computed mean, S/sqrt(R), because the variance
decreases as the number of data points increases.

Steady-State Simulation
Replication Method
M/G/1 queueing example:
Suppose R = 10 replications, each of total length $T_0 + T_E$ = 15,000 minutes, starting at time 0 in
the empty and idle state and initialized for $T_0$ = 2,000 minutes before data
collection begins.
Each batch mean is the average number of customers in queue for a
1,000-minute interval.
The first two batch means are deleted (d = 2).
The point estimator and standard error are:
$$\bar{Y}_{\cdot\cdot}(15, 2) = 8.43 \quad \text{and} \quad \mathrm{s.e.}(\bar{Y}_{\cdot\cdot}(15, 2)) = 1.59$$
The 95% C.I. for the long-run mean queue length is:
$$\bar{Y}_{\cdot\cdot} - t_{\alpha/2, R-1} S/\sqrt{R} \le \theta \le \bar{Y}_{\cdot\cdot} + t_{\alpha/2, R-1} S/\sqrt{R}$$
$$8.43 - 2.26(1.59) \le L_Q \le 8.43 + 2.26(1.59)$$

There is a high degree of confidence that the long-run mean queue length is
between 4.84 and 12.02 (if d and n are large enough).

Steady-State Simulation
Sample Size
An alternative to increasing R is to increase the total run length
$T_0 + T_E$ within each replication.
Approach:
Increase the run length from $(T_0 + T_E)$ to $(R/R_0)(T_0 + T_E)$.
Delete the data from time 0 to time $(R/R_0) T_0$.

Advantage: any residual bias in the point estimator should be further
reduced.
However, it is necessary to have saved the state of the model at time
$T_0 + T_E$ and to be able to restart the model.

Steady-State Simulation
Sample Size
Example 11.17: In the M/G/1 queue example of Table 11.8, $R_0 = 10$ replications were made, and with d = 2
observations deleted the variance estimate was $S_0^2 = 25.30$. We are asked to estimate the mean queue
length to within $\pm 2$ customers with 90% confidence. How many replications are needed?
Instead of making all the required replications from scratch, what would you suggest?

$$R \ge \left( \frac{z_{\alpha/2} S_0}{\epsilon} \right)^2 = \frac{1.645^2 (25.30)}{2^2} \approx 17.1, \quad \text{so } R \ge 18$$

R = 18?  $(t_{\alpha/2, R-1} S_0 / \epsilon)^2 = 19.15 > 18$: no!
R = 19?  $(t_{\alpha/2, R-1} S_0 / \epsilon)^2 = 18.93 \le 19$: yes, so 19 replications are enough.

Instead of making 9 more replications starting from time zero, we extend the existing replications
from where they ended. Each existing replication is extended to $(R/R_0)(T_0 + T_E)$ = (19/10)(15,000) = 28,500 minutes.
In every replication the first $(R/R_0) T_0$ = 1.9 * 2,000 = 3,800 minutes are deleted,
and the output between minutes 3,800 and 28,500 is used.
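
The run-extension arithmetic is a simple scaling by $R/R_0$. The helper below is a hypothetical illustration of that scaling; the function and variable names are not from the slides.

```python
def extend_runs(r_needed, r0, t0, total_length):
    """Scale run length and deletion point by R/R0 instead of adding replications.

    Returns (new total run length, new deletion time), both in the input time unit.
    """
    new_length = r_needed * total_length / r0    # (R/R0)(T0 + TE)
    new_deletion = r_needed * t0 / r0            # (R/R0) T0
    return new_length, new_deletion


# Values from the example: R = 19, R0 = 10, T0 = 2,000 min, T0 + TE = 15,000 min.
new_length, new_deletion = extend_runs(19, 10, 2000, 15000)
usable = new_length - new_deletion               # minutes of data kept per replication
```

Multiplying before dividing keeps the intermediate values exact for integer inputs like these.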

Steady-State Simulation
Batch Means

A lot of simulation time is wasted by deleting d data points from each
replication. Instead, consider making one long simulation run and deleting the
data for only one initialization period.
Using a single, long replication:
Problem: the data are dependent, so the usual estimator is biased.
Solution: batch means.

Batch means: divide the output data from one replication (after appropriate
deletion) into a few large batches and then treat the means of these batches
as if they were independent.
A continuous-time process, $\{Y(t), T_0 \le t \le T_0 + T_E\}$:
k batches of size $m = T_E / k$, batch means:
$$\bar{Y}_j = \frac{1}{m} \int_{(j-1)m}^{jm} Y(t + T_0)\,dt$$

A discrete-time process, $\{Y_i, i = d+1, d+2, \ldots, n\}$:
k batches of size $m = (n - d)/k$, batch means:
$$\bar{Y}_j = \frac{1}{m} \sum_{i=(j-1)m+1}^{jm} Y_{i+d}$$

Steady-State Simulation
Batch Means
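$$\underbrace{Y_1, \ldots, Y_d}_{\text{deleted}},\ \underbrace{Y_{d+1}, \ldots, Y_{d+m}}_{\bar{Y}_1},\ \underbrace{Y_{d+m+1}, \ldots, Y_{d+2m}}_{\bar{Y}_2},\ \ldots,\ \underbrace{Y_{d+(k-1)m+1}, \ldots, Y_{d+km}}_{\bar{Y}_k}$$

Starting with either continuous-time or discrete-time data, the variance of the
sample mean is estimated by:
$$\frac{S^2}{k} = \frac{1}{k} \sum_{j=1}^{k} \frac{(\bar{Y}_j - \bar{Y})^2}{k - 1} = \frac{\sum_{j=1}^{k} \bar{Y}_j^2 - k \bar{Y}^2}{k(k - 1)}$$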

If the batch size is sufficiently large, successive batch means will be
approximately independent, and the variance estimator will be
approximately unbiased.
There is no widely accepted and relatively simple method for choosing an acceptable
batch size m. Some simulation software does it automatically.
It is important to choose batch sizes such that the lag-1 correlation between the
batch means is under 0.2.
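
The batch-means procedure for discrete-time data, together with the lag-1 check, can be sketched as below. The data are made up, the function names are illustrative, and the lag-1 estimator shown is one common sample formula chosen here as an assumption, since the slides do not specify one.

```python
import statistics


def batch_means(data, d, k):
    """Delete the first d observations, then return the means of k equal-size batches."""
    kept = data[d:]
    m = len(kept) // k                     # batch size; leftover observations are dropped
    return [statistics.mean(kept[j * m:(j + 1) * m]) for j in range(k)]


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of the sequence x."""
    mean = statistics.mean(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


# Hypothetical output sequence: 2 warm-up values followed by 12 observations.
data = [0, 1, 4, 6, 5, 7, 8, 6, 3, 5, 4, 6, 6, 9]
means = batch_means(data, d=2, k=4)                       # 4 batches of size 3
var_of_mean = statistics.variance(means) / len(means)     # S^2 / k
rho1 = lag1_autocorrelation(means)                        # want this below 0.2
```

If `rho1` comes out above the 0.2 guideline, the usual remedy is to use fewer, larger batches and recompute.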
