Techniques in Management-2
Dr. Gayatri V Singh, PhD
AMITY University, INDIA
Today’s Highlights
• Time-Series Analysis
• Probability
• Probability Distribution
• Sampling
• Sampling Distribution
• Hypothesis Testing
Time Series Analysis
• Understand time-series forecasting techniques
• Understand the four possible components of time-series data
• Understand how to use regression models for trend analysis
• Understand the nature of autocorrelation
• Time-series data are composed of four elements: trend, cyclicality, seasonality, and irregularity
• Trend: long-term general direction of the data
• Cycles: highs and lows through which the data move over time periods, usually of more than a year
• Seasonal: shorter cycles, which usually occur in time periods of less than one year
• Irregularity: rapid changes in the data, which occur in even shorter time frames than seasonal effects
• Stationary: data that contain no trend, cyclical, or seasonal effects
Linear Regression Trend Analysis
• The response variable Y is being forecast
• The independent variable X is the time period
• Linear model: Yi = β0 + β1·Xti + εi
  where Yi = data value for period i, and Xti = the i-th time period
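The trend model above can be fit by ordinary least squares. A minimal sketch in Python; the data values here are hypothetical, for illustration only:

```python
# Fit a linear trend Y_i = b0 + b1 * X_ti by ordinary least squares,
# where X_ti is simply the time-period index 1..n.
def linear_trend(y):
    n = len(y)
    x = list(range(1, n + 1))                  # time periods 1..n
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))   # slope
    b0 = my - b1 * mx                          # intercept
    return b0, b1

# Hypothetical series of 6 periods; forecast the next period from the trend.
b0, b1 = linear_trend([10, 12, 15, 13, 17, 19])
forecast_7 = b0 + b1 * 7
```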
a. Checking for dependence
We will NOT assume that Yt-1 is independent of Yt.

t   Y(t)   Y(t-1)   Y(t-2)
1    5      *        *
2    8      5        *
3    1      8        5
4    3      1        8
5    9      3        1
6    4      9        3

Now each row has Y at time t, Y one period ago, and Y two periods ago.

Plotting Level(t) vs. Level(t-1): Corr = .794
Now, let's plot Level(t) vs. Level(t-2): Corr = .531
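The lag correlations can be computed directly. A sketch using the six-point toy series from the table above (the quoted .794 and .531 come from a longer "level" series not shown here, so this only demonstrates the computation):

```python
# Pearson correlation between Y(t) and Y(t-s), built from the lagged pairs
# exactly as in the table: each row pairs Y at time t with Y s periods ago.
def lagged_corr(y, s):
    a = y[s:]                  # Y at time t
    b = y[:-s]                 # Y at time t-s
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

r1 = lagged_corr([5, 8, 1, 3, 9, 4], 1)   # lag-1 correlation of the toy series
```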
Moving Averages
• A moving average is an average that is
updated or recomputed for every new time
period being considered.
• The most recent information is utilized in
each new moving average.
• Shown here are shipments (in millions of
dollars) for electric lighting and wiring
equipment over a 12-month period. Use
these data to compute a 4-month moving
average for all available months.
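A sketch of the 4-month moving average; the shipment figures below are hypothetical, since the slide's data table is not reproduced here:

```python
# A moving average is recomputed for every new period: the first value
# averages months 1-4, the next averages months 2-5, and so on.
def moving_average(data, window=4):
    return [sum(data[i - window:i]) / window
            for i in range(window, len(data) + 1)]

# Hypothetical monthly shipments (millions of dollars) over 12 months.
shipments = [1056, 1345, 1381, 1191, 1259, 1361,
             1110, 1334, 1416, 1282, 1341, 1382]
ma4 = moving_average(shipments)   # 9 moving averages for 12 months
```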
c. Autocorrelation
Time series is about dependence. We use correlation as
a measure of dependence.
r_s = [ Σ_{t=s+1..T} (Y_t − Ȳ)(Y_{t−s} − Ȳ) ] / [ Σ_{t=1..T} (Y_t − Ȳ)² ]
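The sample autocorrelation formula translates directly to code:

```python
# Sample autocorrelation at lag s:
# r_s = sum_{t=s+1..T} (Y_t - Ybar)(Y_{t-s} - Ybar) / sum_{t=1..T} (Y_t - Ybar)^2
def autocorr(y, s):
    T = len(y)
    ybar = sum(y) / T
    num = sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T))
    den = sum((yt - ybar) ** 2 for yt in y)
    return num / den

r1 = autocorr([5, 8, 1, 3, 9, 4], 1)   # lag-1 sample autocorrelation
```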
There is a strong dependence between observations spaced close together in time (e.g. only one or two years apart). As time passes, the dependence diminishes in strength.
[Figure: Autocorrelation function for a random series, with 5% significance limits, for lags 1-24. In contrast to the ACF for the 'level' series, the sample autocorrelations are much smaller.]
How do we know if the sample autocorrelations are
good estimates of the underlying theoretical
autocorrelations?
and
How do we know if we have enough sample
information to reach definitive conclusions?
If all the true autocorrelations are 0, then the standard deviation of the sample autocorrelations is about 1/sqrt(T):

Std Err(r_s) = 1 / √T
Tree-diagram outcome probabilities (the trailing number is the count of rainy days):
p( , , ) = 0.096 → 2
p( , , ) = 0.144 → 1
p( , , ) = 0.096 → 2
p( , , ) = 0.096 → 2
p( , , ) = 0.064 → 3
Probability Distribution
Days of Rain Probability
0 0.216
1 0.432
2 0.288
3 0.064
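The days-of-rain table matches a binomial distribution. The parameters n = 3 and p = 0.4 are inferred from the probabilities (0.6³ = 0.216, 0.4³ = 0.064, etc.), not stated on the slide:

```python
from math import comb

# Binomial pmf: P(x) = C(n, x) * p^x * (1-p)^(n-x).
# n = 3 and p = 0.4 are inferred from the table, not given explicitly.
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

dist = {x: round(binom_pmf(x, 3, 0.4), 3) for x in range(4)}
# dist == {0: 0.216, 1: 0.432, 2: 0.288, 3: 0.064}, matching the table
```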
DISCRETE PROBABILITY DISTRIBUTION
• Variance (σ²) = Σ (x − µ)² P(x) = [Σ x² P(x)] − µ²

Scores x   Probability P(x)   x − µ    (x − µ)²   P(x)(x − µ)²
1          0.16               -1.94    3.764      0.602
2          0.22               -0.94    0.884      0.194
3          0.28                0.06    0.004      0.001
4          0.20                1.06    1.124      0.225
5          0.14                2.06    4.244      0.594

Σ P(x) = 1;  Mean (µ) = 2.94;  Variance = Σ P(x)(x − µ)² = 1.616
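The mean and variance in the table can be checked in a few lines, including the shortcut formula σ² = [Σ x² P(x)] − µ²:

```python
# Discrete score distribution from the table above.
scores = {1: 0.16, 2: 0.22, 3: 0.28, 4: 0.20, 5: 0.14}

mu = sum(x * p for x, p in scores.items())               # mean = 2.94
var = sum(p * (x - mu) ** 2 for x, p in scores.items())  # definition form
var2 = sum(x * x * p for x, p in scores.items()) - mu ** 2  # shortcut form
# Both forms give the same variance, ~1.616
```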
x    P(x)
0    0.237
1    0.396
2    0.188
3    0.088
4    0.015
5    0.001

Mean = np
Variance = npq
Poisson Distribution
• The probability of exactly x occurrences in an interval is
  P(x) = µ^x e^−µ / x!
• Models a specific number of occurrences within a given unit of time.
• Example: accidents occur at an average of 3 per month. What is the probability that exactly 4 accidents occur in a given month?
  P(4) = 3⁴ e⁻³ / 4! = 0.168
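The accident example as code:

```python
from math import exp, factorial

# Poisson probability: P(x) = mu^x * e^(-mu) / x!
def poisson_pmf(x, mu):
    return mu ** x * exp(-mu) / factorial(x)

# Accidents average 3 per month; probability of exactly 4 in a month.
p4 = poisson_pmf(4, 3)   # ≈ 0.168, matching the slide
```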
• Number of cases of a rare blood disease per 100,000 people
• Number of hazardous waste sites per county
• Number of printing errors in a 1,000-page book
• Number of times a tire blows on a commercial airplane per week
NORMAL DISTRIBUTION
• It is a symmetrical distribution
• Normal curve is bell shaped
• Mean, median and mode are equal
• Total area under the normal curve = 1
• Probability density function (pdf): f(x) = (1 / (σ√(2π))) e^−(x−µ)²/(2σ²)

The empirical rule:
• 68% of the observations fall within 1 standard deviation of the mean, that is, between µ − σ and µ + σ
• 95% of the observations fall within 2 standard deviations of the mean, that is, between µ − 2σ and µ + 2σ
• 99.7% of the observations fall within 3 standard deviations of the mean, that is, between µ − 3σ and µ + 3σ
Example
The distribution of heights of women aged 18 to
24 is approximately normally distributed with
mean 65.5 inches and standard deviation 2.5
inches.
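A sketch applying the normal CDF to this example. The specific probabilities computed (e.g. P(height < 68)) are illustrative choices, since the slide states only the distribution:

```python
from math import erf, sqrt

# Normal CDF via the error function (no lookup table needed).
def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 65.5, 2.5                       # heights of women aged 18-24

# P(height < 68): one standard deviation above the mean, ≈ 0.8413
p_below_68 = normal_cdf(68, mu, sigma)

# P(63 < height < 68): within one sd of the mean, ≈ 68% by the empirical rule
p_within_1sd = normal_cdf(68, mu, sigma) - normal_cdf(63, mu, sigma)
```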
Stratified Sampling
• Divide population into groups
that differ in important ways
• Basis for grouping must be known
before sampling
• Select random sample from
within each group
• For a given sample size, reduces
error compared to simple random
sampling IF the groups are
different from each other
• Tradeoff between the cost of
doing the stratification and
smaller sample size needed for
same error
• Probabilities of selection may be
different for different groups, as
long as they are known
Systematic Random Sampling
• Each element has an equal probability of selection, but combinations of elements have different probabilities.
• Population size N, desired sample size n, sampling interval k = N/n.
• Randomly select a number j between 1 and k; sample element j and then every kth element thereafter: j + k, j + 2k, etc.
• Example: N = 60, n = 15, k = 60/15 = 4; random number j = 3.
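The slide's example (N = 60, n = 15, k = 4, random start j = 3) as code:

```python
# Systematic sample: starting element j, then every k-th element thereafter.
def systematic_sample(N, n, j):
    k = N // n                        # sampling interval k = N/n
    return list(range(j, N + 1, k))   # elements j, j+k, j+2k, ...

sample = systematic_sample(60, 15, 3)   # elements 3, 7, 11, ..., 59
```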
Cluster Sampling
• This is a form of random sampling
• Population is divided into groups,
usually geographic or
organizational (heterogeneous)
• Some of the groups are randomly
chosen
• For given sample size, a cluster
sample has more error than a
simple random sample
• Cost savings of clustering may
permit larger sample
• Error is smaller if the clusters are
similar to each other
Difference Between Cluster and Stratified Sampling
• Stratified: take a simple random sample in every stratum.
• Cluster: take an SRS of clusters, then sample every unit in the chosen clusters.
Non-Probability Sampling Methods
• Judgment samples are chosen by the judgment of the researcher
• Quota samples are based on selecting objects until
you have a certain number (the quota) of each type
– Appeals to idea of a “representative” sample
– Can produce substantial bias
– Still widely used (especially for telephone surveys with high
non-response levels)
• Convenience samples are obtained by choosing the
easiest objects available
– E.g. the first ten people to walk out of a store
Sampling Distributions
Typically, we are interested in learning about some
numerical feature of the population, such as
• the proportion possessing a stated characteristic;
• the mean and the standard deviation.
A numerical feature of a population is called a
parameter.
A statistic is a numerical valued function of the
sample observations.
Sample mean is an example of a statistic.
The sampling distribution of a statistic
Three important points about a statistic:
• the numerical value of a statistic cannot be expected to
give us the exact value of the parameter;
• the observed value of a statistic depends on the
particular sample that happens to be selected;
• there will be some variability in the values of a statistic
over different occasions of sampling.
Because any statistic varies from sample to sample, it is
a random variable and has its own probability
distribution.
The probability distribution of a statistic is called its
sampling distribution.
Sampling Distributions
• Suppose there is a population
• The random variable X is the age (in years) of the individuals
• Values of X: 18, 20, 22, 24
• If you select two persons from this population, what are the possible samples?
Summary Measures of the Population

µ = Σ Xi / N = (18 + 20 + 22 + 24) / 4 = 21

σ = √( Σ (Xi − µ)² / N ) = 2.236

[Figure: the population distribution P(X) is uniform — each of the four values A (18), B (20), C (22), D (24) has probability .25]
All Possible Samples of Size n = 2 (taken with replacement)

16 Samples:
1st Obs \ 2nd Obs     18      20      22      24
18                  18,18   18,20   18,22   18,24
20                  20,18   20,20   20,22   20,24
22                  22,18   22,20   22,22   22,24
24                  24,18   24,20   24,22   24,24

16 Sample Means:
1st Obs \ 2nd Obs     18   20   22   24
18                    18   19   20   21
20                    19   20   21   22
22                    20   21   22   23
24                    21   22   23   24
Sampling Distribution of All 16 Sample Means

X̄:      18    19    20    21    22    23    24
P(X̄):  1/16  2/16  3/16  4/16  3/16  2/16  1/16

[Figure: the uniform population distribution over 18, 20, 22, 24 versus the triangular sampling distribution of the sample mean X̄]
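The 16-sample enumeration above can be reproduced directly, confirming that the mean of the sampling distribution equals the population mean µ = 21:

```python
from itertools import product
from collections import Counter

# All 16 samples of size 2 (with replacement) from the population {18, 20, 22, 24},
# and the resulting sampling distribution of the sample mean.
pop = [18, 20, 22, 24]
means = [(a + b) / 2 for a, b in product(pop, repeat=2)]
dist = Counter(means)                    # frequency of each sample mean

mean_of_means = sum(means) / len(means)  # equals the population mean, 21
```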
Hypothesis Testing
• Make statement(s) regarding unknown population
parameter values based on sample data
• Elements of a hypothesis test:
– Null hypothesis - Statement regarding the value(s) of
unknown parameter(s). Typically will imply no association
between explanatory and response variables in our
applications (will always contain an equality)
– Alternative hypothesis - Statement contradictory to the
null hypothesis (will always contain an inequality)
– Test statistic - Quantity based on sample data and null
hypothesis used to test between null and alternative
hypotheses
– Rejection region - Values of the test statistic for which we
reject the null in favor of the alternative hypothesis
The Null Hypothesis, H0
• States the assumption (numerical) to be tested
  e.g. The grade point average of juniors is 3.0 (H0: µ = 3.0)
• Begin with the assumption that the null hypothesis is TRUE
• Always contains the '=' sign
The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis
  e.g. The grade point average of juniors is not 3.0 (H1: µ ≠ 3.0) or is less than 3.0 (H1: µ < 3.0)
• Never contains the '=' sign
• May or may not be accepted
• Is generally the hypothesis that the researcher believes to be true
Hypothesis Testing

                     Test Result
True State       H0 True            H0 False
H0 True          Correct Decision   Type I Error
H0 False         Type II Error      Correct Decision
One-sample t:           t = (X̄ − µ) / (s / √n)

Paired-samples t:       t = (D̄ − µ_D) / (s_D / √n)

Independent-samples t:  t = [ (X̄1 − X̄2) − (µ1 − µ2) ] / s_{X̄1−X̄2}

where  s_{X̄1−X̄2} = √( s1²/n1 + s2²/n2 )
Sample data, hypothesized population parameter, sample variance, estimated standard error, and t-statistic for the three tests:

• One-sample t-statistic: sample data X̄; parameter µ; variance s² = SS/df; standard error s_X̄ = √(s²/n); t = (X̄ − µ) / s_X̄
• Paired-samples t-statistic: sample data D̄; parameter µ_D; variance s² = SS_D/df; standard error s_D̄ = √(s²/n); t = (D̄ − µ_D) / s_D̄
• Independent-samples t-statistic: sample data X̄1 − X̄2; parameter µ1 − µ2; pooled variance s_p² = (SS1 + SS2) / (df1 + df2); standard error s_{X̄1−X̄2} = √(s_p²/n1 + s_p²/n2); t = [ (X̄1 − X̄2) − (µ1 − µ2) ] / s_{X̄1−X̄2}
Steps for Calculating a Test Statistic (Independent Samples T)
1. Calculate X̄1 − X̄2
2. Calculate the pooled variance: s_p² = (SS1 + SS2) / (df1 + df2)
3. Calculate the standard error: s_{X̄1−X̄2} = √( s_p²/n1 + s_p²/n2 )
4. Calculate t = [ (X̄1 − X̄2) − (µ1 − µ2) ] / s_{X̄1−X̄2} and d.f. = (n1 − 1) + (n2 − 1)
5. Use Table E.6
Illustration
A developmental psychologist would like to examine the difference in verbal skills for 8-year-old boys versus 8-year-old girls. A sample of 10 boys and 10 girls is obtained, and each child is given a standardized verbal abilities test. The data for this experiment are as follows:

        Girls        Boys
n       n1 = 10      n2 = 10
Mean    X̄1 = 37      X̄2 = 31
SS      SS1 = 150    SS2 = 210

X̄1 − X̄2 = 6
Pooled variance:
s_p² = (SS1 + SS2) / (df1 + df2) = (150 + 210) / ((10 − 1) + (10 − 1)) = 360 / 18 = 20
Standard error:
s_{X̄1−X̄2} = √( s_p²/n1 + s_p²/n2 ) = √( 20/10 + 20/10 ) = √4 = 2
t = [ (X̄1 − X̄2) − (µ1 − µ2) ] / s_{X̄1−X̄2} = ( (37 − 31) − 0 ) / 2 = 3

d.f. = (n1 − 1) + (n2 − 1) = (10 − 1) + (10 − 1) = 18
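The whole illustration, steps 1-5, in a few lines:

```python
from math import sqrt

# Independent-samples t for the verbal-skills illustration:
# girls n1 = 10, mean 37, SS1 = 150; boys n2 = 10, mean 31, SS2 = 210.
n1, m1, ss1 = 10, 37, 150
n2, m2, ss2 = 10, 31, 210

sp2 = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))   # pooled variance = 360/18 = 20
se = sqrt(sp2 / n1 + sp2 / n2)              # standard error = sqrt(4) = 2
t = ((m1 - m2) - 0) / se                    # t = 6/2 = 3 (null: mu1 - mu2 = 0)
df = (n1 - 1) + (n2 - 1)                    # degrees of freedom = 18
```

With t = 3 and 18 degrees of freedom, the observed difference would then be compared against the critical value from the t table (Table E.6 on the earlier slide).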