
Quantitative Techniques in Management-2
Dr. Gayatri V Singh, PhD
AMITY University, INDIA
Today’s Highlights
• Time-Series Analysis
• Probability
• Probability Distribution
• Sampling
• Sampling Distribution
• Hypothesis Testing
Time Series Analysis
 Understand time-series forecasting techniques
 Understand the four possible components of a time series
 Understand how to use regression models for trend analysis
 Understand the nature of autocorrelation
• Time-series data are composed of four elements: trend, cyclicality, seasonality and irregularity
• Trend: the long-term general direction of the data
• Cycles: highs and lows through which the data move over time periods, usually of more than a year
• Seasonality: shorter cycles, which usually occur in time periods of less than one year
• Irregularity: rapid changes in the data, which occur in even shorter time frames than seasonal effects
Stationary: data that contain no trend, cyclical or seasonal effects
Linear Regression Trend Analysis
• The response variable Y is the quantity being forecast
• The independent variable X is the time period
• Linear model:
  Y_i = β0 + β1 X_ti + ε_i
  where Y_i = data value for period i and X_ti = the ith time period
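As a minimal sketch (not part of the original slides), this trend model can be fitted by least squares in Python with NumPy; the twelve data values below are purely illustrative:

```python
import numpy as np

# Illustrative series with an upward trend (values are made up for this sketch)
y = np.array([112, 118, 121, 127, 130, 136, 141, 144, 150, 153, 159, 164], dtype=float)
x = np.arange(1, len(y) + 1)          # time periods X_t = 1, 2, ..., 12

# Least-squares estimates of the slope (beta1) and intercept (beta0)
beta1, beta0 = np.polyfit(x, y, deg=1)

# Trend forecast for the next period
forecast = beta0 + beta1 * (len(y) + 1)
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, forecast for period 13 = {forecast:.2f}")
```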
a. Checking for dependence
We will NOT assume that Y_t-1 is independent of Y_t.
Example: Is tomorrow's temperature independent of today's?
Suppose y_1, ..., y_T are the temperatures measured daily for several years. Which of the following two predictors would work better:
i. the average of the temperatures from the previous year
ii. the temperature on the previous day?
If the readings were iid N(µ, σ²), what would be your prediction for Y_T+1? (With iid readings, the overall sample mean is the natural prediction; dependence is what makes yesterday's value more informative.)
b. Checking for Independence
Independence: knowing Y_t does not help you in predicting Y_t+1.
It is not always easy just to look at the data and decide whether a time series is independent. So how can we tell?
Plot Y_t vs. Y_t-1 to check for a relationship,
or plot Y_t vs. Y_t-s for s = 1, 2, …
How do we do this in Minitab? Use the "lag" command:
MTB > lag c2 c3
MTB > lag c3 c4

C1   C2     C3       C4
t    Y(t)   Y(t-1)   Y(t-2)
1    5      *        *
2    8      5        *
3    1      8        5
4    3      1        8
5    9      3        1
6    4      9        3

Now each row has Y at time t, Y one period ago, and Y two periods ago.
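Outside Minitab, the same lagged columns can be built with pandas; a small sketch using the six values from the table above, where `shift` plays the role of the `lag` command:

```python
import pandas as pd

df = pd.DataFrame({"Y": [5, 8, 1, 3, 9, 4]})
df["Y_lag1"] = df["Y"].shift(1)   # Y one period ago (like MTB > lag c2 c3)
df["Y_lag2"] = df["Y"].shift(2)   # Y two periods ago (like MTB > lag c3 c4)
print(df)                         # missing leading values appear as NaN instead of *
```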
[Scatter plots of the series against its lags: Y vs. Y lagged once and Y lagged twice. Each point is a pair of adjacent years, e.g. (Level in 1929, Level in 1930).]
First, plotting Level_t vs. Level_t-1 gives Corr = .794.
Now, plotting Level_t vs. Level_t-2 gives Corr = .531.
Moving Averages
• A moving average is an average that is
updated or recomputed for every new time
period being considered.
• The most recent information is utilized in
each new moving average.
• Shown here are shipments (in millions of
dollars) for electric lighting and wiring
equipment over a 12-month period. Use
these data to compute a 4-month moving
average for all available months.
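The shipment figures themselves are not reproduced in this text, so the sketch below uses twelve made-up monthly values only to show the mechanics of a 4-month moving average in pandas:

```python
import pandas as pd

# Hypothetical monthly shipments (millions of dollars); stand-ins for the slide's data
shipments = pd.Series([1056, 1345, 1381, 1191, 1259, 1361,
                       1110, 1334, 1416, 1282, 1341, 1382])

# 4-month moving average: mean of the current month and the three preceding months
ma4 = shipments.rolling(window=4).mean()
print(ma4)   # the first three entries are NaN because four months are not yet available
```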
c. Autocorrelation
Time series is about dependence, and we use correlation as a measure of dependence.
Although we have only one variable, we can compute the correlation between Y_t and Y_t-1, or between Y_t and Y_t-2.
The correlations between Y's at different times are called autocorrelations (serial correlations).
However, we must assume that all the Y's have:
– the same mean (no upward or downward trends)
– the same variance
We will assume what is known as stationarity.
Roughly speaking this means:
– The time series varies about a fixed mean and
has constant variance
– The dependence between successive
observations does not change over time
Let's define the autocorrelations for a stationary time series.
ρ_s = cov(Y_t, Y_t-s) / √(Var(Y_t) × Var(Y_t-s)) = cov(Y_t, Y_t-s) / Var(Y_t)

Note that the autocorrelation does not depend on t, because we have assumed stationarity.
We estimate the theoretical quantities by using sample averages (as always).
The estimated or sample autocorrelations are:

r_s = Σ_{t=s+1..T} (Y_t − Ȳ)(Y_t-s − Ȳ) / Σ_{t=1..T} (Y_t − Ȳ)²
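A small sketch of computing these sample autocorrelations directly from the formula; the AR(1)-style series is simulated only for illustration:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag using the formula above."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    return np.array([np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom
                     for s in range(1, max_lag + 1)])

# Simulate a stationary, autocorrelated series (AR(1) with coefficient 0.7)
rng = np.random.default_rng(0)
e = rng.normal(size=200)
y = np.empty(200)
y[0] = e[0]
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + e[t]

print(sample_acf(y, max_lag=5))   # decays roughly like 0.7**s
```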
c. Autocorrelation
There is a strong
dependence
between
observations
spaced close
together in time
(e.g. only one or
two years apart).
As time passes,
the dependence
diminishes in
strength.
[Figure: autocorrelation function for the series 'ran', with 5% significance limits for the autocorrelations; lags 1 to 24 shown.]
In contrast to the ACF for the 'level' series, the sample autocorrelations are much smaller.
How do we know if the sample autocorrelations are
good estimates of the underlying theoretical
autocorrelations?
and
How do we know if we have enough sample
information to reach definitive conclusions?
If all the true autocorrelations are 0, then the standard deviation of
the sample autocorrelations is about 1/sqrt(T).
Std Err(r_s) = 1 / √T

where T = total number of observations (time periods).
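A self-contained sketch of this check on an independent (white-noise) series; nearly all sample autocorrelations should fall inside the ±2/√T limits:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=200)          # an illustrative independent ("white noise") series
T = len(y)
limit = 2 / np.sqrt(T)            # roughly two standard errors if all true autocorrelations are 0

ybar, denom = y.mean(), np.sum((y - y.mean()) ** 2)
for s in range(1, 11):
    r = np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom
    verdict = "outside" if abs(r) > limit else "inside"
    print(f"lag {s:2d}: r = {r:+.3f} ({verdict} the +/-{limit:.3f} limits)")
```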
Probability
• Structure of Probability
 Experiment: a process that produces outcomes
 Event: an outcome of an experiment
 Elementary events: events that cannot be decomposed or broken down into other events
 Sample space: a complete roster or listing of all elementary events of an experiment
 Mutually exclusive events: the occurrence of one event prevents the occurrence of the other event
 Independent events: the occurrence or nonoccurrence of one event does not affect the occurrence of the other
Unions (∪ ) and Intersections (∩ )
x = {1,4,7,9} and y = {2,3,4,5,6}
x ∪ y = {1,2,3,4,5,6,7,9}
x ∩ y = {4}
Selecting n items from N without replacement: the number of possibilities is
NCn = N! / (n! (N−n)!)
Selecting n items from N with replacement: the number of possibilities is
N^n
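A quick Python check of the two counting rules (N = 52 and n = 5 are just example values):

```python
from math import comb

N, n = 52, 5
without_replacement = comb(N, n)   # N! / (n! * (N - n)!)
with_replacement = N ** n          # N^n ordered selections with replacement
print(without_replacement, with_replacement)   # 2598960 and 380204032
```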
Probability = (number of favourable outcomes) / (total number of possible outcomes)
Probabilities
• Addition Theorem
P(X ∪ Y) = P(X) + P(Y) – P(X ∩ Y)
What is the probability of getting an ace or a red card from a pack of cards?
P(X) = 4/52 (aces)
P(Y) = 26/52 (red cards)
P(X ∩ Y) = 2/52 (the two red aces)
P(X ∪ Y) = 4/52 + 26/52 – 2/52 = 28/52 = 7/13
• Multiplicative Theorem
P(X ∩ Y) = P(X).P(Y/X) = P(Y).P(X/Y)
X ∩ Y : both the events must happen
Since 46% of the labour force is women, P(W) = 0.46. P(T/W) is the probability that a worker is a part-time worker given that the worker is a woman, i.e. P(T/W) = 0.25.
Probability that a randomly chosen worker is a woman and works part time:
P(W ∩ T) = P(W).P(T/W) = (0.46)(0.25) = 0.115
Probability Distributions
Three-day forecast
• You have determined that there is a 40% probability of rain on each of three days.
• What is the probability that it will rain on 0, 1, 2 or 3 of the days?

Outcomes over Day 1, Day 2, Day 3 (N = no rain, R = rain), with days of rain:
p(N, N, N) = 0.216   (0 days)
p(N, N, R) = 0.144   (1 day)
p(N, R, N) = 0.144   (1 day)
p(N, R, R) = 0.096   (2 days)
p(R, N, N) = 0.144   (1 day)
p(R, N, R) = 0.096   (2 days)
p(R, R, N) = 0.096   (2 days)
p(R, R, R) = 0.064   (3 days)
Probability Distribution
Days of Rain Probability
0 0.216
1 0.432
2 0.288
3 0.064
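This distribution is simply a binomial with n = 3 and p = 0.4; a short sketch confirming the table:

```python
from math import comb

n, p = 3, 0.4
for x in range(n + 1):
    prob = comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"P({x} rainy days) = {prob:.3f}")
# 0.216, 0.432, 0.288, 0.064 -- matching the table above
```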
DISCRETE PROBABILITY DISTRIBUTION

Scores, x   Frequency, f   Probability, P(x) = f/n
1           24             24/150 = 0.16
2           33             33/150 = 0.22
3           42             42/150 = 0.28
4           30             30/150 = 0.20
5           21             21/150 = 0.14
Total (n) = 150

• Mean (µ) = Σ x P(x)
• Variance (σ²) = Σ (x − µ)² P(x) = [Σ x² P(x)] − µ²
Scores, x   P(x)    x − µ    (x − µ)²   P(x)(x − µ)²
1           0.16    −1.94    3.764      0.602
2           0.22    −0.94    0.884      0.194
3           0.28     0.06    0.004      0.001
4           0.20     1.06    1.124      0.225
5           0.14     2.06    4.244      0.594
            Σ P(x) = 1                  Σ P(x)(x − µ)² = 1.616

Mean (µ) = 2.94
Variance (σ²) = 1.616, so the standard deviation (σ) = √1.616 = 1.27
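A short Python check of these mean and variance calculations:

```python
x = [1, 2, 3, 4, 5]
p = [0.16, 0.22, 0.28, 0.20, 0.14]

mean = sum(xi * pi for xi, pi in zip(x, p))
variance = sum(pi * (xi - mean) ** 2 for xi, pi in zip(x, p))
print(f"mean = {mean:.2f}")                                             # 2.94
print(f"variance = {variance:.3f}, std dev = {variance ** 0.5:.2f}")    # 1.616 and 1.27
```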
BINOMIAL DISTRIBUTION
• In a binomial experiment, the probability of exactly x successes in n trials is
P(x) = nCx p^x q^(n−x)
n = number of times a trial is repeated
p = probability of success in a single trial
q = probability of failure in a single trial (q = 1 − p)
x = number of successes in n trials
EXAMPLE
• From a standard deck of cards, pick a card
• Note whether it is a club or not, and replace it
• Repeat the experiment 5 times, so n = 5
• S = getting a club, F = getting another suit
• p = ¼, q = ¾
• Random variable x = 0, 1, 2, 3, 4, 5
P(0) = 5C0 p^0 q^5 = 0.237
P(1) = 5C1 p^1 q^4 = 0.396
P(2) = 5C2 p^2 q^3 = 0.264
P(3) = 5C3 p^3 q^2 = 0.088
P(4) = 5C4 p^4 q^1 = 0.015
P(5) = 5C5 p^5 q^0 = 0.001
x    P(x)
0    0.237
1    0.396
2    0.264
3    0.088
4    0.015
5    0.001

MEAN = np
VARIANCE = npq
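A compact sketch reproducing this table and the mean/variance formulas for the example:

```python
from math import comb

n, p = 5, 0.25
q = 1 - p
for x in range(n + 1):
    print(f"P({x}) = {comb(n, x) * p**x * q**(n - x):.3f}")
print(f"mean = np = {n * p}, variance = npq = {n * p * q}")   # 1.25 and 0.9375
```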
Poisson Distribution
• The probability of exactly x occurrences in an interval is
P(x) = µ^x e^(−µ) / x!
• It models a specific number of occurrences within a given unit of time (or space).
Example: accidents per month occur with an average of 3. What is the probability that in any given month 4 accidents will occur?
P(4) = 3^4 e^(−3) / 4! = 0.168
Other examples of Poisson variables:
• Number of cases of a rare blood disease per 100,000 people
• Number of hazardous waste sites per county
• Number of printing errors in a 1,000-page book
• Number of times a tire blows on a commercial airplane per week
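A one-line check of the accident example, computed directly from the Poisson formula:

```python
from math import exp, factorial

mu, x = 3, 4
print(f"P({x} accidents) = {mu**x * exp(-mu) / factorial(x):.3f}")   # 0.168
```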
NORMAL DISTRIBUTION
• It is a symmetrical distribution
• The normal curve is bell shaped
• Mean, median and mode are equal
• Total area under the normal curve = 1
• The areas under the probability density function (pdf) follow the 68% / 95% / 99.7% empirical rule:
68% of the observations fall within 1 standard deviation of the mean, that is, between µ − σ and µ + σ.
95% of the observations fall within 2 standard deviations of the mean, that is, between µ − 2σ and µ + 2σ.
99.7% of the observations fall within 3 standard deviations of the mean, that is, between µ − 3σ and µ + 3σ.
Example
The distribution of heights of women aged 18 to 24 is approximately normal with mean 65.5 inches and standard deviation 2.5 inches.
From the above rule, it follows that 68% of these women have heights between 65.5 − 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches, and
95% of these women have heights between 65.5 − 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches.
You can check these figures with the sketch below.
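A minimal sketch confirming the empirical-rule percentages for this example with scipy.stats.norm (scipy is assumed to be available):

```python
from scipy.stats import norm

mu, sigma = 65.5, 2.5
within_1sd = norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma)
within_2sd = norm.cdf(mu + 2 * sigma, mu, sigma) - norm.cdf(mu - 2 * sigma, mu, sigma)
print(f"P(63 < height < 68)     = {within_1sd:.3f}")   # about 0.683
print(f"P(60.5 < height < 70.5) = {within_2sd:.3f}")   # about 0.954
```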
Sampling
• Reason for sampling
 The sample can save money
 The sample can save time
 For given resources, the sample can broaden
the scope of the study
 Because the research process is sometimes
destructive, the sample can save product.
 If accessing the population is impossible, the
sample is the only option
Steps in Sampling process
• Define the population
• Frame the population
• Choose a sample design
• Draw the sample
• Execute the research
Designs
• A probability sample is one in which each
element of the population has a known
non-zero probability of selection.
• It is not a probability sample if some elements of the population cannot be selected (i.e. have zero probability of selection).
• It is not a probability sample if the probabilities of selection are not known.
Simple Random
Sampling
• Each element in the population
has an equal probability of
selection AND each combination
of elements has an equal
probability of selection
• Names drawn out of a hat
• Random numbers to select
elements from an ordered list
Stratified Sampling
• Divide population into groups
that differ in important ways
• Basis for grouping must be known
before sampling
• Select random sample from
within each group
• For a given sample size, reduces
error compared to simple random
sampling IF the groups are
different from each other
• Tradeoff between the cost of
doing the stratification and
smaller sample size needed for
same error
• Probabilities of selection may be
different for different groups, as
long as they are known
Systematic Random Sampling
• Each element has an equal probability of selection, but combinations of elements have different probabilities.
• Population size N, desired sample size n, sampling interval k = N/n.
• Randomly select a number j between 1 and k, sample element j and then every kth element thereafter: j + k, j + 2k, etc.
• Example: N = 60, n = 15, k = 60/15 = 4.
• Random number j = 3, so the sample is elements 3, 7, 11, …, 59 (see the sketch below).
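A tiny sketch of drawing this systematic sample in Python:

```python
import random

N, n = 60, 15
k = N // n                    # sampling interval k = N/n = 4
j = random.randint(1, k)      # random start between 1 and k (the slide uses j = 3)
sample = list(range(j, N + 1, k))
print(f"start j = {j}, sample = {sample}")   # 15 elements: j, j+k, j+2k, ...
```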
Cluster Sampling
• This is a form of random sampling
• The population is divided into groups, usually geographic or organizational (clusters are internally heterogeneous)
• Some of the groups are randomly chosen
• For a given sample size, a cluster sample has more error than a simple random sample
• The cost savings of clustering may permit a larger sample
• Error is smaller if the clusters are similar to each other
Difference Between Cluster and Stratified Sampling
Stratified: the population consists of L strata, and stratum l contains n_l units; take a simple random sample in every stratum.
Cluster: the population consists of C clusters; take a simple random sample of clusters and sample every unit in the chosen clusters.
Non-Probability Sampling Methods
• Judgment samples are chosen by the judgment of the researcher
• Quota samples are based on selecting objects until you have a certain number (the quota) of each type
  – Appeals to the idea of a "representative" sample
  – Can produce substantial bias
  – Still widely used (especially for telephone surveys with high non-response levels)
• Convenience samples are obtained by choosing the easiest objects available
  – E.g. the first ten people to walk out of a store
Sampling
Distributions
Typically, we are interested in learning about some
numerical feature of the population, such as
• the proportion possessing a stated characteristic;
• the mean and the standard deviation.
A numerical feature of a population is called a
parameter.
A statistic is a numerical valued function of the
sample observations.
Sample mean is an example of a statistic.
The sampling distribution of a statistic
Three important points about a statistic:
• the numerical value of a statistic cannot be expected to
give us the exact value of the parameter;
• the observed value of a statistic depends on the
particular sample that happens to be selected;
• there will be some variability in the values of a statistic
over different occasions of sampling.
Because any statistic varies from sample to sample, it is
a random variable and has its own probability
distribution.
The probability distribution of a statistic is called its
sampling distribution.
Sampling Distributions
• Suppose there is a population
• The random variable X is the age (in years) of the individuals
• Values of X: 18, 20, 22, 24
• If you want to select two persons from this population, what are the possible samples?
Summary measures for the population distribution:

µ = Σ X_i / N = (18 + 20 + 22 + 24) / 4 = 21

σ = √[ Σ (X_i − µ)² / N ] = 2.236

[Figure: the population distribution of X is uniform, with P(X) = .25 for each of the four individuals A (18), B (20), C (22), D (24).]
All Possible Samples of Size n = 2 (taken with replacement): 16 samples

1st \ 2nd Obs   18      20      22      24
18              18,18   18,20   18,22   18,24
20              20,18   20,20   20,22   20,24
22              22,18   22,20   22,22   22,24
24              24,18   24,20   24,22   24,24

The corresponding 16 sample means:

1st \ 2nd Obs   18   20   22   24
18              18   19   20   21
20              19   20   21   22
22              20   21   22   23
24              21   22   23   24
Sampling Distribution of All 16 Sample Means

Sample mean    18     19     20     21     22     23     24
P(X̄)           1/16   2/16   3/16   4/16   3/16   2/16   1/16

(# in each sample = 2; # of samples in the sampling distribution = 16. The distribution of the sample means is peaked at 21, in contrast to the flat population distribution.)
Distribution of the sample mean
Statistical inference about the population mean is of
prime practical importance. Inferences about this
parameter are based on the sample mean and its
sampling distribution.
Comparing the Population with its Sampling Distribution

Population (N = 4): µ = 21, σ = 2.236
Sample means (n = 2): µ_x̄ = 21, σ_x̄ = 1.58

[Figure: the population distribution is uniform over 18, 20, 22, 24, while the distribution of the 16 sample means is concentrated around 21.]
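A brief sketch that enumerates all 16 samples of size 2 and reproduces these summary figures:

```python
import numpy as np
from itertools import product

population = [18, 20, 22, 24]
means = [(a + b) / 2 for a, b in product(population, repeat=2)]   # 16 sample means

mu, sigma = np.mean(population), np.std(population)               # population parameters
mu_xbar, sigma_xbar = np.mean(means), np.std(means)               # sampling distribution of the mean
print(f"population:   mu = {mu}, sigma = {sigma:.3f}")            # 21, 2.236
print(f"sample means: mu = {mu_xbar}, sigma = {sigma_xbar:.3f}")  # 21, 1.581
```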
Hypothesis Testing
• Make statement(s) regarding unknown population
parameter values based on sample data
• Elements of a hypothesis test:
– Null hypothesis - Statement regarding the value(s) of
unknown parameter(s). Typically will imply no association
between explanatory and response variables in our
applications (will always contain an equality)
– Alternative hypothesis - Statement contradictory to the
null hypothesis (will always contain an inequality)
– Test statistic - Quantity based on sample data and null
hypothesis used to test between null and alternative
hypotheses
– Rejection region - Values of the test statistic for which we
reject the null in favor of the alternative hypothesis
The Null Hypothesis, H0
• States the Assumption (numerical) to be tested
e.g. The grade point average of juniors is 3.0 (H0:
µ = 3.0)
• Begin with the assumption that the null
hypothesis is TRUE.
• Always contains the '=' sign
The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis
  e.g. The grade point average of juniors is not equal to, or is less than, 3.0 (H1: µ ≠ 3.0) or (H1: µ < 3.0)
• Never contains the '=' sign
• May or may not be accepted
• Is generally the hypothesis that the researcher believes to be true
Hypothesis Testing

                         Test result: H0 True     Test result: H0 False
True state: H0 True      Correct decision         Type I error
True state: H0 False     Type II error            Correct decision

α = P(Type I error),  β = P(Type II error)
Level of Significance, α
• Defines Unlikely Values of Sample Statistic
if Null Hypothesis Is True
– Called Rejection Region of Sampling
Distribution
• Designated α (alpha)
– Typical values are 0.01, 0.05, 0.10
• Selected by the Researcher at the Start
• Provides the Critical Value(s) of the Test
Steps of Hypothesis Testing
• Identify the data
• Formulate the hypotheses
• Determine the degrees of freedom
• Specify the level of significance
• Calculate the test statistic
• Find the table value of the test statistic
• State the result
• Interpret the result
If n ≥ 30, it is a large sample and you apply the Z test.
If n < 30, it is a small sample and you apply the t test.

One-sample t:            t = (X̄ − µ) / (S / √n)
Paired-samples t:        t = d̄ / (S_d / √n)
Independent-samples t:   t = [(X̄1 − X̄2) − (µ1 − µ2)] / s_x̄1−x̄2,  where s_x̄1−x̄2 = √(s1²/n1 + s2²/n2)
Summary of t-statistics:

One-sample t-statistic
  Sample data: X̄;  hypothesized population parameter: µ
  Sample variance: s² = SS/df;  estimated standard error: s_x̄ = √(s²/n)
  t = (X̄ − µ) / s_x̄

Paired-samples t-statistic
  Sample data: D̄;  hypothesized population parameter: µ_D
  Sample variance: s² = SS/df;  estimated standard error: s_D̄ = √(s²/n)
  t = (D̄ − µ_D) / s_D̄

Independent-samples t-statistic
  Sample data: X̄1 − X̄2;  hypothesized population parameter: µ1 − µ2
  Pooled variance: s²_p = (SS1 + SS2) / (df1 + df2)
  Estimated standard error: s_x̄1−x̄2 = √(s²_p/n1 + s²_p/n2)
  t = [(X̄1 − X̄2) − (µ1 − µ2)] / s_x̄1−x̄2
Steps for Calculating a Test Statistic: One-Sample t
1. Calculate the sample mean
2. Calculate the standard error
3. Calculate t and d.f.
4. Use Table D
Steps for Calculating a Test Statistic: Independent-Samples t
1. Calculate X̄1 − X̄2
2. Calculate the pooled variance: s²_p = (SS1 + SS2) / (df1 + df2)
3. Calculate the standard error: s_x̄1−x̄2 = √(s²_p/n1 + s²_p/n2)
4. Calculate t and d.f.:  t = [(X̄1 − X̄2) − (µ1 − µ2)] / s_x̄1−x̄2,  d.f. = (n1 − 1) + (n2 − 1)
5. Use Table E.6
Illustration
A developmental psychologist would like to examine the difference in verbal skills for 8-year-old boys versus 8-year-old girls. A sample of 10 boys and 10 girls is obtained, and each child is given a standardized verbal abilities test. The data for this experiment are as follows:

          Girls        Boys
n         n1 = 10      n2 = 10
Mean      X̄1 = 37      X̄2 = 31
SS        SS1 = 150    SS2 = 210
STEP 1: Get the mean difference
X̄1 − X̄2 = 37 − 31 = 6
STEP 2: Compute the pooled variance
s²_p = (SS1 + SS2) / (df1 + df2) = (150 + 210) / ((10 − 1) + (10 − 1)) = 360 / 18 = 20
STEP 3: Compute the standard error
s_x̄1−x̄2 = √(s²_p/n1 + s²_p/n2) = √(20/10 + 20/10) = √4 = 2
STEP 4: Compute the t statistic and d.f.
t = [(X̄1 − X̄2) − (µ1 − µ2)] / s_x̄1−x̄2 = ((37 − 31) − 0) / 2 = 3
d.f. = (n1 − 1) + (n2 − 1) = (10 − 1) + (10 − 1) = 18
STEP 5: Use Table E.6
t = 3 with 18 degrees of freedom.
For alpha = .01, the critical value of t is 2.552.
The calculated t (3) is more extreme than the tabulated value (2.552), so we reject the null.
There is a significant difference in verbal skills between boys and girls.
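A compact sketch reproducing this worked example in Python, starting from the summary statistics on the slides (the raw scores are not given):

```python
from math import sqrt

# Summary statistics from the illustration
n1, n2 = 10, 10
mean1, mean2 = 37, 31        # girls, boys
ss1, ss2 = 150, 210          # sums of squared deviations

df = (n1 - 1) + (n2 - 1)                              # 18
pooled_var = (ss1 + ss2) / df                         # 20
std_err = sqrt(pooled_var / n1 + pooled_var / n2)     # 2
t = (mean1 - mean2) / std_err                         # 3
print(f"t = {t:.2f} with df = {df}")

critical = 2.552   # tabulated critical value for alpha = .01 quoted on the slide
print("reject H0" if abs(t) > critical else "fail to reject H0")
```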