
PROBABILITY & STATISTICAL INFERENCE

LECTURE 6

MSc in Computing (Data Analytics)

Lecture Outline
Quick Recap
Testing the difference between two sample means
Practical Hypothesis Testing
Analysis Of Variance
General Steps in Hypothesis Testing
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, $H_0$.
3. Specify an appropriate alternative hypothesis, $H_1$.
4. Choose a significance level, $\alpha$.
5. Determine an appropriate test statistic.
6. State the rejection region for the statistic.
7. Compute any necessary sample quantities, substitute these into the
equation for the test statistic, and compute that value.
8. Decide whether or not $H_0$ should be rejected and report that in the
problem context.

Types of questions that can be answered with two-sample
hypothesis tests
A manufacturing plant wants to compare the
defective rates of items coming off two different
process lines.
A clinical study asks whether the test results of patients who received a
drug are better than those of patients who
received a placebo.
Is there a significant (rather than merely random) difference in the
average cycle time to deliver a pizza from Pizza
Company A vs. Pizza Company B?
Difference in Means of Two Normal Distributions, Variances
Known
Test Assumptions
Example
Example
Example
The P-value is the observed significance level of a statistical test; that
is, the probability, computed assuming the null hypothesis is true, of
obtaining a value of the test statistic that is at least as extreme as the
one actually observed.
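The P-value definition above can be sketched in Python for the known-variance case. This is a minimal illustration, assuming a two-sided alternative and the usual z statistic for a difference in means; the function name `two_sample_z_test` and the summary numbers are hypothetical and do not come from the lecture examples.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Return (z0, two-sided P-value) for H0: mu1 - mu2 = delta0, variances known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z0 = (mean1 - mean2 - delta0) / standard_error
    # Probability of a statistic at least as extreme as z0 when H0 is true
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical summary statistics (illustration only)
z0, p_value = two_sample_z_test(
    mean1=50.2, mean2=48.1, sigma1=3.0, sigma2=4.0, n1=25, n2=36
)
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
```

For a one-sided alternative, the factor of 2 would be dropped and the tail chosen to match $H_1$.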
Confidence Interval on a Difference in Means, Variances
Known
Example
Example
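As a rough sketch of the interval named on this slide, the code below computes a two-sided confidence interval for $\mu_1 - \mu_2$ when both variances are known. The function name `diff_means_ci` and the input numbers are hypothetical; only the standard normal-theory interval is assumed.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(mean1, mean2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Two-sided CI for mu1 - mu2 with known standard deviations."""
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - z_crit * standard_error, diff + z_crit * standard_error


# Hypothetical summary statistics (illustration only)
lower, upper = diff_means_ci(50.2, 48.1, 3.0, 4.0, 25, 36)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, the corresponding two-sided test at the same level rejects $H_0: \mu_1 = \mu_2$.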
Difference in Means of Two Normal Distributions,
Variances unknown
We wish to test:
The pooled estimator of $\sigma^2$:
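The pooled estimator and the resulting t statistic can be sketched as follows. The sketch assumes the usual textbook form, $S_p^2 = [(n_1-1)S_1^2 + (n_2-1)S_2^2]/(n_1+n_2-2)$, and equal population variances; the function name is hypothetical, and the six data values are the ones that reappear in the ANOVA section of this lecture.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2, delta0=0.0):
    """Return (pooled variance, t0, degrees of freedom) for H0: mu1 - mu2 = delta0."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t0 = (mean(sample1) - mean(sample2) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, t0, df


# Small illustrative samples
sp_sq, t0, df = pooled_t_statistic([14, 12, 10], [6, 4, 2])
print(f"pooled variance = {sp_sq}, t0 = {t0:.4f}, df = {df}")
```

The statistic is compared with a t distribution on `df` degrees of freedom; obtaining the P-value needs a t-distribution routine from a statistics package, which is left out here.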
Difference in Means of Two Normal Distributions,
Variances unknown
Example
Example
Example
Confidence Interval on the Difference in Means, Variance
Unknown
Example
Example
Example
Practical Hypothesis Testing
1. From the problem context, identify the parameter of
interest.
2. State the null hypothesis, $H_0$.
3. Specify an appropriate alternative hypothesis, $H_1$.
4. Choose a significance level, $\alpha$.
5. Calculate the P-value using a software package of choice.
6. Decide whether or not $H_0$ should be rejected and report
that in the problem context. Reject $H_0$ when the P-value is less
than $\alpha$.
(Golden rule: reject $H_0$ for small P-values.)
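The decision rule in step 6 is simple enough to write down directly. The helper below is a hypothetical illustration (the name `decide` and the default level of 0.05 are assumptions), taking a P-value that has already been computed by whatever software is used in step 5.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is less than alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.012))             # small P-value, default alpha = 0.05
print(decide(0.20))              # large P-value
print(decide(0.03, alpha=0.01))  # same rule with a stricter level
```

Note that the alternative to rejecting is "fail to reject", not "accept": a large P-value does not show that $H_0$ is true.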

Some Research
Look up the correct formula for the
hypothesis test on the difference between two proportions.
What are the assumptions of the test?
Find an example of the test in use.
Answer to research
Large-sample test on the difference in population
proportions
Example
Example of large-sample test on the difference in
population proportions
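A minimal sketch of the large-sample test, assuming the usual pooled-proportion z statistic under $H_0: p_1 = p_2$ and a two-sided alternative. The counts are hypothetical, and the normal approximation is only reasonable when both samples are large.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z0, two-sided P-value) for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical counts: 30 defectives out of 200 vs. 18 out of 200
z0, p_value = two_proportion_z_test(30, 200, 18, 200)
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
```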

Analysis of Variance
Introduction
In the previous section we were concerned with the
analysis of data where we compared two sample
means.
Frequently data contain more than two samples,
for example when several treatments are compared.
In this lecture we introduce a statistical analysis that
allows us to compare the means of more than two
samples. The method is called Analysis of Variance,
or ANOVA for short.
Total Sum of Squares
Data set:
14, 12, 10, 6, 4, 2
Group A:
6, 4, 2
Group B:
14, 12, 10
Overall Mean: 8
Total Sum of Squares:
$SS_T = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$
Between Group Variation
Sum of Squares of the
Model:
$SS_M = n_a(\bar{x} - \bar{x}_a)^2 + n_b(\bar{x} - \bar{x}_b)^2$
$= 3(8-4)^2 + 3(8-12)^2$
$= 96$
Within Group Variation
Sum of Squares of the
Error:

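$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2$
$= 16$
Where: $x_{ij}$ is observation $j$ in group $i$, $\bar{x}_i$ is the mean of group $i$, $a$ is the number of groups and $n_i$ is the size of group $i$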
Structure of the Data
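| Group | Observations | Total | Mean |
|-------|--------------|-------|------|
| 1 | $x_{11}, x_{12}, \ldots, x_{1n_1}$ | $x_{1\cdot}$ | $\bar{x}_1$ |
| 2 | $x_{21}, x_{22}, \ldots, x_{2n_2}$ | $x_{2\cdot}$ | $\bar{x}_2$ |
| ... | ... | ... | ... |
| a | $x_{a1}, x_{a2}, \ldots, x_{an_a}$ | $x_{a\cdot}$ | $\bar{x}_a$ |
| Total | | $x_{\cdot\cdot}$ | $\bar{x}$ |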
ANOVA Table
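| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|--------|--------------------|----------------|-------------|--------|
| Model | $a - 1$ | $SS_M = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$ | $MS_M = SS_M/(a-1)$ | $MS_M / MS_E$ |
| Error | $n - a$ | $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ | $MS_E = SS_E/(n-a)$ | |
| Total | $n - 1$ | $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$ | $SS_T/(n-1)$ | |

Where: $n$ is the total sample size, $a$ is the number of groups, $n_i$ is the size of group $i$, $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the overall mean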
ANOVA Table Original Example
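| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|--------|--------------------|----------------|-------------|--------|
| Model | 2 - 1 = 1 | 96 | 96 | 24 |
| Error | 6 - 2 = 4 | 16 | 4 | |
| Total | 6 - 1 = 5 | 112 | | |

Where: n is the sample size and a is the number of groups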
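The entries of this table can be reproduced from the six data values given earlier. The following is a small sketch in plain Python rather than a general ANOVA routine; the variable names are my own, and the P-value is omitted because it needs an F-distribution function from a statistics package.

```python
from statistics import mean

# Data from the lecture's example
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}

all_obs = [x for obs in groups.values() for x in obs]
grand_mean = mean(all_obs)
n = len(all_obs)   # total sample size
a = len(groups)    # number of groups

# Sums of squares: total, model (between groups), error (within groups)
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
ss_model = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
ss_error = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)

# Mean squares and the F statistic
ms_model = ss_model / (a - 1)
ms_error = ss_error / (n - a)
f_stat = ms_model / ms_error

print(f"SS_T = {ss_total}, SS_M = {ss_model}, SS_E = {ss_error}")
print(f"MS_M = {ms_model}, MS_E = {ms_error}, F = {f_stat}")
```

With only two groups, the F statistic equals the square of the pooled two-sample t statistic, so the two tests agree.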
Model Assumptions
Independence of observations within and between
samples
Normality of the sampling distribution
Equal variances (also called the
homoscedasticity assumption)
The ANOVA Equation
We can describe the observations in the above
table using the following equation:

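$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n_i$
Where: $\mu$ is the overall mean, $\tau_i$ is the effect of group $i$, $\varepsilon_{ij}$ is the random error, $a$ is the number of groups, $n_i$ is the number of observations in group $i$ and $n = \sum_i n_i$ is the total sample size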
ANOVA Hypotheses
We wish to test the hypotheses:
The analysis of variance partitions the total variability
into two parts.
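Written in the notation of the ANOVA table, the two parts are the model (between-group) and error (within-group) sums of squares. As a check, the small two-group example from earlier in the lecture satisfies the identity exactly:

```latex
\[
  SS_T = SS_M + SS_E
\]
\[
  \underbrace{112}_{SS_T} \;=\; \underbrace{96}_{SS_M} \;+\; \underbrace{16}_{SS_E}
\]
```

A large $SS_M$ relative to $SS_E$ (after dividing each by its degrees of freedom) is what produces a large F statistic.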
Example
Graphical Display of Data
Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of
the model in Equation 13-1 for the completely randomized single-factor
experiment
Example
We can use ANOVA to test the hypothesis that
different hardwood concentrations do not affect the
mean tensile strength of the paper. The hypotheses
are:



The ANOVA table is below:
Example
The p-value is less than 0.05, therefore $H_0$ can
be rejected and we can conclude that at least one
of the hardwood concentrations affects the mean
tensile strength of the paper.
Test Model Assumptions
Use Bartlett's test to check the homoscedasticity
assumption.
Bartlett's test (Snedecor and Cochran, 1983) is used
to test if k samples have equal variances.
Bartlett's test is sensitive to departures from
normality. That is, if your samples come from non-
normal distributions, then Bartlett's test may simply
be testing for non-normality. The Levene test is an
alternative to the Bartlett test that is less sensitive to
departures from normality.

Bartlett Test for Equal Variance
The hypotheses for the Bartlett test are as follows:

$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$
$H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$

Under $H_0$, the Bartlett test statistic approximately follows a chi-squared
distribution with $k - 1$ degrees of freedom.
Interpret the p-value as in any other hypothesis test.
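To make the test concrete, here is a hedged sketch that computes Bartlett's statistic by hand using the standard formula (pooled variance, log-variance comparison, and the usual correction factor). The data are hypothetical. The P-value line uses a shortcut that is valid only because there are exactly three groups: a chi-squared variable with 2 degrees of freedom has survival function $e^{-t/2}$. For other values of $k$, a chi-squared routine from a statistics package would be needed.

```python
from math import exp, log
from statistics import variance


def bartlett_statistic(*samples):
    """Bartlett's test statistic for equality of variances across k samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variances, must be > 0
    total = sum(sizes)
    pooled = sum((n_i - 1) * v for n_i, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * log(pooled) - sum(
        (n_i - 1) * log(v) for n_i, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n_i - 1) for n_i in sizes) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical data: three groups of five observations
g1 = [10, 12, 11, 13, 14]
g2 = [20, 24, 16, 22, 18]
g3 = [30, 31, 29, 32, 28]

t_stat = bartlett_statistic(g1, g2, g3)
# Shortcut valid only for k = 3 (chi-squared with 2 degrees of freedom)
p_value = exp(-t_stat / 2)
print(f"Bartlett statistic = {t_stat:.4f}, P-value = {p_value:.4f}")
```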
If the Assumption of Equal Variance is
not met
If the assumption of equal variance is not met, use
Welch's ANOVA.
Assignment for next week:
Investigate the difference between the standard
ANOVA and Welch's ANOVA.
Demo

Confidence Interval about the mean
For 20% hardwood, the resulting confidence interval on the mean is
Confidence Interval on the difference of two treatment means
For the hardwood concentration example,
An Unbalanced Experiment
Multiple Comparisons Following the
ANOVA
The least significant difference (LSD) is
If the sample sizes are different in each treatment:
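For reference, Fisher's LSD in its usual textbook form, written in this lecture's notation ($n$ total observations, $a$ groups, $MS_E$ from the ANOVA table). The symbol $m$ for the common group size is introduced here and is an assumption, not taken from the slides.

```latex
% Equal group sizes: m observations in every treatment
\[
  \mathrm{LSD} = t_{\alpha/2,\; n-a} \sqrt{\frac{2\, MS_E}{m}}
\]
% Unequal group sizes: n_i and n_j observations in treatments i and j
\[
  \mathrm{LSD} = t_{\alpha/2,\; n-a} \sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
\]
```

Two treatment means are declared significantly different when $|\bar{x}_i - \bar{x}_j| > \mathrm{LSD}$.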
Example: Multi-comparison Test
Example: Multi-comparison Test
Demo

Exercises
