Tests
Karl F. Gauss (1777-1855)
Abraham de Moivre (1667-1754)
NORMAL DISTRIBUTION
The normal (Gaussian) distribution is the best-known
probability distribution for continuous variables.
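$$f(x_i) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}$$
where \mu is the mean and \sigma is the standard deviation.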
The normal distribution is completely defined
by the mean and standard deviation of a set of
quantitative data:
The mean determines the location of the
curve on the x axis of a graph
The standard deviation determines the
spread of the curve and therefore its height
on the y axis: the smaller the standard
deviation, the taller and narrower the curve
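As a minimal sketch, the normal density with mean mu and standard deviation sigma can be evaluated directly. The function name normal_pdf and the example values (mean 74.0, standard deviations 7.5 and 15.0) are illustrative assumptions, and the result is cross-checked against Python's statistics.NormalDist.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative values only.
mu = 74.0
narrow = normal_pdf(mu, mu, 7.5)    # peak height with the smaller SD
wide = normal_pdf(mu, mu, 15.0)     # peak height with the larger SD

# Doubling the standard deviation halves the height of the peak.
print(narrow, wide)

# The hand-written formula agrees with the standard library.
assert math.isclose(normal_pdf(80.0, mu, 7.5), NormalDist(mu, 7.5).pdf(80.0))
```

Changing mu only slides the curve along the x axis; changing sigma rescales its width and peak.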
[Figure: histogram of frequency against mean heart rate (BPM), x axis from 55 to 90]
[Figure: normal curve of frequency of occurrence; the mode, median, and mean coincide at the center]
Quantitative variables may also have a
skewed distribution:
When distributions are skewed, they have
more extreme values in one direction than the
other, resulting in a long tail on one side of the
distribution.
The direction of the tail determines whether a
distribution is positively or negatively skewed.
A positively skewed distribution has a long tail
on the right, or positive side of the curve.
A negatively skewed distribution has the tail
on the left, or negative side of the curve.
For a normally distributed variable:
~68.3% of the observations lie within 1 standard deviation
of the mean (mean ± 1 SD)
~95.4% lie within 2 standard deviations of the mean (mean ± 2 SD)
~99.7% lie within 3 standard deviations of the mean (mean ± 3 SD)
[Figure: normal curve with the mode, median, and mean at the center; areas of 68.26%, 95.44%, and 99.74% lie within 1, 2, and 3 standard deviations of the mean]
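$$P(\mu - \sigma \le x \le \mu + \sigma) = 0.6826$$
$$P(\mu - 2\sigma \le x \le \mu + 2\sigma) = 0.9544$$
$$P(\mu - 3\sigma \le x \le \mu + 3\sigma) = 0.9974$$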
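These probabilities can be checked numerically. The sketch below uses the standard normal distribution from Python's statistics module; the helper name prob_within is an assumption made for illustration. The computed values agree with the figures above to about three decimal places. Small differences in the fourth decimal come from rounding in printed z tables.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """P(mu - k*sigma <= x <= mu + k*sigma) for any normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```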
For the heart rate data for 84 adults:

Sex HR   Sex HR   Sex HR   Sex HR   Sex HR   Sex HR   Sex HR
F   55   M   66   F   70   M   73   F   77   M   79   F   82
M   57   F   67   F   70   M   73   F   77   M   79   M   82
M   59   F   67   M   70   M   73   F   77   M   79   F   83
F   61   F   68   M   70   M   73   M   77   F   80   M   83
M   61   F   68   F   71   F   74   M   77   F   80   M   83
M   62   F   68   F   71   F   74   F   78   M   80   F   84
M   62   M   68   M   71   F   74   F   78   F   81   F   84
F   63   F   69   M   71   M   74   F   78   F   81   M   85
F   64   M   69   F   72   F   75   F   78   F   81   F   86
M   64   M   69   M   72   F   75   M   78   M   81   F   86
M   64   M   69   F   73   M   75   M   78   F   82   M   89
M   66   F   70   M   73   M   76   M   79   F   82   M   89

Mean HR = 74.0 bpm
SD = 7.5 bpm
Mean ± 1 SD = 74.0 ± 7.5 = 66.5-81.5 bpm
Mean ± 2 SD = 74.0 ± 15.0 = 59.0-89.0 bpm
Mean ± 3 SD = 74.0 ± 22.5 = 51.5-96.5 bpm

[Figure: histogram of the heart rate data, x axis from 55 to 90 bpm]
HR Data:
57/84 (67.9%) subjects are within mean ± 1 SD
82/84 (97.6%) are within mean ± 2 SD
84/84 (100%) are within mean ± 3 SD
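The counts above can be reproduced from the 84 heart rates. This is a sketch under two assumptions: the SD is the sample standard deviation (n - 1 in the denominator), and the intervals use the rounded summary values 74.0 and 7.5 bpm with both endpoints included. The helper name count_within is illustrative.

```python
from statistics import mean, stdev

# Heart rates (bpm) of the 84 adults, transcribed from the heart rate table.
heart_rates = [
    55, 57, 59, 61, 61, 62, 62, 63, 64, 64, 64, 66,
    66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70,
    70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 73, 73,
    73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 76,
    77, 77, 77, 77, 77, 78, 78, 78, 78, 78, 78, 79,
    79, 79, 79, 80, 80, 80, 81, 81, 81, 81, 82, 82,
    82, 82, 83, 83, 83, 84, 84, 85, 86, 86, 89, 89,
]

def count_within(values, center, half_width):
    """Count values in the closed interval [center - half_width, center + half_width]."""
    return sum(center - half_width <= v <= center + half_width for v in values)

n = len(heart_rates)
print(f"n = {n}, mean = {mean(heart_rates):.1f}, SD = {stdev(heart_rates):.1f}")

# Rounded summary values as quoted for the data set.
MEAN_HR, SD_HR = 74.0, 7.5
counts = [count_within(heart_rates, MEAN_HR, k * SD_HR) for k in (1, 2, 3)]
for k, inside in zip((1, 2, 3), counts):
    print(f"mean ± {k} SD: {inside}/{n} ({100 * inside / n:.1f}%)")
```

With unrounded values of the mean and SD the interval endpoints shift slightly, so observations that sit exactly on a boundary (59 and 89 bpm here) can fall outside the ± 2 SD range.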
[Figure: heart rate (bpm) plotted against subject number (0 to 84), with horizontal lines at the mean and at ±1 SD, ±2 SD, and ±3 SD]
The normal range in medical measurements is the central 95%
of the values for a reference population, and is usually determined
from large samples representative of the population.
The central 95% is approximately the mean ± 2 SD
BASICS OF HYPOTHESIS TESTING
An Example to Start with
An imaginary company called Chose
Your Baby provided a product called
Gender Choice
The claim is: "increase your chances of
having
a boy up to 85%,
a girl up to 80%"
Suppose we run an experiment with 100
couples who want to have baby girls,
and they all follow the Gender Choice
directions in the pink package
For the purpose of testing the claim of an
increased likelihood for girls, we will
assume that Gender Choice has no
effect
Using common sense and no formal
statistical methods, what should we
conclude about the assumption of no effect
from Gender Choice if 100 couples using
Gender Choice have 100 babies consisting
of
52 girls?
97 girls?
We normally expect around 50 girls in 100 births
52 girls is close to 50, so we should not
conclude that the Gender Choice product
is effective
If 100 couples used no special method,
the result of 52 girls could easily occur by
chance
The assumption of no effect from Gender
Choice appears to be correct
There isn't sufficient evidence to say that
Gender Choice is effective
97 girls in 100 births is extremely unlikely to
occur by chance
We could explain the occurrence of 97 girls
in one of two ways:
Either an extremely rare event has occurred by
chance, or
Gender Choice is effective.
The extremely low probability of getting 97
girls is strong evidence against the
assumption that Gender Choice has no
effect
It does appear to be effective
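The "could easily occur by chance" and "extremely unlikely" judgments can be given numbers. The sketch below assumes that, with no effect from Gender Choice, each of the 100 births is independently a girl with probability 0.5, so the number of girls follows a binomial distribution. The function name prob_at_least is illustrative.

```python
from math import comb

def prob_at_least(k, n=100, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more girls in n births."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_52 = prob_at_least(52)
p_97 = prob_at_least(97)
print(f"P(at least 52 girls) = {p_52:.3f}")
print(f"P(at least 97 girls) = {p_97:.3e}")
```

Under this assumption, 52 or more girls happens in more than a third of such experiments, while 97 or more girls has a probability far below one in a billion.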
What is a Hypothesis
The word "hypothesis" is just a slightly
technical or mathematical term for
"sentence" or "claim" or
"statement"
A statement that something is true
concerning the population
In statistics, a hypothesis is always
a statement about the value of one
or more population parameter(s)
Parameter vs Statistic
Parameter is
a summary value which in some way characterizes
the nature of the population in the variable
under study
If the measures are computed for data from a
population, they are called population parameters
Statistic is
a summary value calculated from a sample of
observations
If the measures are computed for data from a
sample, they are called sample statistics
                     POPULATION     SAMPLE
                     (Parameter)    (Statistic)
Number of cases      N              n
Standard Deviation   σ              S, sd
Arithmetic Mean      μ              X̄
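To make the distinction concrete, the sketch below builds a hypothetical population of 1000 made-up values, draws a random sample of 30, and computes the corresponding summary values. All numbers and variable names are assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 1000 made-up measurements.
population = [random.gauss(74.0, 7.5) for _ in range(1000)]
sample = random.sample(population, 30)   # n = 30 cases drawn at random

N, n = len(population), len(sample)
mu, sigma = mean(population), pstdev(population)   # population parameters
x_bar, s = mean(sample), stdev(sample)             # sample statistics

print(f"N = {N}: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"n = {n}: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The parameters are fixed for a given population, while the statistics change from one sample to the next.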
Hypothesis Test
is a process that uses sample
statistics to test a claim about
the value of a population
parameter
The purpose of hypothesis testing is
to determine whether there is
enough statistical evidence in
favor of a certain belief about a
population parameter
is a standard procedure for
More on Hypothesis
A null hypothesis H0 (read "H subzero" or "H naught") is a statistical hypothesis
that contains a statement of equality such as ≤, =, or ≥
An alternative hypothesis Ha (read "H sub-a") is the complement of the null hypothesis
Condition of
equality
H0: 99.95 % (Claim)
Complement of
the null
hypothesis
Example:
Write the claim as a mathematical sentence.
State the null and alternative hypotheses and
identify which represents the claim.
H0: p = 0.94 (Claim)    [condition of equality]
Ha: p ≠ 0.94            [complement of the null hypothesis]
One Sided vs Two Sided Ha
The null and alternative
hypotheses are complementary.
The two hypotheses together cover
all possible values that
the hypothesized parameter can
assume
Two-sided        One-sided        One-sided
H0: μ = μ0       H0: μ = μ0       H0: μ = μ0
Ha: μ ≠ μ0       Ha: μ > μ0       Ha: μ < μ0
Typical statistical hypotheses are:
>5 cm P0.65
..
2>2.00 1-2>0
                 Null Hypothesis
Decision         True                        False
Accept H0        Correct Decision (1 - α)    Type II Error (β)
Reject H0        Type I Error (α)            Correct Decision (1 - β)

α = P(commit a Type I error) = P(reject H0 given that H0 is true)
β = P(commit a Type II error) = P(accept H0 given that H0 is false)
We want to keep both α and β as small as
possible. The value of α is controlled by the
experimenter and is called the significance
level
Generally, with everything else held
constant, decreasing one type of error causes
the other to increase
Balance Between α and β
The only way to decrease both types of
error simultaneously is to increase the
sample size.
No matter what decision is reached,
there is always the risk of one of these
errors.
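The trade-off can be illustrated with a small calculation. The sketch below assumes a one-sided z-test of H0: mu = mu0 against Ha: mu > mu0 with known standard deviation. The function name type_ii_error and the true shift of 0.5 standard deviations are illustrative assumptions, not part of the material above.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def type_ii_error(alpha, n, effect=0.5):
    """P(Type II error) for a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0.

    effect is the true shift (mu - mu0) in units of the standard deviation;
    0.5 is an arbitrary illustrative value.
    """
    z_crit = Z.inv_cdf(1 - alpha)             # reject H0 when z > z_crit
    return Z.cdf(z_crit - effect * sqrt(n))   # P(fail to reject H0 | H0 is false)

for n in (10, 40):
    for alpha in (0.10, 0.05, 0.01):
        beta = type_ii_error(alpha, n)
        print(f"n = {n:2d}, alpha = {alpha:.2f}, beta = {beta:.3f}")
```

For a fixed sample size, lowering the Type I error probability raises the Type II error probability; raising the sample size lowers the Type II error probability at every significance level.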
Balance: choose the significance level α
as the largest risk of a type I error
that you are willing to tolerate.
Then employ a test
procedure that makes type II error as
small as possible while keeping
type I error no larger than the given α.
Significance Level
denoted by α
the probability that the test statistic
will fall in the critical region when the
null hypothesis is actually true.
common choices are 0.05, 0.01, and
0.10
p-value
A p-value, or probability value, is
the probability, computed assuming
the null hypothesis is true, of selecting a
sample at least as extreme as
the observed sample
is a measure of inconsistency
between the hypothesized
value under the null hypothesis
and the observed sample
It measures whether the test
statistic is likely or unlikely,
assuming H0 is true.
Small p-values suggest that the
null hypothesis is unlikely to be
true
The smaller it is, the more convincing is
the rejection of the null hypothesis.
It indicates the strength of evidence for
rejecting the null hypothesis H0
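As a sketch of how a p-value depends on the alternative hypothesis, the function below converts a test statistic z into a p-value. It assumes z follows a standard normal distribution when H0 is true. The function name p_value, the labels for the three alternatives, and the value z = 1.5 are illustrative assumptions.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """p-value of a standard normal test statistic z for the given alternative."""
    phi = NormalDist().cdf
    if alternative == "two-sided":   # Ha: parameter differs from the hypothesized value
        return 2 * (1 - phi(abs(z)))
    if alternative == "greater":     # Ha: parameter is larger than the hypothesized value
        return 1 - phi(z)
    if alternative == "less":        # Ha: parameter is smaller than the hypothesized value
        return phi(z)
    raise ValueError(f"unknown alternative: {alternative}")

z = 1.5  # illustrative test statistic
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

The same test statistic is stronger evidence against H0 for a one-sided alternative in the observed direction than for a two-sided alternative.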
In Other Words
A small p value indicates that the
observed result is unlikely by
chance (therefore statistically
significant) and provides evidence to
reject the null hypothesis
A large p value indicates that the
sample result is not unusual,
therefore not statistically significant - or
that it could easily occur by chance,
which tells us to NOT reject the null
hypothesis
We use our data to calculate the
probability of obtaining a finding at least
as extreme as ours by chance alone,
assuming the null hypothesis is true.
This is called the p value.
If the p value is small enough, we
will reject the null hypothesis and
conclude there is a difference.
How small is small enough?
How Small???
A decision as to whether H0
should be rejected
results from comparing the p
value to the chosen significance
level α:
H0 should be rejected if p-value < α
H0 should not be rejected if p-value > α
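The decision rule can be sketched end to end on the Gender Choice example. The code below assumes a one-sided test of H0: p = 0.5 against Ha: p > 0.5 and uses a normal approximation to the binomial (a one-proportion z-test), which is a technique chosen here for illustration. The function name and the default significance level of 0.05 are also assumptions.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_test(successes, n, p0=0.5, alpha=0.05):
    """Test H0: p = p0 against Ha: p > p0 using a normal approximation."""
    p_hat = successes / n                          # sample proportion
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # test statistic
    p_value = 1 - NormalDist().cdf(z)              # upper-tail probability
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    return z, p_value, decision

for girls in (52, 97):
    z, p, decision = one_sided_proportion_test(girls, 100)
    print(f"{girls} girls: z = {z:.2f}, p-value = {p:.4f} -> {decision}")
```

With 52 girls the p-value is well above 0.05, so H0 is not rejected; with 97 girls the p-value is effectively zero and H0 is rejected.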
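The total area under the normal distribution curve is
1:
90% of the area is within ±1.645 standard deviations of the mean
95% of the area is within ±1.960 standard deviations of the mean
99% of the area is within ±2.575 standard deviations of the mean
[Figure: normal curves with central areas of 90%, 95%, and 99% shaded]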
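These cut-off values can be recovered from the inverse of the standard normal cumulative distribution function. In the sketch below the helper name central_z is an assumption. The computed 99% value is closer to 2.576; the 2.575 quoted above is the figure commonly read from printed z tables.

```python
from statistics import NormalDist

def central_z(area):
    """z such that the interval [-z, +z] holds the given central area."""
    return NormalDist().inv_cdf(0.5 + area / 2)

for area in (0.90, 0.95, 0.99):
    print(f"{area:.0%} of the area lies within ±{central_z(area):.3f} SD of the mean")
```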
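Example: the p-value is 0.0256. Should you reject the null hypothesis at a significance level of
a.) 0.01? Because 0.0256 is > 0.01, you should not reject the
null hypothesis.
b.) 0.05? Because 0.0256 is < 0.05, you should reject the
null hypothesis.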
[Figure: chart for choosing a statistical test by variable type (continuous or categorical) and by method (parametric or nonparametric). Tests named in the chart: Sign Test, Mann Whitney U Test, Wilcoxon Signed Rank Test, Kruskal Wallis Analysis of Variance, Friedman Test]