
The t-test

Inferences about Population Means


Questions
How are the distributions of z and t related?
Given that
$$H_0: \mu = 75;\quad H_1: \mu \neq 75;\quad s_y = 14;\quad N = 49;\quad t_{(.05,\,48)} = 2.01,$$
construct a rejection region. Draw a picture
to illustrate.
What is the standard error of the difference
between means? What are the factors that
influence its size?

Questions (2)
What are the main uses of the t-test?
Give a concrete example of the use of the
{one sample, independent samples, dependent
samples} t-test. State why the particular test
is the right one to choose.
What is the importance of variance accounted
for?

Confidence intervals in z
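For large samples (N > 100), we can use z:
$$z_M = \frac{\bar{y} - \mu}{\text{est.}\sigma_M}$$
$$\text{est.}\sigma_M = \frac{s_y}{\sqrt{N}} = \frac{\sqrt{\sum (y - \bar{y})^2 / (N - 1)}}{\sqrt{N}}$$
Suppose
$$H_0: \mu = 10;\quad H_1: \mu \neq 10;\quad s_y = 5;\quad N = 200$$
Then
$$\text{est.}\sigma_M = \frac{s_y}{\sqrt{N}} = \frac{5}{\sqrt{200}} = \frac{5}{14.14} = .35$$
If
$$\bar{y} = 11;\quad z = \frac{11 - 10}{.35} = 2.83;\quad 2.83 > 1.96,\ p < .05$$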
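The large-sample example above can be checked numerically. What follows is a minimal Python sketch using only the standard library; the function name `one_sample_z` and its argument names are illustrative choices, not something taken from the slides.

```python
import math
from statistics import NormalDist

def one_sample_z(y_bar, mu0, s_y, n):
    """Return (standard error, z, two-tailed p) for a large-sample test of one mean."""
    se = s_y / math.sqrt(n)                   # est. sigma_M = s_y / sqrt(N)
    z = (y_bar - mu0) / se                    # z = (y_bar - mu) / est. sigma_M
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p from the standard normal
    return se, z, p

# Values from the slide: H0: mu = 10, s_y = 5, N = 200, observed mean 11
se, z, p = one_sample_z(11, 10, 5, 200)
print(f"est. sigma_M = {se:.2f}, z = {z:.2f}, reject at .05: {p < 0.05}")
```

Because |z| exceeds 1.96, the two-tailed p is below .05, which agrees with the slide's conclusion.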
The t Distribution
We use t when the population variance is unknown (the
usual case) and sample size is small (N<100, the usual
case). If you use a stat package for testing hypotheses
about means, you will use t.
The t distribution is a short, fat relative of the normal: its peak is lower and its tails are heavier. The shape of t depends on
its df. As N (and so df) becomes infinitely large, t becomes normal.
Degrees of Freedom
For the t distribution, degrees of freedom are always a
simple function of the sample size, e.g., (N-1).

One way of explaining df is that if we know the total or
mean, and all but one score, the last score is not free to
vary. It is fixed by the other scores, so only N-1 scores are free. For example, 4+3+2+X = 10, so X = 1.
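The idea that one score is fixed once the total is known can be shown in a few lines. This is only an illustrative Python sketch; the helper name `last_score` is hypothetical.

```python
def last_score(known_scores, total):
    """Given the total and all but one score, the remaining score is determined."""
    return total - sum(known_scores)

# The slide's example: 4 + 3 + 2 + X = 10
x = last_score([4, 3, 2], 10)
print(x)

# Equivalently, deviations from the mean always sum to zero,
# so only N - 1 of them can vary freely.
scores = [4, 3, 2, x]
mean = sum(scores) / len(scores)
deviations = [s - mean for s in scores]
print(sum(deviations))
```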
Confidence Intervals in t
With a small sample size, we compute the same numbers
as we did for z, but we compare them to the t distribution
instead of the z distribution.
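$$H_0: \mu = 10;\quad H_1: \mu \neq 10;\quad s_y = 5;\quad N = 25$$
$$\text{est.}\sigma_M = \frac{s_y}{\sqrt{N}} = \frac{5}{\sqrt{25}} = 1$$
$$\bar{y} = 11;\quad t = \frac{11 - 10}{1} = 1$$
$$t_{(.05,\,24)} = 2.064$$
1 < 2.064, n.s.
$$\text{Interval} = \bar{y} \pm t\,(\text{est.}\sigma_M) = 11 \pm 2.064(1) = [8.936,\ 13.064]$$
Interval is about 9 to 13 and contains 10, so n.s.
(cf. z = 1.96)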
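The small-sample example can be reproduced the same way. The sketch below is a minimal illustration in standard-library Python; since the standard library has no t distribution, the critical value 2.064 is passed in as looked up from a t table (it is the value given on the slide). The function name `one_sample_t` is an assumption for illustration.

```python
import math

def one_sample_t(y_bar, mu0, s_y, n, t_crit):
    """Return (t, (lower, upper)) using a critical value taken from a t table."""
    se = s_y / math.sqrt(n)            # est. sigma_M
    t = (y_bar - mu0) / se             # observed t
    half_width = t_crit * se           # half-width of the confidence interval
    return t, (y_bar - half_width, y_bar + half_width)

# Slide values: H0: mu = 10, s_y = 5, N = 25, mean 11, t(.05, 24) = 2.064
T_CRIT = 2.064
t, (lo, hi) = one_sample_t(11, 10, 5, 25, T_CRIT)
significant = abs(t) > T_CRIT
print(f"t = {t:.2f}, interval = [{lo:.3f}, {hi:.3f}], significant: {significant}")
```

The test and the interval give the same verdict: |t| is below the critical value, and the interval contains the null value of 10.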
Review
How are the distributions of z and t related?
Given that
$$H_0: \mu = 75;\quad H_1: \mu \neq 75;\quad s_y = 14;\quad N = 49;\quad t_{(.05,\,48)} = 2.01,$$
construct a rejection region. Draw a picture
to illustrate.
Difference Between Means (1)
Most studies have at least 2 groups
(e.g., male vs. female, experimental vs. control), so we move from
one-sample to two-sample tests.
If we want to know the difference in population
means, the best guess is the difference in sample
means.
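Unbiased:
$$E(\bar{y}_1 - \bar{y}_2) = E(\bar{y}_1) - E(\bar{y}_2) = \mu_1 - \mu_2$$
Variance of the Difference:
$$\mathrm{var}(\bar{y}_1 - \bar{y}_2) = \sigma^2_{M_1} + \sigma^2_{M_2}$$
Standard Error:
$$\sigma_{diff} = \sqrt{\sigma^2_{M_1} + \sigma^2_{M_2}}$$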
Difference Between Means (2)
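We can estimate the standard error of
the difference between means:
$$\text{est.}\sigma_{diff} = \sqrt{\text{est.}\sigma^2_{M_1} + \text{est.}\sigma^2_{M_2}}$$
For large samples, we can use z:
$$z_{diff} = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\text{est.}\sigma_{diff}}$$
$$H_0: \mu_1 - \mu_2 = 0;\quad H_1: \mu_1 - \mu_2 \neq 0$$
$$\bar{y}_1 = 10;\quad N_1 = 100;\quad SD_1 = 2$$
$$\bar{y}_2 = 12;\quad N_2 = 100;\quad SD_2 = 3$$
$$\text{est.}\sigma_{diff} = \sqrt{\frac{4}{100} + \frac{9}{100}} = \sqrt{\frac{13}{100}} = .36$$
$$z_{diff} = \frac{(10 - 12) - 0}{.36} = \frac{-2}{.36} = -5.56;\quad p < .05$$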
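As a check on the two-group large-sample example, here is a minimal standard-library Python sketch. The function name `two_sample_z` and the `diff0` parameter (the difference in means under the null) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def two_sample_z(m1, sd1, n1, m2, sd2, n2, diff0=0.0):
    """Large-sample z test for the difference between two independent means."""
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # sqrt(est.var_M1 + est.var_M2)
    z = ((m1 - m2) - diff0) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-tailed p
    return se_diff, z, p

# Slide values: group 1 mean 10, SD 2, N 100; group 2 mean 12, SD 3, N 100
se_diff, z, p = two_sample_z(10, 2, 100, 12, 3, 100)
print(f"est. sigma_diff = {se_diff:.2f}, z = {z:.2f}, reject at .05: {p < 0.05}")
```

The slide rounds the standard error to .36 before dividing, which gives a magnitude of 5.56; carrying full precision gives about 5.55. The conclusion is the same.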
Independent Samples t (1)
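Looks just like z:
$$t_{diff} = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\text{est.}\sigma_{diff}}$$
$$df = N_1 - 1 + N_2 - 1 = N_1 + N_2 - 2$$
If SDs are equal, estimate is:
$$\sigma_{diff} = \sqrt{\frac{\sigma^2}{N_1} + \frac{\sigma^2}{N_2}} = \sqrt{\sigma^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$
Pooled variance estimate is weighted average:
$$\text{est.}\sigma^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$$
Pooled Standard Error of the Difference (computed):
$$\text{est.}\sigma_{diff} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$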
Independent Samples t (2)
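$$\text{est.}\sigma_{diff} = \sqrt{\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$
$$t_{diff} = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\text{est.}\sigma_{diff}}$$
$$H_0: \mu_1 - \mu_2 = 0;\quad H_1: \mu_1 - \mu_2 \neq 0$$
$$\bar{y}_1 = 18;\quad s_1^2 = 7;\quad N_1 = 5$$
$$\bar{y}_2 = 20;\quad s_2^2 = 5.83;\quad N_2 = 7$$
$$\text{est.}\sigma_{diff} = \sqrt{\frac{4(7) + 6(5.83)}{5 + 7 - 2}\left(\frac{12}{35}\right)} = 1.47$$
$$t_{diff} = \frac{(18 - 20) - 0}{1.47} = \frac{-2}{1.47} = -1.36;\quad \text{n.s.}$$
$$t_{crit} = t_{(.05,\,10)} = 2.23$$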
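The pooled computation above is easy to get wrong by hand, so a short check may help. This is a minimal standard-library Python sketch, assuming equal population variances as the slide does; `pooled_t` is a hypothetical helper name, and the critical value 2.23 is the one quoted on the slide rather than computed.

```python
import math

def pooled_t(m1, var1, n1, m2, var2, n2, diff0=0.0):
    """Independent-samples t with a pooled variance estimate (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df    # weighted average of variances
    se_diff = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))  # pooled SE of the difference
    t = ((m1 - m2) - diff0) / se_diff
    return t, se_diff, df

# Slide values: mean 18, variance 7, N 5 versus mean 20, variance 5.83, N 7
t, se_diff, df = pooled_t(18, 7, 5, 20, 5.83, 7)
T_CRIT = 2.23   # t(.05, 10) as given on the slide
print(f"est. sigma_diff = {se_diff:.2f}, t = {t:.2f}, df = {df}, "
      f"significant: {abs(t) > T_CRIT}")
```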
Assumptions
The t-test is based on assumptions of
normality and homogeneity of variance.
You can test for both these (make sure
you learn the SAS methods).
As long as the samples in each group
are large and nearly equal in size, the t-test is
robust, that is, still good, even though the
assumptions are not met.
Review
What is the standard error of the
difference between means? What are
the factors that influence its size?
What are the assumptions of the t-test?
Strength of Association (1)
Scientific purpose is to predict or
explain variation.
Our variable Y has some variance that
we would like to account for. There are
statistical indexes of how well our IV
accounts for variance in the DV. These
are measures of how strongly or closely
associated our IVs and DVs are.
Variance accounted for:
$$\omega^2 = \frac{\sigma^2_Y - \sigma^2_{Y|X}}{\sigma^2_Y} = \frac{(\mu_1 - \mu_2)^2}{4\sigma^2_Y}$$
Strength of Association (2)
How much of variance in Y is
associated with the IV?
$$\omega^2 = \frac{\sigma^2_Y - \sigma^2_{Y|X}}{\sigma^2_Y} = \frac{(\mu_1 - \mu_2)^2}{4\sigma^2_Y}$$
[Figure: three normal curves side by side; horizontal axis from -4 to 6, vertical axis (density) from 0.0 to 0.4.]
Compare the 1st (left-most) curve with the curve in the
middle and the one on the right.
In each case, how much of the variance in Y is associated
with the IV, group membership? More in the second
comparison. As the mean difference gets big, so
does variance accounted for.
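The link between mean difference and variance accounted for can be made concrete. The sketch below assumes two equally likely groups with the same within-group variance, in which case the total variance of Y is the within-group variance plus (μ1 − μ2)²/4. The function name and the particular means (0, 1, and 4) are hypothetical, chosen only to mimic a near and a far curve.

```python
def omega_squared(mu1, mu2, within_var=1.0):
    """Proportion of variance in Y associated with group membership.

    Assumes two equally likely groups sharing the same within-group variance.
    """
    between = (mu1 - mu2) ** 2 / 4       # variance due to the group means
    total_var = within_var + between     # sigma^2_Y
    return between / total_var           # (mu1 - mu2)^2 / (4 * sigma^2_Y)

# Left-most curve versus a nearby curve, then versus a distant one (hypothetical means)
near = omega_squared(0, 1)
far = omega_squared(0, 4)
print(f"near: {near:.2f}, far: {far:.2f}")
```

With unit within-group variance, a one-unit mean difference accounts for a fifth of the variance in Y, while a four-unit difference accounts for four fifths.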
Association & Significance
Power increases
with association
(effect size) and
sample size.
Effect size:
$$\frac{\bar{y}_1 - \bar{y}_2}{\sigma_p}$$
Significance =
effect size × sample
size:
$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_p^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}$$
Increasing sample size does not increase effect size
(strength of association). It decreases the standard
error so power is greater. Widely misunderstood.
$$t = \frac{\bar{y} - \mu}{\sqrt{\sigma^2 / N}} \qquad d = \frac{\bar{X}_1 - \bar{X}_2}{SD_{pooled}}$$
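To see that sample size changes the test statistic but not the effect size, the two-group t can be rewritten as d divided by sqrt(1/N1 + 1/N2). The following Python sketch illustrates this under that algebraic identity; the effect size of 0.5 and the group sizes are hypothetical values.

```python
import math

def t_from_d(d, n1, n2):
    """t implied by a standardized mean difference d and the two group sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)

d = 0.5   # hypothetical effect size, held constant throughout
t_values = {n: t_from_d(d, n, n) for n in (10, 40, 160)}
for n, t in t_values.items():
    print(f"n per group = {n:3d}, d = {d}, t = {t:.2f}")
```

Each fourfold increase in n per group doubles t while d stays at 0.5, which is the point of the slide: larger samples shrink the standard error, not the strength of association.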
Estimating Power (1)
If the null is false, the statistic is no
longer distributed as t, but rather as
noncentral t. This makes power
computation difficult.
Hays (p. 334) presents an alternative
method based on strength of
association, that is, on
$$\omega^2 = \frac{\sigma^2_Y - \sigma^2_{Y|X}}{\sigma^2_Y} = \frac{(\mu_1 - \mu_2)^2}{4\sigma^2_Y}$$
Estimating Power (2)
Based on Hays's method, we find:
$$n_g \geq \frac{[z_{(1-\beta)} + z_{(\alpha/2)}]^2\,(1 - \omega^2)}{2\omega^2}$$
Suppose alpha is .01, power desired is .90, and variance
accounted for is .25. What is n per group?
$$z_{(1-\beta)} = z_{(.90)} = 1.28;\quad z_{(\alpha/2)} = z_{(.005)} = 2.58$$
$$n_g \geq \frac{[1.28 + 2.58]^2\,(.75)}{2(.25)} = 22.35$$
It's 24 (23?) per group or 48 all together.
(Hays says add one more person for luck. It's wise.)
Same problem, but variance accounted for is .10: need 68 per group.
Same again, but .15: need 43 per group. What if alpha =
.05?
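The sample-size bound is simple enough to script. This is a minimal Python sketch of the formula as reconstructed from the slide's arithmetic, not a general power routine; `n_per_group` and `z_two_tailed` are hypothetical helper names. The worked cases use the slide's rounded z values (1.28 and 2.58) so that the numbers match.

```python
import math
from statistics import NormalDist

def n_per_group(omega_sq, z_power, z_alpha):
    """Lower bound on n per group: [z_(1-beta) + z_(alpha/2)]^2 (1 - w^2) / (2 w^2)."""
    return (z_power + z_alpha) ** 2 * (1 - omega_sq) / (2 * omega_sq)

def z_two_tailed(alpha):
    """Standard normal cutoff that leaves alpha/2 in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Slide cases: alpha = .01 (z = 2.58), power = .90 (z = 1.28)
bounds = {w2: n_per_group(w2, 1.28, 2.58) for w2 in (0.25, 0.10, 0.15)}
for w2, bound in bounds.items():
    print(f"omega^2 = {w2:.2f}: bound = {bound:.2f}, round up to {math.ceil(bound)}")
```

For a different alpha, `z_two_tailed(alpha)` supplies the cutoff to pass as `z_alpha`; for example, it returns about 1.96 for alpha = .05.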
Dependent t (1)
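Observations come in pairs. Brother, sister, repeated measure.
$$\sigma^2_{diff} = \sigma^2_{M_1} + \sigma^2_{M_2} - 2\,\mathrm{cov}(\bar{y}_1, \bar{y}_2)$$
Problem solved by finding diffs between pairs: $D_i = y_{i1} - y_{i2}$.
$$\bar{D} = \frac{\sum D_i}{N} \qquad s_D^2 = \frac{\sum (D_i - \bar{D})^2}{N - 1} \qquad \text{est.}\sigma_{MD} = \frac{s_D}{\sqrt{N}}$$
$$t = \frac{\bar{D} - E(\bar{D})}{\text{est.}\sigma_{MD}}$$
df = N(pairs) - 1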
Dependent t (2)
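Brother   Sister   Diff   $(D - \bar{D})^2$
   5         7       2          1
   7         8       1          0
   3         3       0          1
Means: $\bar{y} = 5$ (Brother); $\bar{y} = 6$ (Sister); $\bar{D} = 1$
$$s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}} = 1$$
$$\text{est.}\sigma_{MD} = 1/\sqrt{3} = .58$$
$$t = \frac{\bar{D} - E(\bar{D})}{\text{est.}\sigma_{MD}} = \frac{1}{.58} = 1.72$$
df = 2; n.s.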
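The brother and sister example can be reproduced from the raw pairs. Below is a minimal standard-library Python sketch. Differences are taken as sister minus brother, which yields the positive values shown in the table. The critical value 4.303 for a two-tailed test at .05 with 2 df comes from a standard t table, not from the slides.

```python
import math
from statistics import mean, stdev

brother = [5, 7, 3]
sister = [7, 8, 3]

diffs = [s - b for b, s in zip(brother, sister)]   # 2, 1, 0
d_bar = mean(diffs)                                # mean difference
s_d = stdev(diffs)                                 # sample SD (N - 1 in the denominator)
se = s_d / math.sqrt(len(diffs))                   # est. sigma_MD
t = (d_bar - 0) / se                               # E(D_bar) = 0 under H0
df = len(diffs) - 1

T_CRIT = 4.303   # t(.05, 2) from a standard table (not given on the slide)
print(f"D_bar = {d_bar}, s_D = {s_d:.2f}, se = {se:.2f}, t = {t:.2f}, "
      f"df = {df}, significant: {abs(t) > T_CRIT}")
```

The slide rounds the standard error to .58 before dividing and reports t = 1.72; at full precision t is about 1.73. Either way the result is not significant with 2 df.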
Review
What are the main uses of the t-test?
Give a concrete example of the use of the
{one sample, independent samples, dependent
samples} t-test. State why the particular test
is the right one to choose.
What is the importance of variance accounted
for?
