Danny Kaplan
net: the time from the start line to the finish line, in seconds
gun: the time from the start gun to the finish line, in seconds
sex
Cross-sectional Analysis
How does the net time for runners depend on their age?
  Estimate Std. Error t value Pr(>|t|)
 5339.1554    35.0487  152.34   0.0000
   16.8936     0.9444   17.89   0.0000
 -726.6195    20.0181  -36.30   0.0000
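The slide does not show the call that produced this table. As a minimal sketch, assuming a cross-sectional model of the form net ~ age + sex (an assumption suggested by the variables listed above), the following fits such a model to simulated data. The data frame `sim` and the numbers used to generate it are hypothetical and are not the race results.

```r
set.seed(42)

# Simulated stand-in for the race data (hypothetical values, not the real results)
n <- 200
sim <- data.frame(
  age = sample(18:70, n, replace = TRUE),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Net time in seconds: assumed baseline, age effect, sex effect, plus noise
sim$net <- 5300 + 17 * sim$age - 700 * (sim$sex == "M") + rnorm(n, sd = 900)

# Cross-sectional model: every row is treated as an independent runner
m <- lm(net ~ age + sex, data = sim)

# Coefficient table with Estimate, Std. Error, t value, Pr(>|t|)
print(coef(summary(m)))
```

The age coefficient in such a fit compares different people of different ages. It does not follow any one runner over time.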
Cross-sectional Critique
I
Name            Ag Hometown  Net     G
=============== == ========= ======= =
Nelson Kiplagat 25 KEN       48:12
Samuel Ndereba  25 KEN       48:12
)
res = subset(res, age > 10 & ! is.na(age)) # eliminate th
return(res)
}
"name" "sex" "age" "year"
A lot of runners! But how many times does each person run?
a.j. montes       1
aaron aldridge    1
aaron ansell      1
Clearly there are people who ran multiple times, but this display is
not very useful because it is too long. However, the output of table
can be the input to another computation. In this case, applying table
to the output of table does something very sensible: it counts how
many names appear once, twice, and so on.
> table(table(foo$name))
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    22
38466  8580  3123  1518   803   441   282   162    87    41     8     2     2     1     1
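To make the table-of-a-table idea concrete, here is a minimal sketch on a toy vector. The names in `runner_names` are hypothetical and are not taken from the race results.

```r
# Toy data: hypothetical names, one element per race entry
runner_names <- c("ann", "bob", "ann", "cat", "bob", "ann", "dan")

# First table: number of entries for each name
entries_per_name <- table(runner_names)

# Second table: how many names have 1 entry, 2 entries, 3 entries, ...
participation <- table(entries_per_name)
print(participation)
```

The top row of the printed result is the number of times a name appears, and the bottom row is the number of names with that many appearances.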
Creating a Unique ID I
To help separate different people with the same name, we need
additional information. Hometown is a possibility, but people may
move.
Try year of birth:
foo$yob = foo$year - foo$age
foo$id = paste( foo$name, foo$yob )
> head(foo$id)
[1] "lineth chepkurui 1988"  "angelina mutuku 1983"
[3] "lidia simon 1974"       "catherine ndereba 1973"
[5] "sharon cherop 1984"     "aziza aliyu 1986"
> table(table(foo$id))
    1     2     3     4     5     6     7     8     9    10
41117  8457  2954  1384   750   401   247   143    73    25
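The effect of adding year of birth to the name can be sketched on a toy data frame. The rows in `toy` are hypothetical; only the yob and id construction mirrors the code above.

```r
# Toy data: two different people share the name "pat lee" (hypothetical rows)
toy <- data.frame(
  name = c("pat lee", "pat lee", "pat lee", "sam roy"),
  age  = c(30, 31, 52, 40),
  year = c(2005, 2006, 2005, 2005),
  stringsAsFactors = FALSE
)

# Year of birth is constant for one person across race years
toy$yob <- toy$year - toy$age

# Candidate ID: name plus year of birth
toy$id <- paste(toy$name, toy$yob)

# By name alone, "pat lee" looks like one runner with three entries
print(table(table(toy$name)))

# With the ID, the 30-year-old and the 52-year-old are separated
print(table(table(toy$id)))
```

Two people with the same name and the same year of birth would still be merged, which is why the ID is only a candidate until it is checked.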
Creating a Unique ID II
This is plausible, but not proven as a unique ID.
One possible check: there shouldn't be anyone with the same
Name-YOB who ran twice in any one year:
> foo$idWithYear = paste( foo$id, foo$year )
> table(table(foo$idWithYear))
    1     2
82245    60
2
29
2
8
I'm not going to worry about these few, but I could exclude them
by using the same approach used to count how many times each
runner participated.
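A minimal sketch of that exclusion, assuming the goal is to drop only the rows whose ID-plus-year value occurs more than once. The data frame `toy` and the helper names `counts`, `ambiguous`, and `cleaned` are hypothetical.

```r
# Toy data: "pat lee 1975" appears twice in 2005, which should not happen
toy <- data.frame(
  id   = c("pat lee 1975", "pat lee 1975", "pat lee 1975", "sam roy 1965"),
  year = c(2005, 2005, 2006, 2005),
  stringsAsFactors = FALSE
)
toy$idWithYear <- paste(toy$id, toy$year)

# Count how often each ID-plus-year combination occurs
counts <- table(toy$idWithYear)

# Combinations seen more than once are ambiguous
ambiguous <- names(counts)[counts > 1]

# Keep only the rows whose combination is unambiguous
cleaned <- subset(toy, !(idWithYear %in% ambiguous))
print(cleaned)
```

A stricter alternative would be to drop every row belonging to an ID that is ambiguous in any year, at the cost of losing more data.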
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  75.6471     0.7001  108.05   0.0000
age           0.2436     0.0155   15.76   0.0000
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  78.9320     0.6516  121.13   0.0000
age           0.3472     0.0145   23.89   0.0000
sexM        -11.8040     0.3243  -36.40   0.0000
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)             59.50       2.69   22.12     0.00
age                      0.83       0.04   19.99     0.00
idabigail grier 1983    18.33       3.81    4.82     0.00
idabiy zewde 1967       13.59       3.40    4.00     0.00
idadam anthony 1966     -8.19       3.81   -2.15     0.03
idadam knapp 1977       22.91       4.15    5.52     0.00
... and so on.
A Mixed-Effects Model? I
I don't know much about mixed effects models, but the fact that
we're not interested in the coefficients for individual runners
suggests that we should be treating them as random effects.
> library(lme4)
> m6 = lmer( net ~ 1 + age + (1|id), five)
A Mixed-Effects Model? II
> summary(m6)
Linear mixed model fit by REML
   AIC   BIC logLik deviance REMLdev
 52918 52945 -26455    52904   52910
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 167.489  12.9417
 Residual              35.007   5.9167
Number of obs: 7480, groups: id, 1639
Fixed effects:
            Estimate Std. Error t value
(Intercept) 67.21559    1.14307    58.8
age          0.44137    0.02508    17.6
This age coefficient (0.44) is quite different from those of the
earlier models (0.24, 0.35, and 0.83).
> t.test(ages$result)
One Sample t-test
data: ages$result
t = 11.1364, df = 1638, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.6078309 0.8677138
sample estimates:
mean of x
0.7377724
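The slides do not show how ages$result was built. One plausible reading, since 1638 degrees of freedom matches the 1639 runners in the mixed model, is that it holds one age slope per runner. The sketch below follows that assumption on simulated data. The data frame `sim`, the 0.7 minutes-per-year slowdown, and the noise levels are all hypothetical.

```r
set.seed(1)

# Simulated stand-in: 30 hypothetical runners with 5 consecutive races each
n_runners <- 30
start_age <- sample(20:60, n_runners, replace = TRUE)
sim <- data.frame(
  id  = rep(paste("runner", seq_len(n_runners)), each = 5),
  age = rep(start_age, each = 5) + rep(0:4, times = n_runners)
)

# Each runner has an own baseline; all slow by an assumed 0.7 minutes per year
baseline <- rep(rnorm(n_runners, mean = 80, sd = 12), each = 5)
sim$net <- baseline + 0.7 * (sim$age - 40) + rnorm(nrow(sim), sd = 3)

# One least-squares slope of net on age for each runner
slopes <- sapply(
  split(sim, sim$id),
  function(d) coef(lm(net ~ age, data = d))[["age"]]
)

# One-sample t-test: is the mean within-runner slope different from zero?
slope_test <- t.test(slopes)
print(slope_test)
```

Because each slope is computed within a single runner, differences in baseline speed between runners do not enter the estimate.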
[Figure: plots with Year of Birth on the horizontal axis, 1920 to 1980]