Hadley Wickham
@hadleywickham
Chief Scientist, RStudio
July 2013
Tidy
Transform
Model
Thursday, July 18, 13
Frequent data analysis: learn to program
http://www.flickr.com/photos/compleo/5414489782
http://www.flickr.com/photos/mutsmuts/4695658106
Cognition time Computation time
Tidy: reshape2
Transform: plyr, stringr, lubridate
Model
Computation time Cognition time
Tidy
Transform: dplyr
Model
Data
Input, split by name:

name  n
Al    2
Bo    4
Bo    0
Bo    5
Ed    5
Ed    10

Total of n within each name, combined:

name  total
Al    2
Bo    9
Ed    15
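The name/n and name/total tables above read as a split-apply-combine example. A minimal sketch of the three steps in base R follows; the data frame `df` and the helper names `pieces`, `summaries` and `result` are illustrative assumptions, and this is not how plyr or dplyr implement the operation internally.

```r
# Toy input matching the tables above: one row per observation.
df <- data.frame(
  name = c("Al", "Bo", "Bo", "Bo", "Ed", "Ed"),
  n    = c(2, 4, 0, 5, 5, 10)
)

# Split: one data frame per distinct name.
pieces <- split(df, df$name)

# Apply: reduce each piece to a single row holding the group total.
summaries <- lapply(pieces, function(piece) {
  data.frame(name = piece$name[1], total = sum(piece$n))
})

# Combine: stack the per-group rows back into one data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```

Writing the steps out separately makes the pattern visible; in practice a single grouped call does the same job.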
[Diagram: function and arguments for maply, mdply, mlply, m_ply]
[Chart: count of uses (0 to 150) for each plyr function (alply, aaply, l_ply, daply, adply, laply, d_ply, llply, dlply, ldply, ddply), coloured by use: Never, Occasionally, Often, All the time]
Data analysis verbs
select: subset variables
filter: subset rows
mutate: add new columns
summarise: reduce to a single row
arrange: re-order the rows
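As a rough illustration of the five verbs, the sketch below applies each one to the built-in `mtcars` data. It assumes the dplyr package is installed and uses the current dplyr API, which may differ from the version shown in the talk; the column `kpl` and the intermediate names are made up for the example.

```r
library(dplyr)

# mtcars keeps the car model in its row names; move it into a column.
df <- mtcars
df$model <- rownames(mtcars)

# select: subset variables
picked <- select(df, model, mpg, cyl, wt)

# filter: subset rows
frugal <- filter(picked, mpg > 25)

# mutate: add new columns (miles per gallon to km per litre)
with_kpl <- mutate(frugal, kpl = mpg * 0.4251)

# arrange: re-order the rows, most economical first
ordered <- arrange(with_kpl, desc(mpg))

# summarise: reduce to a single row
overall <- summarise(with_kpl, mean_mpg = mean(mpg), n_cars = n())

print(ordered)
print(overall)
```

Each verb takes a data frame as its first argument and returns a data frame, so the steps chain naturally.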
library(plyr)
ddply(h, c("Year", "Month", "DayofMonth"),
summarise, n = length(Year))
# user system elapsed
# 2.320 0.330 2.649
# 20x faster!
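The dplyr call that the "20x faster" comment compares against is not preserved in these notes. The sketch below shows what an equivalent grouped count might look like with the current dplyr API (an assumption; the 2013 interface may have differed). The `flights` data frame is a hypothetical stand-in for `h`, and no timing claim is made for it.

```r
library(dplyr)

# Hypothetical stand-in for `h`: a few records with the same key columns.
flights <- data.frame(
  Year       = c(2011, 2011, 2011, 2011, 2011),
  Month      = c(1, 1, 1, 2, 2),
  DayofMonth = c(1, 1, 2, 1, 1)
)

# Group by day, then count the rows in each group.
daily <- flights %>%
  group_by(Year, Month, DayofMonth) %>%
  summarise(n = n(), .groups = "drop")

print(daily)
```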
library(ggplot2)
library(bigvis)
Goals
Insight
Process:
Condense (bin & summarise)
Smooth
Visualise
Bin: floor((x - origin) / width)
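Fixed-width binning is commonly computed as floor((x - origin) / width), followed by a per-bin summary. The base-R sketch below illustrates that condense step on a toy vector; `bin_index` and the sample data are assumptions made for the example, and this is not the bigvis implementation.

```r
# Map each value to an integer bin index using a fixed bin width.
bin_index <- function(x, width, origin = 0) {
  floor((x - origin) / width)
}

x <- c(0.2, 1.7, 1.9, 3.4, 5.0, 5.1)
width <- 2
idx <- bin_index(x, width = width, origin = 0)

# Condense: one row per occupied bin, with a count and a mean.
counts <- tapply(x, idx, length)
means  <- tapply(x, idx, mean)

condensed <- data.frame(
  bin_start = as.numeric(names(counts)) * width,
  count     = as.vector(counts),
  mean      = as.vector(means)
)
print(condensed)
```

The condensed table has one row per occupied bin rather than one per observation, which is what keeps later smoothing and plotting cheap on large data.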
Std. dev.
[Plots: .count against time, 0 to 60]
autoplot(time_s %% 60)
[Plots: .count by dist (0 to 5000) and speed (200 to 600), on log and linear colour scales]
sd2 <- condense(bin(dist, 20), bin(speed, 20))
autoplot(sd2)
# user system elapsed
# 7.366 1.190 8.552
Demo
shiny::runApp("mt/", 8002)
Tidy
Transform: dplyr
Model