Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
dplyr functions work with pipes and expect tidy data. In tidy data:
A B C
&
Each variable is
in its own column
A B C
pipes
x %>% f(y)
becomes f(x, y)
Each observation, or
case, is in its own row
Summarise Cases
These apply summary functions to
columns to create a new table.
Summary functions take vectors as
input and return one value (see back).
w
w
w
w
w
w
summary
function
Manipulate Cases
Manipulate Variables
Extract Cases
Extract Variables
filter(.data, )
w
w
ww
w
w
w
w
ww
w
w
w
w
ww
w
w
w
w
ww
w
w
top_n(x, n, wt)
Select and order top n entries (by group if
grouped data). top_n(iris, 5, Sepal.Width)
<=
>=
is.na()
!is.na()
ungroup(x, ...)
|
&
xor()
Arrange Cases
w
w
ww
w
w
w
w
w
w
ww
w
ww
w
w
w
ww
w
w
Add Cases
vectorized
function
mutate(.data, )
w
w
ww
w
w
w
w
ww
w
w
:, e.g. mpg:cyl
-, e.g, -Species
arrange(.data, ...)
%in%
!
num_range(prefix, range)
one_of()
starts_with(match)
w
w
w
w
ww
w
w
w
contains(match)
ends_with(match)
matches(match)
transmute(.data, )
slice(.data, )
Variations
mtcars %>%
group_by(cyl) %>%
summarise(avg = mean(mpg))
Group Cases
w
w
ww
summarise(.data, )
select(.data, )
rename(.data, )
w
w
ww
w
Rename columns.
rename(iris, Length = Sepal.Length)
Learn more with browseVignettes(package = c("dplyr", "tibble")) dplyr 0.5.0 tibble 1.2.0 Updated: 11/16
Vectorized Functions
Summary Functions
vectorized
function
summary
function
Counts
Osets
dplyr::lag() - Oset elements by 1
dplyr::lead() - Oset elements by -1
Cumulative Aggregates
Location
Rankings
dplyr::cume_dist() - Proportion of all values <=
dplyr::dense_rank() - rank with ties = min, no
gaps
dplyr::min_rank() - rank with ties = min
dplyr::ntile() - bins into n bins
dplyr::percent_rank() - min_rank scaled to [0,1]
dplyr::row_number() - rank with ties = "first"
Logicals
mean() - Proportion of TRUEs
sum() - # of TRUEs
Position/Order
dplyr::first() - first value
dplyr::last() - last value
dplyr::nth() - value in nth location of vector
Rank
quantile() - nth quantile
min() - minimum value
max() - maximum value
Spread
IQR() - Inter-Quartile Range
mad() - mean absolute deviation
sd() - standard deviation
var() - variance
Math
+, - , *, ?, ^, %/%, %% - arithmetic ops
log(), log2(), log10() - logs
<, <=, >, >=, !=, == - logical comparisons
Misc
dplyr::between() - x > right & x < left
dplyr::case_when() - multi-case if_else()
dplyr::coalesce() - first non-NA values by
element across a set of vectors
if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()
dplyr::recode() - Vectorized switch()
dplyr::recode_factor() - Vectorized switch() for
factors
Row names
Tidy data does not use rownames, which store
a variable outside of the columns. To work with
the rownames, first move them into a column.
C
1
2
3
A
a
b
c
B
t
u
v
C
1
2
3
A
a
b
c
B
t
u
v
rownames_to_column()
Move row names into col.
a <- rownames_to_column(iris,
var = "C")
column_to_rownames()
Move col in row names.
column_to_rownames(a,
var = "C")
Also has_rownames(), remove_rownames()
A
a
b
c
B
t
u
v
C
1
2
3
C
1
2
3
A
a
b
c
B
t
u
v
Combine Tables
Combine Variables
x
A
A
aa
bb
cc
Combine Cases
B
B
tt
uu
vv
C
C
C
11
22
33
A
A
A
aaa
bbb
ddd
B
B
B
tt
uuu
w
w
w
D
D
33
22
11
=
+
a t 1 a t 3
b u 2 b u 2
c v 3 d w 1
t
u
v
A B C D
a
b
d
t 1 3
u 2 2
w NA 1
t
u
1
2
t 1
u 2
v 3
w NA
A
a
b
c
c
d
B
t
u
v
v
w
C
1
2
3
3
4
copy=FALSE, suix=c(.x,.y),)
Join matching values from y to x.
A B C
c v 3
intersect(x, y, )
A
aa
bb
B
B
tt
uu
C
C
11
22
setdi(x, y, )
A
A
aa
bb
c
d
B
B
tt
u
v
w
C
C
1
1
2
3
4
union(x, y, )
3
2
full_join(x, y, by = NULL,
A B C D
a
b
c
d
DF
x
x
x
z
z
A B C D
a
b
B C
v 3
w 4
left_join(x, y, by = NULL,
1 3
2 2
3 NA
A
c
d
3
2
copy=FALSE, suix=c(.x,.y),)
Join data. Retain all values, all
rows.
NA
1
Extract Rows
x
A B.x C B.y D
a
b
c
t
u
v
1
2
3
t
3
u
2
NA NA
1
2
3
d
b
a
w
u
t
A
a
b
cc
left_join(x, y, by = "A")
a
b
c
a
b
c
t
u
v
1
2
3
A2 B2
d
b
a
w
u
t
B
t
u
vv
y
C
C
1
2
3
A
a
b
d
B
t
uu
w
D
C
33
22
11
B
t
u
C
1
2
semi_join(x, y, by = NULL, )
A
c
B
v
C
3
anti_join(x, y, by = NULL, )
C
1
2
3
bind_cols()
A B C A B D
A B C D
A B
a t
b u
c v
Learn more with browseVignettes(package = c("dplyr", "tibble")) dplyr 0.5.0 tibble 1.2.0 Updated: 12/16