Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
L. Torgo
ltorgo@fc.up.pt
Faculdade de Ciências / LIAAD-INESC TEC, LA
Universidade do Porto
Oct, 2014
Introduction
Introduction
R Graphics
Graphics Devices
A few examples
A scatterplot
plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1)))
jpeg('exp.jpg')
plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1)))
dev.off()
pdf('exp.pdf',width=6,height=6)
plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1)))
dev.off()
Introduction to ggplot
Package ggplot2
H. Wickham (2009). A layered grammar of graphics. Journal of Computational and Graphical Statistics.
Introduction to ggplot
Aesthetic Mappings
Geometric Objects
Introduction to ggplot
Statistical Transformations
Scales
Maps the data values into values in the coordinate system of the
graphics device
Introduction to ggplot
Coordinate System
Faceting
Split the data into sub-groups and draw sub-graphs for each group
Introduction to ggplot
A Simple Example
8
●
● ●●
●
●
●
● ●●
●
7 ●
●● ● ●
● ● ●
● ● ●● ●●●
●●
Sepal.Length
● ●● ● ●
data(iris) ●● ● ●●
Species
● ● ●●● ● ●
library(ggplot2) ●● ● ●
● setosa
ggplot(iris, ● ●● ● ●
6 ● ● ● ●● ● versicolor
aes(x=Petal.Length, ● ● ●
y=Sepal.Length, ● ●●● ● ● virginica
●● ● ●● ● ●
colour=Species ● ● ●● ● ●
●● ●● ● ●
) ●●● ●
) + geom_point() ●
●● ●
●●●● ● ●
5 ●●●●● ●●
●● ● ●
●● ●
● ●
● ●●
●
●●
●
2 4 6
Petal.Length
barplot(table(algae$season),
main='Distribution of the Water Samples across Seasons')
Barplots in ggplot2
ggplot(algae,aes(x=season)) + geom_bar() +
ggtitle('Distribution of the Water Samples across Seasons')
60
40
count
20
Pie Charts
Pie charts serve the same purpose as bar plots but present the
information in the form of a pie.
pie(table(algae$season),
main='Distribution of the Water Samples across Seasons')
spring
autumn
summer
winter
1
season
autumn
factor(1)
150 50 spring
summer
winter
100
count
0.05
100
0.04
80
Nr. of Occurrencies
Percentage
0.03
60
0.02
40
0.01
20
0.00
0
0 20 40 60 80 0 20 40 60 80
a1 a1
ggplot(algae,aes(x=a1)) + geom_histogram(binwidth=10) +
ggtitle("Distribution of Algae a1") + ylab("Nr. of Occurrencies")
ggplot(algae,aes(x=a1)) + geom_histogram(binwidth=10,aes(y=..density..)) +
ggtitle("Distribution of Algae a1") + ylab("Percentage")
90
0.04
Nr. of Occurrencies
Percentage
60
0.02
30
0 0.00
0 25 50 75 100 0 25 50 75 100
a1 a1
0.7
0.6
0.5
0.4
Density
0.3
0.2
0.1
0.0
6 7 8 9 10
mxPH Values
0.6
density
0.4
0.2
0.0
5 6 7 8 9 10
mxPH
plot(ecdf(algae$NO3),
main='Cumulative Distribution Function of NO3',xlab='NO3 values')
●● ●
●
●●●
●
●
●●
●●
●
●●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
0.8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
0.6
●●
●
●
●
●
●
●
●
●
●
●
●
●
Fn(x)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
0.4
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
0.2
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0.0
●
●
0 10 20 30 40 50
NO3 values
0.75
0.50
y
0.25
0.00
0 10 20 30 40
NO3 values
QQ Plots
library(car)
qqPlot(algae$mxPH,main='QQ Plot of Maximum PH Values')
●
●
●●●●
●●●
9
●
●●
●
●
●
●
●●
●●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●●
●●
●
●
●●
●●
●
●●
algae$mxPH
●
●
●
●●
●●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
8
●
●●
●
●●
●
●
●
●●
●●
●
●●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●●
●●
●
●●
●
●●
●●
●
●●
●●
●●
●
●
7
●
●●●●
●
●
6
●
●
−3 −2 −1 0 1 2 3
norm quantiles
ggplot(algae,aes(sample=mxPH)) + stat_qq() +
ggtitle('QQ Plot of Maximum PH Values')
●●●● ●
9 ●●
●●
●●●
●●
●●
●●●
●●
●
●●
●●
●●
●
●
●
●●
●
●●
●
●●
●●
●●
●
●●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●
8 ●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
sample
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●●●
●●
●●●
●
●
7 ●
●●●●
●
●
6
●
●
−3 −2 −1 0 1 2 3
theoretical
●●●● ●
9 ●●
●●
●●●
●●
●●
●●●
●●
●
●●
●●
●●
●
●
●
●●
●
●●
●
●●
●●
●●
●
●●
●
●●
●
●●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●
8 ●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
sample
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●●●
●●
●●●
●
●
7 ●
●●●●
●
●
6
●
●
−3 −2 −1 0 1 2 3
theoretical
12
Minimum values of O2
10
8
6
4
2
●
●
ggplot(algae,aes(x=factor(0),y=mnO2)) + geom_boxplot() +
ylab("Minimum values of O2") + xlab("") + scale_x_discrete(breaks=NULL)
10
Minimum values of O2
●
●
Using the Algae data set from package DMwR answer to the following
questions:
1 Create a graph that you find adequate to show the distribution of
the values of algae a6
2 Show the distribution of the values of size
3 Check visually if it is plausible to consider that oPO4 follows a
normal distribution
Conditioned Graphs
Conditioned Graphs
Conditioned Histograms
60
40
count
20
0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50
a3
15
10
low
15
medium
count
10
15
10
high
0
0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50
a3
boxplot(a6 ~ season,algae,
main='Distribution of a6 as a function of Season',
xlab='Season',ylab='Distribution of a6')
80
●
●
60
●
●
Distribution of a6
● ● ●
40
●
● ●
●
20
● ●
●
●
●
● ●
●
●
●
●
●
●
●
0
Season
ggplot(algae,aes(x=season,y=a6))+geom_boxplot() +
ggtitle('Distribution of a6 as a function of Season and River Size')
60
● ●
●
40
a6
●
●
● ●
●
20
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
Using the Algae data set from package DMwR answer to the following
questions:
1 Produce a graph that allows you to understand how the values of
NO3 are distributed across the sizes of river
2 Try to understand (using a graph) if the distribution of algae a1
varies with the speed of the river
●
60
●
●
●●
●
40
a6
●
● ●
●
●
● ●
● ●
20
●
●● ●● ● ●
● ●●
●
●
● ● ●
●
● ● ● ●
● ● ● ●
●●●● ● ● ●
●●●●
● ● ●●
●●●●●
● ● ●
●● ● ●● ●● ● ●
●
●● ●
●● ● ●● ● ●● ●
●●
●●●
●●
●
●●●
●●
●●●
●●●●●
●●●●
●●●
●
●●●● ●
●
●● ●●●●
● ●●●●
●●●● ●●●● ●
●
● ●●●● ●●● ● ●● ●● ● ●
0
0 20 40 60 80
© L.Torgo (FCUP - LIAAD / UP) Data Visualization Oct, 2014 46 / 51
a1
Other Plots Scatterplots
Scatterplots in ggplot2
ggplot(algae,aes(x=a1,y=a6)) + geom_point() +
ggtitle('Relationship between the frequency of a1 and a6')
60
●
●
●●
●
40
a6
● ●
● ●
●
● ●
●
20 ●
● ●
●●● ● ●
●●●
●
● ●
● ●
●
● ●
● ● ● ● ●
●●● ● ● ● ●
●
●●●
●
● ●
●●●
●●●● ●●
● ●
● ●● ● ●
● ●● ● ● ● ●●
●●● ●
●● ● ●● ● ● ●
0 ●●
●●●
●
●●
●●
●
●●
●
●● ●
●●●● ●
●●●●●●
●
●
●●●● ●●●● ● ●●●
●
●● ● ● ●●●● ●
●
●● ● ● ●●● ● ● ●● ● ●● ●● ● ●
0 25 50 75
a1
autumn spring
●
●
●
90
●
●
60
● ●
● ●
30 ● ●
●
●
● ● ●
●
● ●
●
● ●●
● ●
● ● ●
● ●●
●
● ● ●
●
●● ● ● ● ● ●
●
●●
● ● ● ● ●
● ● ●
●
● ●
●
●●●● ●
●
0 ●
Chla
summer winter
●
90 ●
●
●
60
● ●
●
●
●
●
30 ●● ●
●
●
● ● ●●
● ● ● ●
● ● ●
●● ● ●
●● ●
● ●●
● ● ●
●● ●● ●
● ●
●● ● ●
●
● ●
● ● ● ● ●● ●
● ●●
● ● ●● ●
● ●●
0 ● ● ●●
●
0 10 20 30 40 0 10 20 30 40
a4
75
50
Corr: Corr: Corr: Corr:
a1
●
●
●
●●
● ● ●
20 ●●●●
●
●●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●●●●●
0.0321 −0.172 −0.16
●
●
●
●
● ●
●●
●
●●●
● ●● ●
●●
0 ●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●●
●
●●●●
●
●
●●
●
●●●●
●
●
●●
●●●●
●
●●●●
●
● ●
40 ● ●
30 ●
●●
●
●
●●● ●
●
●
●
●
●
●
● ●
Corr: Corr:
a3
20 ●●
●
●
●
● ●
●
●●● ●
● ●
10
●
●
●
●
●
●
●
●
●
●● ●
●
●
●●
●●●
●●
●
● ●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●● ●● ●●
0.0123 −0.108
●
●
●
●
●●
●
●●
●●●●●
● ●●●●● ●
●
●●
●
●
●
●●
● ●●
●●●●●●●
0 ●
●
●
●
●
●●
●
●●
●
●
●
● ●
●●
●●●
●
●●
●●
●●
●●
●
●●
●●
●
●●
●●●
●●
●●●●
● ●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●●
●●
●
●
●●
●
●●
●●●● ● ●
● ● ●
40
30 ● ● ● Corr:
a4
20
10 ●●
●
●
●
●
●
●
●
●
●● ●
●● ●
●
●●●
●
●
●
●
●●
● ●●
●
●●● ●
●●
●
●
●
● ●● ●
−0.109
●
●
●
●
●
●●
●●
●
●
●● ●
●
●
●●
● ●
● ● ●
●
●●
●
●●
●●
● ● ● ●
●
●
●●
●
●●
● ● ●
●● ●●●●
●● ●
0 ●
●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●●
●●
●
●
●●
●
●
●●
●
●●
●
●●
●●●●
●
● ●●
● ●
●●
●●
●
●●
●
●●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●●
●●●
●●●
●● ● ● ●
●●
●
●●
●
●
● ●
●●
●
●
●●
●
●●
●
●●
●
●●
●●●
●● ●●●
●● ●
● ● ● ●
40 ●
●
● ●
● ●
● ●●
●
30 ● ● ●● ●
● ●
●
20 ●
●
●
●
● ●
●
●● ●●
● ●
●
●●●
● ●●
●●
a5
●
● ● ●●●●● ●
● ●
●●
●
●
●●●●● ●
●
● ●
●●
● ●●
●
●
● ●● ● ●●● ● ● ● ●●●
10 ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
● ● ●●
●●●
●●● ●
●●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
● ●
●● ●●
● ● ● ●
●●
●●
●
●●●
●
●● ● ● ● ●
●
●
●
●●
● ●● ● ●●
●●● ●●●
0 ●
●
●
●●
●●
●
●●
●●
●
●
●●●
●●
●
●●
●●
●●
●
●●
●
●●
●●
●●
●●
●●●●●
● ●
●
●
●●
●
●●●
●●
● ●
●
●
●●
●●●
●●●
● ●
●
●
●●
●
●●
●
●●
●●
●
●●
●
●●
●●●●
●● ● ● ●●
●●
●
●
●
●●
●
●●● ● ● ●
ggpairs(algae,columns=c("season","speed","a1","a2"),axisLabels="show")
autumn
autumn
60 ●● ● ● ●
spring
spring
season
40 ●● ● ● ● ● ●
●
● ●
summer
summer
● ●●
20
winter high lowmedium
winter
●
● ●● ●●
●●
0
20 ●
●
●●
●●●● ●
●
10
0
speed
20 ●● ●
10
0
20 ●
●●● ●
● ●●● ●●
●● ● ●
10
0
75
50
Corr:
a1
25 −0.294
0
●
60 ●
40 ●
●
●●
●
●
●
a2
●
●●●
● ●
20 ●
●
●
●
●
●
●
●●●●
●
●
●●●●●
●
●
●
●●
●●● ●● ●●●
●
●
●
●
● ●●●●●
● ● ●●
●● ● ●● ●
●●
0 ●
●
●
●
●●
●
●●●
●
●
●●●
●●
●●●
●●●
●●●
●●
● ●
●●●●
●●
●●● ●
●●●
●
●●●
●●●●●
010
20
30
010
20
30
010
20
30
010
20
30020400204002040 0 25 50 75 0 20 40 60
season speed a1 a2
Parallel Plots
ggparcoord(algae,columns=12:16,groupColumn="speed")
10.0
7.5
speed
5.0
high
value
low
medium
2.5
0.0
a1 a2 a3 a4 a5
variable