
FEBRUARY 23, 2015

R MANUAL
A BRIEF TUTORIAL TO LEARN R
DR. R. SRIVATSAN
CONSULTANT FACULTY | IVY PROFESSIONAL SCHOOL

1 R SESSIONS

R sessions

R programming can be carried out in an interactive R session. To start an R session, type R
at the command line in Windows or Linux. For example, from the shell prompt $ in Linux,
$ R
This generates the following output before the > prompt of R appears:
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type license() or licence() for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type contributors() for more information and
citation() on how to cite R or R packages in publications.
Type demo() for some demos, help() for on-line help, or
help.start() for an HTML browser interface to help.
Type q() to quit R.
[Previously saved workspace restored]
>
Once we are inside an R session, we can directly execute R language commands by typing them
line by line. Pressing the return key ends the command and brings back the > prompt.
In the example session below, we declare two variables a and b with values 5 and 6 respectively,
and assign their sum to another variable called c:
> a = 5
> b = 6
> c = a + b
> c
[1] 11
To get help on any function of R, type help(function-name) at the R prompt. For example,
if we need help on the if construct, type,
> help("if")
and the help text is printed.
To exit the R session, type quit() in the R prompt:
> quit()
Save workspace image? [y/n/c]: n
Copyright (c) from 2012 R. Srivatsan

2 NUMBERS AND MATHEMATICAL OPERATIONS

Numbers and Mathematical operations

In R, we can assign integer and floating point values to variables directly. Mathematical
operations are written with symbols in a format very similar to languages like C,
C++, Java, Python and Perl. We have already performed a simple operation c = a+b in the previous
section.
For example, we can evaluate c = (a+b)/(ab) by assigning values directly at declaration:
> a = 7.5
> b = 6
> c = (a+b)/(a*b)
> c
[1] 0.3
In another example, we apply the formula y = sqrt(1.0 + (z/r)^0.38) as,
> z = 22.9
> r = 6.7
> y = sqrt(1.0 + (z/r)^0.38)
> y
[1] 1.610978

Note the explicit use of parentheses to group the terms and remove ambiguity.
A list of useful built-in functions in R is given below:
Function                    Description
----------------------------------------------------------------------
abs(x)                      absolute value
sqrt(x)                     square root
ceiling(x)                  ceiling(3.475) is 4
floor(x)                    floor(3.475) is 3
trunc(x)                    trunc(5.99) is 5
round(x, digits=n)          round(3.475, digits=2) is 3.48
signif(x, digits=n)         signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)      trigonometric cosine, sine and tangent functions
acos(x), cosh(x), acosh(x)  arccosine, hyperbolic cosine and inverse hyperbolic cosine
log(x)                      natural logarithm
log10(x)                    common logarithm
log2(x)                     logarithm to the base 2
exp(x)                      e^x
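As a quick check of the rounding functions in the table, the short sketch below applies several of them to a single value. The variable names and the chosen value are illustrative only and are not part of the original tutorial.

```r
# Illustrative value; any positive non-integer would do
x <- 3.475

fl <- floor(x)                 # largest integer not greater than x
ce <- ceiling(x)               # smallest integer not less than x
tr <- trunc(-x)                # trunc() drops the fractional part, moving toward zero
sg <- signif(x, digits = 2)    # keep two significant digits

print(c(fl, ce, tr, sg))

# exp() undoes log(), so this recovers the starting number
back <- exp(log(10))
print(back)
```

Note that trunc() and floor() agree for positive numbers but differ for negative ones, which is why the sketch applies trunc() to -x.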


3 STRING OPERATIONS

String operations

In R, a string can be declared within double quotes:
> str = "abcacabac"
> str1 = "qqqqqq"
> str2 = " ++++++"
For assignment, the operator <- (a left angle bracket followed by a dash) is also extensively
used in R. Thus, the above commands can also be written as,
> str <- "abcacabac"
> str1 <- "qqqqqq"
> str2 <- " ++++++"
The assignment operator can assign either from right to left (<-) or from left to right (->).
Thus the following two assignments for a string str are equally valid:
> str <- "abcacabac"
> "abcacabac" -> str
The length of the string is returned by the function nchar()
> slength <- nchar(str)
> slength
[1] 9
We can concatenate strings with the paste() function.
To concatenate 2 strings with a single space between them,
> scat <- paste(str, str1)
> scat
[1] "abcacabac qqqqqq"
While concatenating two strings, the separator between them can be specified.
For example, to concatenate two strings with a separator --- between them,
> scat <- paste(str, str1, sep="---")
> scat
[1] "abcacabac---qqqqqq"
To concatenate without any gap between them, use an empty separator:
> scat <- paste(str, str1, sep="")
> scat
[1] "abcacabacqqqqqq"
A substring can be formed by calling the substr() function, specifying the start and stop
character locations of the substring in the main string. To form a substring from location 4 to 8 of
string scat,
> ssub <- substr(scat,4,8)
> ssub
[1] "acaba"
We can also replace a portion of a string with another substring:
> substr(scat,4,8) <- "UUUUU"
> scat
[1] "abcUUUUUcqqqqqq"
In case we want a substring from a given start position to the end of the original string, give an
arbitrarily large integer for the end location:
> str3 = "WWW.objsite.com"
> sublg <- substr(str3,5,100000000L)
> sublg
[1] "objsite.com"
A string can be truncated to a certain number of characters from its beginning with the
strtrim() function:
> str4 <- "AECH9939-ALM"
> strunk <- strtrim(str4, 4)
> strunk
[1] "AECH"
The function strsplit() is used for splitting a string at a given character. For example,
> strsplit("fname.doc", "\\.")
[[1]]
[1] "fname" "doc"
strsplit() returns a list. The two portions of the split string can be flattened into a character
vector with unlist(), as shown below. More on lists later:
> aa <- unlist(strsplit("fname.doc", "\\."))
> aa[1]
[1] "fname"
> aa[2]
[1] "doc"
For converting lower case to upper case and vice versa, we use the functions toupper()
and tolower():
> strr <- "This is a sentence"
> strrup <- toupper(strr)
> strrup
[1] "THIS IS A SENTENCE"
> tolower(strrup)
[1] "this is a sentence"
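The string functions of this section are often combined. The sketch below, which assumes a hypothetical file name, splits the name at the dots, lower-cases the extension and upper-cases the rest. The collapse argument of paste(), not introduced above, joins the elements of one vector into a single string.

```r
# Hypothetical file name used only for illustration
fname <- "report.final.TXT"

parts <- unlist(strsplit(fname, "\\."))    # split on the literal dot
n     <- length(parts)
ext   <- tolower(parts[n])                 # last piece, lower-cased
stem  <- paste(parts[1:(n - 1)], collapse = ".")   # rejoin the other pieces

newname <- paste(toupper(stem), ext, sep = ".")
print(newname)
print(nchar(newname))
```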

4 DATA STRUCTURES IN R

Data structures in R

R has a wide variety of data types including scalars, vectors (numerical, character, logical), matrices,
dataframes, and lists. We can perform many algebraic operations with these data structures and
many useful built-in functions are defined for these data types. Here we will learn to use them.

4.1 Vectors

A vector in R is an ordered collection of numbers, strings or characters. A vector
can be defined by placing a comma-separated list of elements inside a pair of parentheses next to the
letter c and assigning it to a variable name as shown below:
> avec <- c(10.2, 5.5, 6.9, 7.2, 8.1)
The data type of a vector is decided by the data types of the elements it
contains. Thus, if all elements of the vector are numbers, the vector takes the type numeric. We
can do many numerical operations with such a vector. However, if one or more elements of the
vector happen to be strings, the entire vector is treated as a string (character) vector, and we
cannot perform numerical operations with it. For example, the vector avec defined above is
a numeric vector, while the following two vectors are treated as string vectors:
> avec1 <- c("AEC", "AED", "AAB", "AFC")
> avec2 <- c(10.2, 5.5, "6.9", 7.2, 8.1)
A vector can be assigned in many ways. We can use the assign() function instead of the above syntax.
Thus, all of the following assignments define a vector named vec with elements (10.2, 5.5, 6.9,
7.2, 8.1):
> vec <- c(10.2, 5.5, 6.9, 7.2, 8.1)
> assign("vec", c(10.2, 5.5, 6.9, 7.2, 8.1) )
> vec = c(10.2, 5.5, 6.9, 7.2, 8.1)
> c(10.2, 5.5, 6.9, 7.2, 8.1) -> vec
The individual elements of a vector can be accessed by subscripting the element
number inside the square bracket:
> x <- c(10,20,30,40,50,60,70,80,90,100,110,120,130)
> x[3]
[1] 30
We can also access a block of vector elements using subscripts:
> x[c(4:9)]
[1] 40 50 60 70 80 90
Once a vector is defined, basic mathematical operations like addition, subtraction,
multiplication and division performed on it are applied to all its elements individually.
See the examples below:
> vstr <- c(1,2,3,4,5,6,7,8,9)
> vstr + 100
[1] 101 102 103 104 105 106 107 108 109
> vstr - 100
[1] -99 -98 -97 -96 -95 -94 -93 -92 -91


> vstr*100
[1] 100 200 300 400 500 600 700 800 900
> vstr/100
[1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
In the same way, the algebraic operations between one or more vectors are applied
to their individual elements. Thus if we add two vectors of same number of elements, their
individual elements are correspondingly added to give a new vector. This is illustrated in the following
operations between vectors vec1 and vec2 below:
> vec1 <- c(1.5,2.5,3.5,4.5,5.5,6.5)
> vec2 <- c(10,20,30,40,50,60)
> vec1+vec2
[1] 11.5 22.5 33.5 44.5 55.5 66.5
> vec1-vec2
[1]  -8.5 -17.5 -26.5 -35.5 -44.5 -53.5
> vec1*vec2
[1]  15  50 105 180 275 390

> vec1/vec2
[1] 0.1500000 0.1250000 0.1166667 0.1125000 0.1100000 0.1083333
> log(vec2)
[1] 2.302585 2.995732 3.401197 3.688879 3.912023 4.094345
Vectors can be combined with other vectors or individual elements to grow in
size. For example, with the vectors vec1 and vec2 defined above,
> cvec <- c(vec1, vec2)
> cvec
 [1]  1.5  2.5  3.5  4.5  5.5  6.5 10.0 20.0 30.0 40.0 50.0 60.0
> cvec <- c(vec1, 0.0002, 0.00003, vec2)
> cvec
 [1] 1.5e+00 2.5e+00 3.5e+00 4.5e+00 5.5e+00 6.5e+00 2.0e-04 3.0e-05 1.0e+01
[10] 2.0e+01 3.0e+01 4.0e+01 5.0e+01 6.0e+01


We can also perform mathematical operations with whole vectors, provided they
contain the same number of elements. The mathematical operations are applied
to their respective elements individually:
> rvec <- 3*vec1 + vec2
> rvec
[1] 14.5 27.5 40.5 53.5 66.5 79.5
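To make the element-by-element rule concrete, the sketch below computes the same expression twice: once with whole vectors and once with an explicit loop over the positions. The loop (a for statement, not covered up to this point in the text) and the name manual are used here only for illustration.

```r
vec1 <- c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5)
vec2 <- c(10, 20, 30, 40, 50, 60)

# Whole-vector form: the operation is applied position by position
rvec <- 3 * vec1 + vec2

# Equivalent explicit loop, one element at a time
manual <- numeric(length(vec1))
for (i in seq_along(vec1)) {
  manual[i] <- 3 * vec1[i] + vec2[i]
}

print(rvec)
print(all.equal(rvec, manual))
```

The whole-vector form is shorter and is the usual way such computations are written in R.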
A vector can be sorted in ascending order with the sort() function:
> vec <- c(8.9, 1.5, 3.4, 6.7, 12.8, 7.4)
> sort(vec)
[1]  1.5  3.4  6.7  7.4  8.9 12.8

We can get the maximum and minimum values among the elements of a vector:
> vec <- c(8.9, 1.5, 3.4, 6.7, 12.8, 7.4)
> max(vec)
[1] 12.8
> min(vec)
[1] 1.5
It is very easy to generate a sequence of numbers in R. Use seq function to generate
a sequence from a given number to an end number:
> sq <- seq(1,50)
> sq
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Use the following syntax to generate a sequence from 1 to 50 in steps of 5:
> sq <- seq(1,50,5)
> sq
 [1]  1  6 11 16 21 26 31 36 41 46
The sequence can be generated in reverse by starting from the larger number and using a negative step:
> sq <- seq(50,1,-5)
> sq
 [1] 50 45 40 35 30 25 20 15 10  5
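The arguments of seq() can also be given by name. The following sketch repeats the two sequences above with the named by argument and adds a variant using length.out, which fixes the number of points instead of the step size. The variable names are illustrative.

```r
# Same sequences as above, with the step passed by name
by5  <- seq(1, 50, by = 5)
rev5 <- seq(50, 1, by = -5)

# Five equally spaced points between 0 and 1 (step is worked out by seq)
grid <- seq(0, 1, length.out = 5)

print(by5)
print(rev5)
print(grid)
```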

Given a vector, logical operations can be performed on each of its elements to evaluate a
TRUE or FALSE value. For example, in the following R statement, TRUE is assigned for every element
of vector lseq that satisfies the given logical condition, and FALSE for every element that does not.
The resulting vector lresult has TRUE or FALSE values in the corresponding positions.
> lseq <- c(23,15,34,25,46,58, 59,34,29,36,44,89)
> lresult <- ( (lseq > 24) & (lseq < 60) )
> lresult
 [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

Vectors in R can handle missing values. A missing value is represented by NA.
For example, in the statements below, zm is a vector whose elements are the numbers 1 to 10 and two NA
(missing) values. The vector dat has FALSE for genuine values and TRUE for missing values,
since is.na() looks for missing values:
> zm <- c(1:10, NA, NA)
> zm
 [1]  1  2  3  4  5  6  7  8  9 10 NA NA
> dat <- is.na(zm)
> dat
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

NA is taken to be a missing value (is.na() also returns TRUE for NaN). Other states like -Inf and
Inf are evaluated as FALSE while looking for a missing value.
The missing values in a vector can be replaced by zeroes as shown below:
> x <- c(1.22, 3.44, 2.95, 4.23, NA, NA, 5.99)
> x[is.na(x)] <- 0
> x
[1] 1.22 3.44 2.95 4.23 0.00 0.00 5.99
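Before replacing missing values it is often useful to know how many there are and where they sit. A minimal sketch, using the same vector as above and illustrative variable names:

```r
x <- c(1.22, 3.44, 2.95, 4.23, NA, NA, 5.99)

n_missing     <- sum(is.na(x))     # TRUE counts as 1, FALSE as 0
which_missing <- which(is.na(x))   # positions of the NA entries
clean         <- x[!is.na(x)]      # copy of x without the NA entries

print(n_missing)
print(which_missing)
print(clean)
```

Unlike the replacement by zero shown above, dropping the NA entries shortens the vector and leaves x itself unchanged.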
Finally, if we want to create a character vector with indexed strings like X1, X2, X3, ...
etc.,
> labs <- paste( c("X"), 1:20, sep="")
> labs
 [1] "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10" "X11" "X12"
[13] "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20"

4.2 Arrays and Matrices

An array in R can have one, two or more dimensions. An array is just a vector stored with additional
attributes like dimensions and, if needed, names for the dimensions.
All the elements of an array should be of the same data type (number, string, logical
etc).
Also, all the rows of an array should have the same length, although the number of rows can
differ from the number of columns.
We can convert a vector into an array of any dimension by specifying its dimensions in the array()
function. Below, we convert a vector x into an array arr of dimension (4,3), where (4,3) refers to 4
rows and 3 columns:
> x <- c(10,20,30,40,50,60,70,80,90,100,110,120)
> arr <- array(x, dim=c(4,3))
> arr
     [,1] [,2] [,3]
[1,]   10   50   90
[2,]   20   60  100
[3,]   30   70  110
[4,]   40   80  120


Similarly, we can create an array of dimension (2,2,3) as follows:
> x <- c(10,20,30,40,50,60,70,80,90,100,110,120)
> brr <- array(x, dim=c(2,2,3))
> brr
, , 1

     [,1] [,2]
[1,]   10   30
[2,]   20   40

, , 2

     [,1] [,2]
[1,]   50   70
[2,]   60   80

, , 3

     [,1] [,2]
[1,]   90  110
[2,]  100  120
Individual elements of an array can be accessed by giving the name of the array followed by
the subscripts in square brackets, separated by commas. For example, in the above arrays, arr[2,1],
arr[1,1], brr[1,1,1], brr[2,1,3] etc.
Dropping a subscript of a row or a column will give all elements in the corresponding row or
column. Thus, arr[2,] is entire second row, arr[,1] is entire first column.
A Matrix is a 2 dimensional array. Like an array, all elements of a matrix are of same data
type and all the rows should be of same length. The general format for creating a matrix of r rows
and c columns from a vector vec is,
amat <- matrix(vec, nrow=r, ncol=c, byrow=FALSE)
Here, byrow=TRUE indicates that the matrix should be filled by rows. byrow=FALSE indicates
that the matrix should be filled by columns (the default).
For example,
> x <- c(10,20,30,40,50,60,70,80,90,100,110,120)
> amat <- matrix(x, nrow=3, ncol=4)
> amat
     [,1] [,2] [,3] [,4]
[1,]   10   40   70  100
[2,]   20   50   80  110
[3,]   30   60   90  120
> amat <- matrix(x, nrow=3, ncol=4, byrow=TRUE)
> amat
     [,1] [,2] [,3] [,4]
[1,]   10   20   30   40
[2,]   50   60   70   80
[3,]   90  100  110  120

The rows and columns of a matrix can be named using functions rownames() and colnames()
> rownames(amat) <- c("r1","r2","r3")
> colnames(amat) <- c("c1", "c2", "c3", "c4")
> amat
c1 c2 c3 c4
r1 10 20 30 40
r2 50 60 70 80
r3 90 100 110 120

The number of rows and columns in matrix amat will be returned by function calls
nrow(amat) and ncol(amat)
> nrow(amat)
[1] 3
> ncol(amat)
[1] 4
A number of functions to manipulate matrices exist in the R library. Sophisticated
operations like matrix multiplication, inversion, transpose, computing eigenvalues and eigenvectors,
and computing determinants can be done by calling the appropriate functions from the R library.
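The text names these operations without showing the calls. The sketch below illustrates a few of them with base R functions: t() for the transpose, %*% for the matrix product, det() for the determinant and solve() for the inverse. The small 2 x 2 matrix is an arbitrary example chosen for illustration.

```r
# Arbitrary 2 x 2 example, filled by columns: rows are (2, 0) and (1, 3)
m <- matrix(c(2, 1, 0, 3), nrow = 2, ncol = 2)

tm   <- t(m)        # transpose: rows and columns swapped
prod <- m %*% m     # matrix product (note: m * m would be element-wise)
d    <- det(m)      # determinant
inv  <- solve(m)    # inverse, exists because the determinant is non-zero

print(tm)
print(prod)
print(d)
print(m %*% inv)    # should be the identity matrix up to rounding
```

eigen(m) returns the eigenvalues and eigenvectors in a list; it is left out here to keep the sketch short.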

4.3 Dataframes

Dataframe is a data structure similar to matrix, with a special feature that different
columns can have different data types. Dataframe is very useful for combining vectors of same
length with different data types into a single data matrix.
Similar to matrices, all the columns of a data frame should have same number of
rows.
In the example below, we create a data frame called frm1 with three vectors namely data1,
data2 and data3. We call the function data.frame() for this. The created data frame will have
columns data1, data2 and data3.
> data1 <- c("Iron","Sulphur","Calcium", "Magnecium", "Copper")
> data2 <- c(12.5, 32.6, 16.7, 20.6, 7.5)
> data3 <- c(1122, 1123, 1124, 1125, 1126)
> frm1 <- data.frame(data1, data2, data3)
> frm1
      data1 data2 data3
1      Iron  12.5  1122
2   Sulphur  32.6  1123
3   Calcium  16.7  1124
4 Magnecium  20.6  1125
5    Copper   7.5  1126

Note that the column names of the data frame frm1 we have created are just the names of the
objects themselves.
To get the column names of a data frame, use names(frame name):
> names(frm1)
[1] "data1" "data2" "data3"
The columns of a data frame can be named explicitly using a vector of strings. For the above
frame frm1, we can set the column names with our own vector of strings:
> names(frm1) <- c("Element", "Proportion", "Product_ID")
> frm1
    Element Proportion Product_ID
1      Iron       12.5       1122
2   Sulphur       32.6       1123
3   Calcium       16.7       1124
4 Magnecium       20.6       1125
5    Copper        7.5       1126
Instead of providing the column names through the names command, we can provide column
names directly in the call that creates the data frame. The above data frame can be
created with column names directly as,
> frm1 <- data.frame(Element=data1, Proportion=data2, Product_ID=data3)
Similar to the column names, the row names of a data frame can be obtained and set using
the row.names() function. We will first get the existing row names for the above frame frm1,
and then set new row names:
> row.names(frm1)
[1] "1" "2" "3" "4" "5"
> row.names(frm1) <- c("elm-1","elm-2","elm3","elm-4","elm-5")
> frm1
        Element Proportion Product_ID
elm-1      Iron       12.5       1122
elm-2   Sulphur       32.6       1123
elm3    Calcium       16.7       1124
elm-4 Magnecium       20.6       1125
elm-5    Copper        7.5       1126
The elements of a data frame are accessed using the same subscript convention as matrices. Thus,
frm1[1,3] is the element in the first row and third column, frm1[1,] is the entire first row, frm1[,2] is
the entire second column etc. Also, frm1[1:3,] gives rows 1, 2 and 3. This is illustrated here, with the
row names set above:
> frm1[1,3]
[1] 1122
> frm1[1,]
      Element Proportion Product_ID
elm-1    Iron       12.5       1122
> frm1[,2]
[1] 12.5 32.6 16.7 20.6  7.5
> frm1[1:3,]
      Element Proportion Product_ID
elm-1    Iron       12.5       1122
elm-2 Sulphur       32.6       1123
elm3  Calcium       16.7       1124
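The subscript rules can be checked on a small stand-alone data frame. The sketch below uses a hypothetical frame frm_demo with three of the rows from above; it also shows that a column can be picked by its name inside the square brackets. The stringsAsFactors argument is set explicitly so the sketch behaves the same in old and new versions of R.

```r
# Hypothetical three-row frame, built only for this illustration
frm_demo <- data.frame(Element    = c("Iron", "Sulphur", "Calcium"),
                       Proportion = c(12.5, 32.6, 16.7),
                       Product_ID = c(1122, 1123, 1124),
                       stringsAsFactors = FALSE)

cell    <- frm_demo[2, 3]              # single element: row 2, column 3
row2    <- frm_demo[2, ]               # whole row, still a data frame
col2    <- frm_demo[, 2]               # whole column, returned as a vector
by_name <- frm_demo[, "Proportion"]    # same column, selected by name

print(cell)
print(row2)
print(col2)
```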

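We can also access a column of a data frame by its name, by typing the frame name and the
column name separated by a $ sign:
> frm1$Element
[1] Iron      Sulphur   Calcium   Magnecium Copper
Levels: Calcium Copper Iron Magnecium Sulphur
> frm1$Proportion
[1] 12.5 32.6 16.7 20.6  7.5
> frm1$Product_ID
[1] 1122 1123 1124 1125 1126
The Levels line appears because data.frame() stored the strings as a factor, which was the default
before R version 4.0.0; in later versions the column is a plain character vector.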
In case we feel that typing the $ sign every time we want to access a column in a data
frame is becoming tiresome, R provides a command to attach the variable in the data frame to our
workspace:
> attach(frm1)
Now, the column named Proportion can be accessed directly by its name, rather than by a $
sign:
> Proportion
[1] 12.5 32.6 16.7 20.6  7.5

4.4 Lists

A list is an ordered collection of objects known as its components. The objects collected inside a
list can be very different like arrays, matrices, vectors and dataframes. Each object inside a list and
the element of the object can be accessed by proper symbol and indexing.
To create a list with many objects, use the list() function. In the example below, we will create
a list from the following 5 vectors of different data types:
> expt_name <- c("Experiment-A","Experiment-B","Experiment-C", "Experiment-D", "Experiment-E")
> sample_length <- c(12.5, 32.6, 16.7, 20.6, 7.5)
> sample_weight <- c(1122, 1123, 1124, 1125, 1126)
> sample_category <- c("A","S","P","K","G")
> lab_name <- c("IBAB_LAB")
We now put together these 5 vectors into a list using list command as shown below.

> alis <- list(expt_name, sample_length, sample_weight, sample_category,lab_name)


Now alis is a list which has the above five objects in the same order they are typed while creating
them using list command. Just type the name of the list to get a description of its data structure.
> alis


[[1]]
[1] "Experiment-A" "Experiment-B" "Experiment-C" "Experiment-D" "Experiment-E"

[[2]]
[1] 12.5 32.6 16.7 20.6  7.5

[[3]]
[1] 1122 1123 1124 1125 1126

[[4]]
[1] "A" "S" "P" "K" "G"

[[5]]
[1] "IBAB_LAB"
Each object in the list can be accessed in its entirety by typing its position in the list
within double square brackets after the list name:
> alis[[1]]
[1] "Experiment-A" "Experiment-B" "Experiment-C" "Experiment-D" "Experiment-E"
> alis[[2]]
[1] 12.5 32.6 16.7 20.6  7.5

> alis[[5]]
[1] "IBAB_LAB"
To access the individual members of a specific vector in the list, use a second subscript as
shown:
> alis[[1]][1]
[1] "Experiment-A"
> alis[[4]][2]
[1] "S"
> alis[[3]][3] * 100
[1] 112400
Now we will create a list consisting of components of different data types:
> Lst <- list(name="AA-list", lengths=c(12.5,32.6,16.7,20.6,7.5),
+ XX=array(1:20, dim=c(4,5)))

The above list called Lst consists of a string component called name, a vector called lengths
and a 2 dimensional array called XX with dimension (4,5). Note that we have given names to the
components and created them inside the list() function itself.
Now let us print the components of the list:
> Lst
$name
[1] "AA-list"

$lengths
[1] 12.5 32.6 16.7 20.6  7.5

$XX
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

The elements of each of the components can be directly accessed by the format
list_name$component_name. For example, element (3,2) of array XX can be accessed
by "Lst$XX[3,2]", and the elements of the third row of array XX are accessed by "Lst$XX[3,]".
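The access rules for named components can be tried out directly. The sketch below rebuilds the list Lst from above and reads the same component in three equivalent ways; the variable names on the left-hand sides are illustrative.

```r
Lst <- list(name = "AA-list",
            lengths = c(12.5, 32.6, 16.7, 20.6, 7.5),
            XX = array(1:20, dim = c(4, 5)))

e32  <- Lst$XX[3, 2]     # element in row 3, column 2 of the array
row3 <- Lst$XX[3, ]      # entire third row of the array

# A component can be reached by $name, by [["name"]] or by position
same_by_name  <- identical(Lst$lengths, Lst[["lengths"]])
same_by_index <- identical(Lst$lengths, Lst[[2]])

print(names(Lst))
print(e32)
print(row3)
```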

5 R SCRIPTS

R Scripts

So far we have been typing R commands at the R prompt >. Though this method is convenient
for learning a few lines of commands, it cannot be used for real life applications where code spanning
many tens of lines has to be written. For this purpose, R allows us to write a script, which is
a collection of many lines of R statements written in a file. The statements are written one below the
other separated by line breaks, without the > character at the beginning of each statement. This
script file can be executed at the R prompt with a single command, which in turn executes
the statements in the script one by one sequentially. This way, long and complicated code
can be written and executed.
The R script is recognized by the file extension r or R. Thus, test.r is an R script named
test and compute.R is an R script called compute.
As an example, create a text file with the name test.r and write the following lines of code in
the file:
a=5
b=6
c = a*(a+b)
print(c)
To execute this code, go to R prompt and source the script file with the command:
> source("test.r")
[1] 55
If we source an R script inside another R script, then the variables of the sourced script will be
accessible to the second script. They need not be declared separately inside the second script.
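The following sketch illustrates that variables defined in a sourced script become available to the caller. To keep it self-contained, the script is written to a temporary file first; the file location and the variable name res are assumptions made for this illustration, and in practice the script would simply be an existing file such as test.r.

```r
# Write a small script to a temporary file (stand-in for a real test.r)
script_path <- tempfile(fileext = ".r")
writeLines(c("a <- 5",
             "b <- 6",
             "res <- a * (a + b)"),
           script_path)

# Run the script; its variables are created in the calling workspace
source(script_path)

# a, b and res were never defined here, only inside the sourced file
print(res)

unlink(script_path)   # remove the temporary file
```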

6 LOGICAL STATEMENTS AND CONTROL LOOPS

Logical statements and control loops

In R, an expression or a statement consists of one or more data types and operations applied to
them. For example,
c=a+b
Here, + is an addition operator that operates on the numbers a and b (called operands).
We have already learnt about the arithmetic operators +, -, * and /; the modulus operator in R is written %%.
R has many operators for creating relational and logical expressions. They are mostly used in
the control flow statements for executing simple or compound statements based on whether a given
expression is evaluated to be true or false.
The syntax of logical expressions and control flow statements in R is very similar to the
corresponding constructs in the C language.
The important equality, relational and logical operators are listed below:
Operator    Function                                Usage
---------   -------------------------------------   ------------------------------------
<           less than                               expression1 < expression2
<=          less than or equal to                   expression1 <= expression2
>           greater than                            expression1 > expression2
>=          greater than or equal to                expression1 >= expression2
==          equality                                expression1 == expression2
!=          inequality                              expression1 != expression2
&           logical AND                             expression1 & expression2
||          logical OR                              expression1 || expression2
!           logical NOT                             ! expression
isTRUE(x)   test if x is true                       isTRUE(logical expression or logica
is.na(x)    returns TRUE if x is a missing value

In the above table, expression, expression1 and expression2 mean expressions like,
3.1456 (simple constant)
radius (simple variable)
xvalue * sin(x) (a compound expression)
These operators operate on one or more expressions.
To understand the way a logical expression works in R, have a look at the following tiny R-script
and the output it generates:
x=5
y=6
print(x < y)
When these lines are executed as a script (say) test1.r, the following output is generated:
> source("test1.r")
[1] TRUE
What has happened here? In the statement print(x < y), the result of x < y is evaluated.
Since x is indeed less than y in the above script, the result evaluates to TRUE. That is what is
printed by the print(x < y) statement. Thus we see that when a logical statement inside a
pair of parentheses is evaluated, the result is either TRUE or FALSE.
Logical statements can be very simple (as above) or can be compound statements. Some
example statements are given below:
The statement (x > y) & (x < z) means x greater than y and x less than z. This evaluates
to TRUE when x=3, y=1 and z=4.
Similarly, the statement
((x == 5) & (y == 6)) || (z > 10)
means that either x equals 5 and y equals 6 should both be true, or z should be greater than 10. This
statement demands either that the conditions on the values of x and y are true simultaneously or, alternatively,
that z has a value greater than 10. Note the grouping of statements using pairs of parentheses.
For a given set of values for the variables x, y and z, the evaluation of the above logical statement
can be followed in these 5 steps:
(i) First, (x == 5) is evaluated (TRUE or FALSE)
(ii) Second, (y == 6) is evaluated (TRUE or FALSE)
(iii) Next, (z > 10) is evaluated (TRUE or FALSE)
(iv) Evaluate whether statements (i) and (ii) are true simultaneously.
(v) Evaluate whether at least one of the results of evaluations (iv) and (iii) is TRUE.
Thus for a set of values x=5, y=7 and z=12, the whole statement above evaluates to TRUE. For
another set of values x=5, y=2 and z=9, the above statement evaluates to FALSE.
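The five steps can be followed in code by storing each intermediate result in its own variable. In the sketch below the names s1 to s5 are illustrative labels for steps (i) to (v), using the first set of values from the text.

```r
x <- 5
y <- 7
z <- 12

s1 <- (x == 5)     # step (i)
s2 <- (y == 6)     # step (ii)
s3 <- (z > 10)     # step (iii)
s4 <- s1 & s2      # step (iv): both conditions on x and y together
s5 <- s4 || s3     # step (v): at least one of (iv) and (iii)

# The compound statement written in one line gives the same answer
whole <- ((x == 5) & (y == 6)) || (z > 10)

print(c(s1, s2, s3, s4, s5))
print(whole)
```

Here step (iv) is FALSE because y is not 6, yet the whole statement is TRUE because z is greater than 10.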

6.1 Data filtering with logical statements

Using simple logical expressions, data stored in various data structures of R can be easily filtered to
create subsets of data. In this chapter, we demonstrate this through examples in the form of small
script lines which can be typed into a file and sourced inside R prompt as shown in the previous
chapter.
We start with a simple example in which we pick out the elements of a vector whose values
are greater than a certain number. In a second filter operation, we pick values within a range.
To achieve this, we place the required logical statement inside the square brackets where vector
elements are accessed. See the script below:

marray <- c(2.1,5.4,7.3,9.7,3.2,6.8,7.6,9.9,11.4,14.6,17.4,16.5,5.5,4.4,3.1)


highfilt <- marray[marray > 9.5]
bandfilt <- marray[(marray > 7) & (marray < 15.0)]
print("High filter : values above 9.5")
print(highfilt)
print("Band filter : values between 7 and 15")
print(bandfilt)

When the above code lines are executed in an R script, the following output is created.
[1] "High filter : values above 9.5"
[1] 9.7 9.9 11.4 14.6 17.4 16.5
[1] "Band filter : values between 7 and 15"
[1] 7.3 9.7 7.6 9.9 11.4 14.6
In the above script, the statement
highfilt <- marray[marray > 9.5]
picks the elements of vector marray whose values are greater than 9.5 and creates the
vector highfilt with these numbers. The print(highfilt) statement prints the elements of highfilt.
Similarly, the statement
bandfilt <- marray[(marray > 7) & (marray < 15.0)]
picks the elements of vector marray whose values are greater than 7 and less than 15 to create
the vector bandfilt with these numbers.
In the second example, we create a vector of numbers with some missing values (ie. NA). We
will apply a filter to select elements which are not NAs and at the same time have values below 100
and write them into another vector. In a second operation, we will remove all the NA values from
the original vector itself.
The script below achieves this:


tarray <- c(2, 7, 29, 32, 41, 11, 15, NA, NA, 55, 32, NA, 42, 109)
karray <- tarray[ !is.na(tarray) & (tarray < 100) ]
tarray[is.na(tarray)] <- 0
print("Filter with NAs and numbers greater than 100 removed:")
print(karray)
print("Filter with NAs replaced by 0")
print(tarray)

When the above code lines are executed in an R script, the following output is created.
[1] "Filter with NAs and numbers greater than 100 removed:"
[1] 2 7 29 32 41 11 15 55 32 42
[1] "Filter with NAs replaced by 0"
 [1]   2   7  29  32  41  11  15   0   0  55  32   0  42 109
In the above script, the statement
tarray[ !is.na(tarray) & (tarray < 100) ]
selects elements of vector tarray that are not NAs and at the same time less than 100. The
statement
tarray[is.na(tarray)] <- 0
assigns the value 0 to the elements of vector tarray that are missing values (NAs).
After this, all NAs in vector tarray are replaced by 0.
From a data set, a subset can be created by applying conditions on one or more column
members.
For example, suppose a data frame called datframe has many columns and one of them
is named npcol. Then the statement
subdata <- subset(datframe, datframe$npcol > 30.0)
will select all the rows of datframe in which npcol is greater than 30 to create a new data frame
called subdata.
The subset function can be applied to data types like vectors and data frames.
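A minimal sketch of subset() on both kinds of objects is given below. The contents of datframe, the extra column id and the vector v are made up for this illustration; only the column name npcol and the threshold 30 come from the text above.

```r
# Made-up data frame with the column npcol used in the text
datframe <- data.frame(id    = c("s1", "s2", "s3", "s4"),
                       npcol = c(12.0, 45.5, 30.0, 61.2),
                       stringsAsFactors = FALSE)

# Keep only the rows where npcol is greater than 30
subdata <- subset(datframe, datframe$npcol > 30.0)
print(subdata)

# subset() on a plain vector; elements where the condition is NA are dropped
v   <- c(5, 40, NA, 33)
big <- subset(v, v > 30)
print(big)
```

Note that the row with npcol exactly equal to 30 is not selected, since the condition uses a strict "greater than".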
As a third example, we will create a data frame with an (imaginary) experimental data. In
this data set, there are 7 genes for which some experimental measurements are available from 7
experiments.
We first create a data frame with these data vectors, and then use subset() function to create
a subset of data after filtering on individual column values.
The code below demonstrates this. The comments are self explanatory.


# creating a vector of gene names


genes <- c("gene-1","gene-2","gene-3","gene-4","gene-5","gene-5","gene-6")
# creating a vector of gender
gender <- c("M", "M", "F", "M", "F", "F", "M")
# creating 7 data vectors with experimental results
result1 <- c(12.3, 11.5, 13.6, 15.4, 9.4, 8.1, 10.0)
result2 <- c(22.1, 25.7, 32.5, 42.5, 12.6, 15.5, 17.6)
result3 <- c(15.5, 13.4, 11.5, 21.7, 14.5, 16.5, 12.1)
result4 <- c(14.4, 16.6, 45.0, 11.0, 9.7, 10.0, 12.5)
result51 <- c(12.2, 15.5, 17.4, 19.4, 10.2, 9.8, 9.0)
result52 <- c(13.3, 14.5, 21.6, 17.9, 15.6, 14.4, 12.0)
result6 <- c(11.0, 10.0, 12.2, 14.3, 23.3, 19.8, 13.4)
# creating a data frame with this data.
# genes along rows, results along columns
datframe <- data.frame(genes,gender,result1,result2,result3,result4,
result51,result52,result6)
# adding column names to data frame
names(datframe) <- c("GeneName", "Gender", "expt1", "expt2", "expt3", "expt4",
"expt51", "expt52", "expt6")
# creating subset of data with expt2 values above 20
subframe1 <- subset(datframe, datframe$expt2 > 20)
# creating a subset of data with only Female gender
subframe2 <- subset(datframe, datframe$Gender == "F")
# creating a subset with male gender for which expt2 is less than 30
subframe3 <- subset(datframe, (datframe$Gender == "M")&(datframe$expt2 < 30.0) )
# printing the data frames
print("subframe1 : Rows with expt2 > 20")
print(subframe1)
print("subframe2 : Rows with gender Female")
print(subframe2)
print("subframe3 : Rows with Male gender and expt2 < 30.0")
print(subframe3)

When the above code lines are executed in an R script, the following output is created.
[1] "subframe1 : Rows with expt2 > 20"
  GeneName Gender expt1 expt2 expt3 expt4 expt51 expt52 expt6
1   gene-1      M  12.3  22.1  15.5  14.4   12.2   13.3  11.0
2   gene-2      M  11.5  25.7  13.4  16.6   15.5   14.5  10.0
3   gene-3      F  13.6  32.5  11.5  45.0   17.4   21.6  12.2
4   gene-4      M  15.4  42.5  21.7  11.0   19.4   17.9  14.3
[1] "subframe2 : Rows with gender Female"
  GeneName Gender expt1 expt2 expt3 expt4 expt51 expt52 expt6
3   gene-3      F  13.6  32.5  11.5  45.0   17.4   21.6  12.2
5   gene-5      F   9.4  12.6  14.5   9.7   10.2   15.6  23.3
6   gene-5      F   8.1  15.5  16.5  10.0    9.8   14.4  19.8
[1] "subframe3 : Rows with Male gender and expt2 < 30.0"
  GeneName Gender expt1 expt2 expt3 expt4 expt51 expt52 expt6
1   gene-1      M  12.3  22.1  15.5  14.4   12.2   13.3  11.0
2   gene-2      M  11.5  25.7  13.4  16.6   15.5   14.5  10.0
7   gene-6      M  10.0  17.6  12.1  12.5    9.0   12.0  13.4
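
The same kind of filter can be written without repeating the data frame name, because subset() evaluates the condition using the column names of the data frame. The sketch below uses a small made-up data frame (the name demo and its values are illustrative assumptions, not the experimental data above) and checks that subset() and plain logical row indexing select the same rows.

```r
# Small illustrative data frame (hypothetical values)
demo <- data.frame(GeneName = c("g-1", "g-2", "g-3", "g-4"),
                   Gender   = c("M", "F", "F", "M"),
                   expt2    = c(22.1, 12.6, 32.5, 17.6))

# subset() lets us refer to columns by name inside the condition
by_subset <- subset(demo, Gender == "M" & expt2 < 30)

# Equivalent filter with logical indexing on the rows
by_index <- demo[demo$Gender == "M" & demo$expt2 < 30, ]

print(by_subset)
# Both approaches should select the same rows
print(isTRUE(all.equal(by_subset, by_index)))
```

Logical indexing is the more general mechanism; subset() is mainly a convenience for interactive work.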

6.2 The if...else statement

The if conditional statement helps us to execute certain commands subject to the condition
that a given statement is TRUE.
The general syntax of the if statement is given by

if(condition) statement
Here if is a reserved keyword. The condition typed inside the parentheses is a logical statement.
The statement is a single statement, or a set of statements enclosed in curly braces, which will be
executed if the condition is true.
First the condition is logically evaluated and if it evaluates to TRUE, the statement is executed. If
the condition evaluates to FALSE, the statement is not executed.
The following script illustrates this:
a = 5.0
b = 10.0
if(a < b)
print("a is less than b")

Executing the above code prints this output at the R prompt:
[1] "a is less than b"
In the above code, the condition (a < b) is evaluated. Since a = 5.0 and b = 10.0, the condition
evaluates to TRUE, and hence the print statement is executed. If the condition evaluates to false,
print statement will not be executed.
When the condition in the if statement fails, we can provide an alternate path with an else
clause following the if. The general format is as follows:

if(condition) statement1 else statement2


When condition evaluates to TRUE, statement1 is executed. When it fails, statement2 is executed. See the illustration below:
a = 5.0
b = 10.0
c = 15.0
d = 20.0
if(a > b)
{
  print("a is greater than b")
} else {print("a is less than b")}
Since a=5.0 and b=10.0, the condition (a > b) evaluates to false in the above code, and the
control is transferred to else condition and the following line is printed:
[1] "a is less than b"
A set of nested if...else if conditions can be set up as shown in the example below. The code
is self explanatory.

a = 5.0
b = 10.0
c = 15.0
d = 20.0
# counter updated inside the branches; it must exist before it is used
k = 0
if(a > b)
{
  k = k + 1
  print("a is greater than b")
} else if(b > c)
{
  k = k - 1
  print("b is greater than c")
} else { print("both are not true")}
Since the conditions (a > b) and (b > c) both evaluate to FALSE, the print statement inside the
final else is executed to print the following line:
[1] "both are not true"
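
As a further sketch of the same pattern, the chain below assigns a label to a number by testing conditions in order; only the first branch whose condition is TRUE is executed. The variable names score and grade and the thresholds are illustrative assumptions. Note that when typing at the top level of a script or the prompt, else has to appear on the same line as the closing brace of the preceding block, as it does here.

```r
score <- 72

# Conditions are tested from top to bottom; the first TRUE one wins
if (score >= 80) {
  grade <- "high"
} else if (score >= 50) {
  grade <- "medium"
} else {
  grade <- "low"
}

print(grade)
```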

6.3 The for loop

The for loop is useful for iteratively executing a group of instructions. The general format is
given by

for(variable in sequence) expression

The sequence inside the parentheses next to for is considered. For every element in the sequence,
the expression will be evaluated. The expression can be either a simple statement or a set of
statements enclosed in curly braces, which may or may not involve the iterated elements. See the
example below:
for(i in 1:10)
{
print(i)
}
In the above code, the expression 1:10 creates a sequence from 1 to 10. The keyword in assigns the
individual elements of this sequence, one at a time, to the loop variable. The letter i is a variable
name which refers to the current element in the sequence. Any name can be used instead of this.
The statement for(i in 1:10) iterates through every element from 1 to 10, and print(i) is executed
once for every value of i in the sequence, so the values of i from 1 to 10 are printed. After the
10 executions, the for loop terminates, resulting in the following printout:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
The for loop in R can be used in two ways. In the first way, we can access the elements of a
vector directly through an iteration inside the for statement. See the example below:
avec <- c(2.1, 3.2, 4.3, 5.4, 6.5, 7.6)
for( num in avec)
{
num = num*10
print(num)
}
In the above script, a vector avec is created with 6 numbers. The statement for( num in avec)
assigns the elements of avec iteratively to the variable num. Inside the loop defined by a pair of curly
braces, every value of num is multiplied by 10 and printed, resulting in the following output:
[1] 21
[1] 32
[1] 43
[1] 54
[1] 65
[1] 76

In the second method, elements of a vector can be iteratively accessed inside the for loop by
the index generated inside. Carefully go through this script:
avec <- c(2.1, 3.2, 4.3, 5.4, 6.5, 7.6)
for( i in 1:length(avec) )
{
num = avec[i]*10
print(num)
}
In the above example, length(avec) returns the number 6, which is the length of the vector as defined
in the code. Thus, 1:length(avec) generates a sequence from 1 to 6. As we have seen before, the
for loop iterates through this sequence assigning values 1 to 6 to the variable i. Inside the loop,
avec[i]*10 accesses the value of vector avec at this index and multiplies it by 10. The resulting
output is presented here:
[1] 21
[1] 32
[1] 43
[1] 54
[1] 65
[1] 76
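
One caveat with the index-based method: if the vector is empty, 1:length(avec) produces the sequence 1, 0 and the loop body still runs twice. A common safeguard, sketched below, is to generate the indices with seq_along(), which gives an empty sequence for an empty vector. The names scaled, empty and count are illustrative.

```r
avec <- c(2.1, 3.2, 4.3)

# Pre-allocate the result and fill it by index
scaled <- numeric(length(avec))
for (i in seq_along(avec)) {
  scaled[i] <- avec[i] * 10
}
print(scaled)

# With an empty vector the loop body is never executed
empty <- numeric(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
print(count)
```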

6.4 The while loop

The while loop is used for executing a statement repeatedly as long as a condition is true. The loop
terminates when the condition fails. The general format is

while( condition ) expression


First the condition is tested. If it is TRUE, the expression is executed, which generally modifies
the condition. Then, the condition is again tested. If it is TRUE, the expression is executed. This
goes on until the condition fails. When this happens, the while loop is terminated. This is illustrated
here:
num = 100.0
while (num > 0.0)
{
num = num - 10.0
print(num)
}
In the above script, the condition tested by the while loop is whether num is greater than
zero. Initially num = 100.0 and inside the loop, 10 is subtracted from num during every iteration.
Thus the while loop keeps on executing until num reaches zero, at which point the condition
num > 0.0 fails and the loop terminates.
This logic results in the following numbers:
[1] 90
[1] 80
[1] 70
[1] 60
[1] 50
[1] 40
[1] 30
[1] 20
[1] 10
[1] 0
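
A while loop is most useful when the number of iterations is not known in advance. As a small sketch (the names value and steps are illustrative), the loop below counts how many times 100 must be halved before it drops to 1 or below:

```r
value <- 100
steps <- 0

# Keep halving as long as the value is still above 1
while (value > 1) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)
print(value)
```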

6.5 The break statement

The break statement breaks out of a for or while loop. When a break is encountered, control
is transferred to the first statement outside the inner-most loop. When combined with an if condition,
break can be effectively used for the conditional termination of for or while loops. The example below
illustrates this concept.
nevent = 100
for(i in 1:nevent)
{
if(i*12.0 > 200)
break;
print(i)
}
print("Now control is outside the for loop")
The value of the iterator i varies from 1 to nevent = 100 inside the for loop. There is an if
condition inside the for loop that tests whether i*12 is greater than 200 for every iterative value of i.
When this test is true, the break statement transfers the control outside the first enclosing for loop.
Since 17*12 = 204 > 200, the break is reached when i = 17, so the loop prints i only for the first 16
iterations, when i varies from 1 to 16. This code prints out the following lines as expected:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] "Now control is outside the for loop"
Similarly, we can break out of while loop under specific condition.
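
A minimal sketch of the while-loop case is given below. The loop condition is deliberately always TRUE, so the only way out is the break inside the if test; the variable name i and the threshold mirror the for-loop example above.

```r
i <- 0
while (TRUE) {
  i <- i + 1
  # leave the loop as soon as i*12 exceeds 200
  if (i * 12 > 200) {
    break
  }
}
print(i)
```

Here i ends at the first value for which the test succeeds, since the increment happens before the test.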

7 USER DEFINED FUNCTIONS IN R

User defined functions in R

Like other languages, R supports user defined functions. An R function takes
objects and data variables as function arguments and returns an object.
The function in R has the following structure:
myfunction <- function(argument1, argument2, ...) {
statements
return(object)
}
Here, function is a keyword used for defining the function. argument1, argument2 etc. are
function arguments. They can be either simple variable types or objects like arrays, lists etc. Inside
the function, the arguments passed in are used by the statements, which are the lines of
script that do the work. Finally, the function passes the computed object back through a return
statement. All the lines of code inside the function are enclosed in a pair of curly brackets
following the keyword function.
myfunction refers to the name given to the function. When a function is called, it is called by
this name.
Once a function is defined, it can be called with the general syntax,
objectName <- myfunction(arg1, arg2, ...)
The object returned by myfunction is copied to the new object called objectName. As with any
other program logic, the number and order of the arguments in the call should match the
definition, and each argument should be of a type that the statements inside the function can
handle. If not, R will usually flag an error.
In the example script below, a function called normalize takes a vector avec and a number anum
as arguments. It divides each element of this vector by the number and takes the square root. The
resulting vector of normalized numbers is then returned.
In the script, a vector called vec and a number called anumber are created and the function call
is given. The resulting vector is printed.
The script is given below, which is self explanatory:
# defining a function called normalize
normalize <- function(avec, anum) {
norvec <- (avec/anum)^0.5
return(norvec)
}

# Defining a vector and a number for data.


vec <- c(45.0, 67.0, 81.0, 57.0, 103.0, 122.0, 68.0, 98.0)
anumber = 21.5
# function call
normalvec <- normalize(vec, anumber)
# print the resulting vector returned by the function
print(normalvec)
Executing the above script generates the following output:
[1] 1.446728 1.765299 1.940990 1.628239 2.188766 2.382104 1.778424 2.134980
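
As a possible variation, a function can check its arguments before using them and stop with a message if they are unsuitable. The sketch below is an assumption-laden illustration: the name safe_normalize and the particular checks are not part of the original script. It also shows that arguments may be passed by name, in which case their order in the call does not matter.

```r
# Hypothetical variant of normalize() with basic argument checks
safe_normalize <- function(avec, anum) {
  if (!is.numeric(avec) || !is.numeric(anum) || length(anum) != 1 || anum <= 0) {
    stop("avec must be numeric and anum must be a single positive number")
  }
  return(sqrt(avec / anum))
}

# Arguments passed by name, in a different order from the definition
res <- safe_normalize(anum = 4, avec = c(16, 36))
print(res)
```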

8 PLOTS IN R

Plots in R

Various types of sophisticated plots can be created in R. For each plot type, a plot function has to
be called with parameters to set plot properties like range, axis, point type, line type, titles, legend
etc.
We will start with plots with points and lines. We will discuss all aspects of plots in this section,
most of which are common to all plot types. In the later sections, we discuss specific features of each
plot type.

8.1 Point and Line Plots

Point and line plots can be produced using the plot() function, which takes x and y points either as
vectors or single numbers along with many other parameters. Here we always supply both x and y.
For the other parameters, the default value is used when the parameter is not given.
In the script below, we create 2 vectors called x and y with data points and call the plot
function. The vectors x and y must have an equal number of elements:
# We create vectors of (X,Y) points and plot.
x <- c(1,2,3,4,5,6,7,8,9,10)
y = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
# Just plot points
plot(x,y)
The above code creates a plot with points. This is a simple plot in black color with both axes
labelled with the vector names. Other parameters of the plot have been given default values.

8.1.1 Joining points with lines

We will now add more features to this plot in steps. First, we join the points with a line
while retaining the points. This is achieved by the parameter called type, which takes a character
value in double quotes. See the code:
# We create vectors of (X,Y) points and plot.
x <- c(1,2,3,4,5,6,7,8,9,10)
y = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
# plot points overlaid by lines
plot(x,y,type="o")

The other important options for type are:

type="p"   plots points
type="l"   plots lines
type="b"   plots points and lines
type="o"   plots points overlaid by lines
type="h"   plot with histogram like vertical lines
type="s"   plot with stair steps
type="n"   no plotting - blank plot with axis marked (x,y)

8.1.2 Symbols for data points

Now we will choose a symbol and size for the data points. This is achieved by the
parameters pch (meaning point character) for point symbol, and cex for the symbol size. The code
is below:
# We create vectors of (X,Y) points and plot.
x <- c(1,2,3,4,5,6,7,8,9,10)
y = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
# plot points overlaid by lines
# plot with a symbol for data points and a size for them
# pch means point character
# cex represents size of point
# give a line type and line width
# lty is line type, lwd is line width
plot(x, y, type="o", pch=20, cex=1.1, lty=3, lwd=1)

Important values of these parameters are given here. For details, see the R manual.
pch ---> takes integer values from 0 to 25 to give 26 built-in symbols.
         In addition, keyboard characters like "*", "+", "o" etc. can be used.
cex ---> a number indicating the amount by which plotting text and symbols
         should be scaled relative to the default value.
         Thus, cex = 1    is default size
               cex = 1.5  is 150% of default size
               cex = 0.5  is 50% of default size
         [Note : cex.axis --> scales the axis annotation
                 cex.lab  --> scales the axis labels
                 cex.main --> scales the main title
                 cex.sub  --> scales the subtitle ]
lty ---> a number like 1,2,3... indicating the type of line like plain line,
         dashed line, dot dashed etc. See the manual for details of each type.
lwd ---> a number indicating the line width
         lwd = 1  is default
         lwd = 2  is twice the default width
         lwd = 3  is thrice the default width etc.

8.1.3 Color the data points and lines

We should now add color to the data points and line we have plotted. This is done using the
col parameter.
Modify the plot statement in our code as follows (we omit printing other lines of code)
plot(x,y,type="o", pch=20, cex=1.1, lty=3, lwd=1,col="dark red")
The col parameter can be defined in 3 ways:
col = 5          ---> an integer index into the current color palette (see palette()).
col = "blue"     ---> name of the color given as a string. The function colors()
                      lists the 657 available color names.
col = "#FFFFFF"  ---> hexadecimal color code (#RRGGBB) given as a string, as in HTML etc.
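
To see how the three forms relate, the function col2rgb() converts any color specification into its red, green and blue components. The short sketch below is only an illustration (the variable names are made up); it compares a color name written with and without a space, decodes a hexadecimal code, and looks up a numeric color in the current palette.

```r
# A color name resolves to RGB values; blanks in the name are ignored
print(col2rgb("darkred"))
same_color <- all(col2rgb("dark red") == col2rgb("darkred"))

# A hexadecimal string "#RRGGBB" gives the three components directly
hex_rgb <- col2rgb("#FF9999")
print(hex_rgb)

# A numeric color is an index into the current palette
pal <- palette()
index_matches <- all(col2rgb(2) == col2rgb(pal[2]))
print(pal)
```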

8.1.4 Adding main title to the plot

Now we will add a main title to the graph with its own color, font and size.
For this we use the main parameter. This takes a string value which will be displayed as the main
title of the plot at the top.
The properties of the main title are set with the following parameters:
col.main  ---> sets the color of main title. Takes same values as col
font.main ---> sets the font of main title.
               font.main = 1 for plain
               font.main = 2 for bold
               font.main = 3 for italic
               font.main = 4 for bold italic
cex.main  ---> scales the main title, as explained before.

With these, the plot call for a plot with main title is as below:
plot(x,y,type="o", pch=20, cex=1.1, lty=3, lwd=1,col="dark red",
main="Plot of Data-1", col.main="blue", font.main=4, cex.main=1.2 )

8.1.5 Adding subtitle to the plot

A subtitle at the bottom of the plot can be added with sub parameter, which takes a string value
and displays it at the bottom of the plot. The other properties of this text are set with parameters
col.sub, font.sub, cex.sub which take usual values. See the plot statement below:
plot(x,y,type="o", col="dark red", main="Plot of Data-1",
col.main="blue", font.main=4, cex.main=1.2,
sub = "This is sub title", col.sub="blue", font.sub=7,
cex.sub=1.0)

8.1.6 Axes titles and their properties

We will also add axis titles with chosen color, size and font. The titles to the X and Y axis can
be given with xlab, ylab parameters (meaning X-label and Y-label). These two parameters take
string values which are displayed as labels for X and Y axis. The font type, color and size are set
through font.lab, col.lab, cex.lab whose values are similar to the ones we saw before.
Here is the plot statement with X and Y labels set:
x <- c(1,2,3,4,5,6,7,8,9,10)
y = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
plot(x,y,type="o", pch=20, cex=1.1, lty=3, lwd=1,col="dark red",
main="Plot of Data-1", col.main="blue", font.main=4, cex.main=1.2,
sub = "This is sub title", col.sub="blue", font.sub=7, cex.sub=1.0,
xlab="This is X-axis Label", ylab="This is Y-axis Label", col.lab="red",
font.lab=6, cex.lab=1.1)

The plot with axis titles and subtitle drawn for the above statement is shown here:

Figure 1: Different types of data

8.1.7 Fixing the Ranges of X,Y axis

When the plot function is called with data, it computes the ranges of the X and Y axes based on the data.
Sometimes, we may need to fix the axis ranges by hand, rather than let them follow the range of the data.
The ranges of the X and Y axes can be set using the xlim and ylim parameters (xlim means x-limit
and ylim means y-limit).
The parameters xlim and ylim take a 2 element vector as input. The first number represents the
beginning of the range and the second represents the end of the range. Thus, xlim=c(1,10) means an
X axis range from 1 to 10. Similarly for ylim.
See the plot call below. This plots with the x axis in the range 1 to 20 and the y axis in the range
1 to 150.
Xvalue <- c(1,2,3,4,5,6,7,8,9,10)
Yvalue = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
# Ranges for X,Y axis (other parameters to default, for clarity.)
plot(Xvalue, Yvalue, xlim=c(1,20), ylim=c(1,150))
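
Instead of typing the limits by hand, they can also be derived from the data. The sketch below is one possible approach, not the only one: it uses range() to find the data extremes and widens them by 10% of the data span on each side. The names xr, yr, pad, xlimits and ylimits are illustrative.

```r
Xvalue <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Yvalue <- c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)

# Smallest and largest values along each axis
xr <- range(Xvalue)
yr <- range(Yvalue)

# Widen each range by 10% of its span on both sides
pad <- 0.1
xlimits <- xr + c(-1, 1) * pad * diff(xr)
ylimits <- yr + c(-1, 1) * pad * diff(yr)

plot(Xvalue, Yvalue, xlim = xlimits, ylim = ylimits)
```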

8.1.8 Writing text inside a plot
We can write text inside the plot for explanation and labelling curves using the text() function. This
function takes 2 numbers for the (x,y) coordinates at which the text is placed, and a text string which
is displayed inside the plot at the given (x,y). By default the string is centered on this point.
Note that the units of these coordinates are same as the units of x and y axis used in the plot.
If text labels are to be written near many points in the plot, we can give x and y as vectors, and
another vector of strings as labels. In this case, all these 3 vectors should be of same length. The
general format of text command is,
text(x, y, textString, col=color, cex=value, font=fontType)
The text() call adds to an existing plot, so it should be given after the call to plot().
See the calls below for all these:
Xvalue <- c(1,2,3,4,5,6,7,8,9,10)
Yvalue = c(12, 23, 36, 48, 53, 64, 78, 89, 91, 110)
# To add text to a plot, first draw the plot and then call text().
# We add text at a particular location (2, 100) in the plot.
# First look at the plot, and decide the units for (x,y)!!
plot(Xvalue, Yvalue)
text(2, 100, "This is text at (2,100)")
# Here, we place a text near every point, at 0.3 unit
# from the x position of each point.
cch <- c("a","b","c","d","e","f","g","h","i","j")
plot(Xvalue, Yvalue)
text(Xvalue+0.3, Yvalue, cch, col="blue")
# We will draw both - text at a particular location as well as text at
# every point as labels.
plot(Xvalue, Yvalue)
text(Xvalue+0.3, Yvalue, cch, col="blue")
text(2, 100, "This is text at (2,100)", col="red")

8.2 Multiple graphs on the same plot with legends

To plot more than one curve on a single plot in R, we proceed as follows. First, create the first
plot. For the subsequent curves, do not use the plot function. Instead, each one of the subsequent curves
is added using the points and lines functions, whose calls are similar to the plot function. See the
code below:
# multiple graphs on the same plot with legends
x <- c(1,2,3,4,5,6,7)
y1 <- c(1,4,9,16,25,36,49)
y2 <- c(1, 5, 12, 21, 34, 51, 72)
y3 <- c(1, 6, 14, 28, 47, 73, 106 )
# First curve is plotted
plot(x, y1, type="o", col="blue", pch="o", lty=1)
# second curve on same plot -- use points() and lines() function
points(x, y2, col="red", pch="*")
lines(x, y2, col="red",lty=2)
# third curve on the same plot. Use points() and lines() function.
points(x, y3, col="dark red",pch="+")
lines(x, y3, col="dark red", lty=3)


Legends can be added to the plot within a box at a desired location using the legend function.
This function takes the following parameters:
X and Y axis locations in the graph coordinates.
A vector of strings consisting of legends, typically one per graph.
A vector of colors for the col parameter. These colors are the same as the ones used in the graph.
A vector of character symbols for the pch parameter, same as the ones used as pch parameters
in the plots.
A vector of line types to be given to the lty parameter, same as the ones used for plotting the curves.
In the code below, we have added a legend to the above plot. Full code is given.
# multiple graphs on the same plot with legends
x <- c(1,2,3,4,5,6,7)
y1 <- c(1,4,9,16,25,36,49)
y2 <- c(1, 5, 12, 21, 34, 51, 72)
y3 <- c(1, 6, 14, 28, 47, 73, 106 )
# First curve is plotted
plot(x, y1, type="o", col="blue", pch="o", lty=1)
# second curve on same plot -- use points() and lines() function
points(x, y2, col="red", pch="*")
lines(x, y2, col="red",lty=2)
# third curve on the same plot. Use points() and lines() function.
points(x, y3, col="black",pch="+")
lines(x, y3, col="black", lty=3)
# Adding a legend inside box at the location (2,40) in graph coordinates.
legend(2,40,c("y1","y2","y3"), col=c("blue","red","black"),
pch=c("o","*","+"),lty=c(1,2,3))
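
One point to watch in the script above: the axis ranges are fixed by the first plot() call, so they are based on y1 alone (maximum 49), and the larger values of y2 and y3 fall outside the plotting region. The sketch below shows one way to handle this, under the assumption that the curves are kept in a named list (the names curves, cols and yall are illustrative): the y range is computed over all curves first, and the remaining curves are added in a loop with lines().

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
curves <- list(y1 = c(1, 4, 9, 16, 25, 36, 49),
               y2 = c(1, 5, 12, 21, 34, 51, 72),
               y3 = c(1, 6, 14, 28, 47, 73, 106))
cols <- c("blue", "red", "black")

# y range that covers every curve, not only the first one
yall <- range(unlist(curves))

# First curve sets up the axes
plot(x, curves[[1]], type = "o", col = cols[1], pch = 1, lty = 1,
     ylim = yall, ylab = "y")

# Remaining curves are added to the existing plot
for (k in 2:length(curves)) {
  lines(x, curves[[k]], type = "o", col = cols[k], pch = k, lty = k)
}

# Legend placed by keyword instead of by coordinates
legend("topleft", legend = names(curves), col = cols, pch = 1:3, lty = 1:3)
```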

8.3 2D Scatter Plot

The 2D scatter plot is the same as a plot with points. We just have to pass X and Y vectors for the
two coordinates to the plot function as arguments. All other settings are similar. The example code
is given here:
# R Scatter plot demo
# Generate 10000 random numbers from gaussian distribution
Xrandom <- 10*rnorm(10000)
# Generate 10000 numbers from Gaussian
Yrandom <- 10*rnorm(10000)
# plot the scatter plot. We choose color in Hexadecimal system
plot(Xrandom, Yrandom, cex=0.2, col="#FF9999", main="2D Scatter plot")

8.4 Histogram

We can generate histograms in R using the hist function. The arguments of this function are almost
the same as those of plot.
In a histogram, we have to decide the number of bins beforehand. The function hist has a
parameter called breaks. When a single number is given, R treats it as the suggested number of bins
and may adjust it so that the bin boundaries fall on round values.
In the simplest code below, we generate 10000 points from a Gaussian distribution and draw their
histogram.
# Generate Gaussian deviates with mean=5, and SD=3
data <- rnorm(10000, mean=5, sd=3)
#plot histogram with 40 bins
hist(data, breaks=40, col="red", xlim=c(-10,20), ylim=c(0,800),
main="Simulated Data", col.main="blue")

The above script creates a histogram of these 10000 data points on the screen.

8.4.1 Accessing the results of histogram

We can also access the data of the histogram through the object returned by the histogram. For
the above data, try this:
# Generate Gaussian deviates with mean=5, and SD=3
data <- rnorm(10000, mean=5, sd=3)
#plot histogram with 40 bins and get the returned histogram object.
hdat <- hist(data, breaks=40, col="red", xlim=c(-10,20), ylim=c(0,800),
main="Simulated Data", col.main="blue")
# print the contents of hist, which has histogram data
print(hdat)
# we can access (for example) first 10 elements of bin data
print( hdat$breaks[1:10])
# we can access (for example) first 10 elements of counts on bins
print(hdat$counts[1:10])
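# First 10 elements of intensities (same values as density in older R versions;
# newer versions may not return this component, in which case NULL is printed)
print(hdat$intensities[1:10])
# First 10 elements of the density values (counts scaled so the histogram has unit area)
print(hdat$density[1:10])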
# First 10 elements of mid values
print(hdat$mids[1:10])
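
The returned object can also be used without drawing anything, by calling hist with plot=FALSE. The sketch below is an illustration under stated assumptions: it fixes the random seed for reproducibility, passes an explicit vector of 41 bin boundaries (the name edges is made up) to get exactly 40 bins, and then checks two properties of the result: the counts add up to the number of data points, and the density values multiplied by the bin widths add up to 1.

```r
set.seed(1)
data <- rnorm(10000, mean = 5, sd = 3)

# 41 equally spaced boundaries from min to max give exactly 40 bins
edges <- seq(min(data), max(data), length.out = 41)

# Compute the histogram without plotting it
hdat <- hist(data, breaks = edges, plot = FALSE)

n_bins <- length(hdat$counts)
total  <- sum(hdat$counts)
area   <- sum(hdat$density * diff(hdat$breaks))

print(n_bins)
print(total)
print(area)
```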

8.5 Box-Whisker Plot

The Box-Whisker plot creates a pictorial representation of statistical spread in the data. In R,
the function boxplot creates this plot. This function can take many data types as inputs. We can
pass a vector, a list of vectors or a data frame made up of column vectors as input. For each one of
the columns of data, a Box-Whisker diagram is created.
We first create a single vector of data and call the boxplot function:
# Generate three vectors
x <- c(1,5,7,8,9,7,5,1,8,5,6,7,8,9,8,6,7,8,10,19,6,7,8,6,4,6)
y = x*1.5
z = x*2.3
# We call boxplot with single vector
boxplot(x, range=0.0, horizontal=FALSE, varwidth=TRUE, notch=FALSE,
outline=TRUE, names=c("A"), boxwex=0.3, border=c("blue"), col=c("red"))

The above script creates a single boxplot on the screen.

Various parameters of the function are explained below:


range      ---> this determines how far the plot whiskers extend out from the
                box. If range is positive, the whiskers extend to the most
                extreme data point which is no more than range times the
                interquartile range from the box. A value of zero causes the
                whiskers to extend to the data extremes.
horizontal ---> A TRUE value for this will make the plot horizontal.
                Default is vertical
varwidth   ---> if varwidth is TRUE, the boxes are drawn with widths
                proportional to the square-roots of the number of
                observations in the groups.
notch      ---> if notch is TRUE, a notch is drawn in each side of the
                boxes. If the notches of two plots do not overlap this is
                strong evidence that the two medians differ
outline    ---> if outline is not true, the outliers are not drawn
names      ---> group labels which will be printed under each boxplot. Can
                be a character vector or an expression
boxwex     ---> a scale factor to be applied to all boxes. When there are
                only a few groups, the appearance of the plot can be improved
                by making the boxes narrower
border     ---> an optional vector of colors for the outlines of the
                boxplots.
col        ---> colors to be used to colour the bodies of the box plots

We can also call the boxplot function with a list of vectors or with a data frame whose columns
are numeric vectors. In the script below, we first create a list of vectors and call boxplot. Next,
we create a data frame of the same numeric vectors and call boxplot. Both the calls create a plot
with three boxplots, one for each column of data.
# Generate three vectors
x <- c(1,5,7,8,9,7,5,1,8,5,6,7,8,9,8,6,7,8,10,19,6,7,8,6,4,6)
y = x*1.5
z = x*2.3
# we create a list of vectors and call box plot with it.
# Three Box-Whiskers are plotted, for x, y and z vectors
alis <- list(x,y,z)
boxplot(alis, range=0.0, horizontal=FALSE, varwidth=TRUE, notch=FALSE,
outline=TRUE, names=c("A","B","C"), boxwex=0.3,
border=c("blue","blue","blue"), col=c("red","red","red"))
# we create a data frame of these vectors.
# Box plot can take many data structures.
# See the effect of notch=TRUE.
aframe <- data.frame(x,y,z)
boxplot(aframe, range=0.0, horizontal=FALSE, varwidth=TRUE, notch=TRUE,
outline=TRUE, names=c("A","B","C"), boxwex=0.3,
border=c("blue","blue","blue"), col=c("red","red","red"))
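
The numbers behind a box plot can be inspected with the function boxplot.stats(), which returns the five values that define the whiskers, the box edges (hinges) and the median. The sketch below is a small illustration for the vector x used above; coef=0 plays the same role as range=0 in boxplot, so the whiskers extend to the data extremes. The name st and the labels attached to it are illustrative.

```r
x <- c(1,5,7,8,9,7,5,1,8,5,6,7,8,9,8,6,7,8,10,19,6,7,8,6,4,6)

# Five numbers used to draw the box and whiskers
st <- boxplot.stats(x, coef = 0)$stats
names(st) <- c("lower whisker", "lower hinge", "median",
               "upper hinge", "upper whisker")
print(st)

# summary() gives a closely related set of quantiles for comparison
print(summary(x))
```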

8.6 Pie Charts

Pie charts in R can be drawn using the pie function of the graphics package. This function is called with
a vector x and a vector of colors for the segments. We can also choose the data segments to be
drawn clockwise or anticlockwise, which is the default. In the script below, we draw 2 pie charts,
one without legend and with simple labels, and the other with legend and percentages marked:
result <- c(10, 30, 60, 40, 90)
# Create a Pie chart with a heading and rainbow colors
pie(result, main="Experiment-1", col=rainbow(length(result)),
labels=c("Mol-1","Mol-2","Mol-3", "Mol-4", "Mol-5"))
# Calculate the percentage of sections and put it in the label
alabels <- round((result/sum(result)) * 100, 1)
alabels <- paste(alabels, "%", sep="")
colors <- c("blue", "green","red", "white", "black")
pie(result, main="Experiment-1", col=colors, labels=alabels, cex=0.8)
# draw the legend
legend(-1.2, 1.0, c("molecule-1", "molecule-2", "molecule-3",
"molecule-4", "molecule-5"), fill=colors)

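In the script above, round() followed by paste() drops a trailing zero, so a share of 13.0 is labelled "13%" while the others keep one decimal. If uniform labels are wanted, one option (a sketch, not the only way) is to format the percentages with sprintf(). The name pct is illustrative.

```r
result <- c(10, 30, 60, 40, 90)

# Percentage share of each section
pct <- result / sum(result) * 100

# "%.1f" always prints one decimal; "%%" prints a literal percent sign
alabels <- sprintf("%.1f%%", pct)
print(alabels)

colors <- c("blue", "green", "red", "white", "black")
pie(result, main = "Experiment-1", col = colors, labels = alabels, cex = 0.8)
```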
8.7 Bar-plots

In bar plots, individual categories are represented as vertical bars standing next to each other for
quantitative comparison. In R, the barplot function is called to create bar plots. This function can
take a vector or a matrix as data input. The code below shows this:
# We plot various bar charts here
# Define a data vector
data <- c(1,3,6,4,9)
#bar plot the vector -- simple plot with no legends and colors
barplot(data, main="Cancer-data", xlab="Days", ylab="Response Index",
names.arg=c("grp-1","grp-2","grp-3","grp-4","grp-5"),
border="blue", density=c(10,20,30,40,50))
# Create a data frame
col1 <- c(1,3,6,4,9)
col2 <- c(2,5,4,5,12)
col3 <- c(4,4,6,6,16)
data <- data.frame(col1,col2,col3)
names(data) <- c("patient-1","patient-2","patient-3")
# barplot with colors
barplot(as.matrix(data), main="Experiment-1", ylab="dosage", beside=TRUE,
col=rainbow(5))
#Add legends
legend("topleft", c("day1","day2","day3","day4","day5"), cex=1.0, bty="n",
fill=rainbow(5))
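
The barplot function also returns the x positions of the bar midpoints, which is handy for writing the value on top of each bar with text(). The sketch below illustrates this for a simple vector; the names values and mids are illustrative, and the y range is enlarged by 20% (an arbitrary choice) to leave room for the labels.

```r
values <- c(1, 3, 6, 4, 9)

# barplot() returns the midpoint of each bar along the x axis
mids <- barplot(values, names.arg = paste0("grp-", 1:5),
                ylim = c(0, max(values) * 1.2), ylab = "Response Index")

# pos = 3 places each label just above the given (x, y) point
text(mids, values, labels = values, pos = 3)
print(mids)
```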

These two plots are shown in the next page:

8.8 Multiple plots in a single figure

We can place multiple plots in a single figure. For this, we use the par() function in R. The setting
par(mfrow = ...) sets up plots one by one along rows, and par(mfcol = ...) sets up plots one by one
along the columns.
For example, par(mfrow = c(2,3)) sets up a figure with the first three plots along the first row and
the next three plots along the second row. When this command is given, a blank screen is created by
the device. The plots are then allotted these positions one by one as they are plotted.
The code below splits the screen into 2 rows and 3 columns to contain 6 plots. The comments
make the code easy to understand.
# This script demonstrates multiple plots in a single figure.
## Set up plotting in two rows and three columns.
## Set the outer margin so that bottom, left, and right are 0 and
## top is 2 lines of text.
## Plotting goes along rows first.
## To plot along columns, use "mfcol" instead of mfrow.
par( mfrow = c( 2, 3 ), oma = c( 0, 0, 2, 0 ) )
## Call the first plot. This is automatically located in row 1, column 1:
plot( rnorm( n = 10 ), col = "red", main = "plot 1", cex.lab = 1.1 )
## Call the second plot. This is automatically located in row 1, column 2:
plot( runif( n = 10 ), col = "blue", main = "plot 2", cex.lab = 1.1 )
##Call the third plot. This is located in row 1, column 3:
plot( rt( n = 10, df = 8 ), col = "springgreen4", main = "plot 3",
cex.lab = 1.1 )
## Call the fourth plot. It is located in row 2, column 1:
plot( rpois( n = 10, lambda = 2 ), col = "black", main = "plot 4",
cex.lab = 1.1 )
## plot.new() skips a position.
plot.new()
## The fifth plot is located in row 2, column 3:
plot( rf( n = 10, df1 = 4, df2 = 8 ), col = "gray30", main = "plot 5",
cex.lab = 1.1 )
# Title is given to the whole of the plot.
title("Many distributions", outer=TRUE)
The plot is shown on the next page.


Multiple plots in a single figure.
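To fill the grid column by column instead of row by row, mfcol can be used in place of mfrow. The
following is a minimal sketch and not part of the original script. It assumes a null PDF device,
opened with pdf(NULL), so that it runs without a screen, and the names op and layout_used are chosen
here only for illustration. It also keeps the old graphical parameters returned by par() and
restores them at the end.

```r
# Open a null PDF device: nothing is shown on screen or written to disk.
# (An assumption of this sketch; leave it out to plot on the default device.)
pdf(NULL)

# Set up a 2 x 3 grid that is filled along the columns.
# par() returns the previous values of the parameters it changes.
op <- par(mfcol = c(2, 3), oma = c(0, 0, 2, 0))

for (k in 1:6) {
  # Positions are taken in the order (row 1, col 1), (row 2, col 1), (row 1, col 2), ...
  plot(rnorm(n = 10), main = paste("plot", k))
}

# Title for the whole figure, written in the outer margin.
title("Filled column by column", outer = TRUE)

# Read back the grid currently in use, then restore the old settings.
layout_used <- par("mfcol")
par(op)
invisible(dev.off())
```

Restoring the saved parameters with par(op) keeps the 2 x 3 grid from affecting later plots on the
same device.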


Multiple plots by splitting the screen


Another way of creating multiple plots on the same screen is to split the screen into regions and
plot in each region. We use the split.screen() function for this.
The general usage is
split.screen(figs = c(2,1))
which splits the screen into 2 rows and 1 column. We get screens 1 (top) and 2 (bottom). We can
further split the top and bottom screens. For example, the command below splits screen 1 (the top
screen) into one row and three columns. We get screens 3, 4 and 5:
split.screen(figs = c(1,3), screen = 1)
We now split the bottom screen (screen 2) into 2 columns to get screens 6 and 7:
split.screen(figs = c(1,2), screen = 2)
Now we select each of these screens in turn with the screen() function and plot in it, going from
screen 3 up to screen 7. Thus we get 3 plots in the top row and 2 in the bottom row, achieving an
uneven layout.
See the code below, which is self-explanatory because of the comments.
## Split the screen into two rows and one column, defining screens 1 and 2.
split.screen( figs = c( 2, 1 ) )
## Split screen 1 into one row and three columns, defining screens 3, 4, and 5.
split.screen( figs = c( 1, 3 ), screen = 1 )
## Split screen 2 into one row and two columns, defining screens 6 and 7.
split.screen( figs = c( 1, 2 ), screen = 2 )
## The first plot is located in screen 3:
screen( 3 )
plot( rnorm( n = 100 ), col = "red", main = "plot 1" )
## The second plot is located in screen 4:
screen( 4 )
plot( runif( n = 100 ), col = "blue", main = "plot 2" )
## The third plot is located in screen 5:
screen( 5 )
plot( rt( n = 10, df = 8 ), col = "springgreen4", main = "plot 3" )
## The fourth plot is located in screen 6:
screen( 6 )
plot( rpois( n = 10, lambda = 2 ), col = "black", main = "plot 4" )
## The fifth plot is located in screen 7:
screen( 7 )
plot( rf( n = 10, df1 = 4, df2 = 8 ), col = "gray30", main = "plot 5" )
## Close all screens.
close.screen( all = TRUE )

The plot created by the above code is shown on the next page.


Multiple plots of variable sizes in a single figure drawn by splitting the screen.
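The split.screen() function returns the numbers of the screens it creates, so these can be stored
instead of being counted by hand. The following is a minimal sketch of the same 3 + 2 layout. It
assumes a null PDF device, opened with pdf(NULL), so that it runs without a screen, and the names
rows, top and bottom are illustrative.

```r
# Null PDF device so that the sketch runs without a display (an assumption).
pdf(NULL)

# Split into two rows; the returned vector holds the new screen numbers.
rows <- split.screen(figs = c(2, 1))

# Split the top row into three columns and the bottom row into two.
top    <- split.screen(figs = c(1, 3), screen = rows[1])
bottom <- split.screen(figs = c(1, 2), screen = rows[2])

# Select each new screen in turn and draw one plot in it.
for (s in c(top, bottom)) {
  screen(s)
  plot(rnorm(n = 20), main = paste("screen", s))
}

# Leave split-screen mode and close the device.
close.screen(all.screens = TRUE)
invisible(dev.off())
```

Looping over the stored screen numbers avoids mistakes when the layout is changed later.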


9 INPUT/OUTPUT OPERATIONS IN R

Input/Output operations in R

When we start R, it starts an interactive session by default. The user gives input from the keyboard
and the output is printed on the screen. We can also take input from files and scripts into an R
session and write to external files from it. This section explains various input/output operations in R.
Including a script in the current session: the source() function
Using the source() function, we can include a script in an R session or in another script. For example,
the command
source("datfile.r")
given in an R session includes the whole contents of datfile.r in the current session. Subsequently,
we can use every object declared inside datfile.r in the current session. This can also be used to
source one script inside another, so that the second script can use the variables and objects of the
sourced file. The example below illustrates this:
script datfile.r
PI = 3.14
Epsilon = 0.034
K = 1.788
KMM = 2*PI*Epsilon

Now, the following script includes the first script and uses the variables in it:
script calcul.r
source("datfile.r")
cc = KMM * 29.5
print(cc)
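The two scripts can also be tried without creating files by hand. The sketch below is illustrative
only: it writes a shortened copy of datfile.r into R's temporary directory and then sources it. The
name script_path and the temporary location are assumptions of this example.

```r
# Write a small script file into the session's temporary directory.
script_path <- file.path(tempdir(), "datfile.r")
writeLines(c("PI = 3.14",
             "Epsilon = 0.034",
             "KMM = 2*PI*Epsilon"),
           script_path)

# Run the script; the objects it defines become available afterwards.
source(script_path)

# Use a variable that was defined inside the sourced script.
cc <- KMM * 29.5
print(cc)
```

Note that the file name passed to source() is a character string, so it must be quoted.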

Writing the output of the current session into a file: the sink() function
Using the sink() function, we can divert the output of the session from the terminal to a file. The
function takes arguments such as the output file name as a string, an append parameter that decides
whether to append to an existing file or overwrite it, and a split parameter that, when TRUE, also
prints the output on the screen. Once the function is called with a file name, all subsequent print
statements write to the file. Another call to sink() with no arguments stops the writing to the
external file and returns the output to the terminal. See the code below:
# Call the sink function.
# append=FALSE means do not append to the existing file (overwrite it)
# split=FALSE means do not write on the screen as well
sink("test.txt", append=FALSE, split=FALSE)
# The following print statements are written to the file test.txt
for(i in 1:10)
{
print("Start Printing")
i = i*10 + 5
print(i)
}
# Now return output to the terminal
sink()
# The following statement is printed on the terminal, not written to the file
print("Hi, this is over")
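To check what actually ended up in the file, it can be read back with readLines(). The following is
a minimal sketch under the assumption that a temporary file is acceptable as the output file. The
names out_file and captured are illustrative.

```r
# Use a temporary file so that nothing is left in the working directory.
out_file <- tempfile(fileext = ".txt")

# Divert printed output to the file.
sink(out_file, append = FALSE, split = FALSE)
print("written to file")
print(1:3)

# Return output to the terminal.
sink()
print("back on the terminal")

# Read the file back: it holds only what was printed while sink was active.
captured <- readLines(out_file)
print(captured)
```

Every sink(file) call should be paired with a sink() call. Otherwise later output keeps going to the
file.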

Writing R objects into external files
The sink() function diverts only printed output, so we cannot use it to store R data structures (or
graphic output) in a file. R provides several functions for this. Here we demonstrate the use of two
functions, dput() and save(), for writing R data structures into files, together with dget() and
load() for retrieving the data from them.
# Creating some vectors
avec1 <- c(1,2,3,4,5,6)
avec2 <- c(10,20,30,40,50,60)
avec3 <- c(100,200,300,400,500,600)
svec <- c("aa","bb","cc","cc","dd")
cvec <- c("A","B","C")
astr <- "ATGCCTGAACGCCGGATT"
# create a data frame with vectors
aframe <- data.frame(avec1,avec2,avec3)
# create a list of this data frame and 3 more vectors
alis <- list(aframe, svec, cvec, astr)
# another vector
kvec <- c("AAA","BBB","CCC")

# Write into a text file as simple ASCII using dput()


dput(alis, "test.out")
# Read it back into R by dget()
dd <- dget("test.out")
# Access the member data structures with dd
print(dd)
print(dd[[1]]$avec1)
print(dd[[1]]$avec1[3])
print(dd[[2]])

# Write two R objects into file using save() function. See help for options.
# We can save many such objects
save(list=c("alis", "kvec"), file="test1.out")
# load them into R using load() function
load("test1.out")
# Once loaded, just use them by name!!
print(alis[[1]]$avec1)
print(alis[[2]])
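A quick way to gain confidence in either method is to write an object, read it back and compare the
result with the original. The sketch below is a hedged illustration with a smaller list. The names
small, txt_file, bin_file and env are chosen for this example, and the files go to temporary
locations. Loading into a separate environment is an optional choice made here so that load() does
not overwrite objects of the same name in the session.

```r
# A small list holding a data frame and a character vector.
small <- list(data.frame(avec1 = c(1, 2, 3)), c("aa", "bb"))
kvec  <- c("AAA", "BBB", "CCC")

# Text round trip: dput() writes R code that dget() evaluates again.
txt_file <- tempfile(fileext = ".out")
dput(small, txt_file)
back <- dget(txt_file)
print(all.equal(small, back))

# Binary round trip: save() stores objects by name, load() restores them.
bin_file <- tempfile(fileext = ".RData")
save(list = c("small", "kvec"), file = bin_file)

# Load into a fresh environment; load() returns the names it restored.
env <- new.env()
loaded_names <- load(bin_file, envir = env)
print(loaded_names)
print(env$kvec)
```

The file written by dput() is plain text and can be inspected in an editor, while the file written
by save() is in R's own binary format.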

Writing R plots into image files
R has several functions for writing the plots produced on screen into image files like .png, .jpeg,
PDF etc.
The image is plotted and saved in the following steps.
(1) A device is needed for plotting. The default device is the screen itself.
(2) Call an image function in R, such as jpeg() or png(), with the image file name. This opens a
file device in place of the screen.
(3) Plot the image, for example with the plot() function. The plot is written to the file named in
the image function.
(4) Close the device with a dev.off() call. The image is now saved in the directory given in the
file name.
See the code below:
# For writing plot into jpeg file
jpeg("figure1.jpeg")
plot(c(1,2,3,4), c(1,2,3,4))
dev.off()
# For writing plot into png file
png("figure1.png")
plot(c(1,2,3,4),c(1,2,3,4))
dev.off()
# For writing into bmp file
bmp("figure1.bmp")
plot(c(1,2,3,4), c(1,2,3,4))
dev.off()
# For writing into PDF file
pdf("figure1.pdf")
plot(c(1,2,3,4),c(1,2,3,4))
dev.off()
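The same open, plot, close sequence also accepts size arguments. The sketch below is illustrative
and uses a temporary file. For pdf(), width and height are given in inches, and with the default
settings each new plot starts a new page of the same file. The name pdf_file is an assumption of
this example.

```r
# Temporary output file, so the working directory is left untouched.
pdf_file <- tempfile(fileext = ".pdf")

# Open a PDF device that is 6 inches wide and 4 inches high.
pdf(pdf_file, width = 6, height = 4)

# Each high-level plot goes to a new page of the same PDF file.
plot(c(1, 2, 3, 4), c(1, 2, 3, 4), main = "first page")
plot(c(1, 2, 3, 4), c(4, 3, 2, 1), main = "second page")

# Closing the device finishes writing the file.
invisible(dev.off())

# Confirm that the file was written and is not empty.
print(file.exists(pdf_file))
print(file.info(pdf_file)$size > 0)
```

If dev.off() is left out, the file may be incomplete and later plots continue to go to it instead of
the screen.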
