
Getting started with R

Alok Srivastava
CRRAO-AIMSCS, Hyderabad, INDIA
Jan 08, 2015
Getting Started with R

Alok Srivastava

Lecture - 02

Topics

How to use R

Data types in R

Data creation

Data curation


Topic 1 : How to use R


Check your Working Path

R installation directory
R.home()          # R installation directory (similar to which R on Linux)

Check your working path
getwd()           # To get the location of the current working directory

Example output on Linux:
/home/alok/WorkShop/2014/Workshop_UoH_14_Jan/Lecture2

Example output on Windows:
C:/Users/Alok/Documents

Knowing the working directory helps when loading files, because relative file paths are resolved from it.



Change your Working Path

Change your working path
setwd("C:/Users/Alok/Documents")    # To change the location of the working directory; the path is given as a string

Recheck your working directory
getwd()
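A minimal sketch of the check, change and recheck cycle, using only base R. As an assumption, R's per-session temporary folder (tempdir()) stands in for a real project directory so the example runs unchanged on Linux and Windows; the file name example.txt is hypothetical.

```r
# Remember the starting directory so it can be restored at the end
old_dir <- getwd()

# Assumption: tempdir() stands in for a real project folder
setwd(tempdir())

# Recheck the working directory: it is now the temporary folder
new_dir <- getwd()
print(new_dir)

# Relative file names are now resolved inside the new directory
writeLines("hello", "example.txt")      # hypothetical file name
print(file.exists("example.txt"))       # TRUE

# Go back to where we started
setwd(old_dir)
```

Saving the old directory and restoring it at the end keeps a script from leaving the session in an unexpected place.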

Working with Text editor

- Use the hash symbol # to start a comment; R ignores the rest of that line

Use R as Calculator
Arithmetic Operators
Addition            +
Subtraction         -
Multiplication      *
Division            /
Exponent            ^ or **
Modulus (x mod y)   x %% y
Integer Division    x %/% y

Variable

Multiple Variables
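A short sketch of the arithmetic operators listed above, followed by single and multiple variable assignment. The variable names (x, y, ratio, a, b, s, marks, total) are illustrative only.

```r
# Arithmetic operators
7 + 2      # 9
7 - 2      # 5
7 * 2      # 14
7 / 2      # 3.5
7 ^ 2      # 49
7 ** 2     # 49, same as ^
7 %% 2     # 1  (7 mod 2)
7 %/% 2    # 3  (integer division)

# A single variable: assign with = or <-
x <- 7
y = 2
ratio <- x / y        # 3.5

# Multiple variables on one line, separated by semicolons
a <- 1; b <- 2; s <- a + b    # s is 3

# Several values in one variable: a vector
marks <- c(45, 90, 135)
total <- sum(marks)   # 270
```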

Workspace in R
Save workspace
save.image()                    # Default file .RData
unlink(".RData")                # To remove the default file
save.image("mywork.Rdata")      # Save in a specific file
load("mywork.Rdata")            # Load previous work
savehistory(file = "abc")       # Save command history in a text file, default .Rhistory
loadhistory(file = "abc")       # Load history from file

Quit Session
q()                             # R will ask: Save workspace image? [y/n/c]
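A minimal sketch of a save and restore round trip. To avoid overwriting a real .RData file, it assumes a temporary file from tempfile(); the object gene_count is hypothetical. savehistory() and loadhistory() are left out because they may not be available in non-interactive sessions such as Rscript.

```r
# Assumption: a temporary file, so no real .RData file is overwritten
ws_file <- tempfile(fileext = ".RData")

gene_count <- c(15, 18, 25)     # hypothetical object worth keeping
save.image(file = ws_file)      # save the whole global workspace

rm(gene_count)                  # simulate a fresh session
print(exists("gene_count"))     # FALSE

load(ws_file)                   # restore previous work
print(gene_count)               # 15 18 25

unlink(ws_file)                 # remove the file when it is no longer needed
```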

Character variable in R
- Used to store names or categorical variables
- Enclose the value in double quotes, e.g. name = "Kinjal"
- Use the c() function to store multiple values, e.g. c("Kinjal","Madhav")
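A small sketch of character variables with one and with several values, plus factor() for categorical data. The names and the M/F codes are illustrative.

```r
# One value: enclose it in double quotes
name <- "Kinjal"

# Several values: combine them with c()
students <- c("Kinjal", "Madhav", "Roopa", "Suraj")
length(students)          # 4
is.character(students)    # TRUE

# Categorical data: character codes turned into a factor
sex <- factor(c("M", "F", "F", "M"))
levels(sex)               # "F" "M"
table(sex)                # F: 2, M: 2
```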

Getting help in R
Within R:
The ? command can be used to get help on a specific command within R.

?keyword or help(keyword)             # Help page for a command
library(help = pamr)                  # Information and function list for package pamr
??keyword or help.search("keyword")   # If you don't know the function name
apropos("mean")                       # List all functions whose names contain the string "mean"

Search library functions
library(help = base)                  # List of base functions available in the R console
library(help = samr)                  # List of functions available in package samr; to open a help page with ?, first load the package with library(samr)

Documentation
Help files can be accessed as text or HTML.
Manuals, reference cards, tutorials and news about recent developments are available at http://www.r-project.org/other-docs.html

Online help
R-help: https://www.stat.math.ethz.ch/pipermail/r-help/
Bioconductor-help: https://stat.ethz.ch/mailman/listinfo/bioconductor
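The ? and ?? commands open help pages interactively, so this sketch sticks to apropos(), which returns its matches as a character vector that a script can inspect. The pattern "^read\\." is an illustrative regular expression.

```r
# All objects on the search path whose names contain "mean"
hits <- apropos("mean")
print(hits)
"mean" %in% hits          # TRUE

# apropos() also accepts a regular expression and can keep only functions;
# here: functions whose names start with "read."
readers <- apropos("^read\\.", mode = "function")
print(readers)            # includes read.csv and read.table
```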

Practice Session: 1

Topic 2 : Data Types in R


Variable types in R

Numeric and Integer
x = 6
is.numeric(x)      # TRUE
is.integer(x)      # FALSE (6 is stored as a double; 6L is an integer)

Logical
x = c(1,2,3,4,5); y = (x < 3)     # y is TRUE TRUE FALSE FALSE FALSE

Character String
x = c("M")
x = c("Kinjal","Madhav","Roopa","Suraj")
x = c("gene1","gene2","gene3","gene4","gene5")

List: collection of several objects of any type
x1 = c("gene1","gene2","gene3","gene4","gene5")
x2 = c(2,4,7,9,11)
x = list(x1,x2)

Complex arithmetic is also supported in R
z = complex(real = rnorm(10), imaginary = rnorm(10))
Re(z)      # real parts
Im(z)      # imaginary parts
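A sketch of the variable types above using fixed values instead of rnorm(), so the results are reproducible. is.numeric() is used because is.real() is no longer part of current R; class() reports the type directly.

```r
x <- 6
class(x)            # "numeric"
is.numeric(x)       # TRUE
is.integer(x)       # FALSE: plain numbers are stored as doubles

n <- 6L             # the L suffix makes an integer
is.integer(n)       # TRUE

y <- c(1, 2, 3, 4, 5) < 3
y                   # TRUE TRUE FALSE FALSE FALSE
class(y)            # "logical"

z <- complex(real = 1, imaginary = -2)   # 1-2i
Re(z)               # 1
Im(z)               # -2
Mod(z)              # about 2.236, the square root of 5
```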

Vectors in R
All elements of a vector have the same mode: logical, numeric or character.
Examples of Vectors
x = c(45, 90, 135)
y = c("Kinjal","Madhav","Roopa","Suraj")
z = c("gene1", "gene2", "gene3", "gene4")
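A sketch of vector modes. mode() reports the mode of each vector, and the mixed example shows what happens when one vector receives values of different modes: R coerces them all to the most general one (here character).

```r
x <- c(45, 90, 135)
y <- c("Kinjal", "Madhav", "Roopa", "Suraj")
flags <- x > 50

mode(x)        # "numeric"
mode(y)        # "character"
mode(flags)    # "logical"

# Mixing modes coerces every element to the most general one
mixed <- c(1, "gene1", TRUE)
mode(mixed)    # "character"
mixed          # "1" "gene1" "TRUE"

# Arithmetic works element by element
x / 45         # 1 2 3
```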

Arrays in R
Arrays may also have mode logical, numeric or character.
A two-dimensional array is the same as a matrix.
Examples of Two dimension array
x = array(data, dim)           # general form: the values, then a vector of dimensions
x = array(1:3, c(2,4))         # 2 x 4 array; the values 1:3 are recycled to fill it

Examples of Three dimension array
x = array(1:3, c(2,4,2))       # third dimension 2: two layers of size 2 x 4
x = array(1:3, c(2,4,3))       # third dimension 3: three layers of size 2 x 4
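A sketch showing how array() recycles 1:3 to fill the requested dimensions and how to index the third dimension. The object names a2 and a3 are illustrative.

```r
# Only 3 values for 2 x 4 = 8 cells: array() recycles 1:3, filling column by column
a2 <- array(1:3, c(2, 4))
a2
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    2    1
# [2,]    2    1    3    2
dim(a2)          # 2 4
is.matrix(a2)    # TRUE: a two-dimensional array is a matrix

# Three dimensions: three layers, each 2 x 4
a3 <- array(1:3, c(2, 4, 3))
dim(a3)          # 2 4 3
a3[, , 2]        # the second 2 x 4 layer
a3[1, 2, 3]      # row 1, column 2, layer 3: 1
```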

Matrices in R
        Col1   Col2   Col3   Col4
Row1     .      .      .      .
Row2     .      .      .      .
Row3     .      .      .      .

The table above is a matrix:
Dimension: 3 x 4
Row names: Row1, Row2, Row3
Column names: Col1, Col2, Col3, Col4
Number of rows: 3
Number of columns: 4
All elements of a matrix have the same data type.
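A sketch that builds the 3 x 4 matrix from the figure, filled with the illustrative values 1 to 12 (R fills matrices column by column), and queries its attributes.

```r
# Illustrative values 1 to 12; R fills a matrix column by column
m <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(c("Row1", "Row2", "Row3"),
                            c("Col1", "Col2", "Col3", "Col4")))
m
dim(m)               # 3 4
rownames(m)          # "Row1" "Row2" "Row3"
colnames(m)          # "Col1" "Col2" "Col3" "Col4"
nrow(m)              # 3
ncol(m)              # 4
m["Row2", "Col3"]    # 8
typeof(m)            # "integer": one type for every element
```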

Data frames in R
        Col1   Col2   Col3   Col4
Row1     .      .      .      .
Row2     .      .      .      .
Row3     .      .      .      .

A data frame is a generalization of a matrix:
- Different columns may have different data types.
- All elements of a given column must have the same data type, i.e. all numeric, all factor, all character, or all logical.
- Data frames are the usual input for R's modelling and graphics functions.
- Data read with read.csv, read.table, etc. is automatically stored as a data frame.
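A sketch of a data frame whose columns have different types, plus read.csv() reading inline text instead of a file so the example is self-contained. The column names and values are illustrative; it assumes R 4.0 or later, where character columns are no longer converted to factors by default.

```r
# Illustrative columns of three different types
df <- data.frame(
  gene      = c("ddr1", "apr2", "bac"),
  count     = c(15L, 18L, 25L),
  expressed = c(TRUE, FALSE, TRUE)
)
str(df)
sapply(df, class)    # gene "character" (R >= 4.0), count "integer", expressed "logical"

# read.csv() returns a data frame; text = reads inline data instead of a file
csv_df <- read.csv(text = "gene,count\nddr1,15\napr2,18")
is.data.frame(csv_df)   # TRUE
dim(csv_df)             # 2 2
```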

Data lists in R

Row1
Row2
Row3

A data list is an ordered collection of components (shown here as rows), for example vectors.
Different rows (components) may have different numbers of elements.
All elements within a row that is a vector must have the same data type, i.e. all numeric, all factor, all character, or all logical.
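A sketch of a list whose components have different lengths and types, reusing gene symbols from the slides. The component names short, long and ranks are hypothetical.

```r
gene_lists <- list(
  short = c("abc1", "abc2"),                   # 2 character values
  long  = c("brca1", "brca2", "tp53", "mdm2"), # 4 character values
  ranks = c(2, 1, 3)                           # 3 numbers
)
lengths(gene_lists)        # short 2, long 4, ranks 3
sapply(gene_lists, class)  # "character" "character" "numeric"
gene_lists$long[3]         # "tp53"
gene_lists[["ranks"]]      # 2 1 3
```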

Topic 3 : Data Creation



Data Creation : Vectors and Arrays

Data creation:
c(1,2,3,4)                 # combine arguments to create a vector
from:to                    # create a sequence from "from" to "to" in steps of 1
seq(from, to, by = diff)   # create an arithmetic series with common difference diff
rep(c(1,2,3,4), 4)         # replicate elements of vectors (here the whole vector 4 times)
array(1:3, c(2,4))         # create an array of size 2 x 4
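A sketch of the creation functions with concrete arguments. The seq() call with length.out and the rep() call with each show two further options of the same functions.

```r
v   <- c(1, 2, 3, 4)                 # 1 2 3 4
s   <- 3:7                           # 3 4 5 6 7
odd <- seq(1, 9, by = 2)             # 1 3 5 7 9
pts <- seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
r1  <- rep(c(1, 2, 3, 4), 4)         # the whole vector 4 times: 16 values
r2  <- rep(c(1, 2), each = 3)        # each element 3 times: 1 1 1 2 2 2
```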

Arrays in R
Arrays may also have mode logical, numeric or character.
A two-dimensional array is the same as a matrix.
Examples of Two dimension array
x = array(data, dim)
x = array(1:3, c(2,4))

Examples of Three dimension array
x = array(1:3, c(2,4,2))       # create two arrays of size 2 x 4 (third dimension 2)
x = array(1:3, c(2,4,3))       # create three arrays of size 2 x 4 (third dimension 3)

Data creation : Matrices in R

a = 1:3
b = 4:6
c = 7:9        # naming a vector c still leaves the function c() usable
# cbind combines objects column-wise
X = cbind(a,b,c)
# rbind combines objects row-wise
Y = rbind(a,b,c)
# Matrix by defining the number of rows and columns
Z = matrix(c(1,4,6,2,3,7.8), nrow=2, ncol=3, byrow=TRUE)    # fill row by row
Z = matrix(c(1,4,6,2,3,7.8), nrow=2, ncol=3, byrow=FALSE)   # fill column by column
expression_data = matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3,
    byrow = TRUE, dimnames = list(c("gene1", "gene2"),
    c("Sample.1", "Sample.2", "Sample.3")))
# To generate a random matrix of 10 rows and 5 columns
replicate(5, rnorm(10))
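A sketch comparing cbind() with rbind(), and byrow = TRUE with byrow = FALSE, on the vectors from the slide. set.seed() is added only to make the random matrix reproducible.

```r
a <- 1:3
b <- 4:6
c <- 7:9                  # c() still works: R skips non-functions when calling a function

X <- cbind(a, b, c)       # vectors become columns: 3 x 3, column names a b c
Y <- rbind(a, b, c)       # vectors become rows:    3 x 3, row names a b c
all(Y == t(X))            # TRUE: rbind of the same vectors gives the transpose

vals  <- c(1, 4, 6, 2, 3, 7.8)
Z_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)    # filled row by row
Z_col <- matrix(vals, nrow = 2, ncol = 3, byrow = FALSE)   # filled column by column
Z_row[1, ]                # 1 4 6
Z_col[1, ]                # 1 6 3

set.seed(42)              # only to make the random numbers reproducible
rand_mat <- replicate(5, rnorm(10))
dim(rand_mat)             # 10 5
```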

Data Creation : Data frames

Data Frame:
go.term = c("GO0009117","GO0009253","GO0009354")
gene.count = c(15,18,25)
avg.expression.value = c(0.5432,0.2371,0.7867)
go.term.rank = c(2,1,3)

mydata = data.frame(go.term, gene.count, avg.expression.value, go.term.rank)

mydata2 = data.frame(rank=1:4, gene_name=c("ddr1","apr2","bac","p53"),
    n=c(.90,.75,.52,.31))
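A sketch of everyday operations on the mydata2 data frame from the slide: its size, its column names, one column, one row and a row filter. The threshold 0.5 is arbitrary.

```r
mydata2 <- data.frame(rank = 1:4,
                      gene_name = c("ddr1", "apr2", "bac", "p53"),
                      n = c(0.90, 0.75, 0.52, 0.31))
dim(mydata2)                # 4 3
names(mydata2)              # "rank" "gene_name" "n"
mydata2$n                   # one column, as a vector
mydata2[2, ]                # one row, as a data frame
mydata2[mydata2$n > 0.5, ]  # rows with n above 0.5: the first three
```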

Data Creation : Data Lists

Data List:
genelist1 = c("abc1","abc2")
genelist2 = c("brca1","brca2","tp53","mdm2")
genelist3 = c("apr","erpn","myc")

mylist = list(genelist1, genelist2, genelist3)

mylist2 = list(rank=1:4, gene_name=c("ddr1","apr2","bac","p53"),
    n=c(.90,.75,.52,.31))

List: collection of several objects of any type
x1 = c("gene1","gene2","gene3","gene4","gene5")
x2 = c(2,4,7,9,11)
x = list(x1,x2)
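A sketch of accessing the components of mylist2 from the slide. The extra component is hypothetical and only shows that list components, unlike data frame columns, may have different lengths.

```r
mylist2 <- list(rank = 1:4,
                gene_name = c("ddr1", "apr2", "bac", "p53"),
                n = c(0.90, 0.75, 0.52, 0.31))
names(mylist2)          # "rank" "gene_name" "n"
mylist2$gene_name       # a component by name
mylist2[[3]]            # a component by position: 0.90 0.75 0.52 0.31
mylist2[2]              # single brackets return a smaller list, not the vector

# Hypothetical extra component of a different length
mylist2$extra <- c("tp53", "mdm2")
lengths(mylist2)        # 4 4 4 2
```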

Practice Session: 2

Topic 4 : Data Curation


Variable Information

is.na(x)           # To identify missing values
is.array(x)        # Is x an array? (data with one, two or more dimensions)
is.vector(x)       # Is x a vector? (one dimension)
is.matrix(x)       # Is x a matrix? (two-dimensional array)
is.data.frame(x)   # Is x a data frame?
is.numeric(x)      # Is x numeric?
is.complex(x)      # Is x complex?
is.character(x)    # Is x a character vector?
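A sketch applying the is.* functions to small illustrative objects, so the TRUE/FALSE results can be compared with the descriptions above.

```r
v  <- c(3, NA, 7)
m  <- matrix(1:6, nrow = 2)
df <- data.frame(a = 1:2)

is.na(v)               # FALSE TRUE FALSE
is.vector(v)           # TRUE
is.matrix(v)           # FALSE
is.matrix(m)           # TRUE
is.array(m)            # TRUE: every matrix is also an array
is.data.frame(df)      # TRUE
is.numeric(v)          # TRUE: NA does not change the type
is.complex(2 + 3i)     # TRUE
is.character("gene1")  # TRUE
```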

Variable conversion

as.vector(x)        # convert x to a vector
as.matrix(x)        # convert x to a matrix
as.data.frame(x)    # convert x to a data frame
as.character(x)     # convert x to a character vector
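A sketch of the as.* conversions on illustrative objects. Converting text that is not a number yields NA; suppressWarnings() is used here only to silence R's coercion warning.

```r
m <- matrix(1:6, nrow = 2)
as.vector(m)                 # 1 2 3 4 5 6, read column by column
df <- as.data.frame(m)
names(df)                    # "V1" "V2" "V3"
as.matrix(df)                # back to a 2 x 3 matrix
as.character(c(1.5, 2))      # "1.5" "2"

# Text that is not a number becomes NA (R normally warns about this)
num <- suppressWarnings(as.numeric(c("3.14", "abc")))
num                          # 3.14 NA
```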

Variable attributes
length(x)        # Length of vector
dim(x)           # Dimension of matrix
dimnames(x)      # Dimension names
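A sketch contrasting a plain vector with a named matrix, reusing the gene and sample names from the expression_data example; note that length() counts all elements of a matrix, not its rows.

```r
x <- c(45, 90, 135)
length(x)        # 3
dim(x)           # NULL: a plain vector has no dimension attribute

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("gene1", "gene2"),
                            c("Sample.1", "Sample.2", "Sample.3")))
length(m)        # 6: all elements, not the number of rows
dim(m)           # 2 3
dimnames(m)      # a list: row names, then column names
attributes(m)    # $dim and $dimnames together
```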

Missing Values
Variables of each data type (numeric, character, logical) can also take the value NA: not available.
- NA is not the same as 0
- NA is not the same as "" (the empty string)
- NA is not the same as FALSE

Most calculations and comparisons that involve NA return NA, so we have to state explicitly whether missing values should be kept or removed (for example with na.rm = TRUE).
> NA == 1
[1] NA
> 1 + NA
[1] NA
> max(c(NA, 4, 7))
[1] NA
> max(c(NA, 4, 7), na.rm = TRUE)
[1] 7
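A sketch of the usual ways to count, skip or drop missing values, on an illustrative vector of marks.

```r
marks <- c(56, NA, 72, 81, NA)
sum(is.na(marks))            # 2 values are missing
mean(marks)                  # NA
mean(marks, na.rm = TRUE)    # 69.66667, the mean of 56, 72 and 81
marks[!is.na(marks)]         # 56 72 81
NA == NA                     # NA, so test for missing values with is.na()
```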

Data selection and manipulation

Slicing and Extracting Data : Vectors
x[n]                     # nth element
x[-n]                    # all but the nth element
x[1:n]                   # first n elements
x[-c(1:n)]               # elements from n+1 to the end
x[c(2,5,7)]              # specific elements
x[x>5]                   # all elements greater than 5
x[x<9]                   # all elements less than 9
x[x>5 & x<9]             # all elements between 5 and 9
x[x %in% c("ab","sh")]   # elements found in the given vector

Data selection from list and data frame:
x[[n]]                   # nth element of the list
x$name                   # extract the component (or data frame column) called name
attributes(x)            # attributes of x, e.g. names, row.names and class of a data frame
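A sketch running the selection expressions above on an illustrative numeric vector x with n = 3, a character vector for %in%, and a small named list.

```r
x <- c(3, 8, 1, 12, 6, 9, 4)
n <- 3
x[n]                 # 1
x[-n]                # 3 8 12 6 9 4
x[1:n]               # 3 8 1
x[-(1:n)]            # 12 6 9 4
x[c(2, 5, 7)]        # 8 6 4
x[x > 5]             # 8 12 6 9
x[x > 5 & x < 9]     # 8 6

codes <- c("ab", "cd", "sh", "ef")
codes[codes %in% c("ab", "sh")]   # "ab" "sh"

rec <- list(name = "tp53", score = 0.9)
rec[[2]]             # 0.9
rec$name             # "tp53"
```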

Basic Matrix operation

Matrix curation:
x[r,c]          # element at rth row and cth column
x[r,]           # row r
x[,c]           # column c
x[,c(2,5,8)]    # To select specific columns

Matrix operation:
dim(x)          # Dimension of matrix
x+y             # Sum of matrices x and y (same dimensions)
t(x)            # Transpose of matrix
diag(x)         # Diagonal elements of matrix
nrow(x)         # Number of rows
rownames(x)     # Row names
rowSums(x)      # Row sums
rowMeans(x)     # Row means
cor(x)          # Correlation matrix
var(x)          # Variance-covariance matrix
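A sketch of the matrix selections and operations above on an illustrative 3 x 3 matrix x; diag(3) builds a 3 x 3 identity matrix to serve as y.

```r
# Illustrative 3 x 3 matrix, filled column by column
x <- matrix(c(2, 4, 1, 3, 5, 7, 6, 8, 9), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))
y <- diag(3)         # 3 x 3 identity matrix

x[2, 3]              # 8: row 2, column 3
x[2, ]               # row 2: 4 5 8
x[, "c1"]            # column c1: 2 4 1
x[, c(1, 3)]         # columns 1 and 3
x + y                # element-wise sum; both must have the same dimensions
t(x)                 # transpose
diag(x)              # 2 5 9
rowSums(x)           # 11 17 17
rowMeans(x)
```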

Data selection and manipulation

Data selection and manipulation:
x*2              # scalar multiplication
length(x)        # length of the vector
sum(x)           # sum of the elements in the vector
max(x), min(x)   # max and min values
rev(x)           # reverse the order
sort(x)          # sorting
unique(x)        # unique elements
rle(x)           # run length encoding
table(a,b)       # cross-tabulation (comparison table) of a and b
sample(x)        # random sampling of the data (a random permutation by default)
which.max(x)     # return the index of the (first) maximum element of x
which.min(x)     # return the index of the (first) minimum element of x
which(x == a)    # return the indices of x for which the comparison is TRUE
which(x %in% a)  # return the indices of x whose values match an element of a
choose(n,k)      # number of ways to choose k items out of n (binomial coefficient)
rank(x)          # ranking
round(x,3)       # round the elements of x to 3 decimal places
order(x)         # indices that put x in increasing order
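A sketch applying several of these functions to one illustrative vector z, so their results can be compared directly. set.seed() only makes sample() reproducible; the permutation it returns is not shown because it depends on R's random number generator.

```r
z <- c(2, 2, 3, 3, 3, 4, 2)
rev(z)               # 2 4 3 3 3 2 2
sort(z)              # 2 2 2 3 3 3 4
unique(z)            # 2 3 4
table(z)             # how often each value occurs: 2 (3 times), 3 (3 times), 4 (once)
r <- rle(z)
r$values             # 2 3 4 2: value of each run
r$lengths            # 2 3 1 1: length of each run
which.max(z)         # 6, the position of the largest value
which(z == 3)        # 3 4 5
order(z)             # 1 2 7 3 4 5 6: positions that put z in increasing order
rank(c(10, 30, 20))  # 1 3 2
choose(5, 2)         # 10
round(pi, 3)         # 3.142

set.seed(1)          # only for reproducibility
perm <- sample(z)    # a random permutation of z
```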

Basic Math and Statistics Functions

Basic Maths and Statistics functions:
sqrt(x)                         # square root of x
sin(x), cos(x)                  # trigonometric functions
asin(x), acos(x)                # inverse trigonometric functions
log(x), log10(x), log(x,base)   # natural log, base-10 log, log to a given base
exp(2)                          # exponential function, here e^2
max(x), min(x)                  # max and min value
range(x)                        # range (minimum and maximum)
sum(x)                          # sum of x
mean(x)                         # mean of the elements of x
median(x)                       # median of the elements of x
var(x)                          # variance of the elements of x
sd(x)                           # standard deviation of x
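A sketch of the math and statistics functions on an illustrative vector x. The last line checks that sd() is the square root of var().

```r
x <- c(4, 8, 15, 16, 23, 42)
sqrt(16)               # 4
sin(pi / 2)            # 1
log(100, base = 10)    # 2, the same as log10(100)
exp(1)                 # 2.718282, Euler's number e
range(x)               # 4 42
sum(x)                 # 108
mean(x)                # 18
median(x)              # 15.5, the average of the two middle values 15 and 16
all.equal(sd(x), sqrt(var(x)))   # TRUE
```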

Advanced R Built-in Functions

Functions:
abs(x)                # absolute value
ceiling(x)            # ceiling: smallest integer not less than x
floor(x)              # floor: largest integer not greater than x
trunc(x)              # truncate: drop the fractional part
round(3.4578, 2)      # round to a given number of decimal places: 3.46
signif(3.4578, 2)     # round to a given number of significant digits: 3.5
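A sketch applying the rounding functions to positive and negative values, where they differ most. Note round(2.5): R follows the IEC 60559 rule of rounding a 5 to the even digit.

```r
v <- c(-2.7, -0.5, 1.2, 3.4578)
abs(v)               # 2.7000 0.5000 1.2000 3.4578
ceiling(v)           # -2  0  2  4
floor(v)             # -3 -1  1  3
trunc(v)             # -2  0  1  3: drops the fractional part, towards zero
round(3.4578, 2)     # 3.46
signif(3.4578, 2)    # 3.5
round(2.5)           # 2: a 5 is rounded to the even digit
```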

Practice Session: 3

Exercise: 1

Exercise 1: Round off the number 3.543321 to three decimal places.
Exercise 2: Generate a sequence x = seq(1, 524, d), where d is a random number between 2 and 9. Find:
length(x)
sum(x)
the cube root of x
the 5th and 7th elements of vector x
the 2nd to 5th elements of vector x
a vector without the 2nd to 5th elements of vector x
which elements of vector x are greater than 10
a vector whose elements are greater than 10
a vector whose elements are greater than 10 and less than 50
max, min, rev, sort, unique, range

Exercise 3: Explore the commands
a = rep(2,5)
b = rep(3,7)
c = rep(4,2)
z2 = c(a,b,c)
z = sample(z2)     # analyze z
u = rle(z)
sort(z)
unique(z)
What does the sample command do?
Examine the attributes of u.
Analyze u: interpret what rle does.
Compute the mean, median, var and sd of z.
Convert z to log scale.

Exercise 4: Generate 20 replicates of TRUE, denoted by T, and 10 replicates of FALSE, denoted by F.
Exercise 5: Generate a vector z1 from 2 to 5 and a second vector z3 from 12 to 15, and combine them into a new vector z.
Exercise 6: Write the sequence expression for
5 10 15 20 25 30 35 40 45 50
Exercise 7: Generate a sequence starting at 19 and going up to 957, with a difference of 17.
Exercise 8: Generate any 3 x 4 matrix using the command matrix.
Exercise 9: Create 3 vectors a, b, c of size 5, generate a matrix using cbind and another using rbind, and calculate the dimension of each matrix.

Exercise 10: A class of 20 students appeared for maths and biology exams and secured marks between 20 and 90.
Generate random marks satisfying the above criteria in a matrix with the student names S1, S2, ..., S20 along the first row (as column names) and the subjects math1 and bio1 down the first column (as row names).
Save the names and marks of the students who:
- secure more than 70% marks in either of the two subjects, and
- fail in either of the two.
Compute the average marks secured by the students in both subjects.

Don't forget to save your workspace.

THANK YOU

Alok Srivastava
Assistant Professor, CRRAO-AIMSCS, Hyderabad, INDIA
Date: 8-01-15
