
Topic 1: Introduction to R

WHAT IS R

INTRODUCTION

R is an open-source statistical tool that not only manages data but also carries out a lot of sophisticated
analytical processes.

Before looking at how R works, it is important to get a good overview of R.

So, here’s what will be covered in this tutorial:

 What is R?
 Why R?

WHAT IS R?

So, to begin let’s start with a very basic question. What is R?

 R, as we already know, is a statistical tool that is on par with other statistical tools like SAS, SPSS
and Python in terms of what it can do.
 R can manage and analyze data. It can execute statistical techniques such as linear regression,
logistic regression, forecasting, decision trees and many others.

WHY R?

So what makes R stand out when compared to other statistical tools? Let us break it down.

1. Firstly, R can work with many types of data and can handle both small and large datasets. The
main limit is the memory available on the machine, because R holds the data it works on in memory.
2. R can work with data received in many file formats, such as text, CSV, SAS and so on.
3. R offers really great visualization of data. It can connect with Google maps and
Motion charts.
4. Next, and this is what makes R so powerful compared with other statistical tools, it is open
source. Open source does not just mean that it can be used for free, but that anyone can
contribute to it as well.
5. R does not use much code, even if it is handling large volumes of data carrying out complicated
analytical techniques.
6. R being open source means anyone can contribute to it. This is why R has a huge community of
contributors who almost on a daily basis keep adding functionality to it. This is the reason why
even the most complicated techniques can be executed in R by just calling a function. So, when
using R we as users do not need to worry about how to perform a linear regression or a logistic
regression. The code to execute this and many other advanced analytical functions is already
built in and refined by those in the R community on a regular basis.

7. R is used by a lot of big corporations like Facebook, Google, Mozilla, Lloyds and Merck, among
others. This goes a long way in validating the capability of R and adds to its credibility.

INSTALLATION OF R

INTRODUCTION

To start leveraging the power of R, it first needs to be installed.

So, here’s what will be covered in this tutorial:

 Installation of R
 Overview of a typical GUI or Graphical User Interface of R
 Installation of R Studio
 Overview of the GUI of R Studio

HOW CAN R BE INSTALLED?

To begin, let’s look at how to install R.

To install R click on the link displayed:

https://cran.r-project.org/bin/windows/base/

On opening this link, different download options are available based on system configuration and
operating system, such as R for a 32-bit system, R for a 64-bit system or R for Windows.

Download the version of R that is best suited to the operating system being used. When you update
your version of R, the earlier version is NOT automatically uninstalled. Further, R Studio allows you to
run multiple versions of R (though not in the same session). Therefore, in R Studio, find out which
version of R is running by typing R.version.string (or by calling R.Version(), which returns the full
version details as a list).

INSTALLATION OF R STUDIO

A more user-friendly option available to users is R Studio. It has a better GUI and comes with more
options. To take a more detailed look at R Studio, let us first install it.
To download R Studio, click on the link below:

http://www.rstudio.com/ide/download/

There are 4 sections in R Studio.

A. The first section is the Editor section which is used to enter scripts or codes for R to execute.
B. The second section is the Console where output is displayed.
C. The data that is generated or being worked upon can be found in the Workspace section.
D. Files, packages and Help make up the fourth section.

WHY ARE DATA TYPES IMPORTANT?

To begin, it is important to understand why data types are useful and why it is necessary to be able to
distinguish between different types of data. Suppose you have been asked to evaluate five different
brands of cars; let us call them Brand A, Brand B, Brand C, Brand D and Brand E. If you were asked to
calculate the mean of these five cars, how would you go about it? It would be an impossible operation
to carry out, because all you have is the name or the brand of these cars and you cannot calculate the
mean of names! The situation would have been different if you had some numeric data about these
cars. This emphasizes the need to understand the type of data you have to work with, because certain
functions can be carried out only on certain types of data. For example, calculating a mean is not
possible with character data such as names or brands.
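
The car example can be tried directly in R. The sketch below is only an illustration: the price figures are made-up values, not data from this tutorial. It shows that mean() gives NA (with a warning) for character data but works for numeric data.

```r
# Brand names are character data; the prices are hypothetical numeric data
brands <- c("Brand A", "Brand B", "Brand C", "Brand D", "Brand E")
prices <- c(21.5, 18.0, 30.2, 25.0, 19.9)

# mean() cannot average names: it warns and returns NA
brand_mean <- suppressWarnings(mean(brands))
print(brand_mean)

# mean() works on numeric data
price_mean <- mean(prices)
print(price_mean)
```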

DATA TYPES

Data can be of different types. The different types of data one would commonly come across are:

Numeric:

Refers to any number or numeric value.

Eg: 1.2, 2.1 etc

Numeric data types include even decimals.

Integer:

Refers to any number without a fractional part.

Eg: 1, 2, 3…..

Logical:

Refers to any values which are either True or False.

Eg: if x = 1, y = 2, then x being greater than y is False

Character:

Refers to textual data.

Eg: learning, education….

Factor:

Refers to data in categories.

Eg: City, Gender

Each data type will now be discussed in some detail.

Numeric data types


A numeric data type is any number or numeric value like 2.1, 1.2 and so on. It could be a whole
number or a decimal value.

In R Studio, a numeric data type is created with the syntax:

y <- 3.1

(read as "y is equal to 3.1"). This creates a variable y that stores the numeric value 3.1. To assign
a value we can use either the symbol <- or =.

Integer data types

Integer data type indicates any data which stores integer values.

In R Studio, numeric data types can be converted to integer data types by using the following syntax:

as.integer(numeric value)

Eg: as.integer(3.1) returns 3, because the fractional part is dropped.
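
As a small sketch of the difference between the numeric and integer types, the code below checks the class of a value before and after conversion. The variable names are only for illustration.

```r
y <- 3.1
print(class(y))        # "numeric"

z <- as.integer(y)     # the fractional part is dropped
print(z)               # 3
print(class(z))        # "integer"

w <- 5L                # the L suffix creates an integer directly
print(is.integer(w))   # TRUE
```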

Logical data types

Logical data type indicates any data where the value is either TRUE or FALSE, but never both. In
R Studio, a logical value is typically created as the result of a comparison:

x <- 1
y <- 2
x > y

(x is equal to 1, y is equal to 2, so x being greater than y is FALSE)
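
A minimal sketch of working with logical values follows. The vector named flags is a hypothetical example; it shows that TRUE counts as 1 and FALSE as 0 when logical values are summed.

```r
x <- 1
y <- 2
is_greater <- x > y
print(is_greater)          # FALSE
print(class(is_greater))   # "logical"

# Logical values can be stored in a vector and counted
flags <- c(TRUE, FALSE, TRUE)
n_true <- sum(flags)
print(n_true)              # 2
```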

Character data types

Character data type stores characters or strings.

In R Studio, they have to be written within quotes (double or single). For example, the text
learning would be written as "learning", e.g. Y <- "learning"


DATA STRUCTURES
WHAT IS A DATA STRUCTURE

A data structure in simple terms is a way of storing and organizing data. Let us understand this better
with the help of an example. Shown here is a table with different types of information stored in it.

When storing information of different types, it will need to be stored across more than one variable.
For example, if the data to be stored relates to employee records, then the variables across which this
data would be stored would be Name, Age, Address, Nationality, Assessment scores and so on. This
collection of information displayed across different variables is referred to as a data structure.

WHAT IS THE DIFFERENCE BETWEEN DATA STRUCTURE AND DATA TYPE

A data structure is different from data type because of the number of values stored. Let’s look at this
with the help of an example. If a variable “Name” has been created, and a value “Bob” stored against it,
it will result in the creation of a character data type. In a data type only one value is stored.
But when different information related to Bob apart from his name, is stored, like his age, address,
nationality and assessment score then it results in the creation of a data structure. A data structure
stores more than one value. A simple way to look at a data structure is to think of an Excel sheet with
rows and columns where the columns are made up of different data types. In the example used, the
Name column will store character data types, the Age column will store integer data types, and the
Score column will store numeric data types and so on.

TYPE OF DATA STRUCTURE – VECTORS

The first type of data structure that will be discussed is referred to as Vectors. A Vector is like a column
in an Excel sheet. Going back to the example used earlier, Vectors would be Name, Age, Address,
Nationality and so on. In Vectors, all the elements within a Vector should be of the same data type.

Vectors cannot have a combination of data types! So, if Age is a Vector, then all the elements under age
should be of the data type integer. This Vector cannot have any other data type within it like character
or number, nor can they be a combination of data types.

So, Vector is therefore a data structure which contains elements of the same data type.
Visualize a single column in an Excel sheet which contains values of the same data type.

HOW TO CREATE A VECTOR IN R


In R Studio, a Vector can be created with the c() function, also known as combine or concatenate. So,
let's create a Vector called vector1 and store 4 values in it. This vector will contain elements of the
numeric data type. To create this vector enter the code
vector1<-c(9,8,2,7)

MIXING UP DATA TYPES IN A VECTOR

Now let us look at something interesting. As discussed, a Vector can only contain elements of the same
data type. There can be no mixing of data types within a Vector. So what happens if a second Vector is
created and along with numeric data types, a character data type is inserted into it? Shown here, is the
code to create a new Vector called vector 2 with some values. Inserted into these values is a character
value “bob”.

vector2<-c(1,2,5,1,2,1,4,1,"bob",9,1)

When the contents of vector 2 are printed, all values in the Vector are displayed in the Console in
quotes. This indicates that by default R has converted all numeric data types in the Vector to character
data types by adding quotes to all the numbers. This is why R does not display any error on executing
this code!
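
The conversion described above can be checked with class(). The sketch below uses a shortened version of the vector for readability. It also shows that converting back with as.numeric() turns the non-numeric entry into NA.

```r
# One character value is enough to turn the whole vector into character
vector2 <- c(1, 2, 5, "bob", 9)
print(vector2)           # every element is shown in quotes
print(class(vector2))    # "character"

# Converting back: "bob" cannot become a number, so it becomes NA
back_to_numeric <- suppressWarnings(as.numeric(vector2))
print(back_to_numeric)
```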

ARITHMETIC FUNCTIONS BETWEEN VECTORS

It is also possible to carry out arithmetic functions between Vectors like addition, subtraction,
multiplication and division. The prerequisites are that both Vectors are numeric and, ideally, of equal
length (if the lengths differ, R recycles the shorter Vector). Note that vector2 as created above contains
the character value "bob", so it cannot be used in arithmetic. For this section, assume vector2 has been
redefined as a numeric Vector with 4 values, for example vector2 <- c(1, 2, 3, 4). Then vector1 and
vector2 are both of numeric data type and have 4 values each, which means they are of the same length.

It is possible to carry out any type of arithmetic function on these 2 vectors such as
vector1 + vector2 or vector1 - vector2 and so on.

Let us enter the code

vector1 + vector2

and press Control + Enter.
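
A short sketch of element-by-element arithmetic follows. The values given to vector2 here are an assumption made for the illustration, since a numeric vector of four values is needed.

```r
vector1 <- c(9, 8, 2, 7)
vector2 <- c(1, 2, 3, 4)   # hypothetical numeric values for illustration

total <- vector1 + vector2       # adds matching positions
product <- vector1 * vector2     # multiplies matching positions
doubled <- vector1 * 2           # a single number is applied to every element

print(total)     # 10 10 5 11
print(product)   # 9 16 6 28
print(doubled)   # 18 16 4 14
```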

Vector entries can also be calculations or previously stored items (including vectors themselves).

E.g:

myvec <- c(1,3,1,42)

myvec

[1] 1 3 1 42

foo <- 32.1

myvec2 <- c(3,-3,2,3.45,1e+03,64^0.5,2+(3-1.1)/9.44,foo)

myvec2

[1] 3.000000 -3.000000 2.000000 3.450000 1000.000000 8.000000

[7] 2.201271 32.100000

This code created a new vector assigned to the object myvec2. Some of the entries are defined as
arithmetic expressions, and it’s the result of the expression that’s stored in the vector. The last element,
foo, is an existing numeric object defined as 32.1.

Let's look at this example:

R> myvec3 <- c(myvec,myvec2)

R> myvec3

IDENTIFYING ELEMENTS IN A VECTOR


Another interesting feature in Vectors is referred to as indexing. This feature allows a particular element
in a Vector to be accessed.

For example, we know that vector1 contains 4 elements: 9, 8, 2 and 7. Let us suppose that we want to
find out the third element in vector1, which is 2.

Let us enter the code

vector1[3]

Entering 3 indicates that we want to access the third element of vector1. We can see a value of 2
displayed in the console, which as we know is the third element in vector1.

Looking at these examples, what do you think is happening here?

1. myvec <- c(5,-2.3,4,4,4,6,8,10,40221,-8)

length(x=myvec)

[1] 10

myvec[1]

2. foo <- myvec[2]

foo

[1] -2.3

3. myvec[length(x=myvec)]

[1] -8

4. myvec.len <- length(x=myvec)

bar <- myvec[myvec.len-1]

bar

[1] 40221

Here R counts from 1 up to the value stored in myvec.len:

1:myvec.len

A negative index removes an element. This line produces the contents of myvec without the first element:

myvec[-1]

Similarly, the following code assigns to the object baz the contents of myvec without its second element:

R> baz <- myvec[-2]

R> baz

1. bar <- c(3,2,4,4,1,2,4,1,0,0,5)

bar

[1] 3 2 4 4 1 2 4 1 0 0 5

bar[1] <- 6

bar

[1] 6 2 4 4 1 2 4 1 0 0 5

Here you overwrite the first element of bar, which was originally 3, with a new value, 6.

2. bar[c(2,4,6)] <- c(-2,-0.5,-1)

bar

[1] 6.0 -2.0 4.0 -0.5 1.0 -1.0 4.0 1.0 0.0 0.0 5.0

Here you overwrite the second, fourth, and sixth elements with -2, -0.5, and -1, respectively; all else
remains the same.

3. bar[7:10] <- 100

bar

[1] 6.0 -2.0 4.0 -0.5 1.0 -1.0 100.0 100.0 100.0 100.0 5.0

REPLACING CONTENTS IN A VECTOR

Now, let us suppose that we want to create a new Vector called new_vector. In this
new Vector we want to populate the same elements as vector 1 but without the second element. So in
new_vector we only want to store the first, third and fourth elements of vector 1.

Let us enter the code

new_vector<-vector1[-2]

Entering minus next to 2 indicates that we want to exclude the second element of vector 1 in
new_vector.

When the code is executed we can see in the Workspace section that the vector new_vector has been
created with three values of numeric data type.

To view the contents of new_vector, enter the name of the vector. In the console, 9, 2 and 7 are
displayed. 8 is not displayed as it is the second element in vector1 and hence has been excluded.

If a Vector has only three elements but a value of 10 is entered in square brackets, then we are trying
to index an element beyond what is actually present in the Vector. This situation is referred to as an
index out of bounds. R does not stop with an error in this case; it returns NA (a missing value).
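
A minimal sketch of indexing past the end of a vector follows. The name short_vector is only for illustration.

```r
short_vector <- c(9, 2, 7)

# Position 10 does not exist, so R returns NA
out_of_range <- short_vector[10]
print(out_of_range)

# length() shows how many positions are valid
n_elements <- length(short_vector)
print(n_elements)
```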

Sequence Operator

Let us suppose that a Vector is to be created with some numbers which are not consecutive but follow
a regular pattern. An example would be 1, 3, 5, 7, 9 and so on. To create this Vector, the sequence
function seq() can be used.

Let's create a Vector called age and populate it with the values 1, 3, 5, 7, 9 and so on till 101. To do
this, enter the code

age<-seq(1,101,2)

In the code entered, 1 represents the start point, 101 represents the end point and 2 represents how
the numbers should increment.
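
As a quick check of the sequence just described, the sketch below builds the vector with seq() and inspects its start, end, and length.

```r
# Odd numbers from 1 to 101: start, end, increment
age <- seq(1, 101, 2)

print(head(age))     # first few values
print(tail(age, 1))  # last value
n_values <- length(age)
print(n_values)      # 51
```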

Sequences, Repetition, Sorting, and Lengths in R

Here I’ll discuss some common and useful functions associated with R vectors: seq, rep, sort, and length.
Let’s create an equally spaced sequence of increasing or decreasing numeric values. This is something
you'll need often, for example when programming loops.

R> 3:27

[1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

The example 3:27 should be read as "from 3 to 27 (by 1)."

Looking at this example, what do you think is happening?


R> foo <- 5.3

R> bar <- foo:(-47+1.5)

R> bar
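
One way to explore this example is to inspect the result rather than print all of it. In the sketch below, the colon operator starts at 5.3 and steps down by 1 for as long as it stays at or above -45.5 (the value of -47+1.5).

```r
foo <- 5.3
bar <- foo:(-47 + 1.5)   # counts down from 5.3 in steps of 1

n_bar <- length(bar)
print(n_bar)             # 51
print(bar[1])            # 5.3
print(bar[n_bar])        # -44.7, the last value that is not below -45.5
```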

Sequences with seq

You can also use the seq command, which allows for more flexible creations of sequences. This ready-to-
use function takes in a from value, a to value, and a by value, and it returns the corresponding sequence
as a numeric vector.

R> seq(from=3,to=27,by=3)

[1] 3 6 9 12 15 18 21 24 27

Example 2:

R> seq(from=3,to=27,length.out=40)

By setting length.out to 40, you make the program produce exactly 40 evenly spaced numbers from 3 to 27.
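
The two ways of calling seq can be compared side by side. This is only a sketch: with by you fix the step size, and with length.out you fix how many values are returned and R works out the step.

```r
by_three <- seq(from = 3, to = 27, by = 3)
print(by_three)

forty <- seq(from = 3, to = 27, length.out = 40)
n_forty <- length(forty)
step <- diff(forty)[1]   # step chosen by R: (27 - 3) / 39

print(n_forty)
print(step)
```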

Repetition with rep

Sometimes you may want simply to repeat a certain value. You do this using rep.

R> rep(x=1,times=4)

[1] 1 1 1 1

R> rep(x=c(3,62,8.3),times=3)

[1] 3.0 62.0 8.3 3.0 62.0 8.3 3.0 62.0 8.3

R> rep(x=c(3,62,8.3),each=2)

[1] 3.0 3.0 62.0 62.0 8.3 8.3

R> rep(x=c(3,62,8.3),times=3,each=2)

[1] 3.0 3.0 62.0 62.0 8.3 8.3 3.0 3.0 62.0 62.0 8.3 8.3 3.0 3.0 62.0

[16] 62.0 8.3 8.3

Can you explain what is happening above? (Hint: times repeats the whole vector, while each repeats
every element in turn.)

Another example I want us to consider is below:

R> foo <- 4

R> c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))

[1] 3.00 8.30 32.00 32.00 32.00 32.00 -2.00 -1.25 -0.50 0.25 1.00

Sorting with sort

Sorting a vector in increasing or decreasing order of its elements is another simple operation that crops
up in everyday tasks. The conveniently named sort function does just that.

R> sort(x=c(2.5,-1,-10,3.44),decreasing=FALSE)

[1] -10.00 -1.00 2.50 3.44

R> sort(x=c(2.5,-1,-10,3.44),decreasing=TRUE)

[1] 3.44 2.50 -1.00 -10.00

R> foo <- seq(from=4.3,to=5.5,length.out=8)

R> foo

[1] 4.300000 4.471429 4.642857 4.814286 4.985714 5.157143 5.328571 5.500000

R> bar <- sort(x=foo,decreasing=TRUE)

R> bar

[1] 5.500000 5.328571 5.157143 4.985714 4.814286 4.642857 4.471429 4.300000

Finding a Vector Length with length

This function allows us to determine how many entries exist in a vector given as the argument x.

R> length(x=c(3,2,8,1))

[1] 4

R> length(x=5:13)

[1] 9

R> foo <- 4

R> bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))

R> length(x=bar)

[1] 11

Practice corner
Datasets and Data Structures In R

The first step in any data analysis is the creation of a dataset containing the information
to be studied, in a format that meets your needs. In R, this task involves the
following:

 Selecting a data structure to hold your data
 Entering or importing your data into the data structure

PatientID  AdmDate     Age  Diabetes  Status
1          10/15/2009  25   Type1     Poor
2          11/01/2009  34   Type2     Improved
3          10/21/2009  28   Type1     Excellent
4          10/28/2009  52   Type1     Poor

The above is a sample dataset.
Data structures are used for holding data in R; they include vectors, factors, matrices,
data frames, and lists. They differ in the type of data they hold and how they are created.
Vectors
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical
data. The combine function c() is used to form the vector. Here are examples of
each type of vector:
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
Here, a is a numeric vector, b is a character vector, and c is a logical vector. Note that the data in a
vector must be of only one type or mode (numeric, character, or logical). You can't mix modes in the
same vector.
NOTE Scalars are one-element vectors. Examples include f <- 3, g <- "US" and h <- TRUE. They're used
to hold constants.
You can refer to elements of a vector using a numeric vector of positions within brackets.
For example, a[c(2, 4)] refers to the 2nd and 4th elements of vector a. Here are additional examples:
> a <- c(1, 2, 5, 3, 6, -2, 4)
> a[3]
[1] 5
> a[c(1, 3, 5)]
[1] 1 5 6
> a[2:6]
[1] 2 5 3 6 -2
The colon operator used in the last statement is used to generate a sequence of numbers.
For example, a <- c(2:6) is equivalent to a <- c(2, 3, 4, 5, 6).
Matrices
A matrix is a two-dimensional array where each element has the same mode (numeric, character, or
logical). Matrices are created with the matrix function.
mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns,
                   byrow=logical_value, dimnames=list(
                   char_vector_rownames, char_vector_colnames))
where vector contains the elements for the matrix, nrow and ncol specify the row and column dimensions, and
dimnames contains optional row and column labels stored in character vectors. The option byrow indicates whether
the matrix should be filled in by row (byrow=TRUE) or by column (byrow=FALSE). The default is by column.

1. > y <- matrix(1:20, nrow=5, ncol=4)
> y
2. > cells <- c(1,26,24,68)
> rnames <- c("R1", "R2")
> cnames <- c("C1", "C2")
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                     dimnames=list(rnames, cnames))
> mymatrix
3. > mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
                     dimnames=list(rnames, cnames))
> mymatrix
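
To see the effect of byrow, the sketch below builds the same four values both ways and looks up the same cell in each matrix. The object names by_row and by_col are only for illustration.

```r
cells <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

# Filled row by row: the first row is 1, 26
by_row <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                 dimnames = list(rnames, cnames))

# Filled column by column (the default): the first column is 1, 26
by_col <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
                 dimnames = list(rnames, cnames))

print(by_row)
print(by_col)
print(by_row["R1", "C2"])   # 26
print(by_col["R1", "C2"])   # 24
```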

You can identify rows, columns, or elements of a matrix by using subscripts and brackets. X[i,] refers to the ith row
of matrix X, X[,j] refers to the jth column, and X[i, j] refers to the element in row i and column j. The subscripts
i and j can be numeric vectors in order to select multiple rows or columns.

1. > x <- matrix(1:10, nrow=2)
> x[2,]
[1] 2 4 6 8 10
2. > x[,2]
[1] 3 4
3. > x[1,4]
[1] 7
4. > x[1, c(4,5)]
[1] 7 9

Arrays
Arrays are similar to matrices but can have more than two dimensions. They're created
with an array function of the following form:
myarray <- array(vector, dimensions, dimnames)
where vector contains the data for the array, dimensions is a numeric vector giving the maximal index for
each dimension, and dimnames is an optional list of dimension labels.
1. a <- array(0, dim=c(4,5))
b <- a
a*b
a*5
diag(a)

2. The following creates an array with 3 rows, 4 columns and 2 layers:
> x <- array(1:24, c(3, 4, 2))
> x
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

> x[1, 3, 2]
[1] 19
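
A small sketch of array indexing follows. Leaving an index position empty selects everything along that dimension, so x[, , 2] returns the whole second layer as a matrix.

```r
x <- array(1:24, dim = c(3, 4, 2))   # 3 rows, 4 columns, 2 layers

single_value <- x[1, 3, 2]   # row 1, column 3, layer 2
layer2 <- x[, , 2]           # the whole second layer, a 3 x 4 matrix

print(dim(x))
print(single_value)          # 19
print(layer2)
```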

Data frames
A data frame is more general than a matrix in that different columns can contain different modes of
data (numeric, character, etc.). It’s similar to the datasets you’d typically see in SAS, SPSS, and Stata.
Data frames are the most common data structure you’ll deal with in R.
A data frame is created with the data.frame() function:
mydata <- data.frame(col1, col2, col3, …)
where col1, col2, col3, … are column vectors of any type (such as character, numeric, or logical).
Names for each column can be provided with the names function.
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)
patientdata[1:2]
You can access elements using column names as shown below:
patientdata[c("diabetes", "status")]

The $ notation is used to indicate a particular variable from a dataset as shown below.
> patientdata$age
> table(patientdata$diabetes, patientdata$status)
Factor
In statistics, a categorical variable is one that takes a limited number of values. In R
there is a specific data structure for this called a factor. A good example of a categorical variable is a
person's blood type, which could be A, B, AB or O.

1. blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood
blood_factor <- factor(blood)
blood_factor
Levels: A AB B O
R looks at the variable and extracts the distinct values as levels. R also sorts the levels in
alphabetical order and stores the character values internally as integer codes. To inspect the
values, use
str(blood_factor)
You can alter the order of the levels by specifying it in the factor function like this
> blood_factor2 <- factor(blood,
                          levels = c("O", "A", "B", "AB"))
Using str(blood_factor2) then gives a different output.
Changing the labels of your factor is also possible by doing this (the new labels must follow the
existing level order A, AB, B, O):
levels(blood_factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")
> blood_factor
Ordered Factor: This allows you to make comparisons within your factor. With a nominal
factor, such as the one created earlier, the levels cannot be compared. You enable comparison
by setting ordered = TRUE. See the example below
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
tshirt_factor <- factor(tshirt, ordered = TRUE,
                        levels = c("S", "M", "L"))
tshirt_factor
tshirt_factor[1] < tshirt_factor[2]
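
As a sketch of what a factor stores, the code below prints the levels and the integer codes behind the blood type factor, then makes a comparison with the ordered T-shirt factor.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

blood_levels <- levels(blood_factor)     # distinct values, sorted alphabetically
blood_codes <- as.integer(blood_factor)  # position of each value in the levels
print(blood_levels)
print(blood_codes)

tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
tshirt_factor <- factor(tshirt, ordered = TRUE, levels = c("S", "M", "L"))

# The first element is "M" and the second is "L"; M comes before L in the level order
is_smaller <- tshirt_factor[1] < tshirt_factor[2]
print(is_smaller)   # TRUE
```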

LIST
Lists are the most complex of the R data types. Basically, a list is an ordered collection
of objects (components). A list allows you to gather a variety of (possibly unrelated)
objects under one name. For example, a list may contain a combination of vectors,
matrices, data frames, and even other lists. You create a list using the list() function :

mylist<- list(object1, object2, …)


where the objects are any of the structures seen so far. Optionally, you can name the
objects in a list:
mylist<- list(name1=object1, name2=object2, …)
1. > g <- "My First List"
> h <- c(25, 26, 18, 39)
> j <- matrix(1:10, nrow=5)
> k <- c("one", "two", "three")
> mylist <- list(title=g, ages=h, j, k)
> mylist
In this example, mylist[[2]] and mylist[["ages"]] both refer to the same four-element numeric
vector.
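
The different ways of reaching a list component can be compared in a short sketch. Named components can also be reached with the $ notation, while unnamed ones are reached by position.

```r
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

by_position <- mylist[[2]]       # second component
by_name <- mylist[["ages"]]      # the same component, by name
by_dollar <- mylist$ages         # the same component, with $

fourth <- mylist[[4]]            # unnamed components are reached by position
print(by_position)
print(fourth[2])                 # "two"
```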

Data Input in R
In this section we examine some of the ways to bring data from various sources into R.
Among the ways you can input data into R are:
1. Entering data from the keyboard
The edit() function in R invokes a text editor that allows you to enter your data manually.
Here are the steps involved:
 Create an empty data frame (or matrix) with the variable names and modes
you want to have in the final dataset.
 Invoke the text editor on this data object, enter your data, and save the results
back to the data object.
In the following example, you’ll create a data frame named mydata with three variables:
age (numeric) , gender (character) , and weight (numeric) . You’ll then invoke
the text editor, add your data, and save the results.

mydata<- data.frame(age=numeric(0),
gender=character(0), weight=numeric(0))
mydata<- edit(mydata)

2. Importing data from delimited text file


You can import data from delimited text files using read.table() , a function that
reads a file in table format and saves it as a data frame. Here’s the syntax:
mydataframe<- read.table(file, header=logical_value,
sep="delimiter", row.names="name")
where file is a delimited ASCII file , header is a logical value indicating whether
the first row contains variable names (TRUE or FALSE), sep specifies the delimiter separating
data values, and row.names is an optional parameter specifying one or
more variables to represent row identifiers.
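For example, the statement
grades <- read.table("studentgrades.csv", header=TRUE, sep=",",
                     row.names="STUDENTID")
reads a comma-delimited file named studentgrades.csv from the current working directory, takes the
variable names from the first line of the file, uses the variable STUDENTID as the row identifier, and
saves the result as a data frame named grades.

NB: There are other methods through which R connects to other data sources, but these won't be
covered in this class.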
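
To try read.table() without an existing file, the sketch below first writes a tiny comma-delimited file to a temporary location. The file contents and the column names NAME and SCORE are made up for this illustration.

```r
# Write a small hypothetical CSV file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("STUDENTID,NAME,SCORE",
             "S01,Ann,82",
             "S02,Ben,91"),
           csv_path)

# Read it back: first row holds names, STUDENTID becomes the row identifier
grades <- read.table(csv_path, header = TRUE, sep = ",",
                     row.names = "STUDENTID")
print(grades)
print(grades["S02", "SCORE"])   # 91
```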

Variable Label & Value Label


Variable Label
The second column, named age, contains the ages at which individuals were first
hospitalized. The code
names(patientdata)[2] <- "Age at hospitalization (in years)"
renames age to "Age at hospitalization (in years)"
Value Label
The factor() function can be used to create value labels for categorical variables.
Continuing our example, say that you have a variable named gender, which is coded 1
for male and 2 for female. You could create value labels with the code
patientdata$gender<- factor(patientdata$gender,
levels = c(1,2),
labels = c("male", "female"))
Here levels indicates the actual values of the variable, and labels refers to a character
vector containing the desired labels.
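
A minimal sketch of value labels on a standalone vector follows. The codes in gender_codes are hypothetical values, since the sample patient data has no gender column.

```r
# Hypothetical coded values: 1 = male, 2 = female
gender_codes <- c(1, 2, 2, 1, 2)

gender <- factor(gender_codes,
                 levels = c(1, 2),
                 labels = c("male", "female"))

print(gender)
gender_counts <- table(gender)
print(gender_counts)
```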

Some Functions in R
Function                     Purpose
length(object)               Number of elements/components.
dim(object)                  Dimensions of an object.
str(object)                  Structure of an object.
class(object)                Class or type of an object.
mode(object)                 How an object is stored.
names(object)                Names of components in an object.
c(object, object, ...)       Combines objects into a vector.
cbind(object, object, ...)   Combines objects as columns.
rbind(object, object, ...)   Combines objects as rows.
object                       Prints the object.
head(object)                 Lists the first part of the object.
tail(object)                 Lists the last part of the object.
ls()                         Lists current objects.
rm(object, object, ...)      Deletes one or more objects. The statement rm(list = ls())
                             will remove most objects from the working environment.
newobject <- edit(object)    Edits object and saves as newobject.
fix(object)                  Edits in place.

Getting into Graphs in R
Human beings are quick at discerning relationships from visual representations. A well-crafted graph
can help you make meaningful comparisons among thousands of pieces of information, extracting
patterns not easily found through other methods. This is one reason why advances in the field of statistical
graphics have had such a major impact on data analysis.

attach(mtcars)                        # attach the data frame mtcars
plot(wt, mpg)                         # scatter plot: automobile weight on the x-axis, miles per gallon on the y-axis
abline(lm(mpg ~ wt))                  # adds a line of best fit
title("Regression of MPG on Weight")
detach(mtcars)                        # detach the data frame
In R it is possible to save your graph output. You do that by placing your plotting code between a call
that opens a graphics device and dev.off(), like this:
pdf("mygraph.pdf")
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg~wt))
title("Regression of MPG on Weight")
detach(mtcars)
dev.off()
Doing this saves your graph as a PDF format in your current working directory. In addition to pdf(), you
can use the functions win.metafile(), png(), jpeg(),bmp(), tiff(), xfig(), and postscript() to save graphs in
other formats.
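
The sketch below is a variation on the same idea. It writes the PDF to a temporary folder instead of the working directory and uses with() in place of attach() and detach(); both choices are assumptions made for this illustration. It then checks that the file was created.

```r
# Save to a temporary folder so the working directory is not touched
out_file <- file.path(tempdir(), "mygraph.pdf")

pdf(out_file)                      # open the PDF device
with(mtcars, {
  plot(wt, mpg)                    # scatter plot of weight against miles per gallon
  abline(lm(mpg ~ wt))             # line of best fit
})
title("Regression of MPG on Weight")
dev.off()                          # close the device and write the file

print(file.exists(out_file))
```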

Modifying a simple graph


How do you modify and enhance your graph to meet your needs? We will use the table below to
demonstrate.

Dosage  Response to Drug A  Response to Drug B
20      16                  15
30      20                  18
40      27                  25
45      40                  31
60      60                  40

Plotting dose on the x-axis and drugA on the y-axis:


dose <- c(20, 30, 40, 45, 60)
drugA<- c(16, 20, 27, 40, 60)
drugB<- c(15, 18, 25, 31, 40)
plot(dose, drugA, type = "b")

Graphical parameters
You can customize many features of a graph (fonts, colors, axes, titles) through options called graphical
parameters.
One way to specify these options is through the par() function. Values set in this manner will be in
effect for the rest of the session or until they're changed. The format is
par(optionname=value, optionname=value, ...)
Specifying par() without parameters produces a list of the current graphical settings. Adding the
no.readonly=TRUE option produces a list of current graphical settings that can be modified.

opar <- par(no.readonly = TRUE)
par(lty = 2, pch = 17)
plot(dose, drugA, type = "b")
par(opar)
Now let's plot this graph and see the output:
plot(dose, drugA, type="b", lty=3, lwd=3, pch=15, cex=2)
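
The save-and-restore pattern with par() can be checked without drawing anything. This sketch opens a null PDF device (pdf(NULL), which creates no file) purely so that the settings can be queried. That device choice is an assumption for the illustration, not part of the original example.

```r
pdf(NULL)                          # null device: no file is written

opar <- par(no.readonly = TRUE)    # save the current, modifiable settings
par(lty = 2, pch = 17)             # change line type and plotting symbol
changed_pch <- par("pch")

par(opar)                          # restore the saved settings
restored_pch <- par("pch")

dev.off()
print(changed_pch)                 # 17
print(restored_pch)
```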

Let's plot another graph with colors


#customizing the graph now
temperature <- c(0.0002, 0.0012, 0.0060, 0.0300, 0.0900, 0.2700, 0.7500, 1.8500, 4.2000, 8.8000,
                 17.3000, 34.2800, 99.0089, 110.0235, 146.0216, 228.0026, 360.0027, 440.2589,
                 806.0000)
pressure <- c(0,20,40,60,80,100,120,140,160,180,200,220,240,260,280,300,320,340,360)
plot(temperature,pressure,
xlab = "Temperature",
ylab = "Pressure",
main = "T vs P for Testing",
type = "o",#this can also be "l"
col = "red",
col.main = "darkgray",
cex.axis = 0.8,#the x-axis and y-axis number fonts
lty = 5,
pch = 4)

Text Formatting in R
Several graphical parameters in R control the size, font, and style of text in a graph.

For example, all graphs created after the statement


par(font.lab=3, cex.lab=1.5, font.main=4, cex.main=2)
will have italic axis labels that are 1.5 times the default text size, and bold italic titles
that are twice the default text size.
Using graphical parameters to control graph appearance
dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
opar <- par(no.readonly=TRUE)   # save the current graphical parameter settings (so that you can restore them later)
par(pin=c(2, 3))                # graphs will be 2 inches wide by 3 inches tall
par(lwd=2, cex=1.5)             # lines will be twice the default width and symbols 1.5 times the default size
par(cex.axis=.75, font.axis=3)  # axis text will be italic and 75 percent of the default size
plot(dose, drugA, type="b", pch=19, lty=2, col="red")
plot(dose, drugB, type="b", pch=23, lty=6, col="blue", bg="green")
par(opar)                       # restore the original settings
Adding text, customized axes, and legends
Many high-level plotting functions (for example, plot, hist, boxplot) allow you to include axis and text
options, as well as graphical parameters. For example, the following adds a title (main), subtitle (sub),
axis labels (xlab, ylab), and axis ranges (xlim,ylim).

plot(dose, drugA, type="b",
     col="red", lty=2, pch=2, lwd=2,
     main="Clinical Trials for Drug A",   # main title
     sub="This is hypothetical data",     # subtitle
     xlab="Dosage", ylab="Drug Response",
     xlim=c(0, 60), ylim=c(0, 70))        # sets the x- and y-axis ranges; try changing them
Note: Not all functions allow you to add these options. Some high-level plotting functions
include default titles and labels. You can remove them by adding ann=FALSE in the plot() statement or in
a separate par() statement.

Title
Use the title() function to add title and axis labels to a plot. The format is
title(main="main title", sub="sub-title",
xlab="x-axis label", ylab="y-axis label")
Graphical parameters (such as text size, font, rotation, and color) can also be specified
in the title() function. For example, the following produces a red title and a blue
subtitle, and creates green x and y labels that are 25 percent smaller than the default
text size:
title(main="My Title", col.main="red",
sub="My Sub-title", col.sub="blue",
xlab="My X label", ylab="My Y label",
col.lab="green", cex.lab=0.75)
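
A short sketch of how title() might be combined with ann=FALSE: the plot is drawn without its default annotations, and title() then supplies them. It reuses the dose and drugA vectors defined earlier, repeated here so the example runs on its own.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# ann=FALSE suppresses the default title and axis labels
plot(dose, drugA, type = "b", ann = FALSE)

# Add the annotations separately, each with its own color and size
title(main = "My Title", col.main = "red",
      sub = "My Sub-title", col.sub = "blue",
      xlab = "My X label", ylab = "My Y label",
      col.lab = "green", cex.lab = 0.75)
```

Without ann=FALSE, the labels from title() would be drawn on top of the default axis labels that plot() adds.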

Axes
Rather than using R’s default axes, you can create custom axes with the axis() function.
The format is
axis(side, at=, labels=, pos=, lty=, col=, las=, tck=, ...)
x <- c(1:10)
y <- x
z <- 10/x
opar <- par(no.readonly=TRUE)
# widen the right margin to make room for the second axis
par(mar=c(5, 4, 4, 8) + 0.1)
plot(x, y, type="b",
pch=21, col="red",
yaxt="n", lty=3, ann=FALSE)
# the lines() statement adds a new line to an existing graph
lines(x, z, type="b", pch=22, col="blue", lty=2)
axis(2, at=x, labels=x, col.axis="red", las=2)
axis(4, at=z, labels=round(z, digits=2),
col.axis="blue", las=2, cex.axis=0.7, tck=-.01)
# the mtext() function adds text to the margins of the plot
mtext("y=10/x", side=4, line=3, cex.lab=1, las=2, col="blue")
title("An Example of Creative Axes",
xlab="X values",
ylab="Y=X")
par(opar)
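
For a simpler case, the sketch below replaces only the x-axis. The default axis is suppressed with xaxt="n" and redrawn with axis() at the observed doses. The "mg" unit in the labels is a made-up assumption for illustration and is not part of the original data.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# Hypothetical unit, used only to show custom tick labels
dose_labels <- paste(dose, "mg")

# xaxt="n" suppresses the default x-axis
plot(dose, drugA, type = "b", xaxt = "n",
     xlab = "Dosage", ylab = "Drug Response")

# side=1 is the bottom axis; las=2 draws labels perpendicular to the axis
axis(1, at = dose, labels = dose_labels, las = 2)
```

The at and labels vectors must have the same length, since each label is placed at the matching tick position.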
Minor Tick Marks
To create minor tick marks, you’ll need the minor.tick() function in
the Hmisc package. If you don’t already have Hmisc installed, be sure to install it first:
install.packages("Hmisc")
You can then add minor tick marks with the code
library(Hmisc)
minor.tick(nx=n, ny=n, tick.ratio=n)
where nx and ny specify the number of intervals in which to divide the area between major tick marks on
the x-axis and y-axis, respectively. tick.ratio is the size of the minor tick mark relative to the major tick
mark. The current length of the major tick mark can be retrieved using par("tck"). For example, the
following statement will add one tick mark between each major tick mark on the x-axis and two tick
marks between each major tick mark on the y-axis:
minor.tick(nx=2, ny=3, tick.ratio=0.5)
The length of the tick marks will be 50 percent as long as the major tick marks.
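
The statement above can be tried on any existing plot. The sketch below uses made-up data and checks whether Hmisc is available before calling minor.tick(), so it still runs (without minor ticks) when the package is missing. The variable names are illustrative.

```r
nx <- 2  # intervals between major ticks on the x-axis
ny <- 3  # intervals between major ticks on the y-axis

# Illustrative data only
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")

if (requireNamespace("Hmisc", quietly = TRUE)) {
  Hmisc::minor.tick(nx = nx, ny = ny, tick.ratio = 0.5)
} else {
  message("Hmisc is not installed; skipping minor tick marks")
}

# n intervals are separated by n - 1 minor tick marks
minor_ticks_between_majors <- c(x = nx - 1, y = ny - 1)
print(minor_ticks_between_majors)
```

This makes the arithmetic explicit: nx=2 gives one minor tick per gap on the x-axis, and ny=3 gives two per gap on the y-axis.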

Reference lines
The abline() function is used to add reference lines to our graph. The format is
abline(h=yvalues, v=xvalues)
Other graphical parameters (such as line type, color, and width) can also be specified
in the abline() function. For example:
abline(h=c(1,5,7))
adds solid horizontal lines at y = 1, 5, and 7, whereas the code
abline(v=seq(1, 10, 2), lty=2, col="blue")
adds dashed blue vertical lines at x = 1, 3, 5, 7, and 9.
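
A small sketch that puts both calls on one plot. The plotted data are placeholders chosen only so that the reference lines fall inside the plotting region.

```r
# Placeholder data spanning 1 to 10 on both axes
x <- 1:10
plot(x, x, type = "b", xlab = "x", ylab = "y")

# Solid horizontal lines at y = 1, 5, and 7
abline(h = c(1, 5, 7))

# Dashed blue vertical lines at every second x value
v_positions <- seq(1, 10, 2)
abline(v = v_positions, lty = 2, col = "blue")
```

abline() draws on the current plot, so it must be called after plot() or another high-level plotting function.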

Legend
When more than one set of data or group is incorporated into a graph, a legend can
help you to identify what’s being represented by each bar, pie slice, or line. A legend
can be added (not surprisingly) with the legend() function. The format is
legend(location, title, legend, ...)

Other common legend options include bty for box type, bg for background color, cex for size, and text.col
for text color. Specifying horiz=TRUE sets the legend horizontally rather than vertically. For more on
legends, see help(legend).
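
As an illustration, the sketch below plots the two drug response curves from the earlier examples and adds a legend. The keyword "topleft" is one of several accepted locations, and the lty, pch, and col arguments are matched to the values used for each line so the legend keys look like the plotted series.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

plot(dose, drugA, type = "b", pch = 15, lty = 1, col = "red",
     ylim = c(0, 60), xlab = "Dosage", ylab = "Drug Response")
lines(dose, drugB, type = "b", pch = 17, lty = 2, col = "blue")

# The legend entries are given in the same order as the line attributes
leg <- legend("topleft", title = "Drug Type",
              legend = c("A", "B"),
              lty = c(1, 2), pch = c(15, 17),
              col = c("red", "blue"))
```

legend() invisibly returns the position of the legend box and its text, which is stored in leg here in case it is needed for further placement.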

Text Annotation
Text can be added to graphs using the text() and mtext() functions. text() places
text within the graph whereas mtext() places text in one of the four margins. The
formats are
text(location, "text to place", pos, ...)
mtext("text to place", side, line=n, ...)
attach(mtcars)
plot(wt, mpg,
main="Mileage vs. Car Weight",
xlab="Weight", ylab="Mileage",
pch=18, col="blue")
text(wt, mpg,
row.names(mtcars),
cex=0.6, pos=4, col="red")
detach(mtcars)

Here we’ve plotted car mileage versus car weight for the 32 automobiles provided
in the mtcars data frame. The text() function is used to add the car names to the right of each data point.
The pair (wt, mpg) gives the location of each label, while row.names(mtcars) supplies the row names, which
happen to be the car names. The point labels are shrunk by 40 percent and presented in red.

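
The example above only uses text(). The following sketch shows mtext() alongside it, labeling a single point (the heaviest car) and placing a note in the bottom margin. The wording of the margin note and the choice of which point to label are illustrative assumptions.

```r
plot(mtcars$wt, mtcars$mpg,
     main = "Mileage vs. Car Weight",
     xlab = "Weight", ylab = "Mileage",
     pch = 18, col = "blue")

# Label only the heaviest car; pos=2 places the text to the left of the point
heaviest <- which.max(mtcars$wt)
text(mtcars$wt[heaviest], mtcars$mpg[heaviest],
     row.names(mtcars)[heaviest],
     cex = 0.6, pos = 2, col = "red")

# side=1 is the bottom margin; adj=1 right-aligns the text
mtext("Source: mtcars data frame", side = 1, line = 4, adj = 1, cex = 0.7)
```

Referring to columns as mtcars$wt avoids the need for attach() and detach(), at the cost of slightly longer expressions.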
Combining Graphs
R makes it easy to combine several graphs into one overall graph, using either the par() or layout()
function. Using the par() function, you can include the graphical parameter mfrow=c(nrows,ncols) to
create a matrix of nrows x ncols plots that are filled in by row. Alternatively, you can use mfcol=c(nrows,
ncols) to fill the matrix by columns.

attach(mtcars)
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2))
plot(wt,mpg, main="Scatterplot of wt vs. mpg")
plot(wt,disp, main="Scatterplot of wt vs disp")
hist(wt, main="Histogram of wt")
boxplot(wt, main="Boxplot of wt")
par(opar)
detach(mtcars)

# another example: three histograms stacked in a single column
attach(mtcars)
opar <- par(no.readonly=TRUE)
par(mfrow=c(3,1))
hist(wt)
hist(mpg)
hist(disp)
par(opar)
detach(mtcars)
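
Both examples above use mfrow. For comparison, here is a sketch of the mfcol alternative with the same four plots as the first example. The plot titles are changed to show where each graph lands when the grid is filled by column.

```r
opar <- par(no.readonly = TRUE)

# 2 x 2 grid, filled column by column
par(mfcol = c(2, 2))
grid_dims <- par("mfcol")

plot(mtcars$wt, mtcars$mpg, main = "1: top left")
plot(mtcars$wt, mtcars$disp, main = "2: bottom left")
hist(mtcars$wt, main = "3: top right")
boxplot(mtcars$wt, main = "4: bottom right")

par(opar)
```

With mfrow=c(2,2), the second plot would appear at the top right instead of the bottom left.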
Multiple Graphs Using the layout() Function
The layout() function has the form layout(mat) where mat is a matrix object specifying the location of the
multiple plots to combine.
# multiple graphs using layout()
attach(mtcars)
# a 2 x 2 matrix filled by row: plot 1 spans the top row, plots 2 and 3 share the bottom row
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
hist(wt)
hist(mpg)
hist(disp)
detach(mtcars)
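
To see how the matrix maps to plot positions, it can help to print it and preview the regions before drawing anything. The sketch below uses layout.show() for the preview. The variable names are illustrative.

```r
# Each cell holds the number of the plot that occupies that position
layout_mat <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)
print(layout_mat)
# row 1 is (1, 1): plot 1 spans the whole top row
# row 2 is (2, 3): plots 2 and 3 sit side by side below it

# layout() returns the number of figures it defines
n_figs <- layout(layout_mat)

# Draw the outline and number of each figure region
layout.show(n_figs)

# Return to a single plot per page
layout(1)
```

Repeating a number in adjacent cells is what makes a plot span several rows or columns.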
