
Statistical Analysis With R
Introduction to R
Introduction
 Developed specifically for statistical analysis, R is a computer
language that implements many of the analytical tools
statisticians have developed for decision-making.

 An important aspect of statistical analysis is to present the
results in a comprehensible way. For this reason, graphics is a
major component of R.

 Ross Ihaka and Robert Gentleman developed R in the 1990s at
the University of Auckland, New Zealand. Supported by the
R Foundation for Statistical Computing, R is getting more and
more popular by the day.

 RStudio is an open source integrated development
environment (IDE) for creating and running R code. It’s
available in versions for Windows, Mac, and Linux. Although you
don’t need an IDE in order to work with R, RStudio makes life a
lot easier.
Downloading R
 Download R from the Comprehensive R Archive Network
(CRAN). In your browser, type this address if you work in
Windows:
cran.r-project.org/bin/windows/base/

 Type this one if you work on the Mac:
cran.r-project.org/bin/macosx/

 Click the link to download R. This puts the win.exe file on your
Windows computer, or the .pkg file on your Mac. In either
case, follow the usual installation procedures. When
installation is complete, Windows users see an R icon on
their desktop; Mac users see it in their Applications folder.
Downloading RStudio
 Here’s the URL:
www.rstudio.com/products/rstudio/download

 Click the link for the installer for your computer, and again follow
the usual installation procedures.

 After the RStudio installation is finished, click the RStudio icon to
open the window shown in the next figure.
RStudio
• You can run R code in the Console pane.
• The History tab tracks the R code that you enter.
• The Environment tab keeps track of the things you create (objects).
• The Files tab shows files you create.
• The Plots tab holds graphs you create from your data.
• The Packages tab shows add-ons (called packages) you downloaded as part of the R installation.
• The Help tab provides links to a wealth of information about R and RStudio.
RStudio
 Click the larger of the two icons in the upper right corner of the Console
pane. That changes the appearance of RStudio:
• The new pane in the upper left is the Scripts pane. You type and edit code
in the Scripts pane.
• Press Ctrl+R (Command+Enter on the Mac), and the code executes in the
Console pane.
• Ctrl+Enter works just like Ctrl+R.
A Session with R
 Before you start working, select File ⇒ Save As … and save the script as
“My First R Session”.
 This renames the tab in the Scripts pane with the name of the file and adds the
.R extension.
 This also causes the filename (along with the .R extension) to appear on the
Files tab.

 If you ever forget the path to your working directory, type:
> getwd()
R prints the path on screen.
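As a minimal sketch of checking where R reads and writes files, the lines below store and print the working directory and then list its contents. list.files() is a base R function that the slides do not mention; it appears here only for illustration.

```r
# Store and print the current working directory
wd <- getwd()
print(wd)

# List the files in that directory (list.files() is base R)
files_here <- list.files(wd)
print(files_here)
```

The printed path and file list depend on your own computer, so no expected output is shown.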
Inserting comments
 To insert a comment line (for example, to describe the next line of code),
put # at the beginning of the line. R does not execute comment lines.
 To jump out of multiline code, press the Esc key.
 To copy your console code for a class assignment, copy it in RStudio
and paste it into a Word document (.docx).
Assigning Variables
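x <- 15 (assigns x the value 15)

y <- "test" (assigns y the value "test")

Write the assignment operator <- with no space between < and -. With a space,
x < - 15 compares x with -15 instead of assigning a value.

Find the data type:

class(x) (returns the data type of x)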

Basic data types:

 Numeric – 4.5
 Integer – 8L (a plain 8 is stored as numeric)
 Character – "Test" (a string)
 Logical – TRUE or FALSE (Boolean)

 Note: R is case sensitive. x and X are different names, and TRUE is not the same as True.
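The data types listed above can be checked with class(). The sketch below uses illustrative variable names (n, s, b) that are not part of the slides.

```r
x <- 4.5      # a number with a decimal part
n <- 8L       # the L suffix asks R to store an integer
s <- "Test"   # text goes inside quotation marks
b <- TRUE     # logical values are written in capitals

class(x)   # "numeric"
class(n)   # "integer"
class(s)   # "character"
class(b)   # "logical"
class(8)   # "numeric": without the L suffix, 8 is stored as numeric
```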


Vectors
 Creating a vector: use the concatenate function c()

x <- c(3,4,5) #Numeric vector

You can read that line of R code as “x gets the vector 3, 4, 5.”

y <- c("a", "b", "c") #Character vector

z <- c(TRUE, FALSE) #Logical (Boolean) vector

 Type x into the Scripts pane and press Ctrl+R, and here’s what you see
in the Console pane:
> x
[1] 3 4 5

The 1 in square brackets is the label for the first value in the line of output. Here
you have only one line, of course.
What happens when R outputs many values over many lines? Each line gets a
bracketed numeric label, and the number corresponds to the first value in the
line. For example, if the output consists of 21 values and the 18th value is the
first one on the second line, the second line begins with [18].
Vectors
 Now you can work with x.
 Add all the numbers in the vector:
> sum(x)
[1] 12
 Take the average of the numbers in the vector x:
> mean(x)
[1] 4
 Variance is a measure of how much a set of numbers
differs from their mean:
> var(x)
[1] 1
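To see where the variance value comes from, here is a minimal sketch that recomputes it by hand using the sample-variance formula (sum of squared deviations divided by n - 1), which is what var() uses. The names deviations and manual_var are illustrative.

```r
x <- c(3, 4, 5)

# Distance of each value from the mean: -1 0 1
deviations <- x - mean(x)

# Sum of squared deviations divided by (n - 1)
manual_var <- sum(deviations^2) / (length(x) - 1)

manual_var   # 1
var(x)       # 1, the same result
```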
Vectors
 Adding vectors (element-wise sum):
> a <- c(2,5,7)
> b <- c(3,6,8)
> c <- a+b
> c
[1] 5 11 15

 Comparing vectors (element by element):
> a>b
[1] FALSE FALSE FALSE

 Select specific elements of a vector:
> d <- c(7,8,9,10,11,12,14)
> d[c(1,5)] #returns the first and fifth elements of vector d
[1] 7 11
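A comparison returns a logical vector, and that vector can itself go inside the square brackets to select elements. The following is a small sketch of that idea, going one step beyond the slides; the name big is illustrative.

```r
d <- c(7, 8, 9, 10, 11, 12, 14)

# Element-by-element comparison with a single number
big <- d > 10    # FALSE FALSE FALSE FALSE TRUE TRUE TRUE

# Keep only the elements where the comparison is TRUE
d[big]           # 11 12 14

# TRUE counts as 1, so sum() counts how many elements matched
sum(big)         # 3
```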
Vectors
 Select a range of elements of a vector:
> d[2:5]
[1] 8 9 10 11

 Assigning names to the elements of a vector:
> Test_vector <- c("Devin","Engineer")
> names(Test_vector) <- c("Name","Profession")
> Test_vector
   Name Profession
"Devin" "Engineer"

 Creating a vector of consecutive integer values, 1 through 15:
> demo2 <- 1:15
> demo2
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
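The colon operator always steps by 1. For other step sizes, base R's seq() takes from, to, and by (or length.out) arguments. The sketch below is illustrative; the names odds and quarters do not come from the slides.

```r
demo2 <- 1:15                           # consecutive integers 1 through 15

odds <- seq(1, 15, by = 2)              # 1 3 5 7 9 11 13 15
quarters <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

length(demo2)   # 15
```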
Vectors:
 creating a vector containing a repeating sequence of 2,5,9 three times
> demo3<-rep(c(2,5,9),times=3)
> demo3
[1] 2 5 9 2 5 9 2 5 9

 Creating a vector of text names, not numerical content. (Note that each text value is enclosed
by quotation marks.)
> names <- c("one", "two", "three")
> names
[1] "one" "two" "three"

 Adding the value 5 to each current value in the demo1 vector:
> demo1 <- c(1,3,45,67,4,78)
> demo4 <- demo1+5
> demo4
[1] 6 8 50 72 9 83

 Summing the contents of the demo4 vector:
> sum(demo4)
[1] 228

 Adding the respective numeric values of the demo8 and demo1 vectors:
> demo8 <- c(4,7,5,2,7,9)
> demo10 <- demo1+demo8
> demo10
[1] 5 10 50 69 11 87
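Adding a single number to a vector works because R reuses ("recycles") the shorter operand until it matches the length of the longer one. The sketch below shows the same rule with a two-element vector; this second case is an illustration added here, not part of the slides.

```r
demo1 <- c(1, 3, 45, 67, 4, 78)

# The single value 5 is reused for every element
demo1 + 5            # 6 8 50 72 9 83

# A length-2 vector is reused as 10, 100, 10, 100, 10, 100
demo1 + c(10, 100)   # 11 103 55 167 14 178
```

Recycling is silent when the longer length is a multiple of the shorter one, as it is here.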
Vectors
 Concatenating the demo8 vector to the demo1 vector:
> demo11 <- c(demo1,demo8)
> demo11
[1] 1 3 45 67 4 78 4 7 5 2 7 9

 Displaying the number of items contained in demo11:
> length(demo11)
[1] 12

 Displaying the value of the element in position 3 of demo11. Note the use of square brackets here.
> demo11[3]
[1] 45

 Changing the value of the demo11 item in position 3 to 99:
> demo11[3] <- 99
> demo11
[1] 1 3 99 67 4 78 4 7 5 2 7 9

 Calculating the square root of the 3rd value in demo11:
> sqrt(demo11[3])
[1] 9.949874
Missing Data
 Oftentimes, you encounter data sets that have values missing for one
reason or another. R denotes a missing value as NA (for Not Available).

 For example, here is some data (from a much larger data set) on the
luggage capacity, in cubic feet, of nine vehicles:
capacity <- c(14,13,14,13,16,NA,NA,20,NA)

 Three of the vehicles are vans, and the term luggage capacity doesn’t
apply to them; hence the three instances of NA. Here’s what
happens when you try to find the average of this group:
> mean(capacity)
[1] NA
 To find the mean, you have to remove the NAs before you calculate:
> mean(capacity, na.rm=TRUE)
[1] 15
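Before dropping missing values it can help to see how many there are. The sketch below uses base R's is.na() to count them and to remove them by hand, which gives the same mean as na.rm=TRUE. The name known is illustrative.

```r
capacity <- c(14, 13, 14, 13, 16, NA, NA, 20, NA)

# TRUE wherever a value is missing
is.na(capacity)

# Count the missing values
sum(is.na(capacity))           # 3

# Keep only the values that are present, then average them
known <- capacity[!is.na(capacity)]
mean(known)                    # 15

mean(capacity, na.rm = TRUE)   # 15, the same result in one step
```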
Plot, BoxPlot, Histogram
 Creating a scatter plot of the demo11 values, displayed in the Plots tab in the lower right
corner of the RStudio screen:
> plot(demo11)

 Creating a histogram of the demo8 values. NOTE that this histogram will OVERWRITE the prior
plot. Thus, you should use the PC Snipping Tool to copy/save the plot image before creating
another graphic image.
> hist(demo8)

 Creating a boxplot of the demo11 values. The xlab argument displays a label for the x axis.
> boxplot(demo11, xlab = "My Creation")

 Creating a histogram of demo11 with rainbow colors:
> hist(demo11, col=rainbow(12))

> graphdemo <- c(2,4,6,8,9,3,4,7,11,14,18,20,9,6,4,4,11,14,20)
> graphdemo
[1] 2 4 6 8 9 3 4 7 11 14 18 20 9 6 4 4 11 14 20
> plot(graphdemo)
> boxplot(graphdemo)
> hist(graphdemo, xlab="my Monet", col = rainbow(10))
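As an alternative to taking a screenshot, a plot can be written straight to an image file with base R's png() and dev.off(). This is a minimal sketch, assuming your R installation supports PNG output; the file name graphdemo_hist.png and the image size are hypothetical choices.

```r
graphdemo <- c(2, 4, 6, 8, 9, 3, 4, 7, 11, 14, 18, 20, 9, 6, 4, 4, 11, 14, 20)

# Hypothetical output file in R's temporary folder
out_file <- file.path(tempdir(), "graphdemo_hist.png")

png(out_file, width = 600, height = 400)   # start writing to the file
hist(graphdemo, xlab = "my Monet", col = rainbow(10))
dev.off()                                  # close the file so it is saved

file.exists(out_file)                      # TRUE once the file is written
```

Replace tempdir() with a folder of your own to keep the image after the session ends.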
BoxPlot
 If there are outliers and you don’t want to show them, set
outline = FALSE:
test <- c(142,23,41,10,7,93,17,174,420,13)
boxplot(test, outline = FALSE)

 If you want to extend the whiskers to the most extreme data points, so that
no points are drawn as outliers, set range = 0:
boxplot(test, range = 0)

[Figure: three box plots of test, left to right: boxplot(test), boxplot(test, outline=FALSE), boxplot(test, range=0)]
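To check which points a box plot treats as outliers, base R's boxplot.stats() returns the same numbers that boxplot() draws. The sketch below applies it to the test vector; the name stats is illustrative.

```r
test <- c(142, 23, 41, 10, 7, 93, 17, 174, 420, 13)

# Default rule: points more than 1.5 box lengths beyond the box are outliers
stats <- boxplot.stats(test)

stats$stats   # 7 13 32 142 174: lower whisker, box edges and median, upper whisker
stats$out     # 420: the only point drawn as an outlier

# coef = 0 corresponds to range = 0 in boxplot(): no outliers are reported
boxplot.stats(test, coef = 0)$out   # numeric(0)
```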


BoxPlot
 If you want your box plot laid out horizontally, set horizontal = TRUE:
boxplot(test, horizontal = TRUE)

 You can also set the limits of the value axis with ylim = c(0,500). When
horizontal = TRUE, the value axis is the one drawn along the bottom:
boxplot(test, ylim=c(0,500), horizontal = TRUE)

[Figure: two horizontal box plots of test: boxplot(test, horizontal = TRUE) and boxplot(test, ylim=c(0,500), horizontal = TRUE)]


Histogram
 In order to change the bin width, use the following:
test <- c(142,23,41,10,7,93,17,174,420,13)
my.bin.width<-50
hist(test,breaks=seq(0,500,by=my.bin.width))
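To see how the bin width translates into counts, hist() can be called with plot = FALSE, which returns the bins without drawing anything. The sketch below is illustrative; the name h does not come from the slides.

```r
test <- c(142, 23, 41, 10, 7, 93, 17, 174, 420, 13)
my.bin.width <- 50

# Compute the histogram without drawing it
h <- hist(test, breaks = seq(0, 500, by = my.bin.width), plot = FALSE)

h$breaks   # bin edges: 0 50 100 ... 500
h$counts   # 6 1 1 1 0 0 0 0 1 0 (values per bin)
```

Six of the ten values fall in the first bin (0 to 50), and the single value 420 falls in the 400 to 450 bin.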
Ending a session
 To end a session, select File ⇒ Quit Session or press
Ctrl+Q.
 As Figure 2-7 shows, a dialog box opens and asks what you
want to save from the session. Saving the selections enables
you to reopen the session where you left off the next time you
open RStudio.
