
Getting Started with R

Computer Lab 01 of
CEE 6601 Statistics in Transportation, Fall 2013
Lab Hours: 10:05-11:55, Fri.
TA contact: yzhou81@gatech.edu
August 20, 2013
About Labs
The Friday labs for CEE 6601 are unsupervised unless you are notified otherwise. If you
have any questions, you may visit the TA, Joy Zhou, at office SEB207 during the lab hours or email her at
yzhou81@gatech.edu.
We will have six labs this semester. One of the objectives of this course is to master R, so you are
required to use R for all of the labs. The first lab introduces the operating environment of R; use it to
get familiar with the structure of R commands. This lab has only one problem, which you need to submit
by Sep 6th. The other labs will normally have 3-4 problems each.
What is R?
R is a language and environment for statistical computing and graphics.
Statistical techniques provided by R: linear and nonlinear modeling, classical statistical tests,
time-series analysis, classification, clustering, etc.
R is available as Free Software under the terms of the Free Software Foundation's GNU General Public
License in source code form.
The functions of R can be extended by using contributed packages.
(Source: www.r-project.org)
Installing R
1. R 3.0.1 basic environment:
Visit the R website at www.r-project.org and click CRAN under the Download,
Packages heading in the left column. Then choose any working CRAN mirror in the USA from
the list.
OR
For convenience, you can also type any of the following addresses into your web browser to
access a CRAN mirror directly:
http://cran.cnr.berkeley.edu/
http://cran.stat.ucla.edu/
http://streaming.stat.iastate.edu/CRAN/
2. Installing Extra Packages:
After successfully installing R, run R 3.0.1 on your computer. You should see an operating window
as shown in Figure (1).
Click Packages in the menu bar, and then choose Install package(s)... A dialog pops
up in which you can select a download mirror from the list. Choose any USA mirror and click
OK. A new dialog then lists the available packages. Choose
IPSUR and click OK. This completes the installation of the IPSUR package in
R.
To use the installed package, type the following commands in the R
Console window:
> library(IPSUR)
> read(IPSUR)
Figure 1: The operating environment for R 3.0.1 (in Windows 7)
Basic Statistics in R
1. Input data vectors:
> a<-c(1,2,3,4,5)
> a
[1] 1 2 3 4 5
Notice that we use the symbol <- to assign the vector to the variable a. The symbol <- looks
like a left-pointing arrow, and it means that the quantity on its right is assigned to the variable on its left.
Correspondingly, the symbol -> means exactly the reverse (i.e., it assigns the quantity on its left to
the variable on its right). The above command has the same effect as
> c(1,2,3,4,5)->a
(You can also use the simple symbol = to assign values, but = is also used to name function arguments, so
using the arrow symbols avoids confusion in R.)
Also notice that, unlike in MATLAB, an assignment command won't automatically show you the value
of the variable. You must type the variable name on a separate line to see its value.
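As a quick check of the assignment forms described above, the following sketch creates the same vector three ways and compares the results. The variable names b, d and e are chosen here only for illustration. Wrapping an assignment in parentheses is a common R idiom for assigning and printing in one step.

```r
# Three equivalent ways to assign the same vector
a <- c(1, 2, 3, 4, 5)      # left arrow: the value on the right goes into a
c(1, 2, 3, 4, 5) -> b      # right arrow: the value on the left goes into b
d = c(1, 2, 3, 4, 5)       # the equals sign also assigns at the top level

identical(a, b)            # TRUE
identical(a, d)            # TRUE

# An assignment prints nothing by itself; parentheses force printing
(e <- a * 2)               # prints 2 4 6 8 10
```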
We now show how to input a matrix. We first write the matrix as a single vector, going down
the elements column by column, and then set the number of rows. This can be done in one command
line:
> a<-matrix(c(1,2,3,4,5,6),nrow=3)
> a
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
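The column-by-column filling order is easy to get wrong, so here is a small sketch that contrasts it with row-by-row filling. The byrow argument is a standard option of matrix(); the names v, m_col and m_row are ours.

```r
v <- c(1, 2, 3, 4, 5, 6)

m_col <- matrix(v, nrow = 3)                 # default: fill down the columns
m_row <- matrix(v, nrow = 3, byrow = TRUE)   # fill across the rows instead

dim(m_col)      # 3 2  (3 rows, 2 columns)
m_col[1, 2]     # 4: first row, second column when filled by column
m_row[1, 2]     # 2: same position when filled by row
```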
2. Generate repeated data vectors:
> a<-seq(from=1,to=5)
> a
[1] 1 2 3 4 5
> a<-seq(from=1,to=5,by=0.5)
> a
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
> a<-seq(from=1,by=0.5,length.out=4)
> a
[1] 1.0 1.5 2.0 2.5
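The seq() calls above produce evenly spaced values. If you need values that literally repeat, the standard function rep() may be more convenient. The following is a minimal sketch; the variable names are ours.

```r
# length.out fixes the number of points instead of the step size
s1 <- seq(from = 1, to = 5, length.out = 9)   # same values as by = 0.5

r1 <- rep(3, times = 4)                       # 3 3 3 3
r2 <- rep(c(1, 2), times = 3)                 # 1 2 1 2 1 2
r3 <- rep(c(1, 2), each = 3)                  # 1 1 1 2 2 2
```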
3. Import text data into a variable:
> a<-read.table("c:/download/2013sta/testsample.dat",header=FALSE,sep=" ")
> a
  V1 V2 V3
1  2  6  9
2  4  5  8
3  6  5  7
4  2  3  1
5  9  6  6
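If you do not have testsample.dat at hand, you can try the import on a temporary file. The sketch below writes the five rows shown above to a temporary file (the temporary path is an assumption made only so the example runs anywhere) and reads them back. The result is a data frame whose columns can be accessed by name.

```r
# Write the sample rows to a temporary file (stand-in for testsample.dat)
tmp <- tempfile(fileext = ".dat")
writeLines(c("2 6 9", "4 5 8", "6 5 7", "2 3 1", "9 6 6"), tmp)

a <- read.table(tmp, header = FALSE, sep = " ")

dim(a)      # 5 3: five rows, three columns
names(a)    # "V1" "V2" "V3": default names when there is no header
a$V2        # second column: 6 5 5 3 6

unlink(tmp) # remove the temporary file
```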
4. Save/Load data
We first create some variables and save one of them to a file test.dat, so we can use this variable
later. (Note that save() writes R's own data format rather than plain text.)
> a<-c(1,2,3,4,5,6)
> save(a,file="test.dat")
> b<-c(3,3,2,1)
> c<-c(4,6,8,10,12)
Now we use the following command to see all the active variables available in the computer memory.
We can see variable a in the list.
> ls()
[1] "a" "b" "c"
Then we clear the memory by using the following command:
> rm(list=ls())
Now we check the variables in the memory again and find that there are no variables in storage.
> ls()
character(0)
We need to load the stored value for variable a. We do this by:
> load("test.dat")
> a
[1] 1 2 3 4 5 6
Of course, you can choose to save all the active variables in the memory to a single file by:
> save(list=ls(),file="test.dat")
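The whole save, clear and load cycle can be tried safely with a temporary file, as in the sketch below. The temporary path and the name restored are ours. It removes only a and b rather than clearing the whole memory, and it uses the fact that load() returns the names of the objects it restores.

```r
tmp <- tempfile(fileext = ".RData")   # stand-in for test.dat

a <- c(1, 2, 3, 4, 5, 6)
b <- c(3, 3, 2, 1)
save(a, b, file = tmp)                # save two variables in one file

rm(a, b)                              # remove only these two variables

restored <- load(tmp)                 # load() returns the restored names
restored                              # "a" "b"
a                                     # 1 2 3 4 5 6

unlink(tmp)                           # remove the temporary file
```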
5. Extract elements from a vector:
> a<-c(3,2,1,6,5,4)
> a
[1] 3 2 1 6 5 4
Retrieve the third element in vector variable a.
> a[3]
[1] 1
Retrieve the third to the fifth elements in vector variable a.
> a[3:5]
[1] 1 6 5
Retrieve all elements except the second element in vector variable a.
> a[-2]
[1] 3 1 6 5 4
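Beyond single positions and ranges, R also accepts a vector of positions or a logical condition inside the brackets. The following sketch shows a few such variations on the same vector a; all of them use only base R indexing.

```r
a <- c(3, 2, 1, 6, 5, 4)

a[c(1, 6)]      # first and last elements: 3 4
a[a > 3]        # elements greater than 3: 6 5 4
which(a > 3)    # their positions: 4 5 6
a[-c(1, 2)]     # drop the first two elements: 1 6 5 4
a[length(a)]    # last element: 4
```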
6. Simple statistics with a vector variable
> a<-c(3,2,1,6,5,4)
> sum(a)
[1] 21
> mean(a)
[1] 3.5
> median(a)
[1] 3.5
> var(a)
[1] 3.5
> sd(a)
[1] 1.870829
Note that the sd(a) command returns the sample (non-biased) standard deviation of the vector, computed with the n-1 denominator.
> length(a)
[1] 6
> sort(a)
[1] 1 2 3 4 5 6
> min(a)
[1] 1
> range(a)
[1] 1 6
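To see what var() and sd() compute, the sketch below reproduces them by hand with the n-1 denominator and compares the results. The names n, var_manual and sd_manual are ours.

```r
a <- c(3, 2, 1, 6, 5, 4)
n <- length(a)

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual <- sum((a - mean(a))^2) / (n - 1)
sd_manual  <- sqrt(var_manual)

all.equal(var_manual, var(a))   # TRUE
all.equal(sd_manual, sd(a))     # TRUE

max(a) - min(a)                 # width of the range: 5
summary(a)                      # minimum, quartiles, mean and maximum at once
```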
Plotting One-Vector Data
1. Save created images
To show your graphic work, you need to know how to save the graphics you create. Once you are satisfied
with a plot, enter the following commands to save it in JPEG format:
> dev.copy(jpeg,"c:/download/2013sta/test.jpeg")
> dev.off()
2. Frequency plot (bar and histogram)
We have observed how many people are waiting at a bus station each hour for 27 consecutive hours.
> x=c(1,2,2,1,4,4,5,3,3,2,2,2,2,12,6,9,8,8,7,4,5,6,3,5,5,5,5)
> barplot(table(x),xlab="People",ylab="Frequency")
The plotted result is shown in Figure (2). We can also do it by using a histogram function. As shown
in Figure (3), the histogram classifies the sample by intervals instead of plotting the frequency for each
sample value.
> hist(x,xlab="People",ylab="Frequency")
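To see the numbers behind the two plots, you can inspect the counts without drawing anything. The sketch below tabulates x and then asks hist() for its interval counts with plot=FALSE. The break points (intervals of width 2) are our own choice for illustration, not necessarily the ones R picks by default.

```r
x <- c(1,2,2,1,4,4,5,3,3,2,2,2,2,12,6,9,8,8,7,4,5,6,3,5,5,5,5)

table(x)     # frequency of each distinct value (what barplot draws)

# Interval counts (what hist draws); each interval is (lower, upper]
h <- hist(x, breaks = seq(0, 12, by = 2), plot = FALSE)
h$breaks     # 0 2 4 6 8 10 12
h$counts     # 8 6 8 3 1 1
```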
Figure 2: Frequency vs. people(bar)
3. Box plot
This is a graphical summary of the quantiles of a set of samples (Figure (4)).
> boxplot(x,ylab="People")
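The values that a box plot draws can also be obtained as numbers. The following sketch uses the base functions fivenum() and boxplot.stats() on the same vector x. The name bs is ours.

```r
x <- c(1,2,2,1,4,4,5,3,3,2,2,2,2,12,6,9,8,8,7,4,5,6,3,5,5,5,5)

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(x)       # 1.0 2.0 4.0 5.5 12.0

bs <- boxplot.stats(x)
bs$stats         # whisker ends and box edges: 1.0 2.0 4.0 5.5 9.0
bs$out           # points drawn individually as outliers: 12
```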
Lab Work
This lab is due on T-square by 11 p.m. on September 6th.
Download the dataset ex01.dat from T-square. This is a text file that records the observed waiting time (in
minutes) for each passenger at bus stations. There are three independent bus stations of concern: A, B and
C. At each bus station, observations are made for 100 passengers.
The structure of ex01.dat looks like what is shown in Table (1).
Each column corresponds to the data for one bus station. There are 100 rows of data in this text file, which
present the observed waiting times for 100 passengers at each station. There is also a first row (header) that
is composed of the letters A, B and C.
Your task is to import the data from this text file into R. You then need to choose only ONE bus station (A,
B or C) and make a basic statistical analysis of the passenger waiting times at that station. You need
to present (1) which bus station you chose, (2) the maximum, mean, median, range, and non-biased standard
deviation of the waiting time, and (3) ONE graphic plot of the data with a short description (select any plot
type you'd like and describe what you observe in that plot; the description is
limited to one paragraph).
You can submit the work in either WORD or PDF format to T-square. Please mark your full name clearly on
your work. The graphic should be inserted in the submitted document; you don't need to submit it
separately. Make sure that the graphic can be opened and seen in your document, and don't make it too small
to see clearly. Please limit this work to no more than 1 page.
Figure 3: Frequency vs. people(histogram)
Hints:
(1) Import command: You need to set header=TRUE because we have headers A, B and C.
(2) Extracting data for one station: You may assign the data for only one station to a new variable.
Since we have 100 records, the data can be extracted by using 1:100 for the rows (for example: A[1:100,2], where A is the variable holding the imported data).
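The two hints can be tried on a small stand-in file before you work on the real dataset. The sketch below uses only the four sample rows shown in Table (1), written to a temporary file; it is not the real ex01.dat. The names waits, station_b and same_b are hypothetical.

```r
# Stand-in file with a header line and the sample rows from Table (1)
tmp <- tempfile(fileext = ".dat")
writeLines(c("A B C", "3 6 7", "4 5 5", "6 7 5", "8 6 8"), tmp)

waits <- read.table(tmp, header = TRUE)   # first line becomes column names
names(waits)                              # "A" "B" "C"

station_b <- waits$B                      # extract one station by name
same_b    <- waits[1:nrow(waits), 2]      # same column, by row and column index
identical(station_b, same_b)              # TRUE

unlink(tmp)                               # remove the temporary file
```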
A B C
3 6 7
4 5 5
6 7 5
...
8 6 8
Table 1: Data Structure of ex01.dat
Figure 4: Boxplot for number of people