Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
mm 40 60 80 100 120
40
Introduction to R
Guy Yollin
Aliate Instructor, University of Washington Principal Consultant, r-programming.org
60
80
2011)
Introduction to R
1 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
2 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
4 / 131
What is R?
R ismm a language and computing 40 environment 60 for statistical 80 100 and graphics
120
R is based on the S language originally developed by John Chambers and 40 colleagues at AT&T Bell Labs in the late 1970s and early 1980s R (sometimes called GNU S ) is free open source software licensed under the GNU general public license (GPL 2)
60
R development was initiated by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand R is 80 formally known as The R Project for Statistical Computing
www.r-project.org
Guy Yollin (Copyright
2011)
Introduction to R
5 / 131
80
HAM1 EDHEC LS EQ SP500 TR
100
HAM1 Performance
120
Data manipulation
40
60
Data visualization
80
0.4
0.3
0.2
0.1
0.0 0.10
Statistical modeling
0.05
0.00
0.05
Data analysis
1.0
1.5
2.0
2.5
Jan 96
Jan 97
Jan 98
Jan 99
Jan 00
Jan 01
Jan 02
Jan 03
Jan 04
Jan 05
Jan 06
Dec 06
Date
2011)
Introduction to R
6 / 131
S language implementations
R is the most mm recent and 40 full-featured 60 implementation of the S language Original S - AT & T Bell Labs
40 80 100 120
2011)
Introduction to R
7 / 131
R timeline
mm 40
1991 Statistical Models in S (white book) S3 methods 1984 S: An Interactive Envirnoment for Data Analysis and Graphics 1988 (Brown Book) The New S Language Written in C (Blue Book) Work on S Version 1
60
1999 John Chambers 1998 ACM Software Award
80
2001 R 1.4.0 (S4)
100
120
40
2002 Modern Applied Statistics with S 4th Edition (S+ 6.x, R 1.5.0)
2004 R 2.0.0
1976
60
S-PLULS developed by Statistical Sciences, Inc. 1988 StatSci Licenses S 1993 Modern Applied Statistics with S-PLUS 1994 Modern Applied Statistics with S-PLUS 2nd Edition 1997 Modern Applied Statistics with S-PLUS Programming with Data 3rd Edition (Green Book) (S+ 2000, 5.x) (S+ 5.x) (R complements) 1998 1999
2011
80
S era
S-PLUS era
R era
2011)
Introduction to R
8 / 131
2011)
Introduction to R
9 / 131
The R Foundation
mm 40 non-prot 60organization 80 located in100 120 The R Foundation is the Vienna, Austria which is responsible for developing and maintaining R
Hold and administer the copyright of R software and documentation Support continued development of R Organize meetings and conferences related to statistical computing Ocers Presidents 60 Secretary Treasurer At Large Auditors 80 Robert Gentleman, Ross Ihaka Friedrich Leisch Kurt Hornik John Chambers Peter Dalgaard, Martin Maechler
40
2011)
Introduction to R
10 / 131
120
2011)
Introduction to R
11 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
12 / 131
An Introduction to R
W.N. Venables, D.M. Smith R Development Core Team 40
R Reference Card
R Reference Card
by Tom Short, EPRI Solutions, Inc., tshort@eprisolutions.com 2005-07-12 Granted to the public domain. See www.Rpad.org for the source and latest version. Includes material from R for Beginners by Emmanuel Paradis (with permission).
60
Most R functions have online documentation. help(topic) documentation on topic ?topic id. help.search("topic") search the help system apropos("topic") the names of all objects in the search list matching the regular expression topic help.start() start the HTML version of help str(a) display the internal *str*ucture of an R object summary(a) gives a summary of a, usually a statistical summary but it is generic meaning it has different operations for different classes of a ls() show objects in the search path; specify pat="pat" to search on a pattern ls.str() str() for each variable in the search path dir() show les in the current directory methods(a) shows S3 methods of a methods(class=class(a)) lists all the methods to handle objects of class a options(...) set or examine many global options; common ones: width, digits, error library(x) load add-on packages; library(help=x) lists datasets and functions in package x. attach(x) database x to the R search path; x can be a list, data frame, or R data le created with save. Use search() to show the search path. detach(x) x from the R search path; x can be a name or character string of an object previously attached or a package.
cat(..., file="", sep=" ") prints the arguments after coercing to character; sep is the character separator between arguments print(a, ...) prints its arguments; generic, meaning it can have different methods for different objects format(x,...) format an R object for pretty printing write.table(x,file="",row.names=TRUE,col.names=TRUE, sep=" ") prints x after converting to a data frame; if quote is TRUE, character or factor columns are surrounded by quotes ("); sep is the eld separator; eol is the end-of-line separator; na is the string for missing values; use col.names=NA to add a blank column header to get the column headers aligned correctly for spreadsheet input sink(file) output to file, until sink() Most of the I/O functions have a file argument. This can often be a character string naming a le or a connection. file="" means the standard input or output. Connections can include les, pipes, zipped les, and R variables. On windows, the le connection can also be used with description = "clipboard". To read a table copied from Excel, use x <- read.delim("clipboard") To write a table to the clipboard for Excel, use write.table(x,"clipboard",sep="\t",col.names=NA) For database interaction, see packages RODBC, DBI, RMySQL, RPgSQL, and ROracle. See packages XML, hdf5, netCDF for reading other le formats.
Data creation
load() load the datasets written with save data(x) loads specied data sets read.table(file) reads a le in table format and creates a data frame from it; the default separator sep="" is any whitespace; use header=TRUE to read the rst line as a header of column names; use as.is=TRUE to prevent character vectors from being converted to factors; use comment.char="" to prevent "#" from being interpreted as a comment; use skip=n to skip n lines before reading data; see the help for options on row naming, NA treatment, and others read.csv("filename",header=TRUE) id. but with defaults set for reading comma-delimited les read.delim("filename",header=TRUE) id. but with defaults set for reading tab-delimited les read.fwf(file,widths,header=FALSE,sep="",as.is=FALSE) read a table of f ixed width f ormatted data into a data.frame; widths is an integer vector, giving the widths of the xed-width elds save(file,...) saves the specied objects (...) in the XDR platformindependent binary format save.image(file) saves all objects
c(...) generic function to combine arguments with the default forming a vector; with recursive=TRUE descends through lists combining all elements into one vector from:to generates a sequence; : has operator priority; 1:4 + 1 is 2,3,4,5 seq(from,to) generates a sequence by= species increment; length= species desired length seq(along=x) generates 1, 2, ..., length(x); useful for for loops rep(x,times) replicate x times; use each= to repeat each element of x each times; rep(c(1,2,3),2) is 1 2 3 1 2 3; rep(c(1,2,3),each=2) is 1 1 2 2 3 3 data.frame(...) create a data frame of the named or unnamed arguments; data.frame(v=1:4,ch=c("a","B","c","d"),n=10); shorter vectors are recycled to the length of the longest list(...) create a list of the named or unnamed arguments; list(a=c(1,2),b="hi",c=3i); array(x,dim=) array with data x; specify dimensions like dim=c(3,4,2); elements of x recycle if x is not long enough matrix(x,nrow=,ncol=) matrix; elements of x recycle factor(x,levels=) encodes a vector x as a factor gl(n,k,length=n*k,labels=1:n) generate levels (factors) by specifying the pattern of their levels; k is the number of levels, and n is the number of replications expand.grid() a data frame from all combinations of the supplied vectors or factors rbind(...) combine arguments by rows for matrices, data frames, and others cbind(...) id. by columns
Indexing lists x[n] list with elements n x[[n]] nth element of the list x[["name"]] element of the list named "name" x$name id. Indexing vectors x[n] nth element x[-n] all but the nth element x[1:n] rst n elements x[-(1:n)] elements from n+1 to the end x[c(1,4,2)] specic elements x["name"] element named "name" x[x > 3] all elements greater than 3 x[x > 3 & x < 5] all elements between 3 and 5 x[x %in% c("a","and","the")] elements in the given set Indexing matrices x[i,j] element at row i, column j x[i,] row i x[,j] column j x[,c(1,3)] columns 1 and 3 x["name",] row named "name" Indexing data frames (matrix indexing plus the following) x[["name"]] column named "name" x$name id.
Variable conversion
as.array(x), as.data.frame(x), as.numeric(x), as.logical(x), as.complex(x), as.character(x), ... convert type; for a complete list, use methods(as)
Variable information
is.na(x), is.null(x), is.array(x), is.data.frame(x), is.numeric(x), is.complex(x), is.character(x), ... test for type; for a complete list, use methods(is) length(x) number of elements in x dim(x) Retrieve or set the dimension of an object; dim(x) <- c(3,2) dimnames(x) Retrieve or set the dimension names of an object nrow(x) number of rows; NROW(x) is the same but treats a vector as a onerow matrix ncol(x) and NCOL(x) id. for columns class(x) get or set the class of x; class(x) <- "myclass" unclass(x) remove the class attribute of x attr(x,which) get or set the attribute which of x attributes(obj) get or set the list of attributes of obj
which.max(x) returns the index of the greatest element of x which.min(x) returns the index of the smallest element of x rev(x) reverses the elements of x sort(x) sorts the elements of x in increasing order; to sort in decreasing order: rev(sort(x)) cut(x,breaks) divides x into intervals (factors); breaks is the number of cut intervals or a vector of cut points
80
Denitely obtain these PDF les!
Guy Yollin (Copyright
2011)
Introduction to R
13 / 131
A Beginners Guide to R
Zuur, Ieno, Meesters Springer, 2009
60
80
2011)
Introduction to R
14 / 131
For those with experience in MATLAB, David Hiebeler has created a MATLAB/R cross reference document:
40 http://www.math.umaine.edu/~hiebeler/comp/matlabR.pdf
For those with experience in SAS, SPSS, or Stata, Robert Muenchen has written R books for this audience:
60
http://r4stats.com
80
2011)
Introduction to R
15 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
16 / 131
The R GUI
mm 40 60 80 100 120
Running R in Windows
40
Typically run Rgui.exe Can also run R.exe from command prompt
60
80
2011)
Introduction to R
17 / 131
Interactive R session
mm 40 60 80 100 120
R is an interpreted language
40GUI is an interactive The R command driven environment type R commands at the R GUI console 60 Run previously created R scripts (R commands in a text le) 80
Commands entered interactively into the R console
2011)
Introduction to R
18 / 131
2011)
Introduction to R
19 / 131
Object orientation in R
mm 40 60 80 100 120
Everything in R is an Object
Use functions ls and objects to list all objects in the current workspace 40
R Code: Listing objects
> > > > > x <- c(3.1416,2.7183) m <- matrix(rnorm(9),nrow=3) 60 tab <- data.frame(store=c("downtown","eastside","airport"),sales=c(32,17,24)) cities <- c("Seattle","Portland","San Francisco") ls() "m" "r" "s" "tab" "x" "y"
80
2011)
Introduction to R
20 / 131
Data types
mm 40 60 80 100 120
All R objects have a type or storage mode Use function typeof to display 40 an objects type Common types are:
double 60 character list integer 80
2011)
Introduction to R
21 / 131
Object classes
mm 40
R Code: Object 60 80 class
> m [,1] [,2] [,3] [1,] 1.8130510 -1.2820088 -0.1628178 [2,] -1.7734811 -0.9612151 -0.2419549 [3,] 0.5810513 -2.8942578 -1.8208523 > class(m)
100
120
All R objects have a class Use function class to 40 display an objects class There are many R classes; basic classes are:
60 numeric character data.frame matrix 80
[1] "matrix" > tab store sales 1 downtown 32 2 eastside 17 3 airport 24 > class(tab) [1] "data.frame"
2011)
Introduction to R
22 / 131
Vectors
R is a vector/matrix language
mm can easily 40 80 100 vectors be created 60 with c, the combine function 120
most places where single value can be supplied, a vector can be supplied and R will perform a vectorized operation
R Code: Creating vectors and vector operations 40
> constants <- c(3.1416,2.7183,1.4142,1.6180) > names(constants) <- c("pi","euler","sqrt2","golden") > constants pi euler sqrt2 golden 3.1416 2.7183 60 1.4142 1.6180 > constants^2 pi euler sqrt2 golden 9.869651 7.389155 1.999962 2.617924 > 10*constants 80 pi euler sqrt2 golden 31.416 27.183 14.142 16.180
Guy Yollin (Copyright
2011)
Introduction to R
23 / 131
Indexing vectors
mm
R Code: Indexing vectors
40
60
> constants[c(1,3,4)] pi sqrt2 golden 3.1416 1.4142 1.6180 > constants[c(-1,-2)] sqrt2 golden 1.4142 1.6180
80
100
120
Vectors indices are placed with square brackets: [] Vectors can be indexed in any of the following ways: vector of positive integers vector 60of negative integers vector of named items logical vector
80 40
> constants[c("pi","golden")] pi golden 3.1416 1.6180 > constants > 2 pi TRUE euler TRUE sqrt2 golden FALSE FALSE
2011)
Introduction to R
24 / 131
60 > constants*c(0,1)
pi euler sqrt2 golden 0.0000 2.7183 0.0000 1.6180 > constants*c(0,1,2)
last input generates a warning: longer object length is not a multiple of shorter object length
2011)
Introduction to R
25 / 131
Sequences
An integer sequence vector can be created with mm 40 60 80the : operator 100
120
A general numeric sequence vector can be created with the seq function
R Code: seq arguments
40
> args(seq.default) function (from = 1, to = 1, by = ((to - from)/(length.out - 1)), length.out = NULL, along.with = NULL, ...) NULL
60
to starting value
2011)
Introduction to R
26 / 131
Sequences
mm 40 60 80 100 120
40
> seq(from=0,to=1,len=5) [1] 0.00 0.25 60 0.50 0.75 1.00 > seq(from=0,to=20,by=2.5) [1] 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0
80
2011)
Introduction to R
27 / 131
60 > seq(by=2,0,10)
[1] 0 2 4 6 8 10 > seq(0,10,len=5) [1] 0.0 2.5 5.0 7.5 10.0
80
2 3 4 5 6 7 8 9 10
Introduction to R 28 / 131
2011)
100
120
This is a mechanism to allow additional arguments to be passed 60 which will subsequently be passed on to a sub-function that the main function will call An example of this would be passing graphic parameters (e.g. lwd=2) to the plot function which will subsequently call and pass these 80 arguments on to the par function
Guy Yollin (Copyright
2011)
Introduction to R
29 / 131
mm
40
60
80
100
120
# initialize a vector
40
[1] 1 2 3 4 1 2 3 4 > rep(1:4, each = 2) [1] 1 1 2 2 3 3 4 4 > rep(1:4,60 c(2,1,2,1)) [1] 1 1 2 3 3 4 > rep(1:4, each = 2, len = 10) [1] 1 1 2 2 3 3 4 4 1 1 > rep(1:4,80 each = 2, times = 3) # length 24, 3 complete replications # 8 integers plus two recycled 1's. # repeat each element 2 times
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
Guy Yollin (Copyright
2011)
Introduction to R
30 / 131
Generic functions
A generic function behaves in a way that is appropriate based on the class of its argument; for example: plot print 40 summary
R Code: Some classes handled by the plot function
> methods(plot)[1:15] [1] [4] [7] [10] [13] "plot.acf" "plot.default" "plot.ecdf" "plot.hclust" "plot.isoreg"
mm
40
60
80
100
120
60
80
2011)
Introduction to R
31 / 131
R packages
40 stored in packages 60 All mm R functions are 80 100 120
Recommended packages
boot, rpart, foreign, MASS, cluster, Matrix, etc.
Additional packages can be downloaded through the R GUI or via the 60 install.packages function
2011)
Introduction to R
32 / 131
80
2011)
Introduction to R
33 / 131
120
80
2011)
Introduction to R
34 / 131
40
60
80
2011)
Introduction to R
35 / 131
The following R add-on packages are recommended for computational nance: Package zoo tseries PerformanceAnalytics 60 quantmod xts
40
Description Time series objects Time series analysis and computational nance Performance and risk analysis Quantitative nancial modeling framework Extensible time series
80
2011)
Introduction to R
36 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
37 / 131
The backslash character \ in a character string is used to begin an escape 80 sequence, so to use backslash in a string enter it as \\ The forward slash character / can also be used as a directory separator on windows systems
Guy Yollin (Copyright
2011)
Introduction to R
38 / 131
40
60
80
2011)
Introduction to R
39 / 131
120
60
file le name (with path if necessary) header TRUE/FALSE if there are column names in the le sep column separation character (e.g. comma or tab)
80
2011)
Introduction to R
40 / 131
Reading a text le
mm 40 60 80 100 120
40
2011)
Introduction to R
mm
40
60
80 > dim(dat)
[1] 6092 6 > dat[1:2,1:3]
100
120
The read.table function 40 returns a data.frame object A data.frame is a 2D matrix-like object where the 60 columns can be of dierent classes
80
V1 V2 V3 1 1986-07-09 0.380000 0.384419 2 1986-07-10 0.383214 0.383214 > typeof(dat) [1] "list" > class(dat) [1] "data.frame" > class(dat[,1]) [1] "character" > class(dat[,2]) [1] "numeric"
2011)
Introduction to R
42 / 131
80
100
120
40
V2 0.380000 0.383214 0.366494 0.344416 0.338684 0.335329 V3 0.384419 0.383214 0.370909 0.344416 0.343026 0.335329 V4 0.371163 0.367710 0.331169 0.335584 0.325658 0.317910 V5 0.38 0.37 0.34 0.34 0.33 0.32 V6 1564294736 3623351351 8968235294 1615007058 2110865454 2303230600
> tail(dat,3) V1 V2 V3 V4 V5 V6 6090 2010-08-30 18.25 18.31 17.94 17.96 73718900 80 6091 2010-08-31 17.88 17.92 17.60 17.67 111601400 6092 2010-09-01 17.94 18.27 17.89 18.14 73506800
2011)
Introduction to R
43 / 131
R has a number of size related and diagnostic helper functions Function 40 dim nrow ncol length 60 head tail str
80
Description return dimensions of a multidimensional object number of rows of a multidimensional object number of columns of a multidimensional object length a vector or list display rst n rows (elements) display last n rows (elements) summarize structure of an object
2011)
Introduction to R
44 / 131
mm
40
60
R has extremely powerful data manipulation capabilities especially in the area of vector and matrix indexing 40 data.frames and matrices can be indexed in any of the following ways
60 vector of positive integers vector of negative integers character vector of columns (row) names a logical vector 80
> tail(dat[,c("date","close")],3) date close 6090 2010-08-30 17.96 6091 2010-08-31 17.67 6092 2010-09-01 18.14 > dat[dat[,"volume"]>15e9, c("date","close","volume")] date close volume 328 1987-10-22 0.61 16717377049 602 1988-11-21 0.52 16116850769 1715 1993-04-19 2.58 21509122170
45 / 131
2011)
Introduction to R
80
100
120
60
object to be written (data.frame, matrix, vector) le name (with path if necessary) column separation character (e.g. comma or tab) write row names (T/F) write col names (T/F)
2011)
Introduction to R
46 / 131
40
60
80
2011)
Introduction to R
47 / 131
60
80
100
120
> constants <- list(pi=3.1416,euler=2.7183,golden=1.6180) > class(constants) [1] "list" > length(constants) 40 [1] 3 > constants $pi [1] 3.141660 $euler [1] 2.7183 $golden [1] 1.618
80
2011)
Introduction to R
48 / 131
Items in a list can be accessed using [], [[]], or $ syntax as follows: [] returns a sublist
40 vector of positive integers vector of named items logical vector 60 single integer single name
2011)
Introduction to R
49 / 131
class query an objects class str reports structure of an object attributes returns list of objects attributes
40
attr get/set attributes of an object names gets the names of a list, vector, data.frame, etc. dimnames gets the row and column names of a data.frame or matrix
60 column names of a data.frame or matrix colnames
rownames row names of a data.frame or matrix dput makes an ASCII representation of an object unclass 80 removes class attribute of an object unlist converts a list to a vector
Guy Yollin (Copyright
2011)
Introduction to R
50 / 131
80
2011)
Introduction to R
51 / 131
80
There are a number of apply related functions; one mark of mastering R is mastering apply related functions
Guy Yollin (Copyright
2011)
Introduction to R
52 / 131
S4 Classes
mm 60 80 100 S4 classes are a more40 modern implementation of object-oriented programming in R compared to S3 classes 120
2011)
Introduction to R
53 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
54 / 131
Probability distributions
mm 40 60 80 100 120
Random variable A random variable is a quantity that can take on any of a set of possible values but only one of those values will actually occur
40 discrete random variables have a nite number of possible values continuous random variables have an innite number of possible values
Probability distribution
60
The set of all possible values of a random variable along with their associated probabilities constitutes a probability distribution of the random variable
80
2011)
Introduction to R
55 / 131
60
80
100
PDF
120
fY (y )dy = 1
0.0
0.1
0.2
0.3
fY (y )
0.4
FY (y ) = Pr (Y y ) =
fY (y )
80
0.0
0.2
0.4
0.6
0.8
1.0
2011)
Introduction to R
56 / 131
40
60
80
100
120
40
> x <- seq(from = -5, to = 5, by = 0.01) > x[1:10] [1] -5.00 -4.99 -4.98 -4.97 -4.96 -4.95 -4.94 -4.93 -4.92 -4.91 > y <- dnorm(x) > y[1:5] 60 [1] 1.486720e-06 1.562867e-06 1.642751e-06 1.726545e-06 1.814431e-06 > par(mar = par()$mar + c(0,1,0,0)) > plot(x=x,y=y,type="l",col="seagreen",lwd=2, xlab="quantile",ylab="density\ny = dnorm(x)") 80 > grid(col="darkgrey",lwd=2) > title(main="Probability Density Function (PDF)")
Guy Yollin (Copyright
2011)
Introduction to R
57 / 131
120
40
density y = dnorm(x)
60
80
0.0
0.1
0.2
0.3
0 quantile
2011)
Introduction to R
58 / 131
60 > args(qnorm)
function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) NULL > y <- pnorm(x) > par(mar = par()$mar + c(0,1,0,0)) 80 > plot(x=x,y=y,type="l",col="seagreen",lwd=2, xlab="quantile\nx = qnorm(y)", ylab="probability\ny = pnorm(x)") ; grid(col="darkgrey",lwd=2) > title(main="Cumulative Distribution Function (CDF)")
Guy Yollin (Copyright
2011)
Introduction to R
59 / 131
120
40
probability y = pnorm(x)
60
80
0.0
0.2
0.4
0.6
0.8
0 2 quantile x = qnorm(y)
2011)
Introduction to R
60 / 131
40
60
80
100
120
40
> y <- rnorm(50,sd=3) > y[1:5] 60 [1] 1.3505613 -0.0556795 -0.9542051 -2.7880864 -4.4623809
2011)
Introduction to R
61 / 131
Histograms
The generic function hist computes a histogram of the given data values
40 R Code: mm hist arguments
> args(hist.default) function (x, breaks = "Sturges", freq = NULL, probability = !freq, include.lowest = TRUE, right = TRUE, density = NULL, angle = 45, col = NULL, border = NULL, main = paste("Histogram of", xname), 40 xlim = range(breaks), ylim = NULL, xlab = xname, ylab, axes = TRUE, plot = TRUE, labels = FALSE, nclass = NULL, warn.unused = TRUE, ...) NULL
60
80
100
120
breaks number of breaks, vector of breaks, name of break algorithm, break function prob probability densities or counts ylim y-axis range col color or bars
Guy Yollin (Copyright
80
2011)
Introduction to R
62 / 131
Plotting histograms
R Code: Plotting histograms
mm > hist(x,col="seagreen") 40
60
Histogram of x
80
100
120
Frequency
60
80
10
15
20
25
30
40
35
0 x
2011)
Introduction to R
63 / 131
Plotting histograms
R Code: Plotting histograms
mm > hist(c(x,y),prob=T,breaks="FD",col="seagreen") 40 60
80
100
120
Histogram of c(x, y)
0.4 0.0 0.1 0.2 0.3
40
60
80
Density
0 c(x, y)
2011)
Introduction to R
64 / 131
Plotting histograms
R Code: Plotting histograms
mm > hist(c(x,y),prob=T,breaks=50,col="seagreen") 40 60
80
100
120
Histogram of c(x, y)
0.5
40
0.3 0.0 0.1 0.2 0.4
60
80
Density
0 c(x, y)
2011)
Introduction to R
65 / 131
mean mean of a vector or matrix median median of a vector or matrix mad median absolute deviation of a vector or matrix
40
var variance of a vector or matrix sd standard deviation of a vector cov covariance between vectors
60 correlation between vectors cor
diff dierence between elements in a vector log log of a vector or matrix exp 80 exponentiation of a vector or matrix abs absolute value of a vector or matrix
Guy Yollin (Copyright
2011)
Introduction to R
66 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
67 / 131
Function plot lines segments 40 points text abline 60 curve legend matplot par
80
Description generic function to plot an R object adds lines to the current plot adds lines line segments between point pairs adds points to the current plot adds text to the current plot adds straight lines to the current plot plot a function over a range adds a legend to the current plot plot all columns of a matrix sets graphics parameters
2011)
Introduction to R
68 / 131
60
80
100
120
vector to be plotted (or index if y given) vector to be plotted x & y limited x & y axis labels plot title (can be done with title function p = points (default), l = lines, h = bars, n = no plot color or bars control the aspect ratio
2011)
Introduction to R
69 / 131
40
60
80
100
120
40
q q q q q q q qq qq qq q q q q q q q q q qq q q q q q q q q q q q qq q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q qq qq q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q q qq qq q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qqq q q q qq q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q qq q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q
Capm[, "rf"]
60
80
0.2
0.4
0.6
0.8
1.0
1.2 0
100
200 Index
300
400
500
2011)
Introduction to R
70 / 131
mm > plot(Capm[,"rf"],type="l") 40
60
80
100
120
Capm[, "rf"]
60
80
0.2 0
0.4
0.6
0.8
1.0
1.2
40
100
200 Index
300
400
500
2011)
Introduction to R
71 / 131
mm > plot(Capm[,"rmrf"],type="h") 40
60
80
100
120
Capm[, "rmrf"]
80
20 0
10
60
10
40
100
200 Index
300
400
500
2011)
Introduction to R
72 / 131
mm > plot(Capm[,"rmrf"],Capm[,"rcon"]) 40
60
80
100
120
40
Capm[, "rcon"]
20
10
60
20
q q qq q q q q q qqq q q q q q q q qq q q qqq q qq q q q q q q q q q q q q q q q q q q q q q qq qq q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q qq q qq q qq q q q q q q q q q q q q q q q q qq q qq q q q q q q q q qq q q q q q q qq q q q q q q q qq qq q qq q q q qq q q q q
q q q q q
30
0
q
80
10
20
10
0 Capm[, "rmrf"]
10
2011)
Introduction to R
73 / 131
The points function adds points to the current plot at the given x, y coordinates
R Code: points arguments
> args(points.default) function (x, y = NULL, type = "p", ...) NULL
40
60
2011)
Introduction to R
74 / 131
The lines function adds connected line segments to the current plot
R Code: lines 40 arguments
> args(lines.default) function (x, y = NULL, type = "l", ...) NULL
y vector of y coordinates
80
2011)
Introduction to R
75 / 131
40 y = NULL, labels = seq_along(x), adj = NULL, pos = NULL, function (x, offset = 0.5, vfont = NULL, cex = 1, col = NULL, font = NULL, ...) NULL
x/y 60 location to place text labels text to be display adj adjustment of label at x, y location pos position of text relative to x, y
80 offset oset from pos
2011)
Introduction to R
76 / 131
100
120
40
construction return
60
80
20 20
10
10
20
10
0 market return
10
20
2011)
Introduction to R
77 / 131
mm > plot(0,xlim=c(-20,20),ylim=c(-20,20),type="n", 40 60 80 xlab="market return",ylab="construction return") > points(x=Capm[,"rmrf"],y=Capm[,"rcon"],col="gray") > lines(x=-20:20,y=-20:20,lwd=2,col="darkred") > text(20,20,labels="slope = 1",pos=2) 40
20
q q qq q q qq q q q q q q qqq q qq qq qq q q qq q qq q q q q q q q q q q q q q q q q q q q q q q qq qq q qq q q q q qq q q q q q qq q qqq q q qq qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q qq q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q qq q q q q qq qq q q q q qq qq qq qq q q q q q qq q q qqq q q q q q qq q qqq qq q q qq q qq q q q q q q q q qq q
100
120
slope = 1
q
10
60
construction return
10
80
20 20
q
10
0 market return
10
20
2011)
Introduction to R
78 / 131
40
> args(segments) function (x0, y0, x1 = x0, y1 = y0, col = par("fg"), lty = par("lty"), lwd = par("lwd"), ...) NULL
60
x0, y0 point coordinates from which to draw x1, y1 point coordinates to which to draw
80
2011)
Introduction to R
79 / 131
40 function (expr, from = NULL, to = NULL, n = 101, add = FALSE, type = "l", ylab = NULL, log = NULL, xlim = NULL, ...) NULL
expr function or expression of x from start of range to end of range n number of points over from/to range add 80 add to current plot (T/F)
60
2011)
Introduction to R
80 / 131
The abline function adds one or more straight lines through the current plot
40 R Code: abline arguments
> args(abline) function (a = NULL, b = NULL, h = NULL, v = NULL, reg = NULL, coef = NULL, untf = FALSE, ...) NULL
60
h/v vertical or horizontal coordinate of line a/b intercept and slope of line
80
2011)
Introduction to R
81 / 131
40
> args(matplot) function (x, y, type = "p", lty = 1:5, lwd = 1, lend = par("lend"), pch = NULL, col = 1:6, cex = NULL, bg = NA, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, ..., add = FALSE, verbose = getOption("verbose")) 60 NULL
2011)
Introduction to R
82 / 131
120
40
[1] [7] [13] [19] [25] [31] [37] [43] [49] [55] [61] [67]
"xlog" "bty" "cin" "cra" "family" 60 "font.lab" "lheight" "mar" "mkh" "pin" "tck" "xpd"80
"ylog" "cex" "col" "crt" "fg" "font.main" "ljoin" "mex" "new" "plt" "tcl" "yaxp"
"adj" "cex.axis" "col.axis" "csi" "fig" "font.sub" "lmitre" "mfcol" "oma" "ps" "usr" "yaxs"
"ann" "cex.lab" "col.lab" "cxy" "fin" "lab" "lty" "mfg" "omd" "pty" "xaxp" "yaxt"
"ask" "cex.main" "col.main" "din" "font" "las" "lwd" "mfrow" "omi" "smo" "xaxs"
"bg" "cex.sub" "col.sub" "err" "font.axis" "lend" "mai" "mgp" "pch" "srt" "xaxt"
2011)
Introduction to R
83 / 131
40
60
Parameter 40 col lwd lyt mfrow cex.axis cex.lab cex.main pch las bty
Description 60 80 100 plot color line width line type set/reset multi-plot layout character expansion - axis character expansion - labels character expansion - main point character axis label orientation box type around plot or legend
120
some parameters can be passed in a plot function (e.g. col, lwd) 80 some parameters can only be changed by a call to par (e.g. mfrow)
Guy Yollin (Copyright
2011)
Introduction to R
84 / 131
mm > args(legend)
40
60
80
100
120
function (x, y = NULL, legend, fill = NULL, col = par("col"), border = "black", lty, lwd, pch, angle = 45, density = NULL, bty = "o", bg = par("bg"), box.lwd = par("lwd"), box.lty = par("lty"), box.col = par("fg"), pt.bg = NA, cex = 1, pt.cex = cex, pt.lwd = lwd, xjust 40 = 0, yjust = 1, x.intersp = 1, y.intersp = 1, adj = c(0, 0.5), text.width = NULL, text.col = par("col"), merge = do.lines && has.pch, trace = FALSE, plot = TRUE, ncol = 1, horiz = FALSE, title = NULL, inset = 0, xpd, title.col = text.col, title.adj = 0.5, seg.len = 2) NULL
60
location of the legend (can be give as a position name) vector of labels for the legend vector of colors line type line width character
2011)
Introduction to R
85 / 131
40
60
80
100
120
function (height, width = 1, space = NULL, names.arg = NULL, legend.text = NULL, beside = FALSE, horiz = FALSE, density = NULL, angle 40 = 45, col = NULL, border = par("fg"), main = NULL, sub = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL, xpd = TRUE, log = "", axes = TRUE, axisnames = TRUE, cex.axis = par("cex.axis"), cex.names = par("cex.axis"), inside = TRUE, plot = TRUE, axis.lty = 0, offset = 0, add = FALSE, args.legend = NULL, ...) NULL 60
height vector or matrix (stacked bars or side-by-side bars) of heights names.arg axis labels for the bars beside 80 stacked bars or side-by-side if height is a matrix legend vector of labels for stacked or side-by-side bars
Guy Yollin (Copyright
2011)
Introduction to R
86 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
87 / 131
A time series class in R is a compound data object that includes a data matrix as40 well as a vector of associated time stamps class ts 60 mts its timeSeries fts zoo 80 xts
Guy Yollin (Copyright
overview regularly spaced time series multiple regularly spaced time series irregularly spaced time series default for Rmetrics packages R interface to tslib (c++ time series library) reg/irreg and arbitrary time stamp classes an extension of the zoo class
Introduction to R 88 / 131
2011)
start return start of time series end return end of time series frequency return frequency of time series window Extract subset of time series index return time index of time series time return time index of time series
60 return data of time series coredata 40
diff dierence of the time series lag lag of the time series aggregate aggregate to lower resolution time series 80 cbind merge 2 or more time series together
Guy Yollin (Copyright
2011)
Introduction to R
89 / 131
> library(zoo) > msft.df <- read.table("table.csv", header = TRUE, sep = ",", as.is = TRUE) > head(msft.df,2) Date Volume Adj.Close 40 Open High Low Close 1 2010-09-13 24.20 25.29 24.09 25.11 114606300 25.11 2 2010-09-10 23.98 24.03 23.79 23.85 58284300 23.85 > args(zoo) function (x = NULL, order.by = index(x), frequency = NULL) 60 NULL > msft.z <- zoo(x=msft.df[,"Close"],order.by=as.Date(msft.df[,"Date"])) > head(msft.z) 2009-01-02 2009-01-05 2009-01-06 2009-01-07 2009-01-08 2009-01-09 20.33 20.52 20.76 19.51 20.12 19.52
80
2011)
Introduction to R
90 / 131
60
80
100
120
60
80
[1] "Date"
2011)
Introduction to R
91 / 131
100
120
40
60
20
80
2009 2010
15
Guy Yollin (Copyright
25
30
2011)
Introduction to R
92 / 131
40
log return
60
0.20
0.10
0.00 0.05
1982
1984
1986 year
1988
1990
80
2011)
Introduction to R
94 / 131
40
60
80
100
120
[1] "data.frame"
40 > dim(SP500)
[1] 2783 1 > SPreturn = SP500$r500 > head(SPreturn)
60 [1] -0.0117265
0.0024544
0.0110516
> n = length(SPreturn) > year_SP = 1981 + (1:n)*(1991.25-1981)/n > head(year_SP) [1] 1981.004 1981.007 1981.011 1981.015 1981.018 1981.022
80
2011)
Introduction to R
95 / 131
40
change in rate 0.005 0.015 1980 0.005
60
0.015
1982
1984 year
1986
80
2011)
Introduction to R
97 / 131
Plot DM returns
R Code: Plot DM returns
> > > > 1 2 3 4 5 6 # figure 4.2 mm library(zoo) data(Garch) head(Garch)
40
60
80
100
120
date day dm ddm bp 800102 wednesday 0.5861 NA 2.2490 40 800103 thursday 0.5837 -0.0041032713 2.2365 800104 friday 0.5842 0.0008562377 2.2410 800107 monday 0.5853 0.0018811463 2.2645 800108 tuesday 0.5824 -0.0049670394 2.2560 800109 wednesday 0.5834 0.0017155606 2.2650
80diff(dm) > diffdm <> plot(diffdm,xlab="year",ylab="change in rate",type="h") > title("changes in DM/dollar exchange rate")
Guy Yollin (Copyright
2011)
Introduction to R
98 / 131
40
change in rate 0.0 0.4 1960 0.2
60
0.2
1970
1980 year
1990
2000
80
2011)
Introduction to R
100 / 131
mm
40
60
80
100
120
rfood rdur rcon rmrf rf 1 -4.59 0.87 -6.84 -6.99 0.33 40 2.78 0.99 0.29 2 2.62 3.46 3 -1.67 -2.28 -0.48 -1.46 0.35 4 0.86 2.41 -2.02 -1.70 0.19 5 7.34 6.33 3.69 3.08 0.27 6 4.99 -1.26 2.05 2.09 0.24 > rf <- zooreg(Capm[,"rf"], frequency = 12, start = c(1960, 1),end=c(2002,12)) 60 > head(rf) 1960(1) 1960(2) 1960(3) 1960(4) 1960(5) 1960(6) 0.33 0.29 0.35 0.19 0.27 0.24 > diffrf <- diff(rf) 80 > plot(diffrf,xlab="year",ylab="change in rate") > title("changes in risk-free interest return")
2011)
Introduction to R
101 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
102 / 131
Free variables
mm of a function, 40 3 types60 80 be found: 100 In the body of symbols may 120
formal parameters - arguments passed in the function call local variables - variables created in the function free 40 variables - variables created outside of the function (note, free variables become local variables if you assign to them)
R Code: Types of variables in functions
60 > f <- function(x) { y <- 2*x print(x) # formal parameter print(y) # local variable print(z) # free variable } 80
2011)
Introduction to R
103 / 131
Environments
mm The main workspace in with 40 R (i.e. what 60 you are interacting 80 100at the R 120 console) is called the global environment
According to the scoping rules of R (referred to a lexical scoping), R will search for a free variable in the following order:
40
1
The 60 parent environment of the environment where the function was created The parent of the parent ... up until the global environment is searched The 80 search path of loaded libraries found using the search() function
2011)
Introduction to R
104 / 131
The function search returns a list of attached packages which will be searched in order (after the global environment) when trying to resolve a free variable
40
R Code: The search path
> search() [1] [4] [7] [10] [13] ".GlobalEnv" "package:nutshell" 60 "package:graphics" "package:datasets" "package:base" "package:zoo" "package:patchDVI" "package:grDevices" "package:methods" "package:Ecdat" "package:stats" "package:utils" "Autoloads"
80
2011)
Introduction to R
105 / 131
mm
60
80
100
120
[1] 12 > # example 2 > f<- function (x) { a<-5 60 g(x) } > g <- function(y) y + a > f(2) [1] 12
80
2011)
Introduction to R
106 / 131
> # example 4 60 > f <- function (x) { x + mean(rivers) # rivers is defined in the dataset package } > f(2) [1] 593.1844
80
2011)
Introduction to R
107 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
108 / 131
R has a comprehensive HTML help facility Run the 40 help.start function R GUI menu item Help|Html help
60
R Code: Starting HTML help
> help.start() If nothing happens, you should open 'http://127.0.0.1:27039/doc/html/index.html ' yourself 80
2011)
Introduction to R
109 / 131
One can also obtain help on a particular topic via the help function
40
help(topic) ?topic
R Code: Topic 60 help
> help(read.table)
80
2011)
Introduction to R
110 / 131
R-help archives
60
R Code: Running RSiteSearch
> library(RSiteSearch) > RSiteSearch("ODBC")
80
A search query has been submitted to http://search.r-project.org The results page should open in your browser shortly
Guy Yollin (Copyright
2011)
Introduction to R
111 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
112 / 131
R Homepage
mm 40 60 80 100 120
Links
80
2011)
Introduction to R
113 / 131
2011)
Introduction to R
114 / 131
2011)
Introduction to R
115 / 131
Statconn
http://rcom.univie.ac.at mm 40 COM interface for R connectivity Excel Word40 C# VB Delphi R Statconn
80 Notepad++ 60 80 100 120
60
2011)
Introduction to R
116 / 131
R-SIG-FINANCE
mm 40 60 80 100 120
https://stat.ethz.ch/mailman/ listinfo/r-sig-finance Nerve center of the R nance community Daily must read Exclusively for Finance-specic 60 questions, not general R questions
80 40
2011)
Introduction to R
117 / 131
80
2011)
Introduction to R
118 / 131
Quick R
mm 40 60 80 100 120
2011)
Introduction to R
119 / 131
2011)
Introduction to R
120 / 131
A presentation from the Bay Area R Users Meetup by John Mount Link40 to presentation
http://www.win-vector.com/dfiles/SurviveR.pdf
80
2011)
Introduction to R
121 / 131
Programming in R
mm 40 60 80 Online R programmingn manual from UC Riverside 100 120
URL
http://manuals.bioinformatics.ucr.edu/home/programming-in-r
40
Selected Topics
R Basics Finding Help 60 Code Editors for R Control Structures Functions Object Oriented Programming Building R Packages 80
2011)
Introduction to R
122 / 131
R Bloggers Aggregation of about 100 R blogs http://www.r-bloggers.com Stack Overow Excellent developer Q&A forum http://stackoverflow.com R Graph Gallery Examples of many possible R graphs http://addictedtor.free.fr/graphiques 60 Revolution Blog Blog from David Smith of Revolution http://blog.revolutionanalytics.com Inside-R R community site by Revolution Analytics 80 http://www.inside-r.org
Guy Yollin (Copyright
40
2011)
Introduction to R
123 / 131
Outline
1 2 3 4 5 6 7 8 9 10 11
R language references R language and environment basics The working directory, data les, and data manipulation
40
Basic statistics and the normal distribution Basic plotting Working with time series in R
60
Variable scoping in R The R help system Web resources for R 80 IDE editors for R
2011)
Introduction to R
124 / 131
RStudio
mm 40 60 80 100 120
RStudio is an fully-featured open-source IDE for R R language highlighting 40 Paste/Source code to R object explorer graphics window in main IDE
60
2011)
Introduction to R
125 / 131
2011)
Introduction to R
126 / 131
2011)
Introduction to R
127 / 131
StatET is a plug-in for the open-source Eclipse development environment R language highlighting Paste/Source code to R Supports R in SDI mode
60 documentation by Excellent Longhow Lam 40
http://www.walware.de/goto/statet
80
2011)
Introduction to R
128 / 131
R-WinEdit
mm 40 60 80 100 120
Based on WinEdt, an excellent shareware editor with support for A L TEXand Sweave development
40
R language highlighting Paste/Source code to R Supports R in MDI mode Paste/Source code to S-PLUS
60
80 http://www.winedt.com http://cran.r-project.org/web/packages/RWinEdt
Guy Yollin (Copyright
2011)
Introduction to R
129 / 131
2011)
Introduction to R
130 / 131
The End
mm 40 60 80 100 120
40
60
80
2011)
Introduction to R
131 / 131