Operators in R
Operator            # Description
[ [[                # Indexing
$                   # Component
^                   # Exponentiation
:                   # Sequence
%special%           # Special operators (e.g. 7 %/% 3 gives the largest integer that fits in the division, 7 %% 3 gives the remainder of the division)
< > <= >= == !=     # Ordering and comparison (less than, greater than, less than or equal, greater than or equal, equal, not equal)
!                   # Logical negation
& &&                # Logical conjunction (AND)
| ||                # Logical disjunction (OR)
~                   # Formula
-> ->>              # Assignment (left to right)
=                   # Argument assignment (right to left)
<- <<-              # Assignment (right to left)
?                   # Help
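A short sketch of a few of these operators in use may help. The variable names below are illustrative only and do not come from the original handout.

```r
# Integer division and remainder with the special operators %/% and %%
quotient  <- 7 %/% 3          # 2
remainder <- 7 %% 3           # 1

# The colon operator builds a sequence; ^ is exponentiation
squares <- (1:4)^2            # 1 4 9 16

# & compares element by element; && expects single TRUE/FALSE values
v <- c(5, 15, 25)
in_range <- v > 10 & v < 20   # FALSE TRUE FALSE

# Left-to-right assignment is also valid
10 -> ten
```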
Objects in R
Functions that operate on objects
Function                     # Description
ls()                         # Lists the names of the objects in the workspace
print()                      # Returns the contents of the specified object
str()                        # Gives complete information about the characteristics and contents of the object
length()                     # Reports the number of elements of an object
class()                      # Reports the class to which an object belongs
attributes()                 # Reports the attributes of an object
attr(x, "attribute name")    # Reports the specified attribute of the object
Function                     # Returns TRUE
is.numeric(x)                # if all elements of x are numeric or integer, e.g. x <- c(1, -3.5)
is.null(x)                   # if x is NULL (the object has no length), e.g. x <- NULL
is.logical(x)                # if all elements of x are logical, e.g. x <- c(TRUE, FALSE)
is.character(x)              # if all elements of x are character strings, e.g. x <- c("A", "Quad")
is.vector(x)                 # if the object x is a vector (a single dimension). Returns FALSE if the object has any attributes other than names
is.factor(x)                 # if the object x is a factor
is.matrix(x)                 # if the object x is a matrix (2 dimensions but not a data frame)
is.list(x)                   # if the object x is a list
is.data.frame(x)             # if the object x is a data frame
is.na(x)                     # for each missing (NA) element in x, e.g. x <- c(NA, 2)
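The inspection and testing functions above can be tried on a small object. This is a minimal sketch; the named vector x is made-up data used only for illustration.

```r
x <- c(a = 1, b = -3.5)       # named numeric vector (illustrative data)

n_elements <- length(x)       # number of elements
x_class    <- class(x)        # "numeric"
x_attrs    <- attributes(x)   # a list holding the names attribute
x_names    <- attr(x, "names")

is.numeric(x)                 # TRUE
is.vector(x)                  # TRUE: names are the only attribute
is.na(c(NA, 2))               # one result per element: TRUE FALSE
is.null(NULL)                 # TRUE
```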
Conceição Leal
Computação Estatística II, Actividade 2
Function            # Converts the object into
as.numeric(x)       # a numeric vector (integer or real). Factors are converted to their integer codes.
as.null(x)          # a NULL
as.logical(x)       # a logical vector. Nonzero numeric values are converted to TRUE, zero to FALSE.
as.character(x)     # a character vector
as.vector(x)        # a vector. All attributes (including names) are removed.
as.factor(x)        # a factor. This is an abbreviated version of factor().
as.matrix(x)        # a matrix. Any non-numeric elements result in all matrix elements being converted to character strings.
as.list(x)          # a list
as.data.frame(x)    # a data frame. Matrix columns and list columns are converted into separate vectors of the data frame. Character vectors are converted into factors only when stringsAsFactors = TRUE (the default before R 4.0.0). All previous attributes are removed.
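Two of these conversions are easy to get wrong, so a brief sketch may be useful. The objects f and df are hypothetical examples, not data from the handout.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # integer codes 1 2 1, not the labels
values <- as.numeric(as.character(f))   # the labels as numbers: 10 20 10

# Zero becomes FALSE, every other number becomes TRUE
flags <- as.logical(c(0, 1, 2, -3))

# One non-numeric column turns the whole matrix into character strings
df <- data.frame(n = 1:2, s = c("a", "b"))
m  <- as.matrix(df)
```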
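Function                      # Description
subset(x, condition)          # Subset a vector or data frame according to a set of conditions
apply(x, INDEX, FUN)          # Apply the function (FUN) to the margins (INDEX=1 is rows, INDEX=2 is columns, INDEX=c(1,2) is both) of a vector, array or list (x). In R this argument is named MARGIN.
tapply(x, factorlist, FUN)    # Apply the function (FUN) to the vector (x) separately for each combination of the list of factors
lapply(x, FUN)                # Apply the function (FUN) to each element of the list x
replicate(n, EXP)             # Re-evaluate the expression (EXP) n times. Differs from the rep function, which repeats the result of a single evaluation
aggregate(x, by, FUN)         # Splits data according to a combination of factors and calculates summary statistics on each set
sort()                        # Sort elements into order, by default omitting NAs
which.min(x)                  # Index of the (first) minimum of x
which.max(x)                  # Index of the (first) maximum of x
which(x == a)                 # Indices of the elements of x that are equal to a
match(x, y)                   # Positions in y of the first match of each element of x
choose(n, k)                  # Number of combinations of n elements taken k at a time
combn(x, k)                   # All combinations of the elements of x taken k at a time
with(x, EXP)                  # Evaluate the expression (EXP) using the variables of the data frame x
unique(x)                     # Vector of the distinct values of x
cumsum(x)                     # Cumulative sum of the elements of x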
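As a minimal sketch of the apply family described above, the following uses small made-up objects (m, scores, group), none of which appear in the original text.

```r
m <- matrix(1:6, nrow = 2)       # rows: (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m, 1, sum)     # margin 1 = rows: 9 12
col_sums <- apply(m, 2, sum)     # margin 2 = columns: 3 7 11

scores <- c(10, 20, 30, 40)
group  <- factor(c("a", "b", "a", "b"))
group_means <- tapply(scores, group, mean)   # a = 20, b = 30

# lapply always returns a list, one entry per input element
sizes <- lapply(list(1:3, letters), length)

# replicate re-evaluates the expression, so each draw is different
set.seed(1)
draws <- replicate(5, mean(rnorm(10)))
```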
Indexing
Vectors                       # Description
x[i]                          # Select the ith element
x[i:j]                        # Select the ith through jth elements inclusive
x[c(1,5,6,9)]                 # Select specific elements (see …)
x[-i]                         # Select all except the ith element
x["name"]                     # Select the element called "name"
x[x > 10]                     # Select all elements greater than 10
x[x > 10 & x < 20]            # Select all elements between 10 and 20 (both conditions must be satisfied)
x[y == "value"]               # Select all elements of x according to which y elements are equal to "value"
x[x > 10 | y == "value"]      # Select all elements which satisfy either condition
Matrices                      # Description
x[i,j]                        # Select element in row i, column j
x[i,]                         # Select all elements in row i
x[,j]                         # Select all elements in column j
x[-i,]                        # Select all elements in each row other than the ith row
x["name",1:2]                 # Select columns 1 through to 2 for the row named "name"
x[x[,"Var1"]>4,]              # Select all rows for which the value of the column named "Var1" is greater than 4
x[,x["Var1",]=="value"]       # Select all columns for which the value in the row named "Var1" is equal to "value"
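The indexing forms above can be checked on small objects. This is only a sketch; the vectors x and y and the matrix m are invented for the example.

```r
x <- c(a = 5, b = 12, c = 18, d = 25)
y <- c("no", "value", "no", "value")

second   <- x[2]                     # by position
not_1st  <- x[-1]                    # everything except the first element
by_name  <- x["c"]                   # by name
between  <- x[x > 10 & x < 20]       # both conditions must hold
by_other <- x[y == "value"]          # x selected through y

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("Var1", "Var2")))

cell      <- m[2, 1]                 # row 2, column 1
named_row <- m["r1", 1:2]            # columns 1 to 2 of the row named "r1"
big_rows  <- m[m[, "Var1"] > 1, ]    # rows where Var1 is greater than 1
```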
Lists
x[[i]]                        # Select the ith object of the list
x[["value"]]                  # Select the object named "value" from the list
x[["value"]][1:3]             # Select the first three elements of the object named "value" from the list
Data frames                   # Description
Indexing by rows (sampling units)
x[1:10,]                      # Select the first 10 rows of each of the vectors in the data frame
x['NOMEVETOR',]               # Select each of the vectors for the row called NOMEVETOR from the data frame
x[c(i,j),]                    # Select rows i and j for each column of the data frame
Indexing by columns (variables)
x[,"name"]                    # Select each row of the column named "name"
x[,c(i,j)]                    # Select all rows but just the ith and jth vectors of the data frame
x[["name"]]                   # Select the column named "name"
x$name                        # Refer to a vector named "name" within the data frame (x)
x[,c('X','Y')]                # Select the X and Y vectors for all sites from the data frame
Indexing by conditions
x[x$X>3,]                     # Select the rows whose value in vector X is greater than 3
x[x$X>3 & x$Z=='DADO',]       # Select the rows whose value in vector Z is 'DADO' and whose value in vector X is greater than 3
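A minimal sketch of list and data frame indexing follows. The data frame x, its row names and the list lst are hypothetical; the column names X, Y, Z and the value "DADO" mirror the placeholders used above.

```r
x <- data.frame(X = c(1, 4, 5, 2),
                Y = c(10, 20, 30, 40),
                Z = c("DADO", "DADO", "other", "DADO"),
                row.names = c("s1", "s2", "s3", "s4"))

first_two <- x[1:2, ]                 # rows by position
one_row   <- x["s3", ]                # row by name
y_col     <- x[["Y"]]                 # same vector as x$Y and x[, "Y"]
xy        <- x[, c("X", "Y")]         # two columns, all rows

# Conditions go in the row position of the brackets
big  <- x[x$X > 3, ]
both <- x[x$X > 3 & x$Z == "DADO", ]

lst <- list(value = 1:5, other = "a")
first_three <- lst[["value"]][1:3]    # elements of a named list component
```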
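Seq <- seq(from, to, by = )        # Sequence from "from" to "to" in steps of "by"
Seq <- seq(from, to, length = )    # Sequence from "from" to "to" with "length" equally spaced values
Rep <- rep(x, n)                   # Repeats x n times (if n is a vector, each element of x is repeated the corresponding number of times)
Ex: rep(2, 3)
    rep(c(3, 5), 2)
    rep(c(2, 5), c(8, 9))
Factor <- factor(x)                # Converts the vector x into a factor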
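The calls above can be run as written. The sketch below shows what each one produces; the variable names are illustrative.

```r
by_step   <- seq(0, 1, by = 0.25)      # 0 0.25 0.5 0.75 1
by_length <- seq(0, 1, length = 3)     # 0 0.5 1

r1 <- rep(2, 3)                        # 2 2 2
r2 <- rep(c(3, 5), 2)                  # the whole vector twice: 3 5 3 5
r3 <- rep(c(2, 5), c(8, 9))            # eight 2s followed by nine 5s

# factor() stores the distinct values as levels
Factor <- factor(c("low", "high", "low"))
levels(Factor)                         # levels are sorted: "high" "low"
```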
Some functions that operate on vectors
Function                      # Description
mean()                        # Mean of the elements of a vector
median()                      # Median of the elements of a vector
var(x)                        # Sample variance
order(x, decreasing=)         # Returns a vector of indices reflecting the vector sorted in ascending (default) or descending order. (By default, NAs are last)
rank(x, ties.method=)         # Returns the ranks of the values in the vector, tied values averaged by default
range()                       # Minimum and maximum value elements of a vector
sort(x, decreasing=)          # Sorts a vector in increasing (default) or decreasing order
rev()                         # Reverses the order of vector elements
unique()                      # Forms the vector of distinct values
which()                       # Locates TRUE indices of logical vectors
which.max()                   # Locates the (first) maximum of a numeric vector
which.min()                   # Locates the (first) minimum of a numeric vector
max()                         # Maximum of a numeric vector
min(x)                        # Minimum of a numeric vector
sum(x)                        # Sum of all the values of vector x
prod(x)                       # Product of all the values of vector x
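The difference between order(), rank() and sort() is easiest to see on a short vector with a tie. This is a sketch on made-up data.

```r
x <- c(3, 1, 2, 3)

ord    <- order(x)                    # indices that would sort x: 2 3 1 4
rnk    <- rank(x)                     # ties share the average rank: 3.5 1 2 3.5
srt    <- sort(x, decreasing = TRUE)  # the values themselves: 3 3 2 1
rng    <- range(x)                    # minimum and maximum: 1 3
uniq   <- unique(x)                   # distinct values in order of appearance: 3 1 2
where3 <- which(x == 3)               # positions of the TRUE values: 1 4
first_max <- which.max(x)             # only the first maximum: 1
```

Note that x[order(x)] gives the same result as sort(x).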
cumprod(x)                    # Returns a vector whose elements are the cumulative product of the elements of vector x
sd(x)                         # Sample standard deviation
cor(x,y)                      # Sample correlation between the vectors x and y
length(x)                     # Number of elements of vector x
quantile(x, p)                # Quantile p
paste()                       # Combine multiple vectors together after converting them into character vectors
sample(x, size)               # Randomly resample size number of elements from the x vector without replacement. Use the option replace=TRUE to sample with replacement
substr()                      # Extract substrings from a character vector
cut(x, breaks)                # Creates a factor out of a vector by slicing the vector x up into chunks. The option breaks is either a number indicating the number of cuts or else a vector of cut values
levels()                      # Lists the levels (in order) of a factor
tapply(x, factorlist, FUN)    # Apply the function (FUN) to the vector (x) separately for each combination of the list of factors
Matrix class: some aspects
Function                         # Description
matrix(x, nrow = 5)              # Matrix with 5 rows formed from the elements of vector x, distributed column by column
matrix(x, ncol = 2)              # With the option ncol=2, the values of x are distributed column by column into two columns. By default the matrix is filled by column. To fill it by row: matrix(x, nrow = 5, byrow=T)
colnames(MX) or rownames(MX)     # Assigns names to the columns or to the rows from the elements of a vector of strings. Ex: colnames(MX)<-c("A","B","C","D","E"), rownames(MX)<-LETTERS[1:5]
cbind() or rbind()               # Combines two or more vectors as columns or as rows
dim(X)                           # Gives the dimension of the matrix X
t(X)                             # Gives the transpose of the matrix
X %*% Y                          # Gives the matrix product of X and Y. Note that X*Y gives the matrix that results from multiplying the elements aij by bij (corresponding elements of the matrices), not the matrix product
det(matriz)                      # Gives the determinant of a matrix
solve(matrizA,matrizB)           # Gives the solution of the matrix equation A%*%X=B, where A is a non-singular matrix. In the particular case where B is the identity matrix, X is the inverse of A
ginv(matriz)                     # Gives the Moore-Penrose generalized inverse of a matrix (invertible or not). Note that this matrix only coincides with the matrix that, multiplied on the right and on the left by the initial matrix X, gives the identity if X is non-singular. (MASS package)
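A small worked sketch of the matrix functions above, using base R only. The matrices MX, A and B are invented for the example; ginv() is left out because it needs the MASS package.

```r
x  <- 1:10
MX <- matrix(x, nrow = 5)                 # filled column by column
colnames(MX) <- c("A", "B")
rownames(MX) <- LETTERS[1:5]
by_row <- matrix(x, nrow = 5, byrow = TRUE)

A <- matrix(c(2, 1, 1, 3), nrow = 2)      # non-singular: determinant is 5
B <- matrix(c(1, 2), nrow = 2)

X     <- solve(A, B)                      # solves A %*% X = B
A_inv <- solve(A)                         # B omitted: the inverse of A

elementwise <- A * A                      # products of corresponding elements
product     <- A %*% A                    # true matrix product
```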
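summary(X)                       # Extracts information on every column of the matrix: minimum and maximum, mean, and quartiles. Applied to the transpose of the matrix, the function gives the same by rows
summary(as.vector(X))            # Gives the summary of the vector formed by all the elements of the matrix
colSums(X) or rowSums(X)         # Gives the sum of all the elements of each column or row
attach(nomedataframe)            # Adds the data frame to the search path so that its variables can be referred to by name
fix(nomedataframe)               # Opens the data frame in an editor so that it can be modified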
Graphics
Function                      # Description
plot(x)                       # If x is a numeric vector, this form of the plot() function produces a time series plot, a plot of x against index numbers. > plot(X)
plot(~x)                      # If x is a numeric vector, this form of the plot() function produces a stripchart for x. The same could be achieved with the stripchart() function. The ~ indicates a formula in which the left side is modeled against the right. > plot(~X)
plot(x,y)                     # If x and y are numeric vectors, this form of the plot() function produces a scatterplot of y against x. > plot(X,Y)
plot(y~expr)                  # If y is a numeric vector and expr is an expression, this form of the plot() function plots y against each vector in the expression. > plot(Y ~ X)
plot(fact)                    # If fact is a factor, produces a bar chart of the number of observations in each level
plot(fact, dv)                # If fact is a factor and dv is a numeric vector, produces a boxplot of dv for each level of fact
plot(dv~fact)                 # The same boxplot, specified as a formula
pairs()                       # Scatterplots of matrices of variables or formulas (pairwise)
boxplot()                     # Box-and-whisker plot for a vector or formula, vertical (by default) or horizontal
hist(x, breaks=)              # Frequency histogram (absolute or relative) of vector x. The breaks option specifies how and how many classes are built, either as a number or as a vector of break points
stem()                        # Stem-and-leaf plot
pie()                         # Pie chart
abline()                      # Adds the linear regression line of a fitted model
qqnorm()                      # Normal probability plot
qqline()                      # Line that, together with the previous plot, allows the fit of a data set to a normal distribution to be assessed (residual analysis)
                              # Fitted curve for an empirical distribution
Type (plot)                   # Description
type="p"                      # Points
type="l"                      # Lines
type="b"                      # Points and lines
type="o"                      # Points over the lines
type="h"                      # Histogram-like vertical lines
type="s"                      # Steps
type="n"                      # No points
*Note: The log parameter indicates whether, and which, axes should be drawn on a logarithmic scale.
Parameter                     # Description
lty                           # The type of line. Specified as either a single integer in the range of 1 to 6 (for predefined line types) or as a string of 2 or 4 numbers that define the relative lengths of dashes and spaces within a repeated sequence. Examples: lty=1, lty=2, lty=3, lty=4, lty=5, lty=6, lty="1234", lty="9111"
lwd                           # The thickness of a line as a multiple of the default thickness (which is device specific). Examples: lwd=0.5, lwd=0.75, lwd=1, lwd=2, lwd=4
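The type, lty, lwd and log settings can be compared side by side. The following is a minimal sketch on made-up data; the 2 x 2 layout and the panel titles are choices made for this example only.

```r
x <- 1:10
y <- x^2

op <- par(mfrow = c(2, 2))                       # four panels on one page

plot(x, y, type = "p", main = "points")
plot(x, y, type = "b", lty = 2, lwd = 2,         # dashed line, double thickness
     main = "points and lines")
plot(x, y, type = "l", lty = "1234",             # custom dash pattern given as a string
     main = "custom line type")
plot(x, y, type = "s", log = "y",                # steps, logarithmic y axis
     main = "steps, log scale")

par(op)                                          # restore the previous layout
```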
DATA TRANSFORMATIONS
Many of the tools of parametric inference rest on the assumption that the data are normally distributed. When this assumption is not met, scale transformations of the data can be used. The aim of a scale transformation is to normalize the data so as to satisfy the assumptions underlying a statistical analysis. In principle, any function can be applied to the data. However, certain types of data respond more favourably to particular transformations, because of their characteristics. The most common transformations are listed in the following table:
Common data transformations
Transformation                R syntax
log_e                         log(x)
log_10                        log(x, 10) or log10(x)
log(x + 1)                    log(x+1)
square root                   sqrt(x)
arcsin                        asin(sqrt(x))*180/pi
Note: x is the name of the vector (variable) whose values are to be transformed.
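As a quick sketch, the transformations in the table can be applied to small vectors. The values in counts and props are invented for illustration; the arcsin form assumes the data are proportions between 0 and 1.

```r
counts <- c(0, 1, 4, 9, 16)          # hypothetical count data
props  <- c(0, 0.25, 0.5, 1)         # hypothetical proportions

root   <- sqrt(counts)               # 0 1 2 3 4
shift  <- log(counts + 1)            # adding 1 avoids log(0)
angles <- asin(sqrt(props)) * 180 / pi   # arcsin transform, in degrees

# Both forms of the base 10 logarithm agree
log(100, 10)
log10(100)
```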
Statistic: R syntax # Description
Estimates of Location
Arithmetic mean: mean(X) # The sum of the values divided by the number of values (n)
Trimmed mean: mean(X, trim=0.05) # The arithmetic mean calculated after a fraction (typically 0.05 or 5%) of the lower and upper values have been discarded
Winsorized mean: # The arithmetic mean calculated after the trimmed values are replaced by the upper and lower trimmed quantiles
Median: median(X) # The middle value
Minimum, maximum: min(X), max(X) # Smallest and largest values
Estimates of Spread
Variance (σ²): var(X) # Average squared deviation of observations from the mean
Standard deviation (σ): sd(X) # Square root of the variance
Median absolute deviation: mad(X) # The median difference of observations from the median value
Inter-quartile range: IQR(X) # Difference between the 75% and 25% ranked observations
Standard error of the mean: sd(X)/sqrt(length(X)) # Precision of the estimate of the mean
95% confidence interval of the mean: library(gmodels), then ci(X) # Interval with a 95% probability of containing the true mean
NOTE: Only L-estimators are provided. L-estimators are linear combinations of weighted statistics on ordered values. M-estimators (of which maximum likelihood is an example) are calculated as the minimum of some function(s).
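The estimates above can be computed in base R. This is a sketch on made-up data with one outlier; the confidence interval here is built directly from the t distribution rather than with gmodels, which assumes approximately normal data.

```r
X <- c(2, 4, 4, 5, 7, 9, 30)          # hypothetical data, 30 is an outlier

avg     <- mean(X)                    # pulled upwards by the outlier
trimmed <- mean(X, trim = 0.2)        # drops one value from each end here
middle  <- median(X)

spread <- c(var = var(X), sd = sd(X), mad = mad(X), iqr = IQR(X))

se <- sd(X) / sqrt(length(X))         # standard error of the mean

# 95% confidence interval: mean plus or minus t quantile times standard error
ci95 <- avg + c(-1, 1) * qt(0.975, df = length(X) - 1) * se
```

The trimmed mean and the median stay close to the bulk of the data, while the plain mean does not.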
HYPOTHESIS TESTS
Parametric hypothesis tests: the assumptions of normality and homogeneity of variances are met
# Perform one-sample t-test
> t.test(dataset$DV)
# Perform (separate variances) independent-sample t-test
one-tailed (H1: μA > μB)
> t.test(DV ~ FACTOR, dataset, alternative = "greater")
two-tailed (H0: μA = μB)
> t.test(DV ~ FACTOR, dataset)
For pooled variances t-tests, include the var.equal=T argument
# Perform paired t-test
one-tailed (H1: μA > μB)
> t.test(dataset$DV1, dataset$DV2, alternative = "greater", paired = T)
> # OR for long format
> t.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
two-tailed (H0: μA = μB)
> t.test(dataset$DV1, dataset$DV2, paired = T)
> # OR for long format
> t.test(DV ~ FACTOR, dataset, paired = T)
Note: the long-format paired calls depend on the formula interface accepting paired, which recent R versions (4.4.0 onward) reject. The two-vector form works in all versions.
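A runnable sketch of these calls on simulated data is given below. The data frame dataset and its columns DV and FACTOR follow the placeholder names used above; the values themselves are random and purely illustrative.

```r
set.seed(1)
dataset <- data.frame(
  DV     = c(rnorm(10, mean = 10), rnorm(10, mean = 12)),
  FACTOR = factor(rep(c("A", "B"), each = 10))
)

# One-sample test of the mean of DV against 0 (the default mu)
one_sample <- t.test(dataset$DV)

# Independent samples: separate variances (Welch) is the default
welch  <- t.test(DV ~ FACTOR, data = dataset)
pooled <- t.test(DV ~ FACTOR, data = dataset, var.equal = TRUE)

# Paired test, treating the two groups as matched measurements
a <- dataset$DV[dataset$FACTOR == "A"]
b <- dataset$DV[dataset$FACTOR == "B"]
paired <- t.test(b, a, paired = TRUE, alternative = "greater")

pooled$p.value    # each result is a list; p.value, statistic, conf.int are components
```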
Note: When the assumptions are not met, transforming the data can be tried.
Independent or paired observations, non-homogeneity of variances (Wilcoxon rank sum nonparametric test)
# Perform one-sample Wilcoxon (signed rank) test
> wilcox.test(dataset$DV)
# Perform independent-sample Mann-Whitney Wilcoxon test
one-tailed (H1: A > B)
> wilcox.test(DV ~ FACTOR, dataset, alternative = "greater")
two-tailed (H0: A = B)
> wilcox.test(DV ~ FACTOR, dataset)
# Perform paired Wilcoxon (signed rank) test
one-tailed (H1: A > B)
> wilcox.test(dataset$DV1, dataset$DV2, alternative = "greater", paired = T)
> # OR for long format
> wilcox.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
two-tailed (H0: A = B)
> wilcox.test(dataset$DV1, dataset$DV2, paired = T)
> # OR for long format
> wilcox.test(DV ~ FACTOR, dataset, paired = T)
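The nonparametric calls can be sketched in the same way. The data below are invented and chosen without ties so that exact p-values are computed; DV and FACTOR follow the placeholder names used above.

```r
dataset <- data.frame(
  DV     = c(1.1, 2.3, 3.2, 4.8, 5.5, 6.9, 7.4, 8.6, 9.7, 10.2),
  FACTOR = factor(rep(c("A", "B"), each = 5))
)

# One-sample signed rank test of the location of DV against 5
one_sample <- wilcox.test(dataset$DV, mu = 5)

# Independent samples (Mann-Whitney): every A value is below every B value
mw <- wilcox.test(DV ~ FACTOR, data = dataset)

# Paired signed rank test, treating the groups as matched measurements
a <- dataset$DV[dataset$FACTOR == "A"]
b <- dataset$DV[dataset$FACTOR == "B"]
sr <- wilcox.test(b, a, paired = TRUE, alternative = "greater")
```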
Adapted from Logan, Murray (2010), Biostatistical Design and Analysis Using R: A Practical Guide, John Wiley & Sons, Inc.