
R Cheats:

Create a Vector:
vector.name=c(1,2,3,4)
other.vector=c(5,6,7,8)
Finding Difference Between Means of Two Vectors:
mean(vector.name-other.vector) OR mean(vector.name)-mean(other.vector)
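A quick check, using the two example vectors from the top of these notes, that both forms give the same number. This only holds when the two vectors have the same length; the names `diff.of.means` and `mean.of.diffs` are made up for this sketch.

```r
vector.name  <- c(1, 2, 3, 4)
other.vector <- c(5, 6, 7, 8)

# Difference of the two means
diff.of.means <- mean(vector.name) - mean(other.vector)

# Mean of the element-by-element differences
mean.of.diffs <- mean(vector.name - other.vector)

diff.of.means   # -4
mean.of.diffs   # -4
```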
Creating a Data Frame:
data.frame(Vertical.Name, Horizontal.Name)
Standard Deviation: amount of variation within a data set
sd(vector.name)
Median:
median(vector.name) **median is equal to 50% quantile
Mean: average
mean(vector.name)
Quantile:
quantile(vector.name)
max(vector.name) **is equal to the 100% quantile**
min(vector.name) **is equal to the 0% quantile**
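A small sketch of how these functions line up with each other. The data in `x` are made up for illustration.

```r
x <- c(2, 4, 4, 5, 7, 9, 10)   # made-up data

q <- quantile(x)   # named vector: 0%, 25%, 50%, 75%, 100%
q

# The median, min and max are just particular quantiles
median(x) == q[["50%"]]
min(x)    == q[["0%"]]
max(x)    == q[["100%"]]

# Interquartile range = third quartile - first quartile
iqr.by.hand <- q[["75%"]] - q[["25%"]]
iqr.by.hand
IQR(x)             # built-in version of the same calculation
```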
*****Contingency Tables: frequency of observations in 1+ categories
table()
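A minimal example of table() with one and with two categorical variables. The vectors `species` and `leaves` are hypothetical data invented for this sketch.

```r
# Hypothetical categorical data, one entry per observation
species <- c("A", "B", "A", "C", "B", "A")
leaves  <- c("yes", "no", "yes", "yes", "no", "no")

# One variable: how many observations fall in each category
counts.one <- table(species)
counts.one

# Two variables: a contingency table (rows = species, columns = leaves)
counts.two <- table(species, leaves)
counts.two
```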
Create a Histogram:
hist(vector.name)
hist(vector.name, main='Title of Histogram You Want',xlab='Title for X Axis',ylab='Title for Y Axis')
Create a Bar Plot:
barplot(vector.name,main='Name of Bar Plot')
Bar Plot with Colours, Labels and Axis Readjustment:
barplot(vector.name, ylim=c(0,100), col='red', main='Title of Bar Plot', xlab='Title of X Axis', ylab='Title of Y Axis')
Create a bar plot of the nectar volume, color it red and add a y-axis label.
Provide the R code used to create the bar plot in the space below
ANSWER: barplot(data$nectar,col='red',ylab="Nectar Volume (ul)")
****Box Plots:
Create a box plot of pollen counts separated by each species. Add both an
x-axis and y-axis label, and make each species a different color. Provide the
R code used to create the box plot in the space below.
ANSWER: boxplot(data$pollen~data$species, xlab="Species",
ylab="Pollen Grains",col=c('blue','green','orange'))
Create a Scatterplot with Labels:
plot(Horizontal.Name,Vertical.Name, xlab="Horizontal Name I Want", ylab="Vertical Name I Want") **the first vector goes on the x (horizontal) axis
plot(data$pollen,data$nectar,ylab="Nectar Volume (ul)",xlab='Pollen Grains',main="Relationship Between Pollen and Nectar",xlim=c(0,120),ylim=c(0,55))
Inserting File: my.dat=read.csv(file.choose())
variance: var()
#first quartile: 25%, second quartile: 50%, third quartile: 75%, fourth quartile: 100%
==========================================================
Bio243 Tut:
6ax@queensu.ca
Lougheed lab room 4430 floor 4
Command+Enter runs the current line of the script
indexing: data[row,column]
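A short sketch of [row,column] indexing on a small data frame. The data frame `dat` and its values are made up for illustration.

```r
# Made-up data frame with 3 rows and 2 columns
dat <- data.frame(mass = c(2.1, 3.4, 1.8), flight = c(11, 14, 12))

dat[2, 1]         # row 2, column 1: a single value (3.4)
dat[2, ]          # row 2, all columns (blank after the comma = everything)
dat[, 2]          # all rows of column 2, returned as a vector
dat$flight[1:2]   # first two values of the column named flight
```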
importing an Excel worksheet:
-save the file as .csv
data=read.csv(file.choose())
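To try read.csv() without picking a file by hand, the sketch below writes a tiny made-up .csv to a temporary file and reads it back. The column names and values are invented; with real data you would pass your own file path or use file.choose().

```r
# Write a small made-up .csv to a temporary location
csv.path <- tempfile(fileext = ".csv")
writeLines(c("mass,flight", "2.1,11", "3.4,14"), csv.path)

# Read it back into a data frame (the first line becomes the column names)
my.dat <- read.csv(csv.path)

head(my.dat)   # first rows of the data
str(my.dat)    # column names and types
```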
What is the standard deviation: weevil$flight 2.64
What is the first, second, third quantile of weevil... 3.14772
interquartile range....
1.45
mean weevil flight time: 12.49633
Histogram of Weevil Flight Time (Sec)
QUIZ ANSWERS:
Interquartile range: 1.456325
First quartile: 1.687447
Third quartile: 3.143772
Median weevil mass: 2.47572
which gender has the greatest amount of variation: a) the females have the greatest variation
b) r code: boxplot(weevils$mass~weevils$gender,xlab="Gender",ylab="Mass(mg)",main="Mass(mg) of Weevils by Gender")
scatter plot:
a) does flight time increase or decrease as the mass of the individual increases? the flight time increases as the mass of the individual increases
b) plot(weevils$mass,weevils$flight,xlab="Mass(mg)",ylab="Flight Time(sec)",main="Mass(mg) of Weevils by Flight Time(sec)")
Histogram:
hist(weevils$flight,xlab="Flight Time(sec)",ylab="Frequency",main="Histogram of Weevils Flight Time(sec)",xlim=c(5,20))
=============================================================
the following is my R code (cause I am doing trees from victoria park, city park and breakwater park):
#my vectors are written as (leaves are present, leaves are absent)
#so you literally have to count how many there are in each category of your categorical data.
Victoria=c(10,10)
#this means in my data, from victoria park, I collected data on 10 trees with leaves and 10 trees without leaves
City=c(14,6)
#blah blah blah
Breakwater=c(9,11)
#im assuming you get the point of my three vectors
#NOW MAKE A DATA FRAME
dat=data.frame('Victoria Park'=Victoria,'City Park'=City,'Breakwater Park'=Breakwater)
dat
#NOW DO CHI-SQUARED TEST
chisq.test(dat)
#USE YOUR P-VALUE TO REJECT OR FAIL TO REJECT HYPOTHESIS
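A sketch of the last step, pulling the numbers out of the test result and making the decision in code. It reuses the three park vectors from these notes; the significance level `alpha` of 0.05 and the names `result` and `decision` are assumptions of this sketch.

```r
# Counts written as (leaves present, leaves absent), as in the notes above
Victoria   <- c(10, 10)
City       <- c(14, 6)
Breakwater <- c(9, 11)
dat <- data.frame(Victoria, City, Breakwater)

result <- chisq.test(dat)

result$statistic   # observed X-squared value
result$parameter   # degrees of freedom: (rows-1)*(columns-1) = 2
result$p.value

alpha <- 0.05      # assumed significance level
decision <- if (result$p.value < alpha) {
  "reject the null hypothesis"
} else {
  "fail to reject the null hypothesis"
}
decision
```

The same decision comes from comparing result$statistic with the critical value qchisq(1 - alpha, result$parameter).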
====================================================
"Measuring Centre of Data" ; where values are
mean= sensitive to changes (add all divide by num)
median= data point in the middle when data in order
mode= most commonly occurring value
"Measures of Variation"; how spread out the values are (more spread out = less consistent; more variation = less confidence)
range: difference between max and min
-max, min, 3 quartiles
Variance: average squared distance from each point to the middle (middle is the mean): sum((x-mean(x))^2)/(n-1)
Standard deviation: square root of the variance
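To see that var() and sd() follow the sample variance formula (squared distances from the mean, summed, divided by n-1), here is the calculation done by hand on a small vector. The names `manual.var` and `manual.sd` are made up for this sketch.

```r
x <- c(1, 2, 3, 4)
n <- length(x)

# Sample variance by hand: squared distances from the mean, divided by n - 1
manual.var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual.sd <- sqrt(manual.var)

manual.var   # same as var(x)
var(x)
manual.sd    # same as sd(x)
sd(x)
```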
"Measures of Relationship"; measure strength and nature
strength: correlation (r); correlation is always between -1 and 1. r<0: negative (plot slopes down to the right), r>0: positive (plot slopes up to the right), r=0: no linear correlation; the closer r is to 0, the weaker the correlation
nature: line of best fit
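Both measures can be computed in R with cor() and lm(). The sketch below uses made-up mass and flight values, not the weevil data from the tutorial.

```r
# Made-up data for illustration
mass   <- c(1.5, 2.0, 2.5, 3.0, 3.5)
flight <- c(10, 11, 13, 14, 16)

# Strength: correlation coefficient r, always between -1 and 1
r <- cor(mass, flight)
r

# Nature: line of best fit, flight = intercept + slope * mass
fit <- lm(flight ~ mass)
coef(fit)   # intercept and slope

# To draw it: plot(mass, flight) and then abline(fit)
```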
Probability:
Probability of an event is the proportion of times it occurs
this is the relative frequency approach
outcomes: recordable observations
sample space: set of all outcomes
events: combination of outcomes in sample space
Pr(A)=(number of outcomes in A)/(number of outcomes in sample space)
Pr(A or B)=P(A)+P(B)-P(A and B)
conditional probability of A, given that event B occurs:
Pr(A|B)= (P(A and B))/ P(B)
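A worked example of these rules with one roll of a fair six-sided die. The events A (even number) and B (greater than 3) and the helper function `pr` are made up for this sketch.

```r
sample.space <- 1:6          # all outcomes of one die roll
A <- c(2, 4, 6)              # event A: an even number
B <- c(4, 5, 6)              # event B: a number greater than 3

# Pr(event) = outcomes in the event / outcomes in the sample space
pr <- function(event) length(event) / length(sample.space)

pr.A.and.B   <- pr(intersect(A, B))          # outcomes in both: 4 and 6
pr.A.or.B    <- pr(A) + pr(B) - pr.A.and.B   # addition rule
pr.A.given.B <- pr.A.and.B / pr(B)           # conditional probability

pr.A.or.B      # 4 of the 6 outcomes (2, 4, 5, 6)
pr.A.given.B   # 2 of the 3 outcomes in B are even
```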
Bayes Theorem: used when you want to switch conditions around
- say you already have the conditional probability of B given A but you want A given B: have P(B|A), want P(A|B)
P(A|B)=....
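The standard statement of Bayes' theorem is P(A|B) = P(B|A)*P(A) / P(B), where P(B) can be built up as P(B|A)*P(A) + P(B|not A)*P(not A). A sketch with invented probabilities (none of these numbers come from the course):

```r
# Invented probabilities for illustration
pr.A             <- 0.01   # P(A)
pr.B.given.A     <- 0.90   # P(B|A): the one we already have
pr.B.given.not.A <- 0.05   # P(B|not A)

# Total probability of B, over the cases "A" and "not A"
pr.B <- pr.B.given.A * pr.A + pr.B.given.not.A * (1 - pr.A)

# Bayes' theorem switches the condition around
pr.A.given.B <- pr.B.given.A * pr.A / pr.B

pr.B
pr.A.given.B
```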
Probability Distributions:
Discrete random variables
Random variable X can assume a value for every outcome in a sample space
X is discrete if all outcomes can be put in a list of separate items
Probability distribution is a function that assigns a probability to each outcome
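As an illustration, take X = the number of heads in two flips of a fair coin (an example chosen for this sketch, not from the course). Its outcomes can be listed as 0, 1, 2, so X is discrete, and dbinom() gives the probability of each outcome.

```r
outcomes <- 0:2   # possible values of X: 0, 1 or 2 heads

# Probability of each outcome for 2 flips with P(heads) = 0.5
probs <- dbinom(outcomes, size = 2, prob = 0.5)

# The probability distribution as a table: one probability per outcome
distribution <- data.frame(x = outcomes, probability = probs)
distribution

sum(probs)              # the probabilities add up to 1
sum(outcomes * probs)   # expected number of heads
```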
===================================================
# qt: critical t value, e.g. qt(0.95,19); 19 is the degrees of freedom
qt(0.95,19)
qt(0.05,19)
qt(0.975,19)
# qchisq: critical chi-squared value
qchisq(0.95,19)
#use the chi-squared test if you're testing for independence between two variables.
#null hypothesis: they're not related; alternative hypothesis: they are related
red=c(2,26)
pale=c(14,17)
colour.data=data.frame(red,pale)
colour.data
chisq.test(colour.data)
#in this case it's a one-tailed test because we would expect more red than pale: we have a higher expectation for red, which makes it more one-directional
#degrees of freedom for a table: (rows-1)*(columns-1), so 1 for this 2x2 table
qchisq(0.95,1)
#our critical value is 3.84, our observed value is 8.9214; our observed is greater than our critical, so we reject the null hypothesis.
#the number beside X-squared is our observed value
daphnia=read.csv(file.choose())
daphnia
#has the number of daphnia increased compared to the year before, which was 71?
hist(daphnia$count)
head(daphnia)
#formula for a one-sample t-test: glm(y-x~1); y is our data, x is the known value, 1 tells R that we're not comparing any groups
one.sample=glm(daphnia$count-71~1)
summary(one.sample)
qt(0.95,23)
#two-sample t-tests
#paired t-test is used when the subject is the same for two treatments
#formula: glm(y~1), y is the difference between the two paired measurements
#we want to know if the depths are different for the two different treatments (light and dark)
diff=daphnia$light[1:12]-daphnia$dark[1:12]
paired=glm(diff~1)
summary(paired)
#our p-value is less than 0.05, so we reject the null hypothesis. Reject when p<0.05 or when t-observed>t-critical. The null hypothesis is always that there's no difference; the alternative hypothesis is that there is a difference.
qt(0.975,11)
#the answer from the function above is what you would find in the t table
#independent t-test: two groups that are independent of each other; red and pale are independent of one another
#I want to know if the hemoglobin content is different between the red and the pale
#null hypothesis: there's no difference in hemoglobin content between the red and the pale
#formula: glm(y~x)
two.sample=glm(daphnia$hemo~daphnia$color)
#hemo=hemoglobin content and color is the color of the daphnia
summary(two.sample)
#our p-value is less than 0.05, so we reject the null hypothesis (same rule as for the paired test)
mpale=mean(daphnia$hemo[13:24])
mred=mean(daphnia$hemo[1:12])
mpale
mred
mred-mpale
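The glm(y-x~1) form of the one-sample t-test gives the same t value as R's built-in t.test(). The sketch below checks this on made-up counts, since the daphnia file is not included here; `count`, `known`, `glm.t` and `tt` are names invented for the example.

```r
# Made-up counts; 71 is the known value from the notes
count <- c(68, 75, 80, 72, 77, 74, 79, 70)
known <- 71

# One-sample t-test written as a glm, as in the notes
one.sample <- glm(count - known ~ 1)
glm.t <- coef(summary(one.sample))["(Intercept)", "t value"]

# The same test with the built-in function
tt <- t.test(count, mu = known)

glm.t          # t-observed from the glm
tt$statistic   # t-observed from t.test: same number
tt$parameter   # degrees of freedom: n - 1 = 7

# Critical t for a one-tailed test at the 0.05 level
qt(0.95, length(count) - 1)
```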
my.data=read.csv(file.choose())
my.data
my.datatwo=read.csv(file.choose())
my.datatwo
mean(my.data$DensityA1998)
mean(my.data$DensityA1999)
mean(my.data$DensityB1999)
diff=(my.data$DensityA1999-my.data$DensityA1998)
diff
paired=glm(diff~1)
summary(paired)
qt(.975,49)
qt(.95,49)
independent=glm(my.datatwo$Density~my.datatwo$Region)
independent
summary(independent)
qt(.95,98)
qt(.975,98)
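The paired and independent glm() tests above correspond to t.test() with paired = TRUE and with var.equal = TRUE. A sketch on made-up densities (the real files are not included here; every name and number below is invented for the example):

```r
# Made-up paired measurements: the same six sites in two years
year1 <- c(12, 15, 11, 14, 13, 16)
year2 <- c(14, 15, 13, 17, 14, 18)

# Paired test: glm on the differences vs. t.test(paired = TRUE)
d <- year2 - year1
paired.glm <- glm(d ~ 1)
paired.t   <- coef(summary(paired.glm))["(Intercept)", "t value"]
paired.tt  <- t.test(year2, year1, paired = TRUE)

# Independent test: two made-up groups of sites in different regions
density <- c(12, 15, 11, 14, 13, 16, 18, 17, 20, 16, 19, 21)
region  <- factor(rep(c("north", "south"), each = 6))
indep.glm <- glm(density ~ region)
indep.t   <- coef(summary(indep.glm))[2, "t value"]
indep.tt  <- t.test(density ~ region, var.equal = TRUE)

paired.t; paired.tt$statistic   # same t value
indep.t;  indep.tt$statistic    # same size, opposite sign (groups subtracted in the other order)
qt(0.975, length(density) - 2)  # two-tailed critical t, df = n - 2
```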
=========================================
