
ENVR 210: ANOVA in R

Part 1: Review Basics of ANOVA


General
Extension of a t-test to more than two groups: we're testing for differences among group means
o Factor = categories of treatment
o Replicates = observations within each treatment
Null hypothesis: the means of all levels of the factor are equal, i.e., any variation among the means of the levels of the factor is due to random variation
Partitioning the sum of squares: total variation = variation explained by the factor + unexplained (residual) variation
F-statistic (or F-ratio):
o Mean sum of squares (MS) = sum of squares / degrees of freedom
o F-statistic = MS of the factor / MS of the residuals (unexplained variation)
o Derive the p-value from where the F-statistic falls on the F-distribution. This is why a large F-statistic implies a small p-value (it lies further out in the tail of the F-distribution)
Assumptions of ANOVA
o Samples independent and identically distributed
o Variance among groups is homogeneous
o Residuals are normally distributed
o Samples are classified correctly
ANOVA is robust to slight deviations from the assumptions
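To make the sum-of-squares partition and the F-ratio concrete, here is a minimal sketch that computes them by hand and checks the result against aov(). Because the sponge data are not reproduced in this handout, it uses R's built-in PlantGrowth dataset (plant weight for a control and two treatment groups) as a stand-in; all variable names are illustrative. The last two lines show one common way to check the equal-variance and normality assumptions (Bartlett's test and the Shapiro-Wilk test), not the only way.

```r
# Stand-in data: R's built-in PlantGrowth (30 plants in 3 groups of 10)
y = PlantGrowth$weight
g = PlantGrowth$group

grand_mean  = mean(y)
group_means = tapply(y, g, mean)    # mean of each level of the factor
n_per_group = tapply(y, g, length)  # replicates per level

# Partition the total sum of squares into factor and residual parts
ss_total    = sum((y - grand_mean)^2)
ss_factor   = sum(n_per_group * (group_means - grand_mean)^2)
ss_residual = sum((y - ave(y, g))^2)  # ave() gives each observation its group mean

# Degrees of freedom: k - 1 for the factor, N - k for the residuals
k = nlevels(g)
N = length(y)
df_factor   = k - 1
df_residual = N - k

# Mean squares (SS / df) and the F-ratio
ms_factor   = ss_factor / df_factor
ms_residual = ss_residual / df_residual
f_stat      = ms_factor / ms_residual

# p-value = area of the F distribution to the right of f_stat
p_value = pf(f_stat, df_factor, df_residual, lower.tail = FALSE)

# The same quantities as reported by aov()
fit = aov(weight ~ group, data = PlantGrowth)
aov_table = summary(fit)[[1]]
print(aov_table)
cat("By hand: F =", round(f_stat, 3), " p =", signif(p_value, 3), "\n")

# Assumption checks (one option among several)
bartlett.test(weight ~ group, data = PlantGrowth)  # homogeneity of variance
shapiro.test(residuals(fit))                       # normality of residuals
```

The "by hand" F and p match the first row of the aov() table; a larger share of the total sum of squares explained by the factor pushes F up and the p-value down.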
One-Way ANOVA
Purpose: Compare means among the levels of a single factor
Two-Way ANOVA
Purpose: Compare means among levels of factors (main effects) and
the interaction among factors (interaction effects)
Interaction effects are often the goal of a research study because they
identify responses that the main effects alone do not explain
Part 2: ANOVA One-Way
Let's work through the sponge growth rate study from the book. The
hypothesis of the study is that the root growth rate (response, or y-variable) of
mangrove trees is higher when sponges (factor, or type of treatment) grow on
their roots.
Open the Sponge Data for Lab document on the website and copy/paste
the data into R. Let's characterize the sponges dataset and examine the first
few observations.
dim(sponges)
summary(sponges)
head(sponges)
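The column name RootGrowthRate.mm.d. looks odd because R converts column headers that are not valid R names when it reads a data frame (read.table() and read.delim() use check.names = TRUE by default). Assuming the header in the lab document is something like RootGrowthRate(mm/d) (an assumption; the handout does not show it), make.names() shows where the dots come from. The commented lines are common ways to read tab-separated text copied to the clipboard; which one works depends on your operating system.

```r
# How a header such as "RootGrowthRate(mm/d)" becomes a valid column name
make.names("RootGrowthRate(mm/d)")
#> [1] "RootGrowthRate.mm.d."

# Reading copied, tab-separated data with a header row (choose one):
# Windows: sponges = read.delim("clipboard")
# macOS:   sponges = read.delim(pipe("pbpaste"))

# str() is another compact overview, shown here on a built-in stand-in dataset
str(PlantGrowth)
```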

For this section on One-Way ANOVA, let's focus on how treatment influences
root growth rate. Our first step is to produce a summary table of the number
of observations, mean and standard deviation of root growth rate for each of the 4
treatments.
# Mean, SD and number of observations of root growth rate for each treatment
mean_growth = by(sponges$RootGrowthRate.mm.d., sponges$Treatment, mean)
sd_growth   = by(sponges$RootGrowthRate.mm.d., sponges$Treatment, sd)
N_obs       = by(sponges$RootGrowthRate.mm.d., sponges$Treatment, length)
# Combine into one table with one row per treatment
cbind(N_obs, mean_growth, sd_growth)
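by() works, but its output is awkward to reuse in later steps. A minimal alternative sketch builds the same count/mean/SD table as an ordinary data frame with tapply(). It is shown on R's built-in PlantGrowth data as a stand-in; for the lab you would substitute sponges$RootGrowthRate.mm.d. for y and sponges$Treatment for g.

```r
# Stand-in data: built-in PlantGrowth (response = weight, factor = group)
y = PlantGrowth$weight
g = PlantGrowth$group

# One row per level of the factor, one column per summary statistic
growth_summary = data.frame(
  N_obs       = tapply(y, g, length),
  mean_growth = tapply(y, g, mean),
  sd_growth   = tapply(y, g, sd),
  row.names   = levels(g)
)
growth_summary
```

Because the result is a data frame, columns can be used directly afterwards (for example, growth_summary$mean_growth).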

The means among levels look different, but are they statistically significantly
different?
# One-way ANOVA: does mean root growth rate differ among treatments?
sponge_anova = aov(sponges$RootGrowthRate.mm.d. ~ sponges$Treatment)
summary(sponge_anova)
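An equivalent, slightly tidier way to specify the model is to pass the data frame through the data argument, which also gives shorter term labels in the output. The p-value can then be pulled out of the summary table rather than read off the screen. This is a sketch on the built-in PlantGrowth data; for the lab the call would be aov(RootGrowthRate.mm.d. ~ Treatment, data = sponges).

```r
fit = aov(weight ~ group, data = PlantGrowth)
anova_table = summary(fit)[[1]]   # the ANOVA table as a data frame

# Row 1 is the factor; the last row (Residuals) has no p-value
p_group = anova_table[["Pr(>F)"]][1]
p_group
p_group < 0.05   # TRUE means we reject H0 at the 5% level
```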

We reject the null hypothesis that there is no difference among the treatment
means at the 5% significance level (p = 0.0004), and conclude that treatment
significantly affects root growth rate.
Since we have a significant result, let's visualize the mean and confidence interval for each treatment.
install.packages("gplots")   # only needed once per computer
library(gplots)
plotmeans(sponges$RootGrowthRate.mm.d. ~ sponges$Treatment)   # group means with confidence intervals
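plotmeans() draws each group mean with an error bar. As far as we understand gplots' defaults (p = 0.95 with t-based intervals), each bar is mean +/- t(0.975, n - 1) * sd / sqrt(n), computed separately for each group. The sketch below reproduces such intervals in base R on the built-in PlantGrowth data so the plotted bars can be checked numerically. Comparing individual intervals is not a formal test of a difference between two groups: intervals can overlap even when the means differ significantly, which is one reason to follow up with a multiple-comparison test.

```r
# Stand-in data: built-in PlantGrowth
y = PlantGrowth$weight
g = PlantGrowth$group

n_i    = tapply(y, g, length)
m_i    = tapply(y, g, mean)
se_i   = tapply(y, g, sd) / sqrt(n_i)   # standard error of each group mean
t_crit = qt(0.975, df = n_i - 1)        # two-sided 95% t critical value per group

ci = data.frame(
  mean      = m_i,
  lower     = m_i - t_crit * se_i,
  upper     = m_i + t_crit * se_i,
  row.names = levels(g)
)
ci
```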

We see that only two treatments (Control and Tedania) have confidence intervals that
don't overlap. The rest overlap to some degree.
ANOVA tells us whether there is a difference among the means of the levels of the
factor, but not which levels differ. To determine which treatments are different, we use Tukey's HSD (honestly significant difference) test.
TukeyHSD(sponge_anova)

Each of these contrasts is essentially a t-test between the two indicated levels of
the factor, with the p-value adjusted for making multiple comparisons. The first line can be read: the mean growth rate of the Foam
treatment was 0.35 mm/d greater than the mean growth rate of the Control
treatment. The p-value for this comparison is 0.076, so the difference is
not significant at the 5% level.
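To see exactly what a TukeyHSD() row reports, the sketch below (again on the built-in PlantGrowth data as a stand-in) checks that the diff column of a row labelled B-A is simply mean(B) minus mean(A). The lwr and upr columns are the family-wise 95% confidence limits for that difference, and p adj is the p-value adjusted for multiple comparisons.

```r
fit = aov(weight ~ group, data = PlantGrowth)
tk  = TukeyHSD(fit)$group   # matrix with columns diff, lwr, upr, p adj
tk

m = tapply(PlantGrowth$weight, PlantGrowth$group, mean)
# Row "trt1-ctrl" is the trt1 mean minus the ctrl mean
tk["trt1-ctrl", "diff"]
as.numeric(m["trt1"] - m["ctrl"])
```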
What can we conclude from Tukey's HSD? At the 5% level:
1. Mangrove root growth rate is higher than in the Control when sponges (Haliclona
p = 0.009 and Tedania p = 0.0003) are attached
2. Mangrove root growth rate does not increase significantly when foam is
attached (p = 0.077)
3. There is no significant difference among the three attachment treatments (each
contrast that does not involve Control has p > 0.05)
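The conclusions above come from scanning the p adj column by eye. The same screening can be done programmatically; this minimal sketch keeps only the contrasts with an adjusted p-value below 0.05. It is shown on PlantGrowth; TukeyHSD(sponge_anova)[[1]] would give the corresponding table for the lab model.

```r
fit = aov(weight ~ group, data = PlantGrowth)
tk  = TukeyHSD(fit)[[1]]   # table for the first (here the only) factor

alpha = 0.05
# drop = FALSE keeps a matrix even if only one contrast is significant
significant = tk[tk[, "p adj"] < alpha, , drop = FALSE]
significant
```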
Part 3: ANOVA Two-Way Factorial (with Interaction Effects)
We also recorded the location of each of our plots. Each location was
measured by a different technician; does this affect our results? Let's take a
look.
First, let's examine the number of samples and the mean root growth rate for
each combination of treatment and location.
table(sponges$Location, sponges$Treatment)   # number of samples per location and treatment

(R code omitted)

It is difficult to extract much meaning from this table of means. We see a
range of values for all factors, but, as we recall from our previous labs, we
need a statistical test to tell us whether there is a significant difference among them.
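A table of cell means like the one described above can be built with tapply() by passing both factors in a list; rows follow the first factor and columns the second. This sketch uses R's built-in warpbreaks data (number of warp breaks by wool type and tension) as a stand-in for Location and Treatment; it is an illustration, not the lab's omitted code.

```r
# Stand-in data: built-in warpbreaks (response = breaks; factors = wool, tension)
# Number of observations in each wool x tension cell
cell_n = table(warpbreaks$wool, warpbreaks$tension)
cell_n

# Mean response in each cell: rows = wool (A, B), columns = tension (L, M, H)
cell_means = tapply(warpbreaks$breaks, list(warpbreaks$wool, warpbreaks$tension), mean)
round(cell_means, 2)
```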

Let's perform an ANOVA to determine which effects (the main effects and their
interaction) are statistically significant. To do this, we add Location to our
ANOVA model. Note that we're using a * between the factors in the model
definition below: this tells R to estimate the interaction effect of the two
factors in addition to their main effects.
# Two-way factorial ANOVA: main effects of Treatment and Location plus their interaction
sponge_anova2 = aov(sponges$RootGrowthRate.mm.d. ~ sponges$Treatment * sponges$Location)
summary(sponge_anova2)
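In R model formulas, A * B is shorthand for A + B + A:B, that is, the two main effects plus their interaction (A:B). This sketch on the built-in warpbreaks data confirms that both spellings produce the same terms and the same ANOVA table.

```r
fit_star = aov(breaks ~ wool * tension, data = warpbreaks)
fit_full = aov(breaks ~ wool + tension + wool:tension, data = warpbreaks)

# The three term labels: wool, tension, wool:tension
attr(terms(fit_star), "term.labels")

summary(fit_star)
summary(fit_full)   # same table as above
```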

Our conclusions are:
Main effects
o Treatment significantly affects root growth (p-value 0.000759 < 0.05)
o Location has no significant effect on root growth (p-value 0.322 > 0.05)
Interaction effects
o Treatment and Location do not have a significant interaction
effect (p-value 0.859 > 0.05)
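The three conclusions above can also be read programmatically from the two-way table. A minimal sketch on the built-in warpbreaks data collects each term's p-value and flags those below 0.05. summary() pads the row names with spaces, hence the trimws() call.

```r
fit2 = aov(breaks ~ wool * tension, data = warpbreaks)
tab  = summary(fit2)[[1]]

p_values = tab[["Pr(>F)"]]
names(p_values) = trimws(rownames(tab))   # remove padding spaces from term names
p_values = p_values[!is.na(p_values)]     # drop the Residuals row (no p-value)

data.frame(p_value = signif(p_values, 3), significant = p_values < 0.05)
```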
Location has a non-significant main effect. What do you think a plot of the
confidence intervals for each level of Location looks like?
plotmeans(sponges$RootGrowthRate.mm.d. ~ sponges$Location)

Now let's graph the interaction effects between Location and Treatment.
# Arguments: x-axis factor, trace factor (one line per level), response
interaction.plot(sponges$Treatment, sponges$Location, sponges$RootGrowthRate.mm.d.)

Components of this graph (in order of the arguments passed to interaction.plot):
1) The x-axis shows each level of Treatment
2) Each line corresponds to a level of Location (identified in the legend)
3) The y-axis is the mean of the response variable (root growth rate) for each combination of Treatment and Location
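If there were a significant interaction effect, the lines would not be parallel:
the change in mean root growth rate from one treatment to the next would differ
among locations, so the lines would diverge or cross. The lines in this chart do
not show a consistent departure from parallel, which agrees with the
non-significant interaction term (p = 0.859).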
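Because interaction.plot() draws cell means, "parallel" can also be checked numerically: with no interaction, the vertical gap between two lines should be roughly the same at every level on the x-axis. The sketch computes that gap on the built-in warpbreaks data (x-axis = tension, one line per wool type), a stand-in dataset whose wool-by-tension interaction is significant, so the gap changes noticeably across tension levels. With noisy data the lines are rarely perfectly parallel; the F-test for the interaction term is what decides whether the departure is larger than chance.

```r
# Cell means as interaction.plot() would draw them:
# rows = tension (x-axis levels), columns = wool (one line each)
cell_means = tapply(warpbreaks$breaks, list(warpbreaks$tension, warpbreaks$wool), mean)
cell_means

# Gap between the two lines at each x-axis level; roughly constant = parallel
gap = cell_means[, "A"] - cell_means[, "B"]
round(gap, 2)

# The corresponding plot, in the same argument order as in the lab:
# interaction.plot(warpbreaks$tension, warpbreaks$wool, warpbreaks$breaks)
```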
