A statistical analysis tool that separates the total variability found within a data set into two components: random and systematic factors. The random factors do not have any statistical influence on the given data set, while the systematic factors do. The ANOVA test is used to determine the impact independent variables have on the dependent variable in a regression analysis.
Analysis of variance
From Wikipedia, the free encyclopedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences between group means and their associated procedures (such as "variation" among and between groups). In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a Type I error. For this reason, ANOVAs are useful in comparing (testing) three or more means (groups or variables) for statistical significance.
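The inflation of the Type I error rate can be illustrated with a little arithmetic. If each pairwise t-test is run at significance level α and the tests are treated as independent (a simplifying assumption; pairwise tests on shared groups are correlated, so this is only an upper-bound illustration), the family-wise false-positive rate is 1 − (1 − α)^k for k comparisons:

```python
# Family-wise Type I error rate for k tests, each at level alpha,
# under the simplifying assumption that the tests are independent.
alpha = 0.05
for k in (1, 3, 6, 45):  # pairwise comparisons among 2, 3, 4, 10 groups
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 3))
# prints: 1 0.05 / 3 0.143 / 6 0.265 / 45 0.901
```

With only three groups the chance of at least one spurious "significant" difference already rises from 5% to roughly 14%, which is why a single ANOVA F-test is preferred.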
[Figure: dog-weight histograms illustrating three partitions of the population: no fit at all, fair fit, and very good fit]
The analysis of variance can be used as an exploratory tool to explain observations. A dog show provides an example. A dog show is not a random sampling of the breed: it is typically limited to dogs that are male, adult, pure-bred, and exemplary. A histogram of dog weights from a show might plausibly be rather complex, like the yellow-orange distribution shown in the illustrations.

An attempt to explain the weight distribution by dividing the dog population into groups (young vs. old, short-haired vs. long-haired) would probably be a failure (no fit at all): the groups (shown in blue) have a large variance and the means are very close. An attempt to explain the weight distribution by (pet vs. working breed, less athletic vs. more athletic) would probably be somewhat more successful (fair fit): the heaviest show dogs are likely to be big, strong working breeds. An attempt to explain weight by breed is likely to produce a very good fit: all Chihuahuas are light and all St. Bernards are heavy. The difference in weights between Setters and Pointers does not justify separate breeds. The analysis of variance provides the formal tools to justify these intuitive judgments.

A common use of the method is the analysis of experimental data or the development of models. The method has some advantages over correlation: not all of the data must be numeric, and one result of the method is a judgment of the confidence in an explanatory relationship.
ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis of experimental data. A statistical hypothesis test is a method of making decisions using data. A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis. A statistically significant result (when a probability (p-value) is less than a threshold (significance level)) justifies the rejection of the null hypothesis.

In the typical application of ANOVA, the null hypothesis is that all groups are simply random samples of the same population. This implies that all treatments have the same effect (perhaps none). Rejecting the null hypothesis implies that different treatments result in altered effects. By construction, hypothesis testing limits the rate of Type I errors (false positives leading to false scientific claims) to a significance level. Experimenters also wish to limit Type II errors (false negatives resulting in missed scientific discoveries). The Type II error rate is a function of several things, including sample size (positively correlated with experiment cost), significance level (when the standard of proof is high, the chances of overlooking a discovery are also high) and effect size (when the effect is obvious to the casual observer, Type II error rates are low).

The terminology of ANOVA is largely from the statistical design of experiments. The experimenter adjusts factors and measures responses in an attempt to determine an effect. Factors are assigned to experimental units by a combination of randomization and blocking to ensure the validity of the results. Blinding keeps the weighing impartial. Responses show a variability that is partially the result of the effect and is partially random error.

ANOVA is the synthesis of several ideas and it is used for multiple purposes.
As a consequence, it is difficult to define concisely or precisely. "Classical ANOVA for balanced data does three things at once:

1. As exploratory data analysis, an ANOVA is an organization of an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
2. Comparisons of mean squares, along with F-tests ... allow testing of a nested sequence of models.
3. Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors."[1]

In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data. Additionally:

4. It is computationally elegant and relatively robust against violations of its assumptions.
5. ANOVA provides industrial-strength (multiple sample comparison) statistical analysis.
6. It has been adapted to the analysis of a variety of experimental designs.

As a result: ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."[2] ANOVA "is probably the most useful technique in the field of statistical inference."[3] ANOVA is difficult to teach, particularly for complex experiments, with split-plot designs being notorious.[4] In some cases the proper application of the method is best determined by problem pattern recognition followed by the consultation of a classic authoritative text.[5]
Lack-of-fit error
Error that occurs when the analysis omits one or more important terms or factors from the process model. Including replication in a DOE allows separation of experimental error into its components: lack of fit and random (pure) error.
Model
Mathematical relationship which relates changes in a given response to changes in one or more factors.
Random error
Error that occurs due to natural variation in the process. Random error is typically assumed to be normally distributed with zero mean and a constant variance. Random error is also called experimental error.
Randomization
A schedule for allocating treatment material and for conducting treatment combinations in a DOE such that the conditions in one run neither depend on the conditions of the previous run nor predict the conditions in the subsequent runs.[nb 1]
Replication
Performing the same treatment combination more than once. Including replication allows an estimate of the random error independent of any lack-of-fit error.
Responses
The output(s) of a process. Sometimes called dependent variable(s).
Treatment
A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.
In random-effects models, the estimates and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from those of the fixed-effects model.[8]
Independence of observations: this is an assumption of the model that simplifies the statistical analysis.
Normality: the distributions of the residuals are normal.
Equality (or "homogeneity") of variances, called homoscedasticity: the variance of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed-effects models, that is, that the errors (ε's) are independent and ε ~ N(0, σ²).
In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random-assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald A. Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.[14] Kempthorne and his students make an assumption of unit treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.[citation needed]
The assumption of unit-treatment additivity implies that, for every treatment j, the jth treatment has exactly the same effect t_j on every experimental unit.
The assumption of unit-treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of unit-treatment additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant. The use of unit-treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.
A constant may be subtracted from all observations (when equivalent to dropping leading digits) to simplify data entry.[27][28] This is an example of data coding.
ANOVA uses traditional standardized terminology. In the definitional equation of the sample variance, s² = Σ(y_i − ȳ)² / (n − 1), the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean. ANOVA estimates 3 sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means, and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total sum of squares SS into components related to the effects used in the model. For example, for a simplified ANOVA with one type of treatment at different levels:

SS_Total = SS_Error + SS_Treatments
The number of degrees of freedom DF can be partitioned in a similar way: one of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.
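The partitioning of sums of squares and degrees of freedom can be sketched in a few lines of pure Python (the treatment groups and values below are invented for illustration); the two identities SS_Total = SS_Treatments + SS_Error and DF_Total = DF_Treatments + DF_Error are checked at the end:

```python
# Partition the total sum of squares for a one-factor design.
# Groups and values are invented for illustration.
groups = {
    "t1": [5.0, 6.1, 5.5],
    "t2": [7.2, 6.8, 7.9],
    "t3": [9.1, 8.4, 8.8],
}

def mean(xs):
    return sum(xs) / len(xs)

all_obs = [y for ys in groups.values() for y in ys]
grand = mean(all_obs)

# Total SS: deviations of every observation from the grand mean
ss_total = sum((y - grand) ** 2 for y in all_obs)
# Error SS: deviations of observations from their own treatment mean
ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
# Treatment SS: deviations of treatment means from the grand mean,
# weighted by the number of observations in each treatment
ss_treat = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())

n, k = len(all_obs), len(groups)
df_total, df_treat, df_error = n - 1, k - 1, n - k

assert abs(ss_total - (ss_treat + ss_error)) < 1e-9
assert df_total == df_treat + df_error
```

The weighting by group size in ss_treat is what converts a variance of means back into the scale of a variance of observations.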
The F-test is used for comparing the components of the total deviation, because the test statistic is the ratio of two scaled sums of squares, each of which follows a scaled chi-squared distribution. The expected value of F is 1 + n·σ²_Treatment / σ²_Error (where n is the treatment sample size), which is 1 for no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two apparent experimental methods of increasing F are increasing the sample size and reducing the error variance by tight experimental controls.

The textbook method of concluding the hypothesis test is to compare the observed value of F with the critical value of F determined from tables. The critical value of F is a function of the numerator degrees of freedom, the denominator degrees of freedom and the significance level (α). If F ≥ F_Critical(numerator DF, denominator DF, α), then reject the null hypothesis. The computer method calculates the probability (p-value) of a value of F greater than or equal to the observed value. The null hypothesis is rejected if this probability is less than or equal to the significance level (α). The two methods produce the same result.

The ANOVA F-test is known to be nearly optimal in the sense of minimizing false negative errors for a fixed rate of false positive errors (maximizing power for a fixed significance level). To test the hypothesis that all treatments have exactly the same effect, the F-test's p-values closely approximate the permutation test's p-values: the approximation is particularly close when the design is balanced.[19][30] Such permutation tests characterize tests with maximum power against all alternative hypotheses, as observed by Rosenbaum.[nb 3]
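The permutation test mentioned above is easy to sketch in pure Python: shuffle the treatment labels many times and count how often the reshuffled F statistic reaches the observed one. The group data here are invented for illustration, and 5000 random shuffles stand in for the full permutation distribution:

```python
import random

# Permutation-test sketch for the one-way ANOVA F statistic.
# The three groups below are invented for illustration.
random.seed(0)
groups = [[4.1, 5.0, 4.6, 4.4], [5.9, 6.4, 5.5, 6.1], [5.2, 4.8, 5.6, 5.1]]

def f_stat(gs):
    obs = [y for g in gs for y in g]
    grand = sum(obs) / len(obs)
    k, n = len(gs), len(obs)
    mst = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in gs) / (k - 1)
    mse = sum((y - sum(g) / len(g)) ** 2 for g in gs for y in g) / (n - k)
    return mst / mse

observed = f_stat(groups)
sizes = [len(g) for g in groups]
pooled = [y for g in groups for y in g]

exceed, trials = 0, 5000
for _ in range(trials):
    random.shuffle(pooled)  # randomly reassign units to treatments
    start, resampled = 0, []
    for s in sizes:
        resampled.append(pooled[start:start + s])
        start += s
    if f_stat(resampled) >= observed:
        exceed += 1

p_value = exceed / trials  # fraction of permutations at least as extreme
print(round(observed, 2), p_value)
```

With a balanced design like this one, the permutation p-value tracks the tabulated F(2, 9) p-value closely, which is the approximation the text describes.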
The ANOVA F-test (of the null hypothesis that all treatments have exactly the same effect) is recommended as a practical test, because of its robustness against many alternative distributions.
models, "not as an alternative to regression but as a tool for summarizing complex high-dimensional inferences ..."[32]
Several standardized measures of effect have been proposed for ANOVA to summarize the strength of the association between a predictor(s) and the dependent variable (e.g., η², ω², or ƒ²) or the overall standardized difference (Ψ) of the complete model. Standardized effect-size estimates facilitate comparison of findings across studies and disciplines. However, while standardized effect sizes are commonly used in much of the professional literature, a non-standardized measure of effect size that has immediately "meaningful" units may be preferable for reporting purposes.[44]
Following ANOVA with pair-wise multiple-comparison tests has been criticized on several grounds.[44][48] There are many such tests (10 in one table) and recommendations regarding their use are vague or conflicting.[49][50]
One-way ANOVA is used to test for differences among two or more independent groups (means), e.g. different levels of urea application in a crop. Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test.[52] When there are only two means to compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
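The F = t² identity can be verified directly in pure Python (the two samples below are invented for illustration): compute the pooled-variance two-sample t statistic and the one-way ANOVA F statistic on the same data and compare.

```python
# Two-group check of F = t^2 (sample data invented for illustration).
a = [4.2, 5.1, 3.8, 4.9, 5.0]
b = [6.0, 5.7, 6.3, 5.5, 6.1]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):  # sum of squared deviations from the group mean
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

na, nb = len(a), len(b)
# Pooled-variance two-sample t statistic
sp2 = (ss(a) + ss(b)) / (na + nb - 2)
t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# One-way ANOVA F statistic for the same two groups
grand = mean(a + b)
sst = na * (mean(a) - grand) ** 2 + nb * (mean(b) - grand) ** 2
sse = ss(a) + ss(b)
F = (sst / 1) / (sse / (na + nb - 2))

print(round(F, 6), round(t ** 2, 6))  # F and t^2 agree up to rounding
```

The identity holds exactly in exact arithmetic for any two samples, not just these.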
Factorial ANOVA is used when the experimenter wants to study the interaction effects among the treatments. Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a longitudinal study).
Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
While ANOVA is conservative (in maintaining a significance level) against multiple comparisons in one dimension, it is not conservative against comparisons in multiple dimensions.[55]
Purpose

The reason for doing an ANOVA is to see if there is any difference between groups on some variable. For example, you might have data on student performance in non-assessed tutorial exercises as well as their final grading. You are interested in seeing if tutorial performance is related to final grade. ANOVA allows you to break up the group according to the grade and then see if performance is different across these grades. ANOVA is available for both parametric (score data) and non-parametric (ranking/ordering) data.
Types of ANOVA

One-way between groups

The example given above is called a one-way between groups model. You are looking at the differences between the groups. There is only one grouping (final grade) which you are using to define the groups. This is the simplest version of ANOVA. This type of ANOVA can also be used to compare variables between different groups - tutorial performance from different intakes.

One-way repeated measures

A one-way repeated measures ANOVA is used when you have a single group on which you have measured something a few times.
For example, you may have a test of understanding of Classes. You give this test at the beginning of the topic, at the end of the topic and then at the end of the subject. You would use a one-way repeated measures ANOVA to see if student performance on the test changed over time.

Two-way between groups

A two-way between groups ANOVA is used to look at complex groupings. For example, the grades by tutorial analysis could be extended to see if overseas students performed differently to local students. What you would have from this form of ANOVA is:

The effect of final grade
The effect of overseas versus local
The interaction between final grade and overseas/local

Each of the main effects is a one-way test. The interaction effect is simply asking "is there any significant difference in performance when you take final grade and overseas/local acting together?"

Two-way repeated measures

This version of ANOVA simply uses the repeated measures structure and includes an interaction effect. In the example given for one-way between groups, you could add Gender and see if there was any joint effect of gender and time of testing - i.e. do males and females differ in the amount they remember/absorb over time.
Non-parametric and parametric

ANOVA is available for score or interval data as parametric ANOVA. This is the type of ANOVA you do from the standard menu options in a statistical package.
The non-parametric version is usually found under the heading "Nonparametric test". It is used when you have rank or ordered data. You cannot use parametric ANOVA when your data is below interval measurement. Where you have categorical data you do not have an ANOVA method - you would have to use Chi-square, which is about interaction rather than about differences between groups.

How it's done

What ANOVA looks at is the way groups differ internally versus what the difference is between them. To take the above example:

1. ANOVA calculates the mean for each of the final grading groups (HD, D, Cr, P, N) on the tutorial exercise figure - the Group Means.
2. It calculates the mean for all the groups combined - the Overall Mean.
3. Then it calculates, within each group, the total deviation of each individual's score from the Group Mean - Within Group Variation.
4. Next, it calculates the deviation of each Group Mean from the Overall Mean - Between Group Variation.
5. Finally, ANOVA produces the F statistic, which is the ratio of Between Group Variation to Within Group Variation.

If the Between Group Variation is significantly greater than the Within Group Variation, then it is likely that there is a statistically significant difference between the groups. The statistical package will tell you if the F ratio is significant or not. All versions of ANOVA follow these basic principles but the sources of Variation get more complex as the number of groups and the interaction effects increase.
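The five steps above can be sketched in pure Python. The grade groups (HD, D, Cr, P, N) come from the example; the tutorial scores themselves are invented for illustration:

```python
# The five ANOVA steps, sketched in pure Python.
# Tutorial scores are invented for illustration.
groups = {
    "HD": [82, 88, 91],
    "D":  [74, 70, 77],
    "Cr": [65, 68, 63],
    "P":  [55, 60, 58],
    "N":  [40, 47, 44],
}

def mean(xs):
    return sum(xs) / len(xs)

# 1. Group Means
group_means = {g: mean(xs) for g, xs in groups.items()}
# 2. Overall Mean
overall = mean([x for xs in groups.values() for x in xs])
# 3. Within Group Variation (squared deviations from each Group Mean)
within = sum((x - group_means[g]) ** 2 for g, xs in groups.items() for x in xs)
# 4. Between Group Variation (Group Means vs the Overall Mean, weighted)
between = sum(len(xs) * (group_means[g] - overall) ** 2 for g, xs in groups.items())
# 5. F statistic: between-group mean square over within-group mean square
k = len(groups)
n = sum(len(xs) for xs in groups.values())
F = (between / (k - 1)) / (within / (n - k))
print(round(F, 2))
```

A statistical package would then compare this F against the F(k−1, n−k) distribution to report significance.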
Comments

It is unlikely that you would do an analysis of variance by hand. Except for small data sets it is very time consuming.
Available software

SPSS

The ANOVA routines in SPSS are OK for simple one-way analyses. Anything more complicated gets difficult. All statistical packages (SAS, Minitab etc.) provide for ANOVA.

Excel

Excel allows you to do ANOVA from the Data Analysis add-on. The instructions are not good.
The corresponding mean squares are:

MST = SST / DFT
MSE = SSE / DFE

The F-test

The test statistic, used in testing the equality of treatment means, is: F = MST / MSE. The critical value is the tabular value of the F distribution, based on the chosen significance level α and the degrees of freedom DFT and DFE. The calculations are displayed in an ANOVA table, as follows:

ANOVA table

Source             SS    DF     MS                   F
Treatments         SST   k - 1  MST = SST / (k - 1)  MST / MSE
Error              SSE   N - k  MSE = SSE / (N - k)
Total (corrected)  SS    N - 1
The word "source" stands for source of variation. Some authors prefer to use "between" and "within" instead of "treatments" and "error", respectively.

ANOVA table example

A numerical example

The data below resulted from measuring the difference in resistance resulting from subjecting identical resistors to three different temperatures for a period of 24 hours. The sample size of each group was 5. In the language of design of experiments, we have an experiment in which each of three treatments was replicated 5 times.

         Level 1  Level 2  Level 3
         6.9      8.3      8.0
         5.4      6.8      10.5
         5.8      7.8      8.1
         4.6      9.2      6.9
         4.0      6.5      9.3
means    5.34     7.72     8.56
The resulting ANOVA table is:

Example ANOVA table

Source             SS       DF   MS      F
Treatments         27.897   2    13.949  9.59
Error              17.452   12   1.454
Total (corrected)  45.349   14
Correction Factor  779.041  1

Interpretation of the ANOVA table
The test statistic is the F value of 9.59. Using an α of .05, we have F.05; 2, 12 = 3.89 (see the F distribution table in Chapter 1). Since the test statistic is much larger than the critical value, we reject the null hypothesis of equal population means and conclude that there is a (statistically) significant difference among the population means. The p-value for 9.59 is .00325, so the test statistic is significant at that level. The populations here are resistor readings while operating under the three different temperatures. What we do not know at this point is whether the three means are all different or which of the three means is different from the other two, and by how much. There are several techniques we might use to further analyze the differences. These are:
constructing confidence intervals around the difference of two means,
estimating combinations of factor levels with confidence bounds, and
multiple comparisons of combinations of factor levels tested simultaneously.
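The entries in the example ANOVA table can be reproduced with a short pure-Python check on the resistor data (only the sums of squares and F are recomputed here; the p-value would need an F-distribution routine):

```python
# Recompute the ANOVA table entries for the resistor data above.
levels = [
    [6.9, 5.4, 5.8, 4.6, 4.0],   # Level 1
    [8.3, 6.8, 7.8, 9.2, 6.5],   # Level 2
    [8.0, 10.5, 8.1, 6.9, 9.3],  # Level 3
]

def mean(xs):
    return sum(xs) / len(xs)

all_obs = [y for lvl in levels for y in lvl]
grand = mean(all_obs)
k, n = len(levels), len(all_obs)

# Treatment SS: weighted squared deviations of level means from the grand mean
sst = sum(len(lvl) * (mean(lvl) - grand) ** 2 for lvl in levels)
# Error SS: squared deviations of observations from their own level mean
sse = sum((y - mean(lvl)) ** 2 for lvl in levels for y in lvl)
F = (sst / (k - 1)) / (sse / (n - k))

print(round(sst, 3), round(sse, 3), round(F, 2))  # 27.897 17.452 9.59
```

Since 9.59 exceeds the critical value 3.89 from the table, this reproduces the rejection of the null hypothesis reported above.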