
ANOVA

(Analysis of Variance)
Instructor: Dr. Syed Amir Iqbal

Introduction
Statistically based experimental design techniques are particularly useful in the engineering world for solving many important problems: discovery of new basic phenomena that can lead to new products, and commercialization of new technology including new product development, new process development, and improvement of existing products and processes. Most processes can be described in terms of several controllable variables, such as temperature, pressure, and feed rate. By using designed experiments, engineers can determine which subset of the process variables has the greatest influence on process performance. The results of such an experiment can lead to:
Improved process yield
Reduced variability in the process and closer conformance to nominal or target requirements
Reduced design and development time
Reduced cost of operation

Introduction
Some typical applications of statistically designed experiments in engineering design include:
Evaluation and comparison of basic design configurations
Evaluation of different materials
Selection of design parameters so that the product will work well under a wide variety of field conditions (or so that the design will be robust)
Determination of key product design parameters that affect product performance

Designed experiments are usually employed sequentially. That is, the first experiment with a complex system (perhaps a manufacturing process) that has many controllable variables is often a screening experiment designed to determine which variables are most important. Subsequent experiments are used to refine this information and determine which adjustments to these critical variables are required to improve the process.

Introduction
Every experiment involves a sequence of activities:
Conjecture: the original hypothesis that motivates the experiment.
Experiment: the test performed to investigate the conjecture.
Analysis: the statistical analysis of the data from the experiment.
Conclusion: what has been learned about the original conjecture from the experiment. Often the experiment will lead to a revised conjecture, a new experiment, and so forth.

COMPLETELY RANDOMIZED SINGLE-FACTOR EXPERIMENT


Example: Tensile Strength. A manufacturer of paper used for making grocery bags is interested in improving the tensile strength of the product. Product engineering thinks that tensile strength is a function of the hardwood concentration in the pulp and that the range of hardwood concentrations of practical interest is between 5 and 20%. A team of engineers responsible for the study decides to investigate four levels of hardwood concentration: 5%, 10%, 15%, and 20%. They decide to make up six test specimens at each concentration level, using a pilot plant. All 24 specimens are tested on a laboratory tensile tester, in random order. The data from this experiment are shown in the table.
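To make the randomization concrete, here is a minimal Python sketch of one way the random run order could be generated; the seed and the run labels are illustrative, not part of the original study:

import random

# Four hardwood concentration levels, six specimens (replicates) each
levels = [5, 10, 15, 20]            # percent hardwood
replicates = 6

# Build the list of 24 planned runs, then randomize the test order
runs = [(conc, rep) for conc in levels for rep in range(1, replicates + 1)]
random.seed(1)                      # illustrative seed so the order is reproducible
random.shuffle(runs)

for order, (conc, rep) in enumerate(runs, start=1):
    print(f"Run {order:2d}: test specimen {rep} at {conc}% hardwood")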

COMPLETELY RANDOMIZED SINGLE-FACTOR EXPERIMENT


Example: Tensile Strength

This is an example of a completely randomized single-factor experiment with four levels of the factor. The levels of the factor are sometimes called treatments, and each treatment has six observations or replicates. The role of randomization in this experiment is extremely important: by randomizing the order of the 24 runs, the effect of any nuisance variable that may influence the observed tensile strength is approximately balanced out.

Analysis of Variance
Suppose we wish to compare a different levels of a single factor, where a denotes the number of levels. Sometimes, each factor level is called a treatment, a very general term that can be traced to the early applications of experimental design methodology in the agricultural sciences. The response for each of the a treatments is a random variable. The observed data would appear as shown in the table. An entry in the table, say yij, represents the jth observation taken under treatment i. We initially consider the case in which there are an equal number of observations, n, on each treatment.

Analysis of Variance
We may describe the observations in the table by the linear statistical model

Yij = μ + τi + εij,   i = 1, 2, . . . , a;  j = 1, 2, . . . , n

where Yij is a random variable denoting the (ij)th observation, μ is a parameter common to all treatments called the overall mean, τi is a parameter associated with the ith treatment called the ith treatment effect, and εij is a random error component. We will assume that the errors εij are normally and independently distributed with mean zero and variance σ². Therefore, each treatment can be thought of as a normal population with mean μi = μ + τi and variance σ². The equation above is the underlying model for a single-factor experiment. Since we require that the observations be taken in random order and that the environment in which the treatments are used be as uniform as possible, this experimental design is called a completely randomized design (CRD).
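To make the model concrete, the following Python sketch simulates observations from a single-factor completely randomized design; the values chosen for μ, the τi, and σ are assumptions for illustration only:

import numpy as np

rng = np.random.default_rng(0)

mu = 15.0                       # overall mean (illustrative)
tau = [-2.0, -1.0, 1.0, 2.0]    # treatment effects, one per level (illustrative)
sigma = 2.0                     # error standard deviation (illustrative)
n = 6                           # replicates per treatment

# y[i, j] = mu + tau_i + eps_ij, with eps_ij ~ N(0, sigma^2)
y = np.array([[mu + t + rng.normal(0.0, sigma) for _ in range(n)] for t in tau])
print(y.round(2))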

Analysis of Variance

The a factor levels in the experiment could have been chosen in two different ways. First, the experimenter could have specifically chosen the a treatments. In this situation, we wish to test hypotheses about the treatment means, and conclusions cannot be extended to similar treatments that were not considered. In addition, we may wish to estimate the treatment effects. This is called the fixed-effects model. Alternatively, the a treatments could be a random sample from a larger population of treatments. In this situation, we would like to be able to extend the conclusions (which are based on the sample of treatments) to all treatments in the population, whether or not they were explicitly considered in the experiment. Here the treatment effects τi are random variables, and knowledge about the particular ones investigated is relatively unimportant. Instead, we test hypotheses about the variability of the τi and try to estimate this variability. This is called the random effects, or components of variance, model. Here we discuss and develop the analysis of variance for the fixed-effects model.

Analysis of Variance
Let yi. represent the total of the observations under the ith treatment and ȳi. represent the average of the observations under the ith treatment. Similarly, let y.. represent the grand total of all observations and ȳ.. represent the grand mean of all observations. Expressed mathematically,

yi. = Σj yij,   ȳi. = yi. / n,   i = 1, 2, . . . , a
y.. = Σi Σj yij,   ȳ.. = y.. / N

where N = an is the total number of observations. Thus, the dot subscript notation implies summation over the subscript that it replaces. We are interested in testing the equality of the a treatment means μ1, μ2, . . . , μa. We are going to test the hypotheses

H0: μ1 = μ2 = · · · = μa
H1: μi ≠ μj for at least one pair (i, j)

Total Variation, Variation Within Treatments, & Variation Between Treatments


Thus, if the null hypothesis is true, each observation consists of the overall mean μ plus a realization of the random error component εij. This is equivalent to saying that all N observations are taken from a normal distribution with mean μ and variance σ². Therefore, if the null hypothesis is true, changing the levels of the factor has no effect on the mean response. The ANOVA partitions the total variability in the sample data into two component parts. Then, the test of the hypotheses above is based on a comparison of two independent estimates of the population variance. The total variability in the data is described by the total sum of squares

SST = Σi Σj (yij − ȳ..)²

Total Variation, Variation Within Treatments, & Variation Between Treatments


The partition of the total sum of squares is given by the identity

SST = SSTreatments + SSE

where

SSTreatments = n Σi (ȳi. − ȳ..)²   and   SSE = Σi Σj (yij − ȳi.)²

This identity shows that the total variability in the data, measured by the total corrected sum of squares SST, can be partitioned into a sum of squares of differences between treatment means and the grand mean, denoted SSTreatments, and a sum of squares of differences of observations within a treatment from the treatment mean, denoted SSE. Differences between observed treatment means and the grand mean measure the differences between treatments, while differences of observations within a treatment from the treatment mean can be due only to random error.
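The partition can be verified numerically; the sketch below uses a small made-up data array (three treatments, four replicates) purely to show that the two sides of the identity agree:

import numpy as np

# Rows = treatments, columns = replicates (made-up numbers for illustration)
y = np.array([[5.0, 7.0, 6.0, 8.0],
              [9.0, 11.0, 10.0, 12.0],
              [4.0, 6.0, 5.0, 7.0]])
a, n = y.shape

grand_mean = y.mean()
treat_means = y.mean(axis=1)

ss_total = ((y - grand_mean) ** 2).sum()
ss_treat = n * ((treat_means - grand_mean) ** 2).sum()
ss_error = ((y - treat_means[:, None]) ** 2).sum()

print(ss_total, ss_treat + ss_error)   # the two numbers agree: SST = SSTreatments + SSE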

Total Variation, Variation Within Treatments, & Variation Between Treatments


We define the total variation, denoted by V, as the sum of the squares of the deviations of each measurement from the grand mean X̄:

V = Σj Σk (Xjk − X̄)²

By writing the identity

Xjk − X̄ = (Xjk − X̄j.) + (X̄j. − X̄)

and then squaring and summing over j and k (the cross-product term vanishes), we have

V = Σj Σk (Xjk − X̄j.)² + Σj Σk (X̄j. − X̄)²      (5)

or, since the second summand does not depend on k,

V = Σj Σk (Xjk − X̄j.)² + b Σj (X̄j. − X̄)²      (6)

We call the first summation on the right-hand side of equations (5) and (6) the variation within treatments (since it involves the squares of the deviations of Xjk from the treatment means X̄j.) and denote it by VW. Thus

VW = Σj Σk (Xjk − X̄j.)²

Total Variation, Variation Within Treatments, & Variation Between Treatments


The second summation on the right-hand side of equations (5) and (6) is called the variation between treatments (since it involves the squares of the deviations of the various treatment means X̄j. from the grand mean X̄) and is denoted by VB. Thus

VB = Σj Σk (X̄j. − X̄)² = b Σj (X̄j. − X̄)²

Shortcut Methods for Obtaining Variations


To minimize the labor of computing the above variations, the following forms are convenient:

V = Σj Σk Xjk² − T²/(ab)
VB = (1/b) Σj Tj.² − T²/(ab)
VW = V − VB

where T is the total of all values Xjk and Tj. is the total of all values in the jth treatment:

T = Σj Σk Xjk,   Tj. = Σk Xjk
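These shortcut formulas translate directly into code; the array values in the sketch below are made up solely for illustration:

import numpy as np

# a treatments (rows) by b observations (columns); made-up coded values
X = np.array([[2.0, 3.0, 1.0, 4.0],
              [5.0, 4.0, 6.0, 5.0],
              [3.0, 2.0, 4.0, 3.0]])
a, b = X.shape

T = X.sum()                  # grand total of all values
Tj = X.sum(axis=1)           # treatment (row) totals Tj.

V  = (X ** 2).sum() - T ** 2 / (a * b)        # total variation
VB = (Tj ** 2).sum() / b - T ** 2 / (a * b)   # variation between treatments
VW = V - VB                                   # variation within treatments

print(V, VB, VW)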

Expected Values of the Variations


The expected values of VW, VB, and V are given by (for proofs, any standard text may be consulted)

E(VW) = a(b − 1)σ²
E(VB) = (a − 1)σ² + b Σj αj²
E(V) = (ab − 1)σ² + b Σj αj²

where αj denotes the jth treatment effect. From these equations it follows that

ŜW² = VW / [a(b − 1)]

is always an unbiased estimate of σ², whereas

ŜB² = VB / (a − 1)

is an unbiased estimate of σ² only when the null hypothesis of equal treatment means holds (that is, when every αj = 0).
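A brief Monte Carlo sketch, with assumed values of σ and the αj, can make these expected values plausible by checking the two unbiasedness claims numerically:

import numpy as np

rng = np.random.default_rng(42)
a, b = 3, 4                           # treatments and observations per treatment
sigma = 2.0                           # assumed error standard deviation
alpha = np.array([-1.0, 0.0, 1.0])    # assumed treatment effects (sum to zero)

sw2_vals, sb2_vals = [], []
for _ in range(20000):
    X = alpha[:, None] + rng.normal(0.0, sigma, size=(a, b))
    row_means = X.mean(axis=1)
    grand = X.mean()
    VW = ((X - row_means[:, None]) ** 2).sum()
    VB = b * ((row_means - grand) ** 2).sum()
    sw2_vals.append(VW / (a * (b - 1)))
    sb2_vals.append(VB / (a - 1))

print(np.mean(sw2_vals))   # close to sigma^2 = 4 regardless of the alpha values
print(np.mean(sb2_vals))   # close to sigma^2 + b*sum(alpha^2)/(a-1) = 4 + 4*2/2 = 8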

The F Test for the Null Hypothesis Of Equal Means


If the null hypothesis H0 is not true (i.e., if the treatment means are not all equal), we can expect ŜB² to be greater than σ², with the effect becoming more pronounced as the discrepancy between the means increases. On the other hand, we can expect ŜW² to be close to σ² regardless of whether the means are equal. It follows that a good statistic for testing H0 is provided by the ratio ŜB²/ŜW². If this statistic is significantly large, we conclude that there is a significant difference between the treatment means and reject H0; otherwise, we do not reject H0. The statistic F = ŜB²/ŜW² has the F distribution with a − 1 and a(b − 1) degrees of freedom.

Analysis-of-Variance tables
The calculations required for the above test are summarized in the analysis-of-variance (ANOVA) table:

Variation                          Degrees of freedom    Mean square              F
Between treatments, VB             a − 1                 ŜB² = VB/(a − 1)         ŜB²/ŜW², with a − 1 and a(b − 1) degrees of freedom
Within treatments, VW = V − VB     a(b − 1)              ŜW² = VW/[a(b − 1)]
Total, V                           ab − 1

In practice, we would compute V and VB using either the long method or the shortcut method and then obtain VW = V − VB. It should be noted that the degrees of freedom for the total variation (i.e., ab − 1) are equal to the sum of the degrees of freedom for the between-treatments and within-treatments variations.
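The table is straightforward to produce in code. The sketch below mirrors the shortcut formulas and uses scipy only to attach a p-value; the data passed in at the end are placeholders for illustration:

import numpy as np
from scipy import stats

def anova_table(X):
    """One-way ANOVA table for an a-by-b array X (rows = treatments)."""
    a, b = X.shape
    T = X.sum()
    Tj = X.sum(axis=1)
    V = (X ** 2).sum() - T ** 2 / (a * b)           # total variation
    VB = (Tj ** 2).sum() / b - T ** 2 / (a * b)     # between treatments
    VW = V - VB                                     # within treatments
    sb2 = VB / (a - 1)
    sw2 = VW / (a * (b - 1))
    F = sb2 / sw2
    p = stats.f.sf(F, a - 1, a * (b - 1))
    print(f"Between: SS={VB:.3f}  df={a - 1}   MS={sb2:.3f}  F={F:.3f}  p={p:.4f}")
    print(f"Within : SS={VW:.3f}  df={a * (b - 1)}   MS={sw2:.3f}")
    print(f"Total  : SS={V:.3f}  df={a * b - 1}")

# Placeholder data: 3 treatments, 4 observations each (illustrative only)
anova_table(np.array([[12.0, 15.0, 14.0, 13.0],
                      [16.0, 18.0, 17.0, 19.0],
                      [11.0, 13.0, 12.0, 14.0]]))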

Example
Table shows the yields in bushels per acre of a certain variety of wheat grown in a particular type of soil treated with chemicals A, B, or C. Find (a) the mean yields for the different treatments, (b) the grand mean for all treatments, (c) the total variation, (d) the variation between treatments, and (e) the variation within treatments.

(a) The treatment (row) means for the coded data in the table (the original yields minus 45) are 4, 3, and 5, respectively. Thus the mean yields, obtained by adding 45 to these, are 49, 48, and 50 bushels per acre for A, B, and C, respectively.

(b) The grand mean for the coded data is 4. Thus the grand mean for the original set of data is 45 + 4 = 49 bushels per acre.

Example
(c) The total variation (computed from the coded data) is V = 14.

(d) The variation between treatments is VB = 8.

(e) The variation within treatments is VW = V − VB = 14 − 8 = 6.

Find an unbiased estimate of the population variance σ² from (a) the variation between treatments under the null hypothesis of equal treatment means and (b) the variation within treatments.

(a) ŜB² = VB/(a − 1) = 8/2 = 4
(b) ŜW² = VW/[a(b − 1)] = 6/9 = 2/3

Referring to the previous problem, can we reject the null hypothesis of equal means at significance levels of (a) 0.05 and (b) 0.01? We have F = ŜB²/ŜW² = 4/(2/3) = 6 with a − 1 = 3 − 1 = 2 degrees of freedom and a(b − 1) = 3(4 − 1) = 9 degrees of freedom.

(a) Referring to the F table with ν1 = 2 and ν2 = 9, we see that F0.95 = 4.26. Since F = 6 > F0.95, we can reject the null hypothesis of equal means at the 0.05 level.
(b) Referring to the F table with ν1 = 2 and ν2 = 9, we see that F0.99 = 8.02. Since F = 6 < F0.99, we cannot reject the null hypothesis of equal means at the 0.01 level.
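The same conclusions can be checked with scipy, using the VB = 8 and VW = 6 values implied by the worked example (a sketch, not part of the original solution):

from scipy import stats

a, b = 3, 4
VB, VW = 8.0, 6.0                      # values from the worked example

sb2 = VB / (a - 1)                     # 4
sw2 = VW / (a * (b - 1))               # 2/3
F = sb2 / sw2                          # 6

print(stats.f.ppf(0.95, a - 1, a * (b - 1)))   # about 4.26, so F = 6 leads to rejection at 0.05
print(stats.f.ppf(0.99, a - 1, a * (b - 1)))   # about 8.02, so F = 6 is not significant at 0.01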

Example
A company wishes to purchase one of five different machines: A, B, C, D, or E. In an experiment designed to test whether there is a difference in the machines' performance, each of five experienced operators works on each of the machines for equal times. The table shows the numbers of units produced per machine. Test the hypothesis that there is no difference between the machines at significance levels of (a) 0.05 and (b) 0.01.

Subtract a suitable number, say 60, from all the data to obtain a coded table. Then compute the variations V, VB, and VW from the coded data as before.

We now form the analysis-of-variance table. For 4 and 20 degrees of freedom, we have F0.95 = 2.87. Thus we cannot reject the null hypothesis at the 0.05 level and therefore certainly cannot reject it at the 0.01 level.
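As a quick check, the tabulated critical value can be reproduced with scipy (assumed to be available):

from scipy import stats

# F critical value for 4 and 20 degrees of freedom at the 0.05 level
print(stats.f.ppf(0.95, 4, 20))   # about 2.87, the value used above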

Problem 13-5.
The compressive strength of concrete is being studied, and four different mixing techniques are being investigated. The following data have been collected.
Mixing Technique    Compressive Strength
1                   3129   3000   2865   2890
2                   3200   3300   2975   3150
3                   2800   2900   2985   3050
4                   2600   2700   2600   2765

(a) Test the hypothesis that mixing techniques affect the strength of the concrete. Use α = 0.05.
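One way to carry out this test is scipy's one-way ANOVA applied to the four techniques' data from the table above (a sketch; the decision rule is noted in the comment):

from scipy import stats

# Compressive strength by mixing technique (from the table above)
tech1 = [3129, 3000, 2865, 2890]
tech2 = [3200, 3300, 2975, 3150]
tech3 = [2800, 2900, 2985, 3050]
tech4 = [2600, 2700, 2600, 2765]

F, p = stats.f_oneway(tech1, tech2, tech3, tech4)
print(F, p)   # reject the hypothesis of equal mean strengths at the 0.05 level if p < 0.05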

Problem 13-3.
In Design and Analysis of Experiments, 7th edition (John Wiley & Sons, 2009), D. C. Montgomery described an experiment in which the tensile strength of a synthetic fiber was of interest to the manufacturer. It is suspected that strength is related to the percentage of cotton in the fiber. Five levels of cotton percentage were used, and five replicates were run in random order, resulting in the data below. (a) Does cotton percentage affect breaking strength? Draw comparative box plots and perform an analysis of variance. Use α = 0.05.
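A sketch of the requested analysis is given below. Since the cotton-percentage data are not reproduced here, the lists contain dummy values that must be replaced with the five replicates observed at each level; matplotlib and scipy are assumed to be available:

import matplotlib.pyplot as plt
from scipy import stats

# Placeholder: replace each inner list with the five replicates observed at one
# cotton-percentage level (the actual values are in the problem's data table)
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # level 1 (dummy values)
    [2.0, 3.0, 4.0, 5.0, 6.0],   # level 2 (dummy values)
    [3.0, 4.0, 5.0, 6.0, 7.0],   # level 3 (dummy values)
    [4.0, 5.0, 6.0, 7.0, 8.0],   # level 4 (dummy values)
    [5.0, 6.0, 7.0, 8.0, 9.0],   # level 5 (dummy values)
]

# Comparative box plots, one box per cotton-percentage level
plt.boxplot(samples)
plt.xticks(range(1, 6), [f"level {i}" for i in range(1, 6)])
plt.xlabel("Cotton percentage level")
plt.ylabel("Tensile strength")
plt.show()

# One-way ANOVA at alpha = 0.05
F, p = stats.f_oneway(*samples)
print(F, p)   # conclude that cotton percentage affects breaking strength if p < 0.05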
