
Chapter 7

Incomplete block designs


Contents
7.1 Introduction
7.2 Example: Investigate differences in water quality

7.1 Introduction

Blocking (or stratification or pairing) is a fundamental aspect of statistics. The idea behind blocking is to control for sources of variation that make it difficult to detect differences in the mean response between treatments. For example, in the stream slope example, the density of fish will vary among streams due to intrinsic differences among streams (e.g. different productivity) that are difficult to measure. Consequently, a good experiment will examine all treatments (e.g. the different stream slopes) in all streams so that the intrinsic differences in density among streams cancel out when comparisons among treatments are made. In the examples in the earlier chapter, blocks were complete in that every treatment occurred at least once in every block. In some cases, blocks are incomplete, either by design (e.g. blocks are too small to hold every treatment) or by accident (e.g. missing values). Some care needs to be taken in the analysis of incomplete block designs, but a basic analysis is straightforward with modern software. Please refer to the earlier chapter on the analysis of complete block designs, as the points made there about examining assumptions are equally pertinent here.


7.2 Example: Investigate differences in water quality

Water quality monitoring studies often take the form of incomplete block designs. For example, the following data represent TSS (total suspended solids) in water samples taken upstream of a development (the reference sample, ref), at the development (the mid-stream sample, mid), or downstream of the development (the ds sample). Samples are taken during storm events, when water quality may be compromised by the development. Here is a small set of data¹:

Location    Ref    Mid     DS
Storm 1       .     51    173
Storm 2     137 (single reading; location column not shown)
Storm 3      25    100    170
Storm 4      20      .    110

A period (.) represents data that are missing. We will assume that the missing data are MCAR (Missing Completely at Random), i.e. that missingness is unrelated to the value of TSS or any other measurable covariate in the study. One way this could be violated is if a missing value indicates that the TSS reading exceeded the tolerance of the measurement device. In many cases, some of the data may also be censored, i.e. reported as < LDL or > UDL, where LDL and UDL are the lower and upper detection limits. If censored data are present, more advanced methods are available.

Water quality varies among the storm events in some unknown fashion, but it is thought that all locations should be influenced in the same way. For example, events with large amounts of precipitation may increase the TSS at all locations.

How should such data be analyzed? Looking at the raw data, it appears that TSS levels at the DS site are about three times those at the Mid site and, in turn, that TSS levels at the Mid site are about twice those at the Ref site. However, a comparison of the simple averages of the values at each location is unfair because not all locations were measured in all storm events, so the different averages would compare different combinations of storm events.

An incomplete-block analysis takes into account the pattern of missing values. For example, the comparison of the ref and mid locations should use the data from Storm 3; the comparison of the ref and ds locations should use the Storm 3 and Storm 4 events; and the comparison of the mid and ds locations should use the Storm 1 and Storm 3 events.

The dataset is available in the water-quality.jmp data file in the Sample Program Library at http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms. We start by entering the data into JMP in the usual way:
¹ Such a small set of data likely has very poor power to detect anything but very large differences in water quality among the three locations. Before conducting such a study, please perform a power analysis to ensure that sufficient samples are taken.
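For readers working outside JMP, the same data can be held in long format, one row per reading. The sketch below (Python, with hypothetical variable names) also lists, for each pair of locations, the storms in which both were measured, which is the within-storm information that an incomplete-block analysis draws on. The placement of the single Storm 2 reading at the ds location is an assumption made only for illustration; the pairing logic does not depend on it.

```python
from itertools import combinations

# TSS readings in long format: (storm, location, TSS).
# ASSUMPTION: the single Storm 2 reading (137) is placed at "ds" purely for
# illustration; a storm with one reading cannot contribute to any pair.
readings = [
    (1, "mid", 51), (1, "ds", 173),
    (2, "ds", 137),
    (3, "ref", 25), (3, "mid", 100), (3, "ds", 170),
    (4, "ref", 20), (4, "ds", 110),
]

# Group the readings by storm (the block).
by_storm = {}
for storm, location, tss in readings:
    by_storm.setdefault(storm, {})[location] = tss

# For each pair of locations, list the storms in which both were measured.
shared_storms = {}
for a, b in combinations(["ref", "mid", "ds"], 2):
    shared_storms[(a, b)] = [
        storm for storm, obs in sorted(by_storm.items())
        if a in obs and b in obs
    ]

for pair, storms in shared_storms.items():
    print(pair, "-> storms", storms)
```

A storm in which only one location was sampled appears in none of the lists, so it carries no within-storm information about differences among locations.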

© 2012 Carl James Schwarz


In many cases where the data have a wide range and where the ratio among values is of interest, a log-transformation of the data is preferred. A formula is used to create a column of the log-values (the log function is under the Transcendental option of the function groups). Note that the log function is the natural logarithm (i.e. to base e) and not the common (i.e. to base 10) logarithm.
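The reason the log scale suits ratios can be seen in a short sketch (Python; the variable names are illustrative): a difference of natural logs is the log of the ratio, so the anti-log of a difference recovers a ratio on the original scale. The base-10 logarithm is shown only to illustrate that the two scales differ by a constant factor.

```python
import math

# Storm 1 readings from the data table.
tss_mid, tss_ds = 51, 173

# Natural logarithm (base e), as used in the text.
log_mid = math.log(tss_mid)
log_ds = math.log(tss_ds)

# A difference of natural logs is the log of the ratio ...
diff_log = log_ds - log_mid
# ... so the anti-log of the difference recovers the ratio of the raw values.
ratio = math.exp(diff_log)

# The common (base 10) logarithm gives the same information on another scale:
# it equals the natural-log difference divided by ln(10).
diff_log10 = math.log10(tss_ds) - math.log10(tss_mid)

print(round(diff_log, 3), round(ratio, 3), round(diff_log10, 3))
```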


This gives the revised data table:


The shorthand notation for the model is:

logTSS = Event Location

This model syntax is interpreted as saying that variation in readings of logTSS may be attributable to effects of the different locations and of the different storm events. A key assumption made in the analysis of data collected under a blocked design is that the relationship among the treatments is the same in all blocks (i.e. no block-treatment interaction). In this case, the assumption applies on the log scale. If it is believed that the relationship among treatments differs among blocks, then there is no simple way to analyze this experiment. Because of the importance of this assumption, it is recommended that any data collection follow a generalized-block design, where replicate observations of some of the treatments are taken in each block². This model is fit in JMP using the Analyze->Fit Model platform:
² Refer to Addelman, S. (1969). The Generalized Randomized Block Design. American Statistician, 23, 35-36. http://dx.doi.org/10.2307/2681737, and Gates, C. E. (1995). What really is experimental error in block designs. American Statistician, 49, 362-363. http://dx.doi.org/10.2307/2684574
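To make the shorthand concrete, the additive model can be written as a design matrix with one column per parameter. The sketch below (Python) uses baseline (reference) coding with ref and Storm 1 as the baselines; that coding, the variable names, and the placement of the Storm 2 reading at ds are all assumptions for illustration, and JMP's internal parameterization may differ. There are no Event-by-Location columns, which is the no-interaction assumption.

```python
# (storm, location) for each of the eight readings.
# ASSUMPTION: the single Storm 2 reading is placed at "ds" for illustration.
observations = [
    (1, "mid"), (1, "ds"),
    (2, "ds"),
    (3, "ref"), (3, "mid"), (3, "ds"),
    (4, "ref"), (4, "ds"),
]

storms = [1, 2, 3, 4]
locations = ["ref", "mid", "ds"]

def design_row(storm, location):
    """Intercept, location indicators (ref is baseline), storm indicators (Storm 1 is baseline)."""
    location_part = [int(location == loc) for loc in locations[1:]]
    storm_part = [int(storm == s) for s in storms[1:]]
    return [1] + location_part + storm_part

X = [design_row(storm, location) for storm, location in observations]

n_obs = len(X)        # number of readings
n_par = len(X[0])     # intercept + 2 location + 3 storm parameters
# Residual df, assuming the columns are linearly independent
# (which requires a connected design).
resid_df = n_obs - n_par

for row in X:
    print(row)
print("observations:", n_obs, "parameters:", n_par, "residual df:", resid_df)
```

With eight readings and six parameters, only two degrees of freedom remain for estimating the residual variance.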


Notice that Location and Event must be nominal- or ordinal-scaled variables. The hypothesis of interest is:

$H: \mu_{\text{ds}} = \mu_{\text{mid}} = \mu_{\text{ref}}$
$A:$ not the above

where the terms refer to the mean logTSS at each of the three locations.

In this first analysis, blocks are treated as fixed effects. This is known as the intra-block analysis. The programs automatically account for the missing values. Note that in some cases the design is non-connected, and the analysis can fail. This typically happens with extreme numbers of missing values; contact me for more details.

We start by looking at the F-test for location effects. The Effect Test is:


The F-statistic is about 25 with a p-value of .0382. As this is somewhat smaller than α = .05, there is some evidence of a difference in the mean logTSS among the three locations. It is instructive to look at the estimated differences in the mean logTSS among the different locations.
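The intra-block analysis can be sketched outside JMP as an ordinary least-squares fit with storms as fixed effects. The code below (Python, standard library only) is a minimal illustration, not JMP's algorithm: the helper names `solve` and `ols`, the baseline coding, and the placement of the Storm 2 reading at ds are assumptions. A storm with a single reading is fitted exactly by its own block parameter, so that placement does not affect the location comparisons.

```python
import math

# ASSUMPTION: the single Storm 2 reading (137) is placed at "ds" for illustration.
readings = [
    (1, "mid", 51), (1, "ds", 173), (2, "ds", 137),
    (3, "ref", 25), (3, "mid", 100), (3, "ds", 170),
    (4, "ref", 20), (4, "ds", 110),
]

def design_row(storm, loc):
    # Columns: intercept, mid, ds (ref baseline), Storm 2, 3, 4 (Storm 1 baseline).
    return [1.0, float(loc == "mid"), float(loc == "ds"),
            float(storm == 2), float(storm == 3), float(storm == 4)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Return (coefficients, residual sum of squares, X'X) from least squares."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss, XtX

y = [math.log(tss) for _, _, tss in readings]
X_full = [design_row(s, loc) for s, loc, _ in readings]
X_reduced = [r[:1] + r[3:] for r in X_full]   # storms only, no location columns

beta, rss_full, XtX = ols(X_full, y)
_, rss_reduced, _ = ols(X_reduced, y)

df_loc = 2
df_res = len(y) - len(X_full[0])
F = ((rss_reduced - rss_full) / df_loc) / (rss_full / df_res)
# With 2 numerator df the upper tail of the F distribution has a closed form.
p_value = (1 + 2 * F / df_res) ** (-df_res / 2)

# ds - ref is the "ds" coefficient under this coding; its variance is
# sigma^2 times the matching diagonal element of (X'X)^-1.
diff_ds_ref = beta[2]
unit_ds = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
se_ds_ref = math.sqrt((rss_full / df_res) * solve(XtX, unit_ds)[2])

print("F =", round(F, 2), " p =", round(p_value, 4))
print("ds - ref =", round(diff_ds_ref, 2), " SE =", round(se_ds_ref, 2))
```

Under these assumptions the sketch should give values close to those quoted in the text (F about 25, p about .038, and a ds vs. ref difference of about 1.91 with SE about .27).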


As in previous chapters, a multiple-comparison procedure (the Tukey HSD procedure) should be used to control the experimentwise error rate. Please consult earlier chapters for details.

The estimated difference in the mean logTSS between the ds and ref locations is 1.91 (SE .27). This implies that the TSS at the ds location is estimated to be $e^{1.91} = 6.75$ TIMES larger (on average) than at the ref site. The SE for the ratio is NOT found by simply taking the anti-log of the SE on the log-scale. However, by application of a technique called the delta-method, it is possible to show that the SE of the anti-log of the estimate is found as

$SE_{\text{antilog}} = SE_{\text{log}} \times e^{\text{estimate}} = .27 \times 6.75 = 1.82$

By taking the anti-logarithms of the confidence interval for the difference in mean logTSS, we find that we are 95% confident that the TSS at the ds site is between $e^{.31} = 1.36$ and $e^{3.51} = 33.4$ times that at the ref site. Notice that while the confidence interval is symmetric on the log-scale, it is not symmetric on the anti-log scale.

The confidence intervals are much wider than the usual ±2 SE because the total sample size is only 8, but there are 6 parameters that are estimated, leaving only 2 df for the residual error. The confidence interval multiplier with 2 df is considerably larger than the multiplier of 1.96 (or about 2) used when sample sizes are large.

Note that the estimated differences above automatically adjust for the missing values and are NOT equal to the differences in the raw means (see below). The average logTSS across storm events can also be found:

Notice that the estimated LSmeans or "population means" are different from the raw means. This is because the procedure adjusts for the pattern of missing values. The precision of each marginal mean differs because of the differing numbers of samples collected at each location. The anti-logarithm of each marginal mean would be interpreted as an estimate of the geometric mean TSS at each location.

Of course, the other assumptions made for any ANOVA need to be checked (i.e. equal variance among treatment groups; independence of residuals; no outliers; normality of residuals; X measured without error, etc.) as in previous chapters. Don't forget to look at the residual plots. Unfortunately, with such limited data, there is likely to be very little power to detect anything but gross violations of the assumptions.
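The back-transformations used in this example (anti-log of a difference as a ratio, the delta-method standard error, anti-logs of the interval endpoints, and the anti-log of a mean of logs as a geometric mean) can be sketched in a few lines of Python. The numeric inputs are the values quoted in the text for the ds vs. ref comparison; the function name `geometric_mean` is illustrative, and the geometric mean shown is a raw one, not an LSmean adjusted for missing values.

```python
import math

# Values quoted in the text for the ds vs. ref comparison (log scale).
diff_log = 1.91
se_log = 0.27
ci_log = (0.31, 3.51)

# Anti-log of the difference in mean logTSS: estimated ratio of TSS, ds / ref.
ratio = math.exp(diff_log)

# Delta method: SE(exp(x)) is approximately SE(x) * exp(x).
se_ratio = se_log * ratio

# Anti-log each endpoint; the interval is no longer symmetric about the ratio.
ci_ratio = tuple(math.exp(endpoint) for endpoint in ci_log)

def geometric_mean(values):
    """Anti-log of the mean of the natural logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Raw (unadjusted) geometric mean of the two ref readings, Storms 3 and 4.
gm_ref_raw = geometric_mean([25, 20])

print(round(ratio, 2), round(se_ratio, 2))
print(tuple(round(v, 2) for v in ci_ratio))
print(round(gm_ref_raw, 2))
```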

Final Comment: A more refined analysis would treat the storm events as random effects. The analysis would proceed as above except that Event is declared as a random effect. Notice how the random block is specified in the Analyze->Fit Model dialogue box:


Ironically, in cases where blocks are incomplete, there are two sources of information about treatment effects. The major part of the information comes from the intra-block analysis (done above). Some small amount of additional information, known as the inter-block information, can also be extracted. By specifying that blocks are a random effect (i.e. that you wish to extrapolate to events other than the observed storms), it is possible to combine both analyses with modern software. The hypothesis of interest is the same under the fixed and random block models. The effect test uses more information and so indicates more evidence of an effect of location upon the mean logTSS:

Notice how the F-statistic and p-value change slightly for the test of no location effects compared to the intra-block analysis (where blocks are fixed effects). This revised model extracts additional information (the inter-block information) from the data that the first model ignored. The estimated DIFFERENCES in the mean logTSS are slightly changed (no dramatic change), but the standard errors are improved³:

³ If there were many blocks, the standard error of the combined inter- and intra-block analysis could be dramatically improved.


Similarly, there is little change in the estimated marginal means (over all storm events) but some improvement in the estimated standard errors.

With modern software, the analysis of incomplete block designs is fairly straightforward. In some cases you can run into problems if there are substantial missing data in a systematic pattern (the design is not connected). Please consult a statistician for details on such models.
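Whether a design is connected can be checked before any model is fit. One common definition treats blocks and treatments as nodes of a graph, joined whenever a treatment was observed in a block; the design is connected if that graph forms a single piece. The sketch below (Python) implements this check with a simple graph search. The function name `is_connected`, the second example design, and the placement of the Storm 2 reading at ds are assumptions for illustration.

```python
def is_connected(pairs):
    """True if the block-treatment incidence graph forms a single component."""
    # Build an undirected graph whose nodes are blocks and treatments.
    graph = {}
    for block, treatment in pairs:
        b, t = ("block", block), ("treatment", treatment)
        graph.setdefault(b, set()).add(t)
        graph.setdefault(t, set()).add(b)
    if not graph:
        return True
    # Depth-first search from an arbitrary node.
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(graph)

# The storm example: (storm, location) for each reading.
# ASSUMPTION: the single Storm 2 reading is placed at "ds" for illustration.
observed = [
    (1, "mid"), (1, "ds"), (2, "ds"),
    (3, "ref"), (3, "mid"), (3, "ds"),
    (4, "ref"), (4, "ds"),
]

# A hypothetical non-connected design: ds is never observed in the same
# storm as ref or mid, so ds cannot be compared with them within a storm.
not_connected = [(1, "ref"), (1, "mid"), (2, "ds")]

print(is_connected(observed), is_connected(not_connected))
```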
