An Introduction to Meta-Analysis With Articles From The Journal of Educational Research (1992–2002)

ELISHA A. CHAMBERS
University of Missouri–St. Louis

ABSTRACT The author provides researchers, editors, and readers with recommendations and guidelines for meta-analytic research. These recommendations are for procedures and for reporting meta-analytic findings. An applied example illustrates a meta-analysis with a data set and considers 2 research questions. Four main sections include (a) a description of meta-analytic research methodology, (b) an illustration of a meta-analysis, (c) a review of The Journal of Educational Research articles in which meta-analyses were investigated from 1992–2002, and (d) recommendations and guidelines for conducting and reporting meta-analyses.

Key words: guidelines for reporting meta-analytic research, meta-analyses reviewed in The Journal of Educational Research, years 1992–2002

One of the most salient ways to quantitatively synthesize research findings is through a meta-analysis. Gene V Glass first introduced the meta-analytic technique in 1976. This statistical procedure requires the investigator to calculate effect sizes from data reported in research articles. Hedges, Shymansky, and Woodworth (1989) explained that the procedures for conducting meta-analysis are composed of a series of steps in which the earliest stage is forming the problem and the last stage is preparing a report that includes methods, data, and a discussion. Meta-analytic results allow the researcher to report on the efficacy of a program or intervention, where the reported effect size is a standardized value that can be interpreted in terms of proportions or percentiles. Furthermore, one can use the effect size to examine the relationship between variables; for example, in a meta-analytic investigation, Cohen (1981) considered the relationship between student ratings of instruction and achievement.

The goal of this article was to provide researchers, editors, and readers with recommendations for procedures and guidelines for reporting meta-analytic research. An applied example is presented (see the section on the efficacy of computers in K–12 classrooms) that illustrates a meta-analysis with a data set and considers two research questions. There are four main sections in this article. The first section describes meta-analytic research methodology. The second section illustrates a meta-analysis. The third section reviews articles from The Journal of Educational Research from 1992–2002 in which meta-analyses have been examined. The fourth, and final, section examines recommendations for conducting and reporting meta-analyses.

Description of Methodology

Meta-analysis, as introduced by Glass (1976), requires the investigator to calculate effect sizes from data reported in research articles. Glass stated the following:

Primary analysis is the original analysis of data in a research study. It is what one typically imagines as the application of statistical methods. Secondary analysis is the reanalysis of data for answering the original research question with better statistical techniques, or answering new questions with old data. . . . Meta-analysis refers to the analysis of analyses. I use it to refer to the statistical analysis of a large collection of analysis results from individual studies. (p. 3)

The statistical analysis that is conducted on the research findings yields an effect size.
An effect size is a standardized value that is computed by subtracting the control group mean from the experimental group mean (or one group mean from another) and then dividing the difference by either the control group standard deviation or the pooled standard deviation (see Glass, 1978). The formula is as follows:

ES = (M_experimental − M_control) / SD_control

Address correspondence to Elisha A. Chambers, Division of Educational Psychology, Research and Evaluation, College of Education, Marillac Hall 402, University of Missouri–St. Louis, One University Blvd., St. Louis, MO 63121-4400. (E-mail: elishachambers@umsl.edu)

An effect size also can be interpreted in terms of proportions or percentiles. Hedges and Olkin (1985) explained that it can also be understood as "the proportion of the control scores that are less than the average score in the experimental group" (p. 76). For example, an effect size of .30 means that the average student who receives the treatment would outperform 62% of the students in the control group. In terms of percentiles, with an effect size of .30, a typical student in the control group would perform at the 50th percentile, whereas a typical student in the treatment group would perform at the 62nd percentile. (The sketch that follows illustrates this conversion.)
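To make the percentile interpretation concrete, here is a minimal Python sketch, not from the original article; the function names and numeric inputs are hypothetical. It computes the effect size from group summary statistics and converts it to a percentile of the control distribution, assuming normally distributed scores.

```python
import math

def glass_delta(mean_exp: float, mean_ctrl: float, sd_ctrl: float) -> float:
    """Effect size: mean difference in control-group SD units (Glass, 1978)."""
    return (mean_exp - mean_ctrl) / sd_ctrl

def percentile_of_treated(es: float) -> float:
    """Percentile of the control distribution reached by the average
    treated student, assuming normality (cf. Hedges & Olkin, 1985)."""
    return 100 * 0.5 * (1 + math.erf(es / math.sqrt(2)))

# Hypothetical summary data: ES = (105 - 100) / 16.7, roughly 0.30,
# which places the average treated student near the 62nd percentile.
es = glass_delta(105, 100, 16.7)
print(round(es, 2), round(percentile_of_treated(es)))  # 0.3 62
```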
In terms of synthesis, the meta-analytic researcher attempts to examine all studies on a given topic, thereby avoiding selection bias in support of the author's research interest (Wolf, 1986). In a meta-analysis, an effect size is calculated from the results of all studies in a given area; therefore, subjective weighting is not an issue. In addition, meta-analytic researchers are able to examine inconsistencies by examining the impact that different study features have on the outcome variable of interest. As Wolf noted, moderating variables are largely overlooked. It is possible for one to examine moderating variables in a meta-analysis by conducting analyses on the other features of the study. For example, Dunn, Griggs, Olson, and Beasley (1995) examined seven potential moderators (e.g., socioeconomic status, length of intervention) in their meta-analytic study on learning styles. As another example, in his study of the relationship between student ratings of their instructors and achievement, Cohen (1981) surmised that class size might be a moderating variable. The effect sizes in the meta-analysis were partitioned so that one statistic represented larger class sizes and another smaller class sizes. Because the two statistics were significantly different, class size was seen as a moderating variable.

Limitations

Although there are many benefits to conducting a meta-analysis, there also are some limitations to this technique. Glass, McGaw, and Smith (1981) discussed four major disadvantages in conducting a meta-analysis. The first disadvantage is called the "apples and oranges" comparison. Some authors (e.g., Gallo, 1978; Presby, 1978) noted that aggregating results that use different research techniques is inappropriate because they are too dissimilar. However, it is fairly simple to correct this issue because one can code for different techniques, where appropriate, and test whether the results are too dissimilar. In the second disadvantage, arguments have been made against including poorly designed studies in a meta-analysis (e.g., Eysenck, 1978). Again, like the resolution for the "apples and oranges" argument, the meta-analytic researcher can code study features to test whether there are differences based on study quality.

In the third disadvantage, Rosenthal (1979) noted publication bias. The author described this issue as the "file drawer problem"; that is, there is a discrepancy between published and unpublished research in which results from potential studies are filed away and never published. That bias has been in favor of published research that yields positive significant results. The results reported in other meta-analytic research have somewhat substantiated that claim because published research is more likely to be positive (e.g., Kulik & Kulik, 1991; Ryan, 1991). The problem can be addressed when one obtains unpublished manuscripts by contacting researchers directly or locating unpublished manuscripts through resources such as the Educational Resources Information Center (ERIC). It also is possible to estimate the number of studies that would have to be in "file drawers" to change the results of a meta-analysis (see Rosenthal, 1979), as the sketch below illustrates.
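A minimal sketch of Rosenthal's (1979) fail-safe N, one common way to make that "file drawer" estimate; the function name and example values are mine, not the article's. It assumes the combined significance test is a Stouffer sum of z scores and asks how many averaged-null studies would pull the combined result down to p = .05 (one-tailed critical z = 1.645).

```python
def fail_safe_n(z_scores: list[float]) -> float:
    """Rosenthal's (1979) fail-safe N: number of unretrieved null-result
    studies needed to bring a Stouffer combined test down to p = .05."""
    k = len(z_scores)
    total_z = sum(z_scores)
    # A negative result means the combined test is already nonsignificant.
    return (total_z ** 2) / (1.645 ** 2) - k

# e.g., ten studies each with z = 2.0 tolerate roughly 138 hidden null studies
print(round(fail_safe_n([2.0] * 10)))  # 138
```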
The fourth, and final, disadvantage is the use of multiple findings from the same study, thereby potentially biasing the results. Like the other three disadvantages, this problem also can be overcome. It is important that one calculate effect sizes that are independent, so it is a common practice for researchers to use only one effect size for each study. A second alternative is for researchers to calculate multiple effect sizes for a study if it can be established that the samples are independent.

In addition to the limitations outlined in the preceding paragraphs, Onwuegbuzie and Levin (2003) discussed concerns about the sensitivity of effect sizes. Onwuegbuzie and Levin noted that nine factors should be considered when investigating effect sizes: research objective, research design, effect-size measure, interpretation guidelines, sampling issues, distribution nonnormality, score variability, measurement error, and scale of measurement. Because meta-analytic research is dependent on effect-size measures, it is important that researchers consider those factors.

In terms of meta-analytic research, a sizable discrepancy tends to exist between the number of studies located and the number of studies actually included in a meta-analysis. For example, Bangert-Drowns, Kulik, and Kulik (1985) located almost 500 apparently pertinent articles but found that over 90% were not suitable. Likewise, Gillingham and Guthrie (1987) reported that their sample was composed of only 6% of the targeted literature as a result of their inability to calculate effects from the lack of data provided. Thus, it is likely that there will be a discrepancy between the apparent number of usable research articles and the actual number of usable research articles.

Overall, meta-analytic research is a very effective method for researchers to statistically examine an abundant amount of research. It is important that one is aware of the potential downfalls in conducting a meta-analysis; however, when these downfalls are accounted for, meta-analytic research can provide important and meaningful answers.

Procedures

The procedures used in meta-analytic research are similar to those used in primary research. Multiple phases to the process include (a) select a topic, (b) define a problem, (c) conduct a literature review, (d) state a research question or hypothesis, (e) collect data, (f) analyze the data, and (g) evaluate the findings (see Creswell, 2002; Fraenkel & Wallen, 2003).

After selecting a topic, defining a problem, and conducting the literature review, the researcher is almost ready to begin data collection. Prior to obtaining manuscripts, the meta-analyst needs to set criteria for inclusion in the research synthesis; for example, the age of the participants, geographic location of the study, or date of publication. Once the inclusion criteria have been established, the meta-analyst should select, modify, or create an instrument to assist in the coding and organization of the data (a coding scheme also should be documented). Coding different study features allows the meta-analytic researcher to examine potential moderating variables. It is helpful if the instrument is reviewed by another individual (e.g., an expert in the field) to ensure that there have been no oversights; piloting the instrument also is informative. The meta-analyst is then ready to collect data.

To obtain published and unpublished manuscripts, there are at least two steps in the data collection procedures. The first step is to contact individuals who are involved in programs of research that are related to the topic under investigation. Requests can then be made for unpublished manuscripts and conference proceedings. The second step is to conduct several searches for manuscripts through pertinent electronic databases. Manuscript acquisition is sometimes limited by library availability; however, interlibrary loans can be extremely helpful. It also is possible to obtain manuscripts by consulting the reference lists of collected articles. Once the manuscripts have been obtained and coded, interrater reliability should be calculated to establish whether independent raters are able to consistently code the information (a higher percentage of coder agreement indicates a more consistent coding of the information; a small sketch follows).
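The simple percentage-agreement index described above can be computed as follows. This is a minimal sketch with hypothetical codes, not a procedure from the article.

```python
def percent_agreement(coder_a: list, coder_b: list) -> float:
    """Share of items that two independent raters coded identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Raters must code the same set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Hypothetical grade-level codes for four studies: 3 of 4 match, so 75%
print(percent_agreement(["K-5", "6-8", "9-12", "K-5"],
                        ["K-5", "6-8", "9-12", "6-8"]))  # 75.0
```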
An Example: Efficacy of Computers in K–12 Classrooms

The following illustration synthesizes literature through a meta-analysis as introduced by Glass (1976, 1978). The purpose of this example is to examine the efficacy of computers in K–12 classrooms from 1992–2002. Two research questions for this illustration include: (a) Is educational technology generally efficacious? (b) What effect sizes are revealed for each of the three grade levels (i.e., K–5, 6–8, 9–12)?

Selection of Studies

The articles chosen for this illustration were scaled down for the purposes of this article; thus, a randomly drawn stratified sample (n = 30) was selected from a larger meta-analytic study (Chambers, 2003). The stratification was based on grade level (n = 10 for K–5, n = 10 for 6–8, n = 10 for 9–12). The targeted studies were those that implemented computer-assisted instruction in K–12 classrooms, provided outcome measures of achievement, and were reported between the years 1992 and 2002. The studies were identified through various electronic databases such as Dissertation Abstracts International, PsycINFO, and ERIC. The articles were limited to those available at the university's library or through interlibrary loan. The articles needed to include sufficient information to calculate an effect size, and the research had to be of experimental or quasi-experimental design (Campbell & Stanley, 1966); thus, the studies needed to include a treatment or experimental group and a control or comparison group. Furthermore, the treatment group and the control group had to yield a total minimum size of 10 (i.e., 5 students in each group) because instructional sessions with fewer than 5 students could be considered tutorials (Trice & Leitner, 1984).

Procedures

This study began with the establishment of a coding scheme that included 37 items; however, for the purposes of this illustration, there were only 4 items of interest (i.e., grade level, achievement, total instructional time, and frequency of instruction). To address the file drawer problem (see Rosenthal, 1979), I attempted to locate unpublished manuscripts by contacting researchers and educational project coordinators directly; however, only 5% of those who were contacted responded. Once the studies were obtained, they were coded and data were entered into SPSS 11.0 for Windows (SPSS, 2001). Interrater reliability was established in the larger study (86% agreement), and discrepancies were resolved through discussion.

Treatment of Data

Effect sizes were calculated for all the studies under investigation. In this example, I used the standard deviation of the control group because it was not contaminated by the treatment (see Hedges et al., 1989). In attempting to gather the necessary information from studies, I had to calculate effect sizes from other test statistics such as t (see Wolf, 1986). In addition, when there were an uneven number of participants in the control group and the experimental group, I used the effect-size formula reported by Holmes (1984). In order not to violate the assumption of independence, I calculated one effect size for each study (see Lipsey & Wilson, 2001). That single effect size was calculated on the basis of the study's major focus (individual effect sizes are reported in the Appendix). Once I had calculated individual effect sizes, overall mean effect sizes were calculated, as well as standard errors for the mean effect sizes (see Lipsey & Wilson), and the upper and lower values for 95% confidence intervals. This reporting is in accordance with the recommendations of Wilkinson and the Task Force on Statistical Inference (1999), who stated that researchers should "[a]lways present effect sizes for primary outcomes" (p. 599), in addition to providing effect sizes from other test statistics.

Results

The 30 studies yielded 30 effect sizes for 4,467 students (see Table 1 for a summary of the effect sizes). By coding the length and frequency of the different instructional interventions, one can report descriptive information for both variables. The instructional interventions varied from 2 hr to as much as 105 hr (M = 29.71, SD = 42.08, Median = 9.0). The frequency of instruction varied from two times per week to as much as five times (M = 3.67, Median = 4.0).

Effect-Size Adjustments

Both weighted and unweighted mean effect sizes are reported below. To correct for bias (Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rudner, Glass, Evartt, & Emery, 2002), I calculated weighted values with the inverse variance weight formula (Hedges & Olkin; Lipsey & Wilson; Rudner et al.).

TABLE 1. Effect Size by Study (study number, grade level, and number of students for each of the 30 studies)

To capture an accurate picture of effect sizes, the data were examined for outliers. The unweighted achievement mean effect size for this sample was 0.61 (SD = 1.13). An examination of the data revealed one potential outlier (an effect size beyond 0.61 ± 3.39) that was obtained from Study No. 21; that effect size was 5.89 (SE = 0.18, n = 671). The authors of Study No. 21 acknowledged that their study yielded a large effect size (they controlled for the students'
entrance examination scores, current grade-point averages, and pretest scores).

Homogeneity of Effect Sizes

The Q statistic was computed in a test of homogeneity of the effect sizes (see Hedges & Olkin, 1985; see Step 3: Data Analysis for a discussion of homogeneity). The Q statistic revealed that the effect sizes were heterogeneous because the test for homogeneity was significant, Q(29) = 2919.11, critical χ²(29, N = 30) = 42.56, p = .05. (The sketch at the end of this Results section shows how such critical values can be obtained.) On closer examination, it appeared that the outlier discussed previously (Study No. 21) contributed to the heterogeneity; however, the analysis was still significant when the outlier was excluded, Q(28) = 82.89, critical χ²(28, N = 29) = 41.34, p = .05. Thompson (1994) suggested that such heterogeneity is often a result of methodological differences or participant differences; therefore, it is important to examine the different factors associated with these effect sizes. For instance, larger effect sizes have been associated with lower ability students for computer-assisted instruction (e.g., Bangert-Drowns, 1993). Similarly, the participants in Study No. 21 were solely low-ability students. However, due to the limitations of this article, only grade level was examined.

Research Questions

To answer the first research question, "Is educational technology generally efficacious?" I calculated a mean effect size. The unweighted mean effect size for achievement was 0.61 (SE = 0.21, Median = 0.47) and varied from −1.29 to 5.89. The upper CI95 limit was 1.03, and the lower CI95 limit was 0.19. The weighted mean effect size for achievement was 1.37 (SE = 0.08); the upper CI95 limit was 1.53, whereas the lower CI95 limit was 1.21. That finding indicates that students in the computer groups outperformed their peers in comparison groups by over half a standard deviation in measures of achievement (or over 1 standard deviation when examining the weighted mean).

To answer the second research question, "What effect sizes are revealed for each of the three grade levels?" I calculated mean effect sizes by grade level for each grade-level category (i.e., K–5, 6–8, 9–12). The unweighted and weighted mean effect sizes, standard errors, and confidence intervals are reported in Table 2 for each grade level. (See Figure 1 for a graphic representation of the unweighted means and Figure 2 for a graphic representation of the weighted means.) The median effect size for elementary school students (K–5) was 0.61; the effect sizes varied from −0.43 to 0.99. The median effect size for middle school students (6–8) was 0.47; the effect sizes varied from −0.55 to 1.33. The median effect size for high school students (9–12) was 0.36; the effect sizes varied from −1.30 to 5.89. Those findings, and those reported in Table 2, suggest that overall, students at any grade level benefit from educational technology in terms of achievement outcomes. Students in the computer groups outperformed their peers in comparison groups by over a third of a standard deviation in measures of achievement at all three grade levels when examining either the unweighted or weighted mean effect sizes.

TABLE 2. Summary of Unweighted and Weighted Mean Effect Sizes (ES), Standard Errors (SE), and Confidence Intervals (CI) for Achievement, by Grade Level

FIGURE 1. Unweighted mean effect sizes for achievement, by grade level.

FIGURE 2. Weighted mean effect sizes for achievement, by grade level.
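Not part of the original analysis, but as a convenience: the critical chi-square values quoted above for the Q test can be reproduced with standard statistical libraries. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import chi2

# Critical values for the homogeneity tests at alpha = .05
print(round(chi2.isf(0.05, df=29), 2))  # 42.56, for Q(29) with 30 studies
print(round(chi2.isf(0.05, df=28), 2))  # 41.34, for Q(28) with the outlier removed

# An observed Q far above the critical value, such as Q = 2919.11,
# indicates heterogeneous effect sizes (p < .05).
print(chi2.sf(2919.11, df=29) < .05)    # True
```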
Discussion

In general, and at specific grade levels (i.e., K–5, 6–8, 9–12), the meta-analytic findings in this illustration favored educational technology interventions and programs. That finding is consistent with previous meta-analyses in which educational technology in K–12 classrooms was examined (e.g., Bangert-Drowns, Kulik, & Kulik, 1985; Niemiec & Walberg, 1985; Ryan, 1991). However, past meta-analyses typically reported that the effect sizes for children in lower grades were higher than those found for students in higher grades (e.g., Fletcher-Flinn & Gravatt, 1995; Kulik & Kulik, 1991; Kulik, Kulik, & Bangert-Drowns, 1985); perhaps the way in which technology is presently being used is different, as the mean effect sizes reported in many of the past meta-analyses are from studies that are over 8 years old. It is important to further investigate such differences to better understand the impact that computers and educational technology have on achievement in K–12 classrooms.

Review of The Journal of Educational Research Articles (1992–2002)

In this section, I examine meta-analytic work that has been published in The Journal of Educational Research from 1992–2002. The following eight articles are considered: Dunn et al. (1995); Fan (2001); Guthrie, Schafer, Von Secker, and Alban (2000); Hough and Hall (1994); Lou, Abrami, and Spence (2000); Mavrogenes and Bezruczko (1994); McGregor (1993); and Razel (2001). Two of the articles reported on issues associated with conducting meta-analyses and reporting effect sizes (Fan, 2001; Hough & Hall, 1994), whereas in six of the articles, either a meta-analysis was conducted or effect sizes were reported (Dunn et al.; Guthrie et al.; Lou et al., 2000; Mavrogenes & Bezruczko, 1994; McGregor; Razel). Therefore, I first review the articles in which the meta-analytic issues are explored and then examine the meta-analytic and effect-size articles.

Meta-analytic Issues

In the first two articles, I examined meta-analytic issues that included statistical significance and effect sizes (Fan, 2001) and the use of different formulas in calculating effect sizes (Hough & Hall, 1994). Fan examined sampling variability of the effect-size measures d and R², whereas Hough and Hall contrasted effect-size formulas (see Glass, McGaw, & Smith, 1981; Schmidt & Hunter, 1977).

By using a Monte Carlo experiment, Fan (2001) found that sample d appears to be an unbiased estimator of population d; however, R² appears to have an upward bias that is consistently above the population mean. Therefore, Fan recommended that researchers use bias correction because the adjusted R² value is very close to the population R² value.

Similarly, Hough and Hall (1994) found differences between the formulas that they investigated. By using data from three meta-analyses, they found that two of the three effect sizes were slightly larger for the Hunter-Schmidt formula than for the Glass formula (0.30, 0.34, 0.79 versus 0.29, 0.34, 0.75). In addition, when they used the measurement-error correction recommended by Schmidt and
Hunter (1977), they found that two of the three corrected effect sizes were significantly larger than the effect sizes calculated with the Glass formula. However, they did conclude that those differences appear to be minimal and that it can be difficult to acquire the reliability coefficients necessary to correct for measurement error. As noted by Hough and Hall, the Hunter-Schmidt formula is more accurate (especially when measurement-error corrections are conducted); however, the Glass formula is an acceptable alternative that requires fewer resources to calculate. Even Hough and Hall noted that they were not able to acquire the necessary information for 15% of the studies (even after consulting test manuals). It is important that meta-analytic researchers are aware of those issues and discrepancies when they assess which of the formulas would best suit their needs, and note any limitations.

Meta-analytic and Effect-Size Articles

In the articles published in The Journal of Educational Research between 1992 and 2002, authors used meta-analyses or reported effect sizes and examined a variety of topics. (Table 3 provides a brief summary of the information from the articles.) In two of the studies, measures of association were used, and in the remaining studies, effect sizes were used. In three of the four studies in which effect sizes were used, the pooled standard deviation formula was employed (Hedges & Olkin, 1985), and in one study, the authors used the control-group standard deviation formula (Glass, 1978). The selection and inclusion of the studies that were analyzed varied somewhat from study to study; however, each established selection criteria. The studies also reported on issues of validity, homogeneity (and moderators), outliers, weighting, and interrater coding. The way in which results were reported also greatly varied.

Dunn et al. (1995) were the only researchers to report that they examined meta-analytic studies in terms of threats to validity. By using criteria that Campbell and Stanley (1966) established, Dunn and colleagues found that 6 of the 42 articles in which experimental designs were used had serious threats to internal and external validity; therefore, the 6 studies were not included in their meta-analysis. Dunn and colleagues also used a variety of techniques to examine homogeneity.¹

In addition to Dunn et al. (1995), Lou et al. (2000) also examined homogeneity (see Hedges & Olkin, 1985). Dunn and colleagues used three indicators of homogeneity (i.e., residual standard deviation, percentage of observed variance accounted for by sampling error, and a chi-square test of homogeneity), whereas Lou and colleagues used one indicator (i.e., a chi-square test of homogeneity). Authors of both studies found that their data were heterogeneous. Even after Lou and colleagues removed five studies from their analyses that appeared to be outliers, they found that the data were heterogeneous. In both studies, the possibility of moderators was examined (see Thompson, 1994).

Another issue that was addressed was that of weighting (see Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rudner et al., 2002). Only Dunn et al. (1995) and Lou et al. (2000) reported that they weighted their findings. Both studies reported the use of approaches consistent with Hedges and Olkin. Weighting is an important issue to consider because it corrects for biasing that could result from studies with smaller sample sizes (Hedges & Olkin; Lipsey & Wilson).
Reliability issues were examined in three studies (Dunn et al., 1995; Guthrie et al., 2000; Lou et al., 2000), and validity was reported in two studies (Guthrie et al.; Mavrogenes & Bezruczko, 1994). Although Dunn and colleagues contacted the primary researchers directly, Lou and colleagues and Guthrie and colleagues examined intercoder agreement. When Dunn and colleagues contacted the primary researchers, they requested verification of coding and sought to collect any missing data. Lou and colleagues calculated the percentage of intercoder agreement for study features and effect-size information, and Guthrie and colleagues reported on intercoder agreement for scores on a performance assessment. Guthrie and colleagues and Mavrogenes and Bezruczko also reported on internal consistency. In addition, Guthrie and colleagues addressed face validity, convergent validity, and construct validity.

The eight studies provided some type of graphic representation of their findings; eight studies provided tables and four studies presented figures (Dunn et al., 1995; Fan, 2001; McGregor, 1993; Razel, 2001). The content in the tables varied greatly, from guidelines (Fan, 2001), to moderators (Dunn et al.), to weighted effect-size reliability coefficients (Guthrie et al., 2000). The content in the figures also varied greatly, from a student writing sample (Dunn et al.), to correlations (Razel), to normal distribution curves (McGregor, 1993).

TABLE 3. Summary of Measure-of-Effect Findings From The Journal of Educational Research (1992–2002)

Dunn, Griggs, Olson, & Beasley (1995): weighted correlation coefficient; overall finding of +0.38 for learning style.
Guthrie, Schafer, Von Secker, & Alban (2000): 2,719 participants; Hedges & Olkin (1985) pooled standard deviation formula; multiple effect sizes reported for the effects of instructional reading practices on achievement.
Lou, Abrami, & Spence (2000): Hedges & Olkin (1985) pooled standard deviation formula, weighted; +0.16, favoring within-class grouping over whole-class instruction.
Mavrogenes & Bezruczko (1994): 186 participants; pooled standard deviation formula; multiple effect sizes reported; examined gender differences in writing.
McGregor (1993): Glass (1978) control-group standard deviation formula; +0.42 in favor of role playing and +0.48 in favor of antiracist teaching on student racial attitudes.
Razel (2001): 1,022,000 participants; multiple findings reported; the relationship between television viewing and achievement varies by age. (A meta-analysis was not conducted in this study; rather, it was a primary study that reported effect sizes.)

Like the graphic representations, the narrative portions of the reports also varied greatly in terms of the kinds of information that were included. In addition to anticipated effect sizes, confidence intervals also were included (Lou et al., 2000), as were correlations (Mavrogenes & Bezruczko, 1994) and coefficients of variation (McGregor, 1993). The following section addresses recommendations and guidelines for conducting and reporting meta-analytic studies.

Recommendations and Guidelines

There are four basic steps for conducting meta-analytic research: design, data collection (manuscript acquisition), data analysis, and reporting. Each of the steps is discussed in the following paragraphs. This section concludes with a brief examination of the limitations associated with conducting meta-analytic research.
Step 1: Designing a Meta-analytic Study

To design a meta-analysis, researchers must consider several factors. Those factors include criteria for manuscript inclusion, instrumentation, and coding schemes. Researchers need to establish inclusion criteria for the studies that will be included in the meta-analysis. Inclusion criteria may include (a) study design (e.g., experimental, quasi-experimental), (b) types of outcome measures (e.g., achievement test scores), (c) age or grade level of participants, or (d) group sample sizes. Once inclusion criteria have been established, then instrumentation and coding schemes should be selected, modified, or created. It is important that researchers have a protocol that can be used to consistently document the necessary information from each study under investigation. In addition, a coding scheme is necessary for analyzing the data that are collected. Following data collection, intercoder or interrater reliability should be conducted. Once those design issues are established, then it is time to start searching for potential manuscripts.

Step 2: Data Collection or Manuscript Acquisition

There are three major ways to obtain manuscripts: contact individuals, conduct searches of electronic databases, and consult reference lists. It is possible for researchers to contact individuals who are likely to have unpublished or published manuscripts that relate to the meta-analytic study under investigation. The second alternative involves conducting literature searches of electronic databases such as Dissertation Abstracts International and ERIC. Once a list of potential articles has been identified, researchers should read through the abstracts and summaries and make decisions about including a study if they can make the determination from the information provided in the abstract or summary. Otherwise, researchers might have to obtain and review the manuscript in its entirety. Manuscripts that have been selected can then be coded for data analysis. However, prior to calculating measures of effect, researchers might have to contact the primary source to obtain missing or additional information.

Step 3: Data Analysis

After the studies have been obtained and missing or additional information is collected, measures of effect need to be calculated. The most appropriate measure needs to be selected, and the data should be analyzed accordingly. There are a variety of resources available to assist with this selection (e.g., Fan, 2001; Glass, 1978; Glass et al., 1981; Hedges & Olkin, 1985; Hedges et al., 1989; Lipsey & Wilson, 2001; Maxwell & Delaney, 2000; Rosenthal, 1984; Rudner et al., 2002; Wolf, 1986). In addition, when the sample sizes of the control group and the experimental group are uneven, additional effect-size formulas are available (e.g., Holmes, 1984). Furthermore, it is possible for one to use transformations on the reported data to calculate effect sizes; if means and standard deviations are not reported, effect sizes can be calculated from other test statistics such as t, F, or r, as transformation formulas are widely available (e.g., Wolf). The sketch that follows illustrates such transformations.
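A minimal sketch of the kind of transformation formulas the paragraph above refers to, following the conversions given by Wolf (1986); the function names are mine. The F conversion assumes a two-group, one-way design, where F equals t squared.

```python
import math

def d_from_t(t: float, df: int) -> float:
    """Effect size from an independent-samples t statistic: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)

def d_from_r(r: float) -> float:
    """Effect size from a correlation coefficient: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_from_f(f: float, df_error: int) -> float:
    """Two-group, one-way designs only, where F = t squared."""
    return d_from_t(math.sqrt(f), df_error)

print(round(d_from_t(2.5, 58), 2))  # 0.66, from a hypothetical t(58) = 2.5
print(round(d_from_r(0.30), 2))     # 0.63, from a hypothetical r = .30
```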
Statistical independence is important. There is a concern with independence when multiple findings from the same study are included in the meta-analysis, thereby potentially biasing the results. That problem can be avoided by calculating effect sizes that are independent; therefore, it is common practice to use only one effect size for each study. However, Lipsey and Wilson (2001) stated that "effect sizes can usually be assumed to be statistically independent if, for a given distribution, no more than one effect size comes from any subject sample" (p. 112). Therefore, multiple effect sizes can be reported for a single study if it is established that the samples are independent.

In general, homogeneity of effect sizes is examined; homogeneity indicates whether the distribution of the effect sizes accurately reflects the mean effect size for the population (Lipsey & Wilson, 2001). The Q statistic is computed in a test of homogeneity and is distributed as a chi-square (see Hedges & Olkin, 1985). Thompson (1994) noted that when tests of homogeneity are significant, it is often a result of methodological differences or participant differences; thus, it is important that one examine the different factors associated with these effect sizes. In addition, outliers should be noted and possibly removed from the analyses. Lipsey and Wilson defined outliers as values beyond two or three standard deviations of the mean. When data exploration is complete, weights can be calculated.

Another predominant issue is that of weighting or not weighting the effect sizes, because biasing could result from studies with smaller sample sizes (Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rudner et al., 2002). One can calculate weighted values by using the inverse variance weight formula to correct bias (Hedges & Olkin; Lipsey & Wilson; Rudner et al.).

The final phase in this step is for researchers to calculate mean effect sizes (weighted and unweighted effect sizes can be calculated) along with other pertinent information, such as standard errors and confidence intervals. The formula for the mean effect size is mean ES = ΣES / n, where n represents the number of effect sizes in the calculation and ΣES represents the summation of all the effect sizes from the studies used in the investigation. The standard error for the effect sizes also can be calculated (see Lipsey & Wilson, 2001).

Thompson (2002) noted the importance of reporting confidence intervals, as they indicate the limits within which the mean is likely to be located, thereby providing an index of precision (Lipsey & Wilson, 2001). Furthermore, if the values of the confidence interval encompass zero, then it is possible that there is no effect. Cumming and Finch (2001) noted that "[confidence intervals] support meta-analysis and meta-analytic thinking focused on estimation. This feature of [confidence intervals] has been little explored or exploited in the social sciences but is in [their] view crucial and deserving of much thought and development" (p. 534). The appropriate use and calculation of confidence intervals have been widely discussed (see Algina & Moulder, 2001; Cumming & Finch, 2001; Fan & Thompson, 2001; Fidler & Thompson, 2001; Henson & Cumming, 2003; Mendoza & Stafford, 2001; Smithson, 2000, 2001). Reporting of confidence intervals also is consistent with the recommendations of Wilkinson and the Task Force on Statistical Inference (1999). The sketch below draws the pieces of this step together.
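A minimal sketch drawing together the weighting, homogeneity, and confidence-interval pieces of this step; the data structure and function names are mine, and the standard-error approximation for d follows Lipsey and Wilson (2001). The 95% interval uses the normal critical value 1.96, and Q is referred to a chi-square distribution with k − 1 degrees of freedom.

```python
import math

def se_of_d(d: float, n_exp: int, n_ctrl: int) -> float:
    """Approximate standard error of d (Lipsey & Wilson, 2001)."""
    n_total = n_exp + n_ctrl
    return math.sqrt(n_total / (n_exp * n_ctrl) + d ** 2 / (2 * n_total))

def meta_summary(studies: list[tuple[float, int, int]]):
    """studies: one (d, n_experimental, n_control) tuple per study."""
    weights = [1 / se_of_d(d, ne, nc) ** 2 for d, ne, nc in studies]  # inverse variance
    ds = [d for d, _, _ in studies]
    mean = sum(w * d for w, d in zip(weights, ds)) / sum(weights)     # weighted mean ES
    se = math.sqrt(1 / sum(weights))                                  # SE of the mean
    ci_95 = (mean - 1.96 * se, mean + 1.96 * se)                      # spans 0 => maybe no effect
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, ds))         # homogeneity Q, df = k - 1
    return mean, se, ci_95, q

# Hypothetical studies: (effect size, n experimental, n control)
print(meta_summary([(0.30, 25, 25), (0.55, 40, 38), (0.10, 60, 55)]))
```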
Step 4: Reporting Meta-analytic Findings

The final step is the reporting of meta-analytic findings. It is important that one provide graphic and tabular representations of the findings (see Henson & Cumming, 2003; Thompson, 2002; Wilkinson & the Task Force on Statistical Inference, 1999); for example, Thompson (2002) provided an illustration of graphing confidence intervals for effect sizes. The potential formats are varied but should enhance interpretation of the findings. In addition, Wilkinson and the Task Force stated, "interval estimates should be given for any effect sizes involving principal outcomes" (p. 599). Furthermore, they stated that "it is time for authors to take advantage of them and for editors and reviewers to urge authors to do so" (p. 602). Therefore, it is important for researchers to consult the guidelines and recommendations of the American Psychological Association (2001) because they are intended to facilitate the dissemination of information.

Summary

Meta-analysis is a salient way to synthesize research findings. In this article, I provided a description of meta-analytic research with an illustration of a meta-analysis, in addition to a review of articles in which meta-analyses have been investigated in The Journal of Educational Research from 1992 to 2002. Furthermore, recommendations associated with conducting a meta-analysis were presented, thereby providing researchers, editors, and readers with guidelines for meta-analytic research.

NOTE

1. The term homogeneous is "used to refer to populations or samples that have low variability" (Vogt, 1993, p. 105). Furthermore, Dunn et al. (1995) noted that a function of the meta-analysis is the determination of homogeneity, describing studies that share a common effect size (p. 359).

REFERENCES

Algina, J., & Moulder, B. C. (2001). Sample sizes for confidence intervals on the increase in the squared multiple correlation coefficient. Educational and Psychological Measurement, 61(4), 633–649.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bangert-Drowns, R. L. (1993). The word processor as an instructional tool: A meta-analysis of word processing in writing instruction. Review of Educational Research, 63(1), 69–93.
Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C. C. (1985). Effectiveness of computer-based education in secondary schools. Journal of Computer-Based Instruction, 12(3), 59–68.
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Chambers, E. A. (2003). Efficacy of educational technology in elementary and secondary classrooms: A meta-analysis of the research literature from 1992–2002 (Doctoral dissertation, Southern Illinois University, 2002). Dissertation Abstracts International, 63, 316.
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51(3), 281–309.
Creswell, J. W. (2002). Educational research: Planning, conducting, and evaluating quantitative and qualitative research. Upper Saddle River, NJ: Pearson Education.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532–574.
Dunn, R., Griggs, S. A., Olson, J., & Beasley, M. (1995). A meta-analytic validation of the Dunn and Dunn model of learning-style preferences. The Journal of Educational Research, 88, 353–362.
Eysenck, H. J. (1978). An exercise in mega-silliness. American Psychologist, 33, 517.
Fan, X. (2001). Statistical significance and effect size in education research: Two sides of a coin. The Journal of Educational Research, 94, 275–282.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61(4), 517–531.
Fidler, F., & Thompson, B. (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61(4), 575–604.
Fletcher-Flinn, C. M., & Gravatt, B. (1995). The efficacy of computer assisted instruction (CAI): A meta-analysis. Journal of Educational Computing Research, 12(3), 219–242.
Fraenkel, J. R., & Wallen, N. E. (2003). How to design and evaluate research in education. Boston: McGraw-Hill.
Gallo, P. S. (1978). Meta-analysis: A mixed metaphor? American Psychologist, 33, 515.
Gillingham, M. G., & Guthrie, J. T. (1987). Relationships between CBI and research on teaching. Contemporary Educational Psychology, 12, 189–199.
Glass, G. V (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Glass, G. V (1978). Integrating findings: The meta-analysis of research. Review of Research in Education, 5, 351–379.
Glass, G. V, McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Guthrie, J. T., Schafer, W. D., Von Secker, C., & Alban, T. (2000). Contributions of instructional practices to reading achievement in a statewide improvement program. The Journal of Educational Research, 93, 211–236.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V., Shymansky, J. A., & Woodworth, G. (1989). A practical guide to modern methods of meta-analysis. Washington, DC: National Science Teachers Association.
Henson, R. K., & Cumming, G. (2003, April). Confidence intervals for effect sizes and other important statistics: A graphical approach. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Holmes, C. T. (1984). Effect size estimation in meta-analysis. Journal of Experimental Education, 52, 106–109.
Hough, S. L., & Hall, B. W. (1994). Comparison of the Glass and Hunter-Schmidt meta-analytic techniques. The Journal of Educational Research.
Kulik, C. C., & Kulik, J. A. (1991). Effectiveness of computer-based instruction: An updated analysis. Computers in Human Behavior, 7, 75–94.
Kulik, J. A., Kulik, C. C., & Bangert-Drowns, R. L. (1985). Effectiveness of computer-based education in elementary schools. Computers in Human Behavior, 1, 59–74.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Lou, Y., Abrami, P. C., & Spence, J. C. (2000). Effects of within-class grouping on student achievement: An exploratory model. The Journal of Educational Research, 94, 101–113.
Mavrogenes, N. A., & Bezruczko, N. (1994). A study of the writing of highly disadvantaged children. The Journal of Educational Research, 87, 228–283.
Maxwell, S. E., & Delaney, H. D. (2000). Designing experiments and analyzing data: A model comparison perspective. Mahwah, NJ: Erlbaum.
McGregor, J. (1993). Effectiveness of role playing and antiracist teaching in reducing student prejudice. The Journal of Educational Research, 86, 215–226.
Mendoza, J. L., & Stafford, K. L. (2001). Confidence intervals, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational and Psychological Measurement, 61(4), 650–667.
Niemiec, R. P., & Walberg, H. J. (1985). Computers and achievement in the elementary schools. Journal of Educational Computing Research, 1(4), 435–440.
Onwuegbuzie, A. J., & Levin, J. R. (2003, April). Do effect-size measures measure up? The good, the bad, and the ugly. Paper presented at the annual meeting of the American Educational Research Association, Chicago.
Presby, S. (1978). Overly broad categories obscure important differences between therapies. American Psychologist, 33, 514.
Razel, M. (2001). The complex model of television viewing and educational achievement. The Journal of Educational Research, 94, 371–379.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rudner, L. M., Glass, G. V, Evartt, D. L., & Emery, P. J. (2002). A user's guide to the meta-analysis of research studies. College Park, MD: ERIC Clearinghouse on Assessment and Evaluation, University of Maryland.
Ryan, A. W. (1991). Meta-analysis of achievement effects of microcomputer applications in elementary schools. Educational Administration Quarterly, 27(2), 161–184.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529–540.
Smithson, M. (2000). Statistics with confidence. Thousand Oaks, CA: Sage.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61(4), 605–632.
SPSS. (2001). SPSS 11.0 for Windows [Computer software]. Chicago: SPSS Inc.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Thompson, S. G. (1994). Systematic review: Why sources of heterogeneity in meta-analysis should be investigated. BMJ, 309, 1351–1355.
Trice, S. M., & Leitner, D. W. (1984, September). Paper presented at the annual meeting of the Mid-Western Educational Research Association, Chicago.
Vogt, W. P. (1993). Dictionary of statistics and methodology. Newbury Park, CA: Sage.
Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Beverly Hills, CA: Sage.

APPENDIX
Criteria for Evaluating a Meta-Analysis (each rated Yes, No, or N/A)

Problem criteria
1. Presenting a significant issue
2. Relating issue to theory
3. Identifying confounds

Design criteria
4. Inclusion criteria for the studies (e.g., experimental, age of participants)
5. Data collection procedures (e.g., contact individuals, database searches, reference lists)
6. Describing characteristics of the studies: describing the sample of studies and how representative it is of the population of studies; coding scheme and interrater reliability; total number of studies; number of participants in each study; number that are experiments; percentage male and female students; adults and/or children; elementary and/or secondary and/or postsecondary students
7. Clearly identifying delimitations
8. Single effect size for each study versus multiple effect sizes (i.e., statistical independence)

Results criteria
9. Test of homogeneity
10. Visual display of effect sizes (graphic/tabular)
11. Central tendency, spread of effect sizes, and confidence intervals
12. Significance testing of effect sizes: Are they greater than chance?
13. Comparing effect sizes of various kinds of subjects, designs, sample sizes, etc.

Interpretation criteria
14. Articulating cautions/limitations
15. Advancing recommendations for practice and research

Note. These criteria were amended from Hall, J. A., & Rosenthal, R. (1995). Interpreting and evaluating meta-analysis. Evaluation & the Health Professions, 18(4), 393–407.
