Leisure Sciences, 30: 96–110, 2008
Copyright © Taylor & Francis Group, LLC
ISSN: 0149-0400 print / 1521-0588 online
DOI: 10.1080/01490400701881366

Understanding Meta-Analysis: A Review of the Methodological Literature

LORI B. SHELBY
JERRY J. VASKE

Human Dimensions of Natural Resources
Colorado State University
Fort Collins, CO, USA

Received 15 December 2006; accepted 15 November 2007.
Address correspondence to Jerry J. Vaske, Colorado State University, Human Dimensions of Natural Resources, Fort Collins, Colorado 80523. E-mail: jerryv@cnr.colostate.edu
Meta-analysis is a quantitative technique that uses specific measures (e.g., an effect
size) to indicate the strength of variable relationships for the studies included in the
analysis. The technique emphasizes results across multiple studies as opposed to results
from a single investigation. This article provides an introduction to the meta-analysis
literature and discusses the challenges of applying meta-analysis to human dimensions
research. Specifically, we review the definitions of meta-analysis techniques, the steps in
conducting a meta-analysis, and the advantages and disadvantages of meta-analysis.
Keywords comparative analysis, effect size, literature review, meta-analysis

Introduction
The use of meta-analysis as a technique for quantitative research syntheses of multiple
studies has become increasingly popular since Gene Glass introduced the term at the annual convention of the American Educational Research Association in 1976. Although the
term was new, the concept of statistically integrating studies already existed (e.g., Pearson,
1904; Tippett, 1931). In the social sciences, meta-analysis was rarely seen until the 1970s
when several social scientists applied quantitative synthesis techniques to their respective
disciplines, such as social psychology (e.g., Schmidt & Hunter, 1977). A milestone was a
series of books that gave applied rules and techniques for meta-analysis (Glass, McGaw, &
Smith, 1981; Hedges & Olkin, 1985; Hunter, Schmidt, & Jackson, 1982; Rosenthal, 1984).
As a result, meta-analysis spread through the disciplines in the 1980s, especially in psychology, education, and the medical sciences. The increasing use of meta-analysis has raised expectations for rigor in its process and procedures. The Handbook of Research
Synthesis (Cooper & Hedges, 1994) has become a definitive source for the behavioral and
medical sciences and highlights this attention to rigor.
In the human dimensions (HD) literature, several quantitative articles have summarized
findings across studies using comparative analysis (e.g., Donnelly, Vaske, Whittaker, &
Shelby, 2000; Shelby, Heberlein, Vaske, & Alfano, 1983; Shelby, Vaske, & Heberlein,
1989; Vaske & Donnelly, 2002; Vaske, Donnelly, Heberlein, & Shelby, 1982). Although
these comparative analyses would not be considered meta-analyses by some definitions,
such studies facilitate an understanding of the literature and therefore have practical utility.
In this article, we argue that the use of more formal and rigorous meta-analytical procedures can further enhance HD research. Our review of the methodological literature considers the definitions of meta-analysis, the basic steps of meta-analysis, and the strengths and weaknesses of meta-analysis in the context of HD research.


Definitions of Meta-Analysis
A single well-accepted definition or a single correct way to conduct a meta-analysis does
not exist. What constitutes a "true" meta-analysis is debatable. Glass (1976), for example, took a broad view and referred to meta-analysis as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" (p. 3). Stricter definitions, however, are more common. Gliner, Morgan, and Harmon (2003), for example, defined meta-analysis as "a research synthesis that uses a quantitative measure, effect size, to indicate the strength of relationship between the treatments and dependent measures of studies making up that synthesis" (p. 1376). Between these extremes, other authors have proposed variants of these definitions (Hedges & Olkin, 1985; Hunter & Schmidt, 1990; Rosenthal, 1984). Differences in these defining characteristics have contributed to confusion among researchers and are reflected in four issues: (a) total methodology vs. an analysis technique, (b) the use of effect sizes, (c) the unit of analysis, and (d) meta-analysis vs. comparative analysis.
Total Methodology vs. Analysis Technique
Some researchers define meta-analysis as the entire process of collecting, synthesizing, and
analyzing research findings from multiple studies in a systematic way (i.e., a total methodology). Others use the term to simply describe the statistical methods used to combine the
results of studies (i.e., an analysis technique). The distinction between a total methodology
and an analysis technique has contributed to debate and confusion in the methodological
literature. Cooper (1982), for example, suggested that a rigorous method must be applied to the process of collecting and coding studies to help prevent validity problems (e.g., the people sampled in the collected studies might differ from the target population). This suggestion contradicts Glass's (1976) assertion that a priori considerations of research findings are not appropriate for a meta-analysis.
Use of Effect Sizes in Meta-Analysis
Effect sizes measure the strength of relationship between variables and are typically used as
the summary statistic in meta-analyses (e.g., Chen & Popovich, 2002; Cooper & Hedges,
1994; Grissom & Kim, 2005; Hedges & Olkin, 1985; Lipsey & Wilson, 2001; Rosenthal,
Rosnow, & Rubin, 2000). An effect size is calculated for each variable relationship of
interest for each study in the meta-analysis. The effect sizes are combined using a summary
effect size statistic. Statistical analysis is then conducted on the summary effect size. For
example, in a meta-analytic study on hunting satisfaction, an effect size could be calculated
for the relationship between bagging game and hunting satisfaction in each study and then
combined into an average effect size across all studies to produce a quantitative measure of
the strength of this relationship for all known studies on hunting satisfaction.
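As an illustration of this combining logic, the sketch below (ours, with invented study values rather than results from the hunting satisfaction literature) averages per-study correlations into a single summary r. It weights each study by n − 3, the inverse variance of Fisher's z, which is one common weighting choice:

```python
# Minimal sketch: combining per-study correlations into one mean effect
# size. The (r, n) pairs are hypothetical, not drawn from real studies.
import math

# Correlation between bagging game and hunting satisfaction reported by
# each (invented) study, paired with that study's sample size.
studies = [(0.42, 120), (0.35, 310), (0.51, 85), (0.28, 240)]

# Average the correlations via Fisher's z transform, weighting each study
# by n - 3 (the inverse variance of z), then convert back to the r metric.
num = sum((n - 3) * math.atanh(r) for r, n in studies)
den = sum(n - 3 for r, n in studies)
mean_r = math.tanh(num / den)
print(f"summary effect size (mean r): {mean_r:.3f}")
```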
Effect sizes in the HD literature are often divided into two major types referred to as the d-family of indices (Glass's Δ, Hedges' g) and the r-family of indices (r, eta) (Gliner, Vaske, & Morgan, 2001; Vaske, Gliner, & Morgan, 2002). The d effect sizes are expressed in standard deviation units, and r effect sizes are correlation coefficients. Effect sizes in a meta-analysis can include a variety of other measures such as proportions (i.e., direct method, logit method), arithmetic means, standardized mean gain, proportion difference, and logged odds-ratios (see Table 1 for more information).
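A minimal sketch of the two families, again with hypothetical inputs: Cohen's d computed from group means and a pooled standard deviation, plus one commonly used conversion from the d metric to the r metric (not a formula taken from this article):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / sp

def d_to_r(d, n1, n2):
    """Convert d to the r metric; a = (n1 + n2)^2 / (n1 * n2) corrects for
    unequal group sizes (a = 4 when the groups are equal)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)

# Hypothetical crowding ratings for two activity groups.
d = cohens_d(6.2, 1.9, 150, 5.4, 2.1, 130)
print(f"d = {d:.3f}, equivalent r = {d_to_r(d, 150, 130):.3f}")
```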


The use of an effect size, however, is not a requirement for a study to be considered
a meta-analysis. (For information on other methods of combining studies see Cooper &
Hedges, 1994; Rosenthal, 1984). The requirement for an effect size that is apparent in many
meta-analysis definitions likely stems from a misinterpretation of Glass's writings. Glass
focused on effect sizes but cautioned that effect size is not the only type of summary statistic
that could be used (Glass, 2000). Other authors also focused on effect size, but specifically
mentioned that it was only an example of a common statistic that could be found for each
study (Hunter & Schmidt, 1990; Rosenthal, 1984). Any quantitative method used to obtain
meaning from data is potentially useful in meta-analysis (Glass et al., 1981). Although effect
size has incorrectly been considered the only statistic for meta-analysis, it is the statistic
used most frequently (Lipsey & Wilson, 2001). As a result, much of the methodology and
techniques in meta-analysis are based on the numerous effect sizes available. For example,
see Hunter and Schmidt's (1990) correction formulas.
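One example from that family is the classic correction for attenuation that Hunter and Schmidt build on: an observed correlation is divided by the square root of the product of the two measures' reliabilities. A minimal sketch with hypothetical values:

```python
import math

def correct_for_attenuation(r_observed, rel_x, rel_y):
    """Estimate the correlation that would be observed if both measures
    were perfectly reliable (disattenuation)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Observed r = .30 with measure reliabilities (e.g., Cronbach's alpha)
# of .80 and .70; the corrected estimate is about .40.
print(correct_for_attenuation(0.30, 0.80, 0.70))
```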
Unit of Analysis in Meta-Analysis
Some meta-analysis definitions are broad enough to incorporate a variety of units of analysis (e.g., experiments, studies, publications, datasets). Conflict arises because scientists in
some disciplines (e.g., psychology, medicine) consider the "gold standard" to be randomized
controlled experiments (i.e., random assignment of participants to treatment and comparison groups; Gliner et al., 2003). Randomized controlled experiments, however, typically
have low external validity. HD research is generally focused on management or policy
implications that require high external validity. As a result, much of the HD literature is
derived from surveys (e.g., on-site, mailed, telephone) with samples representing the population of interest (Vaske, Shelby, & Manfredo, 2006). HD researchers should choose the
unit of analysis that is most appropriate for their study. Any unit of analysis (e.g., datasets,
publications) is acceptable as long as: (a) quantitative results have been generated, (b) the
results are conceptually comparable, (c) the findings are statistically comparable, and (d)
the results come from similar research designs (Cooper & Hedges, 1994; Glass et al., 1981;
Lipsey & Wilson, 2001).
Meta-Analysis vs. Comparative Analysis
Comparative analysis articles have become increasingly common in the HD literature (e.g.,
Shelby et al., 1989; Vaske & Donnelly, 2002; Vaske et al., 1982). By aggregating data over
specific variables (e.g., crowding, norms, satisfaction), comparative analyses reveal patterns
in findings and identify causal factors that are not evident in a single study (e.g., the influence of multiple activities and settings on crowding). Similar to meta-analysis, they generally
involve a thorough literature review and use quantitative approaches to aggregate the data
(e.g., Donnelly et al., 2000; Manfredo, Driver, & Tarrant, 1996). Comparative analyses,
however, typically use original datasets instead of information gleaned from publications.
The relationship of meta-analysis to comparative analysis depends on how meta-analysis is
defined.
Donnelly et al. (2000), for example, reported an effect size (r) to examine the relationships between predictor variables (type of resource, type of encounter, and question response format) and norm prevalence across 56 evaluation contexts in 30 studies. Similarly, Vaske and Donnelly (2002) used t-tests and correlation coefficients from 73 evaluation contexts in 13 studies to test the hypothesis that when encounters exceed an individual's
norms for seeing others, crowding will increase. In both comparative analyses (Donnelly
et al., 2000; Vaske & Donnelly, 2002), a single study could have more than one evaluation
context. Evaluation contexts referred to: (a) specific locations where recreationists reported

Downloaded by [IBA, Karachi] at 23:15 03 December 2014

Meta-Analysis Overview

99

a norm or felt crowded (e.g., at a trailhead and on the trail), (b) time (e.g., opening day vs.
rest of hunting season), or (c) visitors evaluating other types of recreationists (e.g., anglers' evaluations of kayakers).
By some definitions, these and other comparative analyses in the HD literature (e.g.,
Shelby et al., 1989; Vaske et al., 1982) would not be considered meta-analyses for at
least four reasons. First, traditional meta-analysis statistical techniques for identifying the
relevant literature (i.e., datasets in these examples) were not discussed as a part of the study
(e.g., bias corrections). Second, the authors of these examples used original datasets and not
the published literature as typically found in meta-analysis. Third, although these studies
sometimes computed an effect size for each evaluation context (e.g., Vaske & Donnelly,
2002), an overall summary effect size was not computed across all studies. Fourth, the unit
of analysis in these examples was an evaluation context and not a study or experiment as
typically used in a meta-analysis. More inclusive definitions of meta-analysis (Glass, 1976),
however, would allow these examples from the HD literature to be considered meta-analyses.
In summary, the definition of what constitutes a meta-analysis has been a source of
intense debate. Opinions related to the "true" definition of meta-analysis are primarily a matter of nomenclature, orthodoxy, and personal bias. The HD literature has generally avoided
referring to comparative analyses as meta-analyses. Although the issues presented here
allow for flexibility in the definition of meta-analysis, the fact that most disciplines accept
a strict definition should be considered. From an applied perspective, both meta-analyses
and comparative analyses offer advantages to the HD literature. Meta-analysis methods: (a)
have formal procedures for accounting for bias, error, and outliers and (b) provide a gauge
for determining the degree of homogeneity within and among different subgroups. When
the mean effect sizes are not homogeneous, comparative analyses can potentially highlight
the source of the heterogeneity (e.g., changing use conditions, different evaluation contexts,
management actions). Combining comparative and meta-analytic techniques can enhance
theoretical/empirical advancement and facilitate understanding practical applications of a
concept.

Basic Steps of Meta-Analysis


No single correct approach to conducting a meta-analysis exists. Separating the total methodology into a series of steps, however, encourages a rigorous approach and provides
an organizational framework for conducting a meta-analysis (e.g., Cooper, 1982; Cooper &
Hedges, 1994; Jackson, 1980; Lipsey & Wilson, 2001; Schafer, 1999). The steps are analogous to those in primary research: (a) problem conceptualization and operationalization,
(b) data collection and processing, (c) analysis, and (d) reporting.
Step 1: Problem Conceptualization and Operationalization
The first step is to conceptualize the problem, operationalize the variables, and create the
hypotheses. The problem statement should include a specification of the relevant research
literature and the major independent and dependent variables (Lipsey & Wilson, 2001).
Theoretical and statistical issues are also addressed during this step (Hall, Tickle-Degnen,
Rosenthal, & Mosteller, 1994). Researchers should consider: (a) their level of confidence that
relationships exist between the independent and dependent variables; (b) the generalizability
of the findings beyond a small subset of populations, settings, and procedures; and (c)
whether the analysis will advance the theoretical understanding of the literature.
Planning the inclusion and exclusion criteria is perhaps the most important component of meta-analysis. Such criteria are directly related to internal/external validity and generalizability (Gliner et al., 2003). Factors such as sampling methods, research methods, time frames, publication types, and cultural/language differences of studies should all be considered (Cooper & Hedges, 1994). These decisions are heavily dependent on a researcher's purposes, goals, and available resources. Lipsey and Wilson (2001) suggest that researchers
prepare a detailed, written specification of the criteria a study must meet for inclusion in
the meta-analysis.
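As a sketch of what such a written specification might look like once encoded for screening (the field names and criteria below are hypothetical, not a recommended standard), each decision becomes explicit and auditable:

```python
# Hypothetical inclusion criteria for an (invented) crowding meta-analysis.
INCLUSION_CRITERIA = {
    "construct": "perceived crowding",
    "years": (1975, 2006),
    "designs": {"on-site survey", "mail survey", "telephone survey"},
    "must_report": {"sample size", "zero-order r or group means/SDs"},
    "languages": {"English"},
}

def eligible(study):
    """Return True only if a candidate study satisfies every criterion."""
    lo, hi = INCLUSION_CRITERIA["years"]
    return (lo <= study["year"] <= hi
            and study["design"] in INCLUSION_CRITERIA["designs"]
            and study["language"] in INCLUSION_CRITERIA["languages"]
            and INCLUSION_CRITERIA["must_report"] <= study["reports"])

candidate = {"year": 1998, "design": "mail survey", "language": "English",
             "reports": {"sample size", "zero-order r or group means/SDs"}}
print(eligible(candidate))  # True
```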


Step 2: Data Collection and Processing


Given that numerous articles will likely be identified, procedures should be established
for tracking article collection and organizing citation information (e.g., a researcher may
choose to create a bibliographic database). Cooper and Hedges (1994) offer suggestions for
how to conduct a thorough literature search.
Coding studies for meta-analysis is analogous to survey research. A questionnaire is prepared, and each article is "interviewed" by the coder based on the information provided by
the article. As in survey research, preparing the questionnaire carefully, training the coders,
and monitoring the completeness, reliability, and validity of the resulting data is important.
Close-ended items should be used as much as possible to facilitate creation of the database
(Brown, Upchurch, & Acton, 2003; Cooper & Hedges, 1994; Stock, Benito, & Lasa, 1996).
The choice of the specific summarizing statistic is central to the development of the
codebook and depends on the nature of the research findings, the type of statistics reported
for each study, and the hypotheses tested by the meta-analysis (Lipsey & Wilson, 2001).
Table 1 shows some common effect size statistics, their specific use for meta-analysis, and
the formulas necessary for computing the study-level effect size. The Cohen's d formula, for example, can be used to examine the difference between the means of two groups of respondents in each study (e.g., snowmobilers' vs. cross-country skiers' evaluations of
crowding). Care should be taken to choose an effect size that can be combined across studies
with reasonable ease (Grissom & Kim, 2005; Lipsey & Wilson; Rosenthal et al., 2000;
Shadish & Haddock, 1994). Due to the complex calculations required for meta-analysis,
many effect sizes are rarely used. Cohen's d, Pearson's r, and odds ratios are typically
used because applicable standard error formulations and other statistical procedures are
readily available (Lipsey & Wilson). For more information on specific effect sizes used in
meta-analysis, see Cooper and Hedges (1994), Grissom and Kim, and Lipsey and Wilson.
Hedges (1994) reviews the statistical issues that should be considered in this stage.
After the creation of a codebook, coders need to be trained to ensure familiarity with the
software, data file, and codebook. A single coder is not unusual in meta-analysis. However,
if the researcher chooses to use multiple coders, specific training methods are suggested
in Lipsey and Wilson (2001). For multiple coders, reliability is determined based on the
consistency between the coders (Orwin, 1994; Yeaton & Wortman, 1993). For an individual
coder, the consistency between one coding session to another is of interest (Orwin, 1994).
Researchers frequently code studies directly into the computer (Lipsey & Wilson,
2001). In this case, the structure of the dataset and codebook become the survey instrument.
This method is efficient and allows for revisions of the codebook and dataset. Practically,
the coding protocol is developed in two steps: one module codes information that applies
to the entire study (e.g., sample size) and the other module codes effect size information
for specific analysis variables.
Important decisions in the data collection step are the type(s) of software that will be
used and the structure of the meta-analytic files. For a typical meta-analysis, a single file
can be used if the number of effect sizes is small. Multiple files are often used and then
merged for analysis. In this circumstance, one data file contains the information about study
descriptors (e.g., sample size, study location, mean income), and the other data file contains
information about effect sizes. If numerous effect sizes are analyzed, a relational database
may prove beneficial.
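A minimal sketch of this two-file structure, using pandas with invented records; the column names are our own:

```python
import pandas as pd

# Study-level descriptors: one row per study.
studies = pd.DataFrame({
    "study_id": [1, 2, 3],
    "n": [120, 310, 85],
    "location": ["trail", "river", "trailhead"],
})

# Effect-size records: one row per effect size; a single study may
# contribute several (e.g., several evaluation contexts).
effects = pd.DataFrame({
    "study_id": [1, 1, 2, 3],
    "variable": ["crowding", "satisfaction", "crowding", "crowding"],
    "r": [0.42, -0.18, 0.35, 0.51],
})

# Merge on the shared key so each effect size carries its study descriptors.
merged = effects.merge(studies, on="study_id", how="left")
print(merged)
```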

TABLE 1 Commonly Used Effect Sizes for Meta-Analysis¹

r-family
Use: Describes the strength of the relationship between two variables across all studies. For example, Pearson's r is used when each study compared two continuous variables.
Effect size statistic (computed for each study): ES_r = r_xy

d-family
Use: Describes the strength of differences between two variables across all studies. For example, Cohen's d is used when each study compared the standardized difference between two group means. Also termed the standardized mean difference.
Effect size statistic: ES_d = (X̄_G1 − X̄_G2) / s_p, where s_p = pooled standard deviation of group 1 and group 2 scores

Proportion (Direct Method)
Use: Describes the central tendency of a proportion across all studies. Can compare proportions determined for different subgroups of studies.
Effect size statistic: ES_p = p = k/n, where k = number of subjects in the category of interest and n = total number of subjects in the sample

Proportion (Logit Method)
Use: Describes the central tendency of a proportion across all studies (when the mean proportion across all studies is expected to be <0.2 or >0.8). Can compare proportions determined for different subgroups of studies.
Effect size statistic: ES_pL = log_e[p / (1 − p)], where p = proportion of subjects in the category of interest

Arithmetic Mean
Use: Describes the central tendency of a mean across all studies. Can compare means determined for different subgroups of studies.
Effect size statistic: ES_m = X̄ = (Σ x_i) / n, where x_i = individual score for subject i (i = 1 to n) and n = total number of subjects in the sample

Standardized Mean Gain
Use: Contrasts two variables that differ only by time of measurement (each study involved a pre–post test). All studies need not use the same measure because the measures can be standardized.
Effect size statistic: ES_sg = (X̄_t2 − X̄_t1) / s_g, where s_g = s_p √(2(1 − r)), s_p = pooled standard deviation of time 1 and time 2 scores, and r = correlation between time 1 and time 2 scores

Proportion Difference
Use: Contrasts a variable across two or more groups of respondents (each study involved a group contrast using proportions).
Effect size statistic: ES_pd = p_G1 − p_G2, where p_G1 = group 1 proportion and p_G2 = group 2 proportion

Logged Odds-Ratio
Use: Contrasts a dichotomous variable across two groups of respondents.
Effect size statistic: ES_LOR = log_e(ad/bc), where a, b, c, and d = cell frequencies of a 2 × 2 contingency table

¹See Lipsey & Wilson (2001) for more information.
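To show how the formulas in Table 1 translate into computation, the sketch below implements three of them with invented inputs:

```python
import math

def es_proportion_logit(p):
    """Logit-method proportion effect size: log[p / (1 - p)]."""
    return math.log(p / (1 - p))

def es_standardized_mean_gain(mean_t2, mean_t1, sd_pooled, r_t1t2):
    """Standardized mean gain; s_g = s_p * sqrt(2 * (1 - r))."""
    s_g = sd_pooled * math.sqrt(2 * (1 - r_t1t2))
    return (mean_t2 - mean_t1) / s_g

def es_logged_odds_ratio(a, b, c, d):
    """Logged odds ratio from the cells of a 2 x 2 table."""
    return math.log((a * d) / (b * c))

print(es_proportion_logit(0.15))                      # ≈ -1.735
print(es_standardized_mean_gain(5.8, 5.2, 1.4, 0.6))  # ≈ 0.479
print(es_logged_odds_ratio(40, 60, 25, 75))           # ≈ 0.693
```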


Step 3: Analysis
Three quantities are needed to compute the summary effect size. An effect size statistic, the standard error of the effect size, and the inverse variance weight based on that standard error are computed
for each study or evaluation context. For formulas specific to each effect size, see Lipsey
and Wilson (2001) and Cooper and Hedges (1994). Effect sizes based on larger samples
provide better population estimates than those based on smaller samples (Lipsey & Wilson,
2001). Meta-analysts typically weight each effect size for each study to account for sample
size differences. Although weights based on sample sizes are optional, Hedges (1982) and
Hedges and Olkin (1985) demonstrated that optimal weights are based on the standard error
of the effect size. Because a larger standard error corresponds to a less precise effect size,
the actual weights are computed as the inverse of the squared standard error, that is, the inverse variance weight (Lipsey & Wilson, 2001).
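For a correlation effect size analyzed in the Fisher z metric, for example, the standard error is 1/√(n − 3), so the inverse variance weight reduces to n − 3. A minimal sketch:

```python
import math

def inverse_variance_weight(n):
    """Weight for a Fisher-z effect size: 1 / SE^2, with SE = 1/sqrt(n - 3)."""
    se = 1 / math.sqrt(n - 3)
    return 1 / se**2

# Larger samples (smaller standard errors) receive larger weights.
print(inverse_variance_weight(120))  # 117.0
```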
Once the necessary information is coded and the necessary adjustments to the effect
size statistics have been made, the effect sizes are analyzed. To account for bias and error,
correction formulas have been developed. For example, if the sample sizes of the studies
differ or the researcher is concerned with small sample bias, various weighting methods are
available (Lipsey & Wilson, 2001). Weights for reliability and validity issues, transformation
formulas and bias corrections, outliers, and missing data can all be used (Beal, Corey, &
Dunlap, 2002; Huffcutt & Arthur, 1995; Little & Rubin, 2002). Frequently used summary
effect size statistics are more likely to have empirically proven correction formulas. For more
information see Hunter and Schmidt (1990), Lipsey and Wilson, and Rosenthal (1984).
The basic analytic goals of meta-analysis are to: (a) combine and analyze the distribution
of effect sizes and (b) examine the relationship between effect sizes and other descriptive
variables to understand the variability of effect sizes across studies. Figure 1 outlines analytic
methods typically used in meta-analysis. The four basic steps are:
1. create independent effect sizes for each study,
2. compute the weighted mean of effect sizes using inverse variance weights,
3. determine the confidence interval for the mean, and
4. analyze for homogeneity.
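A compact sketch of steps 2 through 4, using the hypothetical Fisher-z effect sizes and n − 3 weights from the earlier illustration (step 1 is assumed done):

```python
import math

# Hypothetical effect sizes (Fisher z) and inverse variance weights (n - 3).
es = [math.atanh(r) for r in (0.42, 0.35, 0.51, 0.28)]
w = [n - 3 for n in (120, 310, 85, 240)]

mean_es = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)   # step 2
se_mean = 1 / math.sqrt(sum(w))                            # SE of the mean
ci = (mean_es - 1.96 * se_mean, mean_es + 1.96 * se_mean)  # step 3
Q = sum(wi * (ei - mean_es) ** 2 for wi, ei in zip(w, es)) # step 4
# Compare Q to a chi-square with k - 1 = 3 degrees of freedom (critical
# value 7.81 at alpha = .05); a larger Q suggests heterogeneity.
print(mean_es, ci, Q)
```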

The homogeneity of the effect size distribution provides an indicator of whether the
independent effect sizes for each study are from the same population (Hedges & Olkin,
1985; Rosenthal, 1984; Snedecor, 1946). The chi-square distribution and the test statistic Q (the weighted sum of squared deviations of the effect sizes around the weighted mean effect size) are commonly used (Hedges & Olkin). The Q test for homogeneity facilitates the choice of fixed effects,
random effects, fixed effects with variance explained, or mixed effects models (see Figure 1).
A homogeneous distribution of effect sizes (i.e., nonsignificant Q statistic) implies
that the dispersion of effect sizes around their mean is less than or equal to the expected
sampling error. The variability in effect sizes is hypothesized to equal sampling error alone.
In this situation, a fixed effects model can be used. The fixed effects model assumes: (a)
an observed effect size from a study is an accurate estimate for the population with only
random subject-level sampling error, and (b) the mean effect size for all studies is an accurate
estimate of the relationship (Lipsey & Wilson, 2001).
When the mean effect size is heterogeneous (i.e., a significant Q statistic) or there is
an inferential reason to conduct further analysis, three options are available: (a) a random
effects model, (b) a fixed effects model that attempts to explain the variability in effect sizes,
and (c) a mixed effects model (Figure 1). Various authors have provided detailed approaches
for explaining between study variability (Cooper & Hedges, 1994; Hedges, 1982; Hedges
& Olkin, 1985; Rosenthal, 1984).

FIGURE 1 Analysis steps for meta-analysis (Adapted from Lipsey & Wilson, 2001).

First, a random effects model assumes that the variability beyond subject-level sampling
error is random and cannot be identified (Lipsey & Wilson, 2001). In this case, an inverse
variance weight is used to account for random variability at both the study/context level
and the respondent level for each study. There are two common methods for obtaining a
random variability estimate: (a) the method of moments, and (b) the method of maximum
likelihood (Raudenbush, 1994; Shadish & Haddock, 1994). The meta-analyst should repeat
the analysis steps shown in Figure 1 using the new inverse variance weight.
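A minimal sketch of the method-of-moments route (the estimator commonly attributed to DerSimonian and Laird), reusing the hypothetical weights and the Q of roughly 5.3 from the previous sketch:

```python
def random_effects_weights(w, Q):
    """Method-of-moments estimate of the between-study variance tau^2,
    then new inverse variance weights 1 / (1/w_i + tau^2)."""
    k = len(w)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)
    return [1.0 / (1.0 / wi + tau2) for wi in w]

# Hypothetical fixed-effects weights and Q from the earlier illustration.
print(random_effects_weights([117, 307, 82, 237], 5.3))
```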
Second, a fixed effects with variance explained model assumes that the variability
beyond subject-level sampling error can be explained by other variables in the meta-analysis.
These variables would systematically differentiate studies with larger or smaller effect sizes
(Lipsey & Wilson, 2001). Two methods for modeling the variability between studies for
fixed effects models are evident in the literature: (a) a meta-analytic approach to ANOVA,
and (b) a meta-analytic approach to regression.
If heterogeneity is evident, the data can be segmented a priori to create more homogeneous subgroups (e.g., trout anglers in wilderness and salmon anglers in nonwilderness
areas). An analysis similar to ANOVA can be used to compare between these subgroups.
See Hedges (1982, 1994) and Hedges and Olkin (1985) for computational details. The homogeneity of effect sizes between groups and within groups tests the effects of groupings
and provides an indication that the final groups have internally homogeneous effect sizes.
When meta-analysts are interested in explaining the variance between studies that have
continuous independent variables, an analogue to multiple regression is often used (Hedges,
1982; Hedges, 1994; Hedges & Olkin, 1985). The predictor variables represent study characteristics such as the population studied (e.g., demographics, type of activity participated
in, location of activity) or methodological characteristics (e.g., variable operationalization,
on-site vs. mailed surveys, methodological quality). The effect sizes are regressed on the
predictor variables to determine the relationship between study characteristics and effect
sizes.
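A minimal sketch of this regression analogue, fitting weighted least squares by hand with numpy; the effect sizes, weights, and the single predictor (survey mode) are hypothetical:

```python
import numpy as np

# Hypothetical data: Fisher-z effect sizes, inverse variance weights, and
# one study characteristic (1 = on-site survey, 0 = mail survey).
es = np.array([0.45, 0.37, 0.56, 0.29])
w = np.array([117.0, 307.0, 82.0, 237.0])
x = np.array([1.0, 0.0, 1.0, 0.0])

# Weighted least squares: regress effect sizes on the study characteristic.
X = np.column_stack([np.ones_like(x), x])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ es)
print(beta)  # intercept, plus the shift in effect size for on-site surveys
```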
Third, if an effect size distribution is still heterogeneous after modeling variables explaining effect size variance, a mixed-effects model may be appropriate. Mixed effects
models attempt to explain the variability by study characteristic variables and assume random variability beyond subject-level sampling error (Lipsey & Wilson, 2001). For these
models, the random effects variance component is calculated based on the residual variability from modeling study characteristics instead of the total variability (Cooper & Hedges,
1994; Kalaian & Raudenbush, 1996; Overton, 1998).
Although average effect sizes that come from heterogeneous distributions should generally be treated with caution, homogeneity tests such as the Q statistic are not the only
decision criteria. The decision about using a fixed, random, fixed with variance explained,
or mixed effects model should be based primarily on theory (i.e., the assumptions made by
the researcher regarding the underlying reasons for differences leading to a nonhomogeneous effect size such as random error; Matt & Cook, 1994). In some cases, a fixed effects
analysis (i.e., without variance explained) may be appropriate even when there is significant
heterogeneity (e.g., when the research question is not attempting to generalize beyond the
studies that are being included in the meta-analysis; Hedges & Vevea, 1998).

Step 4: Reporting
The interpretation and reporting of results depend on the meta-analyst's personal judgments, understanding of the research, and purpose of the work. Specific methods for interpreting and using the results are integrally related to what measures were used. Specific advice for interpreting and evaluating meta-analysis results is provided in the literature (Cooper & Hedges, 1994; Hall & Rosenthal, 1995). Guidance for writing up meta-analyses
is also available (Halvorsen, 1994; Light, Singer, & Willett, 1994; Rosenthal, 1995).

Advantages of Meta-Analysis
The advantages of meta-analysis can be summarized into two categories: (a) a means of
considering the practical significance of research findings and (b) a rigorous methodology
for quantitative research synthesis.


Practical Significance
Meta-analysis is one method for providing evidence for or against practical significance.
The method encourages researchers to consider the whole picture and gives credence to
repeated results through the use of a summary statistic. Meta-analytic effect sizes provide
one indicator of practical significance and avoid the problems commonly associated with
Null Hypothesis Significance Testing (Gliner et al., 2001; Vaske et al., 2002). Meta-analysis
can find effects or relationships that are obscured in other approaches (Lipsey & Wilson,
2001). Qualitative literature reviews, for example, do not allow researchers to consider
statistical differences between studies.
Rigorous Methodology
Meta-analysis provides a rigorous methodology for quantitative research synthesis. Although
no single best approach exists, researchers are expected to follow established meta-analysis procedures. Rules and techniques for meta-analysis in the social sciences that encourage such
rigor can be found in several references (Glass et al., 1981; Hedges & Olkin, 1985; Hunter &
Schmidt, 1990; Rosenthal, 1984). The Handbook of Research Synthesis (Cooper & Hedges,
1994) is a definitive source for the behavioral and medical sciences. The use of a rigorous
approach encourages the researcher to become intimate with the data, create focused research hypotheses, and identify moderator variables (Rosenthal & DiMatteo, 2001). Thus,
researchers are afforded quantitative tools for evaluating the meaning of the literature.

Disadvantages of Meta-Analysis
The effort and expertise required to conduct a meta-analysis can be problematic. Lipsey and
Wilson (2001), however, have provided a clear introduction to meta-analysis concepts and
analysis. Their text is relatively easy to understand for individuals with a basic knowledge
of statistics and is specific to the social sciences (see also Rosenthal & DiMatteo, 2001).
Most of the criticisms of meta-analysis are related to the potential error and bias that
can result from combining studies. Error and bias in a meta-analysis can stem from a series
of interrelated issues. First, with the mix of studies used in a meta-analysis, differences between studies have been referred to as an "apples and oranges" problem. For example, it may not be appropriate to meta-analyze studies that use different methodologies (e.g., surveys vs. experiments), sampling designs, and/or variable measurements. Such methodological variables, however, can be coded and used to test for their impact on the overall findings. In response to the apples and oranges criticism, Glass (2000) stated, "Of course it mixes apples and oranges; in the study of fruit nothing else is sensible; comparing apples and oranges is the only endeavor worthy of true scientists; comparing apples to apples is trivial" (p. 5).
Second, the concern with mixing studies also includes the problems associated with
mixing studies of different methodological quality ("garbage in, garbage out"). If studies with poor methodological quality are included, the meta-analysis results may be biased.
A judgment on the quality of each study by the researcher minimizes this problem. For
example, each study can simply be rated as high or low quality (Hedges, 1982). Other
methods such as rating threats to internal and external validity (Campbell & Stanley, 1966;
Cook & Campbell, 1979) and evaluations of each study's methods (Cooper, 1984) also
have been proposed. These methods are relatively easy to accomplish and can be done
as part of setting the standards for study inclusion and exclusion in the early steps of
meta-analysis.
Third, the inclusion or exclusion of specific studies can influence error and bias. Glass
consistently put forward the idea that a priori considerations of study quality and study
differences are not necessary in meta-analysis (Glass, 1982; Glass et al., 1981). Others
have suggested means for limiting inclusion/exclusion bias (e.g., Cooper, 1982; Hunter &
Schmidt, 1990).
Fourth, a related concern has been referred to as the file-drawer problem, which involves the issue of publication bias. This problem arises when a meta-analysis attempts
to review all significant and non-significant findings to provide a complete perspective.
Nonsignificant findings, however, are often not published and the file drawer problem becomes an important issue (Hedges, 1992). Methods have been developed to help detect and
minimize publication bias (Pham et al., 2001), such as the funnel plot (i.e., a scatterplot of
sample size versus estimated effect size for a group of studies; Copas & Shi, 2000; Light &
Pillemer, 1984), checking the file drawer of unpublished studies and estimating their numbers (Iyengar & Greenhouse, 1988; Rosenthal, 1979), trim-and-fill estimates to estimate
the number of missing studies (Duval & Tweedie, 2000; Givens, Smith, & Tweedie, 1997),
and weighted estimation methods (Murtaugh, 2002; Sutton, Song, Gilbody, & Abrams,
2000). Alternatively, the file drawer problem can be solved by specifying the standards for
the inclusion of studies. By limiting the studies to those with adequate power that the researcher believes would pass high standards of peer review for publication or funding (Kraemer, Gardner, Brooks, & Yesavage, 1998), the researcher eliminates publication bias because the sample is not intended to cover all of the studies.
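A minimal sketch of the funnel plot diagnostic with invented studies; in the absence of publication bias the scatter should be roughly symmetric, narrowing as sample size grows:

```python
import matplotlib.pyplot as plt

# Hypothetical per-study effect sizes (r) and sample sizes.
r = [0.42, 0.35, 0.51, 0.28, 0.44, 0.19, 0.33]
n = [120, 310, 85, 240, 60, 520, 150]

# Effects from small studies scatter widely; large studies cluster near
# the mean. A gap in one corner can signal missing (unpublished) studies.
plt.scatter(r, n)
plt.xlabel("estimated effect size (r)")
plt.ylabel("sample size")
plt.title("Funnel plot (hypothetical studies)")
plt.show()
```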
Fifth, using multiple findings from the same study can be a source of bias in meta-analysis because the corresponding effect sizes may not be independent of one another
(Glass et al., 1981; Rosenthal & DiMatteo, 2001). Statistical methods are often used to
account for effect sizes that are not independent. In the comparative analyses presented
earlier, more than one evaluation context could come from a single study. For example,
respondents in a single study could rate their perceptions of crowding at the trailhead,
on the trail, and at the summit of a mountain. One solution is to average the dependent
effect sizes in each study, but this approach contradicts previous literature showing that
perceptions of crowding vary by location of the encounter (e.g., Shelby et al., 1989).
Other solutions may include: (a) treating the effect sizes as independent, (b) choosing
only one effect size from each study, (c) computing the degree of interdependence from
intercorrelations, and (d) computing a weighted average of the correlations (Cheung &
Chan, 2004; Gleser & Olkin, 1994; Hunter & Schmidt, 1990). Ultimately, the solution of
choice depends on the research question and the magnitude of dependence between the
effect sizes.
Overall, researchers must balance each of these concerns (e.g., apples and oranges,
garbage in-garbage out, file drawer) against the objectives of the meta-analysis and the
research questions to be addressed. Meta-analysis is only as good as the individual studies from which it is composed. The tendency to overestimate the value of the results of a
meta-analysis without considering the individual studies should be avoided. Careful documentation of the procedures that were followed can minimize these problems.


Conclusions
Over the past several decades, the accumulating body of human dimensions research has resulted in a more complete understanding of concepts such as satisfaction (Vaske et al., 1982),
motivation (Manfredo et al., 1996), crowding (Shelby et al., 1989; Shelby & Vaske, 2007),
and norms (Donnelly et al., 2000; Laven, Manning, & Krymkowski, 2005; Vaske & Donnelly, 2002). As multiple datasets have been generated, often using identical variables and
comparable study methods, integrative analytical approaches (e.g., comparative analysis)
have become possible and potentially productive. Meta-analyses complement comparative analyses by offering a rigorous methodology for quantitative research synthesis. The use of formal meta-analysis techniques in the HD literature remains largely unexplored.
Although meta-analysis can be a structured process as shown by the four basic steps
(Figure 1), each step allows researchers to use their personal judgment. Thus, researchers
are given the opportunity to use their expertise on the subject matter (e.g., in determining
constraints on the studies to be included), which allows for practical significance to be
considered not only as a result of the meta-analysis but also as an integral part of the
process. Meta-analysis does have disadvantages including the amount of effort and expertise
required as well as the error and biases that result from quantitatively mixing studies.
These limitations, however, can generally be overcome by careful research, planning, and
interpretation.
Effect sizes are the most common summary statistic in meta-analyses. Although researchers agree that reporting an effect size should accompany reporting of statistical tests
(Gliner et al., 2001; Vaske et al., 2002), there is some disagreement about reporting effect
sizes for findings that are not statistically significant. Robinson and Levin (1997), for example, suggest only reporting effect sizes after statistical significance has been found. Their
rationale is that effect sizes for outcomes that are not statistically significant represent chance
deviations. We argue for computing effect sizes for all studies regardless of the statistical
outcome of the findings based on the value of this statistic for conducting meta-analyses. If
only statistically significant studies are reported in the literature, a meta-analysis performed
on a given topic will overestimate the effect sizes. Meta-analyses performed across all studies (i.e., those with and those without statistical significance) will more accurately reflect
the strength of the relationship (Gliner et al., 2001; Schmitt, 1996; Vaske et al., 2002).

References
Beal, D. J., Corey, D. M., & Dunlap, W. P. (2002). On the bias of Huffcutt and Arthur's (1995) procedure for identifying outliers in meta-analysis of correlations. Journal of Applied Psychology, 87(3), 583–589.
Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis. Western Journal of Nursing Research, 25(2), 205–222.
Campbell, D. T. & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Chen, P. Y. & Popovich, P. M. (2002). Correlation: Parametric and nonparametric measures. Thousand Oaks, CA: Sage.
Cheung, S. F. & Chan, D. K. S. (2004). Dependent effect sizes in meta-analysis: Incorporating the degree of interdependence. Journal of Applied Psychology, 89(5), 780–791.
Cook, T. D. & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin.
Cooper, H. M. (1982). Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52(2), 291–302.
Cooper, H. M. (1984). The integrative literature review: A systematic approach. Beverly Hills, CA: Sage.

Cooper, H. & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York, NY: Sage.
Copas, J. & Shi, J. Q. (2000). Meta-analysis, funnel plots and sensitivity analysis. Biostatistics, 1(3), 247–262.
Donnelly, M. P., Vaske, J. J., Whittaker, D., & Shelby, B. (2000). Toward an understanding of norm prevalence: A comparative analysis of 20 years of research. Environmental Management, 25(4), 403–414.
Duval, S. & Tweedie, R. (2000). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95(449), 89–98.
Givens, G. H., Smith, D. D., & Tweedie, R. L. (1997). Publication bias in meta-analysis: A Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science, 12(4), 221–250.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.
Glass, G. V. (1982). Meta-analysis: An approach to the synthesis of research results. Journal of Research in Science Teaching, 19(2), 93–112.
Glass, G. V. (2000). Meta-analysis at 25. Retrieved June 10, 2007, from http://glass.ed.asu.edu/gene/papers/meta25.html
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Gleser, L. J. & Olkin, I. (1994). Stochastically dependent effect sizes. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 339–355). New York: Sage.
Gliner, J. A., Morgan, G. A., & Harmon, R. J. (2003). Meta-analysis: Formulation and interpretation. Journal of the American Academy of Child and Adolescent Psychiatry, 42(11), 1376–1379.
Gliner, J. A., Vaske, J. J., & Morgan, G. A. (2001). Null hypothesis significance testing: Effect size matters. Human Dimensions of Wildlife, 6(4), 291–301.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum.
Hall, J. A. & Rosenthal, R. (1995). Interpreting and evaluating meta-analysis. Evaluation and the Health Professions, 18(4), 393–407.
Hall, J. A., Tickle-Degnen, L., Rosenthal, R., & Mosteller, F. (1994). Formulating a problem for a research synthesis. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 17–28). New York: Sage.
Halvorsen, K. T. (1994). The reporting format. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 425–437). New York: Sage.
Hedges, L. V. (1982). Issues in meta-analysis. Review of Research in Education, 13, 353–398.
Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7(2), 246–256.
Hedges, L. V. (1994). Statistical considerations. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 29–38). New York: Sage.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V. & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504.
Huffcutt, A. I. & Arthur, W. (1995). Development of a new outlier statistic for meta-analytic data. Journal of Applied Psychology, 80(2), 327–334.
Hunter, J. E. & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Iyengar, S. & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3(1), 109–135.
Jackson, G. B. (1980). Methods for integrative reviews. Review of Educational Research, 50(3), 438–460.
Kalaian, H. A. & Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. Psychological Methods, 1, 227–235.

Kraemer, H. C., Gardner, C., Brooks, J. O., III, & Yesavage, J. A. (1998). Advantages of excluding underpowered studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23–31.
Laven, D. N., Manning, R. E., & Krymkowski, D. H. (2005). The relationship between visitor-based standards of quality and existing conditions in parks and outdoor recreation. Leisure Sciences, 27, 157–173.
Light, R. J. & Pillemer, D. B. (1984). Summing up: The science of reviewing research. Cambridge, MA: Harvard University Press.
Light, R. J., Singer, J. D., & Willett, J. B. (1994). The visual presentation and interpretation of meta-analyses. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 439–453). New York: Sage.
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Little, R. J. A. & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
Manfredo, M. J., Driver, B. L., & Tarrant, M. A. (1996). Measuring leisure motivation: A meta-analysis of the recreation experience preference scales. Journal of Leisure Research, 28(3), 188–213.
Matt, G. E. & Cook, T. D. (1994). Threats to the validity of research syntheses. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 503–520). New York: Sage.
Murtaugh, P. A. (2002). Journal quality, effect size, and publication bias in meta-analysis. Ecology, 84(4), 1162–1166.
Orwin, R. G. (1994). Evaluating coding decisions. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 140–155). New York: Sage.
Overton, R. C. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3, 354–379.
Pearson, K. (1904). Report on certain enteric fever inoculation statistics. The British Medical Journal, 3, 1243–1246.
Pham, B., Platt, R., McAuley, L., Klassen, T. P., & Moher, D. (2001). Is there a best way to detect and minimize publication bias? Evaluation and the Health Professions, 24(2), 109–125.
Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 301–321). New York: Sage.
Robinson, D. H. & Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26, 21–26.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118(2), 183–192.
Rosenthal, R. & DiMatteo, M. R. (2001). Meta-analysis: Recent developments in quantitative methods for literature review. Annual Review of Psychology, 52, 59–82.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research. Cambridge, UK: Cambridge University Press.
Schafer, W. D. (1999). Methods, plainly speaking: An overview of meta-analysis. Measurement and Evaluation in Counseling and Development, 32, 43–61.
Schmidt, F. L. & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529–540.
Schmitt, N. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 114–129.
Shadish, W. R. & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 261–282). New York: Sage.
Shelby, B., Heberlein, T. A., Vaske, J. J., & Alfano, G. (1983). Expectations, preferences, and feeling crowded in recreational activities. Leisure Sciences, 6(1), 1–14.
Shelby, B., Vaske, J. J., & Heberlein, T. A. (1989). Comparative analysis of crowding in multiple locations: Results from fifteen years of research. Leisure Sciences, 11, 269–291.
Shelby, L. B. & Vaske, J. J. (2007). Perceived crowding among hunters and anglers: A meta-analysis. Human Dimensions of Wildlife, 12(4), 241–261.
Snedecor, G. W. (1946). Statistical methods (4th ed.). Ames, IA: Iowa State College Press.

Stock, W. A., Benito, J. G., & Lasa, N. B. (1996). Research synthesis: Coding and conjectures. Evaluation and the Health Professions, 19, 104–117.
Sutton, A. J., Song, F., Gilbody, S. M., & Abrams, K. R. (2000). Modeling publication bias in meta-analysis: A review. Statistical Methods in Medical Research, 9, 421–455.
Tippett, L. H. C. (1931). The methods of statistics. London: Williams & Norgate.
Vaske, J. J. & Donnelly, M. P. (2002). Generalizing the encounter–norm–crowding relationship. Leisure Sciences, 24, 255–269.
Vaske, J. J., Donnelly, M. P., Heberlein, T. A., & Shelby, B. (1982). Differences in reported satisfaction ratings by consumptive and nonconsumptive recreationists. Journal of Leisure Research, 14, 195–206.
Vaske, J. J., Gliner, J. A., & Morgan, G. A. (2002). Communicating judgments about practical significance: Effect size, confidence intervals and odds ratios. Human Dimensions of Wildlife, 7(4), 287–300.
Vaske, J. J., Shelby, L. B., & Manfredo, M. (2006). Bibliometric reflections on the first decade of Human Dimensions of Wildlife. Human Dimensions of Wildlife, 11(2), 79–87.
Yeaton, W. H. & Wortman, P. M. (1993). On the reliability of meta-analytic reviews: The role of intercoder agreement. Evaluation Review, 17(3), 292–309.
