
ARTICLE


Apples and Oranges (and Pears, Oh My!):


The Search for Moderators in Meta-Analysis
JOSE M. CORTINA
George Mason University

The purpose of this article is to review current practices with respect to detection
and estimation of moderators in meta-analysis and to develop recommendations
that are driven by the results of this review and previous research. The first purpose
was accomplished through a review of the meta-analyses published in Journal of
Applied Psychology from 1978 to 1997. Results show, first, that practices with
respect to both the execution of and the reporting of results from searches for moderators are highly variable and, second, that findings relevant for detection of
moderators (e.g., percentage variance attributable to artifacts, SDρ, etc.) are often highly inconsistent with what has been suggested in the past. These patterns
held regardless of time of publication, specificity of the question addressed in the
paper, and content area. Detailed suggestions for modifications of current practices are offered.

Keywords: meta-analysis; moderators; review; second-order meta-analysis

Since its advent, meta-analysis has become the predominant form of literature review
in areas such as psychology, education, and medicine. The detection and estimation of
moderators is central to the interpretation of meta-analytic results in many cases. Moderators provide boundary conditions for the effects that are hypothesized, thus informing researchers of the situations in which the effects in question do and do not hold
(Cortina & Folger, 1998). The identification of such boundary conditions, if they exist,
is critical if meta-analytic results are to be generalized.
Given the importance that our field tends to attach to meta-analytic findings, it is essential that we have agreed-upon mechanisms for dealing with moderators
in meta-analytic studies. Nevertheless, a variety of authors have questioned the appropriateness of many of the methods that are typically used to identify moderators (e.g.,
James, Demaree, & Mulaik, 1986; James, Demaree, Mulaik, & Ladd, 1992; Sackett,
Harris, & Orr, 1987). The purposes of this article are to review current practices with respect to detection of and testing for moderators in meta-analysis and to develop recommendations that are driven by the results of this review and previous research.

Author's Note: Thanks to Adam Winsler and Kim Eby for their helpful comments. Correspondence regarding this article should be sent to Jose M. Cortina, Department of Psychology, George Mason University, Fairfax, VA 22030.

Organizational Research Methods, Vol. 6 No. 4, October 2003 415-439
DOI: 10.1177/1094428103257358
© 2003 Sage Publications
The article is organized as follows: First, a brief description of previous research on
the identification and estimation of moderators in meta-analysis is offered. Second, a
review of meta-analyses published in the Journal of Applied Psychology is used to
address the first purpose mentioned above. Third, the results of this review are evaluated in light of recommendations made in the past. Finally, a set of modified recommendations is offered as per the second stated purpose of the article.

A Brief History of Moderator Detection and Estimation


Let us begin by considering the alternatives for detecting and estimating moderators. The term detection is used here to mean the recognition, often post hoc, that moderators of a given relationship appear to exist through consideration of variability in
observed correlations, sample size, and other artifacts. A variety of detection options
has been offered. Among them are the 75% rule (moderators are present if less than
75% of observed variance is attributable to artifacts), the Callender and Osburn (1981)
bootstrap significance test for residual variability, the Hunter and Schmidt (1990) chi-square approximation (Q), the examination of lower credibility interval values, the
comparison of effect size variability between and within categories (Schmidt, Hunter,
& Caplan, 1981), the practical consideration of residual variability, and Marascuilo's
(1971) U.
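
To make the most commonly used of these criteria concrete, here is a minimal sketch of the 75% rule in Python. The correlations and sample sizes are hypothetical, and only sampling error is treated as an artifact; real applications would fold in additional artifact variance.

```python
# Minimal sketch of the 75% rule (hypothetical data; sampling error only).
import numpy as np

r = np.array([0.12, 0.25, 0.31, 0.18, 0.40])  # observed study correlations
n = np.array([80, 120, 65, 200, 95])          # study sample sizes

r_bar = np.sum(n * r) / np.sum(n)                    # N-weighted mean r
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)   # observed variance of r
var_e = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)       # expected sampling-error variance
pct = 100 * var_e / var_obs

print(f"% variance attributable to sampling error: {pct:.1f}")
# Moderators are inferred if less than 75% of observed variance is artifactual.
print("moderators suspected" if pct < 75 else "no moderators inferred")
```
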
Various studies have evaluated one or more of these alternatives (e.g., Aguinis &
Whitehead, 1997; Huffcutt & Arthur, 1995; Hunter & Schmidt, 1994; James et al.,
1986; Johnson, Mullen, & Salas, 1995; Kemery, Mossholder, & Roth, 1987; Osburn,
Callender, Greener, & Ashworth, 1983; Raju, Pappas, & Williams, 1989; Sackett et al.,
1987; Spector & Levine, 1987; Switzer, Paese, & Drasgow, 1992; Whitener, 1990).
Although the specific set of alternatives varies across these studies, as a whole they
suggest the following.
First, the 75% rule is the most powerful of the commonly used moderator detection
techniques, although its power is not high if the difference in population correlations is
less than .2, or if k and mean N are small (e.g., Osburn et al., 1983; Sackett et al., 1986;
Spector & Levine, 1987). Second, power for other procedures is often very low, especially for the 90% credibility value (CV) test. Third, the 90% CV test addresses a different question from other procedures insofar as its primary focus is on whether a credibility interval includes zero (Kemery et al., 1987). Fourth, Type I error is inflated in
the 75% rule and in some other procedures but controlled for in the Callender and
Osburn (1981), chi-square, and U procedures (Sackett et al., 1986; Spector & Levine,
1987). Fifth, estimation of the residual standard deviation and the standard deviation of ρ depends on the artifacts corrected for, the appropriateness/accuracy of those corrections, the estimation procedure (e.g., Hunter & Schmidt, 1994; Raju et al., 1989; Switzer et al.,
1992), and consideration of outliers (e.g., Beal, Corey, & Dunlap, 2002; Huffcutt &
Arthur, 1995; James et al., 1992; Raju et al., 1989).
In addition to these moderator detection techniques, there exist various alternatives
for estimating specific moderator effects that have been coded for in the meta-analytic
data set (see Steel & Kammeyer-Mueller, 2002, for a recent review). The most common method in the organizational sciences is the subgroup meta-analysis in which
studies are categorized according to some substantive or methodological attribute. Meta-analyses are then conducted within categories. These subgroup meta-analyses can be followed up by t tests, informal consideration of percentage variance attributable to artifacts (e.g., Huffcutt, Roth, & McDaniel, 1996), Hedges and Olkin's (1985) Q tests (e.g., Gerstner & Day, 1997), or more sophisticated techniques such as hierarchical linear modeling (van Eerde & Thierry, 1996). The primary limitation of subgroup analysis is the need to categorize continuous moderators. Steel and Kammeyer-Mueller (2002) reviewed techniques used to estimate moderators, whether continuous or categorical, in meta-analysis and found that weighted least squares (WLS) regression is the most accurate estimator of moderator effects under a variety of conditions. Unfortunately, this technique is seldom used in the organizational sciences.
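
A rough illustration of the WLS approach may help; the sketch below regresses hypothetical effect sizes on a dummy-coded moderator, weighting each study by its sample size. The data and the use of N as the weight are assumptions for demonstration, not a reproduction of Steel and Kammeyer-Mueller's (2002) procedure.

```python
# Weighted least squares (WLS) moderator estimation: a sketch with
# hypothetical data; weights are study sample sizes.
import numpy as np

r = np.array([0.10, 0.15, 0.32, 0.38, 0.22, 0.41])      # study effect sizes
n = np.array([100.0, 150.0, 90.0, 60.0, 200.0, 120.0])  # sample sizes
mod = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])          # dummy-coded moderator

X = np.column_stack([np.ones_like(r), mod])       # intercept + moderator column
W = np.diag(n)                                    # diagonal weight matrix
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ r)  # (X'WX)^-1 X'Wr
print(f"intercept = {beta[0]:.3f}, moderator effect = {beta[1]:.3f}")
```

A continuous moderator (e.g., mean sample age) would enter the regression the same way, which is the chief advantage of this approach over subgroup analysis.
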
Clearly, considerable guidance has been offered as to how to identify the presence
and nature of moderators in meta-analyses. The next section of this article describes a
review of meta-analyses published in the Journal of Applied Psychology (JAP). The
purpose of this review is to address three critical questions. First, how have authors
anticipated and planned for detection of moderators in their meta-analyses? Second,
how have authors presented information relevant for the detection of moderators in
their meta-analyses? Third, how has this information been interpreted?

What Has Been Done? A Second-Order Meta-Analysis


To address the three issues raised above, meta-analyses published in JAP were
coded for 18 variables. The variables coded for are listed in Table 1. Before moving to
the description of the coding procedure, two points should be made. First, it is in no
way my intention to cast work published in JAP in an unfavorable light. Much of the
most important work on meta-analysis has been published there (e.g., Callender &
Osburn, 1981; James et al., 1986; Johnson et al., 1995; Law, Schmidt, & Hunter,
1994a, 1994b; Raju & Burke, 1983; Sackett et al., 1987; Schmidt & Hunter, 1977;
Spector & Levine, 1987; Switzer et al., 1992; Wanous, Sullivan, & Malinak, 1989;
Whitener, 1990). Indeed, JAP is the ideal choice of target if for no other reason than
because those who publish there are at least as likely to be well informed with respect
to appropriate meta-analytic procedures as are those who publish in any other journal.
In other words, if JAP authors have difficulty in choosing analysis/reporting strategies,
it is likely that others will have similar difficulties. I revisit this issue in the Discussion
section.
Second, it is recognized that not all relationships are moderated to a substantial
degree by other variables. However, it is also the case that moderators unsought are
likely to be moderators undetected. As was mentioned above, existing techniques for
detecting the presence of moderators using variability of effect size values have considerable limitations. If the importance and generality that we typically attribute to
meta-analytic results is to be justified, efforts to consider at least the possibility of
moderators should be made. It is for this reason that both consideration of potential
moderators a priori and attention to information suggesting the presence of moderators
post hoc are important.
Coding. A total of 59 quantitative reviews (see the appendix) were included in the
present analysis. These 59 reviews contained a total of 1,647 meta-analyses, some of
which were subgroup meta-analyses (M = 27.45, range = 1 to 435). If the review containing what was by far the largest number of meta-analyses (Podsakoff, MacKenzie, & Bommer, 1996) were excluded, the mean number of meta-analyses per study drops to 20.54 and the range is 1 to 120. Analyses with and without the Podsakoff et al. (1996) values were very similar. Therefore, the analyses reported here include values from the Podsakoff et al. (1996) study. Although most of the analyses were conducted at the individual meta-analysis level (n = 1,647), some questions were also addressed at the paper level (n = 59).

Table 1
Variables Coded for in Second-Order Meta-Analysis

1. Was the percentage variance attributable to artifacts (or the information necessary to compute it) presented? It should be mentioned that many studies failed to present this value but did present observed variance, mean r, k, N, and artifact information that allowed computation of the percentage values. These were also coded yes.
2. If so, what was the percentage value?
3. Was the residual standard deviation [i.e., sqrt(s²r − s²artifacts)] or the information necessary to compute it presented?
4. If so, what was the residual standard deviation? (This variable and the next were coded from meta-analyses based on correlations only.)
5. What was the standard deviation of ρ (i.e., the residual standard deviation divided by the compound attenuation factor)?
6. What artifacts were corrected for?
7. Was the analysis a zero-order, first-order, second-order, and so forth? For example, McDaniel, Whetzel, Schmidt, and Maurer (1994) examined the relationship between interviews and job performance, training performance, and tenure. These overall analyses were coded as zero-order analyses. Interviews were then broken down by amount of structure, and the analyses that resulted were coded first-order analyses. The structured interviews were broken down further by whether or not ability test information was available to the interviewer, and the analyses that resulted were coded second-order analyses.
8. Was the analysis the lowest order analysis included in the study, the second lowest, the third lowest, and so forth?
9. Was a criterion for concluding that a moderator exists specified?
10. If so, what was the criterion?
11. Were the data relevant for detection of moderators discussed?
12. Was a watered-down criterion used? That is, was a less stringent version of an existing procedure for detecting moderators used (e.g., a 60% rule)?
13. Were a priori moderators specified?
14. How many substantive moderators were specified?
15. How many methodological moderators were specified? Substantive moderators were psychological constructs of the sort that might be seen in a causal model. Methodological moderators were things like civilian versus military samples or administrative versus research purpose.
16. What procedures were used to test these moderators?
17. Were outliers removed?
18. What was the mean compound attenuation factor, computed either directly as the product of the individual attenuation factors or indirectly as the ratio of uncorrected to corrected r?
Each of the 1,647 meta-analyses was coded for each of the 18 variables. My goal
was to code for any attributes that reflect the choices made by the author with regard to
moderators (e.g., were moderators specified a priori?), that influence the way that the
reader might make sense of the results (e.g., were moderator-relevant values discussed?), or that speak to widely held assumptions about meta-analysis (e.g., how
much variance was explained by artifacts?).
Although all of the items coded for were defined in an unambiguous manner, Items
7, 11, 12, 14, and 15 required some judgment. Thus, a second rater with experience in meta-analysis independently coded these items for 5 of the studies (188 meta-analyses) reviewed. Of the 940 judgments, there were 10 disagreements, all of which were resolved through additional discussion of the definition of Item 7.
It should be noted that even Items 7, 11, 12, 14, and 15 required relatively little judgment because of the way that they were defined. For example, Item 11 asks, "Were the data relevant for detection of moderators discussed?" This may seem like it requires considerable judgment, but because a study was coded as having "discussed" these data if any interpretive text at all was offered, there was little basis for disagreement. The same can be said of the other items: Because the definitions were so specific, there was little room for disagreement.

Table 2
How Were Moderators Anticipated and Planned For?

Specified Criterion: 1,054/1,647 (64)

Of Those That Specified a Criterion
  75% Rule:                  598/1,054 (57)
  χ²:                        201/1,054 (19)
  90% Credibility Value:      18/1,054 (2)
  Conjunctive Combination:   105/1,054 (10)
  Disjunctive Combination:    70/1,054 (7)

Of Those That Tested Moderators
  A Priori Moderators Overall:                        314/1,647 (19)
  A Priori Moderators for Zero-Order Relationships:   136/675 (20)
  Number of Substantive Moderators Tested:            443/1,072 (41)
  Number of Methodological Moderators Tested:         629/1,072 (59)

Note: Percentages in parentheses.

Results
Results are presented in three sections: Anticipation of and Planning for Moderators in Meta-Analyses, Presentation of Information Relevant for Moderator Detection,
and Estimation and Interpretation of Moderator Information. Because of the amount of information contained in each section, each begins with a summary of its results, followed by the specifics of the relevant analyses.
Anticipation of and planning for moderators in meta-analyses. Results of these
analyses are presented in Table 2. Although the details of these results are described
below, they can be summarized as follows. Approximately two thirds of meta-analyses
included specific criteria for detection of moderators. Of these, the vast majority used
either the percentage of variance attributable to artifacts or a chi-squared test of the null
hypothesis ( 2res = 0). A priori moderators were infrequent and were slightly more
likely to be methodological. Post hoc moderators were almost always methodological.
With regard to specifics, the first variable relevant for the issue of anticipation and
planning had to do with whether a particular criterion was specified for determining
existence of moderators. Of the 1,647 meta-analyses, 1,054 (64%) specified such a criterion. Although the percentage was higher for analyses of zero-order relationships
(88%), it dropped off sharply for more specific relationships (43%, 57%, and 9% for
first-order, second-order, and third-order relationships, respectively). Of the 1,054 analyses in which a criterion was specified, 57% specified the 75% rule, 19% specified a χ² of some kind (e.g., Q, U), 2% relied on a "credibility interval does not contain zero" rule, and 6% specified some other rule (e.g., Salgado, 1997: sampling error accounts for more than half of the observed variance). In addition, some of the studies that specified a criterion used two or more of the above in combination: 10% used what might be referred to as a conjunctive combination such that the existence of moderators was ruled out only if multiple criteria were met (e.g., Koslowsky, Sagie, Krausz, & Singer, 1997: 75% rule, nonsignificant Q, 90%CV > 0). The final 7% used what might be referred to as a disjunctive combination such that the existence of moderators was ruled out if any of multiple criteria were met (e.g., Schmidt, Hunter, & Caplan, 1981: 75% rule or 90%CV > 0).
Given the research reviewed previously, we can conclude that the 57% that used the
75% rule, along with those that used a conjunctive combination, would have the most
power to detect moderators but would also have the highest Type I error rates by a considerable margin. Those studies using the "credibility interval does not contain zero"
rule or a disjunctive combination would have the least power and the lowest Type I
error rates, and those studies using one of the χ² tests would be somewhere in between
on both counts.
Studies were also coded as to whether they made reference to a watered-down
version of a conventional criterion. Of those that reported a criterion, 7.4% (78 meta-analyses from five studies) made use of a scaled-down version of a conventional criterion. Watered-down criteria included a 60% rule, a 50% rule, and an informal comparison of within-group and between-group variance.
Also relevant for the issue of moderator anticipation is the extent to which a priori
moderators were proposed. Certainly there is no rule stating that moderators must be
present in a meta-analysis, but given the importance that is often attached to meta-analytic results and the lack of variance that is typically attributable to artifacts (see
below), it is critical that opportunities to consider potential moderators not be missed.
As can be seen at the bottom of Table 2, of the 1,647 meta-analyses coded, only 314
(19%) offered a priori moderator variables to be considered. Moreover, of the 675
zero-order analyses, only 136 (20%) offered a priori moderators. Thus, it would
appear that the possibility of moderators is considered prior to examination of the data
only infrequently, and such consideration is no more likely in meta-analyses of general
topics than in meta-analyses of specific topics.
Finally, the nature of moderators that were tested was examined. Both a priori and
post hoc moderators were included in this analysis; 1,072 moderators were examined
in 419 meta-analyses. Of the 1,072 moderators, 443 were coded substantive and 629
were coded methodological. The ratio of substantive to methodological moderators
differed depending on whether moderators were suggested a priori. When moderators
were suggested a priori, 45% were substantive in nature, whereas the other 55% were
methodological. When moderators were post hoc, only 15% were substantive. This
finding highlights the importance of a priori moderator consideration. It may be that
without such consideration, substantive moderators are likely to be missed.
Presentation of information relevant for moderator detection. One of the keys to
understanding how authors go about investigating moderators is the presentation of
moderator-relevant information. The focus in this article is on the percentage of variance attributable to artifacts, the residual standard deviation, and the discussion of values relevant for moderator detection. Once again, many studies did not present the percentage of variance attributable to artifacts or the residual standard deviation but did provide mean correlations, observed variance in correlations, k, mean sample size, and relevant artifact information. This information could then be used to compute artifact variance, residual standard deviation, and the standard deviation of ρ (SDρ) using standard formulas presented in Hunter and Schmidt (1990). The residual standard deviation value was the square root of the difference between variance in observed values and variance expected from variability in artifacts. Because SDρ is estimated by the ratio of residual standard deviation to the mean compound attenuation factor and because the mean compound attenuation factor was almost always available from the ratio of uncorrected to corrected mean correlations if nowhere else, SDρ could almost
always be computed when the residual standard deviation was available.
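
The computation just described can be sketched as follows; the input values are hypothetical stand-ins for what a typical study in the review reported, and sampling error is the only artifact handled explicitly.

```python
# Recovering the residual SD and SDrho from commonly reported values
# (hypothetical inputs; sampling error only).
import math

r_bar, rho_bar = 0.20, 0.28   # mean uncorrected and corrected correlations
var_obs, mean_n = 0.025, 130  # observed variance of r; mean sample size

var_e = (1 - r_bar ** 2) ** 2 / (mean_n - 1)   # artifact (sampling-error) variance
sd_res = math.sqrt(max(var_obs - var_e, 0.0))  # residual standard deviation
a = r_bar / rho_bar                            # compound attenuation factor
sd_rho = sd_res / a                            # standard deviation of rho
print(f"% artifact variance = {100 * var_e / var_obs:.1f}, "
      f"SDres = {sd_res:.3f}, SDrho = {sd_rho:.3f}")
```
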
Table 3
How Have We Presented Information?

Was percentage variance information presented?                            1,294/1,647 (78)
Was SDres information presented?                                          1,288/1,647 (78)
Were relevant values discussed?                                             604/1,647 (37)

Information presentation by study
1. Did not present percentage of variance attributable to sampling error       16/59 (27)
2. Did not present residual SD                                                 15/59 (25)
3. Did not present specific moderator criterion                                34/59 (58)
4. Did not present a priori moderators for zero-order analyses                 17/49 (35)
5. Did not discuss relevant information                                        38/59 (64)
   Did not discuss relevant information in zero-order analyses                 15/49 (31)
   Failed on one or more of 1, 2, 3, or 5 above                                48/59 (81)

Note: Percentages in parentheses.

Percentage information is presented in Table 3. This information can be summarized as follows: Percentage variance values and residual standard deviation values
were omitted in approximately 22% of the 1,647 meta-analyses. Discussion of values
relevant for the detection of moderators was presented in 37% of the 1,647 meta-analyses. This percentage is higher for the studies that specified a criterion for moderator
detection but lower for analyses involving zero-order relationships. Finally, the vast
majority of the 59 studies reviewed here omitted one of these pieces of information
from one or more of the meta-analyses on which they report, and these omissions were
not driven by content area.
With regard to specifics, the percentage variance attributable to artifacts and the residual standard deviation (or the information necessary to compute them) were presented
in 1,294 and 1,288 of the 1,647 meta-analyses, respectively (78.6% and 78.2%). Thus,
although most studies reported this information, approximately 22% did not.
Studies were also coded on whether values relevant for detection of moderation
were discussed, where "discuss" simply means something more than a mere mention of the values. Of the 1,647 meta-analyses that were coded, only 604 (37%) provided discussion of values relevant for the detection of moderators. This percentage is
higher for the studies that specified a criterion for moderator detection (526 out of
1,054 or 50%). Oddly, this percentage is lower for analyses involving zero-order relationships (187 out of 675, or 28%).


Finally, it is possible that many of the between-analysis differences that have been
identified so far may be better described as between-study differences. Each of the 59
meta-analysis studies reported on here was examined for each of five anticipation and
reporting problems identified thus far: failure to report percentage variance attributable to artifacts, failure to report residual standard deviation, failure to specify a moderator criterion, failure to present a priori moderators for zero-order analyses, and failure to discuss relevant moderator information. Results of these analyses are presented
at the bottom of Table 3.
As can be seen, about one quarter of the 59 studies failed to provide information on
percentage variance attributable to artifacts in one or more of the meta-analyses that
they contained. The same can be said for residual standard deviation. In addition, more
than half of the 59 studies failed to specify a criterion for deciding whether moderators
were present in one or more of the meta-analyses that they contained. Table 3 also
shows that about one third of the 49 studies that contained zero-order meta-analyses
failed to generate a priori moderators for one or more of those zero-order analyses.
Because the zero-order relationships are the most general, they are the most likely to
contain moderators. Thus, the fact that so many studies containing such relationships
failed to generate moderators is disconcerting.
Almost two thirds of the 59 studies failed to discuss the values relevant for moderator detection. The situation is better for zero-order analyses, in which 31% of the studies that contained zero-order analyses failed to discuss relevant values for one or more
of them.
The final value contained in Table 3 provides a more global picture of information
presentation in the 59 studies; 81% of the 59 studies failed to do one or more of the following: present percentage of variance attributable to artifacts, present residual standard deviation, present specific moderator criterion, or discuss relevant values. Thus,
only rarely did a meta-analytic study present and discuss all of these sources of information critical to the detection of moderators.
Before concluding this section, it should be mentioned that the 59 studies were also
coded for topic using the coding scheme adopted for the 16th Annual Conference of
the Society for Industrial and Organizational Psychology (Society for Industrial and
Organizational Psychology, 2000). Examination of the studies by topic suggested that presentation omissions were not tied to particular content areas.
Estimation and interpretation of moderator information. Results with regard to
estimation and interpretation, contained in Tables 4 and 5, can be summarized as follows. Relatively little variance is typically attributable to artifacts, leaving a sizable
residual standard deviation and SDρ. When moderators are tested for, they are usually
tested through subgroup analysis. Outlier removal is rare but simplifies the ruling out
of moderators. Finally, examination of changes over time suggests that the questions
addressed with meta-analysis have become more general with time but that the number
of effect size values included has become smaller.
With regard to specifics, we must first know something about the information being tested and interpreted. Thus, let us begin by examining levels of percentage variance attributable to artifacts, levels of residual standard deviation, and levels of SDρ. This is followed by examinations of the methods used to test for moderators, the treatment of outliers, and changes over time.


Table 4
Amount of Variance Attributable to Artifacts, Residual Standard Deviation Values, and SDρ Values

                                 Overall      Zero-Order   First-Order  Second-Order  Higher Order
                                 Mean(a)      Mean         Mean         Mean          Mean
Percentage variance
  attributed to artifacts        21.7 (544)   19.27 (171)  24.80 (215)  21.0 (154)    25.51 (4)
Residual SD                      .122 (512)   .116 (149)   .126 (195)   .123 (164)    .078 (4)
SDρ                              .160 (502)   .186 (143)   .137 (191)   .179 (164)    .078 (4)

Artifacts Corrected for in the Meta-Analysis(b)

                                 SE Only      SE + ryy     SE + rxx ryy  SE + ryy + RR  SE + rxx ryy + RR  SE + rxx ryy + RR + z(c)  SE + Other Combination
Percentage variance
  attributed to artifacts        14.9 (197)   18.80 (21)   24.21 (49)    38.17 (38)     63.57 (146)        Insufficient data         27.4 (42)
Residual SD                      .128 (197)   .134 (21)    .102 (44)     .110 (38)      .159 (146)         Insufficient data         .093 (59)
SDρ                              .128 (197)   .175 (21)    .127 (44)     .182 (38)      .308 (143)         Insufficient data         .130 (59)

Note: k values are in parentheses; SE = sampling error.
a. As suggested by a reviewer, these are harmonic means as opposed to arithmetic means. Also, because some studies rounded percentage values to 100% and residual SD values to 0, precise values could not be computed for these studies. Thus, data from these studies were omitted from the harmonic mean computations.
b. Values in parentheses are the number of meta-analyses included in the analysis.
c. The additional artifact represented by z was typically correction for artificial dichotomization or unequal category sample sizes.

Table 5
Trends in Reporting and Interpretation Practices

Correlations(a) With Study Number Variable

                 Percentage Variance
                 Attributable to                           Analysis    Criterion        Discussion of
                 Sampling Error       SDρ      k           Order(b)    Specification    Relevant Values
Study number     −.184                .049     −.236       −.289       .126             −.355

Frequency by Year Block (in percentages)

Year Block   Variance        Residual SD     What Artifacts               What Criterion
             Information     Information     Were Corrected For           Was Used
             Presented       Presented
1978-1984    77.5            75.0            SE only: 25.4                75% rule: 45.5
                                             ryy: 6.1                     χ²: 33.8
                                             rxx ryy: 1.4                 90%CV: 0
                                             ryy RR: 22.1                 Conjunctive: 0
                                             rxx ryy RR: 37.9             Disjunctive: 13.6
                                             rxx ryy RR + z: 0
1985-1989    82.8            84.2            SE only: 14.9                75% rule: 22.0
                                             ryy: 29.0                    χ²: 30.8
                                             rxx ryy: 31.7                90%CV: 0
                                             ryy RR: 18.6                 Conjunctive: 2.2
                                             rxx ryy RR: 0                Disjunctive: 45.1
                                             rxx ryy RR + z: 0
1990-1994    54.0            50.1            SE only: 20.7                75% rule: 26.1
                                             ryy: 2.6                     χ²: 42.0
                                             rxx ryy: 41.5                90%CV: 0
                                             ryy RR: 1.8                  Conjunctive: 21.7
                                             rxx ryy RR: 13.8             Disjunctive: 0
                                             rxx ryy RR + z: 13.3
1995-1997    89.5            92.8            SE only: 9.6                 75% rule: 68.0
                                             ryy: 4.2                     χ²: 10.6
                                             rxx ryy: 76.4                90%CV: 2.6
                                             ryy RR: 0                    Conjunctive: 12.9
                                             rxx ryy RR: 6.1              Disjunctive: 0
                                             rxx ryy RR + z: 0

Note: SE = sampling error; CV = credibility value.
a. Correlations involving percentage of variance and residual standard deviations are equal to −1 times the correlation between the study number variable and the reciprocal of the variable in question.
b. Analysis order refers to the specificity of the analysis. A lower order analysis is more general than a higher order analysis.

As is shown in Table 4, for those studies from which this information could be
taken, the mean percentage variance value was 21.7.1 The mean residual standard deviation value was .122 and the mean SDρ value was .160. Thus, contrary to claims by
Hunter and Schmidt (1990) and others, a relatively small percentage variance is typically attributable to artifacts, and considerable variability remains after correction of
variance for artifacts.
To examine this issue further, the 1,647 meta-analyses were broken down according to whether they were zero order, first order, second order, or higher order as defined
in Table 1, with higher order analyses addressing more specific questions than lower
order analyses. Mean percentage variance, residual standard deviation values, and SDρ values were then computed for each category. This information is also presented in Table 4. Although there is a slight tendency for the percentage variance attributable to artifacts to increase as the studies get more specific, there is no trend with respect to the standard deviations. Thus, it would appear that breaking studies down by potential moderators does little in the way of increasing our confidence in the notion that meta-analyses typically isolate those studies producing values sampled from a single population.
Studies were also broken down according to the artifacts for which they corrected.
This information is also presented in Table 4. As can be seen, there was a general tendency for the percentage variance attributable to artifacts to increase as a function of
the number of artifacts included. However, no combination of artifacts resulted in an
average percentage higher than 64%. No clear pattern emerged for the standard deviations.
We can now ask about the tests that were actually conducted. Of the 419 meta-analyses in which moderators were suggested (either a priori or post hoc), 399 actually
tested for them. In the other 20 meta-analyses, moderators were not tested because of a
lack of relevant information. Of the 399 meta-analyses that tested for moderators, the
vast majority (85%) used subgroup meta-analysis.
An additional issue relevant for interpretation and estimation is the treatment of
outliers. Various authors have suggested that outliers be routinely removed from data
sets so that the distributions conform better to expectations (e.g., Huber, 1981; Wilcox,
1997). One might feel, however, that this practice is suspect when it is known beforehand to increase the chances of finding results that correspond with hypotheses
(Cortina & Gully, 1999). Nevertheless, some authors have suggested that this practice
be extended to meta-analysis. To this end, Huffcutt and Arthur (1995) developed a procedure for determining the number of effect sizes in a meta-analysis that might be
labeled "outliers." If these outliers are removed, then the observed variance of effect
sizes, and therefore the residual variance and the variance of ρ, are certain to decrease.
This in turn makes conclusions of cross-situational consistency easier to reach. Of
course, another way of phrasing this is that the test for existence of moderators, which
has been shown to be quite low in power for many situations, will be even lower if outliers are routinely removed.
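
The mechanism is easy to demonstrate. The sketch below uses plain z-score trimming as a simplified stand-in for Huffcutt and Arthur's (1995) procedure (it is not their SAMD statistic), with hypothetical data: dropping one extreme value shrinks the observed variance and sharply inflates the percentage attributable to artifacts.

```python
# Effect of outlier removal on the 75% rule (hypothetical data; simple
# z-score trimming as a stand-in for the SAMD procedure).
import numpy as np

r = np.array([0.05, 0.18, 0.21, 0.24, 0.26, 0.29, 0.62])  # one extreme value
n = np.full(r.shape, 100.0)

def pct_artifact(r, n):
    r_bar = np.average(r, weights=n)
    var_obs = np.average((r - r_bar) ** 2, weights=n)
    var_e = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)
    return 100 * var_e / var_obs

keep = np.abs((r - r.mean()) / r.std()) < 2  # trim effect sizes with |z| >= 2
print(f"before trimming: {pct_artifact(r, n):.0f}% artifactual")
print(f"after trimming:  {pct_artifact(r[keep], n[keep]):.0f}% artifactual")
```
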
The prevalence of outlier identification and removal in the 1,647 meta-analyses
under consideration was examined. Outliers were removed from 91 of the 1,647 meta-analyses (5.5%). Although this is not a large percentage, the percentage variance attributable to artifacts and the residual standard deviation associated with the meta-analyses in which these values were reported and from which outliers had been removed were also examined. In the 44 relevant meta-analyses that reported information on variance attributable to artifacts and removed outliers, the mean percentage
variance attributable to artifacts was 30.03%. In the 33 relevant studies that reported
residual standard deviation, the mean residual standard deviation value was .087, and
the mean SDρ was .095. Thus, the mean percentage variance attributable to artifacts
value is larger than the value for studies in general (21.7%), and the residual standard
deviation and SDρ values are considerably smaller than their corresponding overall
values (.122 and .160, respectively). Clearly, removal of outliers makes conclusions of
cross-situational consistency easier to draw. Whether such ease is warranted (or
desired) is another matter.
Finally, we might consider trends in estimation and interpretation over the 20-year
period covered by this review. Trends were examined in two related ways. First, certain
study characteristics were correlated with the chronological order of the study. Second, chronology was broken down into four groups: 1978-1984, 1985-1989, 1990-1994, and 1995-1997. These groups are of different sizes so that they might include
similar numbers of meta-analyses. These results are presented in Table 5.
The percentage variance attributable to artifacts has actually decreased with time.
This might be surprising given the expectation that meta-analyses would have gotten
more specific over time. However, there also exist negative correlations between study
number and both k and analysis order. This means that higher order analyses were
more common in the past than they are now and that authors of more recent meta-analytic studies include fewer studies in their meta-analyses than did authors of less recent
meta-analytic studies. It appears, therefore, that more recent authors are asking more
general questions with meta-analysis and are answering them with fewer effect size
values. Finally, later studies are somewhat more likely to specify a criterion for detecting moderators. However, later studies are also less likely to discuss information relevant for that detection.
Table 5 also contains descriptive data for the four time blocks mentioned earlier.
The only trend that seems to emerge is that correction for range restriction is less common than it used to be. This may be due in part to the fact that whereas early meta-analyses often dealt with personnel selection situations in which range restriction was a critical factor, later meta-analyses focused on a variety of areas, some of which had little concern for range restriction. As for the Criterion column, the only trend appears to be that the low-power disjunctive combinations of criteria for detecting moderators have fallen out of favor.

Summary
The preceding section of the article included an examination of the ways that
researchers have dealt with moderators in their meta-analyses. The following conclusions can be drawn. First, authors usually fail to provide all information relevant for the
detection of moderators. Second, a smaller percentage variance in effect size values is
attributable to artifacts than has been suggested in the past. Third, residual variability
and variability in is often considerable. Fourth, the 75% rule is still the most commonly used criterion for detecting moderators. Fifth, although post hoc moderators are
usually methodological, about half of a priori moderators are substantive in nature.
Sixth, subgroup meta-analysis is still the most common vehicle for estimating specific
moderators. Finally, authors have come, over time, to address more general questions
with fewer effect sizes, and authors have grown less inclined to discuss the values relevant for detection of moderators.

Discussion
The purpose of the review described above was to examine how authors have gone
about conducting meta-analyses. Although many of the practices uncovered by the
review appear sound, others are misguided. Before moving on to recommendations, I
should recognize that this article might be criticized for reviewing only studies published in JAP. In defense of this choice, I would point out that JAP is considered to be
the flagship journal in industrial and organizational psychology. In addition, much of
the meta-analysis wisdom in the organizational sciences has come from articles published there. In an attempt to further support the generalizability of these results, I
examined the list of first authors of the 59 meta-analyses on which I report. Of the 48
first authors (some had multiple meta-analytic studies published in JAP), the overwhelming majority has published in other journals in the field (e.g., Personnel Psychology, Academy of Management Journal, Academy of Management Review, and
Organizational Research Methods), and 17 have served as editorial board members
and/or editors for journals such as these. Several of the authors of the meta-analyses
included in the present review have published meta-analyses in these other journals,
and there is no reason why the standards of these authors would be watered down for
their JAP submissions. Thus, problems associated with meta-analyses published in
JAP are unlikely to be unique to JAP.
In the remainder of this article, extensive recommendations with respect to the
search for moderators in meta-analysis are offered. Of course, it is neither possible nor
advisable to suggest a single method for dealing with moderators in meta-analysis.
The purposes here are to examine the options, highlight the criteria by which the
options might be judged, and consider the options in light of those criteria.
Recommendations. Although previous authors have made recommendations
regarding the conduct of meta-analyses (e.g., Hedges & Olkin, 1985; Hunter &
Schmidt, 1990; Orwin & Cordray, 1985; Slavin, 1984; Wanous et al., 1989), few have
been specific to moderator identification and estimation. Certainly, there exists no single compendium of best practices for the treatment of moderators in meta-analysis.
The results of the present study suggest that such recommendations are needed. The
remainder of the article is devoted to such recommendations.
The recommendations that might be offered depend on the purpose of the meta-analysis. Is one's goal parameter estimation or hypothesis testing? In meta-analytic moderator terms, is one's goal cross-situational consistency or transportability? Kemery et al. (1987) made the distinction between validity generalization (also known as transportability) and cross-situational consistency. Cross-situational consistency is said to exist if true validity variance equals 0. Even if cross-situational consistency is not supported, however, transportability may still be possible. It may be the case that a group of sample correlations come from one of two populations, the first with ρ = .3 and the second with ρ = .5. True validity variance is clearly greater than zero, but validity generalization is still possible as long as validity is defined in general terms.
If one's only interest lies in transportability, then a credibility interval that does not
contain zero provides reasonable evidence. However, it is important to impose two
additional conditions. First, the meta-analysis must contain sufficient between-study
variability on relevant variables to warrant conclusions of transportability. Second, the
95% credibility interval should be used. There is nothing sacred about 95%, but the
same can be said for 80%, which is the interval referenced by the 90% credibility value
often reported in meta-analyses. Given that 95% is traditional and that both are arbitrary, the only reason to make the low-power test for moderators less powerful by scaling it back to 80% is to allow us to generate values that we would prefer to see. This
hardly seems a justification with scientific merit.
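
A small worked example of the transportability logic, using the two-population case described above (an even mix of ρ = .3 and ρ = .5, so the corrected mean is .40 and SDρ is .10; all numbers are illustrative):

```python
# Credibility interval check for transportability (illustrative values).
rho_bar, sd_rho = 0.40, 0.10  # even mix of rho = .3 and rho = .5

lo95 = rho_bar - 1.96 * sd_rho  # lower bound of the 95% credibility interval
lo80 = rho_bar - 1.28 * sd_rho  # the 80% interval behind the "90% CV"
print(f"95% lower credibility value: {lo95:.2f}")  # 0.20 > 0
print(f"80% lower credibility value: {lo80:.2f}")  # 0.27 > 0
# Both exclude zero: validity "transports" even though SDrho > 0
# rules out cross-situational consistency.
```
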
If, instead, our goal is parameter estimation, then the search for moderators must be
organized around the notion of cross-situational consistency. Cross-situational consistency suggests a definition of moderation that is more consistent with traditional definitions (e.g., a relationship between two variables that varies with a third variable as
opposed to a relationship equaling zero in certain cases). As such, it is best approached
through the development of a priori moderator hypotheses. These hypotheses should
be tested through whichever means is most appropriate given the nature of the moderator and error constraints regardless of the percentage of variance attributable to artifacts, the standard deviation of , and so forth. There are two reasons for this suggestion. First, conclusions about moderators, as with all scientific conclusions, should be
based on theory as well as data. Second, it is possible for distributions of observed
effect size values to both mimic chance distributions and yield sizable subgroup differences. This issue is revisited at the end of the article.
Even if a priori moderator hypotheses are not available, however, the importance
that is typically attached to meta-analytic findings makes the identification of moderators of paramount importance. As was mentioned in the introductory paragraphs, moderators unsought are likely to be moderators undetected. The proper beginning for
such an examination is the specification of criteria for concluding that moderators are
present. All available criteria have advantages and disadvantages. Consider first the
most commonly used of these, the 75% rule. The procedure based on this ratio has
been shown to have higher power values than do alternative procedures along with a
corresponding Type I error rate tradeoff (Sackett et al., 1986). However, as L. James
and his colleagues have pointed out (James, Demaree, Mulaik, & Mumford, 1988;
James et al., 1986), the procedure based on the 75% rule (or any percentage cutoff)
contains a fallacy known as the affirming the consequent fallacy. The logic of the
procedure is thus: If there were no situational specificity, then the variance ratio is
expected to exceed .75. The variance ratio exceeds .75. Therefore, there is no situational specificity. In more general terms, if A then B; B, therefore A. This is a bastardization of either the Modus Ponens or Modus Tollens rules of syllogistic reasoning.
The form of Modus Ponens is the following: If A then B; A, therefore B. The form of
Modus Tollens is as follows: If A then B; not B, therefore not A. The logic implied by
the 75% rule contains incompatible components of both of these rules and is therefore
fallacious. The antecedent in a statement of implication is only one of many possible
avenues to the consequent, so the latter does not imply the former. With regard to the
75% rule, there are many possible avenues to a variance ratio exceeding .75. One is a
lack of moderators, but there are others, such as small average sample sizes (James
et al., 1988), lack of between-study variability on a given moderator, and relatively
small differences between population effect sizes. Thus, it is entirely possible to suffer
the consequences of this fallacious reasoning by concluding that there was no moderator when in fact there was.


The χ² tests do not suffer from the same logical problems. Their logic can be summarized as follows: If SDres (or SDρ) were equal to 0, then the amount of variability in observed effect sizes would probably not exceed a certain value. The amount of variability in observed effect sizes does exceed that value. Therefore, SDres probably does not equal zero. This is a probabilistic version of the Modus Tollens rule. The probabilistic nature of these statements can cause problems, but this is not especially likely (Cortina & Dunlap, 1997). Nevertheless, the χ² tests do have more serious drawbacks. First, they tend to be low in power. Sackett et al. (1986) showed that although the procedure based on a 90% rule (which is sometimes adopted when many artifacts are corrected for) has low power for many situations, the χ² test has adequate power only if the
difference between population effect sizes is large or if the difference is moderate and
the number of effect sizes is large.
The other, related problem with the χ² test is that it does, in some sense, penalize the
researcher with the good design. Assuming that one wishes to generalize the mean
effect size value obtained in a given meta-analysis, evidence of moderation is not
desired. However, the power of the χ² test to detect moderators increases with N and k.
Thus, the best way to avoid detection of moderators is to meta-analyze a small number
of small sample studies. This hardly seems a justifiable practice.
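
For concreteness, a minimal sketch of a chi-square homogeneity test of this general form is given below. The data are hypothetical, and the exact formula varies somewhat across sources; this version weights squared deviations by each study's sampling-error variance.

```python
# Chi-square (Q) homogeneity test: a sketch with hypothetical data.
import numpy as np
from scipy import stats

r = np.array([0.10, 0.22, 0.35, 0.05, 0.41])
n = np.array([120, 90, 150, 60, 110])

r_bar = np.average(r, weights=n)
var_e_i = (1 - r_bar ** 2) ** 2 / (n - 1)  # per-study sampling-error variance
q = np.sum((r - r_bar) ** 2 / var_e_i)     # heterogeneity statistic
p = stats.chi2.sf(q, df=len(r) - 1)        # chi-square with k - 1 df
print(f"Q = {q:.2f}, p = {p:.4f}")  # small p suggests SDres > 0 (moderators)
# Note the dependence on n and k: larger, more numerous studies yield
# higher power, the "penalty for good design" described above.
```
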
So, what should meta-analysts do? The 75% rule, the χ² test, and all other criteria have advantages and disadvantages. The χ² test and, to a lesser degree, the 75% rule, often have very low power. The 75% rule also has the logical problems described above. Given the importance of identifying boundary conditions for the meta-analytic values to which we tend to attach a great deal of weight, the failure to detect moderators when they do in fact exist (i.e., a Type II error) seems especially egregious. One way to improve the power to detect moderators would be to adopt a more conjunctive approach. For example, logical problems notwithstanding, one might base conclusions regarding moderators on multiple criteria, such as the percentage of variance attributable to artifacts, the χ² test, and SDρ, and do so in the following way. If the predetermined percentage variance cutoff is met and the χ² test is nonsignificant (i.e., a conjunctive rule), then one may conclude that moderators are not present. This will result in a more powerful search for moderators. If such an approach were adopted, it would be wise to then examine the SDρ value. The reason for this is that there are cases in which a small percentage variance can be attributed to artifacts, but SDρ is, in fact, quite small. For example, a meta-analysis reported in Ones, Viswesvaran, and Reiss (1996) resulted in a percentage variance value of 73%, but the standard deviation of ρ was only .034, resulting in a very small credibility interval. Any selection of a cutoff for SDρ is somewhat arbitrary, but .05 would satisfy common practical criteria. The 95% credibility interval that results has a width of about .2 correlation units, and many of the conclusions that are drawn are only slightly affected by correlation differences of .2.
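
The decision logic just outlined might be packaged as follows; the function and its structure are illustrative, with the 75% cutoff, the nonsignificant chi-square requirement, and the .05 SDρ screen taken from the text.

```python
# Sketch of the proposed conjunctive rule plus SDrho screen (illustrative).
def moderator_verdict(pct_artifact, q_p_value, sd_rho,
                      pct_cut=75.0, alpha=0.05, sd_cut=0.05):
    # Conjunctive rule: both criteria must be met to rule out moderators.
    if pct_artifact >= pct_cut and q_p_value > alpha:
        return "no moderators inferred"
    # Otherwise consult SDrho before launching a moderator search.
    if sd_rho < sd_cut:
        return "heterogeneity trivially small; interpret the mean"
    return "moderators suspected; pursue subgroup or WLS analysis"

# The Ones, Viswesvaran, and Reiss (1996)-style case from the text:
# 73% artifactual variance but SDrho of only .034.
print(moderator_verdict(pct_artifact=73.0, q_p_value=0.03, sd_rho=0.034))
```
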
This is not to say that an approach involving the percentage variance attributable to
artifacts, the χ² test, and SDρ is the single best approach. Indeed, subsequent research
similar to that conducted by Sackett et al. (1987) and Spector and Levine (1987) should
be conducted in an attempt to determine the approaches that minimize error to the
greatest degree. If, however, one believes that the level of power associated with commonly used techniques is unacceptable, then some alternative must be employed.
The SDρ might be put to other uses as well. The SDρ represents the standard deviation of population correlations. Unfortunately, it is difficult to point to any given SDρ
value and conclude that moderators need not be pursued. A similar problem arises in factor analysis with respect to the determination of the number of factors present and in structural equation modeling with respect to model fit. One of the common conventions for deciding which factors or components to retain in a factor or component analysis is the eigenvalue > 1 convention. This convention stems from work by Kaiser (1960) and Guttman (1954) in which it was shown that a factor with an eigenvalue of 1 has a reliability of zero. Thus, this can be thought of as a "point of no return" criterion such that we may not think much of factors with eigenvalues only slightly larger than 1, but we can rest assured that any factor whose eigenvalue is not greater than 1 should not be retained. Although it can certainly be argued that Kaiser's criterion is overextended (Cattell & Vogelmann, 1977; Fabrigar, Wegener, MacCallum, & Strahan,
1999; Zwick & Velicer, 1986), it is useful as a baseline value for ruling out factors/
components.
In structural equation modeling, indices of model fit, such as the normed fit index
(Bentler & Bonnet, 1980), involve a comparison of the residual matrix of the hypothesized model to the residual matrix of a null model.2 Although the definition of "null" can be debated (Mulaik, James, Van Alstine, Bennett, Lind, & Stillwell, 1989), the
most common null model is one in which all linkages between observed and latent
variables, and all linkages among latent variables, are set to zero. The null model,
therefore, represents a worst case scenario. Model fit is then a function of the decrease
in residuals as one goes from this worst case scenario to a hypothesized model. Any
hypothesized model that fails to improve substantially on the null model has little to
recommend it.
It would be useful to have an "apples and oranges" moderator detection statistic analogous to the eigenvalue = 1 or the null model residual matrix. There appears to be no way of generating values from an assumed distribution of correlations because there is no basis for any particular assumption. A uniform distribution is inappropriate because extreme effect sizes are rare, and there is no compelling argument for any other distribution either. As a result, a set of "point of no return" values were empirically generated. The intent was to randomly sample effect size values and meta-analyze them, thereby generating percentage variance attributable to artifacts values, residual standard deviation values, and SDρ values that represent the quintessential "apples and oranges" meta-analysis.
To this end, one correlation value from each of the first two empirical, primary studies reported in the first issue of each volume of JAP from 1978 to 1997 was selected,
resulting in 40 correlations. For each study, a correlation table was randomly selected
(if there were multiple tables), and a correlation was randomly selected from the table.
The only substantive exclusion criterion was that the value could not represent the correlation between conceptually identical variables (e.g., test-retest reliabilities). It
should also be noted that variables were rarely repeated across correlations.
The resulting values were as follows. The observed variance in correlations was
.0378, whereas the variance attributable to sampling error was .0022. Hunter and
Schmidt (1990) claimed that the vast majority of variance attributable to artifacts is
attributable to sampling error. Given that the formula for variance attributable to artifacts other than sampling error is the product of the square of the mean value, the
square of the compound attenuation factor, and the sum of the coefficients of variation
for the individual attenuation factors (Hunter & Schmidt, 1990, p. 176), the variance
attributable to artifacts other than sampling error will seldom exceed .001. For example, the average amount of variance attributable to artifacts other than sampling error
in the Ones et al. (1996) study was .0002. In any case, it is possible to estimate roughly
the variance in the 40 observed correlations attributable to artifacts other than sampling error by using the mean sample size weighted correlation, information from the
review of JAP meta-analyses, and assumed artifact distributions to fill in the values in
the formula just mentioned. The average compound attenuation factor from the review
of meta-analyses was .71, and this value can be used to correct the sample size
weighted mean of the 40 correlations for attenuation (.094/.71 = .132). The square of
this value provides the first component of the formula. The square of the average compound attenuation factor provides the second component. Although the relevant data
for the third component were not coded for, a conservative estimate of .0368, which is
the mean of the V values for the assumed distributions involving ryy = .6, rxx = .8, and
ratios of restricted to unrestricted standard deviations of either 1.0 or .59, can be taken
from Pearlman, Schmidt, and Hunter (1980). This last value is, nevertheless, always
small and of little consequence.
The three values needed to estimate variance attributable to artifacts other than
sampling error are, therefore, .132², .71², and .0368. Their product, .0003, provides the
required value and is consistent with the notion that at least 90% of the variance attributable to artifacts is attributable to sampling error.
Thus, residual variance is estimated as .0378 .0022 .0003 = .0353, the residual
standard deviation is estimated as .19, and 6.6% of the variance in correlations is attributable to artifacts. If we divide .19 by the compound attenuation factor from the review
of meta-analyses, we have an SD value of .265 and, therefore, a 95% credibility interval value of 1.96 .265 = .519. These values are far from being out of reach. For
example, 21 of the 59 studies reviewed earlier contained a total of 148 meta-analyses
that yielded residual standard deviation values that equaled or exceeded the baseline
value of .19. Other, similar analyses would likely yield different baseline values, but
there is no reason to expect large discrepancies (a study is currently under way to investigate this issue). Thus, these values might be used as baseline values such that if our
residual standard deviation is no smaller than .19 (or our SDρ value is no smaller than
.265), then the mean correlation must be regarded as uninterpretable because it is a
mean of sample values that are no less discrepant than would be values taken from k
populations.
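The arithmetic behind these baseline values condenses to a few lines. The sketch below assumes the quantities just derived (observed variance .0378, sampling error variance .0022, other-artifact variance .0003, compound attenuation factor .71); small discrepancies from the text reflect rounding.

```python
import math

# Residual variance, residual SD, SD-rho, and the 95% credibility interval
# half-width, using the values derived in the text.

var_obs, var_se, var_other, A = 0.0378, 0.0022, 0.0003, 0.71

var_res = var_obs - var_se - var_other          # .0353
sd_res = math.sqrt(var_res)                     # ~.19
sd_rho = sd_res / A                             # ~.265 after correcting for attenuation
half_width = 1.96 * sd_rho                      # ~.52
pct_artifacts = (var_se + var_other) / var_obs  # ~6.6% attributable to artifacts

print(round(sd_res, 2), round(sd_rho, 3), round(half_width, 3),
      round(100 * pct_artifacts, 1))            # 0.19 0.265 0.519 6.6
```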
Note that the purpose served by this baseline value is different from that served by
the .05 recommendation made earlier. The baseline value is an absolute maximum in
the same way that the Kaiser criterion or the null model residual matrix is an absolute
minimum. The effect size estimate from a meta-analysis that yields a residual standard deviation value slightly smaller than .19 or an SDρ value slightly smaller than .265 may well be uninterpretable, but these values can act as hard cut points. The .05 suggestion is meant to lie at the other end of this continuum such that if SDρ is less than .05, then there is little empirical reason to be concerned about interpretation of effect size estimates.
The previous several paragraphs have dealt with issues of detecting moderators.
Next, there is the issue of presentation. It should go without saying that all values relevant for decisions relating to moderators should be reported. Specifically, every meta-analysis should include some combination of the following: observed variance of
effect size values, residual variance of effect size values, percentage variance attributable to sampling error, percentage variance attributable to each other artifact considered in the study, corrected and uncorrected effect size values, compound attenuation
factors, and confidence and credibility intervals. If space permits, then all this information should be presented. At the very least, a subset should be presented that allows
the computation of the remaining values in the list (e.g., presentation of mean corrected and uncorrected effect size values allows computation of the compound attenuation factor).
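To illustrate why such a subset suffices, consider one hypothetical recovery: given only the mean uncorrected and corrected correlations and the residual standard deviation, a reader can reconstruct the compound attenuation factor and SDρ. The values below are taken from the baseline example above.

```python
# Recovering unreported quantities from a reported subset: the mean uncorrected
# and corrected correlations give the compound attenuation factor, which,
# together with the residual SD, gives SD-rho.

mean_r_obs, mean_r_corrected, sd_res = 0.094, 0.132, 0.19

A = mean_r_obs / mean_r_corrected  # compound attenuation factor, ~.71
sd_rho = sd_res / A                # ~.27
print(round(A, 2), round(sd_rho, 3))
```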
The number of studies (k), N, and the criterion used to determine the existence of
moderators should also be included. Also included should be evidence of a priori consideration of moderators. It may be that no plausible moderators present themselves.
Nevertheless, the importance that is attached to meta-analytic findings makes paramount the concern over the identification of boundary conditions, and these cannot be
identified if they are never considered. This is very important for zero-order analyses
and less so for higher order analyses. Of course, this implies that studies to be included
in meta-analyses must be coded for information relevant to potential moderator variables.
If there is empirical evidence of moderators, then above all else, this evidence
should be discussed. It should be dismissed as unimportant only if there are overwhelming theoretical and empirical reasons to do so. Theoretical reasons may be rare
because they would involve a hypothesis of no effect and would contradict the data.
Empirical reasons are possible in the form of outliers. As was mentioned earlier, there
are a variety of opinions on the subject of outliers and what to do when they have been
identified. There should, nevertheless, be some substantive (as opposed to purely
empirical) grounds for deletion of an outlier, even if those grounds are discovered after
the fact. The grounds for deletion should be presented, and results should be generated
and discussed with and without the outliers in question.
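As a hypothetical illustration of this with-and-without presentation, the sketch below flags outliers with a leave-one-out standardized deviate in the spirit of Huffcutt and Arthur's (1995) SAMD statistic and reports the mean sample size weighted correlation both ways. The data and the cutoff of 3 are invented for the example; this is not the published SAMD procedure.

```python
import math

# Invented (r, N) pairs; the fourth study is deliberately discrepant.
studies = [(0.30, 120), (0.25, 200), (0.28, 150), (0.75, 60)]

def mean_weighted_r(data):
    return sum(r * n for r, n in data) / sum(n for _, n in data)

def deviate(i, data):
    # Standardized difference between study i and the mean of the others,
    # scaled by the sampling SD of r_i evaluated at that leave-one-out mean.
    rest = [d for j, d in enumerate(data) if j != i]
    r_bar = mean_weighted_r(rest)
    r_i, n_i = data[i]
    se = math.sqrt((1 - r_bar ** 2) ** 2 / (n_i - 1))
    return (r_i - r_bar) / se

flagged = [i for i in range(len(studies)) if abs(deviate(i, studies)) > 3]
kept = [d for i, d in enumerate(studies) if i not in flagged]
print("all studies:", round(mean_weighted_r(studies), 3))      # 0.326
print("outliers removed:", round(mean_weighted_r(kept), 3))    # 0.272
```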
As a last note on presentation, it is wise to use the phrase "percentage variance
attributable to artifacts" rather than the phrase "percentage variance accounted for by
artifacts." The latter suggests that artifacts do account for observed variance when in
fact we do not know if this is true. We choose to attribute variance to artifacts or not.
Finally, there are the issues of estimation and interpretation. The vast majority of
previous meta-analyses published in JAP have used one of two procedures for testing
moderator hypotheses: subgroup meta-analysis and correlation/regression. Although
more research needs to be conducted in which additional procedures are developed
and in which these and other procedures are compared, it is possible to extrapolate
from the multiple groups analysis versus product term dichotomy in the structural
equation literature. Multiple groups analysis, like subgroup meta-analysis, is particularly useful when the moderating variable is categorical. Inclusion of product
terms, not unlike the correlation of moderators with effect size values, is particularly
useful when the moderating variable is continuous. This is not to say that each method
cannot be modified to accommodate different sorts of variables (e.g., median splits of
continuous variables, dummy coding of categorical variables). It is enough to say that
each approach is particularly suited to one situation and that careful consideration of
the characteristics of the variables involved should dictate the choice of method.
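The contrast between the two strategies can be made concrete. The sketch below applies subgroup meta-analysis to an invented categorical moderator (lab vs. field setting) and a sample size weighted least squares slope to an invented continuous moderator; both estimators are deliberately bare bones and ignore artifact corrections.

```python
# Invented codings: (r, N, setting, mean_job_complexity)
studies = [
    (0.20, 100, "lab",   2.1),
    (0.25, 150, "lab",   2.4),
    (0.42, 120, "field", 4.0),
    (0.47, 180, "field", 4.6),
]

def weighted_mean_r(subset):
    return (sum(r * n for r, n, *_ in subset)
            / sum(n for _, n, *_ in subset))

# Strategy 1: subgroup meta-analysis for the categorical moderator.
for level in ("lab", "field"):
    group = [s for s in studies if s[2] == level]
    print(level, round(weighted_mean_r(group), 3))  # lab 0.23, field 0.45

# Strategy 2: N-weighted least squares slope of r on the continuous moderator.
w = [n for _, n, _, _ in studies]
x = [m for *_, m in studies]
y = [r for r, *_ in studies]
xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
slope = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
print("slope:", round(slope, 3))
```

The design difference is worth noting: the subgroup approach asks whether mean effects differ across levels, whereas the weighted slope asks whether the effect grows with the moderator; forcing one question into the other method (e.g., via a median split) discards information.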
There remains the issue of interpretation when evidence of moderators exists, but
the limitations of the data set preclude the estimation of further moderator effects. It is
assumed here that the theoretical evidence suggests that a nonzero effect should exist.
Just as caution must be exercised when interpreting main effects in primary studies in
the presence of an interaction, so must caution be exercised when interpreting estimated ρ and δ values in the presence of evidence that more than one such true value
exists. Indeed, in such a case, it may be wise to avoid specific conclusions and fall back
on more general, transportability-based conclusions, leaving parameter estimation to
future research. This is a reasonable strategy if the estimated effect size value is large.
If, however, the estimated effect size value is small such that the 95% credibility interval contains values that are trivially different from zero, then the results should probably be labeled inconclusive.
To summarize, the following steps in anticipating moderators, presenting relevant
information, and testing and interpreting such information in meta-analysis are suggested:
1. Identify the goal of the meta-analysis (parameter estimation vs. hypothesis testing).
2. Identify possible moderators a priori for as many of the relationships to be examined as possible and code for them.
3. Choose a decision rule, taking into account error rates, practical significance, and so forth. Conjunctive strategies may work well in this regard (although more research is needed; a minimal sketch of one such rule follows this list). Conjunctive strategies may not control for Type I error in a precise way. Nevertheless, they are also less likely to let important moderators slip through the cracks.
4. Report all relevant values, including observed variances, residual standard deviation/variance values, percentage observed variances attributable to artifacts, confidence intervals, and credibility intervals. Also include corrected and uncorrected effect size values, N, k, and any additional attenuation information that might be useful.
5. Compare relevant values to decision criteria and, if necessary, to baseline values.
6. Discuss the comparisons and the values.
7. Examine outliers and discard them only if there are overwhelming empirical and substantive reasons for doing so. There is little to be lost by presenting results with and without outliers.
8. Choose a strategy for estimating moderator effects based on the nature of the moderators.
9. Interpret any population effect size values for which substantial variance remains unexplained with caution, pointing out that further research may be required to uncover the variables causing observed variability in effect sizes.
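As noted in step 3, here is a minimal sketch of a conjunctive decision rule. The 75% and .05 cutoffs echo values discussed in this article, but the function and its defaults are illustrative assumptions rather than prescribed standards.

```python
# Flag a relationship for moderator search only when BOTH conditions hold:
# artifacts explain less than 75% of observed variance AND SD-rho exceeds
# the .05 floor discussed earlier. Cutoffs are illustrative, not prescribed.

def flag_for_moderators(pct_var_artifacts, sd_rho,
                        pct_cut=0.75, sd_cut=0.05):
    return pct_var_artifacts < pct_cut and sd_rho > sd_cut

print(flag_for_moderators(0.066, 0.265))  # True: the baseline case in the text
print(flag_for_moderators(0.90, 0.03))    # False: artifacts explain the variance
```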

It is hoped that such prescriptions will lead to better practice of meta-analysis. Given the importance typically attributed to meta-analytic findings, such practice is essential.

Appendix
Quantitative Reviews Included in the Analysis
Bothwell, R. K., Deffenbacher, K. A., & Brigham, J. C. (1987). Correlation of eyewitness accuracy and confidence: Optimality hypothesis revisited. Journal of Applied Psychology, 72,
691-695.
Brown, S. H. (1981). Validity generalization and situational moderation in the life insurance industry. Journal of Applied Psychology, 66, 664-670.
Burke, M. J., & Day, R. (1986). A cumulative study of the effectiveness of managerial training.
Journal of Applied Psychology, 71, 232-245.
Carsten, J. M., & Spector, P. E. (1987). Unemployment, job satisfaction, and employee turnover:
A meta-analytic test of the Muchinsky model. Journal of Applied Psychology, 72, 374-381.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995). A meta-analysis of interrater and internal
consistency reliability of selection interviews. Journal of Applied Psychology, 80, 565-579.
Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice enhance performance?
Journal of Applied Psychology, 79, 481-492.
Driskell, J. E., Willis, R. P., & Copper, C. (1992). Effect of overlearning on retention. Journal of
Applied Psychology, 77, 615-622.
Finkelstein, L. M., Burke, M. J., & Raju, N. S. (1995). Age discrimination in simulated employment contexts: An integrative analysis. Journal of Applied Psychology, 80, 652-663.
Fisher, C. D., & Gitelson, G. (1983). A meta-analysis of the correlates of role conflict and ambiguity. Journal of Applied Psychology, 68, 320-333.
Fried, Y. (1991). Meta-analytic comparison of the job diagnostic survey and job characteristics
inventory as correlates of work satisfaction and performance. Journal of Applied Psychology, 76, 690-697.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Gerstner, C. R., & Day, D. V. (1997). Meta-analytic review of leader-member exchange theory:
Correlates and construct issues. Journal of Applied Psychology, 82, 827-844.
Hattrup, K., Rock, J., & Scalia, C. (1997). The effects of varying conceptualizations of job performance on adverse impact, minority hiring, and predicted performance. Journal of Applied Psychology, 82, 656-664.
Hom, P. W., Caranikas-Walker, F., Prussia, G. E., & Griffeth, R. W. (1992). A meta-analytical
structural equations analysis of a model of employee turnover. Journal of Applied Psychology, 77, 890-909.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for
entry-level jobs. Journal of Applied Psychology, 79, 184-190.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-analytic investigation of cognitive ability in employment interview evaluations: Moderating characteristics and implications for incremental validity. Journal of Applied Psychology, 81, 459-473.
Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.
Koslowsky, M., Sagie, A., Krausz, M., & Singer, A. D. (1997). Correlates of employee lateness:
Some theoretical considerations. Journal of Applied Psychology, 82, 79-88.
Kraiger, K., & Ford, J. K. (1985). A meta-analysis of ratee race effects in performance ratings.
Journal of Applied Psychology, 70, 56-65.
Lee, R. T., & Ashforth, B. E. (1996). A meta-analytic examination of the correlates of the three
dimensions of job burnout. Journal of Applied Psychology, 81, 123-133.
Loher, B. T., Noe, R. A., Moeller, N. L., & Fitzgerald, M. P. (1985). A meta-analysis of the relation of job characteristics to job satisfaction. Journal of Applied Psychology, 70, 280-289.
Lord, R. G., DeVader, C. L., & Alliger, G. M. (1986). A meta-analysis of the relation between
personality traits and leadership perceptions: An application of validity generalization procedures. Journal of Applied Psychology, 71, 402-410.
Mabe, P. A., & West, S. G. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67, 280-296.
Martocchio, J. J., & O'Leary, A. M. (1989). Sex differences in occupational stress: A meta-analytic review. Journal of Applied Psychology, 74, 495-501.
McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599-616.
McEvoy, G. M., & Cascio, W. F. (1985). Strategies for reducing employee turnover: A meta-analysis. Journal of Applied Psychology, 70, 342-353.
McEvoy, G. M., & Cascio, W. F. (1989). Cumulative evidence of the relationship between employee age and job performance. Journal of Applied Psychology, 74, 11-17.
Mitra, A., Jenkins, G. D., & Gupta, N. (1992). A meta-analytic review of the relationship between absence and turnover. Journal of Applied Psychology, 77, 879-889.
Murphy, K. R., & Balzer, W. K. (1989). Rater errors and rating accuracy. Journal of Applied Psychology, 74, 619-624.
Narby, D. B., Cutler, B. L., & Moran, G. (1993). A meta-analysis of the association between authoritarianism and jurors' perceptions of defendant culpability. Journal of Applied Psychology, 78, 34-42.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality
testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660-679.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679-703.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests
used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.
Podsakoff, P. M., MacKenzie, S. B., & Bommer, W. H. (1996). Meta-analysis of the relationships between Kerr and Jermier's substitutes for leadership and employee job attitudes, role perceptions, and performance. Journal of Applied Psychology, 81, 380-399.
Premack, S. L., & Wanous, J. P. (1985). A meta-analysis of realistic job preview experiments.
Journal of Applied Psychology, 70, 706-719.
Reilly, R. R., & Israelski, E. W. (1988). Development and validation of minicourses in the telecommunication industry. Journal of Applied Psychology, 73, 721-726.
Robertson, I. T., & Downs, S. (1989). Work-sample tests of trainability: A meta-analysis. Journal of Applied Psychology, 74, 402-410.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage.
Roth, P. L., BeVier, C. A., Schippmann, J. S., & Switzer, F. S. (1996). Meta-analyzing the relationship between grades and job performance. Journal of Applied Psychology, 81, 548-556.
Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied
Psychology, 75, 175-184.
Russell, C. J., Settoon, R. P., McGrath, R. N., Blanton, A. E., Kidwell, R. E., Lohrke, F. T., et al.
(1994). Investigator characteristics as moderators of personnel selection research: A meta-analysis. Journal of Applied Psychology, 79, 163-170.
Salgado, J. F. (1997). The five factor model of personality and job performance in the European
community. Journal of Applied Psychology, 82, 30-43.
Schmidt, F. L., Gast-Rosenberg, I., & Hunter, J. E. (1980). Validity generalization results for
computer programmers. Journal of Applied Psychology, 65, 643-661.
Schmidt, F. L., Hunter, J. E., & Caplan, J. R. (1981). Validity generalization results for two job
groups in the petroleum industry. Journal of Applied Psychology, 66, 261-273.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). Joint relation of experience
and ability with job performance: Test of three hypotheses. Journal of Applied Psychology,
73, 46-57.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences as moderators of aptitude
test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185.
Steel, R. P., & Ovalle, N. K. (1984). A review and meta-analysis of research on the relationship
between behavioral intentions and employee turnover. Journal of Applied Psychology, 69,
673-686.
Steel, R. P., & Griffeth, R. W. (1989). The elusive relationship between perceived employment
opportunity and turnover behavior: A methodological or conceptual artifact? Journal of
Applied Psychology, 74, 846-854.
Tubbs, M. E. (1986). Goal-setting: A meta-analytic examination of the empirical evidence.
Journal of Applied Psychology, 71, 474-483.
van Eerde, W., & Thierry, H. (1996). Vroom's expectancy model and work-related criteria: A
meta-analysis. Journal of Applied Psychology, 81, 575-586.
Viswesvaran, C., & Barrick, M. R. (1992). Decision-making effects on compensation surveys:
Implications for market wages. Journal of Applied Psychology, 77, 588-597.
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of
job performance ratings. Journal of Applied Psychology, 81, 557-574.
Viswesvaran, C., & Schmidt, F. L. (1992). A meta-analytic comparison of the effectiveness of
smoking cessation methods. Journal of Applied Psychology, 77, 554-561.
Waldman, D. A., & Avolio, B. J. (1986). A meta-analysis of age differences in job performance.
Journal of Applied Psychology, 71, 33-38.
Wanous, J. P., Poland, T. D., Premack, S. L., & Davis, K. S. (1992). The effects of met expectations on newcomer attitudes and behaviors: A review and meta-analysis. Journal of Applied
Psychology, 77, 288-297.
Wanous, J. P., Reichers, A. E., & Hudy, M. J. (1997). Overall job satisfaction: How good are
single-item measures? Journal of Applied Psychology, 82, 247-252.
Wood, R. E., Mento, A. J., & Locke, E. A. (1987). Task complexity as a moderator of goal effects. Journal of Applied Psychology, 72, 416-425.
Wright, P. M. (1990). Operationalization of goal difficulty as a moderator of the goal difficulty-performance relationship. Journal of Applied Psychology, 75, 227-234.

Notes
1. All average percentages were calculated as the reciprocal of the average of the reciprocals
as suggested by Hunter and Schmidt (1990).
2. Indices such as the Normed Fit Index (NFI) are specifically composed of the fit function
minima for hypothesized and null models. Because all estimators include the residual matrix in
their fit functions, comparison of residual matrices is implicit in such indices.
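The averaging rule in Note 1 is the harmonic mean. A one-line illustration with invented percentages:

```python
# Reciprocal of the average of the reciprocals (the harmonic mean); data invented.
pcts = [10.0, 25.0, 50.0]
harmonic_mean = len(pcts) / sum(1.0 / p for p in pcts)
print(round(harmonic_mean, 1))  # 18.8, versus an arithmetic mean of 28.3
```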

References
Aguinis, H., & Whitehead, R. (1997). Sampling variance of the correlation coefficient under indirect range restriction: Implications for validity generalization. Journal of Applied Psychology, 82, 528-538.
Beal, D. J., Corey, D. M., & Dunlap, W. P. (2002). On the bias of Huffcutt and Arthur's (1995)
procedure for identifying outliers in the meta-analysis of correlations. Journal of Applied
Psychology, 87, 583-589.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of
covariance structures. Psychological Bulletin, 88, 588-606.
Callender, J. C., & Osburn, H. G. (1981). Testing the constancy of validity with computer-generated sampling distributions of the multiplicative model variance estimate: Results for petroleum industry validation research. Journal of Applied Psychology, 66, 274-281.
Cattell, R. B., & Vogelmann, S. (1977). A comprehensive trial of the scree and HG criteria for
determining the number of factors. Multivariate Behavioral Research, 12, 289-325.
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161-172.
Cortina, J. M., & Folger, R. G. (1998). When is it acceptable to accept the null hypothesis: No
way, Jose? Organizational Research Methods, 1, 334-350.
Cortina, J. M., & Gully, S. M. (1999). So the great dragon was cast out . . . who deceives the
whole world. Newsletter of the Research Methods Division of the Academy of Management,
14(1).
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of
exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Gerstner, C. R., & Day, D. V. (1997). Meta-analytic review of leader-member exchange theory:
Correlates and construct issues. Journal of Applied Psychology, 82, 827-844.
Guttman, L. (1954). Some necessary conditions for common factor analysis. Psychometrika,
19, 149-162.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic
Press.
Huber, P. (1981). Robust statistics. New York: John Wiley.
Huffcutt, A. I., & Arthur, W. (1995). Development of a new outlier statistic for meta-analytic
data. Journal of Applied Psychology, 80, 327-333.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-analytic investigation of cognitive ability in employment interview evaluations: Moderating characteristics and implications for incremental validity. Journal of Applied Psychology, 81, 459-473.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in
research findings. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (1994). Estimation of sampling error variance in the meta-analysis of correlations: Use of average correlation in the homogenous case. Journal of Applied
Psychology, 79, 171-177.
James, L. R., Demaree, R. G., & Mulaik, S. A. (1986). A note on validity generalization procedures. Journal of Applied Psychology, 71, 440-450.
James, L. R., Demaree, R. G., Mulaik, S. A., & Ladd, R. T. (1992). Validity generalization in the context of situational models. Journal of Applied Psychology, 77, 3-14.
James, L. R., Demaree, R. G., Mulaik, S. A., & Mumford, M. D. (1988). Validity generalization: A rejoinder to Schmidt, Hunter, & Raju, 1988. Journal of Applied Psychology, 73, 673-678.
Johnson, B. T., Mullen, B., & Salas, E. (1995). Comparison of three major meta-analytic approaches. Journal of Applied Psychology, 80, 94-106.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and
Psychological Measurement, 20, 141-151.
Kemery, E. R., Mossholder, K. W., & Roth, L. (1987). The power of the Schmidt and Hunter additive model of validity generalization. Journal of Applied Psychology, 72, 30-37.
Koslowsky, M., Sagie, A., Krausz, M., & Singer, A. D. (1997). Correlates of employee lateness:
Some theoretical considerations. Journal of Applied Psychology, 82, 79-88.
Law, K. S., Schmidt, F. L., & Hunter, J. E. (1994a). Nonlinearity of range corrections in meta-analysis: Test of an improved procedure. Journal of Applied Psychology, 79, 425-438.
Law, K. S., Schmidt, F. L., & Hunter, J. E. (1994b). A test of two refinements for procedures in
meta-analysis. Journal of Applied Psychology, 79, 978-986.
Marascuilo, L. A. (1971). Statistical methods for behavioral science research. New York:
McGraw-Hill.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stillwell, C. D. (1989). Evaluation of goodness of fit indices for structural equation models. Psychological Bulletin,
105, 430-445.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality
testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660-679.
Orwin, R. G., & Cordray, D. S. (1985). Effects of deficient reporting on meta-analysis: A conceptual framework and reanalysis. Psychological Bulletin, 97, 134-147.
Osburn, H. G., Callender, J. C., Greener, J. M., & Ashworth, S. (1983). Statistical power of tests
of the situational specificity hypothesis in validity generalization studies: A cautionary
note. Journal of Applied Psychology, 68, 115-122.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests
used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.
Podsakoff, P. M., MacKenzie, S. B., & Bommer, W. H. (1996). Meta-analysis of the relationships between Kerr and Jermier's substitutes for leadership and employee job attitudes, role
perceptions, and performance. Journal of Applied Psychology, 81, 380-399.
Raju, N. S., & Burke, M. J. (1983). Two new procedures for studying validity generalization.
Journal of Applied Psychology, 68, 382-395.
Raju, N. S., Pappas, S., & Williams, C. P. (1989). An empirical Monte Carlo test of the accuracy
of the correlation, covariance, and regression slope models for assessing validity generalization. Journal of Applied Psychology, 74, 901-911.
Sackett, P. R., Harris, M. M., & Orr, J. M. (1987). On seeking moderator variables in the meta-analysis of correlation data: A Monte Carlo investigation of statistical power and resistance
to Type I error. Journal of Applied Psychology, 71, 302-310.
Salgado, J. F. (1997). The five factor model of personality and job performance in the European
community. Journal of Applied Psychology, 82, 30-43.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Schmidt, F. L., Hunter, J. E., & Caplan, J. R. (1981). Validity generalization results for two job
groups in the petroleum industry. Journal of Applied Psychology, 66, 261-273.
Slavin, R. E. (1984). Meta-analysis in education: How has it been used? Educational Researcher, 13, 6-15.
Society for Industrial and Organizational Psychology. (2000). 16th Annual Conference call for
proposals. Bowling Green, OH: Author.
Spector, P. E., & Levine, E. L. (1987). Meta-analysis for integrating study outcomes: A Monte
Carlo study of its susceptibility to Type I and Type II errors. Journal of Applied Psychology,
72, 3-9.
Steel, P. D., & Kammeyer-Mueller, J. D. (2002). Comparing meta-analytic moderator estimation
techniques under realistic conditions. Journal of Applied Psychology, 87, 96-111.
Switzer, F. S., Paese, P. W., & Drasgow, F. (1992). Bootstrap estimates of standard errors in validity generalization research. Journal of Applied Psychology, 77, 123-129.
van Eerde, W., & Thierry, H. (1996). Vroom's expectancy model and work-related criteria: A
meta-analysis. Journal of Applied Psychology, 81, 575-586.
Wanous, J. P., Sullivan, S. E., & Malinak, J. (1989). The role of judgment calls in meta-analysis.
Journal of Applied Psychology, 74, 259-264.
Whitener, E. M. (1990). Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of Applied Psychology, 75, 315-321.
Wilcox, R. R. (1997). How many discoveries have been lost by ignoring modern statistical
methods? American Psychologist, 53, 300-314.
Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of
components to retain. Psychological Bulletin, 99, 432-442.
