
Journal of Applied Psychology
1996, Vol. 81, No. 5, 557-574
Copyright 1996 by the American Psychological Association, Inc.
0021-9010/96/$3.00

Comparative Analysis of the Reliability of Job Performance Ratings


Chockalingam Viswesvaran
Florida International University

Deniz S. Ones
University of Houston

Frank L. Schmidt
University of Iowa

This study used meta-analytic methods to compare the interrater and intrarater reliabilities of ratings of 10 dimensions of job performance used in the literature; ratings of overall job performance were also examined. There was mixed support for the notion that some dimensions are rated more reliably than others. Supervisory ratings appear to have higher interrater reliability than peer ratings. Consistent with H. R. Rothstein (1990), mean interrater reliability of supervisory ratings of overall job performance was found to be .52. In all cases, interrater reliability is lower than intrarater reliability, indicating that the inappropriate use of intrarater reliability estimates to correct for biases from measurement error leads to biased research results. These findings have important implications for both research and practice.

Author Note

Chockalingam Viswesvaran, Department of Psychology, Florida International University; Deniz S. Ones, Department of Management, University of Houston; Frank L. Schmidt, Department of Management and Organizations, University of Iowa. The order of authorship is arbitrary; all three authors contributed equally. Deniz S. Ones is now at the Department of Psychology, University of Minnesota.

An earlier version of this article was presented at a symposium called "Reliability and Accuracy Issues in Measuring Job Performance," chaired by Frank L. Schmidt at the 10th Annual Meeting of the Society for Industrial and Organizational Psychology, Orlando, Florida, May 1995.

Correspondence concerning this article should be addressed to Chockalingam Viswesvaran, Department of Psychology, Florida International University, Miami, Florida 33199. Electronic mail may be sent via Internet to vish@servax.fiu.edu.

Several measures of job performance have been used over the years as criterion measures (cf. Campbell, 1990; Campbell, Gasser, & Oswald, 1996; Cleveland, Murphy, & Williams, 1989). Attempts have also been made to identify the specifications for these criteria (Blum & Naylor, 1968; Brogden, 1946; Dunnette, 1963; Stuit & Wilson, 1946; Toops, 1944). For example, Blum and Naylor (1968) identified 11 dimensions or characteristics on which different criteria can be evaluated, whereas Brogden (1946) identified relevance, reliability, and practicality as desired characteristics for criteria. Reliability of criteria has been included as an important consideration by all authors writing about job performance measurement.

Indeed, for a measure to have any research or administrative use, it must have some reliability. Low reliability results in systematic reduction in the magnitude of observed relationships and can therefore distort theory testing. The recent resurgence of interest in criteria (Austin & Villanova, 1992) and in developing a theory of job performance (e.g., Campbell, 1990; Campbell, McCloy, Oppler, & Sager, 1993; McCloy, Campbell, & Cudeck, 1994; Schmidt & Hunter, 1992) also emphasizes the importance of the reliability of criterion measures. A thorough investigation of the criterion domain ought to include an examination of the reliability of the dimensions of job performance. The focus of this article is the reliability of job performance ratings.

Of the different ways to measure job performance, performance ratings are the most prevalent. Ratings are subjective evaluations that can be obtained from supervisors, peers, subordinates, self, or customers, with supervisors being the most commonly used source (Cascio, 1991; Cleveland et al., 1989) and peers constituting the second most commonly used source. For example, Bernardin and Beatty (1984) found in a survey of human resource managers that over 90% of the respondents used supervisory ratings as their primary source of performance ratings and that peers were the second most widely used source.

Constructing comprehensive and valid theories of human motivation and work behavior is predicated on the reliable measurement of constructs. Given the centrality of the construct of job performance in industrial and organizational psychology (Campbell et al., 1996), and given that ratings are the most commonly used source for measuring job performance, it is important to estimate precisely the reliability of job performance ratings. Furthermore, competing cognitive process mechanisms have been postulated (e.g., Borman, 1979; Wohlers & London, 1989) to explain the convergence in ratings between two raters. An accurate evaluation of these competing mechanisms will facilitate and enhance understanding of the psychology underlying the rating, or the evaluative judgment process in general, and job performance ratings in particular. Finally, many human resource practices recommended to practitioners in organizations are predicated on the reliable measurement of job performance. As such, both from a theoretical perspective (i.e., to analyze and build theories contributing to the science of industrial and organizational psychology) and from a practice perspective, a comparative analysis of the reliability of job performance ratings is warranted.

The primary purpose of this study was to investigate the reliability of peer and supervisory ratings of various job performance dimensions. Meta-analytic principles (Hunter & Schmidt, 1990) were used to cumulate reliability estimates across studies. Reliability estimates of the various job performance dimensions can be compared to determine which dimensions are reliably rated, so that dimensions rated with low reliability can be identified and improved through training. A second objective of this study was to compare interrater agreement and intrarater consistency in the reliability of ratings. A third and final objective was to compile and present reliability distributions that can be used in future meta-analyses involving job performance ratings.

Comparing Reliability Across Dimensions

Supervisory and peer ratings have been used to assess individuals on many dimensions of job performance. Comparing the reliability of ratings of different dimensions enables an empirical test of the hypothesis that certain dimensions of job performance are easier to evaluate than others (cf. Wohlers & London, 1989). In essence, the thrust of this hypothesis is that some dimensions of job performance are easier than others to evaluate because they are easier to observe and clearer standards of evaluation are available. Wohlers and London (1989) suggested that dimensions of performance such as administrative competence, leadership, and communication competence are more difficult to evaluate than dimensions such as output and errors.

Similarly, Borman (1979) found that "raters evaluated ratees significantly more accurately on some dimensions than on others, and that for most part these differences were consistent across formats" (p. 419). Borman (1979) also stated that the rank order accuracy on the different dimensions in his study was similar to the rank order accuracy in an earlier study by Borman, Hough, and Dunnette (1976). The rank order correlation was .88 for assessing managers and .54 for recruiters. That is, the rank order accuracy of the different dimensions was consistent across rating formats, studies and samples, and jobs. Borman (1979) noted that this consistent dimension effect, even across a variety of formats (and, we may add, across jobs and samples), may be due to something inherent in the nature of the dimensions that makes them either difficult or easy for raters. Furthermore, Borman suggested that "accuracy is highest on those dimensions for which actors provided the least ambiguous, most consistent performances, perhaps because they, as well as the student raters, understood those particular dimensions better than some of the other dimensions" (p. 420).

The hypothesis that certain dimensions of job performance are easier to evaluate than others is also found in the personality literature (e.g., Christensen, 1974), and a similar line of thought appears in social psychology. For example, Bandura (1977), as well as Salancik and Pfeffer (1978), posited from a social information-processing framework that when there are no clear interpretable signs of behavior, or where the standards of evaluation are ambiguous, interrater agreement will be lower than where there are clear interpretable signs and unambiguous standards. This is also hypothesized to be true when certain dimensions of job performance occur rarely (low base rate) or have greater salience in memory (e.g., accidents).

Comparing the reliability of dimensions facilitates an empirical test of the hypothesis (Borman, 1979; Wohlers & London, 1989) of a gradient in reliabilities across job performance dimensions. Such knowledge will facilitate an understanding of the rating process.

Comparing Different Types of Reliability Estimates

Comparing the different types of reliability estimates (coefficient of equivalence, coefficient of stability, etc.) for each dimension of job performance is also valuable. Reliability of a measure is defined as the ratio of true to observed variance (Nunnally, 1978). Different types of reliability coefficients assign different sources of variance to measurement error. In general, the most frequently used reliability coefficients associated with criterion ratings can be broadly classified into two categories: interrater and intrarater. In the context of performance measurement, interrater reliability assesses the extent to which different raters agree on the performance of different individuals. As such, individual raters' idiosyncratic perceptions of job performance are considered to be part of measurement error. Intrarater reliability, on the other hand, assigns any specific error unique to the individual rater to true variance. That is, each rater's idiosyncratic perceptions of job performance are relegated to the true variance component. Both coefficient alpha and the coefficient of stability (rate-rerate reliability with the same rater) are forms of intrarater reliability. Intrarater reliability is most frequently indexed by coefficient alpha computed on ratings from a single rater on the basis of the correlations or covariances among different rating items or dimensions. Coefficient alpha assesses the extent to which the different items used to measure a criterion are indeed assessing the same criterion.1 Rate-rerate reliability computed using data from the same rater at two points in time assesses the extent to which there is consistency in the performance appraisal ratings of a given rater over time. Both of these indices of intrarater reliability, coefficient alpha and the coefficient of stability (over short periods of time, when it is assumed that true performance does not change), estimate what the correlation would be if the same rater rerated the same employees (Cronbach, 1951).2
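To make the distinction concrete, the following simulation is a minimal sketch of our own (the dimension structure and variance components are arbitrary illustrative assumptions, not estimates from this article). Each rater adds an idiosyncratic bias to a shared true score; the correlation between two raters treats that bias as error, whereas coefficient alpha computed within one rater's items folds it into true variance, so alpha comes out higher.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ratees, n_items = 500, 6
true_score = rng.normal(0, 1, n_ratees)  # performance shared across raters

def one_rater():
    # Rater-specific bias: error for interrater reliability,
    # but "true" variance from coefficient alpha's point of view.
    idiosyncrasy = rng.normal(0, 0.7, n_ratees)
    noise = rng.normal(0, 0.8, (n_ratees, n_items))  # item-specific error
    return true_score[:, None] + idiosyncrasy[:, None] + noise

def cronbach_alpha(items):
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rater1, rater2 = one_rater(), one_rater()
interrater = np.corrcoef(rater1.sum(1), rater2.sum(1))[0, 1]
print(f"interrater r    = {interrater:.2f}")          # lower
print(f"alpha (rater 1) = {cronbach_alpha(rater1):.2f}")  # higher
```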
Thus, different types of reliability estimates assign different sources of variance to measurement error variance. When a single judge rates a job performance dimension with a set of items, coefficient alpha may be computed on the basis of the correlations or covariances among the items. Coefficient alpha, which is a measure of the equivalence of the items, assigns item-specific variance and variance due to random responding to measurement error variance. Influences unique to the particular rating occasion and unique to the rater are not assigned to measurement error but are incorporated into the true variance. When job performance is assessed by the same rater with the same set of items at two different points in time, the resulting coefficient of stability (rate-rerate reliability) assigns variance due to transient errors in rating (i.e., variance from mental states and other factors in raters that vary over days) to measurement error variance (Schmidt & Hunter, 1996).

Thus, by comparing the different reliability estimates for the same dimension of job performance, one can gauge the magnitude of a particular source of error in ratings involving that dimension. Such knowledge can be valuable in designing rating formats and rater training programs.

Constructing Artifact Distributions

Constructing artifact distributions for different dimensions of job performance also serves meta-analytic cumulations involving ratings of job performance. The reliability distributions reported here could be used in meta-analyses of studies involving ratings of performance.3 Also, some published meta-analyses involving ratings have erroneously combined estimates of interrater and intrarater reliability in one artifact distribution as if these estimates were equivalent. With the increasing emphasis on precision of estimates to be used in theory testing (Schmidt & Hunter, 1996; Viswesvaran & Ones, 1995), it is imperative that future meta-analyses use the appropriate reliability estimates. By providing estimates of different reliability coefficients for each dimension of job performance, this article aims to provide a useful source of reference for researchers.

Thus, the primary purpose of this article is to cumulate the reliabilities of job performance ratings with the principles of psychometric meta-analysis (Hunter & Schmidt, 1990) and to compare the reliability of the ratings of different dimensions. Comparing the reliability of different dimensions enables an evaluation of the hypothesis (Borman, 1979; Wohlers & London, 1989) that evaluation difficulty varies across dimensions. A secondary purpose is to compare the magnitude of the different sources of error (by comparing interrater reliabilities, coefficient alphas, and coefficients of stability) that exist in ratings of each dimension of job performance. A third purpose is to provide reliability distributions that could be used in future meta-analytic cumulations involving ratings of performance.

1 Coefficient alpha computed on ratings from a single rater is an estimate of the rate-rerate reliability with the same rater. As such, it is a form of intrarater reliability. However, it should be noted that a different coefficient alpha can be used to index interrater reliability. This is possible if the variance-covariance matrix across raters is used in the computations. In this study, we did not examine coefficient alphas obtained by using data across raters.

2 For a recent discussion of these and other reliabilities in industrial and organizational psychology research, see Schmidt and Hunter (1996).

3 Frequency distributions of the reliabilities contributing to the analyses reported in this article may be obtained by writing to Chockalingam Viswesvaran.

Method

Database

We searched the literature for articles that reported reliability coefficients either for job performance dimensions or for overall job performance. Only studies that were based on actual job performance were included. Interviewer ratings, assessment center ratings, and ratings of performance in simulated exercises were excluded. We searched all issues of the following 15 journals, from the first issue of each journal through January 1994: Journal of Applied Psychology, Personnel Psychology, Academy of Management Journal, Human Relations, Journal of Business and Psychology, Journal of Management, Organizational Behavior and Human Decision Processes, Accident Analysis and Prevention, International Journal of Intercultural Relations, Journal of Vocational Behavior, Journal of Applied Behavioral Analysis, Human Resources Management Research, Journal of Occupational Psychology, Psychological Reports, and Journal of Organizational Behavior.

Analyses

In cumulating results across studies, the same job performance dimensions can be referred to with different labels. Any grouping of different labels as measuring the same criterion has to be guided by theoretical considerations. That is, we need to theoretically define the criteria first (Campbell et al., 1993). The broader the definition, the more general and possibly more useful the criteria are; on the other hand, the narrower the definition (up to a point), the more unambiguous the criteria become. The delineation of the job performance domain into its component dimensions was undertaken as part of a study examining whether a general job performance factor is responsible for the covariation among job performance dimensions (Viswesvaran, 1993). Viswesvaran (1993) identified 10 job performance dimensions that comprehensively represented the entire job performance domain. In this study, all the job performance measures used in the individual studies were listed and then grouped by the authors into conceptually similar categories. That is, the definition of the job performance dimensions and the classification of the job performance ratings into these 10 dimensions preceded the coding of the reliability estimates. We read all the articles making up our database and then classified the reliabilities. In other words, we took into account not only the definitions but also the context (and all other information) provided in each article in classifying the reliabilities into the dimensions. Interrater agreement was 93%. Disagreements were resolved through mutual discussion until consensus was reached. Definitions for the 10 groups of ratings for which analyses are reported here are provided in Table 1.

Given 10 dimensions of job performance and three types of reliabilities (interrater, stability, and equivalence), there were potentially 30 reliability distributions to be investigated. Because our interest was in examining the reliability of both supervisory and peer ratings, there were potentially 60 distributions to be meta-analyzed. Of these, some combinations have not been assessed in the literature. The reliability values obtained from the individual studies were coded into 1 of the 60 distributions.

Next, in cumulating the reliability of any particular criterion across several studies, the length of the measuring instrument (number of raters for interrater reliability estimates and number of items for coefficient alpha estimates) varied across the studies. One option was to use the Spearman-Brown formula to bring all estimates to a common length. We reduced all interrater reliability estimates to that of one rater. In many organizations, there will almost never be more than one supervisor doing the rating, but there will almost never be an instrument with only one item (i.e., a single performance dimension rated). As such, we did not correct the coefficient alphas for the number of items. Furthermore, most rating instruments had enough items that the Spearman-Brown adjustments did not make a practical difference.
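As an illustration of the Spearman-Brown adjustments just described, the sketch below is our own (not code from the study): it reduces a reliability based on the average of three raters to a single-rater estimate, and, in the other direction, projects the reliability of averaging several raters. The .77 value is hypothetical; the .42 anticipates the mean single-peer interrater reliability reported later in the Results.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project reliability when the 'test length' (here, the number of
    raters averaged) is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Reduce a reliability computed on the mean of 3 raters to one rater:
r_three = 0.77                      # hypothetical 3-rater estimate
print(f"single rater: {spearman_brown(r_three, 1 / 3):.2f}")

# Project the reliability of averaging k peers from a single-peer value:
for k in (2, 3, 5):
    print(f"{k} raters: {spearman_brown(0.42, k):.2f}")
```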
For the coefficient of stability, without knowing the functional relationship between estimates of stability and the time interval between the measurements, corrections to bring estimates of stability to a common interval are impossible. All we can say, intuitively, is that as the time interval increases, the reliability estimate generally decreases; the function that captures this decrease is unknown. Jensen (1980), on the basis of fitting curves to empirical data, reported that the stability of IQ test scores is a function of the square root of the ratio of the chronological ages at the two points of measurement. Another possibility is to assume a function in which the reliability estimate falls as the time interval increases (at an infinite time interval, the reliability estimate will be zero, or an asymptote at zero). This is similar to Rothstein (1990), who presented empirical evidence that as the opportunity to observe (indexed by number of years supervised or observed) increases, interrater reliability increases but reaches an asymptotic maximum of .60. Lacking information on the functional relationship between reliability estimates and time intervals between measurements, we made no corrections to bring all estimates of coefficients of stability included in a meta-analysis to the same interval.

Note that in our intrarater reliability analyses, we were careful to include only the coefficients of stability that were based on ratings from the same rater. Rate-rerate correlations from different raters at two points in time are interrater reliability coefficients and will be lower than estimates in which the same rater provides the ratings at the two points in time and intrarater reliability is thus assessed (Cronbach, 1947).

A meta-analysis correcting only for sampling error was conducted for each of the 60 distributions for which there were at least four estimates to be cumulated. The sample size weighted mean, observed standard deviation, and residual standard deviation were computed for each distribution. We also computed the unweighted mean and standard deviation. The computations of the unweighted mean and standard deviation do not weight the reliability estimates by the sample size of each study contributing to the analysis; each reliability coefficient is equally weighted. The sample size weighted mean gives the best estimate of the mean reliability, whereas the unweighted mean ensures that our results are not skewed by a few large-sample estimates.
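The following sketch (ours, with made-up reliability values and sample sizes) shows the weighted and unweighted summary statistics just described; each study contributes a reliability estimate and a sample size.

```python
import numpy as np

# Hypothetical distribution: reliability r and sample size n per study
r = np.array([0.48, 0.55, 0.61, 0.39, 0.52])
n = np.array([120, 450, 60, 85, 1400])

mean_wt = np.average(r, weights=n)                    # sample size weighted mean
sd_wt = np.sqrt(np.average((r - mean_wt) ** 2, weights=n))
mean_unwt, sd_unwt = r.mean(), r.std(ddof=0)          # frequency weighted

print(f"weighted:   M = {mean_wt:.3f}, SD = {sd_wt:.3f}")
print(f"unweighted: M = {mean_unwt:.3f}, SD = {sd_unwt:.3f}")
```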
In addition, we computed the mean and standard deviation of the square root of the reliabilities. It is the square root of the reliability, rather than the reliability itself, that enters corrections for attenuation, and the mean of the square roots of a set of reliabilities differs slightly from the square root of their mean. Both sample size weighted and unweighted (i.e., frequency weighted) analyses were undertaken. Thus, for each of the 60 distributions, the objective was to estimate the mean and standard deviation of (a) the sample size weighted reliability estimates, (b) the unweighted (or frequency weighted) reliability estimates, (c) the sample size weighted square roots of the reliabilities, and (d) the unweighted (or frequency weighted) square roots of the reliabilities.

The sampling error variance associated with the mean of the reliabilities was estimated as the variance divided by the number of estimates averaged (Callender & Osburn, 1988). The sampling error of the mean was used to construct an 80% confidence interval around the mean. Assuming normality, 80% of points in the distribution fall within this interval; that is, the probability of obtaining a value higher than the upper bound of the interval is .10, and the probability of obtaining a value lower than the lower bound is .10.

For both interrater reliability and the coefficient of stability (rate-rerate reliability with the same rater), in addition to the confidence interval, the sampling error of the correlation was computed and credibility intervals were constructed. A residual standard deviation was computed as the square root of the difference between the observed variance and the sampling error variance of the correlation (i.e., the interrater reliability coefficient in the former case and the rate-rerate reliability coefficient in the latter case). Note, however, that the sampling error formula for coefficient alpha differs from those for interrater reliability coefficients and coefficients of stability (i.e., correlation coefficients). Given the mean and residual standard deviation, along with the normality assumption (assuming two-tailed tests), we can compute the estimated reliability below which the population reliability value is likely to fall with a 90% chance: M + 1.28(residual standard deviation). Though different (90%, 95%, etc.) credibility intervals (and upper bound values) can be constructed, we report only the 80% credibility interval for the sample size weighted mean reliability estimate for the reliability distributions. Interested readers can compute other credibility intervals (90%, 95%, etc.) on the basis of the mean reliability and the residual standard deviation.
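A compact sketch of the interval computations described above (our code, not the authors'). The mean of .52, observed SD of .0950, k = 40, and N = 14,650 are the supervisory overall-performance values reported in Table 2 below; using the mean per-study n in the correlation sampling-error formula is a simplifying assumption on our part, following Hunter and Schmidt (1990).

```python
import math

# Supervisory ratings of overall job performance (Table 2)
mean_r, sd_obs, k, total_n = 0.52, 0.0950, 40, 14650

# 80% confidence interval for the mean (sampling error of the mean)
se_mean = sd_obs / math.sqrt(k)
ci = (mean_r - 1.28 * se_mean, mean_r + 1.28 * se_mean)

# Residual SD: observed variance minus expected sampling error variance,
# with per-study sampling error variance of a correlation at the mean n
n_bar = total_n / k
var_error = (1 - mean_r**2) ** 2 / (n_bar - 1)
sd_res = math.sqrt(sd_obs**2 - var_error)
cred = (mean_r - 1.28 * sd_res, mean_r + 1.28 * sd_res)

print(f"80% CI:   {ci[0]:.2f}-{ci[1]:.2f}")      # ~.50-.54
print(f"SD(res):  {sd_res:.4f}")                 # ~.0870
print(f"80% cred: {cred[0]:.2f}-{cred[1]:.2f}")  # ~.41-.63
```

Run as written, this reproduces the confidence interval, residual standard deviation, and credibility interval reported for that row of Table 2.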

Table 1
Definitions of Job Performance Ratings

Overall job performance: Ratings on statements (or rankings of individuals on statements) referring to overall performance, overall effectiveness, overall job performance, overall work reputation, or the sum of all individual dimensions rated.

Job performance or productivity: Ratings of the quantity or volume of work produced. Ratings or rankings of individuals were based on productivity or sales; examples include ratings of the number of accounts opened by bank tellers and the number of transactions completed by sales clerks.

Quality: Measure of how well the job was done. Ratings of (or rankings of individuals on statements referring to) the quality of tasks completed, lack of errors, accuracy to specifications, thoroughness, and amount of wastage.

Leadership: Measure of the ability to inspire, to bring out extra performance in others, to motivate others to scale great heights, and professional stature; includes performance appraisal statements such as "gets subordinates to work efficiently," "stimulates subordinates effectively," and "maintains authority easily and comfortably."

Communication competence: Skill in gathering and transmitting information, in both oral and written format; the proficiency to express information, views, opinions, and positions in written or oral format. This refers to the ability to make oneself understood; includes performance appraisal statements such as "very good in making reports," "reports are clear," "reports are unambiguous," and "reports need no further clarification."

Administrative competence: Proficiency in handling the coordination of different roles in an organization. This refers to proficiency in organizing and scheduling work periods, administrative maintenance of records (note, however, that clarity of reports falls under communication competence above), ability to place and assign subordinates, and knowledge of the job duties and responsibilities of others.

Effort: Amount of work an individual expends in striving to do a good job. Initiative, attention to duty, alertness, resourcefulness, enthusiasm about work, industriousness, earnestness at work, persistence in seeking goals, dedication, personal involvement in the job, and effort and energy expended on the job characterize this dimension.

Interpersonal competence: Ability to work well with others. Ratings or rankings of individuals on cooperation with others, customer relations, working with co-workers, and acceptance by others, as well as nominations for "easy to get along with," are included in this dimension.

Job knowledge: Measure of the knowledge required to get the job done. Includes ratings or rankings of individuals on job knowledge and keeping up-to-date, as well as nominations of who knows the job best and who keeps up-to-date.

Compliance with or acceptance of authority: A generally positive perspective on rules and regulations; includes obeying rules, conforming to regulations in the workplace, having a positive attitude toward supervision, conforming to organizational norms and culture without incessant complaining about organizational policies, and following instructions.

Results

Tables 2-4 summarize the results of the meta-analyses. Interrater reliability estimates for supervisory ratings are summarized in Table 2. Interrater reliability estimates for peer ratings are in Table 3, and estimates of the coefficient of stability for supervisory ratings of overall job performance are also in Table 3. Estimates of coefficient alpha for supervisory and peer ratings are provided in Table 4. Note that not all 10 dimensions are present in every table: we do not present results of meta-analyses that were based on fewer than four reliability estimates.

In each table, Column 1 indicates the job performance dimension being meta-analyzed, Column 2 indicates the total sample size (the total number of individuals rated across the studies included in that meta-analysis), and Column 3 provides the number of independent estimates included in the meta-analysis. Columns 4 and 5 provide the sample size weighted mean and standard deviation of the values meta-analyzed. The unweighted (or frequency weighted) mean and standard deviation of the values meta-analyzed are in Columns 6 and 7, respectively. The sample size weighted mean and standard deviation of the square root of the reliabilities are in Columns 8 and 9, respectively. Finally, the unweighted (or frequency weighted) mean and standard deviation of the square root of the reliabilities are in Columns 10 and 11, respectively. Column 12 provides the 80% confidence interval based on the sample size weighted mean reliability values. Different intervals (e.g., 95%) can be constructed on the basis of the values reported in Columns 3, 4, and 5. Similarly, different intervals can be computed for (a) unweighted (or frequency weighted) reliability values, derived from data reported in Columns 3, 6, and 7; (b) sample size weighted square roots of the reliability estimates, derived from data presented in Columns 3, 8, and 9; and (c) unweighted (or frequency weighted) square roots of the reliabilities, derived from information provided in Columns 3, 10, and 11. For interrater reliability and the coefficient of stability, the residual standard deviations of the reliability distributions and the 80% credibility intervals are reported in Columns 13 and 14, respectively. The credibility interval refers to the entire distribution, not the mean value. Also, it refers to population values (the estimated distribution of population values), not observed values, which are affected by sampling error.

In discussing the results, we first compared the reliability of supervisory ratings of the different dimensions of rated performance for each type of reliability (e.g., interrater). Second, we focused on the same type of reliability based on peer ratings of the different dimensions. Third, we compared the reliability of peer and supervisory ratings. These three steps were repeated for each type of reliability: interrater, stability, and coefficient alpha. A final section discusses the relative influence of the different sources of error.

Table 2
Interrater Reliabilities of Supervisory Ratings of Job Performance

Dimension | N | k | M wt | SD wt | M unwt | SD unwt | M sqwt | SD sqwt | M squnwt | SD squnwt | 80% CI | SD res | 80% cred
Overall job performance | 14,650 | 40 | .52 | .0950 | .68 | .1469 | .72 | .0605 | .82 | .0924 | .50-.54 | .0870 | .41-.63
Productivity | 2,015 | 19 | .57 | .1540 | .57 | .1769 | .75 | .1079 | .75 | .1236 | .52-.62 | .1392 | .39-.75
Quality | 1,225 | 10 | .63 | .1191 | .65 | .1406 | .79 | .0756 | .80 | .0885 | .58-.68 | .1058 | .49-.77
Leadership | 2,171 | 20 | .53 | .0928 | .55 | .1124 | .73 | .0598 | .74 | .0742 | .50-.56 | .0617 | .45-.61
Communication competence | 1,563 | 9 | .45 | .1282 | .43 | .1824 | .66 | .1071 | .64 | .1568 | .40-.50 | .1129 | .31-.59
Administrative competence | 1,120 | 9 | .58 | .1040 | .59 | .1674 | .76 | .0659 | .76 | .1056 | .54-.62 | .0851 | .47-.69
Effort | 2,714 | 24 | .55 | .1250 | .56 | .1601 | .74 | .0858 | .74 | .1113 | .52-.58 | .1062 | .41-.69
Interpersonal competence | 3,006 | 31 | .47 | .1664 | .53 | .1983 | .68 | .1332 | .70 | .1711 | .43-.51 | .1461 | .28-.66
Job knowledge | 14,072 | 20 | .53 | .0508 | .56 | .1976 | .73 | .0392 | .73 | .2356 | .52-.54 | .0429 | .48-.58
Compliance with or acceptance of authority | 905 | 8 | .56 | .1276 | .60 | .1295 | .74 | .1548 | .77 | .0900 | .50-.62 | .1099 | .42-.70

Note. N = total sample size; k = number of reliabilities included in the meta-analysis; wt = sample size weighted; unwt = unweighted or frequency weighted; sqwt = square root of the estimates, weighted; squnwt = square root of the estimates, unweighted; CI = confidence interval; res = residual; cred = credibility interval.

Table 3
Interrater Reliabilities of Peer Ratings and Coefficients of Stability for Supervisory Ratings of Job Performance

Performance dimension | N | k | M wt | SD wt | M unwt | SD unwt | M sqwt | SD sqwt | M squnwt | SD squnwt | 80% CI | SD res | 80% cred

Peer ratings (interrater reliability)
Overall job performance | 2,389 | 9 | .42 | .1063 | .44 | .1615 | .64 | .0764 | .66 | .1141 | .37-.47 | .0935 | .30-.54
Productivity | 205 | 4 | .34 | .1419 | .30 | .1775 | .57 | .1414 | .52 | .1743 | .25-.43 | .0676 | .25-.43
Leadership | 434 | 4 | .38 | .1142 | .36 | .1595 | .61 | .0950 | .59 | .1387 | .31-.45 | .0789 | .28-.48
Effort | 348 | 7 | .42 | .2454 | .48 | .2452 | .61 | .2332 | .66 | .2243 | .30-.54 | .2152 | .14-.70
Interpersonal competence | 635 | 9 | .42 | .1451 | .50 | .1613 | .64 | .1095 | .70 | .1196 | .36-.48 | .1063 | .28-.56
Job knowledge | 249 | 4 | .33 | .1012 | .33 | .1268 | .57 | .0900 | .57 | .1151 | .27-.39 | .1000 | .20-.46
Compliance with or acceptance of authority | 220 | 5 | .71 | .0493 | .73 | .0740 | .84 | .0290 | .86 | .0437 | .68-.74 | .0467 | .65-.77

Supervisory ratings (coefficient of stability)
Overall job performance | 1,374 | 12 | .81 | .0895 | .84 | .0875 | .90 | .0685 | .91 | .0462 | .78-.84 | .0835 | .70-.92

Note. N = total sample size; k = number of reliabilities included in the meta-analysis; wt = sample size weighted; unwt = unweighted or frequency weighted; sqwt = square root of the estimates, weighted; squnwt = square root of the estimates, unweighted; CI = confidence interval; res = residual; cred = credibility interval.

Table 4
Coefficient Alpha Reliabilities of Supervisory and Peer Ratings of Job Performance (Intrarater Reliabilities)

Dimension | N | k | M wt | SD wt | M unwt | SD unwt | M sqwt | SD sqwt | M squnwt | SD squnwt | 80% CI

Supervisory ratings
Overall job performance | 17,899 | 89 | .86 | .1433 | .84 | .1510 | .92 | .0942 | .91 | .1089 | .84-.88
Productivity | 2,697 | 17 | .82 | .1248 | .85 | .1110 | .90 | .0711 | .92 | .0630 | .78-.86
Quality | 739 | 6 | .81 | .0752 | .81 | .0828 | .90 | .0413 | .90 | .0455 | .77-.85
Leadership | 3,821 | 21 | .77 | .1239 | .77 | .1315 | .87 | .0735 | .87 | .0768 | .74-.80
Communication competence | 943 | 8 | .73 | .1707 | .69 | .2091 | .85 | .1103 | .82 | .1359 | .65-.81
Administrative competence | 4,754 | 16 | .79 | .0544 | .79 | .0965 | .89 | .0305 | .89 | .0543 | .77-.81
Effort | 3,112 | 20 | .79 | .1147 | .75 | .1392 | .88 | .0678 | .86 | .0835 | .76-.82
Interpersonal competence | 10,955 | 56 | .77 | .1691 | .75 | .1902 | .87 | .1185 | .85 | .1327 | .74-.80
Job knowledge | 959 | 9 | .79 | .1077 | .77 | .1290 | .89 | .0645 | .87 | .0770 | .74-.84
Compliance with or acceptance of authority | 3,438 | 15 | .77 | .1194 | .76 | .1858 | .87 | .0790 | .86 | .1383 | .73-.81

Peer ratings
Overall job performance | 1,270 | 10 | .85 | .1193 | .81 | .1205 | .92 | .0664 | .90 | .0677 | .80-.90
Leadership | 1,082 | 5 | .61 | .1931 | .53 | .2934 | .76 | .1892 | .67 | .3014 | .50-.72
Effort | 1,205 | 7 | .77 | .2372 | .72 | .2526 | .86 | .1705 | .83 | .1852 | .66-.88
Interpersonal competence | 325 | 8 | .61 | .2067 | .61 | .2298 | .77 | .2054 | .76 | .1797 | .52-.70

Note. N = total sample size; k = number of reliabilities included in the meta-analysis; wt = sample size weighted; unwt = unweighted or frequency weighted; sqwt = square root of the estimates, weighted; squnwt = square root of the estimates, unweighted; CI = confidence interval.

Interrater Reliability

From the results reported in Table 2, the mean interrater reliability for supervisory ratings of overall job performance was .52 (k = 40, N = 14,650). The 80% credibility interval ranged from .41 to .63; that is, it is estimated that 90% of the population values of the interrater reliability of supervisory ratings of overall job performance are below .63. For supervisors, the mean sample size weighted interrater reliability across the nine specific job performance dimensions (excluding overall job performance) was .53. It appears that, for supervisors, the interrater reliability of overall job performance ratings is similar to the mean interrater reliability across the job performance dimensions. This is noteworthy because most interrater reliabilities for overall performance in our database were for sums of rated items across different job performance dimensions. Contrary to expectations, the higher intrarater reliability associated with longer rating forms (see also the section on coefficient alphas below) does not appear to improve interrater reliability in the job performance domain.

A second interesting point is that there is variation across the 10 dimensions in the mean interrater reliabilities for supervisory ratings. Although the credibility intervals for all 10 dimensions overlap, the 80% confidence intervals indicate that, for example, both communication competence and interpersonal competence are rated less reliably, on average, than productivity or quality. Thus, the hypothesis of Wohlers and London (1989) and Borman (1979) is partially supported.

Interrater reliabilities for peer ratings of 7 of the 10 dimensions are reported in Table 3 (there were fewer than four estimates for the other three dimensions). The estimates ranged from .34 for ratings of productivity (SD = .14) to .71 for ratings of compliance with authority (SD = .05). For peers, the sample size weighted mean interrater reliability across the six specific dimensions of job performance (i.e., excluding overall job performance) was .42. For ratings of overall job performance, the interrater reliability of peer ratings was also .42 (SD = .11). The 80% credibility interval for the interrater reliability of peer ratings of overall job performance ranged from .30 to .54; that is, 90% of the actual (population) values are estimated to be less than .54, and 90% of the values are estimated to be greater than .30. Similar to the results for supervisors, the interrater reliability of overall job performance ratings is the same as the sample size weighted mean interrater reliability across the individual job performance dimensions. Even though a large portion of the peer interrater reliabilities for overall performance in our database were computed in studies by summing items across different job performance dimensions, the higher intrarater reliability associated with longer rating forms (also see Coefficient Alphas: Measures of Intrarater Reliability below) does not appear to lead to higher peer interrater reliability. This mirrors the case for supervisors.

A comparison of the results reported in Tables 2 and 3 seems to indicate that there was generally more agreement between two supervisors than between two peers. However, caution is needed in drawing such a conclusion. First, the interrater reliability estimates for peer ratings were based on a small number of studies (as are those for some dimensions of supervisory ratings). Second, there is considerable overlap in the credibility intervals of the interrater reliability estimates of peer and supervisory ratings. Finally, two of the studies reporting interrater reliabilities of peer ratings (Borman, 1974; Hausman & Strupp, 1955) reported very low values; when these two studies were eliminated from the database as outliers, peers and supervisors had comparable levels of interrater agreement. However, consistent with the overall results of this meta-analysis, we should note that a recent large-sample primary study also reported lower interrater reliability estimates for peers compared with supervisors (Scullen, Mount, & Sytsma, 1995). Furthermore, in practice, given that peer ratings are often based on the average ratings of several peers, the averaged multiple peer ratings may be more reliable than the ratings from a single supervisor. The Spearman-Brown prophecy formula can be used to determine the number of peer raters required.

Coefficient of Stability

Compared with the number of studies reporting interrater reliabilities or coefficient alphas, very few studies reported coefficients of stability. This is consistent with the general predominance of cross-sectional over longitudinal studies among published journal articles. In fact, we were able to assess the coefficient of stability only for supervisory ratings of overall job performance. There were 12 reliabilities across 1,374 individuals contributing to this analysis. For supervisory ratings of overall job performance, the sample size weighted mean coefficient of stability was .81 (SD = .09). This analysis included only estimates in which the same rater provided the ratings at the two points in time.

Coefficient Alphas: Measures of Intrarater Reliability

Intrarater reliabilities assessed by coefficient alpha were also substantial. For supervisory ratings, overall job performance was most reliably rated (.86). The least reliably rated dimension was communication competence (.73). Although the alpha estimates for all dimensions as well as for overall ratings were higher than .70, it is important to note that these estimates are inclusive of, among other things, a halo component. Another observation is that for supervisory ratings of overall job performance, the coefficient of stability reported above and the coefficient alpha were similar in size (.81 and .86, respectively). This finding supports the inference that the variance due to transient errors (variance because of rater mental states or moods that vary over days) is small. These figures suggest that this source of measurement error variance in overall job performance ratings is only 5% of the total variance (.86 - .81 = .05).

In Table 4, it can be seen that peer ratings of overall job performance had a mean alpha of .85 (k = 10, N = 1,270). The intrarater reliabilities associated with peer ratings of leadership, effort, and interpersonal competence were above .60. As with interrater reliability, comparisons of alphas for peer and supervisory ratings should be tentative. When comparing peer and supervisory ratings, it appears that intrarater reliability is lower for peer than for supervisory ratings of specific job performance dimensions, but not for overall performance.

An interesting point for both peer and supervisory ratings is that the alphas were higher for overall job performance ratings than for any of the dimensional ratings. For supervisors, the coefficient alpha for overall job performance was .86, whereas the mean sample size weighted alpha across the specific job performance dimensions was .78. For peers, the coefficient alpha for overall job performance was .85, whereas the mean sample size weighted alpha across the specific job performance dimensions was .68. There are two potential explanations for this result. First, it could have been due to the greater length of the instrument used for measurement: in a large number of the studies we coded, overall job performance was measured by summing the various dimensions of job performance into a composite. However, we should point out that the relationship between the number of items and reliability is best described as concave: reliability increases rapidly at first as the number of items increases, but after some point the increase in reliability is very small. Most of the scales meta-analyzed in this article had enough items that further increases in length (and the application of the Spearman-Brown formula) did not make an appreciable difference. Note that this could indirectly explain our earlier finding that both supervisor and peer interrater reliabilities for specific dimensions of job performance are similar to the interrater reliabilities for overall job performance. The second potential explanation for the higher alphas for ratings of overall job performance stems from the broadness of the construct of overall job performance compared with any of the constructs represented by the individual dimensions: there is some evidence (at least in the personality domain) that broader constructs, or traits, are more reliably rated than narrowly defined ones (Ones & Viswesvaran, in press). Unfortunately, this meta-analytic investigation cannot determine which of the two potential explanations is correct.

Comparison of Different Types of Reliability Estimates

Conceptually, given that (a) a reliability coefficient is the ratio of true to observed variance and (b) observed variance is true plus error variance, all types of reliability estimates have the same denominator. Coefficient alpha (using a single rater) includes variance specific to the rater and variance due to transient error in the numerator. The coefficient of stability, or rate-rerate reliability with the same rater, includes variance specific to the rater in the numerator (assuming true performance did not change in the rate-rerate interval), but not transient variance. Thus, the difference between coefficient alpha and the coefficient of stability with the same rater gives an estimate of the transient error in that job performance dimension, as noted earlier. Interrater reliability includes neither variance specific to the rater nor transient error variance in the numerator. Therefore, the difference between the interrater reliability and the coefficient of stability provides an estimate of the variance due to rater idiosyncrasy.

For both peer and supervisory ratings, and for all dimensions and overall ratings, interrater reliability estimates are substantially lower than intrarater reliability estimates (coefficients of stability and coefficient alphas). For example, consider supervisory ratings of overall job performance: the mean interrater reliability estimate is .52, the mean coefficient of stability is .81 (based on ratings from the same rater at two points in time), and the mean coefficient alpha is .86. Approximately 29% of the variance (.81 - .52 = .29) in supervisory ratings of overall job performance appears to be due to rater idiosyncrasy, whereas 5% (i.e., [.86 - .81] X 100) of the variance is estimated to be from transient error, assuming true job performance is stable. Similar analyses can be done for other dimensions on the basis of the data reported in Tables 2-4 to compare the magnitude of the sources of error in ratings of different dimensions of job performance. Intrarater reliabilities for supervisory ratings of job performance dimensions are between .70 and .90, whereas the mean interrater reliabilities range approximately between .50 and .65. The difference between the intrarater and interrater reliability estimates we obtained indicates that 20% to 30% of the variance in job performance dimension ratings of the average rater is specific to the rater. Using coefficient alpha instead of the interrater reliability of job performance ratings to correct observed validities (say, in validating interviews) will underestimate the validity. Lacking empirically derived reliability distributions, like those yielded by this study, previous meta-analysts may have combined correct interrater and incorrect intrarater reliabilities. Future meta-analyses involving job performance ratings should use the appropriate reliability coefficients (Schmidt & Hunter, 1996) to obtain more precise estimates of correlations that could be used for theory testing.
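The variance bookkeeping in the preceding paragraphs can be written out directly. The sketch below is our own restatement using the article's overall-job-performance estimates for supervisors (.52, .81, .86), under the stated assumption that true performance is stable over the rate-rerate interval; the labels are interpretive, not the authors' code.

```python
# Proportions of observed rating variance, overall job performance (supervisory)
interrater = 0.52  # variance shared across raters (consensual true variance)
stability = 0.81   # interrater share plus rater-specific variance
alpha = 0.86       # stability share plus transient-error variance

rater_idiosyncrasy = stability - interrater       # 0.29
transient_error = alpha - stability               # 0.05
item_specific_and_random = 1 - alpha              # 0.14: error even by alpha

for label, value in [
    ("across-rater true variance", interrater),
    ("rater idiosyncrasy", rater_idiosyncrasy),
    ("transient error", transient_error),
    ("item-specific/random response", item_specific_and_random),
]:
    print(f"{label:32s} {value:.2f}")              # components sum to 1.00
```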
Discussion

Job performance measures play a crucial role in research and practice. Ratings (both peer and supervisory) are an important method of job performance measurement in organizations, and many decisions are made on the basis of ratings. As such, the reliability of ratings is an important concern in organizational science. Depending on the objective of the researcher, different reliability estimates need to be assessed. In personnel selection, the use of intrarater reliabilities to correct criterion-related validity coefficients for unreliability in job performance ratings may result in substantial downward biases in estimates of actual operational validity. This bias arises mostly from including rater-specific error variance (variance due to rater idiosyncrasies) as true job performance variance in computing intrarater reliability. On the other hand, what is needed to assess actual job performance and its dimensions is an answer to the question: Would the same ratings be obtained if a different but equally knowledgeable judge rated the same employees? This calls for an assessment of interrater reliability. This is why interrater reliability, not coefficient alpha or rate-rerate reliability with the same rater, is the appropriate coefficient for correcting for criterion unreliability in validation research.
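As a worked illustration of this point (our example: the observed validity of .25 is hypothetical, whereas .52 and .86 are the mean interrater and coefficient alpha estimates reported above), correcting the same observed validity with the two coefficients yields noticeably different operational validity estimates.

```python
import math

r_xy_observed = 0.25           # hypothetical observed test-criterion correlation
interrater, alpha = 0.52, 0.86

# Classical correction for attenuation due to criterion unreliability:
# divide by the square root of the criterion reliability.
corrected_appropriate = r_xy_observed / math.sqrt(interrater)
corrected_biased = r_xy_observed / math.sqrt(alpha)

print(f"corrected with interrater (.52): {corrected_appropriate:.2f}")  # ~.35
print(f"corrected with alpha (.86):      {corrected_biased:.2f}")       # ~.27
```

Because alpha is the larger coefficient, the second correction is smaller, understating the operational validity.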
This article quantitatively summarizes the available (administrative vs. research) moderated the validities of
evidence in the literature for use by researchers and prac- employment interviews. In this study, we examined three
titioners. A question for future research is whether in- moderators of job performance rating reliabilities: type
terrater reliability ratings of overall job performance can of reliability (interrater vs. intrarater), source of rating
be increased by obtaining dimensional ratings before ob- (peer vs. supervisors), and job performance-dimension
taining the overall ratings.4 (Note that a similar potential rated. We were not able to examine the moderating in-
does not exist for intrarater reliabilities.) It is possible fluences of administrative versus research-based ratings.
that when overall performance is rated after dimension This was primarily because, given the number of studies,
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

ratings are made, interrater reliabilities for overall ratings analysis of any other moderator in a fully hierarchical
This document is copyrighted by the American Psychological Association or one of its allied publishers.

are higher because all raters have a more similar frame- design (Hunter & Schmidt, 1990) would have resulted in
of-reference compared to when the overall performance too few studies for a robust meta-analysis. The concern
rating is made on its own or when overall ratings precede for sufficient data to detect moderators, coupled with the
dimensional ratings. Furthermore, the issue here is com- fact that previous meta-analyses (e.g., Peterson, 1994)
plicated by the fact that in many studies overall job per- that included alternate moderators did not find support
formance ratings are obtained by summing the dimen- for those alternate moderators, led us to focus only on
sional ratings, whereas in others overall ratings are ob- these three moderators (type of reliability, source of rat-
tained on a single item (or a few items) before or after ing, and rating content). However, future research should
dimensional ratings are provided. To the extent frame-of- examine the interaction of these three moderators with
reference effects were operating, the standard deviation other potential moderators, such as the purpose for
of the mean interrater reliability for overall ratings which the ratings were obtained (administrative vs.
should be higher than the standard deviation of mean di- research).
mensional ratings. That is, some studies would have ob- The results reported here can be used to construct re-
tained overall performance ratings prior to dimensional liability artifact distributions to be used in meta-analyses
ratings, and others would have obtained overall ratings (Hunter & Schmidt, 1990) when correcting for unreli-
after dimensional ratings. If the frame-of-reference hy- ability in the criterion ratings. For example, the report by
pothesis were correct, in a meta-analytic investigation a National Academy of Sciences (NAS) panel (Hartigan
this would have been detected as greater variance in the & Wigdor, 1989) evaluating the utility gains from validity
interrater reliability of overall job performance ratings. generalization (Hunter, 1983) maintained that the mean
Of course, the interrater reliability of dimensional ratings interrater reliability estimate of .60 used by Hunter
would not have this source of variance. Hence, the stan- (1983) was too small and that the interrater reliability of
dard deviation of the interrater reliability for overall rat- supervisory ratings of overall job performance is better
ings would be high compared with the standard deviation estimated as .80. The results reported here indicate that
of dimensional ratings. Our results indicate that this is the average interrater reliability of supervisory ratings of
not the case. However, given that the studies contributing job performance (cumulated across all studies available
to our overall job performance analyses were a mixture of in the literature) is .52. FurthermoVe, this value is similar
a sum of dimensional ratings and items directly assessing to that obtained by Rothstein (1990), although we
overall job performance, we cannot reach any definite should point out that a recent large-scale primary study
conclusions regarding the frame-of-reference effects. In (N = 2,249) obtained a lower value of .45 (Scullen et al.,
any event, this is an interesting hypothesis for future 1995). On the basis of our findings, we estimate that the
research. probability of interrater reliability of supervisory ratings
In cumulating results across studies, a concern exists of overall job performance being as high as .80 (as
whether moderating influences are obscured. The low claimed by the NAS panel) is only .0026. These findings
values of the standard deviations (compared with the indicate that the reliability estimate used by Hunter
means) mitigate this concern to some extent. Further- (1983) is, if anything, probably an overestimate of the
more, Churchill and Peter (1984) and Peterson (1994) reliability of supervisory ratings of overall job perfor-
examined as many as 13 moderators of reliability esti- mance. Thus, it appears that Schmidt, Ones, and Hunter
mates (e.g., whether the reliabilities were obtained for re- (1992) were correct in concluding that the NAS panel
search or administrative purposes). No substantial rela- underestimated the validity of the General Aptitude Test
tionships were found between any hypothesized modera-
4
tor and magnitude of reliability estimates. A potentially We thank an anonymous reviewer for suggesting this.
RELIABILITY OF RATINGS 567

An anonymous reviewer presented two concerns as fundamental questions that need to be addressed. First, the reviewer raised the question of whether reliability corrections should be undertaken when one does not have estimates from the same study in which the validity was estimated. Second, if the answer to the first question is affirmative, another question arises as to whether one should use the mean reliabilities reported in this article or some conservative value (e.g., the 80% upper bound values reported in this article).

There are two reasons for answering the first question in the affirmative. First, any bias introduced in the estimated true validity from using the reliability estimates reported in this article will be much less than the downward bias in validity estimates if no corrections were undertaken. That is, when reliability estimates from the sample are not available, the alternative is to make no corrections. Second, the meta-analytically obtained reliability estimates reported here may be more accurate than the sample-based estimates a primary researcher could obtain, given the major effect of sampling error on reliability estimates in single studies (which typically have small sample sizes). Using the meta-analytically obtained estimates reported here instead of the sample-based estimates may result in greater accuracy. The numerous simulation studies indicating the robustness of artifact distribution-based meta-analyses (cf. Hunter & Schmidt, 1990) support the conclusion that bias is lower when meta-analytically obtained means are used to correct for bias than if either (a) sample-based estimates are used in the corrections or (b) no corrections are made.
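The sampling-error argument can be made tangible with a small simulation (ours, for illustration; the study size of 50 is an assumed typical value, not a figure from our database): single-study interrater correlations scatter widely around a population reliability of .52, whereas their mean across many studies recovers it closely.

    import math
    import random

    random.seed(1)
    TRUE_RELIABILITY = 0.52   # population interrater correlation (from the article)
    STUDY_N = 50              # assumed typical single-study sample size
    N_STUDIES = 300

    def pearson_r(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    estimates = []
    for _ in range(N_STUDIES):
        rater1, rater2 = [], []
        for _ in range(STUDY_N):
            r1 = random.gauss(0, 1)
            # Second rating correlates TRUE_RELIABILITY with the first.
            r2 = (TRUE_RELIABILITY * r1
                  + math.sqrt(1 - TRUE_RELIABILITY ** 2) * random.gauss(0, 1))
            rater1.append(r1)
            rater2.append(r2)
        estimates.append(pearson_r(rater1, rater2))

    print(min(estimates), max(estimates))   # single studies range widely
    print(sum(estimates) / N_STUDIES)       # the mean recovers about .52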
The answer to the second question raised by the reviewer can also be framed in terms of bias in the estimated correlations. Using conservative values for reliability results in more bias than the use of the mean values. Many researchers maintain that being conservative is good science, but conservative estimates are by definition biased estimates. We believe it is more appropriate to aim for unbiased estimates because the research goal is to maximize the accuracy of the final estimates.
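A short worked example (ours) makes this bias concrete: correcting the same hypothetical observed validity of .25 with the mean reliability of .52 versus a "conservative" value of .80 lowers the corrected estimate from about .35 to about .28.

    import math

    observed_r = 0.25        # hypothetical observed validity
    mean_ryy = 0.52          # meta-analytic mean reliability (reported)
    conservative_ryy = 0.80  # a deliberately conservative (too-high) value

    unbiased = observed_r / math.sqrt(mean_ryy)               # about .35
    conservative = observed_r / math.sqrt(conservative_ryy)   # about .28

    # The conservative correction understates the estimated true validity by
    # roughly .07 here; conservatism is simply systematic downward bias.
    print(unbiased, conservative)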
Future meta-analytic research is needed to examine the reliability of criteria obtained from other sources, such as customer ratings, self-ratings, and subordinate ratings. In a large-scale primary study (N = 2,273), Scullen et al. (1995) reported that the interrater reliability of subordinate ratings is similar to that obtained for peers (ranging between .31 and .36) for various dimensions of job performance. We see the efforts of Scullen et al. (1995) as a valuable first step in reaching generalizable conclusions about the reliability of subordinate ratings. Future research is also needed to examine the process mechanisms (e.g., Campbell, 1990; DeNisi, Cafferty, & Meglino, 1984) by which the criterion data are gathered and thus improve the reliability of the obtained ratings.

There are several unique contributions of the present study. In particular, we want to clearly delineate how our study contributes over and beyond the Rothstein (1990) study, which was the largest scale primary study reported examining the interrater reliability of supervisory ratings. First, Rothstein (1990) focused only on interrater reliabilities. Here, we investigated interrater and intrarater reliabilities; we cumulated interrater reliabilities, coefficient alphas, and test-retest reliabilities. Second, Rothstein (1990) focused on overall job performance and did not examine the reliabilities of dimensions of the job performance construct. Given the theoretical arguments and hypothesized rating processes that posit different reliabilities for different dimensions, we examined the reliability of different dimensions of job performance as well as the reliability of overall job performance. Third, whereas the Rothstein (1990) study was based on a large sample, it was nevertheless a single primary study confined to one research corporation that markets the Supervisory Profile Record (see Rothstein, 1990). Finally, Rothstein (1990) focused on reliabilities of supervisory ratings only. We analyzed both supervisory and peer ratings, and we examined whether the reliabilities of peer and supervisory ratings are similar across job performance dimensions.

However, in contrast to our study, Rothstein (1990) was able to examine the effects of length of exposure on interrater reliability with her primary data. We were not able to test this effect, as most studies did not specify how long the raters were exposed to the ratees. (Of course, that was not the focus in many studies making up our database.) Future meta-analytic research should attempt to generalize the Rothstein (1990) findings with regard to length of exposure to other rating instruments.

The results of this article offer psychometric insights into the psychological and substantive characteristics of job performance measures. The construction of generalizable theories of job performance starts with an examination of the reliable measurement of job performance dimensions. Given that ratings (supervisory and peer) are used most frequently in the measurement of this central construct, it is crucial that researchers and managers be concerned about the reliability of these measurements. For research involving the construct of job performance, accurate construct measurement is predicated on reliable job performance measurement. For practice, accurate administrative decisions depend on the reliable measurement of job performance. It is our hope that the results presented here can be used to understand and improve job performance measurement in organizations.
References

The asterisk (*) indicates studies that were included in the meta-analysis.

*Albrecht, P. A., Glaser, E. M., & Marks, J. (1964). Validation of a multiple-assessment procedure for managerial personnel. Journal of Applied Psychology, 48, 351-360.
*Anderson, H. E., Jr., Roush, S. L., & McClary, J. E. (1973). Relationships among ratings, production, efficiency, and the general aptitude test battery scales in an industrial setting. Journal of Applied Psychology, 58, 77-82.
*Arvey, R. D., Landon, T. E., Nutting, S. M., & Maxwell, S. E. (1992). Development of physical ability tests for police officers: A construct validation approach. Journal of Applied Psychology, 77, 996-1009.
*Ashford, S. J., & Tsui, A. S. (1991). Self-regulation for managerial effectiveness: The role of active feedback seeking. Academy of Management Journal, 34, 251-280.
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917-1992. Journal of Applied Psychology, 77, 836-874.
*Baird, L. S. (1977). Self and superior ratings of performance: As related to self-esteem and satisfaction with supervision. Academy of Management Journal, 20, 291-300.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
*Barrick, M. R., Mount, M. K., & Strauss, J. P. (1993). Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. Journal of Applied Psychology, 78, 715-722.
*Bass, A. R., & Turner, J. N. (1973). Ethnic group differences in relationships among criteria of job performance. Journal of Applied Psychology, 57, 101-109.
*Becker, T. E., & Vance, R. J. (1993). Construct validity of three types of organizational citizenship behavior: An illustration of the direct product model with refinements. Journal of Management, 19, 663-682.
*Bernardin, H. J. (1987). Effect of reciprocal leniency on the relation between consideration scores from the leader behavior description questionnaire and performance ratings. Psychological Reports, 60, 479-487.
Bernardin, H. J., & Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent.
*Bhagat, R. S., & Allie, S. M. (1989). Organizational stress, personal life style, and symptoms of life strains: An examination of the moderating role of sense of competence. Journal of Vocational Behavior, 35, 231-253.
*Blank, W., Weitzel, J. R., & Green, S. G. (1990). A test of the situational leadership theory. Personnel Psychology, 43, 579-597.
*Blanz, F., & Ghiselli, E. E. (1972). The mixed standard scale: A new rating system. Personnel Psychology, 25, 185-199.
*Blau, G. (1986). The relationship of management level to effort level, direction of effort, and managerial performance. Journal of Vocational Behavior, 29, 226-239.
*Blau, G. (1988). An investigation of the apprenticeship organizational socialization strategy. Journal of Vocational Behavior, 32, 176-195.
*Blau, G. (1990). Exploring the mediating mechanisms affecting the relationship of recruitment source to employee performance. Journal of Vocational Behavior, 37, 303-320.
*Bledsoe, J. C. (1981). Factors related to academic and job performance of graduates of practical nursing programs. Psychological Reports, 49, 367-371.
Blum, M. L., & Naylor, J. C. (1968). Industrial psychology: Its theoretical and social foundations. New York: Harper & Row.
*Borman, W. C. (1974). The rating of individuals in organizations: An alternate approach. Organizational Behavior and Human Performance, 12, 105-124.
Borman, W. C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-412.
Borman, W. C., Hough, L. M., & Dunnette, M. D. (1976). Performance ratings: An investigation of reliability, accuracy, and relationship between individual differences and rater error. Minneapolis, MN: Personnel Decisions.
*Breaugh, J. A. (1981a). Predicting absenteeism from prior absenteeism and work attitudes. Journal of Applied Psychology, 66, 555-560.
*Breaugh, J. A. (1981b). Relationships between recruiting sources and employee performance, absenteeism, and work attitudes. Academy of Management Journal, 24, 142-147.
Brogden, H. E. (1946). An approach to the problem of differential prediction. Psychometrika, 11, 139-154.
*Buckner, D. N. (1959). The predictability of ratings as a function of interrater agreement. Journal of Applied Psychology, 43, 60-64.
*Buel, W. D., & Bachner, V. M. (1961). The assessment of creativity in a research setting. Journal of Applied Psychology, 45, 353-358.
*Bushe, G. R., & Gibbs, B. W. (1990). Predicting organization development consulting competence from the Myers-Briggs type indicator and state of ego development. Journal of Applied Behavioral Science, 26, 337-357.
*Butler, M. C., & Ehrlich, S. B. (1991). Positional influences on job satisfaction and job performance: A multivariate, predictive approach. Psychological Reports, 69, 855-865.
Callender, J. C., & Osburn, H. G. (1988). Unbiased estimation of the sampling variance of correlations. Journal of Applied Psychology, 73, 312-315.
*Campbell, C. H., Ford, P., Rumsey, M. G., Pulakos, E. D., Borman, W. C., Felker, D. B., De Vera, M. V., & Riegelhaupt, B. J. (1990). Development of multiple job performance measures in a representative sample of jobs. Personnel Psychology, 43, 277-300.
Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 687-732). Palo Alto, CA: Consulting Psychologists Press.
*Campbell, J. P., Dunnette, M. D., Arvey, R. D., & Hellervik, L. V. (1973). The development and evaluation of behaviorally based rating scales. Journal of Applied Psychology, 57, 15-22.
Campbell, J. P., Gasser, M. B., & Oswald, F. L. (1996). The substantive nature of job performance variability. In K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 258-299). San Francisco: Jossey-Bass.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 35-70). San Francisco: Jossey-Bass.
Cascio, W. F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.
*Cascio, W. F., & Valenzi, E. R. (1977). Behaviorally anchored rating scales: Effects of education and job experience of raters and ratees. Journal of Applied Psychology, 62, 278-282.
*Cascio, W. F., & Valenzi, E. R. (1978). Relations among criteria of police performance. Journal of Applied Psychology, 63, 22-28.
*Cheloha, R. S., & Farr, J. L. (1980). Absenteeism, job involvement, and job satisfaction in an organizational setting. Journal of Applied Psychology, 65, 467-473.
Christensen, L. (1974). The influence of trait, sex, and information accuracy of personality assessment. Journal of Personality Assessment, 38, 130-135.
Churchill, G. A., Jr., & Peter, J. P. (1984). Research design effects on the reliability of rating scales: A meta-analysis. Journal of Marketing Research, 21, 360-375.
*Cleveland, J. N., & Landy, F. J. (1981). The influence of rater and ratee age on two performance judgments. Personnel Psychology, 34, 19-29.
Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74, 130-135.
*Cleveland, J. N., & Shore, L. M. (1992). Self- and supervisory perspectives on age and work attitudes and performance. Journal of Applied Psychology, 77, 469-484.
*Colarelli, S. M., Dean, R. A., & Konstans, C. (1987). Comparative effects of personal and situational influences on job outcomes of new professionals. Journal of Applied Psychology, 72, 558-566.
*Cooper, R. (1966). Leader's task relevance and subordinate behaviour in industrial work groups. Human Relations, 19, 57-84.
*Cooper, R., & Payne, R. (1967). Extraversion and some aspects of work behavior. Personnel Psychology, 20, 45-57.
*Cortina, J. M., Doherty, M. L., Schmitt, N., Kaufman, G., & Smith, R. G. (1992). The "Big Five" personality factors in the IPI and MMPI: Predictors of police performance. Personnel Psychology, 45, 119-140.
*Cotton, J., & Stoltz, R. E. (1960). The general applicability of a scale for rating research productivity. Journal of Applied Psychology, 44, 276-277.
Cronbach, L. J. (1947). Test reliability: Its meaning and determination. Psychometrika, 12, 1-16.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
*David, F. R., Pearce, J. A., II, & Randolph, W. A. (1989). Linking technology and structure to enhance group performance. Journal of Applied Psychology, 74, 233-241.
*Day, D. W., & Silverman, S. B. (1989). Personality and job performance: Evidence of incremental validity. Personnel Psychology, 42, 25-36.
*Deadrick, D. L., & Madigan, R. M. (1990). Dynamic criteria revisited: A longitudinal study of performance stability and predictive validity. Personnel Psychology, 43, 717-744.
DeNisi, A. S., Cafferty, T. P., & Meglino, B. M. (1984). A cognitive view of the performance appraisal process: A model and research propositions. Organizational Behavior and Human Performance, 33, 360-396.
*Dicken, C. F., & Black, J. D. (1965). Predictive validity of psychometric evaluations of supervisors. Journal of Applied Psychology, 49, 34-47.
*Dickinson, T. L., & Tice, T. E. (1973). A multitrait-multimethod analysis of scales developed by retranslation. Organizational Behavior and Human Performance, 9, 421-438.
*Distefano, M. K., Jr., Pryer, M. W., & Erffmeyer, R. C. (1983). Application of content validity methods to the development of a job-related performance rating criterion. Personnel Psychology, 36, 621-631.
*Dreher, G. F. (1981). Predicting the salary satisfaction of exempt employees. Personnel Psychology, 34, 579-589.
*Dunegan, K. J., Duchon, D., & Uhl-Bien, M. (1992). Examining the link between leader-member exchange and subordinate performance: The role of task analyzability and variety as moderators. Journal of Management, 18, 59-76.
Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251-253.
*Edwards, P. K. (1979). Attachment to work and absence behavior. Human Relations, 32, 1065-1080.
*Ekpo-Ufot, A. (1979). Self-perceived task-relevant abilities, rated job performance, and complaining behavior of junior employees in a government ministry. Journal of Applied Psychology, 64, 429-434.
*Farh, J., Podsakoff, P. M., & Organ, D. W. (1990). Accounting for organizational citizenship behavior: Leader fairness and task scope versus satisfaction. Journal of Management, 16, 705-721.
*Farh, J., Werbel, J. D., & Bedeian, A. G. (1988). An empirical investigation of self-appraisal-based performance evaluation. Personnel Psychology, 41, 141-156.
*Farr, J. L., O'Leary, B. S., & Bartlett, C. J. (1971). Ethnic group membership as a moderator of the prediction of job performance. Personnel Psychology, 24, 609-636.
*Flanders, J. K. (1918). Mental tests of a group of employed men showing correlations with estimates furnished by employer. Journal of Applied Psychology, 2, 197-206.
*Gardner, D. G., Dunham, R. B., Cummings, L. L., & Pierce, J. L. (1989). Focus of attention at work: Construct definition and empirical validation. Journal of Occupational Psychology, 62, 61-77.
*Gerloff, E. A., Muir, N. K., & Bodensteiner, W. D. (1991). Three components of perceived environmental uncertainty: An exploratory analysis of the effects of aggregation. Journal of Management, 17, 749-768.
*Ghiselli, E. E. (1942). The use of the Strong Vocational Interest Blank and the Pressey Senior Classification Test in the selection of casualty insurance agents. Journal of Applied Psychology, 26, 793-799.
*Gough, H. G., Bradley, P., & McDonald, J. S. (1991). Performance of residents in anesthesiology as related to measures of personality and interests. Psychological Reports, 68, 979-994.
*Graen, G., Dansereau, F., Jr., & Minami, T. (1972). An empirical test of the man-in-the-middle hypothesis among executives in a hierarchical organization employing a unit-set analysis. Organizational Behavior and Human Performance, 8, 262-285.
*Graen, G., Novak, M. A., & Sommerkamp, P. (1982). The effects of leader-member exchange and job design on productivity and satisfaction: Testing a dual attachment model. Organizational Behavior and Human Performance, 30, 109-131.
*Green, S. B., & Stutzman, T. (1986). An evaluation of methods to select respondents to structured job-analysis questionnaires. Personnel Psychology, 39, 543-564.
*Greenhaus, J. H., Bedeian, A. G., & Mossholder, K. W. (1987). Work experiences, job performance, and feelings of personal and family well-being. Journal of Vocational Behavior, 31, 200-215.
*Griffin, R. W. (1991). Effects of work redesign on employee perceptions, attitudes, and behaviors: A long-term investigation. Academy of Management Journal, 34, 425-435.
*Guion, R. M. (1965). Synthetic validity in a small company: A demonstration. Personnel Psychology, 18, 49-63.
*Gunderson, E. K. E., & Nelson, P. D. (1966). Criterion measures for extremely isolated groups. Personnel Psychology, 19, 67-80.
*Gunderson, E. K. E., & Ryman, D. H. (1971). Convergent and discriminant validities of performance evaluations in extremely isolated groups. Personnel Psychology, 24, 715-724.
*Hackman, J. R., & Lawler, E. E., III. (1971). Employee reactions to job characteristics [Monograph]. Journal of Applied Psychology, 55, 259-286.
*Hackman, J. R., & Porter, L. W. (1968). Expectancy theory predictions of work effectiveness. Organizational Behavior and Human Performance, 3, 417-426.
Hartigan, J. A., & Wigdor, A. K. (Eds.). (1989). Fairness in employee testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington, DC: National Academy Press.
*Hatcher, L., Ross, T. L., & Collins, D. (1989). Prosocial behavior, job complexity, and suggestion contribution under gainsharing plans. Journal of Applied Behavioral Science, 25, 231-248.
*Hater, J. J., & Bass, B. M. (1988). Superiors' evaluations and subordinates' perceptions of transformational and transactional leadership. Journal of Applied Psychology, 73, 695-702.
*Hausman, H. J., & Strupp, H. H. (1955). Non-technical factors in supervisors' ratings of job performance. Personnel Psychology, 8, 201-217.
*Heneman, H. G., III. (1974). Comparisons of self- and superior ratings of managerial performance. Journal of Applied Psychology, 59, 638-642.
*Heron, A. (1954). Satisfaction and satisfactoriness: Complementary aspects of occupational adjustment. Occupational Psychology, 28, 140-153.
*Hilton, A. C., Bolin, S. R., Parker, J. W., Jr., Taylor, E. K., & Walker, W. B. (1955). The validity of personnel assessments by professional psychologists. Journal of Applied Psychology, 39, 287-293.
*Hoffman, C. C., Nathan, B. R., & Holden, L. M. (1991). A comparison of validation criteria: Objective versus subjective performance measures and self- versus supervisor ratings. Personnel Psychology, 44, 601-619.
*Hogan, J., Hogan, R., & Busch, C. M. (1984). How to measure service orientation. Journal of Applied Psychology, 69, 167-173.
*Hough, L. M. (1984). Development and evaluation of the "accomplishment record" method of selecting and promoting professionals. Journal of Applied Psychology, 69, 135-146.
*Huck, J. R., & Bray, D. W. (1976). Management assessment center evaluations and subsequent job performance of white and black females. Personnel Psychology, 29, 13-30.
*Hughes, G. L., & Prien, E. P. (1986). An evaluation of alternate scoring methods for the mixed standard scale. Personnel Psychology, 39, 839-847.
Hunter, J. E. (1983). Test validation for 12,000 jobs: An application of job classification and validity generalization to General Aptitude Test Battery (U.S. Employment Service Test Research Report No. 45). Washington, DC: U.S. Department of Labor.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting for error and bias in research findings. Newbury Park, CA: Sage.
*Ivancevich, J. M. (1980). A longitudinal study of behavioral expectation scales: Attitudes and performance. Journal of Applied Psychology, 65, 139-146.
*Ivancevich, J. M. (1983). Contrast effects in performance evaluation and reward practices. Academy of Management Journal, 26, 465-476.
*Ivancevich, J. M. (1985). Predicting absenteeism from prior absence and work attitudes. Academy of Management Journal, 28, 219-228.
*Ivancevich, J. M., & McMahon, J. T. (1977). Black-white differences in a goal-setting program. Organizational Behavior and Human Performance, 20, 287-300.
*Ivancevich, J. M., & McMahon, T. J. (1982). The effects of goal setting, external feedback, and self-generated feedback on outcome variables: A field experiment. Academy of Management Journal, 25, 359-372.
*Ivancevich, J. M., & Smith, S. V. (1981). Goal setting interview skills training: Simulated and on-the-job analyses. Journal of Applied Psychology, 66, 697-705.
*Ivancevich, J. M., & Smith, S. V. (1982). Job difficulty as interpreted by incumbents: A study of nurses and engineers. Human Relations, 35, 391-412.
*Jamal, M. (1984). Job stress and job performance controversy: An empirical assessment. Organizational Behavior and Human Performance, 33, 1-21.
*James, L. R., & Ellison, R. L. (1973). Creation composites for scientific creativity. Personnel Psychology, 26, 147-161.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
*Johnson, J. A., & Hogan, R. (1981). Vocational interests, personality and effective police performance. Personnel Psychology, 34, 49-53.
*Jones, J. W., & Terris, W. (1983). Predicting employees' theft in home improvement centers. Psychological Reports, 52, 187-201.
*Jordan, J. L. (1989). Effects of race on interrater reliability of peer ratings. Psychological Reports, 64, 1221-1222.
*Jurgensen, C. E. (1950). Intercorrelations in merit rating traits. Journal of Applied Psychology, 34, 240-243.
*Keller, R. T. (1984). The role of performance and absenteeism in the prediction of turnover. Academy of Management Journal, 27, 176-183.
*King, L. M., Hunter, J. E., & Schmidt, F. L. (1980). Halo in a multidimensional forced-choice performance evaluation scale. Journal of Applied Psychology, 65, 507-516.
*Klaas, B. S. (1989). Managerial decision making about employee grievances: The impact of the grievant's work history. Personnel Psychology, 42, 53-68.
*Klaas, B. S., & DeNisi, A. S. (1989). Managerial reactions to employee dissent: The impact of grievance activity on performance ratings. Academy of Management Journal, 32, 705-717.
*Klimoski, R. J., & Hayes, N. J. (1980). Leader behavior and subordinate motivation. Personnel Psychology, 33, 543-555.
*Knauft, E. B. (1949). A selection battery for bake shop managers. Journal of Applied Psychology, 33, 304-315.
*Kubany, A. J. (1957). Use of sociometric peer nominations in medical education research. Journal of Applied Psychology, 41, 389-394.
*Landy, F. J., & Guion, R. M. (1970). Development of scales for the measurement of work motivation. Organizational Behavior and Human Performance, 5, 93-103.
*Latham, G. P., Fay, C. H., & Saari, L. M. (1979). The development of behavioral observation scales for appraising the performance of foremen. Personnel Psychology, 32, 299-311.
*Latham, G. P., & Wexley, K. N. (1977). Behavioral observation scales for performance appraisal purposes. Personnel Psychology, 30, 255-268.
*Lawshe, C. H., & McGinley, A. D., Jr. (1951). Job performance criteria studies: I. The job performance of proofreaders. Journal of Applied Psychology, 35, 316-320.
*Lee, R., Malone, M., & Greco, S. (1981). Multitrait-multimethod-multirater analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, 66, 625-632.
*Love, K. G. (1981). Comparison of peer assessment methods: Reliability, validity, friendship bias, and user reaction. Journal of Applied Psychology, 66, 451-457.
*Love, K. G., & O'Hara, K. (1987). Predicting job performance of youth trainees under a Job Training Partnership Act program (JTPA): Criterion validation of a behavior-based measure of work maturity. Personnel Psychology, 40, 323-340.
*MacKenzie, S. B., Podsakoff, P. M., & Fetter, R. (1991). Organizational citizenship behavior and objective productivity as determinants of managerial evaluations of salespersons' performance. Organizational Behavior and Human Decision Processes, 50, 123-150.
*Matteson, M. T., Ivancevich, J. M., & Smith, S. V. (1984). Relation of Type A behavior to performance and satisfaction among sales personnel. Journal of Vocational Behavior, 25, 203-214.
*Mayfield, E. C. (1970). Management selection: Buddy nominations revisited. Personnel Psychology, 23, 377-391.
*McCarrey, M. W., & Edwards, S. A. (1973). Organizational climate conditions for effective research scientist role performance. Organizational Behavior and Human Performance, 9, 439-459.
*McCauley, C. D., Lombardo, M. M., & Usher, C. J. (1989). Diagnosing management development needs: An instrument based on how managers develop. Journal of Management, 15, 389-403.
McCloy, R. A., Campbell, J. P., & Cudeck, R. (1994). A confirmatory test of a model of performance determinants. Journal of Applied Psychology, 79, 493-505.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599-616.
*McEvoy, G. M., & Beatty, R. W. (1989). Assessment centers and subordinate appraisals of managers: A seven-year examination of predictive validity. Personnel Psychology, 42, 37-52.
*Meglino, B. M., Ravlin, E. C., & Adkins, C. L. (1989). A work values approach to corporate culture: A field test of the value congruence process and its relationship to individual outcomes. Journal of Applied Psychology, 74, 424-432.
*Meredith, G. M. (1990). Dossier evaluation in screening candidates for excellence in teaching awards. Psychological Reports, 67, 879-882.
*Meyer, J. P., Paunonen, S. V., Gellatly, I. R., Goffin, R. D., & Jackson, D. N. (1989). Organizational commitment and job performance: It's the nature of the commitment that counts. Journal of Applied Psychology, 74, 152-156.
*Miner, J. B. (1970). Executive and personnel interviews as predictors of consulting success. Personnel Psychology, 23, 521-538.
*Miner, J. B. (1970). Psychological evaluations as predictors of consulting success. Personnel Psychology, 23, 393-405.
*Mitchell, T. R., & Albright, D. W. (1972). Expectancy theory predictions of the satisfaction, effort, performance, and retention of naval aviation officers. Organizational Behavior and Human Decision Processes, 8, 1-20.
*Morgan, R. B. (1993). Self- and co-worker perceptions of ethics and their relationships to leadership and salary. Academy of Management Journal, 36, 200-214.
*Morse, J. J., & Wagner, F. R. (1978). Measuring the process of managerial effectiveness. Academy of Management Journal, 21, 23-35.
*Mossholder, K. W., Bedeian, A. G., Norris, D. R., Giles, W. F., & Feild, H. S. (1988). Job performance and turnover decisions: Two field studies. Journal of Management, 14, 403-414.
*Motowidlo, S. J. (1982). Relationship between self-rated performance and pay satisfaction among sales representatives. Journal of Applied Psychology, 67, 209-213.
*Mount, M. K. (1984). Psychometric properties of subordinate ratings of managerial performance. Personnel Psychology, 37, 687-702.
*Nathan, B. R., Morman, A. M., Jr., & Milliman, J. (1991). Interpersonal relations as a context for the effects of appraisal interviews on performance and satisfaction: A longitudinal study. Academy of Management Journal, 34, 352-369.
*Nealey, S. M., & Owen, T. W. (1970). A multitrait-multimethod analysis of predictors and criteria of nursing performance. Organizational Behavior and Human Performance, 5, 348-365.
*Niehoff, B. P., & Moorman, R. H. (1993). Justice as a mediator of the relationship between methods of monitoring and organizational citizenship behavior. Academy of Management Journal, 36, 527-556.
*Noe, R. A., & Schmitt, N. (1986). The influence of trainee attitudes on training effectiveness: Test of a model. Personnel Psychology, 39, 497-523.
*Norris, D. R., & Niebuhr, R. E. (1984). Organization tenure as a moderator of the job satisfaction-job performance relationship. Journal of Vocational Behavior, 24, 169-178.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
*O'Connor, E. J., Peters, L. H., Pooyan, A., Weekley, J., Frank, B., & Erenkrantz, B. (1984). Situational constraint effects on performance, affective reactions, and turnover: A field replication and extension. Journal of Applied Psychology, 69, 663-672.
*Oldham, G. R. (1976). The motivational strategies used by supervisors: Relationships to effectiveness indicators. Organizational Behavior and Human Performance, 15, 66-86.
Ones, D. S., & Viswesvaran, C. (in press). Bandwidth-fidelity dilemma in personality measurement for personnel selection. Journal of Organizational Behavior.
*Organ, D. W., & Konovsky, M. (1989). Cognitive versus affective determinants of organizational citizenship behavior. Journal of Applied Psychology, 74, 157-164.
*Otten, M. W., & Kahn, M. (1975). Effectiveness of crisis center volunteers and the Personal Orientation Inventory. Psychological Reports, 37, 1107-1111.
*Parker, J. W., Taylor, E. K., Barrett, R. S., & Martens, L. (1959). Rating scale content: III. Relationship between supervisory- and self-ratings. Personnel Psychology, 12, 49-63.
*Parsons, C. K., Herold, D. M., & Leatherwood, M. L. (1985). Turnover during initial employment: A longitudinal study of the role of causal attributions. Journal of Applied Psychology, 70, 337-341.
*Penley, L. E., & Hawkins, B. L. (1980). Organizational communication, performance, and job satisfaction as a function of ethnicity and sex. Journal of Vocational Behavior, 16, 368-384.
Peterson, R. A. (1994). A meta-analysis of Cronbach's coefficient alpha. Journal of Consumer Research, 21, 381-391.
*Podsakoff, P. M., Niehoff, B. P., MacKenzie, S. B., & Williams, M. L. (1993). Do substitutes for leadership really substitute for leadership? An empirical examination of Kerr and Jermier's situational leadership model. Organizational Behavior and Human Decision Processes, 54, 1-44.
*Podsakoff, P. M., Todor, W. D., & Skov, R. (1982). Effects of leader contingent and noncontingent reward and punishment behaviors on subordinate performance and satisfaction. Academy of Management Journal, 25, 810-821.
*Prien, E. P., & Kult, M. (1968). Analysis of performance criteria and comparison of a priori and empirically-derived keys for a forced-choice scoring. Personnel Psychology, 21, 505-513.
*Prien, E. P., & Liske, R. E. (1962). Assessments of higher level personnel: III. Rating criteria: A comparative analysis of supervisor ratings and incumbent self-ratings of job performance. Personnel Psychology, 15, 187-194.
*Puffer, S. M. (1987). Prosocial behavior, noncompliant behavior, and work performance among commission salespeople. Journal of Applied Psychology, 72, 615-621.
*Pulakos, E. D., Borman, W. C., & Hough, L. M. (1988). Test validation for scientific understanding: Two demonstrations of an approach to studying predictor-criterion linkages. Personnel Psychology, 41, 703-716.
*Pulakos, E. D., & Wexley, K. N. (1983). The relationship among perceptual similarity, sex, and performance ratings in manager-subordinate dyads. Academy of Management Journal, 26, 129-139.
*Pym, D. L. A., & Auld, H. D. (1965). The self-rating as a measure of employee satisfactoriness. Occupational Psychology, 39, 103-113.
*Rabinowitz, S., & Stumpf, S. A. (1987). Facets of role conflict, role-specific performance, and organizational level within the academic career. Journal of Vocational Behavior, 30, 72-83.
*Ronan, W. W. (1963). A factor analysis of eleven job performance measures. Personnel Psychology, 16, 255-267.
*Rosinger, G., Myers, L. B., Levy, G. W., Loar, M., Mohrman, S. A., & Stock, J. R. (1982). Development of a behaviorally based performance appraisal system. Personnel Psychology, 35, 75-88.
*Ross, P. F., & Dunfield, N. M. (1964). Selecting salesmen for an oil company. Personnel Psychology, 17, 75-84.
*Rosse, J. G. (1987). Job-related ability and turnover. Journal of Business and Psychology, 1, 326-336.
*Rosse, J. G., & Kraut, A. I. (1983). Reconsidering the vertical dyad linkage model of leadership. Journal of Occupational Psychology, 56, 63-71.
*Rosse, J. G., Miller, H. E., & Barnes, L. K. (1991). Combining personality and cognitive ability predictors for hiring service-oriented employees. Journal of Business and Psychology, 5, 431-445.
*Rothstein, H. R. (1990). Interrater reliability of job performance ratings: Growth to asymptote level with increasing opportunity to observe. Journal of Applied Psychology, 75, 322-327.
*Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.
*Rousseau, D. M. (1978). Relationship of work to nonwork. Journal of Applied Psychology, 63, 513-517.
*Rush, C. H., Jr. (1953). A factorial study of sales criteria. Personnel Psychology, 6, 9-24.
*Russell, C. J. (1990). Selecting top corporate leaders: An example of biographical information. Journal of Management, 16, 73-86.
*Russell, C. J., Mattson, J., Devlin, S. E., & Atwater, D. (1990). Predictive validity of biodata items generated from retrospective life experience essays. Journal of Applied Psychology, 75, 569-580.
*Sackett, P. R., Zedeck, S., & Fogli, L. (1988). Relations between measures of typical and maximum job performance. Journal of Applied Psychology, 73, 482-486.
Salancik, G. R., & Pfeffer, J. (1978). A social information processing approach to job attitudes and task design. Administrative Science Quarterly, 23, 224-253.
*Schaubroeck, J., Ganster, D. C., Sime, W. E., & Ditman, D. (1993). A field experiment testing supervisory role clarification. Personnel Psychology, 46, 1-25.
*Schippmann, J. S., & Prien, E. P. (1986). Psychometric evaluation of an integrated assessment procedure. Psychological Reports, 59, 111-122.
Schmidt, F. L., & Hunter, J. E. (1992). Development of a causal model of processes determining job performance. Current Directions in Psychological Science, 1, 89-92.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199-223.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual review of psychology (pp. 627-670). Palo Alto, CA: Annual Reviews.
*Schuerger, J. M., Kochevar, K. F., & Reinwald, J. E. (1982). Male and female correction officers: Personality and rated performance. Psychological Reports, 51, 223-228.
Scullen, S. E., Mount, M. K., & Sytysma, M. R. (1995). Comparison of self, peer, direct report and boss ratings of managers' performance. Unpublished manuscript.
*Seybolt, J. W., & Pavett, C. M. (1979). The prediction of effort and performance among hospital professionals: Moderating effects of feedback on expectancy theory formulations. Journal of Occupational Psychology, 52, 91-105.
*Siegel, A. I., Schultz, D. G., Fischl, M. A., & Lanterman, R. S. (1968). Absolute scaling of job performance. Journal of Applied Psychology, 52, 313-318.
*Siegel, L. (1982). Paired comparison evaluations of managerial effectiveness by peers and supervisors. Personnel Psychology, 35, 843-852.
*Slocum, J. W., Jr., & Cron, W. L. (1985). Job attitudes and performance during three career stages. Journal of Vocational Behavior, 26, 126-145.
*Smircich, L., & Chesser, R. J. (1981). Superiors' and subordinates' perceptions of performance: Beyond disagreement. Academy of Management Journal, 24, 198-205.
*Sneath, F. A., White, G. C., & Randell, G. A. (1966). Validating a workshop reporting procedure. Occupational Psychology, 40, 15-29.
*Soar, R. S. (1956). Personal history as a predictor of success in service station management. Journal of Applied Psychology, 40, 383-385.
*South, J. C. (1974). Early career performance of engineers—Its composition and measurement. Personnel Psychology, 27, 225-243.
*Spector, P. E., Dwyer, D. J., & Jex, S. M. (1988). Relation of job stressors to affective, health, and performance outcomes: A comparison of multiple data sources. Journal of Applied Psychology, 73, 11-19.
*Spencer, D. G., & Steers, R. M. (1981). Performance as a moderator of the job satisfaction-turnover relationship. Journal of Applied Psychology, 66, 511-514.
*Spitzer, M. E., & McNamara, W. J. (1964). A managerial selection study. Personnel Psychology, 17, 19-40.
*Sprecher, T. B. (1959). A study of engineers' criteria for creativity. Journal of Applied Psychology, 43, 141-148.
*Springer, D. (1953). Ratings of candidates for promotion by co-workers and supervisors. Journal of Applied Psychology, 37, 347-351.
*Steel, R. P., & Mento, A. J. (1986). Impact of situational constraints on subjective and objective criteria of managerial job performance. Organizational Behavior and Human Decision Processes, 37, 254-265.
*Steel, R. P., Mento, A. J., & Hendrix, W. H. (1987). Constraining forces and the work performance of finance company cashiers. Journal of Management, 13, 473-482.
*Steel, R. P., & Ovalle, N. K. (1984). Self-appraisal based upon supervisory feedback. Personnel Psychology, 37, 667-685.
*Steel, R. P., Shane, G. S., & Kennedy, K. A. (1990). Effects of social-system factors on absenteeism, turnover, and job performance. Journal of Business and Psychology, 4, 423-430.
*Stout, S. K., Slocum, J. W., Jr., & Cron, W. L. (1987). Career transitions of superiors and subordinates. Journal of Vocational Behavior, 30, 124-137.
Stuit, D. B., & Wilson, J. T. (1946). The effect of an increasingly well defined criterion on the prediction of success at naval training school (tactical radar). Journal of Applied Psychology, 30, 614-623.
*Stumpf, S. A. (1981). Career roles, psychological success, and job attitudes. Journal of Vocational Behavior, 19, 98-112.
*Stumpf, S. A., & Rabinowitz, S. (1981). Career stage as a moderator of performance relationships with facets of job satisfaction and role perceptions. Journal of Vocational Behavior, 18, 202-218.
*Sulkin, H. A., & Pranis, R. W. (1967). Comparison of grievants with non-grievants in a heavy machinery company. Personnel Psychology, 20, 111-119.
*Swaroff, P. G., Barclay, L. A., & Bass, A. R. (1985). Recruiting sources: Another look. Journal of Applied Psychology, 70, 720-728.
*Szilagyi, A. D. (1980). Causal inferences between leader reward behaviour and subordinate performance, absenteeism, and work satisfaction. Journal of Occupational Psychology, 53, 195-204.
*Taylor, E. K., Schneider, D. E., & Symons, N. A. (1953). A short forced-choice evaluation form for salesmen. Personnel Psychology, 6, 393-401.
Taylor, R. L., & Wilsted, W. D. (1974). Capturing judgment policies: A field study of performance appraisal. Academy of Management Journal, 17, 440-449.
Taylor, R. L., & Wilsted, W. D. (1976). Capturing judgment policies in performance rating. Industrial Relations, 15, 216-224.
Taylor, S. M., & Schmidt, D. W. (1983). A process-oriented investigation of recruitment source effectiveness. Personnel Psychology, 36, 343-354.
Tenopyr, M. L. (1969). The comparative validity of selected leadership scales relative to success in production management. Personnel Psychology, 22, 77-85.
Thompson, D. E., & Thompson, T. A. (1985). Task-based performance appraisal for blue-collar jobs: Evaluation of race and sex effects. Journal of Applied Psychology, 70, 747-753.
*Thomson, H. A. (1970). Comparison of predictor and criterion judgments of managerial performance using the multitrait-multimethod approach. Journal of Applied Psychology, 54, 496-502.
Toops, H. A. (1944). The criterion. Educational and Psychological Measurement, 4, 271-297.
Tsui, A. S., & Ohlott, P. (1988). Multiple assessment of managerial effectiveness: Interrater agreement and consensus in effectiveness models. Personnel Psychology, 41, 779-803.
Tucker, M. F., Cline, V. B., & Schmitt, J. R. (1967). Prediction of creativity and other performance measures from biographical information among pharmaceutical scientists. Journal of Applied Psychology, 51, 131-138.
Turner, W. W. (1960). Dimensions of foreman performance: A factor analysis of criterion measures. Journal of Applied Psychology, 44, 216-223.
*Validity information exchange. (1954). No. 7-045. Personnel Psychology, 7, 279.
*Validity information exchange. (1954). No. 7-089. Personnel Psychology, 7, 565-566.
*Validity information exchange. (1956). No. 9-32. Personnel Psychology, 9, 375-377.
*Validity information exchange. (1958). No. 11-10. Personnel Psychology, 11, 121-123.
*Validity information exchange. (1958). No. 11-27. Personnel Psychology, 11, 583-584.
*Validity information exchange. (1958). No. 11-30. Personnel Psychology, 11, 587-590.
*Validity information exchange. (1960). No. 13-03. Personnel Psychology, 13, 449-450.
*Validity information exchange. (1963). No. 16-04. Personnel Psychology, 16, 181-183.
*Validity information exchange. (1963). No. 16-05. Personnel Psychology, 16, 283-288.
*Vecchio, R. P. (1987). Situational leadership theory: An examination of a prescriptive theory. Journal of Applied Psychology, 72, 444-451.
*Vecchio, R. P., & Gobdel, B. C. (1984). The vertical dyad linkage model of leadership: Problems and prospects. Organizational Behavior and Human Performance, 34, 5-20.
*Villanova, P., & Bernardin, J. H. (1990). Work behavior correlates of interviewer job compatibility. Journal of Business and Psychology, 5, 179-195.
Viswesvaran, C. (1993). Modeling job performance: Is there a general factor? Unpublished doctoral dissertation, University of Iowa, Iowa City.
Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865-887.
*Waldman, D. A., Yammarino, F. J., & Avolio, B. J. (1990). A multiple level investigation of personnel ratings. Personnel Psychology, 43, 811-835.
*Wanous, J. P., Stumpf, S. A., & Bedrosian, H. (1979). Job survival of new employees. Personnel Psychology, 32, 651-662.
*Wayne, S. J., & Ferris, G. R. (1990). Influence tactics, affect, and exchange quality in supervisor-subordinate interactions: A laboratory experiment and field study. Journal of Applied Psychology, 75, 487-499.
*Wernimont, P. F., & Kirchner, W. K. (1972). Practical problems in the revalidation of tests. Occupational Psychology, 46, 25-30.
*Wexley, K. N., Alexander, R. A., Greenawalt, J. P., & Couch, M. A. (1980). Attitudinal congruence and similarity as related to interpersonal evaluations in manager-subordinate dyads. Academy of Management Journal, 23, 320-330.
*Wexley, K. N., & Pulakos, E. D. (1982). Sex effects on performance ratings in manager-subordinate dyads: A field study. Journal of Applied Psychology, 67, 433-439.
*Wexley, K. N., & Youtz, M. A. (1985). Rater beliefs about others: Their effects on rating errors and rater accuracy. Journal of Occupational Psychology, 58, 265-275.
*Williams, C. R., Labig, C. E., Jr., & Stone, T. H. (1993). Recruitment sources and posthire outcomes for job applicants and new hires: A test of two hypotheses. Journal of Applied Psychology, 78, 163-172.
*Williams, L. J., & Anderson, S. E. (1991). Job satisfaction and organizational commitment as predictors of organizational citizenship and in-role behaviors. Journal of Management, 17, 601-617.
*Williams, W. E., & Seiler, D. A. (1973). Relationship between measures of effort and job performance. Journal of Applied Psychology, 57, 49-54.
Wohlers, A. J., & London, M. (1989). Ratings of managerial characteristics: Evaluation difficulty, co-worker agreement, and self-awareness. Personnel Psychology, 42, 235-261.
*Woodmansee, J. J. (1978). Validation of the nurturance scale of the Edwards Personal Preference Schedule. Psychological Reports, 42, 495-498.
*Worbois, G. M. (1975). Validation of externally developed assessment procedures for identification of supervisory potential. Personnel Psychology, 28, 77-91.
*Yammarino, F. J., & Dubinsky, A. J. (1990). Salesperson performance and managerially controllable factors: An investigation of individual and work group effects. Journal of Management, 16, 87-106.
*Yukl, G. A., & Latham, G. P. (1978). Interrelationships among employee participation, individual differences, goal difficulty, goal acceptance, goal instrumentality, and performance. Personnel Psychology, 31, 305-323.
*Zedeck, S., & Baker, H. T. (1972). Nursing performance as measured by behavioral expectation scales: A multitrait-multirater analysis. Organizational Behavior and Human Decision Processes, 7, 457-466.

Received October 10, 1995
Revision received March 29, 1996
Accepted April 22, 1996
