1000 PERSONNEL PSYCHOLOGY
who took the same entrance test and completed the same training academy
class. The design allowed us to estimate joint operational validities for
cognitive and physical test components and examine the career-long sta-
bility of the estimates for our incumbent sample. We were able to test for
predictor × predictor and predictor × tenure interactions relevant to issues
discussed earlier. Using additional measures obtained during training, we
also describe a post hoc procedure to determine what modifications of
the original test might result in increased validity and compare validi-
ties of physical firefighter simulations with direct assessments of strength
and endurance. Finally, we extend the case for the construct validity of
GCA and strength/endurance as factors underlying firefighter training and
career-long job performance.
Method
Participants
Measures
The cognitive abilities component of the test had 120 questions in six
sections: (a) recall of study material from fire training manuals, includ-
ing diagrams; (b) reading comprehension based on technical materials
from fire service manuals, including mechanical diagrams and a graph;
(c) following a series of commands to navigate through a 5 × 5 letter grid;
(d) performing computations using simple formulas related to firefighting;
(e) drawing conclusions from brief written statements; (f) identifying a
NORMAN D. HENDERSON 1005
set of numbers, letters, or symbols that differed from the remaining sets.
Using the nomenclature of Carroll (1993), the six test sections reflected the
following first-order cognitive factors: (1) Associative Memory, Mean-
ingful Memory, and Visual Memory; (2) Reading Comprehension, Vi-
sualization, and Mechanical Knowledge; (3) Integrative Processes and
Sequential Reasoning; (4) Numerical Facility and Quantitative Rea-
soning; (5) Sequential Reasoning and Reading Comprehension; (6)
Induction.
KR-20 reliability coefficients of the test sections ranged from .65 to
.84 with a full-scale reliability of .93 (N = 2,157). Test section loadings
ranged from .72 to .81 on the first principal factor obtained from the scores.
The written portion of the firefighter entry exam was a highly g-saturated
selection test, reflecting what is commonly referred to as GCA in the
psychometric literature (e.g., Carroll, 1993) and sometimes referred to as
general mental ability (GMA) in the job selection literature (e.g., Schmidt
& Hunter, 2004). GCA is used in this report to refer to both terms and
the term intelligence, as defined by Cleary, Humphreys, Kendrick, and
Wesman (1975).
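As a point of reference for the KR-20 coefficients reported above, the following minimal sketch computes KR-20 from dichotomous (0/1) item scores. The response matrix and the function name `kr20` are hypothetical illustrations, not the study's data or scoring code.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for a list of examinees, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion passing each item (p); the item variance is p * (1 - p).
    p = [sum(person[j] for person in responses) / n_people for j in range(n_items)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of total scores, to match the p * q item variances.
    total_var = pvariance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: 6 examinees x 5 items (illustration only).
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(round(kr20(scores), 3))
```

With longer sections and more examinees, the same calculation would apply section by section and to the full 120-item scale.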
The physical abilities component of the selection test consisted of two
timed events. The first event simulated a fire scene arrival. Candidates were
fitted with the department’s self-contained breathing apparatus (SCBA)
tanks, without the mask and breathing tube. The applicant had to drag
two lengths of 6.4 cm hose a total of 55 m (27.5 m in one direction,
drop the coupling, run to the other end of the hose, pick it up, and return); run
22 m to a pumper apparatus ladder rack, remove the 3.7 m one-person
straight ladder, weighing 15.9 kg, from the rack; carry the ladder into
the adjacent fire training tower, placing it against the back wall of the
first landing; continue up the stairwell to the fifth floor (total climb =
15.2 m); return to the ladder, retrieve it, and replace it on the pumper
rack. Candidates were briefed prior to the event and observed at least one
individual complete the event prior to testing. The applicant pool median
was 106.6 sec. After a 10-minute rest, candidates undertook Event 2, a
simulated rescue evolution, still wearing the SCBA tank. The evolution
consisted of dragging a 45.4 kg sack by a handle a total of 21.3 m, which
included 12 m of low headroom, narrow space (1 m high, 1.2 m wide).
Median seconds to complete the event was 38.8. Total seconds required
to complete the two timed events was used as the measure of entry-level
physical abilities test performance. In most critical fire-suppression and
rescue operations, the probability of success decreases with increasing
task completion time. Time required to execute fire ground tasks is the
widely accepted performance measure in fire service (e.g., Clark, 1973;
Cortez, 2001).
During fire academy training in 1985, the two timed events of the original
selection test were rerun with the cadets in an identical manner. After
a 15-minute rest, the dummy drag event was repeated to obtain a
within-session test–retest reliability coefficient (r xx = .83). After another
15-minute rest, cadets were required to complete as many lifts (reps) as
possible with a 15 kg barbell in 90 seconds. Lifts were done standing, using only
upper body strength. The task was repeated after another 15-minute rest to
obtain a within-session reliability coefficient (r xx = .61). The total number
of lifts summed across both sessions, with a ceiling total of 120, was
used as the bar lift measure (Spearman-Brown corrected r xx = .76). The
test was designed to measure upper body muscular endurance, using a
relatively light weight with a large number of reps.
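The Spearman-Brown step for the bar lift measure can be checked with a short sketch. The function name `spearman_brown` is ours; the inputs are the within-session coefficient of .61 and a doubling of test length (two sessions summed).

```python
def spearman_brown(r, k=2.0):
    """Projected reliability when a measure with reliability r is lengthened k-fold."""
    return k * r / (1 + (k - 1) * r)


# Two summed sessions of the barbell lift task, each session pair correlating .61.
print(round(spearman_brown(0.61), 2))  # 0.76
```

The result agrees with the corrected coefficient of .76 given in the text.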
The decision to use global ratings in this study was based on prior
research in the department, in which 315 firefighters were evaluated on
several types of criterion measures, following a job analysis (Henderson,
1985). Most of the variance in a scale of 16 firefighter job dimensions derived
by Bownas and Heckman (1976) was accounted for by a single factor.
The reliabilities of individual rating dimensions and the full-scale average
rating were below those obtained with more global evaluations. Variance
in nine scales assessing knowledge, skills, abilities, and other traits related
to firefighting was largely accounted for by a cognitive (knowledge and
judgment) and a physical (strength and endurance) factor. These two scales
were subsequently adopted for the 1992 rating study, along with a work
output measure described below. An added benefit of the shorter global
ratings procedure was greater supervisor cooperation and higher return
rates.
All job performance ratings were designed to assess the rater’s cumulative
impression of the incumbent’s work behavior, not just recent
behavior. In the 1992 and all later job ratings, all 74 incumbents were
rated on each occasion, including firefighters who had left the department
prior to the time of the rating. The five firefighters in the study sample
who had left by 1992 had 2–4 (M = 3.2) years of postacademy time on
the job, each observed by multiple supervisors. By the 2006 senior officer
ratings, 14 firefighters had left the department for a variety of reasons,
including injuries. This group had a mean of 9.5 years of active job duty
before leaving. In all ratings, this study group represented only a subgroup
of a larger group of approximate age cohorts who were being evaluated at
that time.
A five-point rating scale was used for the three global 1992 ratings
described below. The 83 participating supervisors were instructed to rate
only firefighters with whom they had worked enough to rate with confidence.
Supervising officers usually rated a large number of firefighters (M = 60.2,
approximately 11% of total list). We eliminated between-rater differences
in means and SD by converting each supervisor’s raw ratings on the three
scales (physical, knowledge, work output) into within-rater T-scores, with
M = 50 and SD = 10. The standardized T-score ratings were used to
compute average ratings for each firefighter. Mean T-score ratings for the
study sample of 74 were very close to total sample T-score means of 50
for each scale (48.7 to 50.4). Firefighters in the study sample received an
average of 10.4 supervisor ratings for each scale. Mean supervisor T-score
ratings were used as criterion measures for each participant.
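The within-rater standardization described above can be sketched as follows. The rater and firefighter labels and the raw ratings are invented for illustration, and using the population SD within each rater is an assumption; the text does not say which SD estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_rater_t_scores(ratings):
    """Rescale raw ratings to T-scores (M = 50, SD = 10) within each rater.

    ratings: list of (rater, firefighter, raw_rating) tuples.
    Assumes every rater used at least two different rating values.
    """
    by_rater = defaultdict(list)
    for rater, _, raw in ratings:
        by_rater[rater].append(raw)
    params = {r: (mean(v), pstdev(v)) for r, v in by_rater.items()}
    return [
        (rater, ff, 50 + 10 * (raw - params[rater][0]) / params[rater][1])
        for rater, ff, raw in ratings
    ]


def mean_t_by_firefighter(t_scores):
    """Average the standardized ratings each firefighter received."""
    by_ff = defaultdict(list)
    for _, ff, t in t_scores:
        by_ff[ff].append(t)
    return {ff: mean(ts) for ff, ts in by_ff.items()}


# Hypothetical ratings: officer_A is lenient, officer_B is severe.
ratings = [
    ("officer_A", "ff1", 5), ("officer_A", "ff2", 4), ("officer_A", "ff3", 3),
    ("officer_B", "ff1", 3), ("officer_B", "ff2", 2), ("officer_B", "ff3", 1),
]
criterion = mean_t_by_firefighter(within_rater_t_scores(ratings))
print({ff: round(t, 1) for ff, t in criterion.items()})
```

In this toy case the two officers rank the firefighters identically but use different parts of the scale; after standardization their ratings coincide, which is the between-rater difference the procedure is meant to remove.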
Between-rater reliability coefficients were computed for both the full
sample and the study group, using the ANOVA methods described in
The first principal component (PC) factor score was extracted from
the 1992 composite global rating, the 2002 “Elite Firefighter Squad”
supervisor ratings, the 2002 “Elite Firefighter Squad” peer ratings and
the 2006 senior officer “Outstanding Career Nominations” for the study
sample. This PC factor score represents common performance variance
based on assessments obtained by different methods and rater groups
obtained over nearly the full careers of the incumbents. It also provides a
parallel measure to the performance ratings latent trait used in structural
modeling.
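A first principal component score of this kind can be sketched in a few lines. The sketch assumes the component is taken from the correlation matrix of the four measures (the text does not state correlation versus covariance), finds it by power iteration, and uses made-up data for six hypothetical incumbents.

```python
from statistics import mean, pstdev, pvariance


def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) SD 1."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def first_pc_scores(rows, iters=500):
    """Return (weights, eigenvalue, scores) for the first principal component."""
    z = zscore_columns(rows)
    n, k = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(k)] for i in range(k)]
    v = [1.0] * k  # starting vector; adequate when measures correlate positively
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    scores = [sum(zi * vi for zi, vi in zip(row, v)) for row in z]
    return v, eigenvalue, scores


# Hypothetical rows: 1992 composite T-score, officer, peer, and career nominations.
rows = [
    [62, 3, 2, 1],
    [55, 2, 1, 1],
    [48, 1, 1, 0],
    [51, 1, 0, 0],
    [44, 0, 0, 0],
    [40, 0, 1, 0],
]
weights, eigenvalue, scores = first_pc_scores(rows)
print([round(w, 2) for w in weights], round(eigenvalue, 2))
print(round(pvariance(scores), 2))  # equals the eigenvalue
```

The share of total variance captured by the component is the eigenvalue divided by the number of measures.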
Results
TABLE 1 (continued)
First PC, academy physical measures .78i .23∗ .25 (.04, .45) .72† .90 (.83, .95)
Officer mean global ratings: T-scores (Sept. 1992)
Firefighting knowledge and judgment .85i .52† .77 (.61, .87) .13 –
Physical strength and endurance .91i .30∗∗ .50 (.21, .70) .61† .80 (.68, .89)
c Observed correlations in incumbent sample.
d Operational validity coefficients, corrected for indirect range restriction and criterion unreliability using the Hunter, Schmidt, and Le (2006) procedure. HSL-corrected lower and upper values of the r XYi 90% CI are shown in parentheses. Corrected coefficients and CI involving negative values are omitted (see text).
Reliability coefficients: t = test-retest; i = interrater; a = assumed to be near unity. N = 74 for all criterion measures (see text).
One-tail significance levels (N = 74): ∗ P < .025; ∗∗ P < .01; ∗∗∗ P < .001; † P < .0001.
[Figure: four plot panels; correlations shown in the panels are r = .77, r = .70, r = .48, and r = .47.]
(1st UPC) score derived from six measures: performance on the 1983
physical abilities selection test, the 1985 retest, barbell reps, 1 rep max-
imum kg lift, push-ups, and the mile run. The 1st UPC score from all
six academy measures was used as the training success criterion for val-
idating GCA. The 1st UPC score obtained from the physical, practical,
and ground ladder performance ratings was used as the physical training
success criterion for validating physical strength and endurance.
.41 vs. r XYi = .32), the profile of predictor-criterion correlations was nearly
identical (r = .97) across all criteria. The two Physical test events were
relatively short—applicant pool medians were 106.6 sec and 38.8 sec and
academy class medians were 94.0 sec and 31.0 sec. We examined how
the two events correlated with assessments of aerobic fitness (mile run
time), muscular strength (RM-1 weight lifted), and muscular endurance
(# barbell reps). Correlations were, respectively .60, .60, and .65 for the
hose drag and tower climb event and .46, .54, and .62 for the dummy drag
rescue event. Correlations with the 1985 retest were nearly identical to
those obtained with the 1983 entry test.
We examined the possibility of obtaining incremental validity of the
physical component of the 1983 selection test by adding the measures of
upper body muscular strength (1-rep maximum kg), muscular endurance
(number of 15-kg bar lifts), and aerobic endurance (mile run) in predicting
each of the physically based criterion measures. Neither mile run nor 1-rep
maximum kg significantly increased validity for any criterion measure, but
adding the 15-kg barbell lifts score did increase the validity coefficients for
supervisor rating criteria in all three rating years and for the 1st PC based
on the ratings (R² increases ranged from .04 to .08; P = .057 to .001). The barbell
lifts measure also increased the validity coefficient for predicting number
of fire and rescue runs (R² increase = .07; P < .02), and firefighters with
greater muscular endurance during training tended to be located in more
active station houses (r = .30, P < .01).
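The logic of an incremental validity test can be sketched from correlations alone when a single predictor is added to a single existing predictor. The three correlations below are hypothetical placeholders, not values from this study; only N = 74 is taken from the text. The F ratio has 1 and N − 3 degrees of freedom.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors, from correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_validity(r_y1, r_y2, r_12, n):
    """R^2 gain from adding predictor 2 to predictor 1, and its F ratio."""
    r2_full = r2_two_predictors(r_y1, r_y2, r_12)
    delta = r2_full - r_y1**2
    f_ratio = delta / ((1 - r2_full) / (n - 3))
    return r2_full, delta, f_ratio


# Hypothetical inputs: selection test validity .50, added measure validity .48,
# predictor intercorrelation .60.
r2_full, delta, f_ratio = incremental_validity(0.50, 0.48, 0.60, n=74)
print(round(r2_full, 3), round(delta, 3), round(f_ratio, 2))
```

The sketch shows why a predictor with a substantial zero-order validity can still add only a few points of R² when it overlaps heavily with the test already in use.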
Table 2 summarizes the predictive validities of measures of upper body
muscular endurance, upper body strength, aerobic fitness, and their equal
z-score weighted sum (SEA composite) for physically based training and
job criteria. Predictive validities of the SEA composite were generally
comparable to those shown in Table 1 for the physical ability screening
test across the entire 21-year assessment period. Validity coefficients of
muscular endurance scores were consistently higher than comparable co-
efficients for strength and aerobic endurance measures across the 21-year
period, although strength and aerobic endurance were also significant
predictors for most criterion measures. The observed validity coefficient
(r XYi ) between muscular endurance and the ratings 1st PC (.53) was signif-
icantly larger than the comparable strength and aerobic endurance validity
coefficients of .29 and .26, respectively.
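An equal z-score weighted sum such as the SEA composite can be sketched as below. The cadet values are invented, and reversing the sign of mile run time so that higher always means fitter is our assumption about scoring direction.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def sea_composite(bar_reps, one_rep_max_kg, mile_run_sec):
    """Equal-weight sum of z-scores for endurance, strength, and aerobic fitness."""
    z_endurance = zscores(bar_reps)
    z_strength = zscores(one_rep_max_kg)
    # Assumption: a faster (smaller) mile time indicates better aerobic fitness.
    z_aerobic = [-z for z in zscores(mile_run_sec)]
    return [a + b + c for a, b, c in zip(z_endurance, z_strength, z_aerobic)]


# Hypothetical cadets (illustration only).
bar_reps = [100, 80, 60, 40]
one_rep_max_kg = [90, 80, 70, 60]
mile_run_sec = [400, 440, 480, 520]
composite = sea_composite(bar_reps, one_rep_max_kg, mile_run_sec)
print([round(c, 2) for c in composite])
```

Standardizing before summing keeps any one measure from dominating the composite simply because its raw scale has a larger spread.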
TABLE 2
Correlations Between Strength and Endurance Measures and Subsequent Physically Demanding Training and Job Performance Criteria

                                            Muscular            Strength(b)         Aerobic             SEA
                                            endurance(a)                            endurance(c)        composite(d)
                                            r XYi(e)  r XPa(f)  r XYi(e)  r XPa(f)  r XYi(e)  r XPa(f)  r XYi(e)  r XPa(f)
1985 fire academy evaluations
  Physical and practical skills             .58†      .80       .48†      .64       .31∗∗     .43       .61†      .83
  Handling ground ladders                   .46†      .69       .45†      .61       .21       .31       .50†      .73
1992 Officer mean global ratings
  Physical strength and endurance           .68†      .84       .45†      .56       .30∗∗     .38       .62†      .79
  Work output                               .44†      .63       .22       .30       .24∗      .31       .38∗∗∗    .56
  Composite global rating                   .55†      .72       .30∗∗     .38       .25∗      .32       .47†      .64
“Elite Firefighter Squad” (Jan. 2002)
  Number of officer nominations [sqrt]      .50†      .69       .29∗∗     .38       .28∗∗     .37       .47†      .66
  Number of peer nominations [sqrt]         .32∗∗     .53       .21       .32       .16       .24       .31∗∗     .51
Fire Suppression and Rescue (1987–2006)
  Total months in active duty               .53†      .69       .35∗∗∗    .43       .36∗∗∗    .44       .56†      .70
  Estimated number of fire and rescue runs  .49†      .65       .32∗∗     .39       .41∗∗∗    .49       .53†      .69

M and 90% CI for composite score coefficients ranged from .83 (.68, .91) to .51 (.22, .70).
One-tail significance levels (N = 74): ∗ P < .02; ∗∗ P < .01; ∗∗∗ P < .001; † P < .0001.

The physical ability selection test and the postacademy readministration
of the test were separated by 26 months. Test–retest reliability
for the incumbent class was .85 (r XXa ≈ .92). A regression of retest
time on initial test time showed a linear fit with a slope of 1.24 (P < .0001)
and an intercept not significantly different from zero. The mean time
increased 24% and the retest SD increased 45%, from 26.0 to 37.7. Despite
the decline in physical retest scores, validity coefficients for the original
and retest scores were very similar: the mean r XYi in 1983 and 1985 for
the 18 physically based criterion measures was .60 and .61, respectively.
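The retest figures reported above are mutually consistent, which a short arithmetic check makes visible. The sketch uses only the reported SDs (26.0 and 37.7) and the test–retest correlation (.85); the identity slope = r × SD(retest) / SD(test) is the standard one for simple regression, and the function name is ours.

```python
def regression_slope(r, sd_x, sd_y):
    """Slope of the simple regression of y on x, from r and the two SDs."""
    return r * sd_y / sd_x


sd_1983, sd_1985, r_retest = 26.0, 37.7, 0.85

# Proportional growth in the SD between test and retest.
sd_increase = sd_1985 / sd_1983 - 1
print(round(sd_increase, 2))  # 0.45

# Slope implied by the rounded inputs; close to the reported 1.24.
slope = regression_slope(r_retest, sd_1983, sd_1985)
print(round(slope, 2))  # 1.23
```

With an intercept near zero, a slope of about 1.24 means each firefighter's retest time was roughly 1.24 times the initial time, which matches the 24% increase in mean time. The small gap between 1.23 and 1.24 is what one would expect from rounding in the reported inputs.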
Henderson et al. (2007) found that when direct measures of strength
were used as predictors of performance on firefighter simulation events,
predictor-criterion relationships were linear at the upper range of strength
and endurance but showed a drop off in task performance at low strength
levels. We examined the current data for evidence of a similar drop-off
effect using the 1-rep maximum strength measure, the number of 15 kg
bar reps, mile run time, and the SEA composite as predictors of physically
based fire task performance, reflected in the 1983 Physical test and 1985
retest, and a fire academy PPS + ladders task rating composite score. The
performance drop-off effect was found for all three fire task performance
measures for the strength, muscular endurance, and SEA composite pre-
dictors. In each of the nine analyses, the quadratic component (squared
predictor score) produced a significant increment in prediction of task per-
formance, with P < .01 in eight of nine cases. The performance drop-off
effect was not found for any of the task measures when aerobic capacity
(mile run time) was used as the predictor.
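A test of the quadratic component can be sketched as a comparison of two nested least-squares fits, one with the predictor alone and one adding its square. The data are invented (standardized strength scores and task completion times that rise steeply at low strength), and the small solver is included only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 from an OLS fit of y on an intercept plus the given predictor columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: standardized strength and task completion time (sec).
strength = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
task_time = [140, 127, 114, 107, 100, 96, 95, 97, 99]
strength_sq = [s * s for s in strength]

r2_linear = r_squared([strength], task_time)
r2_quadratic = r_squared([strength, strength_sq], task_time)
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

The gain in R² from the squared term is the quantity that would be tested for significance; a reliable gain indicates that the predictor-performance relationship is not linear across the full range.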
Construct Validity
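[Figure 2: structural model. Latent variables: General Cognitive Ability, Strength/Endurance, Performance Ratings. Performance rating indicators: 1992 Composite Rating, 2002 Elite Squad Peer Nomination, 2002 Elite Squad Officer Nomination, 2006 Outstanding Career Nominations. Academy indicators: Phys & Pract Skills Rating, Ladder Rating, State EMT Exams, Academy Grades, Critical Task Deficiencies. Coefficients appearing in the figure: .85, .62, .85, .68, .43, .51, .87, .89, .90, .87, .95, .95 (path assignments not recoverable).]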
Discussion
The high operational validity of the g-loaded cognitive test and the
strong relationship between cognitively based performance during training
and career-long job performance ratings observed in this study call into
question the veracity of assertions that cognitive assessments obtained in
nonstress environments are poor predictors of decision-making behavior
of firefighters in high stress work situations. The effects of stress and
time pressure on decision making (increased errors, neglect of peripheral
cues, reduced working memory capacity, and speed/accuracy trade-offs)
are reasonably consistent in the literature (e.g., Orasanu, 1997), but a
deep and extensive base of organized knowledge appears to mitigate these
stress effects (Klein, 1996). The knowledge base obtained by training
and experience is not only available for decision making, thus reducing
the capacity demand of the task, it can also lead to greater confidence
in an individual’s ability to deal with stressful situations by reducing the
stress of threat (Orasanu, 1997). Given this knowledge-base contribution
to enhanced performance under stress, finding the substantive path from
GCA to training success to career-long job success is not surprising.
The 1983 selection test examined here and most of the fire service tests
reviewed by Barrett et al. (1999) were developed using a content-based
validity model, with test item development based on knowledge, skills,
and abilities identified as important for job success. The criterion-based
validity evidence for these g-saturated tests provides a convergent line of
evidence for the job relatedness of these tests for firefighter selection. The
regression coefficients of the structural model in Figure 2 provide further
validity evidence at the construct level. The consistency of relationships
among the four job ratings obtained using different methods over many
years supports a general job performance rating construct, which is not
time or method dependent, and is correlated with an academy cognitive
performance latent variable, which in turn is highly correlated with GCA,
as reflected by the 1983 selection test.
The 1983 Physical test consisted of two distinct events, the first re-
quiring less than 2 minutes and the second requiring less than 1 minute
for most applicants. Although physical tests with discrete events were
common at that time, this test format began to come under criticism dur-
ing challenges to the validity of firefighter physical selection tests. Some
critics argued firefighting is primarily an aerobic activity, whereas short
duration (<5 min) events are primary indicators of strength with a minor
aerobic component. Physical selection tests began to evolve into exercises
in which applicants completed several tasks in one continuous event re-
quiring several minutes, with scores based on total completion time. Such
tests diminish the short duration criticism but at a cost of increased psy-
chometric complexity and less control over the relative weights of various
task components on the test score.
All relevant evidence in our study suggests that criticisms of shorter
discrete test events are unfounded. The short individual simulation events
both correlated highly with the mile run assessment of aerobic endurance
(hose/tower r = .62, dummy rescue r = .47), and adding mile run times
to the physical component of the 1983 selection exam did not increase
predictive validity for any criterion measure in Table 1. Our results con-
tradict the joint claims that critical firefighting and rescue physical tasks
are primarily aerobic and that short duration test events do not reflect
aerobic fitness. The first assertion appears to be based on expert witness
opinion in litigation unsubstantiated by empirical research. The second
assertion was based on the results of Åstrand and Rodahl (1986), which
indicated that a 2-minute exercise duration is required for equal contributions
from aerobic and anaerobic processes. The Åstrand-Rodahl time
course was later found to be incorrect (Medbo & Tabata,
1989): aerobic energy release accounted for 40% of total energy
during the first 30 sec and 50% during the first 60 sec of exhausting
exercise.
Our firefighter sample showed a substantial decline in retest scores
on the Physical test readministered 26 months later. Presumably at peak
fitness at the time of the 1983 Physical test, firefighters across the full
range of 1983 scores showed an average 24% increase in time to complete
the test simulations in 1985. Because time increases were proportional
to 1983 time scores, the absolute task completion time differences be-
tween high and low performing firefighters increased over this period. If
a physical ability cut point had been used in selection, many hires near
that cut point would have entered training already below the designated
minimum acceptable physical ability level. Results from our test–retest
design for the Physical test directly contradict the claim that, because physical
capabilities can change over time, long delays between testing and hire
can compromise the validity of physical screening tests. Two-year test–
retest reliability for the incumbents was .85 (applicant pool r XXa ≈ .92),
and validity coefficients for the test and retest were nearly identical for
both training and job criteria, despite the drop-off in retest performance.
The results from this study and others cited suggest that variation
in firefighting knowledge and decision making is primarily GCA-based
for individuals with comparable experience. In a parallel manner, the SE
construct depicted in Figure 2 and found in numerous studies in the psy-
chological (e.g., Arnold, Rauschenberger, Soubel, & Guion, 1982; Hen-
derson et al., 2007; Hogan, 1991a, 1991b; Meyers, Gebhardt, Crump, &
Fleishman, 1993) and exercise literature (e.g., Berger, 1963; Marsh, 1993;
Mead & Legg, 1994; Shaver, 1971) appears to be a physical counterpart
of GCA in fire service. SE encompasses both the full range of physically
demanding fire-suppression and rescue activities, and direct measures of
physical strength, upper body muscular endurance, and aerobic fitness.
Direct measures of muscular strength and endurance do, however, appear
to overpredict firefighter task performance for firefighters at the low end
of strength and endurance distributions.
Practical Implications
REFERENCES
Arnold JD, Rauschenberger JM, Soubel WG, Guion RM. (1982). Validation and utility of
a strength test for steelworkers. Journal of Applied Psychology, 67, 588–604.
Arvey RD, Nutting SM, Landon TE. (1992). Validation strategies for physical ability testing
in police and fire settings. Public Personnel Management, 21, 301–312.
Dawes RM, Corrigan B. (1974). Linear models in decision making. Psychological Bulletin,
81, 95–106.
Doolittle TL, Kaiyala K. (1986). Strength and musculo-skeletal injuries of firefighters.
Proceedings of the Annual Conference of the Human Factors Association of Canada,
49–52. Vancouver, BC.
Draper N, Smith H. (1981). Applied regression analysis. (Revised ed.). New York: Wiley.
EEOC v. Atlas Paper, 868 F.2d 1487 (6th Cir. 1989), cert. denied, 493 U.S. 814.
Fiske DW. (1971). Measuring the concepts of personality. Chicago: Aldine-Atherton.
Grofman B, Merrill S. (2004). Anticipating likely consequences of lottery-based affirmative
action. Social Science Quarterly, 85, 1447–1468.
Gutman A. (2009). Major EEO issues relating to personnel selection decisions. Human
Resource Management Review, 19, 232–250.
Harbin G, Olson J. (2005). Post-offer, pre-placement testing in industry. American Journal
of Industrial Medicine, 47, 296–307.
Henderson ND. (1985). Validity and utility of a cognitive-physical screening test for fire-
fighters. (Technical report). Oberlin, OH: Henderson & Associates.
Henderson ND, Berry MW, Matic T. (2007). Field measures of strength and fitness predict
firefighter performance on physically demanding tasks. Personnel Psychology,
60, 431–473.
Hoffman CC. (1999). Generalizing physical ability test validity: A case study using test
transportability, validity generalization, and construct related validation evidence.
Personnel Psychology, 52, 1019–1042.
Hogan J. (1991a). Physical abilities. In Dunnette MD, Hough LM (Eds.), Handbook of
industrial-organizational psychology (2nd ed., pp. 753–831). Palo Alto, CA: Con-
sulting Psychologists Press.
Hogan J. (1991b). The structure of physical performance in occupational tasks. Journal of
Applied Psychology, 76, 495–507.
Hunter JE, Schmidt FL. (1990). Methods of meta-analysis: Correcting error and bias in
research findings. Beverly Hills, CA: Sage.
Hunter JE, Schmidt FL. (1996). Intelligence and job performance: Economic and social
implications. Psychology, Public Policy and Law, 2, 447–472.
Hunter JE, Schmidt FL. (2004). Methods of meta-analysis: Correcting error and bias in
research findings. Thousand Oaks, CA: Sage.
Hunter JE, Schmidt FL, Le H. (2006). Implications of direct and indirect range restriction
for meta-analysis methods and findings. Journal of Applied Psychology, 91, 594–
612.
International Public Management Association for Human Resources. (1996). B3R and B4R
Technical Report. Alexandria, VA: Author.
International Public Management Association for Human Resources. (2009). B-5/B-5a and
300 Series Technical Report. Alexandria, VA: Author.
Jackson AS. (1994). Preemployment physical evaluation. Exercise and sport sciences re-
views, 22, 53–90.
Jöreskog KG, Sörbom D. (1996). Structural equation modeling. Workshop presented for
the NORC Social Science Research Professional Development Training Sessions.
Chicago, IL.
Klein GA. (1996). The effect of acute stressors in decision making. In Driskell JE, Salas E
(Eds.), Stress and human performance (pp. 49–88). Mahwah, NJ: LEA.
Landy, Jacobs, Associates. (1987). Report on the criterion-related validity of the physical
capabilities test (PCT) for entry level firefighter positions in Columbus, Ohio. State
College, PA: Author.
Lanning v. Southeastern Pa. Transp. Auth. (Lanning I), 181 F.3d 478 (3d Cir. 1999).
Lanning v. Southeastern Pa. Transp. Auth. (Lanning II), 308 F.3d 286, 290 (3d Cir. 2002).
Larsen GE, George JD, Alexander JL, Fellingham GW, Aldana SG, Parcell AC. (2002).
Prediction of maximum oxygen consumption from walking, jogging, or running.
Research Quarterly for Exercise and Sport, 73, 66–72.
Le HA. (2003). Correcting for indirect range restriction in meta-analysis: Testing a new
meta-analytic method. Unpublished doctoral dissertation, University of Iowa.
Lubinski D. (2004). Introduction to the special section on cognitive abilities: 100 years after
Spearman’s (1904) “general intelligence,” objectively determined and measured.
Journal of Personality and Social Psychology, 86, 96–111.
Luke v. City of Cleveland, 2006 WL 43759 (N.D. Ohio).
MacCallum RC, Austin JT. (2000). Applications of structural equation modeling in psycho-
logical research. In Fiske ST, Schacter DL, Zahn-Waxler C (Eds.), Annual review
of psychology (Vol. 51, pp. 201–226). Palo Alto, CA: Annual Reviews, Inc.
MacCallum RC, Widaman KF, Zhang S, Hong S. (1999). Sample size in factor analysis.
Psychological Methods, 4, 84–99.
Marsh HW. (1993). The multidimensional structure of physical fitness: Invariance
over gender and age. Research Quarterly for Exercise and Sport, 64, 256–273.
Mead TP, Legg DL. (1994). Exploratory versus confirmatory factor analysis of collegiate
physical fitness. Education Resources Information Center, (ED 379336), 1–10.
Medbo JI, Tabata I. (1989). Relative importance of aerobic and anaerobic energy release
during short-lasting exhausting bicycle exercise. Journal of Applied Physiology, 67,
1881–1886.
Mendoza JL, Bard DE, Mumford MD, Ang SC. (2004). Criterion-related validity in multiple
hurdle designs: Estimation and bias. Organizational Research Methods, 7, 418–
441.
Mendoza JL, Mumford M. (1987). Correction for attenuation and range restriction on the
predictor. Journal of Educational Statistics, 12, 282–293.
Meyers DC, Gebhardt DL, Crump CE, Fleishman EA. (1993). The dimensions of human
physical performance: Factor analyses of strength, stamina, flexibility and body
composition measures. Human Performance, 6, 309–344.
Morgeson FP, Campion MA, Dipboye RL, Hollenbeck JR, Murphy K, Schmitt N. (2007a).
Reconsidering the use of personality tests in personnel selection contexts.
Personnel Psychology, 60, 683–729.
Morgeson FP, Campion MA, Dipboye RL, Hollenbeck JR, Murphy K, Schmitt N. (2007b).
Are we getting fooled again? Coming to terms with limitations in the use of
personality tests for personnel selection. Personnel Psychology, 60, 1029–1049.
Murphy KR, Dzieweczynski JL. (2005). Why don’t measures of broad dimensions of
personality perform better as predictors of job performance? Human Performance,
18, 343–357.
Orasanu J. (1997). Stress and naturalistic decision making: Strengthening the weak links. In
Flin R, Salas E, Strub M, Martin L (Eds.), Decision making under stress: Emerging
themes and applications (pp. 43–66). Aldershot: Ashgate.
Potosky D, Bobko P, Roth PL. (2005). Forming composites of cognitive ability and alter-
native measures to predict job performance and reduce adverse impact: Corrected
estimates and realistic expectations. International Journal of Selection and Assess-
ment, 13, 304–315.
Rhea MR, Alvar BA, Gray R. (2004). Physical fitness and job performance of firefighters.
Journal of Strength and Conditioning Research, 18, 348–352.
Rosenfeld M, Thornton RF. (1976). The development and validation of a firefighter phys-
ical selection test for the City of Philadelphia. Princeton, NJ: Educational Testing
Service, Center for Occupational and Professional Assessment.
Rosse JG, Stecher MD, Miller JL, Levin RA. (1998). The impact of response distortion
on preemployment personality testing and hiring decisions. Journal of Applied
Psychology, 83, 634–644.
Ryan AM, Greguras GJ, Ployhart RE. (1996). Perceived job relatedness of physical ability
testing for firefighters: Exploring variations in reactions. Human Performance, 9,
219–240.
Ryan AM, Ployhart RE, Friedel LA. (1998). Using personality testing to reduce adverse
impact: A cautionary note. Journal of Applied Psychology, 83, 298–307.
Sackett PR, Lievens F, Berry CM, Landers RN. (2007). A cautionary note on the effects of
range restriction on predictor intercorrelations. Journal of Applied Psychology, 92,
538–544.
Sackett PR, Yang H. (2000). Correction for range restriction: An expanded typology.
Journal of Applied Psychology, 85, 112–118.
Sarno MR. (2003). Issues in the third circuit: Employers who implement pre-employment
tests to screen their applicants, beware (or not?): An analysis of Lanning v. South-
eastern Pennsylvania Transportation Authority and the business necessity defense
as applied in Third Circuit employment discrimination cases. Villanova Law Review,
48, 1403–1428.
Schmidt FL, Hunter J. (2004). General mental ability in the world of work: Occupational
attainment and job performance. Journal of Personality and Social Psychology, 86,
162–173.
Schmidt FL, Hunter J, Outerbridge AN, Goff S. (1988). Joint relation of experience and
ability with job performance: Test of three hypotheses. Journal of Applied Psychol-
ogy, 73, 46–57.
Schmidt FL, Shaffer JA, Oh IS. (2008). Increased accuracy for range restriction corrections:
Implications for the role of personality and general mental ability in job and training
performance. Personnel Psychology, 61, 827–868.
Schmitt N, Rogers W, Chan D, Sheppard L, Jennings D. (1997). Adverse impact and pre-
dictive efficiency of various predictor combinations. Journal of Applied Psychology,
82, 719–730.
Sharkey BJ, Davis PO. (2008). Hard work: Defining physical work performance require-
ments. Champaign, IL: Human Kinetics.
Shaver LG. (1971). Maximum dynamic strength, relative dynamic endurance, and their
relationships. Research Quarterly, 42, 460–465.
Sothmann MS, Gebhardt DL, Baker TA, Kastello GM, Sheppard VA. (2004). Performance
requirements of physically strenuous occupations: Validating minimum standards
for muscular strength and endurance. Ergonomics, 47, 864–875.
Stauffer JM, Mendoza JL. (2001). The proper sequence for correcting correlation coeffi-
cients for range restriction and unreliability. Psychometrika, 66, 63–68.
Steiger JH, Lind JM. (1980). Statistically based tests for the number of common factors.
Paper presented at the annual meeting of the Psychometric Society, Iowa City, Iowa.
Stevenson JM, Weber CL, Smith JT, Dumas GA, Albert WJ. (2001). A longitudinal study
of the development of low back pain in an industrial population. Spine, 26, 1370–
1377.
Terman LM. (1917). A trial of mental and pedagogical tests in a civil service examination
for policemen and firemen. Journal of Applied Psychology, 1, 17–29.
United States v. City of New York, 637 F. Supp. 2d 77 (E.D.N.Y. 2009).
Wilcox RR. (2001). Fundamentals of modern statistical methods: Substantially improving
power and accuracy. New York: Springer-Verlag.
Winer BJ, Brown DR, Michels KM. (1991). Statistical principles in experimental design.
New York: McGraw-Hill.