
Wechsler Adult Intelligence Scale-IV Dyads for Estimating Global Intelligence

Article  in  Assessment · September 2014


DOI: 10.1177/1073191114551551 · Source: PubMed


Article

Assessment
2015, Vol. 22(4) 441–448
© The Author(s) 2014
Reprints and permissions: sagepub.com/journalsPermissions.nav
DOI: 10.1177/1073191114551551
asm.sagepub.com

Wechsler Adult Intelligence Scale–IV Dyads for Estimating Global Intelligence

Todd A. Girard¹, Bradley N. Axelrod², Ronak Patel¹, and John R. Crawford³

Abstract
All possible two-subtest combinations of the core Wechsler Adult Intelligence Scale–IV (WAIS-IV) subtests were evaluated
as possible viable short forms for estimating full-scale IQ (FSIQ). Validity of the dyads was evaluated relative to FSIQ in a
large clinical sample (N = 482) referred for neuropsychological assessment. Sample validity measures included correlations,
mean discrepancies, and levels of agreement between dyad estimates and FSIQ scores. In addition, reliability and
validity coefficients were derived from WAIS-IV standardization data. The Coding + Information dyad had the strongest
combination of reliability and validity data. However, several other dyads yielded comparable psychometric performance,
albeit with some variability in their particular strengths. We also observed heterogeneity between validity coefficients
from the clinical and standardization-based estimates for several dyads. Thus, readers are encouraged to also consider
the individual psychometric attributes, their clinical or research goals, and client or sample characteristics when selecting
among the dyadic short forms.

Keywords
WAIS-IV, two-subtest short forms, intelligence assessment

¹ Ryerson University, Toronto, Ontario, Canada
² John D. Dingell Department of Veterans Affairs Medical Center, Detroit, MI, USA
³ University of Aberdeen, Aberdeen, UK

Corresponding Author:
Todd A. Girard, Department of Psychology, Ryerson University, 350 Victoria Street, Toronto, Ontario, M5B 2K3, Canada.
Email: tgirard@psych.ryerson.ca

Wechsler Intelligence Scales (WIS) have long been a core component of cognitive assessment across various settings
and populations (Kaufman, 1990; Piotrowski, 1999; L. A. Rabin, Barr, & Burton, 2005; Thompson, LoBello, Atkinson,
Chisholm, & Ryan, 2004; Watkins, Campbell, Nieberding, & Hallmark, 1995; Wolber, Reynolds, Ehrmantraut, &
Nelson, 1997). Short forms and brief cognitive tools date back about a century, but their continued use and
development appears particularly relevant in the recent climate of mental health care delivery that demands
efficiency in psychological assessment (Eisman et al., 2000; Piotrowski, 1999). Moreover, long administration times
for comprehensive assessment can place strains on some individuals’ levels of tolerance and motivation, and unduly
challenge those with certain psychological, sensory, motor, and/or other physical constraints, which may be
unnecessary in situations requiring only a rough gauge or screening of global intelligence (Crawford, Allum, &
Kinion, 2008). For example, a quick gauge of global intellectual functioning may be sought under conditions of time
constraints, accommodating idiosyncratic testing factors (e.g., poor stamina), or for reevaluations/monitoring of
clients at follow-ups to initial more comprehensive testing (Christensen, Girard, & Bagby, 2007; Sattler & Ryan,
2009). Similarly, short forms are used to quickly characterize global intelligence of research samples (e.g.,
Girard, Christensen, & Rizvi, 2010). Both time-savings and such client characteristics were key reasons reported for
use of short-form and brief intelligence tests in an international survey by Thompson et al. (2004). They also
reported that the most commonly used short forms were those derived from selected subtests of WIS.

In addressing the need for shortened tests, the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999)
was designed to provide brief (two or four subtests) alternatives to the lengthy WAIS-III (Wechsler Adult
Intelligence Scale–III; Wechsler, 1997) for assessing global intelligence, as measured by the full-scale
intelligence quotient (FSIQ). While the WASI is an independent brief instrument, several short forms are readily
derived by using selected subtests directly from the WAIS. Past research indicated mixed support regarding WASI
estimates of WAIS-III scores, and between WASI and WAIS-III SF scores. For example, although comparable, Axelrod
(2002) reported a higher validity coefficient and prediction accuracy for the WAIS-III dyad of Vocabulary + Matrix
Reasoning than the

Downloaded from asm.sagepub.com at US DEPT OF VETERAN AFFAIRS on July 20, 2015



two-subtest WASI composed of Vocabulary + Matrix Reasoning. However, the WAIS-III dyad tended to overestimate FSIQ
to a greater degree than the WASI dyad. Since the study by Axelrod (2002), Pearson Assessments developed newer
editions of both tests: the WAIS-IV (Wechsler, 2008) and WASI-II (Wechsler, 2011). However, comparability of WASI-II
and the WAIS-IV has not yet been demonstrated independently from the publisher and assessment of WAIS-IV short-forms
is nascent. Use of WAIS-IV short forms offers a practical advantage in that one only requires the WAIS-IV whether
interested in a full assessment or a quicker estimate of IQ (depending on client or research sample), without having
to also have the WASI-II purchased. If further testing is subsequently desired, remaining subtests can be
administered. Although the WASI-II can similarly be substituted for the corresponding subtests in such a follow-up
full assessment (i.e., administering the remaining subtests), WAIS-IV short forms provide greater flexibility to
select different subtest combinations depending on client or research specific goals. Therefore, the current report
focuses on evaluation of two-subtest short forms of WAIS-IV FSIQ.

The WAIS-IV is the most recent full-scale WIS (Wechsler, 2008). Updates to the WAIS-IV include changes at the item,
subtest, scoring, and conceptual levels. For instance, scales were modified to increase the developmental
appropriateness, enhance psychometric attributes, and reduce biased measures (e.g., increased item range, decreased
emphasis on demands for motor dexterity and speed on visual–spatial tasks, decreased emphasis on auditory processing
for working memory, new items, modified instructions, discontinue rules, and scoring rules). At the conceptual
level, the WAIS-IV further integrates the Cattell–Horn–Carroll theoretical framework, with increased measurement of
fluid intelligence. At the subtest level, these changes are instantiated with the removal of two subtests (Object
Assembly and Picture Arrangement) and addition of three new tests (Visual Puzzles, Figure Weights, and
Cancellation). Of the latter, Visual Puzzles is now one of 10 core subtests contributing added measurement of fluid
reasoning in the calculation of FSIQ.

The WAIS-IV expands on its predecessors in several important ways, also claiming to reduce administration time by
15%. Nonetheless, the WAIS-IV is still estimated to take 60 to 90 minutes to administer the 10 core subtests
required for FSIQ for normative samples, and often longer for some clinical samples. These modifications warrant
psychometric evaluation of short-form estimates of intelligence based on the new WAIS-IV.

Prior studies have examined WIS dyads using different versions of the WIS. Silverstein’s (1982) dyad of Vocabulary +
Block Design is an example of a popular short form originally derived from the WAIS-R. This dyad was also
recommended by Sattler and Ryan (1998, 2009) for both the WAIS-III and WAIS-IV. Merchan-Naranjo et al. (2012)
recently reported that out of five dyads assessed, only Information + Block Design performed adequately for use with
Asperger syndrome. This finding underscores the need to also assess the utility of short forms in clinical
populations.

Sattler and Ryan (2009) report on the validity and reliability of a set of top 10 dyads plus some deemed
particularly suitable for specific clinical issues (e.g., hearing impairment). The coefficients were derived from
the WAIS-IV standardization data and dyadic short-form estimates of FSIQ have not yet been empirically evaluated for
use with clinical samples. Of note, some dyads yielded comparable validity and/or reliability measures as longer
short forms (triads-pentads). The dyads with the highest validity coefficients included Vocabulary paired with any
of the Block Design, Visual Puzzles, or Figure Weights subtests. All three dyads yielded correlations with FSIQ of
r′ = .87 to .88 (Sattler & Ryan, 2009); r′ reflects r values after correcting for redundant error variance (Girard &
Christensen, 2008). Umfleet, Ryan, Gontkovsky, and Morris (2012) subsequently evaluated short forms for the Verbal
Comprehension and Perceptual Reasoning indices.

Despite focus in the short-form literature on the correlation coefficient as one useful index of validity, it is
important to note that a high correlation does not mean high agreement and the strength of correlation depends on
the range of data (Bland & Altman, 1986; Spinks et al., 2009). For example, a short form with a perfect correlation
may consistently overestimate the full-scale score. Moreover, short-form validity should be assessed as a
multifaceted construct (Boone, 1990; Silverstein, 1990; Spinks et al., 2009; Thompson, Howard, & Anderson, 1986).

The purpose of the present study was to evaluate all possible dyad combinations using the 10 core subtests from the
WAIS-IV. Thus, we provide an assessment of the reliability and several measures of validity of WAIS-IV dyads based
on data from the standardization sample and a large clinical sample. While we encourage readers to weigh each of
these validity measures to their own accord, we also derive an unbiased composite measure of psychometric
performance to aid evaluation of the dyads. That is, as Cyr and Brooker (1984) note, it may be difficult for many
users to mentally derive a composite summary interpretation of short-form performance across measures, even when
just comparing across two indices. They ranked short forms using a metric of “psychometric effectiveness” calculated
as the average of the reliability and validity coefficients. Here we extend this approach in providing Rc as a
composite measure of psychometric performance incorporating the multiple indices assessed. Silverstein (1990) not
only acknowledged the advantage of this integrative approach but also noted that separate reliability and validity
indices may be of interest in their own right under different situations. Thus, we provide an overall summary score
(Rc) as well as the data for each


psychometric index separately to maximize information for and comprehension by readers.

Method

Clinical Sample

Our sample was composed of 482 persons administered at least the 10 core subtests from the WAIS-IV through a medical
neuropsychology consultation clinic within an urban veteran’s medical center with referrals from neurology,
psychiatry, and primary care. The sample was mostly male (93.6%), and primarily of Caucasian (63.5%) and African
American (32.8%) self-identified ethnicity. The participants averaged 51.1 years of age (SD = 17.9), had 12.8 (SD =
2.0) years of education, and generated a mean FSIQ in the high end of the low average range (M = 88.9; SD = 14.2).
The FSIQ data were normally distributed (Mdn = 89.0; skew = 0.14, SE = 0.11; kurtosis = 0.02, SE = 0.22) with scores
ranging from 51 to 144.

Procedures

WAIS-IV subtests had been administered and scored according to standardized procedures (Wechsler, 2008) as part of
clinical evaluations. The archival data were de-identified and approved for research use in accordance with policy
at the host medical facility.

Analyses

In total, we calculated deviation quotients (DQs) for all 45 dyad combinations of the 10 core WAIS-IV subtests. DQs
are derived via linear scaling of composite subtest scaled scores to obtain scores sharing the WIS full-scale mean
of 100 and SD of 15 (FSIQ itself being a DQ; Tellegen & Briggs, 1967). Composite reliability (rxx) and correlation
coefficients (rstd) between the short-form dyad DQs and FSIQ were calculated using the WAIS-IV standardization data
as per methods outlined by Crawford and colleagues (Crawford, Anderson, Rankin, & MacDonald, 2010; Crawford et al.,
2008), which apply long-standing methods in the short-form literature (Levy, 1967; Mosier, 1943; Tellegen & Briggs,
1967). Consistent with prior reports (e.g., Girard, Axelrod, & Wilkins, 2010; Sattler & Ryan, 2009), internal
consistency reliabilities from the standardization sample were used for this purpose, with the exception of Symbol
Search and Coding for which only test–retest reliabilities are available (Wechsler, 2008). Correlations (r) were
also computed for each dyad based on the neuropsychological sample data. All correlations were corrected (r′std; r′)
for redundant error variance using Levy’s (1967) formula because of dyad subtests being embedded within the
full-scale administration (Girard & Christensen, 2008). In addition, the magnitude of differences between each of
the dyad DQs and FSIQ were assessed using paired t tests. Taking into account both the consistency of scores and
systematic differences between dyad and full-scale scores, we also used the intraclass correlation coefficient (ICC)
using a two-way model assessing absolute agreement (model A.1 in McGraw & Wong, 1996).

As an additional measure of agreement, we calculated the proportion of DQs for each dyad falling within ±10 IQ
points of participants’ respective FSIQ scores. This level of agreement was selected as a meaningful range
corresponding to that of the qualitative categories used in WIS test interpretation (e.g., 80-90 = below average,
110-120 = above average). Nonetheless, it is possible that different interval widths may affect measurement
sensitivity and/or conclusions. To address this concern, we ran parallel analyses using intervals of ±2 and ±5 IQ
points, roughly corresponding to ±1 and ±2 SEM, respectively (SEM is 2.16 for WAIS-IV FSIQ; Wechsler, 2008).

Last, to facilitate interpretation across the multiple psychometric indices, a composite measure of the above
psychometric measures (Rc) was computed adapting the approach of Cyr and Brooker (1984). More specifically, whereas
Cyr and Brooker averaged the reliability and validity (Pearson’s r) coefficients to provide an index of
“psychometric effectiveness,” we average across all the indices assessed here. In keeping with the original metric,
the unweighted approach to the composite was selected because there are no sufficiently strong reasons to
differentially weight the indices and the parsimony of a simple average facilitates interpretation. The reliability
and corrected validity coefficients derived from standardization data, the corrected validity coefficient based on
our sample, the ICC, and the proportion of DQs within ±10 IQ points all fall on scales from 0 to 1 in magnitude. To
integrate the difference scores with these measures, we first converted the corresponding mean differences to
positive r effect-sizes (incorporating the normative SD of 15 for WAIS-IV IQ scores) to represent the absolute
degrees of discrepancy on a 0 to 1 scale. Because high values on the former measures, but low discrepancy scores
represent good dyad performance, we subsequently subtracted the latter r values from unity to yield a comparable
metric of agreement (1 − r). Then, we calculated the mean of these six values as an overall psychometric composite
score Rc for each dyad. Thus, the Rc index is a multifaceted composite measure of psychometric performance that can
be interpreted on a scale from 0 (absent) to 1 (perfect) correspondence between the short form and full scale.

Results

Reliability and validity data for the 45 dyadic short forms are provided in Table 1. The dyads are rank ordered with
respect


Table 1.  Reliability and Validity of Dyadic WAIS-IV Short Forms for FSIQ.

Dyad rxx r′std*** r′*** p(±10) Mdiff ICC Rc


Coding + Information .922 .850 .860 .832 –0.494 .871 .887
Coding + Matrix Reasoning .917 .815 .856 .842 –0.320 .866 .881
Block Design + Digit Span .931 .833 .856 .834 1.288*** .859 .878
Symbol Search + Vocabulary .907 .845 .848 .830 –0.830* .867 .878
Coding + Visual Puzzles .909 .810 .857 .832 –0.366 .872 .878
Arithmetic + Similarities .919 .857 .859 .838 2.128*** .859 .877
Matrix Reasoning + Vocabulary .947 .864 .867 .830 3.527*** .851 .874
Coding + Vocabulary .929 .851 .851 .807 –1.666*** .860 .874
Coding + Similarities .904 .839 .853 .836 –1.679*** .863 .873
Digit Span + Visual Puzzles .936 .830 .858 .811 1.890*** .856 .871
Similarities + Symbol Search .881 .830 .840 .828 –0.798* .862 .869
Arithmetic + Coding .909 .833 .857 .813 –1.915*** .865 .869
Arithmetic + Symbol Search .887 .823 .850 .805 –1.043** .867 .866
Block Design + Vocabulary .934 .873 .846 .790 2.884*** .842 .865
Matrix Reasoning + Symbol Search .896 .805 .846 .801 0.581 .855 .864
Digit Span + Vocabulary .957 .852 .815 .768 0.783* .816 .864
Arithmetic + Vocabulary .943 .860 .834 .780 2.227*** .835 .863
Arithmetic + Block Design .917 .846 .849 .797 2.693*** .851 .862
Visual Puzzles + Vocabulary .940 .865 .852 .784 3.567*** .838 .860
Digit Span + Similarities .932 .846 .816 .772 0.701 .818 .860
Information + Symbol Search .903 .822 .844 .784 1.559*** .859 .860
Block Design + Coding .904 .819 .823 .797 –1.018** .839 .858
Digit Span + Information .951 .851 .842 .757 2.845*** .832 .856
Symbol Search + Visual Puzzles .891 .780 .846 .780 0.823* .863 .855
Arithmetic + Visual Puzzles .922 .835 .860 .774 3.390*** .849 .854
Digit Span + Matrix Reasoning .942 .836 .826 .763 1.839*** .820 .854
Block Design + Symbol Search .887 .789 .821 .778 0.162 .843 .852
Similarities + Visual Puzzles .917 .848 .849 .782 3.607*** .835 .852
Arithmetic + Matrix Reasoning .928 .850 .844 .757 3.224*** .833 .851
Matrix Reasoning + Similarities .924 .854 .841 .768 3.515*** .827 .850
Block Design + Similarities .913 .850 .829 .776 2.983*** .825 .849
Arithmetic + Information .939 .839 .846 .770 4.434*** .821 .845
Digit Span + Symbol Search .907 .786 .824 .761 –2.196*** .825 .838
Coding + Digit Span .928 .798 .829 .749 –3.085*** .817 .836
Arithmetic + Digit Span .887 .821 .788 .739 0.790 .805 .836
Block Design + Information .931 .854 .841 .734 5.160*** .807 .833
Information + Matrix Reasoning .943 .849 .853 .691 5.741*** .803 .825
Block Design + Matrix Reasoning .925 .823 .819 .712 4.262*** .797 .823
Similarities + Vocabulary .945 .817 .777 .730 2.966*** .764 .822
Matrix Reasoning + Visual Puzzles .931 .809 .843 .703 4.981*** .806 .821
Information + Visual Puzzles .937 .839 .834 .691 5.906*** .785 .816
Information + Similarities .939 .822 .799 .720 4.881*** .765 .814
Block Design + Visual Puzzles .927 .773 .817 .697 4.732*** .789 .808
Information + Vocabulary .962 .810 .763 .658 5.056*** .730 .793
Coding + Symbol Search .900 .698 .774 .703 –3.112*** .775 .791

Note. FSIQ = full-scale intelligence quotient; rxx = composite reliability coefficients of dyadic short forms (Mosier, 1943; based on WAIS-IV
standardization data); r′std = Pearson product–moment correlations between dyad deviation quotients (DQs) and FSIQs corrected for redundant error
variance (Girard & Christensen, 2008); r′ = corrected validity coefficients based on neuropsychological sample data; p(±10) = proportion of sample
for whom dyad DQs fall within ±10 points of their respective FSIQs; Mdiff = mean difference scores between the dyad DQs and FSIQ scores; ICC =
two-way intraclass correlation coefficient modeling absolute agreement; Rc = composite measure of psychometric performance averaged across the six
preceding measures (see text for details). Although consideration of individual indices is recommended, dyads are displayed in descending order by Rc
with subtests in each pair labeled alphabetically. Boldfaced values highlight the top five dyads according to each psychometric measure.
*p < .05. **p < .01. ***p < .001.
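
As a rough illustration of how the composite Rc in the last column of Table 1 can be assembled from the six indices described in the Analyses, the following Python sketch averages five tabled values with a discrepancy-based agreement term (1 − r). The text does not spell out how mean differences were converted to r effect sizes; the sketch assumes the common conversion d = |Mdiff| / 15 followed by r = d / sqrt(d² + 4), which is an assumption of this example rather than a formula reported in the article. The function names are likewise hypothetical.

```python
import math

NORMATIVE_SD = 15.0  # SD of WAIS-IV IQ scores, as stated in the text


def discrepancy_agreement(mean_diff, sd=NORMATIVE_SD):
    """Map a mean (DQ - FSIQ) difference onto a 0-1 agreement scale (1 - r).

    ASSUMPTION: d = |mean_diff| / sd and r = d / sqrt(d**2 + 4); the article
    only says that mean differences were converted to positive r effect sizes.
    """
    d = abs(mean_diff) / sd
    r = d / math.sqrt(d * d + 4.0)
    return 1.0 - r


def composite_rc(rxx, r_std, r_sample, p_within_10, mean_diff, icc):
    """Unweighted mean of the six indices, each on a 0-1 scale."""
    values = [
        rxx,                               # composite reliability
        r_std,                             # corrected validity, standardization data
        r_sample,                          # corrected validity, clinical sample
        p_within_10,                       # proportion of DQs within +/-10 points
        icc,                               # absolute-agreement ICC
        discrepancy_agreement(mean_diff),  # 1 - r from the mean difference
    ]
    return sum(values) / len(values)


# Table 1 row for Coding + Matrix Reasoning
rc = composite_rc(rxx=0.917, r_std=0.815, r_sample=0.856,
                  p_within_10=0.842, mean_diff=-0.320, icc=0.866)
print(round(rc, 3))  # 0.881
```

Under this assumed conversion, the sketch reproduces the tabled Rc for the rows checked to within rounding of the three-decimal inputs, which suggests, but does not confirm, that it is close to the computation actually used.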


to composite Rc values. The metrics for each reliability and validity measure are also provided for each dyad. In
the interest of parsimony, we will highlight the summary statistics for the top few dyads according to the Rc score
followed by those for reliability and each validity measure.

Psychometric composite values ranged from Rc = .79 for the combination of Coding + Symbol Search to Rc = .89 for
Coding + Information (M = 0.85). Consistent with the former Processing Speed dyad, all eight dyads composed of
single-index domains fell in the bottom quarter overall. The Working Memory dyad of Arithmetic + Digit Span ranked
highest among this subset, Rc = .84. Although the Coding + Information dyad failed to top the list for any
individual measure, it ranked highest overall on the Rc score and in the top five for the sample-derived validity
coefficient (r′, second), proportion of scores falling within ±10 points of FSIQ (fifth), mean discrepancy from FSIQ
(fourth; being one of only seven dyads with a nonsignificant discrepancy), and ICC (second). It fared slightly less
well in terms of its validity coefficient (r′std, 11th) and particularly its reliability (25th) derived from the
standardization data; nonetheless, it exceeded criteria used previously for acceptable reliability (>.90) and
validity (>.82; e.g., Christensen et al., 2007; Girard, Axelrod, et al., 2010).

With respect to the coefficients derived from the standardization data, three dyads exceeded reliability of rxx =
.95: Information + Vocabulary and each of these two paired with Digit Span. Only six dyads fell below rxx = .90. As
noted above, although the top Rc dyad Coding + Information fared lower in terms of reliability than on other
measures, it still met the latter acceptable level (rxx = .92). Because of its lower subtest reliability, all nine
dyads including Symbol Search fell among the 12 least reliable dyads, albeit the minimum rxx = .88. Although
Information + Vocabulary was the most reliable combination (rxx = .96), this single-domain Verbal-Comprehension dyad
ranked in the bottom 10 across the five validity measures and next to last overall, Rc = .79. As in Sattler and Ryan
(2009), the dyads of Vocabulary with Block Design and with Visual Puzzles yielded the highest r′std (both .87).
However, these dyads ranked 14th and 19th on our composite Rc measure, respectively. They scored lowest in terms of
their mean discrepancy scores, tending to overestimate FSIQ on average by 2.88 and 3.57 IQ points, respectively.

The highest validity coefficient based on the sample data was for Matrix Reasoning + Vocabulary (r′ = .87). This
dyad also ranked 7th overall (Rc = .87), as reflected in terms of its high reliability (4th; rxx = .95) and validity
based on the standardization data (3rd; r′std = .86), the proportion of scores within 10 points of FSIQ (8th), and
ICC (17th). However, it fell 12th from last in terms of its discrepancy score, overestimating FSIQ by 3.53 points on
average.

Coding + Matrix Reasoning was the dyad with the highest proportion of scores falling within 10 points of FSIQ (.84).
This dyad ranked second overall (Rc = .88) and in terms of its mean discrepancy score, underestimating FSIQ by less
than one third of an IQ point. It ranked 5th for ICC (.87) and 9th in terms of its sample validity coefficient (r′ =
.86), but substantially lower in terms of the coefficients based on standardization data (35th, r′std = .82; 28th,
rxx = .92). As expected, rates of inclusion were lower for the narrower bandwidths of ±2 (.15-.24) and ±5 (.36-.53).
However, the correlation between Rc scores reported in Table 1 based on all six psychometric indices with the five
after omitting the proportion of scores falling within the ±10 point interval was near perfect, r = .989. Moreover,
inspection of changes in rank order of dyads by Rc scores revealed minimal differences; the mean shift was −0.02
places.

In terms of magnitude, one quarter of the dyads yielded discrepancies of less than one IQ point, but a handful
deviated by 5 or more IQ points. Notably, all the top five dyads on this discrepancy measure included one of the
Processing Speed tasks (Coding or Symbol Search). On average, the Block Design + Symbol Search dyad only slightly
overestimated FSIQ by less than one sixth of an IQ point, ranking it best in terms of mean discrepancy. However, it
ranked only 27th overall (Rc = .85), 23rd in terms of the proportion of scores within 10 points of FSIQ, and in the
bottom 10 for the validity and reliability coefficients. In general, dyads tended to overestimate FSIQ (mean of mean
differences = +1.70) and ranged from −3.11 (underestimating) to +5.91 (overestimating) IQ points. Even with the
large sample size, seven dyads, including the Block Design + Symbol Search, Coding + Information, and Coding +
Matrix Reasoning dyads discussed above, revealed nonsignificant discrepancies at alpha level of .05, four more at
.01, and two more at .001. The rest were highly significant at p < .001 (ranging down to p = 5 × 10^−49).

Coding + Visual Puzzles yielded the highest ICC (.87). This dyad also ranked in the top five overall (Rc = .88),
third in terms of mean discrepancy, and sixth with respect to both sample validity coefficient and proportion of
scores within 10 points of FSIQ. However, it ranked in the bottom third for validity and reliability based on
standardization data.

Discussion

We report the validity and reliability coefficients of 45 dyadic short forms of the WAIS-IV based on standardization
data as well as their multifaceted validity in a large clinical sample. More specifically, we assessed correlations
and magnitudes of discrepancies between dyads and FSIQ, as well as the proportion of dyad DQs falling within ±10
points of participants’ respective FSIQ scores. Although assessment of individual measures is encouraged in
accordance with one’s goals, we further present a composite measure of psychometric performance (Rc) based on the
approach of Cyr and Brooker (1984) to aid interpretation across the multiple indices.
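
To make concrete why correlation alone does not establish agreement, the following Python sketch computes a Pearson correlation, a mean discrepancy, the proportion of estimates within ±10 points, and an absolute-agreement ICC for made-up scores. The data are hypothetical and constructed so that every short-form estimate sits exactly 12 points above FSIQ. The function names are illustrative, and the ICC function is written from the usual ANOVA mean-squares expression for a two-way, absolute-agreement, single-measure model, so it should be read as a sketch rather than the study's exact computation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(estimates, fsiq):
    """Mean signed discrepancy (estimate - FSIQ); positive = overestimation."""
    return sum(e - f for e, f in zip(estimates, fsiq)) / len(fsiq)


def proportion_within(estimates, fsiq, band=10):
    """Proportion of estimates falling within +/-band points of FSIQ."""
    hits = sum(1 for e, f in zip(estimates, fsiq) if abs(e - f) <= band)
    return hits / len(fsiq)


def icc_absolute_agreement(x, y):
    """Two-way, absolute-agreement, single-measure ICC for k = 2 measures."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical scores: the estimate is always FSIQ + 12
fsiq = [72, 85, 89, 94, 101, 110, 123]
dq = [f + 12 for f in fsiq]

print(round(pearson_r(dq, fsiq), 3))   # 1.0: perfect correlation
print(mean_difference(dq, fsiq))       # 12.0: systematic overestimation
print(proportion_within(dq, fsiq))     # 0.0: no estimate within 10 points
print(round(icc_absolute_agreement(dq, fsiq), 3))  # below 1 despite r = 1
```

In this contrived case the correlation is perfect, yet no estimate lands within 10 points of FSIQ and the absolute-agreement ICC drops below 1, which is the kind of dissociation the multiple indices are meant to expose.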


Readers are encouraged to consider the individual mea- sample by more than 3.5 points, on average. Very similar
sures relative to their goals for short-form use when evalu- results were obtained by Axelrod (2002) with this combina-
ating or selecting a dyad. In this vein, we note the tion for estimating FSIQ on the WAIS-III. Use of these
heterogeneity in psychometric performance across mea- tasks from the two-subtest WASI yielded not only less dis-
sures. There is no dyad that consistently ranked among the crepancy but also a poorer validity coefficient (Axelrod,
five highest values for each measure (see boldfaced values 2002). Similar empirical comparisons between WAIS-IV
in Table 1). One source of variance is the discrepancy short forms and the WASI-II will be important. This dyad is
between validity coefficients in our sample and those also integrated in the two-subtest Oklahoma Premorbid
derived from the standardization data, highlighting the Intelligence Estimate–3 (OPIE-3) equation along with
importance of evaluating short-form validity across rele- demographic variables (Schoenberg, Duff, Scott, & Adams,
vant samples. In addition, the variability in performance 2002), which was also found to overestimate FSIQ on aver-
across sample validity measures highlights the value of age (Spinks et al., 2009). Notably, we found that five dyads
considering multiple forms of validity when assessing including Processing Speed measures yielded the smallest
short forms. Future research may yield further insight mean discrepancies in our clinical sample. These findings
regarding the reasons for these discrepancies. Despite vari- should also be taken into consideration in the development
ability across measures, the results do also yield relative of future versions of the WASI and OPIE tests.
consistencies in performance supporting superior perfor- In addition to psychometric attributes, it is important to
mance of some dyads. consider clinical factors and suitable applications. Dyad short
Overall, the dyad comprising Coding + Information forms are most appropriate when the research or referral
ranked highest on the composite Rc measure and was among question is aimed at estimating global intelligence (FSIQ).
the top five dyads across all four sample validity measures. That is, these short-form estimates are insufficient grounds
Consistent with Sattler and Ryan (2009), this dyad failed to rank in the top 10 according to the scores derived from the standardization data. Nonetheless, the latter authors did recommend it as a good dyad for rapid screening. Moreover, the standardization-based reliability and validity coefficients exceed recommended thresholds for acceptable reliability (rxx > .90) and validity (r′std > .82) in the WIS literature (e.g., Christensen et al., 2007; Girard, Axelrod, et al., 2010). Thus, the current results further support this dyad for obtaining a quick gauge of intellectual functioning.

The dyads of Block Design + Vocabulary and Visual Puzzles + Vocabulary have the highest r′std = .87, consistent with their place atop Sattler and Ryan’s (2009) ranking by validity. However, these dyads ranked lower on our additional and sample-based measures, particularly in their mean overestimation of FSIQ. Likewise, although Information + Vocabulary is the most reliable combination based on standardization data, it ranked in the bottom 10 in terms of validity and next to last overall (Rc = .79).

Coding + Matrix Reasoning ranked second overall with strong sample validity, including the highest rate of inclusion for DQs within 10 points of FSIQ, and acceptable levels for the standardization data. Of note, this dyad may be preferred for use with hearing-impaired samples (Sattler & Ryan, 2009). Other dyads recommended by Sattler and Ryan (2009) for this purpose ranked far lower overall (Arithmetic + Matrix Reasoning, 29th; Block Design + Matrix Reasoning, 38th).

The Matrix Reasoning + Vocabulary dyad proved strong with respect to its reliability and validity coefficients across samples and ranked seventh on our list overall. However, further highlighting the importance of assessing multiple indices, it tended to overestimate FSIQ in the clinical

for key verdicts regarding individual clients, such as placement decisions (Sattler & Ryan, 2009; Silverstein, 1990; Thompson et al., 1986). For instance, the top dyads yielded inclusion rates of 84% of the short-form estimates falling within ±10 points of FSIQ scores in our clinical sample. Slightly better, Sattler and Ryan (2009) reported 95% inclusion rates with bandwidths of 7 to 12 IQ points, based on WAIS-IV standardization data, for their recommended dyads. Further assessment of bandwidths of ±2 and ±5 reflected maxima of only a quarter and a half of cases falling within roughly 1 and 2 SEM, respectively. Overall, these rates and ranges caution against making important individual-level interpretations based on dyad FSIQ Equivalents. On the other hand, most dyads present adequate reliability, validity coefficients, and mean discrepancies to support their use for research or for clinical screening or monitoring purposes (Sattler & Ryan, 2009). Moreover, the drop in validity for these purposes is weighed against the substantial time savings of administering only 2 of the 10 core WAIS-IV subtests. Nonetheless, it will be useful to directly assess administration times for specific short forms in both clinical and nonclinical samples. Although such goals may be achieved with the use of independent brief tests of intelligence, the list of dyads assessed here offers users more flexibility in choice and the advantage of being able to simply follow up a quick screen with full administration of the remaining WAIS-IV if more comprehensive intelligence assessment is needed.

Despite the variation across validity measures, most dyads rank within a reasonably stable range of performance. For instance, the top 10 dyads on the composite measure range from Rc = .87 to .89. Thus, users might take into account clinical considerations, such as the referral question or client/sample characteristics, when selecting a short form.
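The inclusion rates discussed in this section are the proportion of cases whose short-form estimate falls within a given bandwidth of the obtained FSIQ. The Python sketch below shows one way such rates might be computed, alongside the classical standard error of measurement, SEM = SD × sqrt(1 − rxx). The function names, the scores, and the reliability value of .98 are illustrative assumptions and are not taken from the study.

```python
import math


def inclusion_rate(estimates, fsiq_scores, bandwidth):
    """Proportion of cases whose short-form estimate is within +/- bandwidth of FSIQ."""
    if len(estimates) != len(fsiq_scores) or len(estimates) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(abs(est - fsiq) <= bandwidth
               for est, fsiq in zip(estimates, fsiq_scores))
    return hits / len(estimates)


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scores for eight cases (illustration only, not study data).
dyad_estimates = [92, 105, 78, 110, 99, 87, 121, 95]
fsiq_scores = [89, 101, 85, 108, 112, 86, 118, 84]

# Inclusion rates at the bandwidths mentioned in the text (+/-2, +/-5, +/-10).
rates = {band: inclusion_rate(dyad_estimates, fsiq_scores, band)
         for band in (2, 5, 10)}

# The reliability of .98 is an assumed value chosen for illustration.
sem = standard_error_of_measurement(sd=15.0, reliability=0.98)

for band, rate in rates.items():
    print(f"within +/-{band:>2} points: {rate:.1%} ({band / sem:.1f} SEM)")
```

Expressing each bandwidth in SEM units, as in the last line, makes it easier to judge whether a given inclusion rate is tight enough for the intended use of the estimate.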


For example, as noted above, Coding + Matrix Reasoning is recommended for hearing-impaired samples. Use of Matrix Reasoning versus Information along with Coding (i.e., top two dyads in Table 1) may also depend on whether fluid or crystallized intelligence is of more interest, respectively. Likewise, whereas Information and Vocabulary assess more concrete knowledge, Similarities is desired for assessment of abstract verbal reasoning and may prove more sensitive to nuances of psychosis (Christensen et al., 2007; Crawford et al., 2008; Donders, 1997; Girard, Axelrod, et al., 2010), although this requires empirical assessment. The dyads involving Similarities with Arithmetic and with Coding ranked in the top 10 overall (sixth and ninth; Table 1), but over- and underestimated FSIQ by about two points in our sample, respectively. Nonetheless, although these deviations are statistically significant, the magnitude of these mean discrepancies falls within the normative standard error of measurement for FSIQ.

It is also important to consider the sample’s characteristics with respect to the generalization of the current results. We report here on a large neuropsychological sample for whom the FSIQ scores were normally distributed with a full range of scores. The sample comprised predominantly males of Caucasian and African American descent with a mean FSIQ of 89, in the low average range. Some apparent discrepancies across indices may reflect differential sensitivity of some dyads to the samples or measures used, an observation that should be taken into account regarding generalizability and the need for future research in this regard. For instance, discrepancies between r′std and sample r′ values likely relate at least in part to sample differences between the standardization data set and the clinical sample. It is notable in this regard that all dyads including at least one Processing Speed task (Coding or Symbol Search; see Table 1) yielded numerically higher validity coefficients for the clinical sample than for the standardization data (except for Coding + Vocabulary, for which they were equal); performance across sample statistics was more balanced across other domains. This observation may relate to the particular sensitivity of Processing Speed measures to clinical conditions (e.g., Christensen et al., 2007; Gorlyn et al., 2006; Hawkins, 1998; Taylor & Heaton, 2001).

It is notable that the lower reliabilities of dyads including Coding and/or Symbol Search are likely partly because of reliance on test–retest correlations for these subtests (Wechsler, 2008). As reviewed by Sattler and Ryan (2009), these assessments were based on a smaller subsample, and among other subtests test–retest values are consistently lower than their internal consistency counterparts. Nonetheless, the enhanced sensitivity noted above cannot be explained by this method difference, as less reliable measures typically yield less sensitive tests and because the same reliability values were used for validity estimates in both samples. In future, it will be valuable to have larger assessments of reliability on a common metric, as well as to directly estimate reliabilities in clinical samples. In contrast, some dyads were closely matched; for example, Arithmetic + Similarities and Matrix Reasoning + Vocabulary ranked among the highest five validity coefficients on both data sets and differed only at the third decimal place. Nonetheless, cross-validation studies on other representative samples will be useful in future. Moreover, it will be important to investigate short-form performance at different ability levels, particularly the tails of the distribution (Spinks et al., 2009).

Future work should also reassess the dyads’ psychometric properties from their administration independently of the full-scale WAIS-IV. The statistical correction for redundant error variance (r′) estimates the validity coefficient “as if” obtained from separate administrations (Girard & Christensen, 2008). However, systematic sources of variance (e.g., influence of other subtests) and clinical considerations (e.g., motivation/attention) may differentially affect full-score and short-form performance. For instance, short forms may yield higher subtest scores and more reliable performance (Levy, 1968; Wymer, Rayls, & Wagner, 2003), but with lower validity coefficients with FSIQ when isolated than when embedded in the full battery, particularly for dyads (Thompson et al., 1986). However, Axelrod (2002) failed to observe any significant order effects regarding the administration of the two- (or four-) subtest WASI and the full WAIS-III, suggesting that at least some measures are robust to potential influences of test administration time. These issues deserve further empirical attention with the WAIS-IV.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

References
Axelrod, B. N. (2002). Validity of the Wechsler Abbreviated Scale of Intelligence and other very short forms of estimating intellectual functioning. Assessment, 9, 17-23.
Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1, 307-310.
Boone, D. E. (1990). Short forms of the WAIS-R with psychiatric inpatients: A comparison of techniques. Journal of Clinical Psychology, 46, 197-200.
Christensen, B. K., Girard, T. A., & Bagby, R. M. (2007). WAIS-III short form for Index and IQ scores in a psychiatric population. Psychological Assessment, 19, 236-240.
Crawford, J. R., Allum, S., & Kinion, J. E. (2008). An index-based short form of the WAIS-III with accompanying analysis of reliability and abnormality of differences. British Journal of Clinical Psychology, 47, 215-237.


Crawford, J. R., Anderson, V., Rankin, P., & MacDonald, J. (2010). An index-based short-form of the WISC-IV with accompanying analysis of the reliability and abnormality of differences. British Journal of Clinical Psychology, 49, 235-258.
Cyr, J. J., & Brooker, B. H. (1984). Use of appropriate formulas for selecting WAIS-R short forms. Journal of Consulting and Clinical Psychology, 52, 903-905.
Donders, J. (1997). A short form of the WISC-III for clinical use. Psychological Assessment, 9, 15-20.
Eisman, E. J., Dies, R. R., Finn, S. E., Eyde, L. D., Kay, G. G., Kubiszyn, T., . . . Moreland, K. L. (2000). Problems and limitations in the use of psychological assessment in the contemporary health care delivery system. Professional Psychology: Research and Practice, 31, 131-140.
Girard, T. A., Axelrod, B. N., & Wilkins, L. (2010). Comparison of WAIS-III short-forms for measuring index and full-scale scores. Assessment, 17, 400-405.
Girard, T. A., & Christensen, B. K. (2008). Clarifying problems and offering solutions for correlated error when assessing the validity of selected-subtest short forms. Psychological Assessment, 20, 76-80.
Girard, T. A., Christensen, B. K., & Rizvi, S. (2010). Visual-spatial episodic memory in schizophrenia: A multiple systems framework. Neuropsychology, 24, 368-378.
Gorlyn, M., Keilp, J. G., Oquendo, M. A., Burke, A. K., Sackeim, H. A., & Mann, J. J. (2006). The WAIS-III and Major Depression: Absence of VIQ/PIQ Differences. Journal of Clinical and Experimental Neuropsychology, 28, 1145-1157.
Hawkins, K. A. (1998). Indicators of brain dysfunction derived from graphic representations of the WAIS-III/WMS-III technical manual clinical samples data: A preliminary approach to clinical utility. The Clinical Neuropsychologist, 12, 535-551.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Needham Heights, MA: Allyn & Bacon.
Levy, P. (1967). The correction for spurious correlation in the evaluation of short-form tests. Journal of Clinical Psychology, 23, 84-86.
Levy, P. (1968). Short-form tests: A methodological review. Psychological Bulletin, 69, 410-416.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30-46.
Merchan-Naranjo, J., Mayoral, M., Rapado-Castro, M., Llorente, C., Boada, L., Arango, C., & Parellada, M. (2012). Estimation of the intelligence quotient using Wechsler Intelligence Scales in children and adolescents with Asperger Syndrome. Journal of Autism and Developmental Disorders, 42, 116-122.
Mosier, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8, 161-168.
Piotrowski, C. (1999). Assessment practices in the era of managed care: Current status and future directions. Journal of Clinical Psychology, 55, 787-796.
Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20, 33-65.
Sattler, J. M., & Ryan, J. J. (1998). Assessment of children: Revised and updated third edition. WAIS-III supplement. San Diego, CA: Sattler.
Sattler, J. M., & Ryan, J. J. (2009). Assessment with the WAIS-IV. La Mesa, CA: Sattler.
Schoenberg, M. R., Duff, K., Scott, J. G., & Adams, R. L. (2002). An evaluation of the clinical utility of the OPIE-3 as an estimate of premorbid WAIS-III FSIQ. Clinical Neuropsychologist, 17, 308-321.
Silverstein, A. B. (1982). Two- and four-subtest short forms of the Wechsler Adult Intelligence Scale–Revised. Journal of Consulting and Clinical Psychology, 50, 415-418.
Silverstein, A. B. (1990). Short forms of individual intelligence tests. Psychological Assessment, 2, 3-11.
Spinks, R., McKirgan, L. W., Arndt, S., Caspers, K., Yucuis, R., & Pfalzgraf, C. J. (2009). IQ estimate smackdown: Comparing IQ proxy measures to the WAIS-III. Journal of the International Neuropsychological Society, 15, 590-596.
Taylor, M. J., & Heaton, R. K. (2001). Sensitivity and specificity of WAIS-III/WMS-III demographically corrected factor scores in neuropsychological assessment. Journal of the International Neuropsychological Society, 7, 867-874.
Tellegen, A., & Briggs, P. F. (1967). Old wine in new skins: Grouping Wechsler subtests into new scales. Journal of Consulting Psychology, 31, 499-506.
Thompson, A. P., Howard, D., & Anderson, J. (1986). Two- and four-subtest short forms of the WAIS-R: Validity in a psychiatric sample. Canadian Journal of Behavioral Science, 18, 287-293.
Thompson, A. P., LoBello, S. G., Atkinson, L., Chisholm, V., & Ryan, J. J. (2004). Brief intelligence testing in Australia, Canada, the United Kingdom, and the United States. Professional Psychology: Research and Practice, 35, 286-290.
Umfleet, L. G., Ryan, J. J., Gontkovsky, S. T., & Morris, J. (2012). Estimating WAIS-IV Indexes: Proration versus linear scaling in a clinical sample. Journal of Clinical Psychology, 68, 390-396.
Watkins, C. E., Jr., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology: Research and Practice, 26, 54-60.
Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale–third edition administration and scoring manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1999). Wechsler Abbreviated Scale of Intelligence (WASI). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale–fourth edition (WAIS-IV). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2011). Wechsler Abbreviated Scale of Intelligence–second edition (WASI-II). San Antonio, TX: Psychological Corporation.
Wolber, G. J., Reynolds, B., Ehrmantraut, J. E., & Nelson, A. J. (1997). In search of a measure of intellectual functioning for an inpatient psychiatric population with low cognitive ability. Psychiatric Rehabilitation Journal, 21, 59-63.
Wymer, J. H., Rayls, K., & Wagner, M. T. (2003). Utility of a clinically derived abbreviated form of the WAIS-III. Archives of Clinical Neuropsychology, 18, 917-927.
