
Table of Contents


Essentials of Psychological Assessment Series
Title Page
Copyright Page
Dedication
SERIES PREFACE
One - OVERVIEW

DEFINITION
RATIONALE FOR THE XBA APPROACH
THE THREE PILLARS OF THE XBA APPROACH
APPLICATION OF THE XBA APPROACH
IMPLEMENTATION OF THE XBA APPROACH STEP-BY-STEP
USE OF THE XBA APPROACH WITH CULTURALLY AND LINGUISTICALLY DIVERSE
POPULATIONS
CONCLUSIONS
REFERENCES

Two - HOW TO ORGANIZE A CROSS-BATTERY ASSESSMENT

OVERVIEW
UTILIZATION OF SPECIFIC REFERRAL INFORMATION
INTEGRATING GUIDING PRINCIPLES WITH DECISION MAKING
THE CROSS-BATTERY ASSESSMENT DATA MANAGEMENT AND INTERPRETIVE
ASSISTANT (XBA DMIA)
IMPLEMENTING THE XBA APPROACH STEP BY STEP
SUMMARY
REFERENCES

Three - HOW TO INTERPRET TEST DATA

HYPOTHESIS-DRIVEN ASSESSMENT AND INTERPRETATION
INTEGRATING HYPOTHESIS TESTING AND INTERPRETATION
GUIDELINES FOR TEST INTERPRETATION
SUMMARY
REFERENCES

Four - USE OF THE CROSS-BATTERY APPROACH IN SPECIFIC LEARNING DISABILITY
EVALUATION

INTRODUCTION
THE SEVEN DEADLY SINS IN SLD EVALUATION
A MODERN OPERATIONAL DEFINITION OF LEARNING DISABILITIES
SUMMARY
REFERENCES

Five - USE OF THE CROSS-BATTERY APPROACH IN THE ASSESSMENT OF DIVERSE
INDIVIDUALS

CULTURE, LANGUAGE, AND TESTS OF COGNITIVE ABILITY
THE CULTURE-LANGUAGE TEST CLASSIFICATIONS AND CULTURE-LANGUAGE
INTERPRETIVE MATRIX
SUMMARY
REFERENCES

Six - STRENGTHS AND WEAKNESSES OF THE CROSS-BATTERY APPROACH

Strengths
Weaknesses
MYTHS AND MISCONCEPTIONS ABOUT XBA
REFERENCES

Seven - CROSS-BATTERY ASSESSMENT CASE REPORTS

REASON FOR REFERRAL
BACKGROUND INFORMATION
DEVELOPMENTAL/HEALTH HISTORY
ASSESSMENT/EVALUATION PROCEDURES
BEHAVIORAL OBSERVATIONS
ASSESSMENT FINDINGS
SUMMARY AND DATA INTEGRATION
RECOMMENDATIONS
PSYCHOLOGICAL ASSESSMENT-FOR-INTERVENTION REPORT
REASON AND PURPOSE OF ASSESSMENT
DESCRIPTION OF PROCEDURES
STATEMENT OF VALIDITY OF ASSESSMENT RESULTS
EVALUATION OF INFLUENCES ON LEARNING
EVALUATION OF HEALTH AND DEVELOPMENTAL FACTORS
OBSERVATION OF CURRENT BEHAVIOR AND PERFORMANCE
EVALUATION OF ACADEMIC ACHIEVEMENT
EVALUATION OF COGNITIVE PROCESSES AND INTELLECTUAL FUNCTIONING
EVALUATION OF BEHAVIOR, SOCIAL, AND EMOTIONAL FUNCTIONING
OPINIONS AND IMPRESSIONS
RECOMMENDATIONS FOR INTERVENTION AND REMEDIATION
APPENDIX A - The Cattell-Horn-Carroll (CHC) Theory of Cognitive Abilities
Appendix B - CHC Broad and Narrow Ability Classification Tables For Tests ...
APPENDIX C - Descriptions of Cognitive Ability/Processing and Academic ...
Appendix D - Test-Specific Culture-Language Matrices
Appendix E
Appendix F - Critical Values Required for Statistical Significance among ...
Index
Acknowledgements
About the Authors
About the CD-ROM


Essentials of Psychological Assessment Series

Series Editors, Alan S. Kaufman and Nadeen L. Kaufman


Essentials of WAIS-III Assessment
by Alan S. Kaufman and Elizabeth O. Lichtenberger
Essentials of CAS Assessment
by Jack A. Naglieri
Essentials of Forensic Psychological Assessment
by Marc J. Ackerman
Essentials of Bayley Scales of Infant Development-II
Assessment
by Maureen M. Black and Kathleen Matula
Essentials of Myers-Briggs Type Indicator Assessment
by Naomi Quenk
Essentials of WISC-III and WPPSI-R Assessment
by Alan S. Kaufman and Elizabeth O. Lichtenberger
Essentials of Rorschach Assessment
by Tara Rose, Nancy Kaser-Boyd, and Michael P.
Maloney
Essentials of Career Interest Assessment
by Jeffrey P. Prince and Lisa J. Heiser
Essentials of Cross-Battery Assessment
by Dawn P. Flanagan and Samuel O. Ortiz
Essentials of Cognitive Assessment with KAIT and Other
Kaufman Measures
by Elizabeth O. Lichtenberger, Debra Broadbooks,
and Alan S. Kaufman
Essentials of Nonverbal Assessment
by Steve McCallum, Bruce Bracken, and John
Wasserman
Essentials of MMPI-2 Assessment
by David S. Nichols
Essentials of NEPSY Assessment
by Sally L. Kemp, Ursula Kirk, and Marit Korkman
Essentials of Individual Achievement Assessment
by Douglas K. Smith
Essentials of TAT and Other Storytelling Techniques
Assessment
by Hedwig Teglasi
Essentials of WJ III Tests of Achievement Assessment
by Nancy Mather, Barbara J. Wendling, and Richard W.
Woodcock
Essentials of WJ III Cognitive Abilities Assessment
by Fredrick A. Schrank, Dawn P. Flanagan, Richard W.
Woodcock, and Jennifer T. Mascolo
Essentials of WMS-III Assessment
by Elizabeth O. Lichtenberger, Alan S. Kaufman, and
Zona C. Lai
Essentials of MMPI-A Assessment
by Robert P. Archer and Radhika Krishnamurthy
Essentials of Neuropsychological Assessment
by Nancy Hebben and William Milberg
Essentials of Behavioral Assessment
by Michael C. Ramsay, Cecil R. Reynolds,
and R. W. Kamphaus
Essentials of Millon Inventories Assessment, Second Edition
by Stephen N. Strack
Essentials of PAI Assessment
by Leslie C. Morey
Essentials of 16 PF Assessment
by Heather E.-P. Cattell and James M. Schuerger
Essentials of WPPSI-III Assessment
by Elizabeth O. Lichtenberger and Alan S. Kaufman
Essentials of Assessment Report Writing
by Elizabeth O. Lichtenberger, Nancy Mather,
Nadeen L. Kaufman, and Alan S. Kaufman
Essentials of Stanford-Binet Intelligence Scales (SB5)
Assessment
by Gale H. Roid and R. Andrew Barram
Essentials of WISC-IV Assessment
by Dawn P. Flanagan and Alan S. Kaufman
Essentials of KABC-II Assessment
by Alan S. Kaufman, Elizabeth O. Lichtenberger,
Elaine Fletcher-Janzen, and Nadeen L. Kaufman
Essentials of Processing Assessment
by Milton J. Dehn
Essentials of WIAT-II and KTEA-II Assessment
by Elizabeth O. Lichtenberger and Donna R. Smith
Essentials of Assessment with Brief Intelligence Tests
by Susan R. Homack and Cecil R. Reynolds
Essentials of School Neuropsychological Assessment
by Daniel C. Miller

Copyright 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

Wiley Bicentennial Logo: Richard J. Pacifico

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States
Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy
fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on
the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley
& Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008 or online at
http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no
representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should
consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other damages.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the
understanding that the publisher is not engaged in rendering professional services. If legal, accounting, medical, psychological or any
other expert assistance is required, the services of a competent professional person should be sought.

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons,
Inc. is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate
companies for more complete information regarding trademarks and registration.

For general information on our other products and services please contact our Customer Care Department within the United States at
(800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic
books. For more information about Wiley products, visit our website at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Flanagan, Dawn P.
Essentials of cross-battery assessment with CD / Dawn P. Flanagan, Samuel O. Ortiz, Vincent C. Alfonso. -- 2nd ed.
p. cm. -- (Essentials of psychological assessment series)
ISBN-13: 978-0-471-75771-9 (paper/CD-ROM)
ISBN-10: 0-471-75771-3 (paper/CD-ROM)
1. Intelligence tests. 2. Intellect. I. Ortiz, Samuel O., 1958- II. Alfonso, Vincent C. III. Title.
BF431.F437 2007
153.93 -- dc22
2006036652
Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
Dedication



Today we break with the tradition of dedicating our books to our children and families. Our children
represent the future and we, of course, hope that our work and efforts somehow will result in better
lives for them. This time, however, we turn our thoughts to the past where, instead of our children or
even our parents, we seek to honor those who have had a profound influence on what we do and the
paths we have chosen. In this case, we are referring to the three gentlemen and scientists whose very
names are carrying intelligence theory and cognitive science into the next millennium and who put
the CHC in CHC theory. We mean, of course, Dr. Raymond Cattell (1905-1998), Dr. John Horn
(1928-2006), and Dr. John Carroll (1916-2003).
Dr. Cattell passed away many years ago, but at least one of us (DPF) had the good fortune of
meeting him on at least one occasion. As the true grandfather of CHC theory, he holds a special
place of honor in the pantheon of scientists who have been so influential in shaping modern thinking
on intelligence and human cognitive abilities. And so to him we dedicate this book.
The other C in CHC belongs to Dr. Carroll, who passed away not too long ago but whose work in
the field of intelligence theory and cognitive abilities led to his landmark book in 1993 that has set the
stage for much of what we have done in terms of promoting the application of theory and research in
cognitive ability assessment. We all have had the good fortune of meeting Dr. Carroll and we are
grateful and fortunate indeed to have been able to become acquainted with him personally. We also
dedicate this book to him in honor of his invaluable work and contributions.
It is with immense pride tinged with sadness, however, that we also dedicate this book to Dr. Horn,
or John as he had everyone call him. As a student of Cattell's, John took what was then a fledgling
two-factor theory (Gf-Gc) and transformed it into the most comprehensive and empirically supported
theory of cognitive abilities in existence today. We all knew John personally and one of us (SOO)
even had the incredible good fortune of having him mentor his statistical analyses for his dissertation.
Last year, John and his wife were invited as honored guests to dine at a special gathering of CHC
zealots during a professional conference in Los Angeles. With all of us and many of our colleagues
in attendance, John regaled the party with incredible recollections of both his childhood and his work
with Dr. Cattell in what turned out to be an amazing evening. Little did we know he would pass away
unexpectedly shortly thereafter, resulting in a loss that we continue to feel to this very day. And so we
honor John's memory as well as his unparalleled contributions to the literature on the structure of
cognitive abilities by dedicating this book to him.
There is no question that this dedication is likely to go unnoticed by the many generations of
scientists soon to follow us and the work we are doing. Nevertheless, we hope that this gesture, small
though it may be, serves in some way to remind others from where and from whom we have come.
Isaac Newton once said, "If I have seen further than others, it is by standing upon the shoulders of
giants." We are all now standing on the shoulders of three great giants, the very men who have lent
their names to what we all know as CHC theory. And so none of us should ever forget, especially not
in those instances in which the acronym CHC rolls off our tongues like any other over-learned
word in our vocabulary, that there is much more to it than verbal expediency; there are, in fact, three
awesome men, no longer with us in the flesh, but present in the spirit of all that we do. It is not just
CHC. It is Raymond Cattell. It is John Horn. And it is John Carroll. We miss you all.
With respect and admiration,
Dawn P. Flanagan
Samuel O. Ortiz
Vincent C. Alfonso

SERIES PREFACE

In the Essentials of Psychological Assessment series, we have attempted to provide the reader with
books that will deliver key practical information in the most efficient and accessible style. The series
features instruments in a variety of domains, such as cognition, personality, education, and
neuropsychology. For the experienced clinician, books in the series will offer a concise yet thorough
way to master utilization of the continuously evolving supply of new and revised instruments, as well
as a convenient method for keeping up to date on the tried-and-true measures. The novice will find
here a prioritized assembly of all the information and techniques that must be at one's fingertips to
begin the complicated process of individual psychological diagnosis.
Wherever feasible, visual shortcuts to highlight key points are utilized alongside systematic, step-
by-step guidelines. Chapters are focused and succinct. Topics are targeted for an easy understanding
of the essentials of administration, scoring, interpretation, and clinical application. Theory and
research are continually woven into the fabric of each book, but always to enhance clinical inference,
never to sidetrack or overwhelm. We have long been advocates of what has been called "intelligent
testing": the notion that a profile of test scores is meaningless unless it is brought to life by the
clinical observations and astute detective work of knowledgeable examiners. Test profiles must be
used to make a difference in the child's or adult's life, or why bother to test? We want this series to
help our readers become the best intelligent testers they can be.
An exciting new feature of the second edition of Essentials of Cross-Battery Assessment, and the
first in the Essentials series to utilize it, is the addition of a CD-ROM to accompany the text. The CD-
ROM contains three programs collectively known as Essential Tools for Cross-Battery Assessment
(XBA). The programs offer the advantages of automation in many aspects of conducting XBA.
The main program, Cross-Battery Assessment (XBA) Data Manager and Interpretive Assistant (XBA
DMIA), replaces the previous manual entry worksheets found in Appendix A in the first edition of the
book. Practitioners now have at their disposal a powerful program that contains all of the information
necessary to organize, manage, and interpret data from XBAs. The XBA DMIA does all the necessary
calculations, analyzes scores, and graphs results automatically. The XBA DMIA should prove to be an
extremely useful tool that not only facilitates XBA but also assists practitioners in understanding the
principles and procedures at the heart of the approach.
Another program that practitioners may find helpful is called Specific Learning Disability (SLD)
Assistant. This program is designed expressly for the purpose of informing decisions that need to be
made within the context of the operational definition of SLD presented in Chapter 4 of this book.
Rather than relying on a traditional discrepancy analysis, the program assists in evaluating whether an
individual's intact abilities/processes comprise a general pattern of otherwise normal ability, despite
identified areas of related cognitive and academic deficits. The program is easy to use and will prove
to be a valuable resource for practitioners.
The third program contained on the CD-ROM is the Culture-Language Interpretive Matrix (C-
LIM). The purpose of this program is to evaluate data from standardized norm-referenced tests to
determine the relative influence of limited English proficiency and level of acculturation on test
performance. This is an extremely important consideration in nondiscriminatory assessment and must
be accomplished prior to any attempts at interpretation because the validity of the obtained data rests
on the degree to which these factors may have affected performance adversely. The program
provides a systematic method that facilitates evaluation of cultural and linguistic factors that may be
present in the evaluation of individuals from diverse backgrounds. The program also provides a
graphic depiction of the effects of culture and language on test performance, which can also assist
practitioners in their evaluations.
The CD-ROM feature of Essentials of Cross-Battery Assessment (2nd edition) raises the bar in the
Essentials series to a new dimension of user friendliness and enhances the wonderful contributions to
test interpretation that were made by the first edition of the book. In addition, the second edition
covers thoroughly the major high-quality psychometric measures of cognitive abilities that were
published since the first edition, and includes key new content areas, making it a state-of-the-art
interpretive approach for examiners from a diversity of clinical backgrounds.

Alan S. Kaufman, PhD, and Nadeen L. Kaufman, EdD, Series Editors
Yale University School of Medicine
One

OVERVIEW

The Cross-Battery Assessment approach (hereafter referred to as the XBA approach) was
introduced by Flanagan and her colleagues in the late 1990s (Flanagan & McGrew, 1997; Flanagan,
McGrew, & Ortiz, 2000; McGrew & Flanagan, 1998). The XBA approach provides practitioners with
the means to make systematic, valid, and up-to-date interpretations of intelligence batteries and to
augment them with other tests (e.g., academic ability tests) in a way that is consistent with the
empirically supported Cattell-Horn-Carroll (CHC) theory of cognitive abilities. Moving beyond the
boundaries of a single intelligence test kit by adopting the psychometrically and theoretically
defensible XBA principles and procedures represents a significantly improved method of measuring
cognitive abilities (Carroll, 1998; Kaufman, 2000).
According to Carroll (1997), the CHC taxonomy of human cognitive abilities "appears to prescribe
that individuals should be assessed with respect to the total range of abilities the theory specifies" (p.
129, emphasis added). However, because Carroll recognized that any such prescription would "of
course create enormous problems," he indicated that "[r]esearch is needed to spell out how the
assessor can select what abilities need to be tested in particular cases" (p. 129). Flanagan and
colleagues' XBA approach was developed specifically to spell out how practitioners can conduct
assessments that approximate the total range of broad and narrow cognitive abilities more adequately
than what is possible with a single intelligence battery. In a review of the XBA approach, Carroll
(1998) stated that it "can be used to develop the most appropriate information about an individual in a
given testing situation" (p. xi). In Kaufman's (2000) review of the XBA approach, he stated that the
approach is based on sound assessment principles, adds theory to psychometrics, and improves the
quality of the assessment and interpretation of cognitive abilities and processes.
Noteworthy is the fact that the "crossing" of batteries is not a new method of intellectual
assessment. Neuropsychological assessment has long adopted the practice of crossing various
standardized tests in an attempt to measure a broader range of brain functions than that offered by any
single instrument (Lezak, 1976, 1995). Nevertheless, several problems with crossing batteries have
plagued assessment-related fields for years. Many of these problems have been circumvented by
Flanagan and colleagues' XBA approach (see Rapid Reference 1.1 for examples).
Unlike the XBA model, the various so-called cross-battery techniques applied within the field of
neuropsychological assessment, for example, are not grounded in a systematic approach that is both
psychometrically and theoretically defensible. Thus, as Wilson (1992) cogently pointed out, the field
of neuropsychological assessment is in need of an approach that would guide practitioners through
the selection of measures that would result in more specific and delineated patterns of function and
dysfunction, an approach that provides more clinically useful information than one that is wedded
to the utilization of subscale scores and IQs (p. 382). Indeed, all fields involved in the assessment of
cognitive functioning have some need for an approach that would aid practitioners in their attempt to
"touch all of the major cognitive areas, with emphasis on those most suspect on the basis of history,
observation, and on-going test findings" (Wilson, 1992, p. 382). The XBA approach represents a
quantum leap in this direction. Recently, other researchers appear to be offering recommendations
similar to those inherent in the XBA approach (e.g., Dehn, 2006; Fiorello & Hale, 2006).
The definition of XBA as well as the rationale and foundations for and applications of this approach
are depicted in Figure 1.1 and are described briefly in the following sections.
DEFINITION

The XBA approach is a time-efficient method of cognitive assessment that is grounded in CHC theory
and research. It allows practitioners to reliably measure a wider range (or a more in-depth but
selective range) of cognitive abilities/processes than that represented by a single intelligence battery.
The XBA approach is based on three foundational sources of information or three pillars. Together,
these pillars (described later in this chapter) provide the knowledge base necessary to organize
theory-driven, comprehensive, reliable, and valid assessment of cognitive abilities/processes.

Rapid Reference 1.1




Parallel Needs in Cognitive Assessment-Related Fields Addressed by the XBA Approach



Figure 1.1 Overview of the XBA Approach

DON'T FORGET

The XBA approach allows practitioners to reliably measure a wider range (or a more in-depth
but selective range) of cognitive abilities/processes than that represented by a single intelligence
battery.

RATIONALE FOR THE XBA APPROACH

The XBA approach has significant implications for practice, research, and test development. A brief
discussion of these implications follows.
Practice

The XBA approach provides "a much needed and updated bridge between current intellectual theory
and research and practice" (Flanagan & McGrew, 1997, p. 322). The results of several joint factor
analyses conducted over the past 10+ years demonstrated that none of our intelligence batteries
contained measures that sufficiently approximated the full range of broad abilities/processes that
define the structure of intelligence specified in contemporary psychometric theory (e.g., Carroll,
1993; Flanagan & McGrew, 1998; Horn, 1991; Keith, Kranzler, & Flanagan, 2001; McGrew, 1997;
Phelps, McGrew, Knopik, & Ford, 2005; Woodcock, 1990). Indeed, the joint factor analyses
conducted by Woodcock suggested that it may be necessary to cross batteries to measure a broader
range of cognitive abilities than that provided by a single intelligence battery.
A summary of the findings of the joint factor analytic studies of intelligence batteries that were
published before 2000 is presented in Rapid Reference 1.2. As may be seen in this table, most
batteries fell far short of measuring all seven of the broad cognitive abilities/processes listed. Of the
major intelligence batteries in use prior to 2000, most failed to measure three or more broad CHC
abilities (viz., Ga, Glr, Gf, and Gs) that were (and are) considered important in understanding and
predicting school achievement. In fact, Gf, often considered to be the essence of intelligence, was
either not measured or not measured adequately by most of the intelligence batteries included in
Rapid Reference 1.2 (i.e., WISC-III, WAIS-R, WPPSI-R, K-ABC, and CAS; Alfonso, Flanagan, &
Radwan, 2005).

Rapid Reference 1.2




Representation of Broad CHC Abilities/Processes on Nine Intelligence Batteries Published
Prior to 2000



The finding that the abilities not measured by the intelligence batteries listed in Rapid Reference 1.2
are important in understanding children's learning difficulties provided the impetus for developing
the XBA approach (Flanagan & McGrew, 1997). In effect, the XBA approach was developed to
systematically replace the dashes in Rapid Reference 1.2 with tests from another battery. As such, this
approach guides practitioners in the selection of tests, both core and supplemental, that together
provide measurement of abilities/processes that is considered sufficient in both breadth and depth for
the purpose of addressing referral concerns.

DON'T FORGET

The XBA approach guides practitioners in the selection of tests, both core and supplemental, that
together provide measurement of abilities/processes that is considered sufficient in both breadth
and depth for the purpose of addressing referral concerns.

Another benefit of the XBA approach is that it facilitates communication among professionals.
Most scientific disciplines have a standard nomenclature (i.e., a common set of terms and definitions)
that facilitates communication and guards against misinterpretation. For example, the standard
nomenclature in chemistry is reflected in the Periodic Table; in biology, it is reflected in the
classification of animals according to phyla; in psychology and psychiatry, it is reflected in the
Diagnostic and Statistical Manual of Mental Disorders; and in medicine, it is reflected in the
International Classification of Diseases. Underlying the XBA approach is a standard nomenclature,
or Table of Human Cognitive Abilities, that includes classifications of over 500 tests according to the
broad and narrow CHC abilities/processes they measure (see also Alfonso et al., 2005; Flanagan &
Ortiz, 2001; Flanagan, McGrew, & Ortiz, 2000; Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006).
The XBA classification system has had a positive impact on communication among practitioners, has
improved research on the relations between cognitive and academic constructs, and has resulted in
substantial improvements in the measurement of cognitive constructs, as may be seen in the design
and structure of current intelligence batteries (e.g., WJ III, KABC-II, DAS-II, SB5).
Finally, the XBA approach offers practitioners a psychometrically defensible means of identifying
population-relative (or normative) strengths and weaknesses in cognitive abilities/processes.
According to Brackett and McPherson (1996), the limited capacity of standardized instruments to
assess isolated cognitive processes creates a major weakness in intracognitive discrepancy models.
"Although analysis of [Wechsler] subtests typically report measures of distinct cognitive abilities, such
abilities may not emerge by individual subtests but rather in combination with other subtests" (p. 79).
The XBA approach addresses this limitation. By focusing interpretations on cognitive ability clusters
(i.e., via combinations of construct-relevant subtests) that contain qualitatively different indicators of
each broad CHC cognitive ability/process, the identification of normative processing strengths and
weaknesses via XBA procedures is both psychometrically defensible and theoretically sound. In sum,
the XBA approach addresses the longstanding need within the entire field of assessment, from
learning disabilities to neuropsychological assessment, for methods that provide a greater range of
information about the ways individuals learn: "the ways individuals receive, store, integrate, and
express information" (Brackett & McPherson, p. 80). Because current intelligence tests provide a
broader range of information than their predecessors, it is not surprising that results of recent studies
demonstrated that specific cognitive abilities/processes explain significant variance in academic
outcomes (e.g., reading achievement) above and beyond the variance accounted for by g (e.g., Floyd,
Keith, Taub, & McGrew, 2006; Vanderwood, McGrew, Flanagan, & Keith, 2002).
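
To illustrate what a population-relative (normative) classification looks like in practice, the following is a minimal Python sketch; it is our own illustration rather than part of the XBA materials, and the cut-points of 85 and 115 (roughly one standard deviation around the normative mean of 100) follow a common convention rather than the only defensible criteria.

```python
# Minimal sketch of a population-relative (normative) classification of a
# cluster standard score (mean = 100, SD = 15). The cut-points of 85 and 115
# (about one SD from the mean) follow a common convention and are
# illustrative only.
def normative_descriptor(standard_score):
    if standard_score < 85:
        return "normative weakness"
    if standard_score > 115:
        return "normative strength"
    return "within normal limits"

print(normative_descriptor(78))   # normative weakness
print(normative_descriptor(103))  # within normal limits
```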

DON'T FORGET

The XBA approach offers practitioners a psychometrically defensible means of identifying
population-relative (or normative) strengths and weaknesses in cognitive abilities/processes.

Research

The XBA approach was also developed to promote a greater understanding of the relationship
between cognitive abilities and important outcome criteria. Because XBAs are based on the
empirically supported CHC theory and constructed in a psychometrically defensible manner, they
represent a valid means of measuring cognitive constructs (Flanagan, 2000; Phelps et al., 2005). It is
noteworthy that when second-order constructs are composed of (moderately) correlated but
qualitatively distinct measures, they will tend to have higher correlations with complex criteria (e.g.,
academic achievement), as compared to lower-order constructs, because they are broader in what
they measure (Comrey, 1988). Predictive statements about different achievements (i.e., criterion-
related inferences) that are made from XBA clusters are based on a more solid foundation than those
made from individual subtests (and perhaps some global scores from single intelligence batteries) because the
predictor constructs are represented by relatively pure and qualitatively distinct measures of broad
CHC abilities/processes. Thus, improving the validity of CHC ability measures (i.e., intelligence
batteries) has further elucidated the relations between CHC cognitive abilities/processes and different
achievement and vocational/occupational outcomes (e.g., Flanagan, 2000; Floyd, Bergeron, &
Alfonso, 2006; Floyd, Keith, Taub, & McGrew, in press; McGrew, 1997; Vanderwood, McGrew,
Flanagan, & Keith, 2002).
Test Development

Although there was substantial evidence of at least eight or nine broad cognitive CHC
abilities/processes by the late 1980s, the tests of the time did not reflect this diversity in measurement.
For example, Rapid Reference 1.2 shows that the WISC-III, WPPSI-R, K-ABC, KAIT, WAIS-R, and
CAS batteries only measured two or three broad CHC abilities/processes adequately. The Wechslers
primarily measured Gv and Gc. The K-ABC primarily measured Gv and Gsm, and to a much lesser
extent Gf, while the KAIT primarily measured Gc, Gf, and Glr, and to a much lesser extent Gv. The
CAS measured Gs, Gsm, and Gv. Finally, while the DAS and SB:FE did not provide sufficient
coverage of abilities to narrow the gap between contemporary theory and practice, their
comprehensive measurement of approximately four CHC abilities was nonetheless an improvement
over the previously mentioned batteries. Rapid Reference 1.2 shows that only the WJ-R included
measures of all broad cognitive abilities as compared to the other batteries available at that time.
Nevertheless, most of the broad abilities were not measured adequately by the WJ-R (Alfonso et al.,
2005; McGrew & Flanagan, 1998).
In general, Rapid Reference 1.2 shows that Gf, Gsm, Glr, Ga, and Gs were not measured well by the
majority of intelligence batteries published prior to 2000. Therefore, it is clear that most test authors
did not use contemporary psychometric theories of the structure of cognitive abilities to guide the
development of their intelligence batteries. As such, a substantial theory-practice gap existed; that is,
theories of the structure of cognitive abilities were far in advance of commonly used intelligence
batteries. In fact, prior to the mid-1980s, theory seldom played a role in intelligence test development.
The numerous dashes in Rapid Reference 1.2 exemplify the theory-practice gap that existed in the
field of intellectual assessment at that time (Alfonso et al., 2005).
In the past decade, Gf-Gc theory, and more recently CHC theory, has had a significant impact on the
revision of old and the development of new intelligence batteries. For example, a wider range of
broad and narrow abilities/processes is represented on current intelligence batteries than that which
was represented on previous editions of these tests. Rapid Reference 1.3 provides several salient
examples of the impact that CHC theory and XBA CHC test classifications have had on intelligence
test development in recent years. This rapid reference lists the major intelligence tests in the order in
which they were revised, beginning with those tests with the greatest number of years between
revisions (i.e., K-ABC) and ending with newly developed tests (i.e., RIAS and WRIT) and tests that
have yet to be revised (e.g., CAS). As is obvious from a review of Rapid Reference 1.3, CHC theory
and XBA CHC test classifications have had a significant impact on recent test development (Alfonso
et al., 2005).
Of the seven intelligence batteries (including both comprehensive and brief measures) that were
published since 2000, the test authors of four clearly used CHC theory and XBA CHC test
classifications as a blueprint for test development (i.e., WJ III, SB5, KABC-II, and DAS-II), and the test
authors of two were obviously influenced by CHC theory (i.e., RIAS and WRIT). Only the authors of
the Wechsler Scales (i.e., WPPSI-III, WISC-IV, WAIS-III) and CAS did not state explicitly that CHC
theory was used as a guide for revision. 1 Nevertheless, the authors of the Wechsler Scales
acknowledged the research of Cattell, Horn, and Carroll in their most recent manuals (Wechsler,
2002, 2003). Presently, as Rapid Reference 1.3 shows, nearly all comprehensive, individually
administered intelligence batteries that are used with some regularity subscribe either explicitly or
implicitly to CHC theory (Alfonso et al., 2005; Flanagan et al., 2006).

Rapid Reference 1.3




Impact of CHC Theory and XBA CHC Test Classification on Intelligence Test Development


Convergence toward the incorporation of CHC theory is also seen clearly in Rapid Reference 1.4.
This table is identical to Rapid Reference 1.2 except it includes all intelligence batteries that were
published after 2000, including recent revisions of many of the tests from Rapid Reference 1.2. A
comparison of Rapid Reference 1.2 and Rapid Reference 1.4 shows that many of the gaps in
measurement of broad cognitive abilities have been filled. Specifically, the majority of tests published
after 2000 now measure four or five broad cognitive abilities adequately (see Rapid Reference 1.4) as
compared to two or three (see Rapid Reference 1.2). For example, Rapid Reference 1.4 shows that the
WISC-IV, WAIS-III, WPPSI-III, KABC-II, SB5, and DAS-II measure four or five CHC broad abilities.
The WISC-IV measures Gf, Gc, Gv, Gsm, and Gs while the KABC-II measures Gf, Gc, Gv, and Glr
adequately, and to a lesser extent Gsm. The WAIS-III measures Gc, Gv, Gsm, and Gs adequately, and to
a lesser extent Gf, while the WPPSI-III measures Gf, Gc, Gv, and Gs adequately. Finally, the SB5
measures four CHC broad abilities adequately (i.e., Gf, Gc, Gv, Gsm; Alfonso et al., 2005) and the
DAS-II measures five CHC broad abilities adequately (i.e., Gf, Gc, Gv, Gsm, and Glr) and to a lesser
extent, Ga and Gs.
Rapid Reference 1.4 shows that the WJ III continues to include measures of all the major broad
cognitive abilities/processes and now measures them well, particularly when it is used in conjunction
with the Diagnostic Supplement (DS; Woodcock, McGrew, Mather, & Schrank, 2003). Third, a
comparison of Rapid References 1.2 and 1.4 indicates that two broad abilities/processes not measured
by many intelligence batteries prior to 2000 are now measured by the majority of intelligence
batteries available today; that is, Gf and Gsm. These broad abilities/processes may be better
represented on revised and new intelligence batteries because of the accumulating research evidence
regarding their importance in overall academic success (see Chapter 2). Finally, Rapid Reference 1.4
reveals that intelligence batteries continue to fall short in their measurement of three CHC broad
abilities/processes; specifically, Glr, Ga, and Gs. In addition, current intelligence batteries do not
provide adequate measurement of most specific or narrow CHC abilities/processes, many of which
are important in predicting academic achievement. Thus, although there is greater coverage of CHC
broad abilities/processes now than there was just a few years ago, the need for the XBA approach to
assessment remains (Alfonso et al., 2005).

Rapid Reference 1.4




Representation of Broad CHC Abilities/Processes on Nine Intelligence Batteries Published
After 2000

DON'T FORGET

Nearly all comprehensive, individually administered intelligence batteries that are used with
some regularity subscribe either explicitly or implicitly to CHC theory.

Rapid Reference 1.5


Three Pillars of the XBA Approach



The first pillar of the approach is a relatively complete taxonomic framework for
describing the structure and nature of cognitive abilities. This taxonomy is the Cattell-
Horn-Carroll theory of cognitive abilities (CHC theory).
The second pillar of the approach is the CHC broad (stratum II) classifications of
cognitive and achievement tests.
The third pillar of the approach is the CHC narrow (stratum I) classifications of cognitive
and achievement tests.


THE THREE PILLARS OF THE XBA APPROACH

The three pillars of the XBA approach include contemporary CHC theory and the broad and narrow
CHC ability classifications of all subtests that comprise current cognitive and achievement batteries as
well as numerous special purpose tests. Each pillar is defined briefly in the following sections and in
Rapid Reference 1.5.
The First Pillar of the XBA Approach: CHC Theory

The CHC theory was selected to guide assessment and interpretation because it is based on a more
thorough network of validity evidence than any other contemporary multidimensional model of
intelligence within the psychometric tradition (see McGrew, 2005; Messick, 1992; Sternberg &
Kaufman, 1998). According to Daniel (1997), the strength of the multiple (CHC) cognitive abilities
model is that "it was arrived at by synthesizing hundreds of factor analyses conducted over decades
by independent researchers using many different collections of tests. Never before has a
psychometric ability model been so firmly grounded in data" (pp. 1042-1043). Because nearly all
current intelligence batteries are based on CHC theory, it will not be described in detail in this chapter.
For a detailed presentation of CHC theory and comprehensive definitions of all broad and narrow
CHC abilities/processes, see Appendix A of this book.
The Second Pillar of the XBA Approach: CHC Broad (Stratum II) Classifications of Cognitive
and Achievement Tests


Based on the results of a series of cross-battery confirmatory factor analysis studies of the major
intelligence batteries and the task analyses of many intelligence test experts, Flanagan and colleagues
classified all the subtests of the major intelligence and achievement batteries according to the
particular CHC broad abilities/processes they measured (e.g., Flanagan et al., 2006). To date, well
over 500 CHC broad ability classifications have been made based on the results of these studies.
These classifications of cognitive and achievement tests assist practitioners in identifying measures
that assess the various broad and narrow abilities/processes represented in CHC theory. Classification
of tests at the broad ability/processing level is necessary to improve upon the validity of cognitive
assessment and interpretation. Specifically, broad ability classifications ensure that the CHC
constructs that underlie assessments are minimally affected by construct-irrelevant variance
(Messick, 1989, 1995). In other words, knowing what tests measure what abilities/processes enables
clinicians to organize tests into construct-relevant clusters, that is, clusters that contain only measures
that are relevant to the construct of interest.
To clarify, construct-irrelevant variance is present when an assessment is too broad, containing
"excess reliable variance associated with other distinct constructs . . . that affects responses in a manner
irrelevant to the interpreted constructs" (Messick, 1995, p. 742). For example, the WAIS-III Verbal IQ
(VIQ) has construct-irrelevant variance because, in addition to its four indicators of Gc (i.e.,
Information, Similarities, Vocabulary, Comprehension), it has one indicator of Gq (i.e., Arithmetic)
and one indicator of Gsm (i.e., Digit Span). Therefore, the VIQ is a mixed measure of three distinct,
broad CHC abilities/processes (Gc, Gq, and Gsm); it contains reliable variance (associated with Gq
and Gsm) that is irrelevant to the construct intended to be interpreted (i.e., Gc; McGrew & Flanagan,
1998). The Wechsler VIQ represents a grouping together of subtests on the basis of face validity (e.g.,
grouping tests together that appear to measure the same common concept), an inappropriate
aggregation of subtests that can actually decrease reliability and validity (Epstein, 1983). The purest
Gc composite on the WAIS-III is the Verbal Comprehension Index, because it contains only construct-
relevant variance.
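
To make the idea of construct-relevant clusters concrete, the sketch below (our own illustration, not a component of the XBA approach or the XBA DMIA) shows how a small CHC broad-ability lookup can flag subtests that contribute construct-irrelevant variance to a composite; the subtest classifications mirror the WAIS-III VIQ example above.

```python
# Illustrative sketch only: a tiny CHC broad-ability lookup used to flag
# construct-irrelevant variance in a composite. The classifications below
# mirror the WAIS-III VIQ example discussed in the text.
CHC_BROAD = {
    "Information": "Gc",
    "Similarities": "Gc",
    "Vocabulary": "Gc",
    "Comprehension": "Gc",
    "Arithmetic": "Gq",
    "Digit Span": "Gsm",
}

def irrelevant_subtests(composite_subtests, target_ability):
    """Return the subtests whose broad CHC classification does not match
    the construct the composite is intended to measure."""
    return [s for s in composite_subtests if CHC_BROAD.get(s) != target_ability]

viq = ["Information", "Similarities", "Vocabulary", "Comprehension",
       "Arithmetic", "Digit Span"]
print(irrelevant_subtests(viq, "Gc"))  # ['Arithmetic', 'Digit Span']
```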

DON'T FORGET

Invalidity in Assessment

Construct-irrelevant variance: excess reliable variance associated with other distinct constructs
that affects responses in a manner irrelevant to the interpreted construct. The XBA approach
guards against this major source of invalidity in assessment by ensuring that only validated
measures of a cognitive construct are included in an XBA designed to measure that construct.
The XBA DMIA organizes the subtests of the major intelligence batteries according to the broad
abilities/processes they measure to assist practitioners in designing assessments that measure
constructs validly.

CAUTION

Clusters that contain construct-irrelevant variance are psychologically ambiguous and difficult
to interpret. For example, the traditional Wechsler VIQ contained variance (Gq) that was
irrelevant to the construct intended to be interpreted (Gc). The Verbal Comprehension Index
eliminated the irrelevant Gq variance and, therefore, represented a purer measure of Gc as
compared to the VIQ.

Construct-irrelevant variance can also operate at the subtest (as opposed to composite) level. For
example, a Verbal Analogies test (e.g., Sun is to day as moon is to ___) measures both Gc and Gf. That is,
in theory-driven factor-analytic studies, Verbal Analogies tests have significant loadings on both the
Gc and Gf factors (e.g., Woodcock, 1990). Therefore, this test is considered factorially complex, a
condition that complicates interpretation (e.g., Is poor performance due to low vocabulary knowledge
[Gc] or poor reasoning ability [Gf], or both?).
In short, "[A]ny test that measures more than one common factor to a substantial degree yields
scores that are psychologically ambiguous and very difficult to interpret" (Guilford, 1954, p. 356;
cited in Briggs & Cheek, 1986). Interpretation is far less complicated when composites are derived
from relatively pure measures of the underlying construct. Therefore, XBAs are typically designed
using only empirically strong or moderate (but not factorially complex or mixed) measures of CHC
abilities/processes, following the information presented in Appendix B.2
The Third Pillar of the XBA Approach: CHC Narrow (Stratum I) Classifications of Cognitive
and Achievement Tests


Narrow ability/processing classifications were originally reported in McGrew (1997), then later
reported in McGrew and Flanagan (1998) and Flanagan et al. (2000) following minor modifications.
Flanagan and her colleagues continued to gather content validity data on cognitive tests and expanded
their analyses recently to include tests of academic achievement (Flanagan et al., 2002, 2006).
Classifications of cognitive tests according to content, format, and task demand at the narrow (stratum
I) ability/processing level were necessary to improve further upon the validity of intellectual
assessment and interpretation (see Messick, 1989). Specifically, these narrow ability classifications
were necessary to ensure that the CHC constructs that underlie assessments are well represented.
According to Messick (1995), construct underrepresentation is present when an assessment is "too
narrow and fails to include important dimensions or facets of the construct" (p. 742).
Interpreting the WJ III Concept Formation (CF) subtest as a measure of Fluid Intelligence (i.e., the
broad Gf ability/process) is an example of construct underrepresentation. This is because CF
measures one narrow aspect of Gf (viz., Inductive Reasoning). At least one other Gf measure (i.e.,
subtest) that is qualitatively different from Inductive Reasoning must be included in an
assessment to ensure adequate representation of the Gf construct (e.g., a measure of General
Sequential [or Deductive] Reasoning). Two or more qualitatively different indicators (i.e., measures
of two or more narrow abilities/processes subsumed by the broad ability/process) are needed for
appropriate construct representation (see Comrey, 1988; Messick, 1989, 1995). The aggregate of CF
(a measure of Inductive Reasoning at the narrow ability level) and the WJ III Analysis-Synthesis test
(a measure of General Sequential Reasoning at the narrow ability level), for example, would provide
an adequate estimate of the broad Gf construct because these tests are strong measures of Gf and
represent qualitatively different aspects of Gf (see Appendix B).
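
The two-indicator rule can also be expressed as a simple check. The following hypothetical sketch (ours, not the XBA DMIA) uses a narrow-ability lookup for the subtests discussed in this chapter to test whether a proposed cluster samples at least two qualitatively different narrow abilities/processes.

```python
# Hypothetical sketch: check that a proposed broad-ability cluster samples at
# least two qualitatively different narrow abilities/processes. Narrow-ability
# labels follow the examples discussed in the text.
NARROW_ABILITY = {
    "WJ III Concept Formation": "Inductive Reasoning",
    "WJ III Analysis-Synthesis": "General Sequential Reasoning",
    "KABC-II Number Recall": "Memory Span",
    "KABC-II Word Order": "Memory Span",
}

def adequately_represented(cluster_subtests):
    """A broad construct is adequately represented only when its cluster
    contains two or more distinct narrow-ability indicators."""
    return len({NARROW_ABILITY[s] for s in cluster_subtests}) >= 2

print(adequately_represented(
    ["WJ III Concept Formation", "WJ III Analysis-Synthesis"]))  # True
print(adequately_represented(
    ["KABC-II Number Recall", "KABC-II Word Order"]))            # False
```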

DON'T FORGET

Invalidity in Assessment

Construct underrepresentation: present when an assessment is too narrow and fails to include
important dimensions or facets of a construct.
The XBA approach guards against this major source of invalidity in assessment by ensuring that
at least two different components of a cognitive construct are included in an XBA cluster
designed to measure that construct.
The XBA DMIA organizes the subtests of the major intelligence batteries according to the
narrow abilities/processes they measure to assist practitioners in designing assessments that
measure constructs validly.

The Verbal Comprehension Index (VCI) of the WAIS-III is an example of good construct
representation. This is because the VCI includes Vocabulary (VL), Similarities (LD/VL),
Comprehension (LD), and Information (K0), all of which represent qualitatively different aspects of
Gc. Most intelligence batteries yield construct-relevant composites, although some of these
composites underrepresent the broad ability intended to be measured. This is because construct
underrepresentation can also occur when the composite consists of two or more measures of the same
narrow (stratum I) ability/process. For example, the Number Recall and Word Order subtests of the
KABC-II were intended to be interpreted as a representation of the broad Gsm ability/process
(Kaufman & Kaufman, 2004). However, these subtests primarily measure Memory Span, a narrow
ability/process subsumed by Gsm. Thus, the Gsm cluster of the KABC-II is most appropriately
interpreted as Memory Span (a narrow ability/process) rather than an estimate of the broad Gsm
ability/process.
"A scale [or broad CHC ability cluster] will yield far more information, and, hence, be a more
valid measure of a construct, if it contains more differentiated items [or tests]" (Clark & Watson,
1995, p. 316). Cross-battery assessments circumvent the misinterpretations that can result from
underrepresented constructs by specifying the use of two or more qualitatively different indicators to
represent each broad CHC ability/process. In order to ensure that qualitatively different aspects of
broad abilities/processes are represented in assessment, classification of cognitive and achievement
tests at the narrow (stratum I) ability/processing level was necessary. The subtests of current
intelligence batteries, special purpose tests, and comprehensive achievement batteries are classified at
both the broad and narrow ability/processing levels throughout this book (see Appendix B for a
summary).
In sum, the latter two XBA pillars guard against two ubiquitous sources of invalidity in assessment:
construct-irrelevant variance and construct underrepresentation. Taken together, the three pillars
underlying the XBA approach provide the necessary foundation from which to organize assessments
of cognitive and achievement constructs that are more theoretically driven, comprehensive, and valid.
Prior to discussing the applications of the XBA approach, it is necessary to highlight the various
ways in which the approach has evolved since the first edition of this book. As noted earlier, nearly
all frequently used intelligence batteries have been revised in recent years. Additionally, these
revisions of the major intelligence batteries are among the most substantial in the history of
intellectual assessment. As a result, nearly all intelligence batteries include measurement of a broader
range of cognitive constructs and, indeed, constructs from a single psychometric theory: CHC
theory. Because intelligence batteries are substantially better than their predecessors from both a
psychometric and theoretical standpoint, the application of XBA methods is less involved.
Specifically, the mechanics of the approach are simpler and may be carried out effortlessly
using the automated program included on the CD-ROM accompanying this book (this program
hereafter referred to as XBA DMIA, which stands for Cross-Battery Assessment Data Management
and Interpretive Assistant). Additionally, the interpretation of test performance is enhanced by the
XBA DMIA as well as by the interpretive statements that correspond to this programs output. That is,
we provide interpretive statements (that may be used verbatim in a psychoeducational report) for
every possible outcome of a broad or narrow XBA cluster calculated by the XBA DMIA (see Chapter
3). Rapid Reference 1.6 lists the major changes that have taken place in the XBA approach since the
publication of the first edition of this book.

Rapid Reference 1.6




New Features of the XBA Approach

1. More easily incorporates and integrates all current intelligence batteries (i.e., WISC-IV,
WAIS-III, WPPSI-III, KABC-II, WJ III, SB5, and DAS-II), numerous special purpose tests,
and tests of academic achievement.
2. Uses core tests (and supplemental tests as may be necessary) from a single battery, rather
than selected components of a battery, as part of the assessment because (a) current
intelligence tests have better representation of the broad CHC abilities/processes and use
only two or three subtests to represent them; and (b) the broad abilities/processes
measured by current intelligence batteries are typically represented by qualitatively
different indicators that are relevant only to the broad ability/processes intended to be
measured.
3. Uses actual norms provided by the test's publisher for CHC broad ability clusters when
available.
4. Places greater emphasis on narrow CHC abilities/processes as supported by research
linking them to acquisition and development of specific academic skills.
5. Includes an automated program called Cross-Battery Assessment Data Management and
Interpretive Assistant (XBA DMIA) (on the CD-ROM that accompanies this book), that
incorporates and integrates all features of the XBA approach. For example, the XBA
DMIA

Incorporates and integrates components of prevailing interpretive systems of the


major intelligence batteries, including optional clinical clusters unique to WISC-
IV, WAIS-III, and SB5.
Calculates CHC broad and narrow ability/processing clusters that are generated
from either two or three individual subtests.
Graphs data to provide a pictorial representation of all interpretable broad and
narrow ability/processing clusters and the subtests that comprise them.

6. Includes interpretive statements for all possible outcomes regarding data from two or
three subtest combinations for broad and narrow ability/processing areas.
7. Expands coverage of CHC theory to include abilities typically measured on achievement
tests (e.g., Broad Reading and Writing [Grw], Quantitative Knowledge [Gq], and extended
components of Auditory Processing [Ga]), providing additional information useful in the
identification of specific learning disability (SLD).
8. Incorporates the identification of disorders in basic psychological processes in the
interpretive system in a manner consistent with the definition of SLD in IDEA 2004 and
includes an automated program called SLD Assistant.
9. Includes advancements to the interpretive system of the Culture-Language Interpretive
Matrix used with culturally and linguistically diverse individuals.
10. Includes an automated program called Culture-Language Interpretive Matrix (C-LIM),
which calculates and graphs results to facilitate decision making as it pertains to
differentiating difference from disability with individuals from culturally and
linguistically diverse backgrounds.


APPLICATION OF THE XBA APPROACH

Guiding Principles

In order to ensure that XBA procedures are psychometrically and theoretically sound, it is
recommended that practitioners adhere to several guiding principles. These principles were listed
previously in Figure 1.1 and are defined briefly in the following section.
First, select an intelligence battery that best addresses referral concerns. It is expected that the
battery of choice will be one that is deemed most responsive to referral concerns. These batteries may
include, but are certainly not limited to, the Wechsler Scales, WJ III, SB5, KABC-II, and DAS-II. It is
important to note that the use of conormed tests, such as the WJ III tests of cognitive ability and tests
of achievement and the KABC-II and KTEA-II, may allow for the widest coverage of broad and
narrow CHC abilities/processes.
Second, use subtests and clusters/composites from a single battery whenever possible to represent
broad CHC abilities/processes. In other words, best practices involve using actual norms whenever
they are available in lieu of arithmetic averages of scaled scores from different batteries. In the past, it
was necessary to convert subtest-scaled scores from different batteries to a common metric (using the
table in Appendix E, for example) and then average them (after determining that there was a
nonsignificant difference between the scores) in order to build construct-relevant broad CHC
ability/processing clusters. Because the development of current intelligence batteries benefited greatly from
current CHC theory and research, this practice is seldom necessary at the broad ability/processing
level. It continues to be necessary when testing hypotheses about aberrant performance within broad
ability/processing domains and when measurement of narrow abilities/processes is deemed necessary
(see Chapters 2 and 3).
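
For readers who want to see the arithmetic behind this principle, the following is a minimal sketch of the conversion-and-averaging procedure described above; Appendix E supplies the actual conversion table, and the linear formula here simply assumes the usual subtest scaled-score metric (mean = 10, SD = 3) and standard-score metric (mean = 100, SD = 15).

```python
# Minimal arithmetic sketch of forming a cross-battery cluster when subtest
# scores come from different metrics. Appendix E provides the actual
# conversion table; the linear rescaling below assumes scaled scores with
# mean 10 / SD 3 and a standard-score metric with mean 100 / SD 15.
def scaled_to_standard(scaled_score):
    """Rescale a subtest scaled score (M = 10, SD = 3) to the standard-score
    metric (M = 100, SD = 15)."""
    return 100 + 15 * (scaled_score - 10) / 3

def cluster_estimate(standard_scores):
    """Average construct-relevant standard scores to estimate a broad or
    narrow ability cluster (done only after confirming the scores do not
    differ significantly)."""
    return sum(standard_scores) / len(standard_scores)

subtest_a = scaled_to_standard(12)  # 110.0
subtest_b = 95                      # already on the standard-score metric
print(cluster_estimate([subtest_a, subtest_b]))  # 102.5
```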
Third, when constructing CHC broad and narrow ability clusters, select tests that have been
classified through an acceptable method, such as through CHC theory-driven factor analyses or
expert consensus content-validity studies. All test classifications included in the works of Flanagan
and colleagues have been classified through these acceptable methods (Flanagan & Ortiz, 2001;
Flanagan et al., 2006). For example, when constructing broad (stratum II) ability/processing clusters,
relatively pure CHC indicators should be included (i.e., tests that had either strong or moderate [but
not mixed] loadings on their respective factors in theory-driven factor analyses). Furthermore, to
ensure appropriate construct representation when constructing broad (stratum II) ability/processing
clusters, two or more qualitatively different narrow (stratum I) ability/processing indicators should be
included to represent each domain. Without empirical classifications of tests, constructs may not be
adequately represented and, therefore, inferences about an individual's broad (stratum II)
ability/process cannot be made. Of course, the more broadly a construct is represented (i.e., through
the derivation of a cluster based on multiple qualitatively different narrow ability/processing
indicators), the more confidence one has in drawing inferences about the ability/process presumed to
underlie it. A minimum of two qualitatively different indicators per CHC cluster is recommended in
the XBA approach for practical reasons (viz., time efficient assessment).
Fourth, when at least two qualitatively different indicators of a broad ability/process of interest are
not available on the core battery, supplement the core battery with at least two qualitatively
different indicators of that broad ability/process from another battery. In other words, if an evaluator is
interested in measuring Auditory Processing (Ga), and the core battery includes either one or no Ga
subtests, then select a Ga cluster from another battery to supplement the core battery.
Fifth, when crossing batteries (e.g., augmenting a core battery with relevant CHC clusters from
another battery) or when constructing CHC broad or narrow ability/processing clusters using tests
from different batteries (e.g., averaging scores when the construct of interest is not available on a
single battery), select tests that were developed and normed within a few years of one another to
minimize the effect of spurious differences between test scores that may be attributable to the Flynn
effect (Flynn, 1984). The subtests listed in the XBA DMIA are from batteries and tests that were
normed within 10 years of one another.
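A minimal sketch of this norming-date check follows; the test names and norming years below are
hypothetical placeholders, and the 10-year window simply restates the figure given above.

    CORE_NORMING_YEAR = 2003  # hypothetical norming year of the core battery

    # Hypothetical supplemental tests and their norming years.
    candidate_tests = {"Supplemental Test A": 2001, "Supplemental Test B": 1989}

    def within_norming_window(candidate_year, core_year=CORE_NORMING_YEAR, window=10):
        # Flag tests normed within the chosen window of the core battery's norming year.
        return abs(candidate_year - core_year) <= window

    acceptable = [name for name, year in candidate_tests.items()
                  if within_norming_window(year)]
    # acceptable == ['Supplemental Test A']
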
Sixth, select tests from the smallest number of batteries to minimize the effect of spurious
differences between test scores that may be attributable to differences in the characteristics of
independent norm samples (McGrew, 1994). In most cases, using select tests from a single battery to
augment the constructs measured by any other major intelligence battery is sufficient to represent the
breadth of broad cognitive abilities/processes adequately as well as to allow for at least three
qualitatively different narrow ability/processing indicators of most CHC cognitive constructs.

Rapid Reference 1.7


XBA Step-by-Step

1. Select primary intelligence battery for assessment.


2. Identify adequately represented CHC abilities/processes.
3. Select tests to measure CHC abilities/processes not measured by primary battery.
4. Administer primary battery and any supplemental tests as necessary.
5. Enter data into the XBA DMIA.
6. Follow XBA guidelines presented in Chapter 3 to interpret XBA DMIA output.


Noteworthy is the fact that when the XBA guiding principles are implemented systematically and
the recommendations for development, use, and interpretation of clusters are adhered to, the potential
error introduced through the crossing of norm groups is negligible (Flanagan & Ortiz, 2001;
McGrew & Flanagan, 1998). Furthermore, although there are other limitations to crossing batteries,
this systematic approach to the assessment and interpretation of cognitive abilities/processes carries
far less potential for error than the improper use and interpretation of cognitive performance
inherent in many currently used assessment approaches
(e.g., subtest analysis, discrepancy analysis, atheoretical approaches to assessment and interpretation,
and so forth).
IMPLEMENTATION OF THE XBA APPROACH STEP-BY-STEP

The XBA approach may be carried out following a straightforward set of steps. These steps are
outlined in Rapid Reference 1.7 and described in further detail in Chapter 2.
USE OF THE XBA APPROACH WITH CULTURALLY AND LINGUISTICALLY
DIVERSE POPULATIONS

Application of the XBA approach with diverse individuals rests on the premise that an empirically
based selection of tests, known to represent particular constructs, coupled with a consideration of the
relevant cultural and linguistic dimensions of such tests, can provide more reliable, valid, and
interpretable data than that ordinarily obtained using traditional methods. Careful and deliberate
selection of tests, based on factors relevant to the background of the individual being assessed, creates
a unique battery of tests that is responsive to the particular referral questions. Using the XBA
approach, practitioners can develop custom batteries for individuals of culturally and linguistically
diverse backgrounds that differ as a function of both the specific language competencies and the
cultural experiences of the individual, as well as the specific nature of the referral concerns. With
respect to issues of bias related to test selection, the basic goal in constructing XBAs for use with
diverse individuals is to ensure a balance between empirical issues and considerations related to
cultural and linguistic factors. The construction of an appropriate XBA for use with diverse
individuals is presented in Chapter 5 along with a detailed explanation of how to interpret their test
performances.
CONCLUSIONS

Recent refinements to the XBA approach, including automating the process, have made this method of
assessment both practical and easy to implement. Its continued popularity revolves around its use in
the identification of students with specific learning disability (Chapter 4) and in assisting in the
process of determining difference from disability in students from culturally and linguistically
diverse backgrounds (Chapter 5). This is because the XBA approach (a) allows for flexibility in
designing assessment batteries to meet the unique needs of the individual; (b) provides a defensible
interpretive method for identifying cognitive ability/processing strengths and weaknesses (important
in the evaluation of learning disabilities); and (c) is systematic, specifying steps for evaluating the
cognitive capabilities of individuals with learning needs, including those from diverse cultural and
linguistic backgrounds.
REFERENCES

Alfonso, V. C., Flanagan, D. P., & Radwan, S. (2005). The impact of the Cattell-Horn-Carroll Theory
on test development and interpretation of cognitive and academic abilities. In D. P. Flanagan & P. L.
Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 185-
202). New York: Guilford.
Brackett, J., & McPherson, A. (1996). Learning disabilities diagnosis in postsecondary students: A
comparison of discrepancy-based diagnostic models. In N. Gregg, C. Hoy, & A. F. Gay (Eds.), Adults
with learning disabilities: Theoretical and practical perspectives (pp. 68-84). New York: Guilford.
Briggs, S. R., & Cheek, J. M. (1986). The role of factor analysis in the development and evaluation of
personality scales [Special Issue: Methodological developments in personality research]. Journal of
Personality, 54 (1), 106-148.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, UK:
Cambridge University Press.
Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In D. P. Flanagan, J. L. Genshaft,
& P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 122-
130). New York: Guilford.
Carroll, J. B. (1998). Foreword. In K. S. McGrew & D. P. Flanagan, The intelligence test desk
reference (ITDR): Gf-Gc cross-battery assessment (pp. xi-xii). Boston: Allyn & Bacon.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development.
Psychological Assessment, 7, 309-319.
Comrey, A. L. (1988). Factor-analytic methods of scale development in personality and clinical
psychology. Journal of Consulting and Clinical Psychology, 56(5), 754-761.
Daniel, M. H. (1997). Intelligence testing: Status and trends. American Psychologist, 52, 1038-1045.
Das, J. P., & Naglieri, J. A. (1997). Das-Naglieri Cognitive Assessment System. Itasca, IL: Riverside
Publishing.
Dehn, M. J. (2006). Essentials of processing assessment. New York: Wiley.
Elliott, C. (1990). Differential Ability Scales (DAS). San Antonio, TX: The Psychological Corporation.
Elliott, C. (2007). Differential Ability Scales-Second Edition (DAS-II). San Antonio, TX: PsychCorp.
Epstein, S. (1983). Aggression and beyond: Some basic issues on the prediction of behavior. Journal
of Personality, 51, 360-392.
Fiorello, C. A., & Hale, J. B. (2006). Cognitive hypothesis testing and response to interventions for
children with reading problems. Psychology in the Schools, 43, 835-853.
Flanagan, D. P. (2000). Wechsler-based CHC cross-battery assessment and reading achievement:
Strengthening the validity of interpretations drawn from Wechsler test scores. School Psychology
Quarterly, 15(3), 295-329.
Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting
cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 314-325). New York: Guilford.
Flanagan, D. P., & McGrew, K. S. (1998). Interpreting intelligence tests from contemporary Gf-Gc
theory: Joint confirmatory factor analyses of the WJ-R and KAIT in a non-white sample. Journal of
School Psychology, 36, 151-182.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. O. (2000). The Wechsler intelligence scales and CHC
theory: A contemporary approach to interpretation. Boston: Allyn & Bacon.
Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross-battery assessment. New York: Wiley.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2002). Achievement test desk reference:
Comprehensive assessment of learning disabilities. Boston: Allyn & Bacon.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference:
A guide to learning disability identification (2nd ed.). New York: Wiley.
Floyd, R. G., Bergeron, R., & Alfonso, V. C. (2006). Cattell-Horn-Carroll cognitive ability profiles of
poor comprehenders. Reading and Writing, 19(5), 427-456.
Floyd, R. G., Keith, T. Z., Taub, G. E., & McGrew, K. S. (in press). Cattell-Horn-Carroll cognitive
abilities and their effects on reading decoding skills: g has indirect effects, more specific abilities
have direct effects. School Psychology Quarterly.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin,
95, 29-51.
Glutting, J. J., Adams, W., & Sheslow, D. (2002). Wide Range Intelligence Test. Wilmington, DE:
Wide Range, Inc.
Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K.
Werder, & R. W. Woodcock (Eds.), Woodcock-Johnson technical manual (pp. 197-232). Chicago:
Riverside Publishing.
Kaufman, A. S. (2000). Foreword. In D. P. Flanagan, K. S. McGrew, & S. O. Ortiz (Eds.), The Wechsler
intelligence scales and Gf-Gc theory: A contemporary approach to interpretation (pp. xiii-xv). Boston:
Allyn & Bacon.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines,
MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (1993). Kaufman Adolescent and Adult Intelligence Test. Circle
Pines, MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children- Second Edition.
Circle Pines, MN: AGS Publishing.
Keith, T., Fine, J., Taub, G., Reynolds, M., & Kranzler, J. (2006). Higher order, multiple-sample,
confirmatory factor analysis of the Wechsler Intelligence Scale for Children-Fourth Edition: What
does it measure? School Psychology Review, 35, 108-127.
Keith, T. Z., Kranzler, J., & Flanagan, D. P. (2001). What does the cognitive assessment system (CAS)
measure? Conjoint confirmatory factor analysis of the cognitive assessment system (CAS) and the
Woodcock-Johnson tests (3rd ed.). School Psychology Review, 30 (1), 89-119.
Lezak, M. D. (1976). Neuropsychological assessment. New York: Oxford University Press.
Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
McGrew, K. S. (1994). Clinical interpretation of the Woodcock-Johnson Tests of Cognitive Ability-
Revised. Boston: Allyn & Bacon.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 151-180). New York: Guilford.
McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities: Past, present, and
future. In D. P. Flanagan & P. L . Harrison (Eds.), Contemporary intellectual assessment: Theories,
tests, and issues (2nd ed., pp. 136-182). New York: Guilford.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-
battery assessment. Boston: Allyn & Bacon.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103).
Washington, DC: American Council on Education.
Messick, S. (1992). Multiple intelligences or multilevel intelligence? Selective emphasis on distinctive
properties of hierarchy: On Gardner's Frames of Mind and Sternberg's Beyond IQ in the context of
theory and research on the structure of human abilities. Psychological Inquiry, 3(4), 365-384.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons'
responses and performances as scientific inquiry into score meaning. American Psychologist, 50,
741-749.
Ortiz, S. O., & Flanagan, D. P. (2002). Best practices in working with culturally and linguistically
diverse children and families. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology
IV (pp. 1351-1372). Washington, DC: National Association of School Psychologists.
Phelps, L., McGrew, K. S., Knopik, S. N., & Ford, L. (2005). The general (g), broad, and narrow CHC
stratum characteristics of the WJ III and WISC-III tests: A confirmatory cross-battery investigation.
School Psychology Quarterly, 20(1), 51-65.
Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds Intellectual Assessment Scales. Lutz, FL:
Psychological Assessment Resources.
Roid, G. H. (2003). Stanford-Binet Intelligence Scales-Fifth Edition. Itasca, IL: Riverside Publishing.
Sternberg, R. J., & Kaufman, J. C. (1998). Human abilities. Annual Review of Psychology, 49, 479-502.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). Stanford-Binet Intelligence Scale-Fourth
Edition. Chicago: Riverside Publishing.
Vanderwood, M. L., McGrew, K. S., Flanagan, D. P., & Keith, T. Z. (2002). The contribution of general
and specific cognitive abilities to reading achievement. Learning and Individual Differences, 13, 159-
188.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. San Antonio, TX: The Psychological
Corporation.
Wechsler, D. (1989). Wechsler Preschool and Primary Scale of Intelligence-Revised. San Antonio, TX:
The Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2002). Wechsler Preschool and Primary Scale of Intelligence-Third Edition. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The
Psychological Corporation.
Wilson, B. C. (1992). The neuropsychological assessment of the preschool child: A branching model.
In I. Rapin & S. I. Segalowitz (Eds.), Handbook of neuropsychology: Child neuropsychology (Vol. 6,
pp. 377-394). Amsterdam: Elsevier.
Woodcock, R. W. (1990). Theoretical foundations of the WJ-R measures of cognitive ability. Journal
of Psychoeducational Assessment, 8, 231-258.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson Psycho-Educational Battery-Revised.
Chicago: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Cognitive
Abilities. Itasca, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., Mather, N., & Schrank, F. A. (2003). Diagnostic supplement to the
Woodcock-Johnson III Test of Cognitive Abilities. Itasca, IL: Riverside Publishing.

TEST YOURSELF

1. The XBA classification system has had a positive impact on communication among
practitioners, has improved research on the relations between cognitive and
academic abilities, and has resulted in substantial improvements in the measurement
of cognitive constructs, as seen in the design and structure of current intelligence
batteries. True or False?
2. Fluid Intelligence (Gf), Crystallized Intelligence (Gc), and Visual Processing (Gv) are
examples of

(a) general (stratum III) ability.


(b) broad (stratum II) abilities.
(c) narrow (stratum I) abilities.
(d) none of the above.

3. Two broad abilities not measured by many intelligence batteries published prior to
2000 that are now measured by the majority of intelligence batteries available today
are

(a) Gc and Gv.


(b) Gf and Ga.
(c) Gf and Gsm.
(d) Gsm and Gt.

4. The three pillars of the XBA approach are CHC theory, CHC broad (stratum II)
classifications of cognitive and achievement tests, and

(a) CHC narrow (stratum I) classifications of cognitive and achievement tests.


(b) CHC general (stratum III) classifications of cognitive and achievement tests.
(c) a and b.
(d) neither a nor b.

5. The second guiding principle of the XBA approach is to

(a) use as many intelligence batteries as necessary to answer the referral concerns.
(b) use subtests and clusters from a single battery whenever possible to represent
broad CHC abilities/processes.
(c) select tests that have been classified through an acceptable method, such as
through CHC theory-driven factor analyses or expert consensus content-validity
studies.
(d) create broad CHC clusters instead of narrow CHC clusters when possible.

6. An example of a cluster that contains construct-irrelevant variance is the

(a) WISC-IV VCI.


(b) WJ III Comprehension-Knowledge Factor.
(c) WAIS-III VIQ.
(d) KABC-II Simultaneous/Gv Scale.

7. Most clusters that are found in today's comprehensive intelligence batteries are both
relatively pure (i.e., containing only construct-relevant tests) and well represented
(i.e., containing qualitatively different measures of the broad ability/process
represented by the cluster). True or False?
8. Which of the following is not a good descriptor of the XBA approach?

(a) Time-efficient
(b) Theory-focused
(c) Test kit-focused
(d) Empirically supported

9. All of the following narrow abilities/processes fall under Gc except

(a) Listening Ability (LS).


(b) Language Development (LD).
(c) Lexical Knowledge (VL).
(d) English Usage Knowledge (EU).

10. When conducting XBA, it is important to select tests from a limited number of
batteries. True or False?

Answers: 1. True; 2. b; 3. c; 4. a; 5. b; 6. c; 7. True; 8. c; 9. d; 10. True

Two

HOW TO ORGANIZE A CROSS-BATTERY ASSESSMENT



OVERVIEW

This chapter describes the fundamental principles for organizing Cross-Battery Assessments (XBAs).
Clear, step-by-step instructions for the approach are presented that allow practitioners to organize
subtests and batteries that are more appropriate to particular referral concerns and purposes of
assessment. To assist practitioners in conducting XBAs, the Cross-Battery Assessment Data
Management and Interpretive Assistant (XBA DMIA) is introduced. The XBA DMIA is included on
the CD-ROM that accompanies this book.
Chapter 1 described how contemporary CHC theory and existing CHC XBA test classifications
have influenced the development of all current intelligence batteries. Although none of these batteries
measures the full range of broad and narrow abilities/processes specified by the theory, all provide
comprehensive measurement of many CHC abilities/processes and most represent a significant
improvement over their predecessors. Some intelligence batteries are more comprehensive than
others in terms of CHC abilities/processes measured (e.g., the WJ III is the most comprehensive).
Other intelligence batteries, though less comprehensive, offer unique features that are important for
evaluating certain children (e.g., the DAS-II is particularly effective for evaluating preschoolers and
the KABC-II is particularly effective for evaluating children who are from culturally and
linguistically diverse backgrounds). The purposes of this chapter are to: (a) demonstrate the utility of
each battery in the measurement of broad and narrow CHC abilities/processes; (b) provide steps for
augmenting any given intelligence battery so that the abilities/processes not measured by the battery
are included in the assessment; and (c) encourage practitioners to select an intelligence battery on a
case-by-case basis, because no single battery is sufficient to address all referral needs and concerns.
UTILIZATION OF SPECIFIC REFERRAL INFORMATION

Referral information should inform decisions about test selection and organization. There are three
basic scenarios that best highlight how such information affects the decision-making process
regarding test selection and organization within the XBA framework.
The first scenario relates to the need to evaluate the relationship between an individual's manifest
performance (e.g., academic skills) and cognitive abilities/processes. This is often the situation in
evaluations conducted in accordance with the Individuals with Disabilities Education Improvement
Act (IDEA 2004) that seek to determine the presence of a disability that may be used to establish
eligibility for special education programs and services. For example, if there are concerns with
reading skills, practitioners should review current research that provides evidence linking particular
cognitive abilities/processes with reading achievement. The practitioner should then ensure that
measures of these specific cognitive abilities/processes (both broad and narrow) are included in the
initial assessment. Flanagan, Ortiz, Alfonso, and Mascolo (2006) reviewed more than 2 decades of
research on the relations between cognitive abilities/processes and reading, math, and written
language achievement. A summary of their findings is presented in Table 2.1.
Table 2.1 shows the CHC broad ability/processing domains, and the narrow abilities that comprise
them, that are important in understanding reading achievement. For example, a referral related to
suspected problems in reading should seek to examine those broad and narrow abilities/processes that
are strongly related to the acquisition and development of reading skills. Table 2.1 shows that Ga
(Phonetic Coding), Gc (primarily Language Development [LD], Lexical Knowledge [VL], and
Listening Ability [LS]), Glr (Naming Facility [NA]), and Gs (Perceptual Speed [P]) are all important
predictors of reading achievement, particularly during the elementary school years. Other
abilities/processes also show a relationship to reading, albeit not as strong as those just mentioned.
For example, Gf (Induction [I] and General Sequential Reasoning [RG]) demonstrates a moderate
relationship to reading, mostly to reading comprehension. Likewise, in the visual processing (Gv)
domain, some narrow abilities/processes, such as Visual Memory (MV) and perhaps visual
discrimination or form constancy, demonstrate a moderate relationship to reading but primarily with
respect to orthographic processing. Given that these empirically established relations between certain
CHC abilities/processes and reading achievement are well documented, it seems that evaluations
designed to address referral concerns that are related to reading should be responsive to those areas
that may be the potential cause of the observed learning difficulties. Evaluations that assess
abilities/processes related to reading with a broad stroke may well fail to uncover weaknesses in the
more narrowly defined abilities/processes that may explain the reading deficit.

Table 2.1 Summary of Findings on Relations Between CHC Abilities/Processes and Academic
Achievement

Table 2.1 also provides practitioners with information that assists in focusing evaluations on
abilities/processes related to mathematics and writing achievement. For example, practitioners who
evaluate individuals referred on the basis of difficulty with mathematics probably have little need to
assess Auditory Processing (Ga). Although Ga may well have some effect on mathematics
performance, its influence is much more evident in the area of reading. In general, Ga does not
contribute significantly to the explanation of mathematics achievement. Conversely, Visual
Processing (Gv) appears to be important in mathematics achievement, particularly for higher level
skills (e.g., geometry). In sum, the information in Table 2.1 helps practitioners determine where
efforts and resources can be placed best for maximum effectiveness in any given evaluation.

DON'T FORGET

Review research on relations between cognitive abilities/processes and achievement to ensure a
comprehensive evaluation tailored to referral concerns.

The second scenario that illustrates the effect of referral concerns on decision making occurs when
attention must be paid to practical or legal considerations. With respect to practical considerations, it
is unreasonable to expect that every practitioner possess every published test or that every practitioner
have expertise in administering all tests. Therefore, decisions regarding test selection and
organization will be directly influenced by this reality. For example, of the seven major cognitive
batteries, the KABC-II may be considered the best one for testing a young child who has been exited
from an English as a Second Language (ESL) program in 5th grade, but who nevertheless is rapidly
falling behind his/her classmates in most academic areas. However, if the KABC-II is not available to
the practitioner then a battery such as the SB5, because of its verbal-nonverbal structure, may be a
viable alternative for this student.
With respect to legal considerations, there are times when federal or local regulations mandate that
certain types of data should be collected (e.g., IQ or global ability scores from intelligence batteries).
This most often occurs in assessments that are conducted for the purpose of gathering data to inform
decisions regarding special education eligibility. In this circumstance the practitioner may find it
necessary to obtain the required score even though it may not be directly relevant to XBA purposes
(see Flanagan et al., 2006, and Flanagan & Ortiz, 2001, for a comprehensive discussion). However,
because local education agencies can no longer require an ability (IQ)-achievement discrepancy for
SLD determination (34 CFR 300.307 [a]), the practice of giving certain tests for the sole purpose of
generating an IQ should soon cease.
The third scenario in which decisions regarding test selection and organization are affected by
specific referral concerns involves testing individuals who possess characteristics that set them apart
from the mainstream. For example, practitioners are often called upon to assess the abilities of
individuals who have sensory or perceptual impairments (e.g., deafness, blindness), who have fine-
motor impairments (e.g., individuals with cerebral palsy, tremors, seizure activity), or who come
from culturally and linguistically diverse backgrounds. Obviously, if an individual is unable to
manipulate objects because he or she cannot see them or cannot hold them, then test selection and
organization will be affected significantly. Such decisions are not, of course, specific to conducting
XBAs. An individual's unique characteristics must be considered appropriately before selecting tests
for any evaluation. In the case of individuals who are culturally and linguistically diverse, the
Culture-Language Test Classifications of the XBA approach can be utilized in order to make
decisions that respond directly to issues of limited English proficiency as well as acculturation (see
Chapter 5). These particular procedures allow practitioners the opportunity to construct XBAs that are
empirically based and tailored to specific referral concerns related to individual culture and language
variables (also see Chapter 7 for a case study illustration).
INTEGRATING GUIDING PRINCIPLES WITH DECISION MAKING

Organization and selection of tests is a process that is integrated within the context of the XBA
guiding principles. Practitioners must review information on several aspects of available tests in
order to make appropriate decisions regarding final organization and selection. When a decision is
made to gather data from more than one battery, practitioners should review the XBA guiding
principles presented in Chapter 1.
THE CROSS-BATTERY ASSESSMENT DATA MANAGEMENT AND
INTERPRETIVE ASSISTANT (XBA DMIA)

To facilitate the scoring and interpretation of data gathered from XBAs, an automated program has
been created for practitioner use. This program, XBA DMIA, is found on the CD-ROM that
accompanies this book. The program will open to an Introduction ("Intro") screen (see Figure 2.1).
The Intro tab includes general information about how to use the program and provides a place for
entering the examinee's name, date of birth, and date of evaluation. The program automatically
calculates the examinee's chronological age (illustrated in the sketch below). The next tab, labeled
"Notes," provides information about various aspects of the program. The information contained on
this tab is meant to clarify any issues and answer any questions that may arise when navigating
through the program. Following the "Notes" tab are tabs for each of the seven major intelligence
batteries, including the WISC-IV, WAIS-III, WPPSI-III, KABC-II, WJ III, SB5, and DAS-II. Each
intelligence battery tab provides a template for entering data from an examiner's scored test record.
Figure 2.2 shows the WPPSI-III template with
data entered for a 6-year-old. A detailed explanation of program functions for each intelligence
battery tab is presented in Chapter 3.
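The chronological-age calculation mentioned above can be shown with a brief sketch. This is an
illustration of the kind of calculation the program performs from the entered dates; it is not the
program's own code, and the dates used are hypothetical.

    from datetime import date

    def chronological_age(birth_date, evaluation_date):
        # Age in completed years and months at the date of evaluation.
        months = ((evaluation_date.year - birth_date.year) * 12
                  + (evaluation_date.month - birth_date.month))
        if evaluation_date.day < birth_date.day:
            months -= 1  # the current month is not yet complete
        return divmod(months, 12)  # (years, months)

    # Hypothetical dates: a child born March 15, 2001, and evaluated
    # September 10, 2007, is 6 years, 5 months old.
    chronological_age(date(2001, 3, 15), date(2007, 9, 10))  # (6, 5)
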
In addition to intelligence battery tabs is a CHC tab that includes 10 broad CHC
abilities/processes, namely Gc, Gf, Gsm, Gs, Gq, Glr, Gv, Ga, Grw-W, and Grw-R. This tab includes the
subtests of the seven major intelligence batteries as well as numerous other cognitive and academic
tests. All the tests included on this CHC tab are also listed in Appendix B at the end of this book. Each
CHC ability/process included on this tab allows for the selection of up to three subtests for the
calculation of a broad or narrow ability/process. For example, Figure 2.3 shows three CHC broad
domains (Gsm, Glr, and Gq) and their corresponding subtest scores that have been entered on the tab.
This tab is typically utilized when one of the intelligence batteries selected for assessment needs to be
supplemented. In addition, this tab is used when follow-up assessment is warranted. The manner in
which this tab is used in the data organization and interpretation process is described in Chapter 3.
Also, the criteria used by the program to calculate broad and narrow ability/processing clusters are
explained in Chapter 3.

Figure 2.1 Introduction Page in the XBA DMIA on the CD-ROM


Figure 2.2 WPPSI-III Tab in the XBA DMIA on the CD-ROM



Figure 2.3 CHC Tab in the XBA DMIA on the CD-ROM

Additional tabs correspond to individual graphs for each intelligence battery as well as one for data
that are entered into the CHC tab. Figure 2.4 shows the graph that depicts the scores entered on the
WPPSI-III tab as illustrated previously in Figure 2.2. Note that although entry of the FSIQ is provided
on the WPPSI-III tab, this score is not included in the graph because it is not relevant to XBA purposes
and has extremely limited utility in the evaluation of SLD and of individuals from diverse cultural
and linguistic backgrounds. However, the WPPSI-III tab provides six additional spaces at the bottom
where additional composites may be entered and which will subsequently appear on the WPPSI-III
graph. This option is intended primarily to allow for the graphing of composites derived through
XBA methods, such as when supplemental testing is necessary, but any scores of interest may be
entered there, including global ability composites (e.g., WPPSI-III FSIQ, WISC-IV GAI, DAS-II GCA,
WJ III GIA). In addition, Figure 2.5 provides another example of the resulting graph derived from
scores that were entered on the CHC tab as depicted previously in Figure 2.3.

Figure 2.4 WPPSI-III Graph in the XBA DMIA on the CD-ROM
Note: Bars represent the obtained standard score ±1 SEM (±5 points for indexes and clusters and ±7
points for subtests), thus yielding an approximate 68% confidence interval for the values.
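The ±1 SEM bands can be understood with a short sketch. The standard error of measurement is
SEM = SD × sqrt(1 − reliability); the reliability values below are illustrative only, chosen to show
how bands of roughly ±5 and ±7 points arise on the standard-score metric (SD = 15).

    import math

    def sem(sd, reliability):
        # Standard error of measurement: SEM = SD * sqrt(1 - reliability).
        return sd * math.sqrt(1 - reliability)

    def band_68(score, sd=15, reliability=0.89):
        # Obtained score +/- 1 SEM, an approximate 68% confidence band.
        e = sem(sd, reliability)
        return (round(score - e), round(score + e))

    band_68(104)                    # about (99, 109): roughly +/- 5 points
    band_68(104, reliability=0.78)  # about (97, 111): roughly +/- 7 points
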

IMPLEMENTING THE XBA APPROACH STEP BY STEP

Step 1: Selection of an Intelligence Battery

The first step of the XBA approach requires selecting an intelligence battery that is appropriate and
responsive to several factors, including age and developmental level of the examinee; English
language proficiency of the examinee; the specific referral concerns; and so forth. As such, although
a test like the WJ III may be appropriate for a relatively bright and articulate seventh-grader who is
experiencing difficulties in math and science, it may not be the best instrument of choice for a third-
grader, who is an English Language Learner and who is significantly behind her classmates in all
academic areas, despite the fact that the WJ III provides the most comprehensive coverage of CHC
abilities/processes. This is because many of the WJ III subtests have relatively high receptive
language demands (e.g., Analysis-Synthesis and Concept Formation). In the case of this third-grader
then, an intelligence battery such as the KABC-II may be more appropriate because its language
demands and cultural loadings are generally lower than those associated with the WJ III.

Figure 2.5 CHC Broad Ability Graph in the XBA DMIA on the CD-ROM
Note: Bars represent the obtained standard score ±1 SEM (±5 points for indexes and clusters and ±7
points for subtests), thus yielding an approximate 68% confidence interval for the values.


Step 2: Identify the CHC Broad Abilities that Are Measured by the Selected Intelligence Battery

Rapid Reference 2.1 provides a summary of the CHC broad ability constructs that are measured by the
seven intelligence batteries referred to previously. The notation "adequate" means that the battery
contains at least two qualitatively different indicators of the broad ability/process. For example, Rapid
Reference 2.1 denotes "adequate" for Gc on the WISC-IV. This is because the WISC-IV Verbal
Comprehension Index, a Gc analogue, comprises three subtests (Vocabulary, Comprehension,
and Similarities), each of which measures different aspects of (or narrow abilities/processes that
define) Gc. The notation "underrepresented" means that only one narrow aspect of the broad
ability/process is measured by the intelligence battery. For example, Gsm is underrepresented on the
KABC-II because the subtests that comprise the Sequential/Gsm Scale measure mainly Memory Span,
a narrow Gsm ability/process. Finally, the notation "not measured" means that the intelligence battery
does not contain subtests that measure any aspect of the broad ability/process. If the selected battery
does not allow for adequate measurement of the broad abilities/processes considered most germane
in light of the referral concerns, then it will need to be supplemented.
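These three notations amount to a simple counting rule over qualitatively different narrow indicators,
as the sketch below shows. The battery coverage and subtest-to-narrow-ability assignments used here
are illustrative and are not drawn from any particular instrument.

    def classify_coverage(narrow_indicators):
        # narrow_indicators: the set of distinct narrow abilities/processes
        # measured for one broad CHC domain by the selected battery.
        if len(narrow_indicators) >= 2:
            return "adequate"
        if len(narrow_indicators) == 1:
            return "underrepresented"
        return "not measured"

    # Hypothetical coverage for one battery.
    battery_coverage = {
        "Gc": {"Lexical Knowledge (VL)", "General Information (K0)"},
        "Gsm": {"Memory Span (MS)"},
        "Ga": set(),
    }

    coverage = {domain: classify_coverage(narrows)
                for domain, narrows in battery_coverage.items()}
    # {'Gc': 'adequate', 'Gsm': 'underrepresented', 'Ga': 'not measured'}
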

Rapid Reference 2.1





Representation of Broad CHC Abilities/Processes on Seven Intelligence Batteries

DON'T FORGET

Adequate representation of a broad CHC ability/process means that it is measured by at least two
qualitatively different indicators. For example, Gf is adequately represented by a subtest
measuring inductive reasoning and a subtest measuring general sequential (or deductive)
reasoning.


Table 2.2 Examples of XBAs for Seven Intelligence Batteries


Table 2.2 provides examples of XBAs for seven intelligence batteries. As can be seen in this table,
only the WJ III provides adequate coverage of all seven CHC broad cognitive abilities/processes.
Because the WJ III includes qualitatively different measures of all broad ability/processing domains,
it was used to supplement most other intelligence batteries in Table 2.2. For example, when the WISC-
IV is the battery selected for assessment, it can be supplemented with the WJ III in the areas of Glr and
Ga. Likewise, when the DAS-II is the battery selected for assessment it can be supplemented with the
WJ III in the areas of Ga and Gs. At times, however, the WJ III may not be considered the best
supplemental battery. For example, Table 2.2 shows that the WPPSI-III was supplemented in the areas
of Gsm and Glr with subtests from the DAS-II, instead of the WJ III, because the DAS-II is a better
instrument overall for very young children (e.g., 3-6 years old). An alternative to the use of a
comprehensive battery to supplement the WPPSI-III in the area of memory is a special purpose test.
For example, the CMS would also be suitable for supplementing the WPPSI-III in the areas of Glr
and Gsm. In the area of Ga, the WJ III was used to supplement the WPPSI-III simply because the DAS-
II does not include adequate representation of this broad domain.3
In general, when examiners are interested in a comprehensive evaluation that samples functioning
in all CHC broad ability/processing domains, the XBA examples provided in Table 2.2 are considered
sufficient for an initial evaluation. Further assessment may or may not be necessary following the
administration, scoring, and interpretation of the initial cross-battery (e.g., WISC-IV battery and
supplemental tests in the areas of Ga and Glr). Noteworthy is the fact that even though two
qualitatively different indicators are included for each broad cognitive construct in an initial
assessment, it may be necessary to follow up on significant differences between subtest scores within
a broad ability/processing domain. The XBA interpretive guidelines, which spell out when further
assessment may be necessary, are discussed in Chapter 3.
Step 3: Identify the CHC Narrow Abilities that Are Measured by the Selected Intelligence
Battery


This step is necessary when the nature of the referral concerns warrants or necessitates measurement
of specific or narrow cognitive abilities/processes. For example, Table 2.1 showed that several
narrow abilities/processes are important in understanding reading, math, and writing achievement.
Therefore, when referrals are specific to difficulties in one or more of these academic areas,
assessments should include measurement of the narrow cognitive abilities/processes listed in Table
2.1. Rapid References 2.2, 2.3, and 2.4 may be used to determine which subtests from the seven major
intelligence batteries measure narrow abilities/processes that are germane to reading, math, and
writing referrals, respectively.

Rapid Reference 2.2




Sample of Subtests that Measure CHC Narrow Abilities/Processes that are Related
Significantly to Reading Achievement

Rapid Reference 2.3




Sample of Subtests that Measure CHC Narrow Abilities/Processes that are Related
Significantly to Math Achievement

Rapid Reference 2.4




Sample of Subtests that Measure CHC Narrow Abilities that are Significantly Related to
Writing Achievement


Rapid Reference 2.2 shows that the WJ III (cognitive battery) has at least one subtest that measures
each of the narrow abilities/processes considered important for reading achievement, except
Listening Ability (LS). However, this narrow ability/process is assessed by the WJ III achievement
battery. Therefore, in an initial reading referral, administration of tests from the WJ III cognitive and
achievement batteries is sufficient. That is, there is no need to supplement the WJ III. However, if one
or more WJ III subtest standard scores representing a single narrow ability that is not part of an
interpretable broad ability cluster fall in the below-average range of functioning, then there will be a
need to supplement this battery to determine the individual's true ability in the narrow
ability/processing domain(s). This is because the WJ III (and all other intelligence batteries) does not
include more than one measure of most narrow abilities/processes. Chapter 3 provides an in-depth
discussion of interpretation of narrow ability/processing functioning and illustrates the instances in
which additional measures of specific narrow abilities /processes need to be administered following
an initial evaluation.
As another example, if the WISC-IV was chosen as the core battery in assessment, then five narrow
abilities/processes (i.e., LS, Phonetic Coding: Analysis [PC:A], Phonetic Coding: Synthesis [PC:S],
Naming Facility [NA], Associative Memory [MA]) would need to be assessed by a supplemental
battery to ensure adequate coverage of all narrow abilities/processes important in understanding
reading achievement. Noteworthy is the fact that the WIAT-II, which is linked to the WISC-IV, has
subtests that measure the narrow abilities/processes of LS and PC:A. Thus, when the WISC-IV and
WIAT-II are used in an evaluation of suspected reading disability, evaluators will only need to gather
data in the areas of PC:S, NA, and MA from another battery.
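The logic of this gap analysis can be expressed compactly. In the sketch below, the sets of narrow
abilities/processes summarize the example just described (abbreviations as in the text); they are
illustrative summaries rather than exhaustive listings of what each instrument measures.

    # Narrow abilities/processes important for reading (illustrative subset).
    needed_for_reading = {"LS", "PC:A", "PC:S", "NA", "MA", "LD", "VL", "MS", "P"}

    # Illustrative coverage: a core intelligence battery plus a linked achievement test.
    covered_by_core = {"LD", "VL", "MS", "P"}
    covered_by_achievement = {"LS", "PC:A"}

    to_supplement = needed_for_reading - covered_by_core - covered_by_achievement
    # {'PC:S', 'NA', 'MA'} -- gather these from another battery
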
As mentioned earlier, many variables must be considered prior to selecting an intelligence battery.
While the WJ III may be the battery of choice for one 8-year-old with reading difficulties, it would
not be the battery of choice for all 8-year-olds with reading difficulties. This is because intelligence
tests differ with regard to the extent to which they are engaging to young children, the amount of
receptive language requirements needed to comprehend subtest directions, the level of expressive
language necessary on the part of the examinee to demonstrate success, the extent to which exposure
to mainstream U.S. culture is necessary for success, and so forth. Therefore, when selecting an
intelligence battery, evaluators should consider a number of factors above and beyond the broad and
narrow CHC abilities/processes measured by the instrument. It is also important to recognize that
Table 2.2 provides examples only. There are other combinations of tests that may be more
appropriate for a given child than the ones listed in Table 2.2. Therefore, practitioners should become
familiar with a variety of tests that may be used to supplement their intelligence battery of choice.
Figures 2.6 to 2.12 include illustrations of the broad abilities/processes that are measured by the
WISC-IV, WAIS-III, WPPSI-III, KABC-II, WJ III, SB5, and DAS-II, respectively. At the center of each
figure is the total test score yielded by the instrument (e.g., Figure 2.9 shows that the KABC-II yields
a Fluid-Crystallized Index, or FCI). The subtests listed in the white rectangles in each figure are those
that comprise the total test score of the battery. For example, the ten subtests reported in the white
rectangles in Figure 2.6 comprise the WISC-IV FSIQ. Subtests listed in the striped rectangles in
Figures 2.6 to 2.12 are the supplemental or optional subtests included in the battery. For example,
Figure 2.7 shows that the WAIS-III has two supplemental subtests, namely Picture Arrangement and
Comprehension. Supplemental or optional subtests typically serve two purposes: (a) they may be used
under certain circumstances in place of a subtest that has been spoiled; and (b) they may be used to
gather additional information about an examinee's functioning. In some instances, administration of
the supplemental or optional subtests on a battery may provide a clearer explanation of the
abilities/processes measured by the battery. For example, by administering the Picture Completion
subtest on the WISC-IV, practitioners are able to derive separate clusters for Gv (based on Picture
Completion and Block Design) and Gf (based on Picture Concepts and Matrix Reasoning; see
Flanagan & Kaufman, 2004). The calculation of these WISC-IV Gv and Gf clusters is done by the
XBA DMIA.
Figures 2.6 to 2.11 also contain gray rectangles that include the phrase "See XBA DMIA." This
designation means that the battery will need to be supplemented in the area listed in the corresponding
CHC oval if in fact measurement of those abilities/processes is deemed necessary vis-à-vis referral
information. For example, Figure 2.6 shows that both Ga and Glr are not measured by the WISC-IV.
Therefore, practitioners can go to the XBA DMIA on the CD-ROM, click on the CHC tab, and identify
numerous tests that measure these abilities/processes. Appendix B includes a list of the tests that
comprise the drop-down menus in the CHC tab in the XBA DMIA. The figure for the WJ III (Fig.
2.12) is included but not discussed above because the WJ III does not require supplementation in the
initial assessment of cognitive abilities.

Figure 2.6 Illustration of the CHC Broad Abilities/Processes Measured by the WISC-IV


Figure 2.7 Illustration of the CHC Broad Abilities/Processes Measured by the WAIS-III


Figure 2.8 Illustration of the CHC Broad Abilities/Processes Measured by the WPPSI-III


Figure 2.9 Illustration of the CHC Broad Abilities/Processes Measured by the KABC-II


Figure 2.10 Illustration of the CHC Broad Abilities/Processes Measured by the SB5


Figure 2.11 Illustration of the CHC Broad Abilities/Processes Measured by the DAS-II


Figure 2.12 Illustration of the CHC Broad Abilities/Processes Measured by the WJ III

DON'T FORGET

Test Selection

When selecting an intelligence battery, evaluators should consider the following:

a. the extent to which the battery is engaging to young children;


b. the amount of receptive language requirements needed to comprehend subtest directions;
c. the level of expressive language necessary on the part of the examinee to demonstrate
success; and
d. the extent to which exposure to mainstream U.S. culture is necessary for success.


In sum, Table 2.2 provides basic examples of XBA using mainly the WJ III as a supplement.
Alternatively, Figures 2.6 to 2.12 highlight the CHC broad abilities/processes that are measured by
intelligence batteries and direct the practitioner to the XBA DMIA for a comprehensive list of
potential supplemental tests for any ability/process not measured by the battery. Because there are
numerous tests from which to choose when constructing an appropriate battery for any given
examinee, clinical ingenuity, judgment, and experience remain important and necessary components
of competent, defensible, and sound assessment and interpretation practices.
Step 4: Administer and Score Selected Intelligence Battery and Supplemental Tests

There are no unique administration or scoring instructions associated with XBAs to be followed apart
from those already specified by the test publishers. Practitioners should incorporate both general
testing and scoring considerations applicable to the use of standardized tests as well as the specific
guidelines provided by test publishers in the manuals of any tests that are used. Note that issues
pertaining to the administration order of subtests are addressed in Chapter 6.
Step 5: Enter Scores into the Cross-Battery Assessment Data Management and Interpretive
Assistant (XBA DMIA)


Open the XBA DMIA and select the tab that corresponds to the core intelligence battery administered.
For example, if the KABC-II was administered, then the tab corresponding to this battery should be
selected. Enter the subtest and composite scores from the examinee's test record in the appropriate
cells. Note that scores may only be entered into cells with red borders.
After entering scores for the primary intelligence battery, select the tab that corresponds to your
supplemental battery. For example, if you supplemented the KABC-II with tests from the WJ III, then
select the WJ III tab. Enter all relevant WJ III subtest, cluster, and composite scores. Alternatively, if
one of the major intelligence tests was not used to supplement your primary battery, then select the
CHC tab and enter data into the appropriate CHC area. For example, if the CTOPP was used to
supplement the KABC-II in the area of Ga, then you would enter the appropriate CTOPP subtest
scores in the Ga spaces provided on the CHC tab. The manner in which the program determines
interpretable composites for the intelligence batteries and the criteria used to calculate CHC broad
and narrow ability/processing clusters on the CHC tab are described in detail in the next chapter.
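The routing described in this step can be summarized as follows. This is a sketch only, not the DMIA's
own logic; the CTOPP example restates the one given above, and the function name is hypothetical.

    MAJOR_BATTERIES = {"WISC-IV", "WAIS-III", "WPPSI-III", "KABC-II",
                       "WJ III", "SB5", "DAS-II"}

    def destination_tab(instrument, chc_domain):
        # Scores from one of the seven major batteries go on that battery's tab;
        # scores from any other instrument go under the relevant CHC domain.
        if instrument in MAJOR_BATTERIES:
            return instrument
        return f"CHC tab ({chc_domain})"

    destination_tab("WJ III", "Glr")  # 'WJ III'
    destination_tab("CTOPP", "Ga")    # 'CHC tab (Ga)'
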
SUMMARY

By carefully augmenting a primary intelligence battery, measurement of a wider range of broad and
narrow CHC abilities/processes can be accomplished, a result that cannot be achieved through the
administration of a single intelligence battery. The foundational sources of information upon which
the XBA approach was built (viz., CHC theory and classification of all tests according to the CHC
abilities/processes they measure) provide a means to systematically construct a theoretically driven,
comprehensive, and valid assessment. When the XBA approach is applied, it is possible to measure
important abilities/processes that might otherwise go unassessed or that may be poorly assessed (e.g.,
Ga, Glr, Naming Facility, Working Memory), abilities that are important in understanding many
educational, vocational, and occupational outcomes.
The XBA approach allows for effective measurement of the broad and narrow CHC
abilities/processes, with emphasis on those considered most critical on the basis of history,
observation, and available test data. The CHC classifications of a multitude of cognitive
ability/processing tests bring strong content and construct validity evidence to the evaluation and
interpretation process. With a strong research base, the XBA approach can aid practitioners not only
in the comprehensive measurement of cognitive abilities/processes, but in the selective measurement
of abilities/processes that are deemed to be most important with respect to the examinee's presenting
problem(s). Adherence to the guiding principles and steps of the XBA approach as well as careful
attention to specific referral concerns results in the creation of highly individualized assessment
batteries that are ideally suited for the intended purpose of assessment.
REFERENCES

Elliott, C. (2006). Differential Ability Scales-Second Edition (DAS-II). San Antonio, TX: PsychCorp.
Flanagan, D. P., & Kaufman, A. S. (2004). Essentials of WISC-IV assessment. New York: Wiley.
Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross-battery assessment. New York: Wiley.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference:
A guide to learning disability identification (2nd ed.). New York: Wiley.
Individuals with Disabilities Education Improvement Act of 2004 (IDEA). (2004). Pub. L. No. 108-446,
118 Stat. 2647.
Kaufman, A. S., & Kaufman, N. L. (2004a). Kaufman Assessment Battery for Children- Second Edition.
Circle Pines, MN: AGS Publishing.
Kaufman, A. S., & Kaufman, N. L. (2004b). Kaufman Test of Educational Achievement- Second
Edition. Circle Pines, MN: AGS Publishing.
Roid, G. H. (2003). Stanford-Binet Intelligence Scales-Fifth Edition. Itasca, IL: Riverside Publishing.
U.S. Department of Education. (2005). 34 CFR Parts 300, 301, and 304. Federal Register, June 21, 2005.
Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive Test of Phonological
Processing. Austin, TX: Pro-Ed.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2002). Wechsler Preschool and Primary Scale of Intelligence-Third Edition. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The
Psychological Corporation.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Cognitive
Abilities. Itasca, IL : Riverside Publishing.

TEST YOURSELF

1. The Cross-Battery Assessment Data Management and Interpretive Assistant (XBA
DMIA) is an interpretation and report-writing software program. True or False?
2. The three basic scenarios that best highlight how referral information affects the
decision-making process regarding test selection and organization within the XBA
framework are all of the following except

(a) the relationship between an individual's manifest performance (e.g., academic
skills) and cognitive abilities/processes.
(b) availability of tests that allow for ability-achievement discrepancy analysis.
(c) testing individuals who possess characteristics that set them apart from the
mainstream.
(d) practical or legal considerations.

3. The first step of the XBA approach requires

(a) a working knowledge of the XBA DMIA.


(b) an understanding of the three pillars and guiding principles of XBA.
(c) selecting an intelligence battery.
(d) identifying the CHC broad abilities/processes that are measured by the selected
intelligence battery.

4. The narrow abilities/processes that are related to math achievement include all of the
following except

(a) inductive reasoning.


(b) perceptual speed.
(c) phonetic coding.
(d) listening ability.

5. Intelligence tests differ with regard to the extent to which they are engaging to
young children, the amount of receptive language requirements needed to
comprehend subtest directions, the level of expressive language necessary on the
part of the examinee to demonstrate success, and the extent to which exposure to
mainstream U.S. culture is necessary for success. True or False?
6. If a second-grade child has been referred for an evaluation due to difficulties in basic
reading skills, you should be certain to include measures of the following narrow
abilities except

(a) phonetic coding.


(b) deductive reasoning.
(c) memory span.
(d) lexical knowledge.

7. Current intelligence batteries generally measure a broader range of cognitive
abilities/processes as compared to their predecessors. True or False?
8. The best intelligence battery to use with a fourth-grade child who has been referred
for written language difficulties is

(a) the WJ III.


(b) the KABC-II.
(c) the WISC-IV.
(d) too difficult to determine without additional information.

9. The XBA DMIA

(a) allows practitioners to enter all scores for each of seven intelligence batteries.
(b) allows practitioners to enter scores for all broad CHC abilities except Gt.
(c) provides a graphic depiction of data entered.
(d) all of the above

10. The Culture-Language Test Classifications of the XBA approach can be utilized to
make decisions that respond directly to issues of limited English proficiency as well as
acculturation. True or False?

Answers: 1. False; 2. b; 3. c; 4. c; 5. True; 6. b; 7. True; 8. d; 9. d; 10. True

Three

HOW TO INTERPRET TEST DATA


Test data are largely meaningless unless they can be interpreted in a manner that is both theoretically
and psychometrically defensible. The XBA approach includes a set of interpretive guidelines that
allows practitioners to interpret data from one or more batteries in light of CHC theory and research
using psychometrically defensible methods. Because the XBA approach represents an advancement over
traditional assessment practices in terms of both measurement and meaning, it has informed the
interpretive approaches of widely used intelligence batteries (e.g., WISC-IV, KABC-II, SB5; see
Alfonso, Flanagan, & Radwan, 2005; Alfonso & Flanagan, 2006; and Flanagan & Kaufman, 2004, for
details).
In this chapter, we provide guidelines for practitioners to follow that allow test data obtained from
single battery and cross-battery assessments to be interpreted according to CHC theory. Because of
the close link between science and practice inherent in the XBA approach, practitioners can have
confidence that their collected data reliably and validly measure the abilities/processes of interest,
thereby making interpretation relatively clear and straightforward. In addition, interpretation adheres
strictly to sound psychometric and statistical precepts that establish the basis for comparative
evaluations of test performance, such as inter-individual (or population-relative) analysis of broad
and narrow cognitive and academic abilities/processes. Interpretation must not, however, be thought
of as a separate or distinct endeavor from measurement. Rather, measurement and interpretation are
related, and each influences the other in many ways.
In order to interpret data properly, the manner in which measurement and interpretive processes
are related must be specified. To this end, interpretation of test data is embedded in a broader
conceptual framework for assessment that relies on the generation and testing of functional
assumptions or hypotheses about expected performance. In general, both a priori and a posteriori
assumptions are incorporated into the interpretive approach to control for confirmatory bias
(explained in the following section). This chapter begins with a discussion of an hypothesis-driven
framework and its relationship to the iterative nature of measurement and interpretation. Next,
specific guidelines are described that allow practitioners to make defensible interpretations of any
and all data entered into the Cross-Battery Assessment Data Management and Interpretive Assistant
(XBA DMIA) introduced in Chapter 2.
HYPOTHESIS-DRIVEN ASSESSMENT AND INTERPRETATION

Inherent in the XBA approach is the value of conducting assessments from a broad, comprehensive
framework, and the recognition that measurement methods, however precise, might form only a part
of the entire scope of assessment-related activities. In general, XBA methods are used in cases for
which standardized testing has been deemed necessary. When standardized testing is to be carried out,
practitioners should adhere to guidelines based on a philosophy of hypothesis generation and
hypothesis testing. Although psychometric data may seem to be rather objective, interpretation of
such data is anything but an unambiguous exercise. Therefore, in order to reduce the chances of
drawing incorrect inferences from test data on the basis of preconceived ideas, hypothesis generation
and hypothesis testing are necessary and crucial components of the XBA approach.
Confirmatory bias occurs when an examiner begins with preconceived notions regarding expected
performance on a test. After the test is administered and the data are collected, the examiner reviews
the data, looking specifically for patterns and results that support the preconception. In other words,
the examiner becomes predisposed to seeing only those patterns in the data that support the prevailing
assumption, and tends to minimize or reject data that are counter to the assumption (Sandoval, Frisby,
Geisinger, Scheuneman, & Grenier, 1998). In order to reduce the tendency to see patterns of disability
and dysfunction in data where, in fact, none exist, diagnostic interpretation should not begin with the
presumption of preexisting deficits. Rather, interpretation of test data should be guided by the
assumption that the examinee is not impaired and that his or her performance on tests (e.g., subtests,
clusters, IQ) will be within the average range or within normal limits. The average range or normal
limits is defined as ±1 SD from the normative mean (i.e., standard scores ranging from 85 to 115,
inclusive). The assumption of average range performance represents the null hypothesis, which is
evaluated to determine whether it should be retained or rejected in favor of an alternative hypothesis
(i.e., performance is not average or not within normal limits).
Adoption of the stance that performance will be within normal limits, until and unless convincingly
contraindicated by the data, reduces the chance that examiners will view standardized test data only in
a manner that corroborates the beliefs they had prior to testing. It is important to note that even when
external factors have been ruled out as the primary cause of observed difficulties (e.g., poor academic
performance), it cannot be concluded automatically that an internally based disability is present (e.g.,
Specific Learning Disability). In every case, the null hypothesis must be assumed. Notwithstanding,
practitioners can and will entertain thoughts of dysfunction. After all, if standardized testing is being
contemplated, then it is very likely that the examiner has already been prompted by the possibility that
a disability exists. However, a clear distinction must be drawn between the specific hypotheses that are
to be evaluated and the opinions, conjecture, or suppositions of the examiner. Only the hypotheses
specified a priori or a posteriori are actually tested and evaluated directly in light of the data; opinion,
conjecture, and suspicion are not. Consequently, unless and until the data strongly suggest otherwise,
the null hypothesis that performance is within normal limits must not be rejected, no matter how
strong the examiner's belief to the contrary.
When the null hypothesis is rejected in favor of the alternative hypothesis, an examiner can be
certain that (a) the data do not support the notion that performance is within normal limits and (b)
performance is in all likelihood outside of the range of normal limits. Accepting the alternative
hypothesis, however, does not provide de facto support for the presence of a disability; rather, it
means only that the individual's performance was not due to chance. The specific reasons for this
level of performance should be investigated further and corroborated by additional data sources (e.g.,
review of school records, work samples, observations, diagnostic interviews).
INTEGRATING HYPOTHESIS TESTING AND INTERPRETATION

The following discussion is meant to assist practitioners in understanding the various stages of the
XBA approach as they apply to interpretation. These stages are illustrated in Figure 3.1. It is assumed
that the assessment and interpretation process described in this figure begins only when a focused
evaluation of cognitive abilities/processes through standardized testing is deemed necessary. The
assessment and interpretive process requires careful evaluation of case history information (e.g.,
educational records, response to intervention, authentic measures of achievement, medical records);
the inclusion of data from relevant sources (e.g., parents, siblings, teachers, friends, employers); and
the framing of an individual's difficulties within the context of CHC theory and research. No matter
how compelling the results from the administration of a single cognitive or achievement battery or
combination of batteries, test data alone should not be used to make definitive diagnostic decisions.
The reader should refer to Figure 3.1 often, as it will serve as the framework from which the stages
of XBA and interpretation are discussed.

Figure 3.1 Framework for XBA and Interpretation


Stage A: CHC Theory and Research Knowledge Base

In order to organize XBAs and interpret test data, practitioners must possess (or rely on) a knowledge
base composed of CHC theory and its expansive research foundation. This is Stage A of the XBA and
interpretive framework, which is represented by the shaded area in Figure 3.1. Meaningful and
defensible interpretation of test data, whether from a single intelligence battery or from XBAs,
requires knowledge of contemporary theory and research (see Rapid Reference 3.1). Such
information is critical in the early stages of assessment because it provides the foundation from
which to specify the relations between manifest academic performance deficits and suspected
underlying cognitive ability/processing deficits. Logical deductions and presuppositions that emanate
from current theory and research allow for the formation and subsequent testing of a priori
hypotheses.

Rapid Reference 3.1



Meaningful and Defensible Interpretation of XBA Data Necessitates Knowledge of:

1. The principles and procedures that underlie the XBA approach (see Chapter 1).
2. The literature on the relations between cognitive abilities/processes specified by CHC
theory and specific academic outcomes (see Chapter 2).
3. The network of validity evidence that exists in support of the structure and nature of
abilities within CHC theory (see Appendix A).


Stage B: Specification of A Priori Hypotheses

The definition of a priori as it is used in the XBA approach is found in the American Heritage
Dictionary (1994), which defines the term as: "From a known or assumed cause to a necessarily
related effect; deductive . . . based on theory rather than on experiment." Use of an a priori approach
"forces consideration of research and theory because the clinician is operating on the basis of
research and theory when the hypothesis is drawn" (Kamphaus, 1993, p. 167). By coupling case
history data and current information with knowledge of CHC theory and research (and perhaps with
information from other fields [e.g., literature on specific learning disabilities]) defensible
connections between academic achievement and cognitive abilities/processes can be made. For
example, when a student presents with reading difficulties, the CHC theory and research knowledge
base assists the practitioner in identifying the most salient broad and narrow abilities/processes that
are related to reading achievement (e.g., Ga-PC:A, Ga-PC:S, Gc-VL, Gc-K0, Gc-LD, Glr-NA; see
Chapter 2, Table 2.1). On the basis of this information, a practitioner can logically assume that if the
student indeed has cognitive ability/processing deficits that are related to (or that are the presumptive
cause of) his or her reading difficulties, then such deficits are likely to be found through an
evaluation of the abilities/processes known to explain significant variance in reading achievement.
Note that although the practitioner has a suspicion that the individual's reading difficulties may be
related to deficits in certain cognitive abilities/processes (e.g., Ga, Gc, Glr), the a priori hypothesis
remains null, specifying that expected performance on any ability/processing test is within normal
limits.
Stage C: Construction of Assessment Battery

After making a connection between an individual's presenting difficulties and related cognitive
abilities/processes, and after specifying a priori hypotheses, a practitioner may construct a battery of
tests in accordance with XBA principles and procedures as outlined in Chapters 1 and 2 of this book.
By ensuring that appropriate referral-relevant abilities/processes are measured and sufficient data are
gathered, interpretation is facilitated and meaningful and useful conclusions may be drawn from the
data.

DON'T FORGET

Stages within the Framework for XBA and Interpretation

Stage A: CHC Theory and Research Knowledge Base
Stage B: Specification of A Priori Hypotheses
Stage C: Construction of Assessment Battery
Stage D: Administration and Scoring of Assessment Battery
Stage E: Interpretation and Evaluation of Hypotheses
Stage F: Specification of A Posteriori Hypotheses
Stage G: Psychological Report


Stage D: Administration and Scoring of Intelligence Battery and Supplemental Tests

Strictly speaking, this stage is not a component of the interpretive processes; however, it is a
necessary component of the overall process. Because XBA is an iterative process, inclusion of this
stage is required to delineate clearly that some assessment activities stem from the need to test initial
or a priori hypotheses and, as will be discussed shortly, others stem from evaluation of a posteriori
hypotheses. Thus, the process of administration and scoring is, more often than not, iterative in nature
and, depending on interpretation of the initial data collected, it may well be a process that is
accomplished more than once.
Stage E: Interpretation of Results and Evaluation of Hypotheses

This step comprises the heart of the interpretive process. It is at this point that the examiner is able to
accomplish several different levels of analysis. Such analysis includes evaluation of data yielded
from a single battery (e.g., WISC-IV or WAIS-III or KABC-II or WJ III) and evaluation of data
yielded from more than one battery (i.e., XBA). Interpretations that are made within the context of the
XBA approach are based on inter-individual comparisons (i.e., population-relative comparison
against same age- or grade-level peers). Thus, an individual's performance on both broad and narrow
CHC abilities/processes is based on between-individual (or normative) comparisons rather than
within-individual (or person-relative) comparisons.
In general, the XBA approach is based on a hierarchical model of interpretation (presented in
Figure 3.2), which emphasizes interpretation of broad ability/processing constructs (e.g., Gf ) over
narrow ability/processing constructs (e.g., Induction, General Sequential Reasoning) because they are
typically more reliable and valid. That is, broad abilities/processes are represented by at least two
qualitatively different indicators (subtests) of the construct, whereas narrow abilities/processes are
typically represented by a single subtest.

Figure 3.2 Interpretive Levels in XBA

The broad ability/process of Gf is represented by three subtests in Figure 3.2, which measure
qualitatively different aspects of Gf (i.e., either Induction or General Sequential Reasoning).
Interpretation of Gf (referred to as Level I Interpretation in Figure 3.2) may be made when two
conditions are met: (a) two or more qualitatively different narrow ability/processing indicators
(subtests) of Gf are used to represent the broad ability/process; and (b) the broad ability/processing
cluster (Gf in this example) is considered unitary and, thus, interpretable. In general, a unitary
ability/process is represented by a cohesive set of scaled scores or standard scores, each reflecting
slightly different or unique aspects of the ability/process.
As may be seen in Figure 3.2, the WJ III contains two qualitatively different indicators of Gf (i.e.,
Analysis-Synthesis, which measures General Sequential Reasoning [RG] and Concept Formation,
which measures Induction [I]). When the difference between WJ III subtest standard scores is not
statistically significant, then the WJ III Gf cluster may be interpreted as a reliable and valid estimate of
this broad ability/process (see Level I interpretation in Figure 3.2). However, if the difference between
the Analysis-Synthesis and Concept Formation subtest standard scores is statistically significant, then
Gf cannot be considered to represent a unitary ability/process and, therefore, this cluster should not
be interpreted. The various interpretive options that may ensue at this point (including Level II
interpretation or interpretation at the narrow ability/processing level; see Figure 3.2) are discussed
later in this chapter.
It is important to note that the XBA DMIA provides the user with information regarding the
interpretability of the clusters, scales, indexes, and IQs that comprise the seven major intelligence
batteries included in the program (i.e., WISC-IV, WAIS-III, WPPSI-III, KABC-II, WJ III, SB5, DAS-II).
Specifically, for all intelligence batteries, the program denotes either YES or NO in response to the
question, "Is Composite Interpretable?" Figure 3.3 is an illustration of the WISC-IV tab from the XBA
DMIA, using hypothetical data. As may be seen in this figure, the Verbal Comprehension Index (Gc) is
not interpretable; however, the Perceptual Reasoning (Gf/Gv), Working Memory (Gsm), and
Processing Speed (Gs) Indexes are interpretable. Rapid References 3.2 and 3.3 provide examples of
how to describe unitary and nonunitary ability/processing clusters, respectively, in a psychological
report. In addition, Figure 3.3 shows that the WISC-IV FSIQ is not interpretable for this set of scores.
As such, the GAI was calculated and reported to be interpretable.

DON'T FORGET

Definition of a Nonunitary Ability


When the variability among scores within a cluster is statistically significant or unusually large,
the cluster does not provide a good estimate of the ability/process it is intended to measure and,
therefore, is not interpretable. In other words, when a substantial difference between the scores
comprising a cluster is found, the cluster cannot be interpreted as representing a unitary
construct.
The criterion used to determine whether or not a cluster is unitary and, thus, interpretable varies by
battery as a function of the unique statistical qualities of the battery. In all instances, however, the
criterion used ensures that the designation of nonunitary or noninterpretable is based on the
finding of a statistically significant difference between the highest and lowest scores comprising the
composite. Table 3.1 lists the criteria used to determine whether a cluster (scale, index, or IQ) is
unitary, and thus interpretable, for each of the seven major intelligence batteries included in the XBA
DMIA. Noteworthy is the fact that the criteria used to determine a unitary cluster in the XBA DMIA are
based on general rules of thumb. For the precise critical values for determining unitary clusters for
all intelligence batteries by age, the user is referred to Appendix F.
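
To make the general rule concrete, the following sketch applies a highest-minus-lowest check of the kind described above. It is illustrative only: the function name is ours, and the critical values shown are the rule-of-thumb values used in this chapter's examples (roughly 5 points for Wechsler scaled scores and 15 points for standard scores), not the age-specific critical values tabled in Appendix F.

```python
# Illustrative sketch only: a generic spread check of the kind described above.
# Real critical values are battery- and age-specific (see Table 3.1 and Appendix F).

def is_unitary(scores, critical_value):
    """A cluster is treated as unitary (interpretable) when the spread between
    its highest and lowest scores is smaller than the battery's critical value."""
    return (max(scores) - min(scores)) < critical_value

# WISC-IV PSI example (scaled scores; assumed 5-point rule of thumb):
print(is_unitary([5, 4], critical_value=5))      # True  -> PSI may be interpreted
# WJ III Gf example (standard scores 108 and 83; assumed 15-point rule of thumb):
print(is_unitary([108, 83], critical_value=15))  # False -> Gf cluster is nonunitary
```
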

Figure 3.3 WISC-IV Tab of the XBA DMIA Highlighting Interpretable and Noninterpretable
Clusters

Rapid Reference 3.2



Example of How to Interpret a Unitary Cluster in a Psychological Report
The Processing Speed Index (PSI or Gs) represents Maria's ability to perform simple, clerical-
type tasks quickly. Maria's Gs ability was assessed with two tasks: one required her to quickly
copy symbols that were paired with numbers according to a key (Coding), and the other required
her to identify the presence or absence of a target symbol in a row of symbols (Symbol Search).
The difference between Maria's performances on these two tasks (Coding scaled score of 5
minus Symbol Search scaled score of 4 equals 1) was not statistically significant (i.e., it was not
≥ 5 points), indicating that her PSI is a good estimate of Gs. Maria obtained a PSI of 70 (66-81),
which is ranked at the 2nd percentile and is classified as Below Average/Normative Weakness.
Note: Criteria used to determine a unitary cluster in the XBA DMIA are reported in Table 3.1.

Rapid Reference 3.3



Example of How to Describe a Nonunitary Cluster in a Psychological Report
The Verbal Comprehension Index (VCI), a measure of Crystallized Intelligence (Gc), represents
Maria's ability to reason with previously learned information. Gc ability develops largely as a
function of both formal and informal educational opportunities and experiences and is highly
dependent on exposure to mainstream U.S. culture. Maria's Gc was assessed by tasks that
required her to define words (Vocabulary, scaled score = 7), draw conceptual similarities
between words (Similarities, scaled score = 9), and answer questions involving knowledge of
general principles and social situations (Comprehension, scaled score = 13). The variability
among Maria's performances on these tasks was statistically significant (i.e., the scaled score
range was ≥ 5 points), indicating that overall Gc ability cannot be summarized in a single score
(i.e., VCI).
Note: Criteria used to determine a unitary cluster in the XBA DMIA are reported in Table 3.1.
Guidance in how to proceed when specific clusters are found to be nonunitary is offered in this
chapter.


Table 3.1 Criteria Used to Determine a Nonunitary or Noninterpretable Cluster for Seven
Intelligence Batteries in the XBA DMIA


When all clusters that compose an intelligence battery are found to be unitary at Stage E1, then
practitioners can readily determine whether their findings support or refute the null hypothesis at
Stage E2. However, when one or more clusters are found to be nonunitary and thus noninterpretable,
then it is often (but not always) necessary to gather additional data to test the null hypothesis. For
example, using the subtests in Figure 3.2, suppose that an individual earned standard scores of 108
and 83 on the Analysis-Synthesis and Concept Formation subtests, respectively. Because the
difference between these scores is 25 points (i.e., ≥ 15 points; see Table 3.1), it is statistically significant, which
renders the Gf cluster nonunitary and, therefore, noninterpretable. Note that the lower of the two
scores in this comparison falls outside and below normal limits (i.e., <85), indicating a normative
weakness in Induction (i.e., the narrow ability/process measured by the subtest). However, before any
definitive conclusion can be made about the individual's performance in the area of Induction, at least
one other reliable measure of this narrow ability/process should be administered. Figure 3.2 shows
that the WAIS-III Matrix Reasoning subtest also measures Induction and, therefore, can be
administered. The individual's level of performance on Matrix Reasoning would then need to be
evaluated to determine whether it is consistent with or discrepant from the Concept Formation
standard score. If performance on Matrix Reasoning is consistent with performance on Concept
Formation, then Level II interpretation would be warranted (i.e., interpretation of the narrow
ability/process of Induction; see Figure 3.2). Note that it is not always necessary to engage in Level II
interpretation when a statistically significant difference is found between the scores that represent
qualitatively different aspects of the broad ability/process. This situation, as well as others, is
discussed later in this chapter.
When the scaled scores for two subtests within the same broad ability/processing domain differ
significantly, then the broad ability/processing standard score should not be interpreted. In this
scenario, interpretations about the underlying narrow abilities/processes (e.g., General Sequential
Reasoning and Induction in the Gf domain) also should not be made because there is only one
indicator (rather than two) for each narrow ability/process (see Level II example in Figure 3.2). If the
broad ability/processing domain was not a construct central to the referral, and if the scores from
both subtests comprising the domain were within normal limits or higher, despite being significantly
different from one another, then practitioners can be reasonably confident that this broad
ability/process is intact. However, if the broad ability domain was central to the referral, or if
performance on one of the two narrow ability/processing indicators that comprised it was below
normal limits, then there would be a need to secure more definitive information about functioning in
the narrow Gf ability/processing area that was found to be deficient to create a more defensible basis
for interpretation.

DON'T FORGET

If the construct of interest revolves around Gf, then it would be expected that two independent
measures of Gf (e.g., WJ III Analysis-Synthesis and WJ III Concept Formation) would produce
scores that are similar to each other, since each test measures qualitatively different aspects of
the same broad ability/processing construct. On occasion, and for a wide variety of reasons, the
scores within a broad ability/processing domain may actually deviate significantly from each
other, making interpretation of performance ambiguous. When the difference between the scores
within the broad Gf domain, for example, is statistically significant, Gf should not be interpreted
in most instances.

In Stage E1, practitioners must evaluate the viability of the a priori hypotheses that were generated.
Based on the evaluative judgments derived from normative comparisons of the data in Stage E1,
practitioners must decide whether the data suggest that the null hypothesis should be retained or
rejected in favor of an alternative hypothesis. This process is relatively straightforward in the sense
that when the evaluative judgments in Stage E1 indicate that functioning or performance is outside of
normal limits, then the null hypothesis is rejected in favor of the alternative. Table 3.2 provides
practitioners with a descriptive classification system that corresponds to the properties of the normal
probability curve. The classifications provided in this table are consistent with those used for the
WISC-IV (Flanagan & Kaufman, 2004), KABC-II (Kaufman & Kaufman, 2004), and SB5 (Alfonso &
Flanagan, 2006), as well as those generally used by neuropsychologists.

Table 3.2 Contemporary Descriptive Classification System


Stage F: Specification of A Posteriori Hypotheses

When XBA data are interpreted and evaluated according to the specified a priori hypotheses, there
may be instances in which all functioning is observed to fall within normal limits and thus all a priori
hypotheses are retained (Stage F2). At this point, if the XBA was constructed in accordance with the
principles and procedures set forth in Chapters 1 and 2 (e.g., the assessment provided adequate
representation of the constructs of interest), then practitioners can reasonably conclude that the
individual demonstrates no measured deficits in functioning. Determination of disability, however,
should always be based on multiple sources of information.
Because of the selective nature of referral and assessment (e.g., most individuals referred for an
evaluation are having some type of difficulty), in the majority of cases, measurement of an
individual's abilities/processes is likely to produce one or more instances in which performance will
fall outside the normal limits of functioning. Disability determinations are concerned primarily with
cases in which performance falls below the expected or average range of functioning, whereas
identification of gifted and talented individuals focuses on performance that falls significantly above
the average range. In those cases in which the data suggest that the null hypothesis should be rejected
in favor of an alternative (Stage F1), or when the data provide contradictory, ambiguous, or
insufficient evidence upon which to base such a decision, XBA becomes an iterative process.
When initial XBA data support the null hypothesis, further assessment via standardized testing is
likely unwarranted; practitioners should draw appropriate conclusions and present those findings in a
psychological report (Stage G). However, when one or more a priori hypotheses are not supported by
the data, or when the data conflict (i.e., significant differences within broad ability/processing
domains exist), additional assessment may be warranted. When practitioners deem it necessary to
investigate anomalous or ambiguous results, the process remains hypothesis-driven and is carried
forth on the basis of a posteriori hypotheses. According to the American Heritage Dictionary (1994),
a posteriori is defined as: "Reasoning from particular facts to general principles; empirical." The use
of a posteriori hypotheses has a long history in clinical assessment and involves inferring causes
from effects (Kamphaus, 1993).
The most common situation in which the use of a posteriori hypotheses and additional assessment
will be pursued occurs when there is a significant difference between two measures of a particular
cognitive ability/process that fail to converge as expected. This situation was described in the
previous section (i.e., the finding of a nonunitary cluster).

DON'T FORGET

Additional Assessment within a Broad Ability/Processing Domain Is Generally Warranted When
1. a statistically significant difference is found between narrow ability/processing scores in
a broad domain; and
2. the lower of the two scores is suggestive of a normative weakness (i.e., is more than 1 SD
below the normative mean).
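
The two conditions in the box above can be expressed as a simple check. The sketch below is illustrative only: the function name and the 15-point significance value are assumptions (the battery-specific critical values in Table 3.1 and Appendix F govern in practice), and scores are assumed to be on the deviation-IQ metric (mean = 100, SD = 15).

```python
# Illustrative sketch of the two conditions above (deviation-IQ metric assumed).

def follow_up_indicated(score_a, score_b, critical_value=15, weakness_cutoff=85):
    """Additional assessment within a broad domain is generally warranted when the
    two narrow-ability scores differ significantly AND the lower score suggests a
    normative weakness (more than 1 SD below the normative mean of 100)."""
    significant_difference = abs(score_a - score_b) >= critical_value
    normative_weakness = min(score_a, score_b) < weakness_cutoff
    return significant_difference and normative_weakness

print(follow_up_indicated(108, 83))   # True  -> administer another measure of the weaker narrow ability
print(follow_up_indicated(120, 136))  # False -> both scores are at or above normal limits
```
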


In cases in which supplemental testing is necessary, so too is the specification of a posteriori
hypotheses. Such hypotheses are essentially identical to the a priori hypotheses described previously,
in that they also specify that performance on any additional tests that may be given will be within the
normal limits of functioning. These hypotheses differ only with respect to the point in the assessment
process at which they are generated: A priori hypotheses are generated prior to the administration of
any tests and prior to interpretation of any collected data; a posteriori hypotheses are generated
following interpretation of initial data and prior to administration of additional testing. As can be
seen in Figure 3.1, following specification of a posteriori hypotheses, practitioners return to Stage C.
Once again, knowledge of CHC theory and research is used to guide the selection of cognitive ability
measures that will be used to gather additional information, as well as to evaluate a posteriori
hypotheses. Returning to Stage C, following Stage F1, represents the iterative nature of assessment
and interpretations, which is necessary to corroborate ambiguous, anomalous, or contradictory
findings. Such iterations assist in "narrow[ing] down the possibilities or reasons for the existence of
a particular initial finding" (Kamphaus, 1993, p. 166), and can be continued until all hypotheses are
properly evaluated, allowing practitioners to draw valid conclusions from the data.
Stage G: Incorporate XBA Results in a Psychological Report

Two examples of the way in which XBA data may be incorporated into a psychological report are
presented in Chapter 7 of this book. In general, practitioners should take care to provide a clear
explanation of the basis for assessment; to explain the reasons for evaluating specific cognitive and
academic constructs; and to make connections between any identified deficits in cognitive
ability/processing and presenting problem(s) (e.g., academic skill deficits). Practitioners should also
remember that the XBA approach, although systematic, defensible, and theory driven, represents only
one component of the broad framework of evaluation. Therefore, any report that is built around XBA
data should not be considered a complete representation of psychological functioning. It is best
practice to demonstrate that the evidence from multiple data sources converges to form the basis for
defensible conclusions about individual ability or functioning. The remainder of this chapter
describes how results generated from the data entered into the XBA DMIA should be interpreted.
GUIDELINES FOR TEST INTERPRETATION

When only a single intelligence battery is used in assessment, the data derived from the scoring
process tend to remain on the same scale, having the same mean and standard deviation. This, of
course, provides for straightforward and direct comparison of scores and facilitates the interpretive
process. In the case of XBA, scores come from at least two different sources or tests. Consequently, it
is possible that each test is based on a different metric (e.g., Wechsler subtests have a mean of 10 and
SD of 3, whereas WJ III subtests have a mean of 100 and SD of 15), making direct comparisons
inappropriate. To overcome this obstacle, all test scores obtained within the context of the XBA
approach are converted to a common metric having a mean of 100 and a standard deviation of 15 and
are compared to an appropriate normative standard. The XBA DMIA automatically converts all data
that are entered into the CHC tab except for T scores from the DAS-II. When scores from the DAS-II
are utilized in XBA there are two options practitioners may use to convert the scores to the deviation
IQ metric. The first option is to enter the DAS-II scores into the appropriate places on the DAS-II test-
specific tab and allow the program to calculate the converted standard score automatically. The
converted scores may then be noted and entered by hand into the CHC tab. The second option is to use
the Percentile Rank and Standard Score Conversion Table found in Appendix E where converted
scores may be looked up and then entered by hand into the CHC tab. By converting all scores to a
common metric, concerns regarding the feasibility of drawing useful and valid conclusions from
tests with different means and standard deviations are addressed directly. Use of the normal
probability curve provides the means for achieving normative-based comparisons of XBA data (see
Table 3.2).
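
The conversion to the common metric is ordinary linear rescaling: express the score's distance from its native mean in SD units, then re-express that distance on a scale with a mean of 100 and SD of 15. The sketch below shows the arithmetic; it is not the XBA DMIA's actual routine, and the function name is ours.

```python
# Illustrative sketch of rescaling a score to the deviation-IQ metric (M = 100, SD = 15).

def to_deviation_iq(score, mean, sd):
    """Convert a score from its native metric to a mean of 100 and SD of 15."""
    z = (score - mean) / sd          # distance from the native mean in SD units
    return round(100 + 15 * z)

print(to_deviation_iq(13, mean=10, sd=3))     # Wechsler scaled score 13 -> 115
print(to_deviation_iq(40, mean=50, sd=10))    # DAS-II T score 40 -> 85 (T scores: M = 50, SD = 10)
print(to_deviation_iq(108, mean=100, sd=15))  # WJ III standard score is already on this metric -> 108
```
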
In the XBA approach, the criterion for rejecting the null hypothesis is set at the level of >1 SD
from the mean. Note that with the adoption of such a range, performance can be considered
exceptional only when it falls either significantly above (>115) or below (<85) the mean, indicating
both normative strengths and normative weaknesses in functioning, respectively. Note that cut-offs
are based on observed scores. Use of confidence bands is recommended to assist in making
interpretations of performance that falls at or close to a cut-off. For example, an observed score of 87
± 4 (83 to 91) may be indicative of a deficit for one individual but not another. This example
highlights the need to evaluate all observed scores (and their corresponding confidence bands) within
the context of the entire case. As discussed previously, however, a great deal of assessment is
conducted with respect to the investigation of potential or suspected deficits and, therefore, most
attention will likely be paid to performance that is at or near the lower cut off or that is significantly
below the mean.
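
A sketch of the normative-range logic just described appears below. It is illustrative only: the three labels mirror the ones used in this chapter's examples rather than the full classification system in Table 3.2, and the 4-point band simply reproduces the 87 ± 4 example above rather than any test's actual standard error of measurement.

```python
# Illustrative sketch of the ±1 SD normative-range logic described above.

def normative_range(standard_score):
    """Classify a deviation-IQ score relative to the 85-115 normal-limits band."""
    if standard_score < 85:
        return "Below Average / Normative Weakness"
    if standard_score > 115:
        return "Above Average / Normative Strength"
    return "Average Range / Within Normal Limits"

def confidence_band(observed_score, half_width=4):
    """Band around an observed score; the half-width of 4 echoes the 87 +/- 4 example."""
    return observed_score - half_width, observed_score + half_width

print(normative_range(87))   # Average Range / Within Normal Limits
print(confidence_band(87))   # (83, 91) -- the band straddles the 85 cut-off, so interpret in context
```
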
In general, interpretation of XBA data begins by examining results from at least two subtests that
are expected to converge to form broad or narrow ability/processing clusters. Because clusters are
typically more reliable than scores from single subtests, interpretation of performance on the basis of
single subtests is not typically done under the XBA approach. Likewise, the initial focus is on the
interpretation of broad abilities/processes, as they tend to be more reliable and better representations
of overall functioning in a broad domain than narrow ability/processing clusters, which are also
derived from two subtests but that measure a single, specific, or narrow ability /process. As noted in
Figure 3.2, broad ability/processing clusters comprise Level I interpretation and narrow
ability/processing clusters comprise Level II interpretation. In some cases, there may well be more
than two subtests that comprise a particular broad or narrow ability/process. The various
combinations of scores that may result from having more than two subtests that measure the same
broad or narrow ability/process quickly complicate interpretation. The following section provides
guidelines for interpretation that may be used to derive valid and defensible conclusions when either
two or three subtests are used to represent an ability/process.
Interpretation of Two Scores Representing Either a Broad or Narrow Ability/Processing
Domain


The CHC tab of the XBA DMIA allows for up to three scores to be entered for each broad
ability/process. When two scores are entered, the program will either report a cluster, when it is
unitary and thus interpretable, or not report a cluster, when it is nonunitary and thus not interpretable.
The program first converts all scores to a scale having a mean of 100 and SD of 15. Next, the
program determines whether calculation of an arithmetic average of the scores is appropriate. When
two subtest scores are entered into a broad ability/processing domain, the program will report one of
three possible outcomes (a sketch of this decision logic follows the list):

1. It will calculate an average of two standard scores when the difference between them is less
than 15 points. These clusters are considered unitary and, therefore, interpretable.
2. It will calculate an average of two standard scores when the difference between them is ≥ 15
points and both scores are within the same normative range (i.e., both scores are either < 85, or
≥ 85 and ≤ 115, or > 115). Although these clusters are nonunitary from a statistical significance
standpoint, they are nonetheless interpretable from a clinical standpoint.
3. It will not calculate an average of two standard scores when the difference between them is ≥ 15
points and the scores are within different normative ranges (e.g., one score is < 85 and one is ≥ 85).
These clusters are both nonunitary and noninterpretable.
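
The following sketch restates the three outcomes listed above as code. It is an illustration of the stated rules on the deviation-IQ metric, not the XBA DMIA's actual implementation, and the function names are ours.

```python
# Illustrative sketch of the three two-score outcomes listed above (deviation-IQ metric).

def normative_band(score):
    """Normative range of a score: below 85, 85-115, or above 115."""
    if score < 85:
        return "low"
    if score > 115:
        return "high"
    return "average"

def two_score_cluster(a, b):
    """Return the averaged cluster when it is interpretable, otherwise None."""
    if abs(a - b) < 15:
        return (a + b) / 2       # outcome 1: unitary, interpretable
    if normative_band(a) == normative_band(b):
        return (a + b) / 2       # outcome 2: nonunitary but clinically interpretable
    return None                  # outcome 3: nonunitary and noninterpretable

print(two_score_cluster(100, 110))  # 105.0
print(two_score_cluster(120, 136))  # 128.0 (both scores above 115)
print(two_score_cluster(83, 108))   # None (scores fall in different normative ranges)
```
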

For clusters entered into the individual intelligence battery tabs (e.g., WISC-IV VCI; WJ III Visual-
Spatial Thinking Factor; KABC-II Learning/Glr Scale), the program will determine whether these
clusters are interpretable based on the criteria reported in Table 3.1. Whether using data from
individual intelligence batteries or XBAs, Rapid Reference 3.4 provides examples of how to interpret
clusters based on two subtest combinations that are considered interpretable.

Rapid Reference 3.4




Examples of How to Describe the Finding of a Two-Subtest Unitary (Interpretable) Cluster
Based on Data from either a Single Battery or XBAs in a Psychological Report


Interpretation of Three Scores Representing a Broad Ability/Processing Domain

An average is calculated for three scores when: (a) all scores fall within the same normative range
(i.e., all three scores are either < 85, or ≥ 85 and ≤ 115, or > 115); or (b) the magnitude of the difference
between any score and any other score is < 15. Otherwise, when three subtests represent the same
ability/process, the XBA DMIA (for CHC tab only) will calculate a cluster, based on the average of
two scores, and report the third score as an outlier.1 Regardless of what combination of scores is
averaged when three subtests are used to represent a cluster, Rapid Reference 3.5 directs the user of
the XBA DMIA to an appropriate interpretive statement that may be used to facilitate report writing.
The accompanying figures provide a set of decision points that correspond to each of the nine
possible interpretive statements associated with three-subtest combinations.

Rapid Reference 3.5





A Guide to Interpreting Three Scores within an Ability/Processing Domain


1 The XBA DMIA (CHC tab only) will calculate a two-subtest cluster and report the third subtest as

an outlier when: (a) the difference between standard scores for two tests is < 15 and the difference
between the third score and both of these scores is ≥ 15; or (b) two scores fall within the same
normative range (< 85, or ≥ 85 and ≤ 115, or > 115) and the third score differs from both of those scores
by ≥ 15 points; or (c) the difference between Standard Score A (SSA) and Standard Score B (SSB) is
< 15 and the difference between Standard Score C (SSC) and SSB is < 15 and the difference between
SSA and SSC is ≥ 15, then SSB is averaged with either SSA or SSC, depending on the normative range
in which the scores fall. For example, if SSA and SSB were within normal limits but SSC was Above
Average, then SSA and SSB would be averaged and SSC would be reported as an outlier. Regardless
of what combination of scores is averaged, Rapid Reference 3.5 directs the user of the XBA DMIA to
an appropriate interpretive statement that may be used to facilitate report writing.
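
The three-score rules summarized above and in the footnote can be sketched in the same way. The version below handles the main cases (average all three scores, or average a converging pair and flag the remaining score as an outlier) and defers more ambiguous combinations to Rapid Reference 3.5; it is an illustration of the stated rules, not the XBA DMIA itself.

```python
# Illustrative sketch of the three-score averaging/outlier rules (deviation-IQ metric).

def normative_band(score):
    """Normative range of a score: below 85, 85-115, or above 115."""
    if score < 85:
        return "low"
    if score > 115:
        return "high"
    return "average"

def three_score_cluster(a, b, c, crit=15):
    """Return (cluster, outlier); outlier is None when all three scores are averaged,
    and (None, None) is returned for combinations left to Rapid Reference 3.5."""
    scores = [a, b, c]
    same_range = len({normative_band(s) for s in scores}) == 1
    all_close = max(scores) - min(scores) < crit
    if same_range or all_close:
        return sum(scores) / 3, None
    # Otherwise look for a converging pair (close together or in the same normative
    # range) from which the remaining score deviates by 15 or more points.
    for (x, y), z in [((a, b), c), ((a, c), b), ((b, c), a)]:
        pair_converges = abs(x - y) < crit or normative_band(x) == normative_band(y)
        z_deviates = abs(z - x) >= crit and abs(z - y) >= crit
        if pair_converges and z_deviates:
            return (x + y) / 2, z
    return None, None

print(three_score_cluster(80, 65, 55))     # all three below 85 -> average of the three (about 66.7), cf. Interpretive Statement 1
print(three_score_cluster(105, 130, 120))  # (125.0, 105): converging pair averaged, 105 flagged as an outlier, cf. Interpretive Statement 6
```
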

Interpretive Statement 1

On the three tasks that comprise the WISC-IV Verbal Comprehension Index (VCI), Jim's
performance was consistently Below Average and in the Normative Weakness range. For
example, when required to give definitions of words presented orally his performance was
slightly below average (Vocabulary = 6 [SS = 80]; 9th percentile). When asked to give oral
responses to hypothetical questions that assess everyday problems or understanding of social
rules and concepts his performance was lower (WISC-IV Comprehension = 3 [SS = 65]; 1st
percentile). And when required to respond orally and explain the similarity between the concepts
represented by two different words his performance was the lowest (WISC-IV Similarities = 1
[SS = 55], <1st percentile). The difference between his highest and lowest performances on these
tests is statistically significant, rendering the VCI nonunitary and noninterpretable. To better
understand Jim's functioning in this domain, his scores were examined using XBA interpretive
guidelines. Analysis of his scores within this framework indicated that although the VCI is
nonunitary, a valid Crystallized Intelligence (Gc) cluster can be formed based on these three
subtest performances because they all fall in the same normative range. Jim's Gc cluster of 67 is
ranked at the 1st percentile and is a Normative Weakness. Overall, this suggests that Jim's
functioning in the broad Gc ability/processing domain is deficient as compared to same-age
peers from the general population. Therefore, Jim has a disorder in the basic psychological
process of Gc, a finding that should play a significant role in educational intervention planning.

Interpretive Statement 2

On tasks that measured Carol's Fluid Intelligence (Gf), her performance was Below Average and
in the Normative Weakness range when required to determine the logic behind why some objects
are grouped together and others are not (WJ III Concept Formation = 80; 9th percentile) and in
the Average Range when asked to solve logic puzzles that are based on combinations of different
colored squares (WJ III Analysis-Synthesis = 110; 75th percentile). The difference between her
performances on these tests is statistically significant, rendering her overall Gf cluster
nonunitary and noninterpretable. To better assess and understand Carol's functioning in this
domain, a second measure of her weaker Gf narrow ability/process (i.e., Induction) was
administered. On an Induction task that required Carol to determine the correct components that
complete a matrix reasoning or logic puzzle, her performance this time was in the Average
Range (WISC-IV Matrix Reasoning = 11 [SS = 105]; 65th percentile). The lack of convergence
here between two measures of Induction (Concept Formation and Matrix Reasoning) indicates
that Carol's performance on the WJ III Concept Formation subtest is likely to be an anomalous
result and not an accurate indication of her ability to reason inductively. Therefore, Carol's
broad Gf ability was based on the aggregate of her performances on WJ III Analysis-Synthesis
and WISC-IV Matrix Reasoning. She earned a Gf cluster of 108, which is ranked at the 70th
percentile and is classified as Average Range/Within Normal Limits, indicating that her
functioning in this broad ability/process is intact.

Interpretive Statement 3

On tasks that measured Alan's Long-Term Retrieval (Glr), his performance was Below Average
and in the Normative Weakness range when required to learn and recall words associated with
picture symbols that form sentences (WJ III Visual-Auditory Learning = 80; 9th percentile) and
Above Average and in the Normative Strength range when asked to quickly recall words that
belong to a particular category (WJ III Retrieval Fluency = 118; 88th percentile). The difference
between his performances on these tests is statistically significant, rendering his overall Glr
cluster nonunitary and noninterpretable. To better assess and understand Alan's functioning in
this domain, a second measure of his weaker ability (i.e., Associative Memory) was
administered. On a task that required Alan to again remember and recall words that are
associated with graphical symbols, his performance this time was Above Average and in the
Normative Strength range (KABC-II Rebus = 14 [SS = 120]; 91st percentile). The lack of
convergence here between two measures of Associative Memory (WJ III Visual-Auditory
Learning and KABC-II Rebus) appears to indicate that Alan's performance on the WJ III Visual-
Auditory Learning subtest is likely to be an anomalous result and not an accurate indication of
his Associative Memory ability. Therefore, Alan's broad Glr ability/process was based on the
aggregate of his performances on WJ III Retrieval Fluency and KABC-II Rebus. He earned a Glr
cluster of 119, which is ranked at the 90th percentile and is classified as Above
Average/Normative Strength, indicating that his functioning in this area is intact and above what
is typically expected of individuals his age.

Interpretive Statement 4

On tasks that measured Sarah's Crystallized Intelligence (Gc), her performance was Below
Average and in the Normative Weakness range when required to state a word that is either
similar or opposite in meaning to a presented word or name familiar and unfamiliar pictured
objects (WJ III Verbal Comprehension = 80; 9th percentile) and in the Average Range when
asked to answer orally presented questions regarding the common or typical characteristics of
objects (WJ III General Information = 100; 50th percentile). The difference between her
performances on these tests is statistically significant, rendering her overall Gc cluster
nonunitary and noninterpretable. To better assess and understand Sarah's functioning in this
domain, a second measure of her weaker narrow ability (Lexical Knowledge) was administered.
On a task that required Sarah to give definitions of words presented orally, her performance
again fell Below Average and in the Normative Weakness range (WISC-IV Vocabulary = 5 [SS =
75]; 5th percentile). Thus, a narrow ability/processing Lexical Knowledge cluster was formed
based on the aggregate of WJ III Verbal Comprehension and WISC-IV Vocabulary. Sarah earned
a Lexical Knowledge cluster of 78, which is ranked at the 7th percentile and is classified as
Below Average/Normative Weakness. Overall, it appears that although one aspect of Sarah's Gc
is average (General Information), another aspect (Lexical Knowledge) is deficient.

Interpretive Statement 5

On tasks that measured Trent's Crystallized Intelligence (Gc), his performance was in the upper
end of the Average Range when required to point to one of six pictures that show the meaning of
the word or the answer to a question posed by the examiner (KABC-II Verbal Knowledge = 13
[SS = 115]; 84th percentile) and in the lower end of the Average Range when asked to solve
orally presented riddles by pointing to a picture or using words (KABC-II Riddles = 7 [SS = 85];
16th percentile). Although both scores are within the Average Range, the difference between his
performances on these tests is statistically significant, rendering his overall Gc cluster
nonunitary and noninterpretable. To better assess and understand Trents functioning in this
domain, a second measure of his relatively weaker narrow ability (Lexical
Knowledge/Reasoning) was administered. On a task that required Trent to answer questions
using reasoning, his performance again fell within the Average Range (WISC-IV Word
Reasoning = 9 [SS = 95]; 37th percentile). This result indicates that Trent's Lexical
Knowledge/Reasoning ability is, in fact, intact. When evaluated from a CHC theoretical
perspective, the aggregate of his three scores yielded a valid Gc cluster that was within the
Average Range (Gc = 98, 45th percentile). Thus, Trent's ability to work and reason with
primarily learned information is within normal limits.

Interpretive Statement 6

On the three tasks that comprise the WISC-IV Verbal Comprehension Index (VCI), Anita's
performance was consistently in the Average to Above Average range. For example, when
required to give definitions of words presented orally her performance was Average
(Vocabulary = 11 [SS = 105]; 65th percentile). However, when asked to give oral responses to
hypothetical questions that assess everyday problems or understanding of social rules and
concepts her performance was Above Average and considered a Normative Strength (WISC-IV
Comprehension = 16 [SS = 130]; 98th percentile). And when required to respond orally and
explain the similarity between the concepts represented by two different words her performance
was again Above Average (WISC-IV Similarities = 14 [SS = 120]; 91st percentile). The
difference between her highest and lowest performances on these tests is statistically significant,
rendering the VCI nonunitary and noninterpretable. To better understand Anita's functioning in
this domain, her scores were interpreted following XBA guidelines. Analysis of her scores
within this framework indicated that a valid Crystallized Intelligence (Gc) cluster can be formed
based on the aggregate of the Comprehension and Similarities subtest scores. Anita earned a Gc
cluster of 125, which is ranked at the 95th percentile and is classified as Above
Average/Normative Strength. Anita's performance on the Vocabulary subtest, although Average,
was lower than her broad Gc cluster, and therefore is a relative weakness for her. Overall, when
asked to reason with words and general information, Anita's performance is significantly Above
Average as compared to same-age peers from the general population.

Interpretive Statement 7

On tasks that measured Rick's Long-Term Retrieval (Glr), his performance was Above Average
and in the Normative Strength range when required to learn and recall words associated with
picture symbols that form sentences (WJ III Visual-Auditory Learning = 125; 95th percentile)
and Below Average and in the Normative Weakness range when asked to quickly recall as many
words as possible that belong to a particular category (WJ III Retrieval Fluency = 80; 9th percentile). The
difference between his performances on these tests is statistically significant, rendering his
overall Glr cluster nonunitary and noninterpretable. To better assess and understand Rick's
functioning in this domain, a second measure of his weaker narrow ability (Ideational
Fluency/Naming Facility) was administered. On a task that required Rick to quickly identify and
name pictures of common objects, his performance was again Below Average and in the
Normative Weakness range (WJ III Rapid Picture Naming = 82; 12th percentile). Thus, a valid
narrow ability/processing cluster (Ideational Fluency/Naming Facility) was formed based on the
aggregate of WJ III Retrieval Fluency and Rapid Picture Naming. Rick earned an Ideational
Fluency/Naming Facility cluster of 81 (76-86), which is ranked at the 10th percentile and is
classified as Below Average/Normative Weakness. Overall, it appears that although one aspect of
Rick's Glr is Above Average (Associative Memory), another aspect is deficient (Ideational
Fluency/Naming Facility).

Interpretive Statement 8

On the three subtests that comprise the WISC-IV Perceptual Reasoning Index (PRI), Jing's
performance was consistently Average to Above Average. For example, when required to
reproduce a series of pictorial designs using blocks, her performance was Above Average and
considered a Normative Strength (Block Design = 16 [SS = 130]; 98th percentile). When
required to choose one picture from each row to form a group with a common characteristic,
her performance was within the Average Range (WISC-IV Picture Concepts = 11 [SS = 105];
65th percentile). And when required to select the option that completes a matrix, Jings
performance was again within the Average Range (WISC-IV Matrix Reasoning = 13 [SS = 115];
84th percentile). The difference between her highest and lowest performances on these PRI
subtests is statistically significant, rendering the PRI nonunitary and noninterpretable. To better
understand Jing's functioning in this domain, her scores were interpreted following XBA
guidelines. Analysis of her scores within this framework indicated that a valid Fluid Intelligence
(Gf ) cluster can be formed based on the Picture Concepts (Induction) and Matrix Reasoning
(Induction/General Sequential Reasoning) subtests. Jing earned a Gf cluster of 110, which is
ranked at the 75th percentile and is classified as Average. Although Jing's performance on Block
Design was lower than this cluster, it was nonetheless Above Average, indicating that her Visual
Processing (Gv) ability, particularly Spatial Relations, is intact. Overall, these results indicate that
Jing reasons well with visual information.

Interpretive Statement 9

On the three tasks that comprise the WISC-IV Perceptual Reasoning Index (PRI), Guillermo's
performance was consistently in the Above Average/Normative Strength range. For example,
when required to reproduce a series of pictorial designs using blocks, his performance was
Above Average (Block Design = 19 [SS = 145]; 99th percentile). When required to choose one
picture from each row to form a group with a common characteristic, his performance was also
Above Average (WISC-IV Picture Concepts = 14 [SS = 120]; 91st percentile). And when required
to select the option that completes a matrix, Guillermos performance was again Above Average
(WISC-IV Matrix Reasoning = 15 [SS = 125]; 95th percentile). Nevertheless, the difference
between his highest and lowest performances on the PRI subtests is statistically significant,
rendering the PRI nonunitary and noninterpretable. To better understand Guillermo's functioning
in this domain, his scores were interpreted following XBA guidelines. Analysis of his scores
within this framework indicated that although his PRI is nonunitary, a valid cluster can be formed
based on all three subtests. Guillermo earned a Fluid Reasoning/Visual Processing (Gf/Gv)
cluster of 130, which is ranked at the 98th percentile and is considered a Normative Strength.
Overall, these results indicate that Guillermo's ability to reason with visual information is well
Above Average as compared to same-age peers from the general population.

SUMMARY

This chapter provided specific guidelines to assist practitioners in interpreting XBA data. The process
of test interpretation within the context of CHC theory and research presented herein was described as
both systematic and defensible. Specifically, the XBA approach was described as an hypothesis-driven
method of assessment and interpretation that serves to reduce the possibility of confirmatory bias.
Practitioners were advised to follow guidelines for specifying and testing both a priori and a
posteriori hypotheses. In addition, practitioners were instructed on how to draw meaningful
conclusions from data entered in the XBA DMIA included on the CD-ROM accompanying this book.
REFERENCES

Alfonso, V. C., & Flanagan, D. P. (2006). Best practices in the use of the Stanford Binet Intelligence
Scales-Fifth Edition with preschoolers. In B. A. Bracken & R. Nagle (Eds.), Psychoeducational
assessment of preschoolers (4th ed., pp. 267-295). Mahwah, NJ: Erlbaum.
Alfonso, V. C., Flanagan, D. P., & Radwan, W. (2005). The impact of Cattell-Horn-Carroll theory on
test development and interpretation of cognitive and academic abilities. In D. P. Flanagan & P. L.
Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 185-202). New
York: Guilford.
American Heritage Dictionary. (1994). The American Heritage Dictionary-Third Edition. New York:
Dell Publishing.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge,
England: Cambridge University Press.
Carroll, J. B. (1998). Foreword. In K. S. McGrew & D. P. Flanagan, The intelligence test desk
reference (ITDR): Gf-Gc cross-battery assessment (pp. xi-xii). Boston: Allyn & Bacon.
Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting
cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 314-325). New York: Guilford.
Flanagan, D. P., & McGrew, K. S. (1998). Interpreting intelligence tests from contemporary Gf-Gc
theory: Joint confirmatory factor analyses of the WJ-R and KAIT in a non-White sample. Journal of
School Psychology, 36, 151-182.
Flanagan, D. P., & Kaufman, A. S. (2004). Essentials of WISC-IV assessment. New York: Wiley.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children-Second Edition.
Circle Pines, MN: AGS Publishing.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 151-180). New York: Guilford.
Sandoval, J., Frisby, C. L., Geisinger, K. F., Scheuneman, J. D., & Grenier, J. R. (Eds.). (1998). Test
interpretation and diversity: Achieving equality in assessment. Washington, DC: American
Psychological Association.

TEST YOURSELF

1. Adopting the stance that performance of a child will be within normal limits, or
assuming the null hypothesis, at the beginning of assessment

a. suggests that the examiner has preconceived notions that a disability exists.
b. reduces the chance that the examiner will view standardized test data only in a
manner that corroborates the preconceived notion of disability.
c. provides de facto support for the presence of a disability.
d. indicates that external factors have already been ruled out as possible primary
causes for the observed difficulties.

2. In XBA assessment, a priori hypotheses

a. are hypotheses an examiner forms based on case history data.
b. are hypotheses an examiner forms based on specific knowledge of CHC theory
and research.
c. are hypotheses an examiner forms based on information from case history data
and knowledge of CHC theory and research.
d. are hypotheses based on an evaluation of initial assessment data.

3. A unitary broad ability/processing cluster is comprised of scores that do not differ
significantly from one another and that represent qualitatively different aspects of
the broad ability/process. True or False?
4. Interpretations that are made within the context of the XBA approach are based on

a. intra-individual comparisons.
b. extra-individual comparisons.
c. inter-individual comparisons.
d. supra-individual comparisons.
e. all of the above.

5. To adequately represent a broad ability/process in assessment, the practitioner needs
at least

a. one subtest that measures the broad ability/process.


b. two subtests that measure qualitatively similar aspects of the broad ability/process.
c. two subtests that measure qualitatively different aspects of the broad
ability/process.
d. three or more subtests that measure different narrow CHC abilities /processes.

6. When three subtests that measure the same broad ability/process are entered into
the CHC tab of the XBA DMIA, a cluster average will be calculated when

a. all scores fall within the same normative range.


b. the magnitude of the difference between any score and any other score is <15.
c. a and b.
d. none of the above.

7. In the XBA approach, the criterion for rejecting the null hypothesis is

a. >1.5 SDs from the mean.


b. >1 SD from the mean.
c. <1.5 SDs from the mean.
d. <1 SD from the mean.

8. A practitioner may choose to conduct additional assessment within a broad
ability/processing domain when a statistically significant difference is found between
the scores that comprise the domain and the lower of the two scores is a normative
strength. True or False?
9. XBA interpretation is based on

a. Contemporary CHC theory.


b. Carroll's Three-Stratum theory.
c. Gf-Gc theory.
d. PASS theory.

10. The XBA DMIA will calculate a CHC cluster (on the CHC tab) when the difference
between two scores entered for a particular CHC domain is <15 and both scores are
within different normative ranges. True or False?

Answers: 1. b; 2. c; 3. True; 4. c; 5. c; 6. c; 7. b; 8. False; 9. a; 10. True

Four

USE OF THE CROSS-BATTERY APPROACH IN SPECIFIC LEARNING


DISABILITY EVALUATION

INTRODUCTION

The main principles that characterize the Cross-Battery Assessment (XBA) approach include attention
to both measurement and interpretation. As noted in Chapter 1, the clear focus on better measurement
(e.g., adequate construct representation) in the recent wave of intelligence test revisions highlights the
importance of measurement in the assessment process and the need to apply psychometric standards
rigorously. But precision and confidence in measurement are only half the battle because
measurements are of little value unless some meaning can be attached to them. Interpretation is the
process of ascribing meaning to a particular set of measurements or data set. Although the purpose
and intent of interpretation is often quite clear, the determination of what meaning to assign to data is
a process fraught with ambiguity, uncertainty, and misconceptions. Nowhere perhaps is this problem
better exemplified than in the assessment of suspected learning disability.
The purpose of this chapter is to illustrate how XBA methods and procedures can be readily
applied in the evaluation of specific learning disabilities (SLDs). Although the XBA approach was not
designed expressly for the purpose of SLD evaluation, it is well suited to it. That is, the
measurement and interpretation principles that underlie the XBA approach strengthen its
application in virtually all forms and purposes of assessment. Accordingly, the value of XBA is
particularly evident in the evaluation of suspected learning disability, especially when used in
conjunction with a modern operational definition of SLD.
This chapter begins with a discussion of the seven deadly sins in SLD evaluation, which provides a
context for understanding the current difficulties and confusion surrounding the use of cognitive
batteries for this purpose. We also offer a brief description of our operational definition of SLD
(Flanagan, Ortiz, Alfonso, & Dynda, 2006; Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006).
Finally, we discuss the use of a special program, the SLD Assistant, that is designed to facilitate
decision-making within the context of the operational definition.

DON'T FORGET

XBA Guides Measurement
and Interpretation


The main principles that characterize the XBA approach include attention to both measurement
and interpretation. Precision and confidence in measurement are only half the battle because
measurements are of little value unless some meaning can be attached to them.
THE SEVEN DEADLY SINS IN SLD EVALUATION

Whereas we refer to them as the seven deadly sins in SLD evaluation, we wish to emphasize that
this in no way minimizes the seriousness of such assessment. Indeed, SLD comprises the single
largest disability identified among the nation's school-aged children, accounting for about one-half
of all children with disabilities currently receiving special education services (NCES, 2006). When
coupled with the increasing number of adults in college, graduate or professional schools, and other
educational and occupational settings who are being diagnosed with SLD, the number of people being
evaluated for this disorder is truly staggering. Therefore, the consequences to the public as a result of
poorly constructed and misguided assessments are anything but negligible. With this in mind, we
offer the seven deadly sins as a basis for understanding the myriad misconceptions surrounding SLD
evaluation that continue to undermine its reliability and validity.
The purpose of discussing these sins (see Table 4.1) is to promote practices that are scientifically
supported, evidence based, and guided by clear theoretical specifications. Only in this manner will
practitioners, researchers, and trainers across a wide range of disciplines be able to converse with
one another regarding the nature of SLD and do so on common footing, using a common
nomenclature, and with a clear idea as to what SLD is and how it is to be diagnosed. Past and present
methods for identifying SLD have done little to improve the reliability and validity of the diagnosis.
We hope to change this record and begin by examining the reasons why traditional models of SLD
evaluation have failed.

Table 4.1 The Seven Deadly Sins in SLD Evaluation


Sin #1: Relentless Searching for Ipsative or Intra-Individual (Person-Relative) Discrepancies

Perhaps the most common practice in SLD evaluations is the procedure by which scores are ipsatized;
that is, the scores are averaged and each score is then subtracted from this average in order to
determine its degree of deviation from the average. The presumption has been that scores that deviate
significantly from the average of the individual's abilities, when taken together, are clinically
important indicators of either relative weaknesses (when the score falls below the average) or relative
strengths (when the score falls above the average). Relative weaknesses are then touted as evidence of
SLD (see Kaufman, 1979, 1994, for discussions of the approach). The focus of such analysis is the
identification of discrepancies that exist within the individual only.
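To make the arithmetic of ipsatization concrete, the following sketch computes person-relative deviations for a set of hypothetical index scores. The scores and the use of Python are illustrative assumptions only; this is not the procedure implemented in the XBA DMIA or in any test publisher's software.

```python
import statistics

# Hypothetical index scores on the standard score metric (mean = 100, SD = 15)
scores = {"VCI": 112, "PRI": 104, "WMI": 88, "PSI": 96}

personal_mean = statistics.mean(scores.values())  # the individual's own average

for name, score in scores.items():
    deviation = score - personal_mean
    print(f"{name}: {score}  (deviation from personal mean = {deviation:+.1f})")

# WMI (88) shows the largest negative deviation from the personal mean, yet every
# one of these scores falls within normal limits (85 to 115) relative to the population.
```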

DON'T FORGET

XBA Benefits SLD
Evaluation through Use of
Common Terminology


Past and some present methods for identifying SLD have done little to improve the reliability
and validity of the diagnosis. XBA introduces modern intelligence theory and psychometric
rigor that allow practitioners, researchers, and trainers across a wide range of disciplines to be
able to converse with one another regarding the nature of SLD and do so on common footing,
using common nomenclature, and with a clear idea as to what SLD is and how it is to be
diagnosed.
Such ipsative, person-relative discrepancy analysis is fraught with both psychometric problems and
errors in logic. Consider first that by averaging scores and using the average as the referent for
deviation, there is a presumption that normal development is evidenced by a close correspondence
between every one of an individual's ability scores. That is, if an individual's cognitive functioning
was normal, no deviations from the average would be evident, resulting in a flat profile. This
assumption is erroneous. Most individuals have significant variability in their profile of cognitive
ability/processing scores (McGrew & Knopik, 1996). Simply because an individual is good or bad at
one task or skill does not mean that he or she should be either good or bad at all other skills.
According to various norm samples, for example the Wechsler Intelligence Scale for Children-
Fourth Edition (WISC-IV; Wechsler, 2003) and the Wechsler Adult Intelligence Scale-Third Edition
(WAIS-III; Wechsler, 1997), only about 3 to 4 percent of the entire population shows either no
variability in their scaled scores or variation of a single scaled score point. To use the standards set
by a tiny fraction of the population as the referent for evaluating the pattern of development in the
abilities of the rest of the population is unfounded.

CAUTION

Significant Variation
in Abilities/Processes
Is Normal


There is a tendency to believe, especially when relative weaknesses are identified via ipsative
analysis, that significant variation in an individuals abilities/processes is unusual or abnormal.
This is not true. Human beings vary considerably across their abilities/processes and only about
3 to 4% of the population shows little or no variability in performance. Thus, to use standards
based on little to no variation as the expectation for an individual is likely to lead to significant
errors in interpretation and conclusions that are not defensible.
Practitioners have also routinely been taught to search for discrepancies wherever they may exist,
including among and between individual subtest scores, composite scores, and global ability scores,
whether on the same cognitive or achievement battery or across different cognitive and achievement
batteries. Furthermore, there is no standard or guide regarding what types of scores should be
compared. As a result, practitioners often compare every combination of scores obtained in an
evaluation. Given that the most current Wechsler Scale (i.e., the WISC-IV) yields 1 IQ, 4 Indexes, and
15 individual subtest scaled scores, the sheer number of possible comparisons between scores from
within this battery alone is staggering. When the scores from an achievement battery are included in
this type of analysis, the possibilities for discrepancy comparisons increase exponentially. Even if
ipsative analysis were defensible in and of itself, the fact that practitioners are able to conduct so
many comparisons greatly increases the probability that at least one will be found by chance alone.
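The scale of the problem is easy to verify. The sketch below counts the pairwise comparisons available from the WISC-IV scores alone and, under the simplifying (and admittedly unrealistic) assumption that each comparison behaves like an independent test at an alpha level of .05, estimates the chance of at least one spurious finding.

```python
from math import comb

n_scores = 1 + 4 + 15        # WISC-IV: FSIQ, 4 Indexes, and 15 subtest scaled scores
pairs = comb(n_scores, 2)    # number of distinct pairwise comparisons
print(pairs)                 # 190

# Treating each comparison as an independent test at alpha = .05, the probability
# of finding at least one "significant" difference purely by chance is:
print(1 - 0.95 ** pairs)     # approximately 0.9999
```

Because obtained scores are correlated, the true figure is lower than this estimate, but the direction of the problem is the same: the more comparisons made, the more likely a chance finding becomes.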

DON'T FORGET

An Individuals
Abilities/Processes
Are Rarely Equally
Well Developed


Many individuals actually have significant variability in their profile of cognitive
abilities/processes (McGrew & Knopik, 1996). Simply because an individual is good or bad at
one task or skill does not mean he or she should be either good or bad at all other skills.
The faulty logic in ipsative analysis may be seen in the following Michael Jordan analogy.
Whereas it is clear that Michael Jordan has superior basketball ability, it would be a fallacy to
presume that all his athletic abilities are equally well developed. Indeed, Michael Jordan's abilities to
play baseball and golf fall far short of his superior ability to play basketball, although he remains
well above average in both. To say then that he is athletically disabled because he only plays baseball
and golf reasonably well is simply ludicrous. Ultimately, it should be recognized that significant
variation in test performance is normal. The expectation of a flat profile is unwarranted. Simply put, the
results of ipsative analysis are largely meaningless unless they are integrated with the results of
normative analyses within the context of a systematic theory-driven approach (see Flanagan &
Kaufman, 2004).
We are not alone in our conclusion that absolute person-relative analysis is an indefensible
practice. Ipsative analysis has long been roundly criticized in the literature (see McDermott, Fantuzzo,
& Glutting, 1990; McDermott, Fantuzzo, Glutting, Watkins, & Baggaley, 1992; McDermott &
Glutting, 1997). Yet the practice persists, particularly in comparisons between the still readily available Wechsler
Verbal IQ and Performance IQ. Perhaps the fact that these constructs have been summarily dropped
from the WISC-IV will mean that some common relative comparisons will thankfully cease. Clinical
tradition dies hard, however, as may be seen in the comments Siegel (1999) provided regarding the
evaluations submitted on behalf of the plaintiffs in Guckenberger v. Boston University, in which she
noted that "inferences about the learning disability were made on the basis of a discrepancy between
the Verbal and Performance Scales of an IQ test (the WAIS-R). Many individuals have a learning
disability, but no significant discrepancy between their Verbal and Performance IQ scores;
conversely, many individuals with no evidence of a learning disability show a significant discrepancy
between their scores on Verbal and Performance Scales (Maller & McDermott, 1997). This
verbal-performance discrepancy has been discredited" (p. 314). Whether this approach is useful or relevant
to any given assessment is no longer debatable, and what is certain is that a discrepancy between two
scores of any kind, from within a single battery or across any two batteries, whether intelligence or
achievement, is neither necessary nor sufficient to establish the presence of an SLD. The reader is
referred to Flanagan and Kaufman (2004) for a comprehensive approach to interpreting WISC-IV
performance that integrates data from both person-relative and population-relative analyses, an
approach that is psychometrically, theoretically, and clinically defensible.

CAUTION

Ipsative or Person-Relative
Analysis Is
Inherently Unreliable


Although it is one of the most popular approaches to data analysis used in the evaluation of SLD,
practitioners are strongly cautioned to recognize the limitations and problems inherent in the use
of ipsative or person-relative analysis. This type of analysis has been discredited in the literature.
Only when data from person-relative analysis is integrated with data from population-relative
analysis can practitioners draw meaningful conclusions (see Flanagan & Kaufman, 2004).
Sin #2: Failure to Distinguish between a Relative Weakness and a Normative Weakness

Another common mistake in SLD evaluation is the failure to understand that the lower score in a
statistically significant person-relative comparison does not automatically gain clinical significance
simply because the discrepancy (or difference between the scores) has been determined to be real.
Statistical significance means only that the difference between the two scores in the comparison is not
due to chance: that is, they are indeed different from each other. For example, a statistically significant
discrepancy between 115 and 90 means that the scores are not the same; the difference between them
is thus likely to be a real difference, rather than one due to chance or error. But statistical significance
is not de facto evidence that the difference between the two scores in the comparison is clinically
meaningful or indicative of impairment.
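A worked example helps keep these two ideas separate. The sketch below applies the conventional formula for the standard error of the difference between two scores, SEdiff = SD x sqrt(2 - r1 - r2); the reliability values are assumed solely for illustration.

```python
import math

sd = 15                 # standard score metric (mean = 100, SD = 15)
r1, r2 = 0.92, 0.90     # assumed reliabilities of the two composites being compared

se_diff = sd * math.sqrt(2 - r1 - r2)   # standard error of the difference
critical_diff = 1.96 * se_diff          # difference needed for significance at p < .05

print(f"Critical difference at p < .05: {critical_diff:.1f} points")

# A 25-point difference (e.g., 115 vs. 90) easily exceeds this value, so it is
# statistically "real." Nothing in this calculation, however, indicates whether
# the lower score is a normative weakness or clinically meaningful.
```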

DON'T FORGET

Statistical Significance
Indicates a Real Difference
Not Necessarily a Clinically
Meaningful One


When a statistically significant difference is found between two scores, practitioners can
conclude only that the difference was not due to chance. Statistically significant differences are
not always rare, uncommon, or even meaningful.
Many practitioners recognize that clinical meaningfulness is achieved primarily through an
examination of the frequency with which the magnitude of the difference occurs in the general
population. Differences that are infrequent (i.e., occurring in less than 10% of the general population)
are often ascribed tremendous significance in evaluations of suspected SLD. Such analysis often fails,
however, to evaluate the position or classification of the score relative to the entire population; that is,
its normative classification or meaning. As noted previously, individuals can vary considerably in
their abilities. However, if the lower score in such a person-relative comparison is in the average
range or higher, then it cannot rightly be considered a deficit, disorder, or evidence of dysfunction
regardless of how many standard score points separate it from any other score in the comparison.
This is because average (or higher) ability, by definition, is not a disability. It is difficult to argue that
a standard score of 100, for example, is a deficit, let alone an indicator of SLD, simply because all of
the individual's other scores are, say, 125 or higher. The belief that average ability in some areas
coupled with superior abilities in other areas is indicative of SLD is simply untenable.
Some have argued that SLD may well exist in individuals with average or better abilities (e.g., Gregg
& Mather, 2001). The argument is based primarily on the manner, condition, or duration of the
impairment relative to others. That is, if an individual must labor intensively in order to achieve
only at the average level, then a learning disability may well be present. Unfortunately, it is extremely
difficult to distinguish what might constitute effort that is labored and effort that is not. Students with
poor study skills and habits may well find it difficult to succeed academically at an average level, but
would hardly be considered disabled on that basis alone. Students who lack proficiency in English
will need to apply themselves much more intently in school than native English speakers, but they are
also not disabled solely on the basis of lack of proficiency in English. Moreover, someone in every
class in every school has to be last to finish a test. Someone has to graduate at the bottom of the class.
Someone has to read more slowly than everyone else. Are all such students learning disabled? We are
reminded of the insight provided by the quip that states that the person who graduates last in their
class in medical school is called a doctor. Use of a normative standard, that is, comparisons to the
average person of the same age in the general population, provides the most reasonable standard for
evaluation of performance and its meaning with respect to learning and other types of disabilities
(Flanagan, Keiser, Bernier, & Ortiz, 2003; Gordon, Lewandowski, & Keiser, 1999).
Failure to evaluate performance from a normative, not merely relative, perspective frequently
results in misdiagnosis even when a normative deficit is present. Consider the following scenario.
Two students were referred for evaluation by their classroom teachers because they were achieving at
a lower level than their classmates in reading. According to their teachers, these students appeared not
to be performing up to their intellectual capabilities, as demonstrated through a variety of academic
indicators. Standardized test results showed that Student A demonstrated a statistically significant
discrepancy between her overall ability (Wechsler FSIQ of 105) and her measured reading
achievement (standard score of 80). Student B obtained a Wechsler FSIQ of 90 and a reading
achievement standard score of 80. The difference between Student A's ability and achievement scores
was statistically significant, but it was not for Student B. As a result of these evaluations, Student A
was diagnosed with SLD and an educational intervention was implemented in an attempt to improve
her reading skills. Student B was not diagnosed with SLD and was described as reading at a level that
was commensurate with his estimated intellectual potential. Yet, when compared to a representative
sample of individuals of the same chronological age from the general population, Student A's and
Student B's actual levels of performance in reading are equivalent, and neither student could really be
considered an average or proficient reader relative to same-age peers. In the absence of other
convincing data, it would be rather presumptuous to claim that Student A has a learning disability
whereas Student B does not simply because he did not have a statistically significant discrepancy
between his FSIQ and reading score. In both cases, performance in reading is only as good as or
better than about 9% of same-age or grade peers from the general population (SS = 80; 9th
percentile). Irrespective of the classification scheme used, both students clearly demonstrated that
their reading abilities, as measured by standardized tests, are below average and outside the range of
normal limits. To suggest that Student A has SLD whereas Student B does not, simply because of the
presence or absence of a meaningless discrepancy, is poor practice.
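The normative equivalence of the two students' reading scores is apparent when standard scores are converted to approximate percentile ranks under the normal curve, as in the sketch below (a generic approximation, not a substitute for the percentile ranks reported in a test's norm tables).

```python
from statistics import NormalDist

def percentile_rank(standard_score, mean=100, sd=15):
    """Approximate percentile rank of a standard score under a normal curve."""
    return 100 * NormalDist(mean, sd).cdf(standard_score)

print(round(percentile_rank(80)))    # ~9:  reading score for both Student A and Student B
print(round(percentile_rank(105)))   # ~63: Student A's FSIQ
print(round(percentile_rank(90)))    # ~25: Student B's FSIQ
```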
Sin #3: Obsession with the Severe Discrepancy Calculation

The allusion to Response to Intervention (RTI) within the wording of IDEA 2004 may eventually
reduce or even eliminate the propensity to engage in the practice of identifying a severe discrepancy
between global ability and achievement as the sine qua non criterion for SLD determination. The
global ability-achievement discrepancy has been regarded as so important and central to definitions
of and the diagnostic criteria for SLD that practitioners often resort to calculating discrepancies
between virtually every cognitive cluster or subtest score and every academic cluster or subtest score
obtained in an evaluation. As noted previously, given the number of composites and subtest scores
that are obtained in a typical evaluation (upward of 40 or more would not be uncommon), it would be
surprising if at least one significant discrepancy was not found!
By itself, however, a significant discrepancy between global ability and achievement is neither
synonymous with nor a necessary condition for SLD identification. Contrary to popular belief in a
wide variety of professions, discovery of a significant discrepancy between global ability and
achievement does not carry the automatic diagnostic implication of SLD (Vellutino, Scanlon, & Lyon,
2000). According to Siegel (1999), "Such a discrepancy is not a necessary part of the definition of a
learning disability, and there may well be cases where learning disabilities are validly indicated in the
absence of any such discrepancy" (p. 311). Yet virtually every definition of learning disability, except
for the recent version in IDEA 2004, continues to include the concept of discrepancy (Kavale &
Forness, 2000). Although the concept of discrepancy was and remains incorporated into some
legislative codes, mostly state regulations, it should not be construed as necessary to the process.
IDEA 2004 specifically prohibits states from requiring that a severe discrepancy be used in
identifying SLD, although it still tacitly allows its use. Moreover, IDEA 2004 continues to specify that no
single score or procedure may be used as the sole criterion for determining any type of disability.
The modification in the wording contained in IDEA 2004 and the specific prohibition against
requiring its use make it clear that global ability-achievement discrepancy is no longer desirable or
necessary for the diagnosis of SLD in the evaluation of school-aged children.
Such changes have not yet found their way into other definitions and criteria for identifying SLD.
For example, the DSM-IV continues to use the concept of global ability-achievement discrepancy in
its wording relative to learning disorders where reference is made to underachievement.
Nevertheless, given the profound influence that science has already brought to bear in fostering the
recent changes in some legal definitions and methods for identifying SLD, it is safe to assume that
reliable and valid evaluation of SLD no longer requires or needs to include global ability-
achievement discrepancy analysis.

CAUTION

A Significant Discrepancy Is Not Synonymous
with the Presence of SLD


Historically, the most ubiquitous aspect of SLD evaluation has been the concept of a significant
discrepancy between global ability and achievement. In many cases, practitioners have resorted
to using the finding of a significant discrepancy between any two scores as an indication of SLD.
However, discrepancies occur for many reasons, including natural variation in an individuals
abilities/processes. The concept of discrepancy, as it has been generally operationalized, has
proven to be of little value in SLD evaluation. Therefore, the principle of underachievement
must be addressed in a more logical and empirically supportable manner.
Sin #4: Belief That IQ Is a Near-Perfect Predictor of Any Area of Achievement and Synonymous
with Potential


A mistaken belief that has perhaps fostered and perpetuated attempts to uncover severe discrepancies
more than anything else is the notion that IQ and other global ability composites are near-perfect
predictors of an individual's academic achievement. For too long this misconception has permeated
psychological practice, to the point that practitioners have felt comfortable in using any composite
from an intelligence battery as the bar or standard against which to compare scores on any academic
test.
Because some current and most traditional operational definitions of SLD have incorporated a
global ability score (usually derived from an intelligence battery) as a predictor of academic
achievement and, more pointedly, as an estimate of an individual's intellectual potential, there is a
tendency to view the relation between global intellectual ability and achievement as near perfect. In
addition, practitioners have relied on other composites (e.g., Wechsler VCI) as the global ability score
to be used in ability-achievement discrepancy formulae. Even the venerable VIQ and PIQ (which no
longer exist on the WISC-IV) have been (and in some cases continue to be) used in discrepancy
formulae.
Although global ability scores in general, such as the FSIQ, are often cited as the best predictors of
general achievement, it is important to consider that they account for only about 35 to 50% of total
achievement variance (Glutting, Youngstrom, Ward, Ward, & Hale, 1997; Neisser et al., 1996). This
means that global ability measures leave about 50 to 65% of the variance in total achievement
unexplained. When global ability scores are used to predict specific (rather than general)
achievement (e.g., reading decoding), the amount of variance accounted for is substantially reduced.
According to Vellutino et al. (2000), IQ scores "accounted for only 10 percent to 20 percent, at best, of
the variance on the Woodcock Reading Mastery Test-Revised (WRMT-R; Woodcock, 1987) Word
Identification and Word Attack subtests, which is hardly a basis for using IQ to predict reading, to
define reading disability, or to make determinations regarding access to instructional resources" (p.
233). Although IQ may represent the best predictor of achievement in relation to other indices
(Neisser et al., 1996), it simply leaves too much variance unexplained relative to an individual's
pattern of specific academic achievement to conclude that it is in any way a highly accurate predictor
of academic functioning in a specific area (Vellutino et al., 2000). Practitioners must recognize that
there are other important factors that explain significant variance in achievement and that global
ability measures, such as the ubiquitous IQ, although possessing significant predictive power for
general purposes, fall far short of explaining the lion's share of variance in achievement, particularly
specific areas of achievement that are invariably the focus of SLD evaluations.
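The gap between prediction and explanation is a matter of simple arithmetic: the proportion of variance explained is the square of the correlation. Assuming, for illustration, IQ-achievement correlations in the commonly reported .60 to .70 range yields the 35 to 50% figure cited above.

```python
# Squaring the correlation gives the proportion of achievement variance explained.
for r in (0.60, 0.65, 0.70):
    explained = r ** 2
    print(f"r = {r:.2f}: explained = {explained:.0%}, unexplained = {1 - explained:.0%}")
# Output: roughly 36% to 49% explained, leaving 51% to 64% unexplained.
```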

DON'T FORGET

Global IQ Predicts General Academic Achievement
Better Than Specific Achievement


Although global ability scores in general, such as an FSIQ, are often cited as the best predictors
of general achievement, it is important to consider that they account for only about 35 to 50% of
total achievement variance (Glutting, Youngstrom, Ward, Ward, & Hale, 1997; Neisser et al.,
1996). This means that global ability measures leave about 50 to 65% of the variance in
total achievement unexplained. When used to predict specific academic skills, global IQ accounts for only 10
to 20% of the variance, at best.
It is not only erroneous to assume that IQ predicts achievement nearly perfectly, it is also incorrect
to assume that IQ even represents an individual's capacity or potential for academic success. In other
words, IQ is simply not the marker or standard for what can be expected academically from an
individual. As Stanovich (1999) pointed out, "Psychometricians, developmental psychologists, and
educational psychologists long ago gave up the belief that IQ test scores measured potential in any
valid sense" (p. 354). He added, "At best, IQ test scores are gross measures of current cognitive
functioning. In short, we have been basing systems of educational classification in the area of reading
disabilities on special claims of unique potential that are neither conceptually nor psychometrically
justifiable" (p. 354). IQ simply cannot stand as a marker of potential for any given individual, let
alone those suspected of having SLD.
Sin #5: Failure to Apply Current Theory and Research

Most of the activities that practitioners engage in during the course of evaluating SLD can be
construed as acts of commission, notably the application of invalid or indefensible methods and
procedures. The present sin is more an act of omission in that contemporary psychometric theory and
current research on SLD are not often brought to bear in making determinations regarding
identification and diagnosis of SLD. Part of the problem here rests with the fact that contemporary
cognitive theory and the literature on SLD are not topics that are brought together commonly in the
usual graduate school curriculum.

DON'T FORGET

Consistency vs. Discrepancy
in SLD Evaluation


As global ability-achievement discrepancy continues to be discredited as an approach in SLD
evaluation, the concept of consistency is likely to gain wider acceptance. Consistency, as defined
in the operational definition presented in this chapter, refers to the relationship between a
manifest academic skill deficit and a disorder in a basic psychological process or ability that is
presumed to cause the academic deficit. Thus, the scores for each tend to be consistent: they are
both either low (a marker for SLD), or they are both average or better (not suggestive of SLD).
Nevertheless, there is no question that the absence of modern theory and its attendant research base
has significantly and adversely affected all aspects of the SLD determination process, from initial
conceptualization to final recommendations. This sin was discussed at length in Chapter 2 and the
reader is referred there for a more detailed analysis of this issue.
Sin #6: Over-Reliance on Findings from Single Subtests and Screening Instruments

This sin appears to be related mainly to an inadequate understanding of psychometric scaling and
measurement. As a result, diagnostic decisions are often predicated on results from either a single
subtest or scores obtained from instruments that are ostensibly screeners and not suitable for the
purpose of diagnosis or high-stakes decision making.
The measurement of cognitive abilities/processes and academic achievement represents an activity
that is bound by the principles of psychometrics. These principles must be followed in order to
establish the reliability and validity of results obtained from any evaluation in which standardized
tests are used. One of the fundamental principles of psychometrics is that a single subtest cannot be
considered a reliable indicator by itself of the construct it is intended to measure. When a construct is
operationalized, at least two qualitatively different measures should be used to represent that
construct. In some cases, three measures are considered best for this purpose, but in practical
application, such as testing, having three or more separate measures for each construct of interest
quickly becomes unwieldy and inefficient. Nevertheless, the use of just two (qualitatively different)
subtests to measure any given construct, although practical, may not be sufficient, particularly when
there exists a statistically significant or unusual difference between the two subtest scores (which are
expected to converge) or when a more in-depth assessment of the construct is warranted.
When practitioners use a single subtest as an indicator of deficient functioning, they are in fact
making an erroneous and indefensible assumption: that one subtest is sufficient to indicate
dysfunction. The issue is analogous to the use of instruments that display inadequate reliability for an
intended purpose. For example, tests with reliability coefficients below .90 should not be used for
diagnostic purposes or high-stakes decision making (DeVellis, 2003; Nunnally, 1978). These
guidelines are often completely ignored in the SLD evaluation process. For example, the reading rate
score from the Nelson-Denny Reading Test (NDRT; Brown, Fishco, & Hanna, 1993) is routinely used
in SLD evaluations and poor scores on this one component of the test (i.e., reading rate) are often
held up as evidence of dysfunctional reading ability in SLD evaluations. Yet, the NDRT reading rate
subtest has a reported reliability of .68, which is only marginally adequate for a research instrument,
let alone for the purpose of identifying SLD. Likewise, when an individual scores low on a single test
of processing speed, rapid automatic naming, reading fluency, and the like, no definitive statement
can be made regarding dysfunction. Such findings must be corroborated by other evidence, such as
information from the individual's response to intervention, observations of performance in the
classroom, difficulties in areas related to the suspected deficit, and so forth.

CAUTION

Poor Reliability Limits Interpretability


Understanding reliability is an important component of assessment and practitioners should
recognize when there is insufficient reliability to allow for valid inferences and conclusions to
be drawn from the data. For example, a single subtest is not sufficiently reliable to support
diagnostic interpretations. A composite, made up of two or more tests, is necessary to support
interpretation. Furthermore, the composite must be comprised of two or more qualitatively
different narrow ability/processing indicators of the construct intended to be measured by the
composite. In general, for diagnosis and high-stakes decisions, interpretation should be based on
composites that represent unitary abilities/processes and that have reliability coefficients of .90
or higher.
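The practical consequence of inadequate reliability is easily illustrated with the standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below contrasts a single subtest with the .68 reliability noted above against a composite meeting the .90 guideline; the 95% band shown is a simple approximation around the obtained score.

```python
import math

def sem(reliability, sd=15):
    """Standard error of measurement on the standard score metric."""
    return sd * math.sqrt(1 - reliability)

for label, rel in [("Single subtest (reliability = .68)", 0.68),
                   ("Composite (reliability = .90)", 0.90)]:
    s = sem(rel)
    print(f"{label}: SEM = {s:.1f}; 95% band = +/- {1.96 * s:.1f} points")

# For the less reliable measure, an obtained score of 85 carries a band of roughly
# 68 to 102, far too wide to support a diagnostic decision.
```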
Sin #7: Belief That Aptitude and Ability Are One and the Same

Aptitude and ability are not synonymous. According to Snow (1994), "The concept of aptitude
includes any enduring personal characteristics that are propaedeutic to successful performance in
some particular situation. This definition includes affective, cognitive, and personality characteristics
as well as cognitive and psychomotor abilities (Snow, 1992). However, cognitive abilities are a
particularly important source of aptitude for learning and performance in many school and work
situations" (p. 5). Thus, aptitude measures are validated for a particular purpose by demonstrating that
they predict important criteria (e.g., specific academic skills).
One of the best examples of the distinction between global ability and aptitude, as defined by Snow
(1994), may be seen through an examination of the clusters yielded by the Woodcock-Johnson
Psycho-Educational Battery-Revised (WJ-R; Woodcock & Johnson, 1989). The WJ-R Scholastic
Aptitude Clusters were based on an equally weighted combination of four separate tests drawn from
the whole of the cognitive battery. The aptitude clusters were developed through a series of stepwise
multiple regression analyses conducted over the entire age range of the WJ-R (McGrew, Werder, &
Woodcock, 1991). Consistent with the definition of aptitude, this statistical procedure identified "the
optimal linear combination of variables that best predict[ed] a selected criterion variable" (p. 194). In
the case of the WJ-R, these regression procedures identified the specific combinations of the four
WJ-R cognitive tests that best predicted performance on the WJ-R Reading, Mathematics, Written
Language, Oral Language, and Knowledge Achievement Clusters. Research on the differential
prediction of the WJ-R Scholastic Aptitude Clusters demonstrated that these clusters consistently
predicted their respective outcome criterion better than both the 7-test Broad Cognitive Ability (BCA
Standard) and 14-test Broad Cognitive Ability (BCA Extended) scores of the WJ-R, explaining up to
50 to 70% of the variance in the outcome criterion. Clearly, the differential predictive validity
evidence, when combined with the "superior prediction of achievement when compared to other
intelligence batteries (i.e., Wechsler, K-ABC, SB:IV)" (McGrew, 1994, p. 211), highlights the basic but
significant difference between aptitude and ability. Aptitude scores, unlike global ability scores,
comprise specific measures of ability that are closely associated with their respective criterion
measures. For example, the WJ-R Reading Aptitude Cluster was comprised of a test of Lexical
Knowledge, a test of Memory Span, a test of Phonological Processing, and a test of Processing
Speed. Research has demonstrated consistently that each of these abilities/processes predicts reading
achievement and that one or more of these abilities/processes is low in individuals with reading
disabilities (see Chapter 2 for a review).
The cognitive shift that is necessary in understanding SLD is based on the difference between
ability and aptitude. Because aptitudes, such as the WJ-R Scholastic Aptitude Clusters, were generally
much better predictors of achievement than most global ability scores, practitioners often found a
consistency between a WJ-R Aptitude Cluster (i.e., Reading Aptitude) and the respective area of
achievement (i.e., reading) for individuals referred for learning problems in that achievement area
(i.e., reading difficulties). If a global ability (FSIQ) score for that same individual was entered into an
ability-achievement discrepancy formula (in lieu of the aptitude score), then the likelihood of a
discrepancy would be much greater, simply because the global ability score is not as good a
predictor as the aptitude score. In other words, the better the prediction (of some
academic criterion), the less likely you are to find a discrepancy between the predictor and the
outcome measure.
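This inverse relationship between predictive strength and the size of expected discrepancies can be demonstrated with a small simulation. The correlation values below are arbitrary and the scores are assumed to be normally distributed; the sketch illustrates the general point rather than modeling any particular battery.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_discrepancy(r, sd=15, n=100_000):
    """Average absolute gap between a predictor and an outcome that correlate at r."""
    cov = [[sd**2, r * sd**2], [r * sd**2, sd**2]]
    predictor, outcome = rng.multivariate_normal([100, 100], cov, size=n).T
    return np.mean(np.abs(predictor - outcome))

for r in (0.50, 0.70, 0.90):
    print(f"r = {r:.2f}: average |predictor - outcome| = {mean_abs_discrepancy(r):.1f} points")

# The stronger the predictor (e.g., an aptitude cluster rather than a global IQ),
# the smaller the expected gap between predicted and obtained achievement.
```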
In SLD evaluation, aptitudes are important because they constitute the very abilities/processes that
are most closely associated with different academic outcomes (see Table 2.1 in Chapter 2). Thus, the
finding of a consistency between an individual's reading aptitude and reading achievement would be a
marker for SLD if both reading aptitude and reading achievement were below average. If reading
aptitude was average and reading achievement was significantly below average, however, then the
possibility remains that factors other than a disorder in one or more basic psychological processes
constitute the underlying cause of the academic skill(s) deficiency. This notion of aptitude-
achievement consistency is discussed later in this chapter within the context of our operational
definition.
In sum, the psychological practice of SLD identification has relied historically on methods and
procedures that have virtually no inherent reliability, much less validity. Given the vagaries of current
specifications and the lack of specific operational definitions, it should not be surprising that SLD has
been so misunderstood that methods purported to diagnose this condition have been distilled down to
rote, simplistic clinical exercises that are neither psychometrically nor theoretically justifiable. Many
of the methods described above have roots in clinical teaching and practice that were not borne from
empirical research, but rather emanated from unfounded assumptions and faulty intuitive logic. Such
exercises are so entrenched in clinical practice today that it is extremely difficult to get practitioners
to either revisit critically their SLD diagnostic methods or acknowledge that invalid methods are just
that: invalid, regardless of how long they have been used.
We suspect that some practitioners may feel we are attacking the foundations of their training and
clinical experience because we do not look favorably upon some of the more popular and widespread
procedures still being taught in school, clinical, neuropsychological, and other professional
programs that train practitioners in the process of SLD determination. It is our hope, however, that
practitioners will remain open minded to the limitations that may have accompanied their training and
clinical experiences and use the knowledge presented here as an impetus to improve future practice as
it pertains to evaluation and diagnosis of SLD.
A MODERN OPERATIONAL DEFINITION OF LEARNING DISABILITIES

One of the first general operational definitions of SLD was published by Kavale and Forness (2000).
Their model included several levels, each of which was a necessary but not sufficient condition for
SLD. When all conditions were met, however, sufficient data presumably existed to make the SLD
diagnosis. This model was an important development because it provided the specificity necessary to
allow SLD to be operationalized more reliably. A modified version of this definition was presented
by the authors of this volume (Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006; Flanagan, Ortiz,
Alfonso, & Dynda, 2006). The major development in our definitions was the incorporation of CHC
theory into the definition, thereby allowing both modern cognitive theory and research to guide the
SLD identification process. An additional change included a restructuring of the component levels of
Kavale and Forness operational definition to provide a better correspondence with the assessment
and evaluation process (Flanagan, Ortiz, Alfonso, & Mascolo, 2006). Our definition introduced the
concept of consistency between cognitive and academic deficits. Similar to the Kavale and Forness
definition, our definition consists of various levels corresponding to key components of the process
(see Rapid Reference 4.1). As will become evident, it is only when the specified operational criteria at
each of the four levels are met that SLD can be diagnosed.

Rapid Reference 4.1



Basic Criteria for the Various Levels of the Operational Definition of SLD
Meeting the criteria necessary at each of the levels within the operational definition of SLD
presented in this chapter is part of the process for establishing a defensible diagnosis. Following
are the criteria for each level that must be met.

a. At Level I-A, a normative deficit in academic functioning is required.


b. At Level I-B, exclusionary factors are considered and determined not to be the primary
cause of the academic deficit(s).
c. At Level II-A, a normative deficit in a cognitive ability/process is required.
d. At Level II-B, exclusionary factors are again considered and determined not to be the
primary cause of either academic or cognitive deficits.
e. At Level III, underachievement is established by an empirical or logical relationship
between the cognitive and academic deficits identified in Levels I-A and II-A, and by
evidence of otherwise normal functioning in those abilities/processes not strongly related
to the academic deficit(s).
f. At Level IV, there must be evidence of functional limitations in activities of daily life that
require the academic skill identified as deficient.


Prereferral Issues

The description of the operational definition that follows is based primarily on data collected from
the use of standardized, norm-referenced ability tests. Consistent with IDEA 2004 and its attendant
regulations (34 CFR Parts 300, 301, and 304) we see the use of norm-referenced ability testing as
only one method among many that may be used in the evaluation of SLD. We wish to emphasize that,
prior to engaging in the use of norm-referenced ability tests, other important and significant data
sources should have already been collected, including data from RTI and other prereferral activities
(e.g., informal testing, direct observation of behaviors, work samples, interviews with teachers and
parents). The operational definition outlined here is intended to be used when either RTI or other
prereferral intervention methods meet with little or no success.
Level I-A: Measurement of Specific Academic Skills and Acquired Knowledge

Level I-A represents perhaps the most basic concept involved in SLD: that academic learning is
somehow disrupted from its normal course on the basis of some type of internal dysfunction.
Although the specific mechanism that inhibits learning is not directly observable, we can proceed on
the assumption that it does manifest itself in observable phenomena, particularly in areas of academic
achievement. Thus, the most logical and initial component of an operational definition of SLD should
be establishing the fact that some type of learning dysfunction exists apart from reported low
achievement (e.g., teacher reports). If no academic deficit or documented failure to respond to
appropriate instruction can be found, whether through the use of standardized tests, RTI, or any other
viable method, then the issue of SLD becomes moot because such dysfunction is a necessary
component of the definition.
Assessment activities at Level I-A usually involve comprehensive assessment of the major areas of
academic achievement (e.g., reading, writing, math). For convenience as well as practical reasons, the
academic abilities depicted in Figure 4.1 at this level in the hierarchy are organized according to the
eight areas of achievement specified in the federal regulations attached to IDEA 2004 (i.e., 34 CFR
300.309): math calculation, math problem solving, basic reading, reading comprehension, reading
fluency, written expression, oral expression, and listening comprehension. The definitions of these
academic domains are not provided in IDEA 2004 or its attendant regulations, nor are they based
on any particular theoretical formulation. As such, they remain vague and nonspecific. It is for this
very reason that the XBA approach is particularly well suited for evaluation of SLD. The rigor
introduced by the application of both modern cognitive theory and empirical research that underlie
XBA provides both a structural framework for assessment and a common nomenclature that
significantly reduces the ambiguity that pervades current SLD definitions and methods for
identification. Accordingly, for theoretical and psychometric reasons, the academic abilities depicted
at this level have been organized according to the broad CHC abilities/processes that encompass these
achievement domains (i.e., Gq, Grw, Gc). Generally speaking, Level I-A represents an individuals
stores of acquired knowledge. These specific knowledge bases (i.e., Gq, Grw, Gc) develop almost
exclusively as a function of formal instruction, schooling, and educationally related experiences.
At Level I-A, the performance of the student is compared to a test's norm sample. The evaluator
must answer the following question: Is performance relative to individuals of the same age in the
general population within normal limits or higher? If yes, SLD is ruled out; if no, further assessment
is needed to determine SLD. Note that the comparison is not based on performance within the
individual, but rather performance of the individual contrasted with other individuals. Thus, person-
relative discrepancies, no matter how large, are generally not useful as indicators of deficiency
unless one of the student's scores falls below the Average Range (i.e., standard scores of less than
85). Unless test data indicate a normative deficit in one or more areas of academic functioning,
advancement to Level I-B analysis is unwarranted. If the criterion of a normative deficit in academic
achievement is not met, then the evaluator should either reassess the sufficiency of the academic
evaluation or reexamine the referral questions and concerns. For example, it is entirely possible that
the test selected for initial evaluation simply failed to adequately assess the specific area of presumed
deficiency.
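The normative criterion applied at this level (and again at Level II-A) can be expressed as a simple classification rule. The helper below is a hypothetical illustration that uses only the cut points described in this chapter: scores below 85 are normative weaknesses, and scores from 85 to 115, inclusive, are within normal limits.

```python
def normative_classification(standard_score):
    """Classify a standard score (mean = 100, SD = 15) relative to the general population."""
    if standard_score < 85:
        return "Normative weakness (below the Average Range)"
    if standard_score <= 115:
        return "Within normal limits (85 to 115, inclusive)"
    return "Normative strength (above the Average Range)"

for score in (78, 92, 118):
    print(score, "->", normative_classification(score))
```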

Figure 4.1 A Modern Operational Definition of SLD
Source: From Flanagan, Ortiz, Alfonso, and Mascolo (2002, 2006).


Level I-B: Evaluation of Exclusionary Factors

Level I-B involves evaluating whether the documented academic skill or knowledge deficit(s) found
through Level I-A analysis is primarily the result of factors other than intrinsic cognitive dysfunction.
Because the potential reasons for low academic performance are many and do not always reflect an
actual manifestation of SLD, clinicians must be careful not to ascribe causal links to SLD prematurely
and should develop reasonable hypotheses related to potential causes other than cognitive
dysfunction. For example, cultural or language differences are factors that can adversely affect test
performance and result in data that appear to suggest SLD. In addition, factors such as insufficient
instruction, lack of motivation, emotional disturbance, performance anxiety, psychiatric disorders,
sensory impairments, and medical conditions (e.g., hearing or vision problems) need to be ruled out
as potential explanatory correlates to any academic deficiencies identified at Level I-A.

DON'T FORGET

Exclusionary Factors
Are Considered After
All Phases of Testing


Test performance can be adversely influenced by a wide range of variables. These variables may
be present at some points in time and absent at others or they may be present consistently.
Whenever testing is conducted, practitioners should review the results carefully and determine if
there were any exclusionary factors present that might have affected the results negatively. This
would include such factors as fatigue, lack of motivation, emotional disturbances, cultural and
linguistic factors, incorrect scoring or administration, and so forth.
Noteworthy is the fact that RTI methods can be used to assist in evaluating the data collected up to
this point. If RTI methods were employed prior to referral for a comprehensive evaluation, it is
possible that many of the plausible external reasons for the academic deficiency have already been
ruled out (e.g., lack of effective instruction, lack of motivation, cultural and linguistic differences).
Alternatively, some relevant and important exclusionary factors may not be uncovered until much
later in the assessment process. This is because it may not be possible to rule out certain conditions at
this level, such as Mental Retardation, which may necessitate Level II-A assessment (i.e., assessment of
cognitive abilities/processes). When the conditions listed at Level I-B have been reliably evaluated
and determined not to be the primary reason for the observed academic deficits, assessment may
advance to Level II-A.
Level II-A: Measurement of Abilities/Processes and Aptitudes for Learning

Level II-A evaluation is similar to Level I-A evaluation except that it focuses on cognitive
abilities/processes rather than academic skills. In general, the process of assessment at Level II-A
proceeds with the expectation that an individual will perform within normal limits (i.e., standard
scores of 85 to 115, inclusive) in all areas listed at this level in Figure 4.1. The questions that must be
answered at this level are as follows: (1) Is performance on tests of cognitive ability/processing
within normal limits relative to people of the same age in the general population; and (2) If a deficit
in cognitive ability/processing is found, is it empirically or logically related to the academic skill
deficit? Of the more salient aspects involved in creating an operational definition of SLD, none is
more central than the need to establish the potential presence of a normative deficit in a particular
cognitive ability/process that is related to and is the presumptive cause of the observed academic
deficit(s). This is because SLD is defined, according to IDEA 2004, as a disorder in one or more basic
psychological processes (34 CFR 300.8[c][10]). Although the term disorder may be defined in
numerous ways, it is clear that this term is not synonymous with average ability. A disorder implies
dysfunction, deficit, or disability; that is, a condition that significantly limits the individual relative to
most people. Therefore, documenting a disorder should be based on population-relative
comparisons.
The cognitive abilities/processes depicted at this level in the evaluation hierarchy in Figure 4.1 are
organized according to the broad abilities/processes specified in CHC theory (i.e., Gs, Gsm, Glr, Ga,
Gv, Gf, and Gc). These CHC abilities are organized further according to the processes they represent
when embedded within an information processing perspective, including attention and cognitive
efficiency, memory, thinking abilities, and language abilities (e.g., Dean & Woodcock, 1999;
Woodcock, 1993). The latter category represents the collection of Gc narrow abilities that more
accurately reflect processing skills as opposed to the Gc abilities that represent the stores of acquired
knowledge that were included at Level I-A. Generally speaking, the abilities/processes depicted at
Level II-A provide valuable information about an individual's learning efficiency. Development of
most of the cognitive abilities/processes represented at this level tends to be less dependent on formal
classroom instruction and schooling as compared to the abilities presented at Level I-A (Carroll,
1993, 1997). Furthermore, specific or narrow abilities/processes within many of the CHC areas
included at Level II-A may be combined in different ways to yield specific aptitudes for learning in
different skill areas (e.g., reading, math, writing). Aptitude performance, therefore, is expected to be
consistent with its corresponding academic skill area. For example, a reading aptitude may be
composed of Phonetic Coding (a narrow Ga ability/process), Naming Facility (a narrow Glr
ability/process), and Working Memory (a narrow Gsm ability/process) (or any combination of
abilities/processes listed in the first column of Table 2.1 in Chapter 2) because the research shows that
each of these abilities/processes has a significant relationship with reading achievement. Thus, if a
child's reading skill deficit is the result of a disorder in one or more basic psychological processes,
then his or her reading aptitude performance would be consistent with (not discrepant from) his or
her actual reading performance.
Data generated at Level II-A, like the data generated at Level I-A, provide input for Level III
analyses, should the process advance to the third level. The evaluator may progress to Level III when
the following two criteria are met: (1) identification of a normative deficit in at least one area of
cognitive ability/processing; and (2) identification of an empirical or logical link between any
identified area of cognitive ability/processing deficiency and the area(s) of academic skill deficiency
(as identified in Level I-A analysis).
Level II-B: Reevaluation of Exclusionary Factors

Although the presence of a cognitive ability/processing deficit that is related to the academic deficit is
fundamental to the operational definition of SLD described herein, these deficits must not be
primarily the result of exclusionary factors. Hypotheses regarding reasonable explanations
(particularly situation-specific factors such as motivation, fatigue) for the observed cognitive
deficit(s) must be rejected in order to conclude that the data represent an accurate and valid reflection
of true ability. When all appropriate exclusionary factors have been evaluated and excluded as the
primary reason for the observed cognitive deficits, the process may advance to Level III.
Level III: Evaluation of Underachievement

Advancement to Level III implies that the three necessary conditions for determination of SLD
specified previously have been met. To review, these criteria included: (1) documentation of one or
more academic skill deficits; (2) documentation of one or more cognitive ability/processing deficits;
and (3) determination that the identified academic and cognitive deficits are related and are not the
primary result of exclusionary factors. What has not yet been determined, however, is whether the
pattern of results supports the notion of underachievement in the manner that might be expected in
cases of suspected SLD, or whether the pattern of results may be better explained via alternative
causes, such as Mild Mental Retardation or other factors known to have an adverse effect on both
academic and cognitive performance (e.g., sensory-motor handicaps, lack of English language
proficiency, emotional disturbances). Thus, Level III analysis is designed to determine whether the
identified academic and cognitive deficits exist within an otherwise normal ability/processing profile.
Given the historical predominance of the discrepancy model, evaluation of consistency between
academic and cognitive deficits may appear unusual at first. As mentioned earlier, an aptitude is
comprised of tests that measure abilities/processes that are most directly relevant to the development
and acquisition of specific academic skills and thus is the best predictor of those skills. As such, an
aptitude-achievement consistency is an important marker for SLD.
Because consistency among scores that fall within or above normal limits would have already
failed to demonstrate normative-based deficits, SLD determination at this level (Level III) is
concerned primarily with scores that fall below the Average Range. However, below average aptitude
coupled with below average academic achievement is insufficient to meet the criterion at this level
unless the below average aptitude-achievement consistency occurs within the context of an otherwise
normal ability profile. Thus, analysis of data at this level also involves examining scores that fall
within or above the Average Range or normal limits of functioning. Low aptitude scores across the
board (i.e., all or nearly all cognitive abilities/processes in the deficient range) may be more
suggestive of Mild Mental Retardation, a condition that would preclude determination of SLD under
this definition (and most others). In the case of an individual with reading difficulties, it would be
necessary to determine the level of performance or functioning in all cognitive areas, including those
that are largely unrelated to reading. If the majority of these abilities/processes are within normal
limits relative to same-aged peers from the general population, then the practitioner can be
reasonably confident that the consistency between reading aptitude deficits (e.g., below average
performance on cognitive abilities/processes related to reading, such as phonological processing and
working memory) and academic deficits in reading represents underachievement.
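The following minimal sketch, again with hypothetical scores, conveys the logic of this check: count how many broad abilities/processes unrelated to the reading aptitude fall within normal limits. The standard-score cutoff of 85 and the simple majority rule are illustrative assumptions, not the formal criteria of the operational definition.

# Hypothetical profile for a reading referral; values are standard scores.
# The 85 cutoff and the majority rule below are illustrative assumptions only.
NORMAL_LIMIT = 85

profile = {"Gc": 102, "Gf": 98, "Gv": 105, "Glr": 94, "Gs": 96, "Ga": 76, "Gsm": 79}
reading_related_deficits = {"Ga", "Gsm"}   # areas tied to the reading aptitude

unrelated = {area: s for area, s in profile.items() if area not in reading_related_deficits}
intact = [area for area, s in unrelated.items() if s >= NORMAL_LIMIT]

otherwise_normal = len(intact) > len(unrelated) / 2
print(len(intact), "of", len(unrelated), "unrelated areas within normal limits;",
      "otherwise normal profile:", otherwise_normal)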

DON'T FORGET

Evaluation of
Underachievement


In traditional approaches to evaluation of SLD, the concept of underachievement has been
operationalized as a global ability-achievement discrepancy. This method has been shown to be
inadequate and technically invalid for making such determinations. XBA and the operational
definition described in this chapter provide an alternative operationalization that is based on a
pattern of circumscribed and empirically related cognitive and academic deficits that exist within
an otherwise normal ability profile. The SLD Assistant program located on the accompanying
CD-ROM assists in making decisions regarding underachievement.
To assist practitioners in determining whether a below average aptitude-achievement consistency
occurs within an otherwise normal ability profile, we have included on the CD-ROM that
accompanies this book a program called the SLD Assistant. Instructions for using the program and
details regarding its development and the principles upon which it is based are contained in the
program and will not be repeated here. In general, however, the program utilizes developmentally
based g-loadings across the lifespan for seven broad CHC abilities/processes and combines them with
a weighting system that recognizes the changing nature of formal instruction with respect to
abilities/processes that are utilized, taught, or developed across the general education curriculum. As
illustrated in Figure 4.2, the program computes a g-value that should be used in conjunction with
other information (e.g., the specific normative classifications of intact abilities/processes) to
generate an informed, well-reasoned, defensible opinion regarding the presence or absence of an
otherwise normal ability profile.
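The actual algorithm is documented in the SLD Assistant itself; the sketch below is only a simplified approximation intended to convey the flavor of a weighted index with a 1.0 cutoff. The g-loadings, grade-relevance weights, scores, and the scaling of intact abilities relative to the population mean of 100 are all hypothetical assumptions and do not reproduce the program's computations.

# Simplified, hypothetical approximation of a weighted "g-value": intact broad
# abilities are weighted by assumed g-loadings and grade-relevance weights and
# expressed relative to the population mean of 100, so a fully average intact
# profile yields 1.0. This is NOT the SLD Assistant's actual algorithm.

def g_value(scores, deficit_areas, g_loadings, grade_weights):
    intact = {a: s for a, s in scores.items() if a not in deficit_areas}
    weighted_sum = sum(g_loadings[a] * grade_weights[a] * (s / 100.0) for a, s in intact.items())
    weight_total = sum(g_loadings[a] * grade_weights[a] for a in intact)
    return weighted_sum / weight_total

scores   = {"Gc": 103, "Gf": 100, "Gv": 108, "Glr": 97, "Gs": 74, "Ga": 72, "Gsm": 78}
loadings = {"Gc": 0.8, "Gf": 0.9, "Gv": 0.6, "Glr": 0.6, "Gs": 0.5, "Ga": 0.6, "Gsm": 0.7}
weights  = {"Gc": 1.0, "Gf": 0.8, "Gv": 1.0, "Glr": 0.8, "Gs": 0.8, "Ga": 1.0, "Gsm": 1.0}

value = g_value(scores, deficit_areas={"Ga", "Gs", "Gsm"},
                g_loadings=loadings, grade_weights=weights)
print("g-value =", round(value, 2), "; otherwise normal profile:", value >= 1.0)  # ~1.02 here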
Figure 4.2 illustrates sample case A, involving a child who is 7 years old and in second grade, a
period in which instruction in basic skills takes place and in which teaching relies primarily on
visual stimuli and on phonological processing skills for reading
acquisition. In this case, the child was identified as having deficits in Ga, Gs, and Gsm. Deficits in
these areas are related empirically to reading difficulties (e.g., problems with decoding,
comprehension, and fluency). The child in this case, therefore, exhibited a below average aptitude
(Ga, Gs, Gsm)-achievement (reading decoding, reading fluency) consistency. The SLD Assistant was
used to determine whether the child's aptitude-achievement consistency occurred within an otherwise
normal ability profile. As may be seen in Figure 4.2, the program reported a g-value of 1.04, which
exceeds the cutoff of 1.0. This value was used to support the conclusion that the child's
underachievement (aptitude-achievement consistency) is consistent with an SLD, precisely because it
did in fact occur within an otherwise normal ability profile (see SLD Assistant for details).

Figure 4.2 Case Illustration A Using the SLD Assistant

A g-value of 1.0 or greater, coupled with other data supporting otherwise normal ability/processing, supports
the presence of SLD because the intact abilities/processes suggest the likelihood of average or better
academic performance if the domain-specific deficiencies (in this case, abilities/processes associated
with reading) are either accommodated or remediated. If, on the other hand, the child's g-value is
less than 1.0, it is possible that SLD is not present. A child who does not function within normal limits after
the effects of cognitive deficits (particularly those most strongly related to the academic skill deficits)
are accounted for may not be able to function in the average range academically (e.g., at grade level),
despite attempts to remediate or accommodate those deficits. As such, a child with this pattern of
performance (i.e., most abilities/processes in the low average or below average range) likely has a
more pervasive (rather than domain-specific) pattern of cognitive deficiencies, a pattern that is not
consistent with the SLD construct.
Case scenario B is presented in Figure 4.3. In this example, another 7-year-old, second-grade
child displays a very different pattern from that presented in case scenario A. Whereas the g-value of
the child in case A was greater than 1.0, the g-value reported here is less than 1.0 (i.e., .6), or well
below the cutoff. This child has the same deficits in Ga, Gs, and Gsm as in the previous case; however,
there is an additional deficit in the area of Gc. Because Gc, Ga, and Gsm are three of the four most
important abilities at this point in an individual's education, and because this child is also deficient in
Gs, there is little support for the idea that this child displays an otherwise normal ability/processing
profile. In this case, a diagnosis of SLD would not likely be appropriate and there may be better
explanations for this particular constellation of cognitive and academic deficits, such as severe
language problems or developmental difficulties.

Figure 4.3 Case Illustration B Using the SLD Assistant

The latter example highlights an important caution in the process of identifying SLD via use of the
operational definition. If an individual does not display an otherwise normal ability profile, this
finding alone does not indicate that no disability is present. Rather, it indicates that an SLD is not
likely present. When many areas of ability/processing are found to be below normal limits,
practitioners should review the sufficiency of the evaluation and consider explanations for the results
other than SLD.
Level IV: Evaluation of Interference with Functioning

When the SLD determination process reaches this point, criteria at the previous three levels have been
met, thus supporting the presence of SLD. Further evaluation may seem unnecessary, but an
operational definition of SLD based only on the previous criteria would still be incomplete. One of
the basic eligibility requirements contained in both the legal and clinical definitions of SLD refers to
whether the suspected learning disorder actually results in significant or substantial academic failure
or other restrictions/limitations in daily life functioning. This final criterion reflects the need to take a
broad survey of all collected data as well as the real-world manifestations of any presumed disability.
In general, if the criteria specified in Levels I through III have been met, it is very likely that Level IV
analysis serves only to support conclusions that have already been drawn up to this point. However, in
cases in which data may be equivocal, Level IV analysis becomes an important safety valve, ensuring
that any representations of SLD suggested by the data indeed manifest in observable impairments in
one or more areas of functioning in real-life settings.

DON'T FORGET

Evaluation of Interference
with Functioning


All too often, the evaluation of SLD ends when some type of cognitive or academic deficit is
identified. However, it is important for practitioners to remember that the legal definitions of
disability include provisions stating that there must be some type of functional impairment or
limitation in activities of daily living that require the skill that is presumed to be deficient. Thus,
test results alone are insufficient to diagnose or establish SLD and must be bolstered by evidence
of interference with functioning that is documented and occurring in the present.
The advantage of the operational definition we have described (Flanagan, Ortiz, Alfonso, &
Mascolo, 2006) lies in its integration of established notions regarding the nature of SLD with theories
about the structure of cognitive abilities into an inherently practical method for SLD assessment that
clearly specifies relationships between and among both cognitive and academic abilities, definitions
of aptitude and global ability scores, and a recursive process that accommodates essential elements
necessary for high-quality evaluation of learning difficulties (p. 360).
SUMMARY

Applied psychologists in all assessment-related fields no doubt encounter many evaluations in which
the focus is on identifying SLD. As noted in the beginning of this chapter, there are more
misconceptions regarding the nature and process of SLD diagnosis than there are established
principles or guidelines for its evaluation. The discussion regarding the seven deadly sins at the outset
was intended to be a call to action for practitioners involved in SLD evaluations to move beyond
practices and procedures that are based more on clinical lore than science and embrace the rapidly
developing empirical literature on modern intelligence theory and the relations between cognitive
abilities/processes and academic outcomes.
It seems clear that efforts toward increasing the reliability and validity of SLD identification
require, at the very least, application of theory and research. The operational definition described in
this chapter represents just such an attempt and, when combined with appropriate XBA methods and
procedures, results in a process that is theoretically driven, empirically grounded, and based on the
current and empirically supported structure of cognitive abilities/processes. We firmly believe that
the marriage of the XBA approach with the operational definition of SLD creates a solid
methodological paradigm for the systematic evaluation of students suspected of having SLD.
REFERENCES

Brown, J. I., Fishco, V. V., & Hanna, G. S. (1993). Nelson-Denny Reading Test. Itasca, IL: Riverside
Publishing.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York:
Cambridge University Press.
Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In D. P. Flanagan, J. L. Genshaft,
& P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 122-
130). New York: Guilford Press.
Dean, R., & Woodcock, R. (1999). The WJ-R and Batería-R in neuropsychological assessment:
Woodcock psychological and educational assessment research report no. 3. Itasca, IL: Riverside
Publishing.
DeVellis, R. R. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA:
Sage Publications.
Flanagan, D. P., & Kaufman, A. S. (2004). Essentials of WISC-IV assessment. New York: Wiley.
Flanagan, D. P., Keiser, S., Bernier, J. E., & Ortiz, S. O. (2003). Diagnosing learning disability in
adulthood. Boston: Allyn & Bacon.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Dynda, A. M. (2006). Integration of response to
intervention and norm-referenced tests in learning disability identification: Learning from the Tower
of Babel. Psychology in the Schools, 43, 807-825.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2002). Achievement test desk reference:
Comprehensive assessment and learning disabilities. Boston: Allyn & Bacon.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference:
A guide to learning disability assessment (2nd ed.). New York: Wiley.
Fletcher, J. M., Lyon, G. R., Barnes, M., Stuebing, K. K., Francis, D. J., Olson, R. K., Shaywitz, S. E., &
Shaywitz, B. A. (2002). Classification of learning disabilities: An evidence-based evaluation. In R.
Bradley, L. Danielson, & D. P. Hallahan (Eds.), Identification of learning disabilities: Research to
practice (pp. 185-250). Mahwah, NJ: Erlbaum.
Glutting, J. J., Youngstrom, E. A., Ward, T., Ward, S., & Hale, R. L. (1997). Incremental efficacy of
WISC-III factor scores in predicting achievement: What do they tell us? Psychological Assessment, 9,
295-301.
Gordon, M., Lewandowski, L., & Keiser, S. (1999). The LD label for relatively well-functioning
students: A critical analysis. Journal of Learning Disabilities, 32, 485-490.
Gregg, N., & Mather, N. (2001). Discrimination against high achieving adults with learning
disabilities: A tragic consequence of Public Law interpretation. LDA Newsbriefs, 36(5), 11-14.
Kaufman, A. S. (1979). Intelligent testing with the WISC-R. New York: Wiley.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Kavale, K. A., & Forness, S. R. (2000). What definitions of learning disability say and don't say: A
critical analysis. Journal of Learning Disabilities, 33(3), 239-256.
Maller, S. J., & McDermott, P. A. (1997). WAIS-R profile analysis for college students with learning
disabilities. School Psychology Review, 26, 575-585.
McDermott, P. A., Fantuzzo, J. W., & Glutting, J. J. (1990). Just say no to subtest analysis: A critique of
Wechsler theory and practice. Journal of Psychoeducational Assessment, 8, 290-302.
McDermott, P., Fantuzzo, J., Glutting, J., Watkins, M., & Baggaley, A. (1992). Illusions of meaning in
the ipsative assessment of children's ability. Journal of Special Education, 25, 504-526.
McDermott, P. A., & Glutting, J. J. (1997). Informing stylistic learning behavior, disposition, and
achievement through ability tests, or more illusions of meaning? School Psychology Review, 26,
163-175.
McGrew, K. S. (1994). Clinical interpretation of the Woodcock-Johnson Tests of Cognitive Ability-
Revised. Boston: Allyn & Bacon.
McGrew, K. S., & Knopik, S. N. (1996). The relationship between intra-cognitive scatter on the
Woodcock-Johnson Psycho-Educational Battery-Revised and school achievement. Journal of School
Psychology, 34, 351-364.
McGrew, K. S., Werder, J., & Woodcock, R. (1991). WJ-R technical manual. Chicago: Riverside
Publishing.
National Center for Education Statistics. (2006). Participation in education: Elementary/Secondary
Education-Indicator 8. Retrieved from http://nces.ed.gov/programs/coe/2006/section1/indicator08.asp
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin,
J. C., Perloff, R., Sternberg, R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American
Psychologist, 51, 77-101.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Siegel, L. S. (1999). Issues in the definition and diagnosis of learning disabilities: A perspective on
Guckenberger v. Boston University. Journal of Learning Disabilities, 32(4), 304-319.
Snow, R. E. (1992). Aptitude theory: Yesterday, today, and tomorrow. Educational Psychologist, 27, 5-
32.
Snow, R. E. (1994). Abilities in academic tasks. In R. J. Sternberg & R. K. Wagner (Eds.), Mind in
context: Interactionist perspectives on human intelligence (p. 337). Cambridge, England: Cambridge
University Press.
Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ:
Erlbaum.
Vellutino, F. R., Scanlon, D. M., & Lyon, R. G. (2000). Differentiating between difficult-to-remediate
and readily remediated poor readers: More evidence against the IQ-achievement discrepancy
definition of reading disability. Journal of Learning Disabilities, 33, 223-238.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The
Psychological Corporation.
Woodcock, R. W. (1987). Woodcock Reading Mastery Test-Revised. Circle Pines, MN: American
Guidance Service.
Woodcock, R. W. (1993). An information processing view of Gf-Gc theory. Journal of
Psychoeducational Assessment Monograph Series: WJ-R Monograph, 11, 80-102.
Woodcock, R. W., & Johnson, M. B. (1989). The Woodcock-Johnson Psycho-Educational Battery-
Revised. Allen, TX: DLM Teaching Resources.

TEST YOURSELF

1. The most logical and empirically supported way to evaluate SLD is to use an ability-
achievement discrepancy calculated between the absolute value of two obtained
standard scores. True or False?
2. The use of XBA improves the process of SLD evaluation because

(a) theory and research are used to operationalize SLD.


(b) the terminology of CHC theory and XBA is consistent and serves to facilitate
communication across disciplines.
(c) research is used to understand the relations between cognitive and academic
abilities/processes.
(d) all of the above.

3. Because ipsative (person-relative) analysis has a long history of clinical application, it

(a) has gained considerable research evidence to support its use.


(b) has become so entrenched in practice that its significant problems are often
ignored.
(c) is the best way to conduct SLD evaluations.
(d) should remain the primary method by which SLD is diagnosed.

4. A relative weakness is not likely to be a true weakness unless it is also

(a) a normative weakness.


(b) more than one standard deviation different from the individual's overall average.
(c) a common occurrence in the general population.
(d) between the 16th and 49th percentile ranks.

5. An FSIQ is a near-perfect predictor of any and all achievement domains and,
therefore, sets the bar for expected academic outcome. True or False?
6. One of the most significant problems that has hampered evaluation of and research
on SLD has been

(a) a continuing lack of tools with which to measure specific abilities/processes.
(b) the reluctance of parents to provide information about early learning
experiences.
(c) the absence of a modern operational definition of SLD.
(d) lack of recognition of SLD as a problem in education.

7. The operational definition presented in this chapter begins with identification of a
normative deficit in at least one area of academic skill because

(a) assessment should always start with academic testing.


(b) achievement testing gets an individual warmed up for the cognitive testing.
(c) frequently there is no indication of what kind of learning problems an individual
has.
(d) if there is no actual deficit in learning, there is no SLD.

8. Evaluation of exclusionary factors following any kind of testing may reveal other
possible reasons for poor performance, including

(a) fatigue or lack of motivation.


(b) incorrect scoring.
(c) cultural or linguistic differences.
(d) all of the above.

9. Hypothesis-driven assessment is crucial to the evaluation process in order to prevent

(a) the examinee from becoming too nervous.


(b) confirmatory bias.
(c) psychometric bias.
(d) deficit performance from occurring.

10. When all criteria at all levels of the operational definition have been met, it can
safely be assumed that SLD is present even in the absence of any actual functional
limitations in activities that require the deficit skill (e.g., reading). True or False?

Answers: 1. False; 2. d; 3. b; 4. a; 5. False; 6. c; 7. d; 8. d; 9. b; 10. False

Five

USE OF THE CROSS-BATTERY APPROACH IN THE ASSESSMENT OF DIVERSE INDIVIDUALS

In light of the increasing diversity in the U.S. population, working with culturally and linguistically
diverse individuals continues to present unique challenges to practitioners, especially in the use of
standardized tests of intelligence or cognitive ability. The issue is no longer merely a practical
concern, but a professional and ethical responsibility, as delineated by the various publications that
now govern assessment of diverse individuals, including the Guidelines for Providers of
Psychological Services to Ethnic, Linguistic, and Culturally Diverse Populations (APA, 1990) and the
Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). Legislative
mandates such as the Individuals with Disabilities Education Improvement Act (IDEIA, 2004) and its
attendant regulations also continue to specify requirements for ensuring fairness and equity in
evaluations of individuals from diverse backgrounds. Clearly, psychologists can no longer view the
influence of language and culture on behavior as a secondary issue in assessment.
Despite nearly a century of study, the manner in which language and culture affect test
performance, let alone interpretation of test results, often remains at best a sidebar in the education
psychologists receive in assessment courses and practica. Training programs do not routinely
provide content or instructional methods in this area, albeit practical frameworks for conducting
nondiscriminatory assessments have only begun to emerge recently (Ortiz, 2001; Ortiz & Ochoa,
2005a, 2005b; Rhodes, Ochoa, & Ortiz, 2005). The problem has been compounded by the unfortunate
tendency to view nondiscriminatory assessment as primarily an issue of communicating in the
examinees native language, which in turn leads to neglect of the defining cultural factors that are
extremely important in understanding the nature of test results. Practitioners should remember mere
possession of the capacity to communicate in an individuals native language does not ensure
appropriate, nondiscriminatory assessment of that individual. Traditional assessment practices and
their inherent biases can be easily replicated in any number of languages (Flanagan, McGrew, &
Ortiz, 2000, p. 291). Misconceptions and inadequate training aside, assessment of diverse individuals
in the present day continues to be conducted by psychologists and other professionals whose levels of
competency in this regard are far from optimal. As such, issues in instrument selection, instrument
administration, and interpretation of results (not to mention referral and the entire decision-making
process as well) remain problematic at best for the vast majority of practitioners using standardized
tests (Ortiz & Dynda, 2005; Ortiz & Ochoa, 2005c).

DON'T FORGET

Assessment Bias and
Cultural Diversity


[The] mere possession of the capacity to communicate in an individual's native language does
not ensure appropriate, nondiscriminatory assessment of that individual. Traditional assessment
practices and their inherent biases can be easily replicated in any number of languages.
(Flanagan, McGrew, & Ortiz, 2000, p. 291)
The entire breadth and depth of the issues involved in bilingual, cross-cultural, nondiscriminatory
assessment are far beyond the scope of this chapter and book. There are no simple answers,
prescriptions, or shortcuts for this rather complex topic. Practitioners are often concerned primarily
with trying to determine which standardized tools are considered best and which are not, but this
represents only a very small part of what is a much broader process. In fact, there are many occasions
in which standardized tests may not be used at all for the purposes of assessment. The procedures to
be discussed in this chapter must not be viewed as the definitive or only answer to the broad range of
difficulties encountered in the comprehensive assessment of diverse individuals. Rather, the approach
to assessment we outline should, as with the use of standardized tests in any comprehensive
evaluation, be seen as only one component that is integrated with multiple sources of data into a
larger framework that uses the individual's cultural and linguistic history as the appropriate context
from which to draw meaning and conclusions from the data. Figure 5.1 provides an example of how
the assessment methods to be described should be viewed as representing only one element within the
structure of a broad approach to nondiscriminatory assessment. The steps outlined in Figure 5.1 were
developed by Ortiz (2002) and represent the various stages of assessment that comprise a
comprehensive, systematic approach to reducing bias and discrimination in the assessment of diverse
individuals. Note that in the overall structure of this approach, we will be focusing primarily on Step
7, which attempts to reduce bias in traditional practices (i.e., testing). As such, these procedures
represent only a portion of the entire assessment process, emphasizing the fact that any single
procedure or method is insufficient to represent the totality of what constitutes current best practices
in the assessment of culturally and linguistically diverse children. To accomplish such an undertaking
properly would entail the application of a broad framework and explication of issues that are well
beyond the scope of this chapter. The reader is referred to more comprehensive sources for such
guidance (cf. Rhodes et al., 2005; Vazquez-Nuttall, Li, Dynda, Ortiz, Armengol, Walton, & Phoenix,
in press). However, as a starting point for understanding the issues involved in the use of
standardized tests with diverse individuals, the following discussion and approach are offered.

Figure 5.1 A Model for Comprehensive Nondiscriminatory Assessment

Because of its importance in terms of creating the background and context for fair and equitable
assessment, this chapter begins with a section that discusses the manner in which culture and language
typically affect the assessment process. This discussion highlights the need for, and the rationale behind,
the principles underlying the Culture-Language Test Classifications (C-LTC) and Culture-Language
Interpretive Matrix (C-LIM). The next section in this chapter describes the basis for cognitive ability
test classifications according to cultural and linguistic dimensions. The information contained in this
section serves as a new frame of reference from which tests can be both selected and interpreted

CAUTION

XBA As Part of a
Larger Framework


XBA procedures must not be viewed as the answer to the broad range of difficulties encountered
in the comprehensive assessment of diverse individuals. Rather, XBA is only one component that
is integrated with multiple sources of data into a larger framework that uses the
individual's cultural and linguistic history as the appropriate context from which to draw
meaning and conclusions from the data.
in accordance with principles directly relevant in the assessment of diverse individuals. The final
section of this chapter covers specific methods and procedures involved in the application of the C-
LTC and C-LIM in the assessment of diverse individuals. Guidelines for test selection and
interpretation of data are also presented. As will become evident, the C-LTC and C-LIM can be used in
conjunction with the tenets of the XBA approach to provide a theoretically and empirically defensible
evaluation that is also fair.
CULTURE, LANGUAGE, AND TESTS OF COGNITIVE ABILITY

Because psychological training often fails to provide the essential knowledge and skills needed for
the assessment of culturally and linguistically diverse individuals, practitioners have often been
forced to utilize procedures and tests that may not be suitable or appropriate for measuring the
cognitive abilities or intellectual functioning of such individuals (Ortiz & Dynda, 2005; Ortiz &
Ochoa, 2005a). According to various studies (e.g., Ochoa, Powell, & Robles-Pina, 1996; Ochoa,
Riccio, Jimenez, Garcia de Alba, & Sines, 2004), the most commonly used instruments with culturally
and linguistically diverse students include a Wechsler Scale (generally administered completely in
English), the Bender Visual-Motor Gestalt test, the Draw-A-Person test, and a nonverbal instrument
(e.g., Universal Nonverbal Intelligence Test [UNIT] or Leiter). Generally speaking, given the
inadequate psychometric properties, inappropriate norms and comparison groups, unidimensional
assessments, linguistic and cultural confounds, and so forth, that characterize many of these tests,
such a combination or battery is problematic. Moreover, the problems associated with the use of such
tests with diverse populations are not entirely solved when native-language tests are used or when
interpreters are used for the process of administration (Lopez, 1997; McCallum & Bracken, 1997).
But perhaps the greatest problem associated with the use of any set of tests or any test battery lies in
the fact that tests are often selected, administered, and then interpreted in a manner that is not
systematic or guided by the research literature on how culture and language influence the performance of
individuals from various cultures or with various linguistic backgrounds. Decisions and conclusions
will thus be haphazard and largely indefensible. To derive meaningful information from the use of
standardized tests with individuals who are culturally and linguistically diverse, the nature and
process of bias and discrimination need to be well understood.

DON'T FORGET

The Nature and Process of
Bias and Discrimination


In order to derive meaningful information from the use of standardized tests with individuals
who are culturally and linguistically diverse, the nature and process of bias and discrimination
need to be well understood.
Cultural Bias versus Cultural Loading

An extensive review of the question and nature of bias in psychometric procedures is not practical,
given the practitioner-oriented focus of this book. The reader is referred to other sources for a more
detailed treatment of the subject (e.g., Sandoval, Frisby, Geisinger, Scheuneman, & Grenier, 1998).
Nevertheless, it is necessary to provide some explanation of how bias operates in the psychometric
approach that underlies testing.
In order to begin clarifying the nature of bias, it is necessary to recognize the extent to which
culture and cultural values have played a part in the construction and development of intelligence
batteries from their very origins. According to Kamphaus (1993), "the traditions of Galton, Binet,
Wechsler, Cattell, and others underlie all modern tests of intelligence. These tests emanated from
French, British, German, North American, and other similarly European cultures" (p. 441). Perhaps
the best illustration of this notion comes from Kaufman (1994), who provides a very poignant
recollection of his collaborative work with David Wechsler on decisions regarding item deletion
for the revision of the WISC. Kaufman wrote:

From that point on, I never held back anything. He would usually respond calmly but
occasionally I'd strike a raw nerve, and his grandfatherly smile would evaporate. His temples
would start to pulse, and his entire face and scalp would turn crimson. I'd unconsciously move
my chair back in self-protection, the way I did when I tested hard-core prisoners on the old WAIS
and had to ask the question, "Why should we keep away from bad company?" I struck that
exposed nerve when
I urged him to eliminate the Comprehension item about walking away from a fight if someone
much smaller starts to fight with you. The argument that you can't walk away from any fight in a
black ghetto just added fuel to his rage. When I suggested, at a later meeting, that he just had to
get rid of the item, "Why should women and children be saved first in a shipwreck?" or incur the
wrath of the new wave of militant feminists, his response was instant. With red face and pulsing
head, he stood up, leaned on his desk with extended arms, and said as if he were firing a
semiautomatic, "Chivalry may be dying. Chivalry may be dead. But it will not die on the WISC."
(p. x; emphasis in original)

It would seem that chivalry did die after all, because the shipwreck item was in fact dropped from
the Wechsler Intelligence Scale for Children-Revised (WISC-R). However, it is not a trivial fact that
the fight item was retained not only at that time, but also on the Wechsler Intelligence Scale for
Children-Third Edition (WISC-III) and carried over even in the most current revision, the Wechsler
Intelligence Scale for Children-Fourth Edition (WISC-IV). Kaufman's battles with Wechsler illustrate
the degree to which test content, at the most fundamental level, is often a very real reflection of the
attitudes and beliefs of the individuals who create them. What is intelligent behavior, and by extension
what is a correct response to a question on an intelligence test, is by no means a completely objective
determination. It is therefore essential that practitioners understand that all tests of intelligence and
cognitive ability reflect the culture from which they emanated and are based on the culturally bound
values and beliefs of their authors. Neisser and colleagues (1996) stress that "it is obvious that the
cultural environment - how people live, what they value, what they do - has a significant effect on the
intellectual skills developed by individuals" (p. 86). To assess individuals from diverse cultures in a
more equitable manner, practitioners will need to come to terms with the fact that "intelligence cannot
be tested independently of the culture that gives rise to the test" (Cole & Cole, 1993, p. 502) and that
"intelligence tests are not tests of intelligence in some abstract, culture-free way. They are measures
of the ability to function intellectually by virtue of knowledge and skills in the culture of which they
sample" (Scarr, 1978, p. 339).
Yet the vast majority of research into the nature of bias in intelligence tests has failed to find any
evidence of bias. Study after study has examined and reexamined test items (including content and
novelty), test structure (sequence, order, difficulty), test reliability (measurement error or accuracy),
factor structure (theoretical structure, cluster or composite scores), and prediction (academic success
or achievement), without any significant findings of bias (Sandoval et al., 1998; Valdés & Figueroa,
1994). It would seem that if cultural background were a variable that differentially affected
performance it would be readily identified in such studies. The answer to this seeming dilemma lies
in the fact that bias is often too narrowly defined. Culture (and in effect, cultural bias) has historically
been viewed as a unitary, monolithic construct that is expected to interact with performance in some
way that might systematically differentiate one group of people from another (Figueroa, 1990a, 1990b;
Valdés & Figueroa, 1994). However, this view represents an inaccurate and unrealistic perspective of
the attenuating influence of cultural differences. This is because intelligence tests and tests of
cognitive ability measure quite well the degree to which anyone has acquired and can access the
culturally specific information reflected in and inherent in their structures. It is not culture per se that
acts as a biasing factor; rather, it is an individual's exposure to and familiarity with (or lack thereof)
the test's underlying culture that affects performance on such tests (Cummins, 1984; Figueroa, 1990b;
Matsumoto, 1994; Valdés & Figueroa, 1994). Thus, the main problem with traditional definitions of
bias has been the failure to recognize that the process of language acquisition (first or second) and the
acquisition of cultural knowledge (i.e., acculturation) are developmental. Therefore, bias defined
relative to sequence of item difficulty, reliability, factor structure, or prediction simply cannot be
found because the measurement of developmental processes, as incorporated into the structure of
tests, does not vary as a function of being linguistically or culturally different.
The process of acquiring culture (i.e., acculturation) is invariant for everyone new to the culture.
The simpler, more common elements of the culture are learned first; the more complex elements of
the culture follow later in predictable and measurable ways. According to Salvia and Ysseldyke
(1991), the very process of acculturation represents a fundamental principle within test development
known as the assumption of comparability. They wrote,

When we test students using a standardized device and compare them to a set of norms to get an
index of their relative standing, we assume that the students we test are similar to those on whom
the test was standardized; that is, we assume their acculturation is comparable, but not necessarily
identical, to that of the students who made up the normative sample for the test. (p. 18)

Therefore, the structure and design of intelligence and cognitive ability tests are actually based on the
notion that there is an equivalent level of acculturation across the variables of age or grade for
individuals on whom the test was standardized and on whom the test will be used. This assumption can
be far from reality when such tests are used on individuals from other cultures. Salvia and Ysseldyke
make this point clear as well:

When a child's general background experiences differ from those of the children on whom a test
was standardized, then the use of the norms of that test as an index for evaluating that child's
current performance or for predicting future performances may be inappropriate. (p. 18)

The biasing effect from the use of psychometric instruments, therefore, operates whenever tests of
intelligence and cognitive ability (developed and normed in the United States) are given to individuals
whose cultural backgrounds, experiences, and exposures are not similar to or consistent with those of
the individuals comprising the norm group against whom performance will be compared. In these
cases, such tests will likely measure a lower range of ability in diverse individuals because the test
samples only the cultural content related to mainstream experience and not the full or entire range of
cultural content possessed by the individual (Valdés & Figueroa, 1994).

Figure 5.2 Yerkes' 1921 Data from the Binet Scales
*Represents the overall average mental age of all Army recruits, including both native English and
non-native English speakers.


Such bias was evident at the outset of the development of testing when, for example, data from
Yerkes' (1921) administration of the Army Beta test and the Binet Scales to native and non-native
English speakers revealed a distinct pattern. This pattern is evident in Figure 5.2, which demonstrates
clearly the increase in mental age for immigrants on the Binet Scales as a function of length of
residence in the United States. Yerkes' data also revealed that the average score on the Army Beta for
native English speakers was 101.6, whereas the average score for non-native English speakers was
only 77.8. What seems a rather obvious interpretation was refuted by Carl Brigham, a lieutenant of
Yerkes, who promoted the view in his book, A Study of American Intelligence (1923), that:

Instead of considering that our curve indicates a growth of intelligence with increasing length of
residence, we are forced to take the reverse of the picture and accept the hypothesis that the curve
indicates a gradual deterioration in the class of immigrants examined in the army, who came to
this country in each succeeding 5-year period since 1902 (pp. 110-111). . . . The average
intelligence of succeeding waves of immigration has become progressively lower. (p. 155)

Even as early as the 1930s, Sanchez, an enlightened Mexican-American psychologist, recognized this
type of bias in testing. His comments, however, represented a mere drop against the tidal wave of
research that supported the view espoused by Brigham and others who believed that it was neither
acculturation nor bilingualism that caused non-native English-speaking immigrants to perform so
poorly on tests, but rather a natural genetic inferiority. Sanchez wrote eloquently on the subject, but,
alas, his words appear to have fallen largely on deaf ears. He cautioned:

As long as tests do not at least sample in equal degree a state of saturation [i.e., assimilation of
fundamental experiences and activities] that is equal for the norm children and the particular
bilingual child, it cannot be assumed that the test is a valid one for the child. (Sanchez, 1934,
p. 771; words in brackets added for clarity)

The biasing influence described by Sanchez and evident in the earliest days of testing is best
construed as involving cultural loading and is distinctly different from definitions that are based on
culture, race, or ethnicity as unitary, monolithic interacting variables that should somehow disrupt
performance. Thus, although there is considerable research evidence suggesting that many
intelligence and cognitive ability tests are technically sound and appropriately normed, and are not
culturally biased, they are, nevertheless, culturally loaded (Ortiz & Dynda, 2005; Ortiz & Ochoa,
2005a; Sattler, 1992; Valdés & Figueroa, 1994).
Given the preceding discussion, practitioners who seek to increase the validity of results obtained
in the assessment of individuals from diverse cultural backgrounds should attempt to acquire two
important and interrelated pieces of information: (a) the individual's level of acculturation, and (b)
the degree to which performance on any given test is contingent upon culture-specific knowledge
(this is highlighted in Rapid Reference 5.1). Mercer (1979), Valdés and Figueroa (1994), and others
have addressed the few studies that attempted to measure the former issue. The C-LTC and C-LIM
presented in this chapter involve the latter.

Rapid Reference 5.1



Information Needed to Increase Validity in the Assessment of Culturally Diverse Individuals
Practitioners who seek to increase the validity of results obtained in the assessment of
individuals from diverse cultural backgrounds should attempt to acquire two important and
interrelated pieces of information:

a. the individual's level of acculturation, and


b. the degree to which performance on any given test is contingent upon culture-specific
knowledge.


Language Bias versus Linguistic Demands

Practitioners have paid much more attention than researchers to the issue of language differences in
testing. No doubt this is because the practical implications of working with individuals who may not
be fully proficient in English are matters of greater importance to the applied psychologist than the
theoretical ones. There appears to be an intuitive understanding among practitioners that an examiner
who is a monolingual English speaker is going to have significant problems in conducting an
assessment on an individual who does not speak or comprehend English well or at all, particularly in
the use of standardized tests. Nevertheless, the specific manner in which such communicative
obstacles may affect test performance is not clearly understood, and it becomes extremely cloudy in
those situations in which the individual being assessed knows some English, but may not be fully
proficient as compared to other individuals of the same age or grade.
The effect that language difference has on test performance is quite similar to that just described
for acculturation. Valdés and Figueroa (1994) noted that "empirically established difficulty levels in
psychometric tests are not altered by cultural differences. Neither are they because of proficiencies in
the societal language" (p. 101). In other words, development of language (English or otherwise) is
just as experientially based and follows just as invariant a developmental course as acculturation.
Given the developmental structure and sequence of items on standardized tests, the attenuating effect
of language development or language proficiency is not manifest in comparisons of performance
within any single subtest. Rather, it is the lack of concurrence between constructs that are measured
through different channels (i.e., a set of verbal subtests versus a set of nonverbal subtests) that begins
to reveal the nature of language bias in tests (Cummins, 1984; Valdés & Figueroa, 1994). There exist
only a few scientific studies that have examined linguistic bias in tests in this comparative manner.
However, they have been strikingly consistent in their findings that tasks that are primarily language
based do not measure incidental learning as well as tasks that are more visual or perceptual in nature
(Cummins, 1984; Jensen, 1974, 1976).

DON'T FORGET

Presumed Language
Proficiency


Tests of intelligence or cognitive ability are constructed in ways that require or presume that a
level of language proficiency is present in the average individual that is sufficient to
comprehend instructions, formulate and verbalize responses, or otherwise use language ability
in completing the expected task.
Tests of intelligence or cognitive ability are constructed in ways that presume that a level of
language proficiency is present in the average individual that is sufficient to comprehend instructions,
formulate and verbalize responses, or otherwise use language ability in completing the expected task.
As in the case of acculturation, there may be times when an individual's language proficiency is not
developmentally commensurate with the language proficiency of the individuals comprising the
norm group against which performance will be compared (Cummins, 1984; Figueroa, Delgado, &
Ruiz, 1984). In cases in which the focus is on evaluating language-related disabilities, this is precisely
the point. However, when individuals who are not language disabled are nevertheless limited in English
proficiency or, for whatever reason, are not developmentally equivalent in language proficiency to
the norm group, the result will be bias. In similar fashion to the discussion on acculturation, tests may
be linguistically biased, not because of any inherent structural defect, but because of the expectations
and assumptions regarding the comparability of language proficiency. The assumption of
comparability regarding language development and proficiency for such individuals is very often
invalid. Figueroa (1990b) strongly cautions practitioners to remember that "language background, not just
language proficiency, must be taken into account in every facet of assessment such as test
development, selection, administration, and interpretation" (p. 94).

DON'T FORGET

English-Language
Proficiency and Bias


In cases in which individuals are limited in English proficiency or, for whatever reasons, are not
developmentally equivalent in language proficiency to the norm group, the result will be bias.

CAUTION

Elimination of
Spoken-Language
Requirements


Reducing the oral or spoken-language requirements in any given test does not completely
eliminate potential linguistic bias and does little, if anything, to reduce bias related to
acculturation.
With respect to language differences in assessment, the evidence seems abundantly clear that tests
that carry high linguistic demands (e.g., vocabulary tests) tend to "degenerate in unknown degrees
into tests of English language proficiency" whenever they are used with individuals who are
linguistically different (Figueroa, 1990a, p. 93). In order to improve upon current use of tests of
intelligence and cognitive ability, practitioners should continue to strive to collect at least
two key pieces of information that will affect the path of assessment. These are the individual's level of
proficiency in English and in any other language he or she has acquired or been exposed to (no
matter how little the exposure may be), and the degree or level of language required by any test or
tests that will be used to evaluate the individual's functioning. The former is ordinarily accomplished
through the use of any one of the various English-language proficiency tests available on the market
today (e.g., Woodcock-Muñoz Language Survey-Revised; Woodcock, Muñoz-Sandoval, Ruef, &
Alvarado, 2005). These tests are sufficient to gauge the general degree to which an individual may
differ in proficiency from age-related peers. Collection of data related to the latter factor (linguistic
characteristics of tests) may be accomplished via the C-LTC and C-LIM, to be discussed in the
following sections. As will be explained, the combination of these sources of information provides
practitioners with a more systematic and defensible basis for evaluating the performance of dual-
language learners in a fairer and more equitable manner than is ordinarily achieved using
traditional methods.
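Although the mechanics of the C-LTC and C-LIM are presented in the sections that follow, the general logic of combining these two sources of information can be previewed with a small sketch: subtest scores are grouped by their classified degree of linguistic demand (or cultural loading) and inspected for the expected decline in performance as demand increases. The classifications, scores, and simple comparison of means below are hypothetical and illustrative only.

# Hypothetical preview of the C-LIM logic: group subtest standard scores by an
# assumed linguistic-demand classification and check for a declining pattern.
# Classifications and scores are invented for illustration.
from statistics import mean

scores_by_demand = {
    "low":      [98, 102, 95],   # e.g., tasks with minimal verbal requirements
    "moderate": [90, 88],
    "high":     [80, 82],        # e.g., vocabulary-type, language-heavy tasks
}

means = {level: mean(values) for level, values in scores_by_demand.items()}
declining = means["low"] > means["moderate"] > means["high"]
print(means, "| declining pattern consistent with linguistic difference:", declining)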
A Comment on Assessment Approaches

At present, there appear to be two basic approaches to the assessment of diverse individuals. The first
relates to attempts to develop better tests; that is, ones that are reliable and valid for use with diverse
populations. The second approach resists the temptation to develop pluralistic norms per se, and
focuses more on utilizing research that reveals differences in performance between groups on
existing tests. In other words, it may be fruitful to simply acknowledge that test performance is
attenuated by cultural and linguistic differences and to quantify those differences in order to determine their effects.
Perhaps the best example of the first tradition is the development of tests designed specifically for
diverse individuals and usually in their native language. Over the last half century, test developers
have seen both the merit in and the need for the development of norm groups that more accurately
reflect the composition of individuals living in the United States, so that valid comparisons of
performance can be made. The result has been that norm samples over the years have become
increasingly more representative on a variety of variables. Virtually all of the major intelligence or
cognitive ability batteries available today have more than adequate national norms and meet the
strictest criteria with respect to the technical aspects of norm sample development. The emphasis on
representing diversity, which has slowly crept into the psychometric arena, is clearly reflected in the
composition of these modern norm groups. To meet the stringent demands of modern-day
practitioners, test developers have made it rather common practice to stratify their standardization
samples along several important variables, including age, sex, race, ethnicity, education level, and
geographic location. One of the primary goals for the inclusion of such a broad range of variables is
the creation of a norm sample that minimizes and equalizes any systematic variation that might be due
to differences along any one of (or combination of) these dimensions. In essence, a standardization
sample that contains these variables allows for valid comparisons of performance that are no more or
less biased against any single individual for whom the sample is representative (Flanagan, Ortiz,
Alfonso, & Mascolo, 2002, 2006).
For test developers, the difficulty with creating norms that represent culturally, linguistically, or
other diverse individuals lies in some of the assumptions related to the stratification process. The
question becomes one concerning the notion of exactly what constitutes "representative." It was
discussed previously that there exists an assumption of comparability that, when true, allows for valid
conclusions about relative performance to be drawn. Conversely, when the assumption is false,
conclusions and interpretation of performance are questionable at best. Salvia and Ysseldyke (1991)
noted that in any case in which testing is conducted on an individual whose general background
experiences are different from the experiences of the individuals on whom the test was standardized,
the use of those norms for performance comparisons or prediction may be inappropriate. They
further emphasized that:

[i]ncorrect educational decisions may well be made. It must be pointed out that acculturation is a
matter of experiential background rather than of gender, skin color, race, or ethnic background.
When we say that a child's acculturation differs from that of the group used as a norm, we are
saying that the experiential background differs, not simply that the child is of different ethnic
origin, for example, from the children on whom the test was standardized. (1991, p. 18, emphasis
in original)

In the case of culturally or linguistically diverse individuals, Salvia and Ysseldyke's (1991)
comments make it clear that skin color, race, or ethnicity should not be equated with cultural
differences, and that it is the difference in experiential background (related to differences in cultural
background) from the mainstream that adversely affects test performance. These differences in
experience (which are based on cultural differences) represent a variable that is not stratified in any
norm sample available today. Therefore, no matter how much a test developer might want to
emphasize the fairness of a given test by illustrating the inclusion of racially or ethnically diverse
individuals (e.g., in accordance with their frequency in the general population, according to country
of origin), claims about equity are highly misleading and inaccurate (Valdés & Figueroa, 1994).
Practitioners should thus be careful not to fall prey to the assumption that stratification in the norm
sample on the basis of race is equivalent to stratification on the basis of culture. Not only is this not
true, but it is not even culture or country of origin itself that is the crucial variable; rather, it is the level
of acculturation that should be controlled.
The lack of adequate representation of varying levels of acculturation in
modern-day standardization samples carries an important implication for practitioners. Until and
unless publishers of intelligence and cognitive ability tests provide norm samples that adequately
stratify along the dimension of acculturation, performance of individuals reared either completely or
partly outside the U.S. mainstream culture cannot be compared validly against existing norm samples
(Cummins, 1984; Figueroa, 1990a; Samuda, Kong, Cummins, Pascual-Leone, & Lewis, 1991; Valdés
& Figueroa, 1994). Accomplishing such a feat in the development of a norm sample may well prove
impossible from a practical point of view, especially considering the difficulties inherent in
stratifying a variable that is by nature continuous, and considering the broad range and number of
variables that might be needed even to measure acculturation accurately. Nevertheless, until the
culturally based experiences that are sampled by standardized tests (and established by the
performance of the norm group) are comparable to the cultural experiences of the individual being
tested, fair and equitable interpretation of performance will remain difficult and elusive.
Representation within existing standardization samples along the dimension of language
proficiency is an issue similar to that of acculturation. The fundamental goal of the U.S. educational
system is English literacy. Therefore, regardless of the manner or specific program used to achieve
this goal, each pupil who enters the system as a non-English or limited-English speaker will, by
default, become a circumstantial bilingual speaker (i.e., by force of circumstance as opposed to
choice). As was discussed previously, language proficiency and history are variables that can greatly
attenuate results obtained from the use of standardized tests. This consequence of the educational
system alone would seem to make it prudent on the part of test developers to provide norms that
include individuals according to their respective levels of language proficiency in both languages,
not just one. Unfortunately, such dual-language learners or bilingual individuals have not been
systematically incorporated into the design and composition of any extant norm samples.
Too often, bilingual individuals are viewed as a homogeneous group, whereas in reality this is far
from the truth. Bilingual individuals can range widely in terms of proficiency in either of the
languages they speak, creating an extremely diverse composition of individuals who are often
reduced to simple terms, such as English language learners (ELLs) or English as a Second
Language (ESL) individuals. Thus, even when a sample is established in a language other than
English, the issue of control through proper stratification of language proficiency remains a concern.
For example, the Batería III (Woodcock, Muñoz-Sandoval, McGrew, & Mather, 2005) norm sample is composed almost exclusively of monolingual Spanish speakers, just as the norm sample for its English-language counterpart is composed of monolingual English speakers. But monolingual
Spanish speakers offer no more an appropriate comparison group for the many bilingual individuals
residing in the United States than does the monolingual English norm group. Both are
unrepresentative of individuals who are bilingual. There are some tests, such as the WISC-IV Spanish
(Wechsler, 2005) that do include many dual-language individuals in the norm sample. Unfortunately,
their inclusion was not for purposes of controlling differences among individuals on the basis of
dual-language proficiency. The WISC-IV Spanish norm sample relies instead on country of origin
and other factors that, albeit helpful, continue to neglect the fact that these individuals are not
comparable or similar simply because they are bilingual or come from the same country.
As with the acculturation variable, stratifying a sample on the basis of dual-language or bilingual
ability is an extremely daunting task. Creation of a truly representative sampling of bilingual
individuals faces many of the same difficulties encountered by publishers who seek to create special
norm groups (e.g., individuals who are deaf, who have learning disabilities, Attention Deficit
Disorder [ADD]). The issue would not be settled simply by evaluating individual language
proficiency alone, but would need to include consideration of variables involving length of time
learning each language, amount of formal instruction received in each language, proficiency in the
language or languages spoken by parents and siblings in the home, and so forth. Problems with norm
groups notwithstanding, it is important to recognize that tests such as the Bilingual Verbal Ability
Tests-Normative Update (BVAT; Muñoz-Sandoval, Cummins, Alvarado, Ruef, & Schrank, 2005)
represent an emerging and distinct research and development tradition. From the inception of
standardized tests of intelligence, the vast majority of research and test development conducted to
evaluate the effect of bilingualism on performance has been accomplished with intelligence tests
given in English, not in the native language. The unwavering and consistent findings regarding the biasing effect of testing bilinguals with monolingual tests (see Valdés & Figueroa, 1994) constitute the research tradition that forms the basis for the culture and linguistic extensions of the XBA approach,
which are presented in the following section. This is because the patterns of expected performance of
bilinguals on monolingual English tests are very well known. On the other hand, the BVAT is an
innovative test that sits at the forefront of a new research tradition, which seeks to test bilingual
individuals with bilingual tests in a manner much more consistent with theory concerning bilingual
development and second-language acquisition than anything ever accomplished with monolingual
tests. Although the BVAT represents a significant advancement in this practice and will no doubt help
to push related research further along, there is at present very little known about the performance of
bilingual individuals on bilingual or native language tests. Though different in focus, both
monolingual and bilingual approaches represent trends under the larger umbrella of test development
and both have important practical implications that highlight the fact that bilingual assessment is not
the same as assessment of bilingual individuals. Whereas the BVAT is a prime example of
developments in the former, the C-LTC and C-LIM described in the next section fall within the context
of the latter.
In the absence of any appropriate norm group, measuring the cognitive or intellectual performance
of linguistically diverse individuals with standardized tests, whether in English or the native language,
amounts to a measure of language proficiency more than to any reflection of actual cognitive ability
(Cummins, 1984; Valdés & Figueroa, 1994). This conclusion is shared by a number of researchers
(e.g., Bialystok, 1991; Figueroa, 1990a; Samuda et al., 1991) who reinforce the notion that there are
no tests of intelligence or cognitive ability containing suitable norms for use with bilingual
individuals. The implications for practitioners are critical. Tests developed without accounting for
language differences are limited in their validity and in how they can be interpreted (Figueroa,
1990b, p. 94).
Another example of attempts to create more appropriate tests for diverse individuals can be seen in
the continued popularity and development of so-called nonverbal tests. The advertising literature
disseminated by many publishers of such tests often touts them as being both "culture free" and "language free" because they contain a protocol for administration that is entirely nonverbal (e.g.,
UNIT, Leiter-R, Wechsler Nonverbal Scale of Ability [WNV; Wechsler & Naglieri, 2006]).
Unfortunately, the representation of any test as culture or language free is, at best, very misleading.
Reducing the oral or spoken-language requirements in any given test does not completely eliminate
potential linguistic bias and does little, if anything, to reduce bias related to acculturation. Certainly,
when tests are given that utilize little or no oral language demands, they can assist in generating
results that are less discriminatory for individuals from diverse backgrounds (Figueroa, 1990b;
McCallum & Bracken, 1997). In fact, the C-LTC and C-LIM to be presented in the next section are
based, in part, on this notion.
Practitioners are well advised, however, to recognize that although some commonly accepted
nonverbal tests (e.g., Wechsler Picture Arrangement) may not require any oral or expressive
language ability per se, they often do demand from the examinee a high level of nonverbal receptive
language skill in order to comprehend the examiner's instructions and expectations. Similarly, tests that are often thought of as representing verbally reduced functioning (e.g., Wechsler performance tests) may contain lengthy and possibly confusing verbal directions, which can affect an individual's
ability to comprehend what is expected or to provide an appropriate response (e.g., Block Design). In
such cases, whenever the individual being tested does not possess the minimum required or expected
level of receptive language (as may be the case for linguistically diverse individuals), performance
will be affected in an adverse manner. Moreover, even tests that effectively eliminate most, if not all,
oral (expressive or receptive) language demands are not free of the communicative requirement. Test
performance, even with tests administered entirely with gestures, pantomime, or pictorial instruction
cards (e.g., UNIT), continues to remain very dependent upon the level of nonverbal communication
between the examiner and examinee and their ability to interact effectively in a nonverbal manner.
Such tests continue to require that the examiner somehow clearly and correctly convey the nature of a
given task and its expected response to the examinee, and they continue to require that the examinee
comprehend that communication and accurately reconvey an acceptable response to the examiner. To
say that a test requires no language is misleading because, in fact, some type of communication is still
necessary in order to conduct the testing process. If such communication were unnecessary and the
test truly required no language, then we might effectively generate valid IQs for a wide range of inarticulate and even inanimate objects. It is, therefore, something of an exaggeration to portray
nonverbal tests as being completely language free and thus free of bias. The significant reduction in
communicative demand does reduce to some extent the possible effects of bias related to lack of comprehension, but it hardly eliminates them entirely. In addition, the type of nonverbal communication
that may be required for administration of such tests often carries more culturally based implications
than does verbal communication (Ehrman, 1996). An emerging body of research suggests that nonverbal tasks may actually carry as much cultural content as, if not more than, verbal tests (Greenfield, 1998). Moreover, because of the redundancy and relatively limited range
found in many nonverbal tests with respect to the measurement of the broad and narrow abilities
specified in CHC theory, interpretation can be quite confusing, is often much less defensible, and does
not necessarily provide more valid assessment data (McCallum & Bracken, 1997).

CAUTION

Cultural Bias in Nonverbal Communication


The type of nonverbal communication that may be required for administration of nonverbal tests
often carries more culturally based implications than does verbal communication (Ehrman,
1996).

DON'T FORGET

Elimination of
Language from Tests


Test performance, even with tests administered entirely with gestures or in pantomime, continues
to remain very dependent upon the level of nonverbal communication between the examiner and
examinee and their ability to interact effectively in a nonverbal manner. Such tests continue to
require that the examiner clearly and correctly convey the nature of a given task and its expected
response to the examinee, and continue to require that the examinee comprehend that
communication and accurately reconvey an acceptable response to the examiner.
To summarize, practitioners must understand several important issues that remain problematic in
the use of tests with diverse individuals. First, a bilingual individual is not two monolingual
individuals in the same head (Bialystok, 1991) and therefore evaluation in one or the other language
is not necessarily a valid approach to measuring the abilities of bilinguals. Second, bilingual
individuals do not suddenly cease to be bilingual simply because they are or have become dominant
in one or the other language. Once a bilingual, always a bilingual. Third, culture is not the same as
race or ethnicity or country of origin. What makes an individual different is not a function of where they come from or what color their skin may be; it is the degree to which they are or are not familiar
with the prevailing cultural norms reflected in the test being administered to them.

CAUTION

Bilingualism vs.
Monolingualism


Individuals who are bilingual are not simply two monolinguals in one head. Becoming and being
bilingual carries with it important experiences that are very different from those that accompany
monolingual experience. In short, the two are not the same and cannot be treated as if they are.
Remember, once a bilingual, always a bilingual.
CAUTION

Bilingualism and Language Dominance


The language in which to conduct testing is often determined erroneously by the concept of
dominance. Language dominance indicates only which of a bilingual individual's two languages is better developed. It does not, however, indicate that the person has age-appropriate
proficiency in the dominant language. An individual may be underdeveloped in both, even if he
or she is more proficient in one or the other. Thus, when a bilingual individual is dominant in
English, this does not indicate that testing may be conducted validly in English. In short, an
individual who is bilingual doesn't suddenly cease to be bilingual simply because he or she has
become English dominant.
The vast array of complex variables involved in the assessment of diverse individuals can give
even the most experienced practitioner considerable pause. Completely culture-free or truly equitable
evaluation seems a rather lofty and unattainable goal. Perhaps a more pragmatic approach to the
assessment of diverse individuals lies not in attempts to eliminate all bias or find unbiased tests
(which is unlikely and impractical), but rather in efforts to reduce bias in their use to the maximum
extent possible while maintaining as much accuracy as possible in construct measurement. The C-LTC
and C-LIM are designed with this philosophy in mind. In the final analysis, there is no such thing as a
completely nondiscriminatory or unbiased assessment. However, use of XBA along with the C-LTC
and C-LIM provides a systematic and defensible method for greatly reducing the discriminatory
aspects inherent in the use of cognitive ability tests with diverse individuals.

DON'T FORGET

A Realistic Approach
to Assessing
Diverse Individuals


A more realistic approach to the assessment of diverse individuals lies not in attempts to
eliminate all bias (which is unlikely and impractical), but rather in efforts to reduce bias in test
use to the maximum extent possible while maintaining as much accuracy as possible in construct
measurement.
THE CULTURE-LANGUAGE TEST CLASSIFICATIONS AND CULTURE-
LANGUAGE INTERPRETIVE MATRIX

The preceding discussion emphasized the need for practitioners who seek to assess individuals who
are culturally or linguistically diverse using standardized, norm-referenced instruments to remain
well aware of four essential points. (These points are also enumerated in Rapid Reference 5.2.) In
general terms, these points include recognition that: (a) all tests are culturally loaded and reflect the
values, beliefs, and knowledge that are deemed important within the culture in which the tests were
developed (e.g., U.S. mainstream culture); (b) all tests require some form of language (or
communication) on the part of both examiner and examinee, and such factors can affect
administration, comprehension, and performance on virtually any test (including nonverbal ones,
albeit to a lesser extent than on verbal ones); (c) tests vary significantly on both dimensions (the
degree to which they are culturally loaded and require language); and (d) interpretation of results
from standardized tests using existing norm groups for performance comparisons may be invalid for
diverse individuals.

Rapid Reference 5.2



Summary of Points to Consider in Assessing Culturally and Linguistically Diverse
Individuals
Practitioners who need to assess individuals who are culturally or linguistically diverse using
standardized, norm-referenced instruments should remain well aware of four essential points:

1. All tests are culturally loaded and reflect the values, beliefs, and knowledge that are
deemed important within the culture in which the tests were developed (for example, U.S. mainstream culture).
2. All tests require some form of language (or communication) on the part of both the
examiner and the examinee, and such factors can affect administration, comprehension,
and performance on virtually any test, including nonverbal ones.
3. Tests vary significantly on both dimensions: the degree to which they are culturally
loaded and the degree to which they require language.
4. Interpretation of results from standardized tests using existing norm groups for
performance comparisons may be invalid for diverse individuals.


For practitioners engaged in applied work settings, the standardized, norm-referenced instrument
represents one of the most important and valuable tools in the assessment repertoire. When such
instruments are used with diverse individuals, practitioners need to be aware of how cultural and
linguistic factors may affect both the results and the subsequent interpretations they may make. The
manner in which cultural or linguistic bias may operate in these cases has been discussed briefly in
the previous sections. The implications of such bias have been formally operationalized in the
development of the C-LTC and C-LIM, providing a rather new frame of reference from which to
understand and interpret performance on tests of intelligence and cognitive ability that may be of
significant benefit and utility to practitioners.
In 1990, Figueroa recommended the application of defensible theoretical frameworks in the
assessment of culturally and linguistically diverse individuals and admonished practitioners to pay
particular attention to cultural and linguistic dimensions. The C-LTC and C-LIM represent an
approach wholly in line with these recommendations. The methods described herein are also
consistent with the propositions for testing bilinguals contained in Chapter 13 of the Standards for
Educational and Psychological Testing (AERA, APA, & NCME, 1999), especially the notions that
idiosyncratic variations in cultural and linguistic background can lower test performance. Similarly,
these methods, as well as XBA, remain in full accordance with APA's Guidelines for Providers of
Psychological Services to Ethnic, Linguistic, and Culturally Diverse Populations (1990).
The Culture-Language Test Classifications (C-LTC)

The test classifications based on CHC theory presented in Chapter 2 (and contained in Appendix B)
served as the foundation for extending XBA with diverse individuals. In addition to their theory-based
classifications, these tests can be classified according to inherent cultural and linguistic dimensions.
The method for classification was threefold. First, existing data were gathered and reviewed in order
to understand the nature and extent of the attenuating effect of cultural and linguistic differences.
Research and data ranging from the very roots of the psychometric approach to the present were
reviewed and produced a rather strong consensus that bilinguals tended to perform about one
standard deviation below the mean of monolinguals (Cummins, 1984; Goddard, 1917; Jensen, 1974,
1976; Mercer, 1979; Sanchez, 1934; Valdés & Figueroa, 1994; Vukovich & Figueroa, 1982). Second,
data from many of these studies included mean scores for bilingual individuals on various tests, most
commonly the Wechsler batteries. By aligning the tests in terms of mean differences, as compared to
monolingual individuals, tests could be arranged in terms of the degree that performance was
attenuated by cultural and linguistic differences. And third, because of the lack of research on
bilinguals with many of the batteries currently in use, an expert consensus procedure was utilized to
provide a logical basis for classifications of tests for which no existing data were available. In
addition, current research efforts examining the cultural loading and linguistic demands of tests are providing empirical support suggesting that the tests are appropriately classified (Nieves-Brull, Ortiz,
Flanagan, & Chaplin, 2006).
Ultimately, the structure of the classifications is based on two specific test characteristics: (a)
degree of cultural loading; and (b) degree of linguistic demand. Note that the classifications have
nothing to do with what the tests are designed to measure. That is, what construct they measure plays
no part in these classifications. Rather, it is only the degree of cultural loading and linguistic demand
that drive which tests are grouped together and which are not. The purpose of this initial classification
is to provide a different framework for evaluating test performance. Because test results cannot be
interpreted if they are not valid, it is first necessary to establish that validity. As noted previously, the
cultural loading and linguistic demands of tests tend to inhibit performance of diverse individuals. In
such cases, validity is compromised when the constructs that are actually measured are level of
acculturation and English-language proficiency rather than the actual abilities the test is designed to
measure. In other words, validity is compromised because some unintended constructs, not the
constructs of interest, have been measured. Once validity is established, tests may be presumed to be
measures of the intended constructs and, as such, the CHC broad and narrow ability classifications are
retained and appended at the end of the test name. In addition, the test names are also printed in the
manner that conveys the essential information necessary for appropriate test selection in accordance
with XBA guiding principles (i.e., broad ability measured, narrow ability measured, strong vs.
moderate vs. logical classification; see Chapter 1).
Cultural Loading Classification

The first principal dimension along which tests are organized relates to Degree of Cultural Loading,
which represents the degree to which a given test requires specific knowledge of or experience with
mainstream U.S. culture. In this regard tests were classified in terms of several characteristics,
including emphasis on process, content, and nature of response. Specifically, tests were categorized
along dimensions that related to process or product (process-dominant versus product-dominant) and
stimuli (use of abstract or novel stimuli versus use of culture-specific stimuli), although attention
was also given to aspects of the communicative relationship between examinee and examiner (i.e.,
culturally specific elements apart from actual oral language, such as affirmative head nods, pointing,
etc.; see McCallum & Bracken, 1997). These characteristics are in accordance with the findings of
various researchers (e.g., Jensen, 1974; Valdés & Figueroa, 1994) who suggest that tests that are more
process oriented and that contain more novel, culture-reduced stimuli and communicative
requirements might yield scores that are fairer estimates of ability or skill, since they would be less
subject to attenuating influences from an individual's level of exposure to mainstream culture.
Classification of tests utilizes a simple, three-category system (high, moderate, and low), which
reflects the fact that the nature of these dimensions is better represented by a continuum than by a
dichotomy.
Linguistic Demand Classification

As discussed previously, test performance can also be adversely affected on the basis of an
individual's language proficiency. In short, those who are not fully English proficient (i.e., those
individuals whose language-development skills do not meet the age-appropriate expectations built
into tests) may score lower on a wide variety of tests, not because of lower ability but because of
linguistic barriers that impede comprehension and communication. Thus, there is a need for
practitioners to understand the inherent language demands placed on an individual as a function of
any given test selected for administration. Therefore, we sought to classify tests also on the basis of
Degree of Linguistic Demand.
Three main factors were considered in the classification of tests along this dimension, including
verbal versus nonverbal language requirements on the part of the examiner (in administration of the
test), receptive-language requirements on the part of the examinee, and expressive-language
requirements on the part of the examinee. These distinctions are important because all three relate to
issues of language proficiency on the part of the examinee and all three bear directly upon an
individuals performance on such tests. With respect to the language requirements on the part of the
examiner, it is important to note that some tests have lengthy, verbose instructions (e.g., WJ III
Analysis-Synthesis), including some that are commonly accepted to be relatively nonverbal (e.g.,
Wechsler Block Design). On the opposite end of the spectrum are tests that require virtually no
written or oral language on the part of the examiner (albeit effective communication is still required
as explained before) and that can be given using simple gestures or through pictorial instruction
cards (e.g., those on the UNIT or WNV).
There are other tests in which an individual's actual language proficiency becomes central to
performance, as with the Wechsler Vocabulary and Similarities tests. Such tests place significant
demands on language development and rely heavily on the assumption that an individual's experience
with and exposure to the language is comparable to that of age- or grade-related peers. The linguistic
demands operate in both the receptive and expressive realms for the examinee. For example, some
tests utilize linguistic conventions that are part of the necessary structure for an individual's response (e.g., WJ III Concept Formation: "round or yellow or red"); some are simply lengthy, requiring significant receptive-language ability in order to comprehend the administrator's instructions fully. In
addition, certain tests require the individual to rely directly on expressive-language skills in order to
provide an appropriate or correct response (e.g., Wechsler Vocabulary and Comprehension), whereas
some tests need no actual spoken response (e.g., KABC-II Triangles). Similar to the structure for
classifications based on degree of cultural loading, tests are organized according to a system that uses
high, moderate, and low categories, again emphasizing the continuous nature of these variables.
CHC Test-Specific Culture-Language Matrices

Originally, the Culture-Language Test Classifications were provided in a single table that contained
nine cells, three categories each for the two dimensions (Flanagan et al., 2000; Flanagan & Ortiz,
2001; Ortiz, 2001; Ortiz & Flanagan, 1998). This was done partly to provide a quick collection of
tests that practitioners could select from when seeking to choose the fairest tests for a given assessment. That is, by use of the table, practitioners could pick those tests that measured the construct of interest (e.g., Gsm-MW) and that had the lowest cultural loading and linguistic demands. Presumably, this would result in test scores that would be fairer than those obtained through use
of other tests of the same construct but that were more culturally loaded or linguistically demanding.
The table, however, proved to be rather unwieldy because of its numerous entries and substantial size, and we quickly realized that the tests classified as low on both dimensions included a relatively narrow range of abilities, mostly Gv, Gf, and Gs. Thus, other abilities could not be measured through tests with low cultural loadings or linguistic demands, so use of the table for this purpose was rather
limited. On the basis of this issue and because the table itself remained rather daunting, we have
instead opted to provide a more practical method for test selection as well as a better way of
understanding the classifications and subsequent interpretation.
Rather than providing a single table, we have developed instead test-specific culture-language
matrices, because practitioners may be interested in seeing what the relative classifications of all the
tests in a single test battery might be or how the collected tests from one battery compare to another
battery. This information is provided in Appendix D, which contains a figure for the culture-language
classifications for each of the major cognitive batteries and many special-purpose and speech-
language tests as well. An example of one of the matrices contained in Appendix D is illustrated in
Figure 5.3, which contains the classifications for the WISC-IV. It is important to note that although
tests that were classified either empirically or logically as mixed measures of abilities were included
in these figures, in accordance with XBA principles, caution should be used when they are administered because, even if validity is established, such measures remain psychologically ambiguous and difficult to interpret.

Figure 5.3 Test-Specific Culture-Language Matrix for the WISC-IV Subtests
* These tests demonstrate mixed loadings on the two separate factors indicated.

Although Figure 5.3 is specific to the tests from the WISC-IV, it nevertheless provides a good
example of the structure of the test-specific matrices in general. Each contains the respective tests
from a single battery arranged in a simple 3 × 3 matrix that can be viewed all at once and provides the essential information regarding classification according to degree of linguistic demand and degree of cultural loading, as well as the CHC broad and narrow ability classifications. These test-specific
matrices provide an easy-to-use, graphical representation of how the tests that comprise a particular
battery are arranged when classified according to these dimensions. The three categories (low,
moderate, and high) for degree of linguistic demand span across the matrix from left to right and the
similar categories for degree of cultural loading run down the matrix from top to bottom.
As is evident upon examination of Figure 5.3, these test-specific matrices offer practitioners an
efficient means for sorting through the test batteries with which they are familiar or use routinely. In
general, knowledge of the cultural loading and linguistic demands of tests and of their CHC broad
and narrow ability loadings allows practitioners to select tests that are most appropriate to the referral
questions while giving due consideration to the experiential factors unique to the individual. Thus,
practitioners reap the same benefit as before with respect to test selection without having to wade
through a large table. In addition, because the classifications provide relative positions of the tests (that is, distinctions between tests are made on the basis of cultural-loading and linguistic-demand differences), results from testing can be interpreted within the context of expected patterns for diverse
individuals (this concept will be discussed in more detail in the next section). This is not to say,
however, that the information contained in these matrices should be relied upon solely for decisions
related to test selection and interpretation. As noted previously, use of these matrices is only one part
of a larger, comprehensive process of nondiscriminatory assessment. The classifications contained in
the matrices do not establish (and are not intended to establish) a comprehensive basis for the
assessment of diverse individuals. The information provided in the matrices is meant primarily to
supplement the assessment process in both the diagnostic and interpretive arenas within the context of
a broader, defensible system of multilingual, nondiscriminatory, cross-cultural assessment (see
Figure 5.1). Their limitations notwithstanding, these classifications offer practitioners a viable,
systematic, and defensible method by which certain important decisions regarding culturally fair
assessment can be made. As will be discussed, when used in conjunction with other relevant
assessment information (e.g., referral issues, direct observations, review of records, interviews,
language-proficiency testing, socioeconomic status, developmental data, family history), these
classifications may well prove to be of significant practical value in decreasing bias related to the
selection and interpretation of tests.
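For readers who prefer to automate the bookkeeping involved in this kind of test selection, the following Python sketch illustrates the general idea. It is not part of the XBA software; the battery entries, field names, and classifications shown are hypothetical placeholders, and in practice the broad ability and culture-language classifications would come from the matrices in Appendix D.

# Illustrative sketch of using the culture-language classifications for test
# selection. The entries below are hypothetical placeholders, not actual
# classifications from Appendix D.

from dataclasses import dataclass

LEVELS = {"low": 0, "moderate": 1, "high": 2}

@dataclass
class TestEntry:
    name: str
    broad_ability: str      # CHC broad ability code, e.g., "Gsm"
    culture_loading: str    # "low", "moderate", or "high"
    linguistic_demand: str  # "low", "moderate", or "high"

def fairest_tests(battery: list[TestEntry], broad_ability: str) -> list[TestEntry]:
    """Return tests measuring the construct of interest, ordered from the least
    to the most culturally loaded and linguistically demanding."""
    candidates = [t for t in battery if t.broad_ability == broad_ability]
    return sorted(candidates,
                  key=lambda t: LEVELS[t.culture_loading] + LEVELS[t.linguistic_demand])

# Hypothetical battery entries used only to show the lookup:
battery = [
    TestEntry("Subtest A", "Gsm", "low", "moderate"),
    TestEntry("Subtest B", "Gsm", "high", "high"),
    TestEntry("Subtest C", "Gv", "low", "low"),
]
print([t.name for t in fairest_tests(battery, "Gsm")])  # -> ['Subtest A', 'Subtest B']

A lookup of this kind only reproduces the convenience of the printed matrices; the broader nondiscriminatory assessment process described in the chapter still governs how the selected tests are used.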
The Culture-Language Interpretive Matrix (C-LIM)

The use of tests with diverse individuals gains validity only when their application rests on the proper
and systematic consideration of the relevant cultural and linguistic characteristics that influence
performance. This is more easily said than done, and the complexity of the factors involved often results in conclusions and opinions that are not based on rational or systematic patterns of results.
Although the proper interpretations may be evident from the collected data, unless a practitioner is
very well trained and experienced it is often quite difficult to tease apart the many interrelated
variables and come to conclusions that are indeed supported by the evidence. With respect to issues of
bias related to test interpretation, the basic goal is to reframe the manner in which data are typically
evaluated so that the potential attenuating effects of cultural and linguistic factors are much clearer and more evident. The basic question to be addressed in the evaluation of diverse individuals boils
down to whether the obtained results reflect cultural or linguistic differences or whether they indicate
the presence of some type of disability. This difference versus disorder question is the very reason for
the development of the C-LIM.
It has already been discussed that culture (or level of acculturation) and language (or language
proficiency) operate as attenuating variables; that is, the greater the difference between an individual's cultural or linguistic background and the cultural or linguistic background of the individuals comprising the norm group, the more likely the test will measure lower performance as a function of this experiential difference rather than as a result of actual lower ability. Validity is therefore compromised and the data cannot be interpreted. When an individual's background is
commensurate with the background of the individuals comprising the norm group, differences in
performance can be more reliably interpreted as the result of true differences in ability. Therefore,
we know that, in general, cultural and linguistic differences serve to artificially depress the scores of
diverse individuals. The more different the individual is, the greater the score is attenuated.

DON'T FORGET

Performance Measurement
and Experiential Difference


Culture (or level of acculturation) and language (or language proficiency) operate as attenuating
variables; that is, the greater the difference between an individual's cultural or linguistic
background and the cultural or linguistic background of the individuals comprising the norm
group, the more likely the test will measure lower performance as a function of this experiential
difference as opposed to being due to actual lower ability.
If we draw upon an understanding of this relationship and combine it with a blank matrix similar to
the ones used for the test-specific matrices, we can create a pattern within the matrix that describes the
logical and expected pattern of performance for diverse individuals. Figure 5.4 provides this
illustration. There are two small shaded arrows and one large shaded arrow depicted in the figure.
The smaller arrow at the top, pointing from left to right, represents the increasing effect that language
differences are likely to have on test performance as a function of the increasing linguistic demands
of the tests. When practitioners use tests that have relatively heavy language demands (i.e., classified
in the high linguistic demand cells), performance of diverse individuals is likely to be adversely
affected to a relatively large degree. When the tests are more language-reduced (i.e., classified in the
low linguistic demand cells), then performance is likely to be relatively less adversely affected.
The small arrow on the left side pointing from top to bottom represents the increasing effect that
cultural differences are likely to have on test performance as a function of the increasing cultural
loadings of the tests. When practitioners use tests that are classified as being culturally loaded to a
relatively high degree (i.e., classified in the high cultural loading cells), performance of diverse
individuals is also likely to be adversely affected to a relatively large degree. When the tests used are
more culturally reduced (i.e., classified in the low cultural loading cells), then performance is
likely to be relatively less adversely affected. The large arrow in Figure 5.4 shows the overall or
combined effect that cultural and language differences have on performance across tests categorized
along the dimensions of cultural loading and linguistic demand. Generally speaking, performance of
diverse individuals on standardized tests is least likely to be affected by tests that are classified more
to the left and top of the matrix and most likely to be affected by tests that are classified closer to the
right and bottom of the matrix. The large arrow pointing diagonally from the upper-left cell to the
lower-right cell represents the decline in scores expected as a function of degree of cultural loading
and linguistic demand.

Figure 5.4 Pattern of Expected Test Performance for Diverse Individuals


It should be noted that although the alignment of the two dimensions in the matrix (cultural loading
and linguistic demand) is orthogonal, this does not imply that they are uncorrelated. Indeed, it is
unlikely that one would find patterns of performance that are influenced by either one alone. The fact
is, these variables are so highly related that it is the combined effect of the two that creates the pattern
of decline. The large arrow is intended to represent this primary and overarching effect and forms the
primary basis for interpretation. The arrangement of cultural loading and linguistic demand in the
matrix simply provides a graphical representation from the perspective of the tests' cultural and
linguistic characteristics rather than from the perspective of constructs measured.

DON'T FORGET

Cell Averages from the C-LIM Do Not
Represent Ability Constructs


When calculating the Cell Averages using the C-LIM, practitioners must be careful not to ascribe
meaning to the scores in terms of representing a particular ability construct. The Cell Averages
merely represent aggregate performances on tests that share similar characteristics in terms of
cultural loading and linguistic demand. The aggregate is not derived on the basis of tests that
purportedly measure the same theoretical ability and thus the Cell Averages should not be
interpreted independently but only relative to each other in terms of the pattern formed in the
matrix.
In order to utilize this alternative frame of reference for test results, practitioners must enter results
from the tests they selected for use in assessment according to their respective cultural and linguistic
dimensions. Figure 5.5 provides a representation of a blank Culture-Language Interpretive Matrix (C-
LIM) that facilitates this process. The names of tests are entered into the appropriate cells that
correspond to the cells where the tests are listed as noted on the test-specific matrices. Obtained
scores are entered in the space provided next to where the test names are written. For the purposes of
interpretation, all scores must be in the same metric. Because of their wide acceptance and familiarity
to professionals, we recommend that all scores be converted to the deviation IQ metric that has a
mean of 100 and a standard deviation of 15. A standard score and percentile rank conversion table is
available in Appendix E for this purpose. Alternatively, these values can be taken directly from the
automated XBA DMIA. For each set of tests grouped together according to their degree of cultural
loading and degree of linguistic demand, an overall mean or average score can be calculated. This
score is called the Cell Average and is used directly in the interpretive process.
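The arithmetic behind a Cell Average can be illustrated with a brief sketch. The book itself directs readers to the conversion table in Appendix E or to the automated XBA DMIA; the linear conversion below assumes the familiar Wechsler scaled-score metric (mean of 10, standard deviation of 3), and the function names and sample scores are purely illustrative.

# Illustrative sketch (not part of the XBA software): converting subtest scores
# to the deviation IQ metric (mean = 100, SD = 15) and averaging them into a
# Cell Average. The scaled-score conversion assumes a mean of 10 and an SD of 3.

def scaled_to_deviation_iq(scaled_score: float) -> int:
    """Convert a scaled score (mean 10, SD 3) to a deviation IQ (mean 100, SD 15)."""
    z = (scaled_score - 10) / 3          # distance from the mean in SD units
    return round(100 + 15 * z)           # re-expressed on the deviation IQ metric

def cell_average(deviation_iqs: list[float]) -> int:
    """Aggregate the converted scores for all tests falling in one C-LIM cell."""
    return round(sum(deviation_iqs) / len(deviation_iqs))

# Hypothetical scores for two subtests classified in the same cell:
converted = [scaled_to_deviation_iq(s) for s in (9, 11)]   # -> [95, 105]
print(cell_average(converted))                              # -> 100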
Note also that the Cell Average score itself has no meaning beyond an arithmetical representation
of aggregate performance on a set of tests grouped together according to characteristics largely
unrelated to the intended construct of measurement. Beyond this, there is no inherent meaning or
implied construct for this score and it should not be interpreted as such. Rather, the Cell Averages are
meant to offer only an easy way for practitioners to evaluate the pattern of scores revealed by the data
when viewed from this perspective.

Figure 5.5 Culture-Language Interpretive Matrix

Once all of the information has been entered on this worksheet, practitioners can begin the process
of comparative evaluation and interpretation of performance. (This process is summarized in Rapid
Reference 5.3.) Practitioners should begin by examining the Cell Averages as recorded on the C-LIM.
In general, practitioners should first determine if the highest aggregate score (Cell Average) is in the
upper-left cell as predicted. Second, practitioners should determine if the lowest score is in the lower-
right cell as predicted. Third, practitioners should determine if the rest of the scores are interspersed
between these endpoints and if they also show a decline in performance from the upper left to the
lower right. What is important to note is not the normative values of the scores (i.e., average, low
average), but rather the relationships between the scores and the degree to which they form a pattern
that is either consistent or inconsistent with the pattern of performance predicted by the matrix (as
specified in Figure 5.4). In general, when the overall pattern of scores obtained for a given individual
approximates the general predicted pattern (declining in relative value from upper left to lower
right), practitioners should interpret such results as being a reflection of cultural or linguistic
differences rather than of true ability.
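The guiding questions that are summarized in Rapid Reference 5.3 can also be expressed as a simple check, sketched below for illustration only. The representation of Cell Averages as a dictionary keyed by cultural-loading and linguistic-demand levels, and the use of the sum of the two level indices as a rough index of diagonal distance, are assumptions made for this sketch; they are not part of the C-LIM software and cannot replace consideration of the individual's unique cultural, linguistic, and educational history.

# A minimal sketch of the pattern check described above; not the published C-LIM.
# Cell Averages are assumed to be stored in a dict keyed by
# (cultural_loading, linguistic_demand), with each level coded as "low",
# "moderate", or "high"; cells with no tests are simply omitted.

from statistics import mean

LEVELS = {"low": 0, "moderate": 1, "high": 2}

def follows_expected_decline(cell_averages: dict[tuple[str, str], float]) -> bool:
    """Check the guiding questions: highest score in the low/low cell, lowest
    score in the high/high cell, and a relative decline along the diagonal."""
    if not cell_averages:
        return False
    if max(cell_averages, key=cell_averages.get) != ("low", "low"):
        return False                      # highest Cell Average is not upper left
    if min(cell_averages, key=cell_averages.get) != ("high", "high"):
        return False                      # lowest Cell Average is not lower right
    # Group cells by rough distance from the upper-left corner and require the
    # group means to decline as that distance increases.
    groups: dict[int, list[float]] = {}
    for (culture, language), score in cell_averages.items():
        groups.setdefault(LEVELS[culture] + LEVELS[language], []).append(score)
    means = [mean(groups[d]) for d in sorted(groups)]
    return all(earlier >= later for earlier, later in zip(means, means[1:]))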

Rapid Reference 5.3



Summary of Steps to Use in Determining the Pattern of Scores Derived from the C-LIM
In general, practitioners should look for a declining pattern of scores (Cell Averages) within the context of the individual's unique cultural, linguistic, and educational history. The following
guiding questions may be used to assist in this determination:

1. Is the highest Cell Average in the uppermost left-hand corner (the Low/Low cell
classification)?
2. Is the lowest Cell Average in the lowermost right-hand corner (the High/High cell
classification)?
3. Do the remaining Cell Averages fall between the highest and lowest scores and follow a
relative decline in value from the upper-left cells to the lower-right cells?
4. If the answer to all questions is yes, then it is very likely that the test results are invalid
and reflect lack of acculturation and limited English proficiency more so than true ability.
If the answer to any question is no, then the data may be valid and uncompromised by
cultural or linguistic factors and can be used, in conjunction with other converging data,
to support hypotheses regarding the presence of a disability.


Although Figure 5.4 appears to suggest that there are three general patterns that may emerge, it has
been noted that it is only the overall effect of culture and language that is of central importance. When
it can be determined that scores in or near the upper-left corner of the matrix are higher than scores at
or near the bottom-right corner of the matrix, this pattern suggests that test results were influenced
primarily by level of acculturation and limited English proficiency rather than by actual ability. In
such cases, the validity of the obtained data cannot be established and no further interpretation of the
data should be conducted. The question regarding difference versus disorder has been answered and
there is no need or reason to seek further answers from the data. However, when the pattern that
emerges from the data is not consistent with the expected general decline in scores for diverse
individuals, practitioners may assume that cultural and linguistic factors did not play a primary role
in affecting test results. That is, although level of acculturation and limited English proficiency may
be contributing to the pattern of scores, the lack of a clear declining pattern indicates that they are not
the major or primary influences. As such, validity of the scores has not been compromised and
practitioners may then return confidently to their original results and interpret the test scores in
accordance with XBA principles or as directed by publishers of the test.
This is not to say that results from this analysis are fully valid or that they automatically suggest a
cognitive deficit simply because no significant or preeminent effect of culture and language
difference was found. Although this is certainly one possibility, many other factors may also lead to
the lack of a clear declining pattern that have nothing to do with disability. For example, lack of
motivation, fatigue, incorrect scoring or administration, and emotional difficulties are but a few of
the potential factors that can lead to score patterns in the matrix that do not show a clear decline. Thus,
interpretations of dysfunction or disability should remain bolstered by a wide range and multiple
sources of evidence that converge and support any opinion or inference.
Although the C-LTC and C-LIM are not specifically designed to be diagnostic tools, the question
regarding difference versus disorder addressed by their use helps in diagnosis and decisions
regarding disabilities. By having a method to systematically determine the influence of acculturation
and language proficiency on test performance, practitioners can exclude these variables in
accordance with the mandates of existing ethical and legal prescriptions. In doing so, and when
supported by additional converging evidence, the lack of a declining pattern in the C-LIM allows
practitioners to entertain notions of learning disability. Although limited, emerging research suggests
that the C-LIM is able to distinguish between culturally and linguistically diverse individuals with and
without learning disabilities (Esparza Brown, 2005) and thus has some diagnostic utility for
practitioners.

DON'T FORGET

Nonattenuated Patterns
Do Not Automatically
Imply Disability


When using the C-LIM, if a declining pattern of scores is not found, this does not automatically
mean that a disability must be present. Other noncognitive factors may influence test results in
ways that disrupt the typical culture-language attenuation. Such factors can include but are not
limited to lack of motivation, fatigue, incorrect test administration, incorrect scoring, anxiety, or
emotional difficulties.
Examples of C-LIM Interpretation

As described in the preceding section, basic interpretation of the C-LIM consists of examining test
data in order to determine whether the expected pattern of declining scores exists in the results for a
given individual. In essence, the C-LIM relies on a null hypothesis that states that there should be no
systematic variation in scores as a function primarily of cultural loading and linguistic demand. That
is, if these variables are not present and did not act upon the test results in the attenuating manner
predicted for diverse individuals, then the resulting pattern would be random, predicated mostly on
the nature of the individual's idiosyncratic constellation of abilities. The alternative hypothesis states
that if the variables of acculturation and English-language proficiency are present, they will be
revealed in a systematic attenuation of the scores as a function of their operationalization in the
matrix as degree of cultural loading and degree of linguistic demand. The alternative hypothesis is
supported only when the pattern of scores obtained from testing with a diverse individual follows a
systematic decline diagonally across the matrix from the upper-left cell to the lower right. This
premise has been supported historically in the literature, as noted, and current research that specifically tested various patterns of performance among both monolingual and bilingual individuals provides additional evidence that a systematic attenuation of scores occurs only for diverse individuals (Nieves-Brull et al., 2006).
Let us consider an example of the manner in which the C-LIM may be applied in practice. Table 5.1
contains sample data from the administration of the standard and some of the supplemental tests from
the WISC-IV for a hypothetical, culturally and linguistically diverse individual named Yuquita.
Examining scores in the context of a tabular format clearly does not lend itself well to
nondiscriminatory interpretation. By converting the scaled scores from the subtests, with help from
the conversion table contained in Appendix E, the data can be entered into a blank C-LIM as illustrated
in Figure 5.6. The individual subtest scores are then averaged to produce a Cell Average, which can
be found in the gray-shaded box in the lower-right corner of each cell. In cases in which only one
subtest is listed, the score for that subtest serves as the Cell Average. Where no subtests are listed, no
scores are entered.

Table 5.1 WISC-IV Data for Yuquita

The interpretive process begins by determining the location of the highest Cell Average. It is
predicted that it should be found in the cell containing the tests with the lowest cultural loadings and
linguistic demandsthat is, the uppermost left-hand cell in the matrix. A review of the C-LIM
indicates that this is precisely where the highest Cell Average (98) is located. Next, in similar fashion,
determination of the location of the lowest Cell Average should be conducted. It is predicted that it
should be found in the cell containing the tests with the highest cultural loadings and linguistic
demandsthat is, in the lowermost right-hand cell in the matrix. Another review of the C-LIM for
Yuquita indicates that, once again, this is exactly where the lowest Cell Average (78) is found. The
final step in interpreting the data involves examination of the Cell Averages for the other cells in the
matrix that fall between these two endpoints. Because the decline should proceed diagonally, Cell
Averages that are closer to the upper-left cell should be higher than those closer to the lower-right
cell. In other words, there should be an observable decline in scores as one examines the cells in a
diagonal manner from left to right. In this example, Yuquita's intervening scores range from 94 to 90
to 88 and follow the expected pattern of decline outlined by the highest and lowest Cell Averages.
This pattern supports the alternative hypothesis, which suggests that acculturation and English-language proficiency were systematic and dominant influences on the obtained test
scores. As such, the primary influence of cultural and linguistic issues cannot be excluded as
attenuating factors in testing and the results should be considered invalid and interpreted no further.
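As a purely illustrative follow-up to the pattern-check sketch presented earlier in this chapter, Yuquita's Cell Averages can be entered in the same dictionary form. Only the values reported in the text are used (98 in the low/low cell, 78 in the high/high cell, and intervening values of 94, 90, and 88); the exact cells occupied by the intervening values appear in Figure 5.6, and the placement below is assumed solely for illustration.

# Hypothetical usage of the follows_expected_decline sketch with the Cell
# Averages reported for Yuquita; the off-diagonal placement is assumed.
yuquita = {
    ("low", "low"): 98,
    ("low", "moderate"): 94,
    ("moderate", "moderate"): 90,
    ("moderate", "high"): 88,
    ("high", "high"): 78,
}
print(follows_expected_decline(yuquita))   # -> True, consistent with the interpretation above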
Let us turn to another example using the WISC-IV. This time, sample data are presented in Table 5.2
for another hypothetical diverse individual named Benayala. Once again, a review of Benayala's
scores in tabular format does little to assist in determining whether his scores were influenced more
by cultural and linguistic factors than actual ability. Conversion of his scores to the deviation IQ
metric and their placement within a blank C-LIM begins the process of evaluation for these results.
Figure 5.7 provides an illustration of the C-LIM for Benayala's scores.

Figure 5.6 Culture-Language Interpretive Matrix of WISC-IV Data for Yuquita (English)


Table 5.2 WISC-IV Data for Benayala


As described in the case of Yuquita, the initial step in evaluating the data is to look for the location of the highest Cell Average. This score would typically be located in the uppermost left-hand corner of the matrix and, once again, this is precisely where it is located: in the cell that contains the tests with the lowest cultural loadings and linguistic demands. Next, the location of the lowest Cell Average is examined in accordance with the prediction that it should be in the cell that contains tests with the highest degree of cultural loading and linguistic demand. This location is typically the lowermost right-hand cell; however, unlike the pattern for Yuquita, Benayala's lowest Cell Average is not found there but is instead located in the uppermost right-hand cell, the cell that contains tests with a high degree of linguistic demand but a low degree of cultural loading. This finding does not fall in line
with the predicted declining pattern that is characteristic of diverse individuals when cultural and
linguistic factors are operating in dominant fashion on test performance. Further examination of the
intervening Cell Averages indicates that instead of showing a gradual and systematic decline in value
(as was the case with Yuquita), no such pattern is revealed in Benayala's scores as they range from 86
to 90, then 80 to 70, and finally to 82. In this case there is no clear, systematic pattern of score
attenuation that can be determined. As such, this pattern aligns itself with the null hypothesis, which suggests that acculturation and English-language proficiency were not systematic or dominant influences on the obtained test scores. Thus, the influence of cultural and
linguistic issues, though still possibly present, can be excluded as the primary reasons for the pattern
of obtained test results. Having excluded cultural and linguistic issues as primary factors in testing
effectively renders the test results valid (assuming other noncognitive factors have also been ruled
out, e.g., lack of motivation, fatigue, incorrect scoring). The test results can now be interpreted in
accordance with XBA principles or guidelines specified by the test publishers with confidence that
they reflect measures of actual ability more so than cultural or linguistic differences. And, if
supported by other converging data, the results may well provide additional support for the presence
of a disability.

Figure 5.7 Culture-Language Interpretive Matrix of WISC-IV Data for Benayala (English)

The Automated Culture-Language Interpretive Matrix (C-LIM)

To facilitate the process of evaluating the effect of acculturation and language proficiency on test
performance, an automated version of the C-LIM has been developed, which is included on the
companion CD-ROM available with this book. The file, C-LIM, is a spreadsheet with several tabs,
the last of which mirrors the hard copy of the General C-LIM depicted in Figure 5.5. The basic look
of the general tab of the C-LIM is illustrated in Figure 5.8. The additional tabs mirror some of the
major test-specific matrices contained in Appendix D (i.e., WISC-IV, WJ III, and KABC-II). Figure 5.9
illustrates the screen for the KABC-II that is displayed when the KABC-II tab at the bottom is selected
by the user. Although the matrices are not identical representations of the figures contained in
Appendix D, the basic structure of the matrices remains the same, with the degree of cultural loading
corresponding to the vertical (up-down) direction and the degree of linguistic demand corresponding
to the horizontal (left-right) direction. Rather than having to write in the tests used in the evaluation by
hand, the specific tests may be selected by using the cursor to trigger the drop-down menus in any of
the cells and highlight/select the tests of interest. For the test-specific matrices, the subtests are already
listed as a convenience and additional spaces are provided for the inclusion of any additional subtests
that may have been administered. Once the correct test is selected in a given cell, the score for that test
must be entered manually in the space immediately to the right of the space containing the test name.
Once entered, the score will be dimmed and a converted standard score calculated. For example, if the
score is based on the Scaled Score metric (range 1 to 20), it will be converted by the program into a
deviation IQ score, which will then appear with full intensity adjacent to and to the right of the
original score. Scores that are already based on a mean of 100 and standard deviation of 15 are not
converted, but the program will transfer the score into the correct adjacent space as well. Tests that
use the t-score metric (mean of 50, standard deviation of 10) will need to be converted manually to
the deviation IQ metric (see Appendix E) prior to entering as the program will not convert such
scores at this time. Finally, the program will automatically compute each Cell Average and place it in
the lower-right corner of each cell, which then serve as the indices for evaluating the pattern of
scores.
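For readers who wish to verify these conversions by hand, the following sketch illustrates the arithmetic just described. It is our own illustration, written in Python, and not the actual C-LIM program; the function names are hypothetical. All scores are placed on the deviation IQ metric (mean = 100, standard deviation = 15).

# Illustrative sketch (not the actual C-LIM code) of the score conversions
# described above. All metrics are rescaled to the deviation IQ metric
# (mean = 100, SD = 15).

def scaled_to_iq(scaled_score):
    """Convert a Scaled Score (mean = 10, SD = 3) to a deviation IQ."""
    return 100 + (scaled_score - 10) * (15 / 3)

def t_to_iq(t_score):
    """Convert a T score (mean = 50, SD = 10) to a deviation IQ.
    The C-LIM does not perform this conversion; it must be done by hand."""
    return 100 + (t_score - 50) * (15 / 10)

def cell_average(converted_scores):
    """Average the converted scores entered in a single C-LIM cell."""
    return sum(converted_scores) / len(converted_scores)

# A Scaled Score of 4 corresponds to a deviation IQ of 70, consistent with
# the converted scores discussed later in this chapter.
print(scaled_to_iq(4))          # 70.0
print(t_to_iq(40))              # 85.0
print(cell_average([70, 70]))   # 70.0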

Figure 5.8 The General Tab (Blank) Interpretive Matrix from the C-LIM


Figure 5.9 KABC-II Test-Specific Culture-Language Interpretive Matrix from the C-LIM

Examples of how the C-LIM looks with data entered can be seen in Figures 5.10 and 5.11, in which
the data from the previous examples are once again entered into the worksheet to derive the
corresponding Cell Averages. Figure 5.10 contains the Cell Averages for Yuquita's WISC-IV data and Figure 5.11 contains the Cell Averages for Benayala's WISC-IV data. The major benefit here in using
the C-LIM is found primarily in the automatic calculation of the Cell Averages. The layout of the two
principal dimensions (cultural loading and linguistic demand), the classifications, and test names all
remain the same as in the C-LIM and the patterns can be examined much as before. However, the
automated C-LIM provides an additional graphical device to assist in the determination of the
influence of cultural loading and linguistic demand on test performance not available when using the
manual-entry C-LIM. In addition to automating the data entry and calculations, a graph is produced
that provides a pictorial representation of the data as they change in value from the uppermost left-
hand cell (low culture/low language) diagonally down and across to the lowermost right-hand cell
(high culture/high language). The use of a visual modality in this case may enhance the decision
process for practitioners in determining the nature and extent of the decline in scores and whether the
attenuating pattern typical of diverse individuals is present.

Figure 5.10 Cell Averages from Yuquita's WISC-IV Data as Derived from the C-LIM


Figure 5.11 Cell Averages from Benayala's WISC-IV Data as Derived from the C-LIM

The value of the graph in examining test data with respect to the effects of cultural and linguistic
differences can be seen in Figures 5.12 and 5.13. The first illustration, Figure 5.12, contains a chart of
the Cell Averages from the Yuquita WISC-IV data, initially presented in Figure 5.6 in the C-LIM and
duplicated by the C-LIM in Figure 5.10. Similarly, the Cell Averages derived from the Benayala
WISC-IV data, originally presented in Figures 5.7 (C-LIM) and 5.11 (C-LIM), are now illustrated in
Figure 5.13.
The graph in Figure 5.12 provides a dramatic visual representation of the manner in which
Yuquita's Cell Averages (which are based on the actual obtained scores) follow the pattern of results
that would be predicted for individuals who are culturally and linguistically diverse. The general
pattern is illustrated by the dashed trend line that runs across the graph, and it can be easily discerned that each of Yuquita's Cell Averages touches the line in some manner as it descends from
the upper left to the lower right. The gray shaded area around the trend line corresponds to the range
of scores that may be expected for a typical or moderately different English learner. (See Figure
5.14 later in this chapter.) In contrast, the graph for Benayala's Cell Averages depicted in Figure 5.13
does not follow the expected pattern and three of the averages deviate from the predicted trend line.
Because these Cell Averages for Benayala do not follow the descending pattern that is typical for
individuals who are culturally and linguistically diverse, there is reason to conclude that his actual
scores were not primarily influenced by cultural or linguistic factors. Thus, a disability may well be
present and supported by his data. On the other hand, Yuquita's Cell Averages are very much in line
with the type of attenuation expected as a function of cultural and linguistic differences, and in her
case the conclusion must be that these factors, not true ability, did in fact play the primary role in
determining the test results.
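The decision process that the graph supports can be thought of schematically as checking whether the observed Cell Averages stay within a band around an expected, steadily declining trend. The sketch below, written in Python, illustrates that logic under our own simplifying assumptions; the trend endpoints and tolerance are hypothetical values chosen purely for illustration and are not parameters prescribed by the C-LIM (see Figure 5.14 for the actual guidelines used to set expectations).

# Hypothetical illustration of comparing Cell Averages against an expected
# declining trend. The endpoints and tolerance are assumptions made for this
# sketch only; they are not values published with the C-LIM.

def expected_trend(n_cells, start=95.0, end=80.0):
    """Linearly declining expected values from the low culture/low language
    cell (upper left) to the high culture/high language cell (lower right)."""
    step = (start - end) / (n_cells - 1)
    return [start - i * step for i in range(n_cells)]

def follows_decline(cell_averages, tolerance=5.0):
    """Return True if every Cell Average falls within the tolerance band
    around the expected declining trend, suggesting that cultural and
    linguistic factors, rather than ability, primarily shaped the scores."""
    trend = expected_trend(len(cell_averages))
    return all(abs(observed - expected) <= tolerance
               for observed, expected in zip(cell_averages, trend))

# A systematically declining pattern (consistent with difference, not disability):
print(follows_decline([95, 90, 86, 83, 80]))   # True
# A pattern with isolated low scores that break the trend:
print(follows_decline([95, 70, 88, 70, 82]))   # False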
Caveats When Interpreting the C-LIM

When a declining pattern of performance is not found through use of the C-LIM, practitioners must
not assume that this automatically indicates the presence of a disability. Generally, when the pattern is absent, the most likely explanation will be a learning disability, because the results will vary more as a function of which test scores happen to be low and which constructs those tests are designed to
measure. But score deficiencies that do not follow CHC theory and that do not have logical and
empirical links to manifest academic problems may simply be anomalous and not necessarily
indicative of actual cognitive dysfunction. This is why it is important that practitioners not lose sight
of the basic tenets and principles underlying XBA. If we return to the case of Benayala for a moment,
we can review his scores within the C-LIM (see Figure 5.7 or 5.11) and apply XBA methods to analyze
them. A close inspection of his scores indicates that his poorest performance was on the Digit Span
and Letter-Number Sequencing subtests, in which he obtained Scaled Scores of 4 and 4 respectively
(converted SS = 70 and 70). Examination of the broad and narrow abilities underlying these two tests
indicates that they are both measures of Gsm, specifically Working Memory (MW) and Memory Span
(MS). Thus, these scores belong together theoretically and under XBA principles form a cohesive
broad cluster for this ability. This provides strong support to suggest that Benayala may well have a
deficit in his Short-Term Memory (Gsm) ability and that this could be the root cause of academic
problems. If, in fact, Benayala was referred for evaluation on the basis of teacher observations and
progress monitoring that reflected difficulties with short-term memory, then the test results would
provide strong evidence of a possible learning disability. On the other hand, had the two deficient subtests in this case not measured the same broad ability, or had they measured the same broad and narrow ability (and thus not constituted qualitatively different indicators), additional testing as specified by XBA might be necessary in order to provide the type of theoretically and empirically supported evidence needed to determine the presence of a disability.
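The reasoning applied here to Benayala's scores can also be expressed schematically: do the deficient subtests converge on a single broad CHC ability through qualitatively different narrow abilities? The sketch below, written in Python, is our own illustration and not part of the XBA software; the subtest classifications are those given in the discussion above.

# Illustrative check of whether deficient subtests form a cohesive broad
# ability cluster under XBA principles. Classifications are taken from the
# discussion in the text; the function itself is hypothetical.

SUBTEST_CLASSIFICATIONS = {
    "Digit Span":               {"broad": "Gsm", "narrow": "MS"},  # Memory Span
    "Letter-Number Sequencing": {"broad": "Gsm", "narrow": "MW"},  # Working Memory
}

def cohesive_broad_deficit(deficient_subtests):
    """Return the shared broad ability if all deficient subtests measure the
    same broad ability via at least two different narrow abilities; otherwise
    return None, in which case additional testing would be indicated."""
    broads = {SUBTEST_CLASSIFICATIONS[s]["broad"] for s in deficient_subtests}
    narrows = {SUBTEST_CLASSIFICATIONS[s]["narrow"] for s in deficient_subtests}
    if len(broads) == 1 and len(narrows) >= 2:
        return broads.pop()
    return None

print(cohesive_broad_deficit(["Digit Span", "Letter-Number Sequencing"]))  # Gsm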

Figure 5.12 Graph of Cell Averages from Yuquita's WISC-IV Data as Charted by the C-LIM


Figure 5.13 Graph of Cell Averages from Benayala's WISC-IV Data as Charted by the C-LIM

When a declining pattern of performance is found during application of the C-LIM, practitioners
must recognize that the invalidity of their results indicates that no interpretation can be made and no
direct inferences drawn regarding levels of actual or true ability. There is always a temptation to
ascribe meaning to scores when they are derived and no doubt the effort expended in gathering the
data prompts the tendency to make some use of them. But when level of acculturation and English-
language proficiency are manifest in the pattern of test scores, they cannot be ignored or excluded; they remain confounding influences that eliminate any semblance of validity. The individual must,
therefore, be presumed to be average or otherwise normal in his/her abilities unless other
incontrovertible data are found. With respect to test scores, however, they cannot be used to bolster
the presence of any disability if they follow the predicted declining pattern.
Finally, there are some potential patterns that may show a clear, declining pattern but that may also
indicate the presence of a disability. Consider for a moment the case of an English-language learner
with a true speech-language impairment. Because the C-LIM is set up to be sensitive to drops in
performance as a function of language (and acculturation), and because language acquisition and
acculturation are highly correlated, such an individual will likely generate a pattern that shows a
systematic decline in scores that argues against the presence of a disability. However, although the
scores would very likely decline, the decline would appear to accelerate as the linguistic demands of
the tests increase. Whereas there may be little or no noticeable change in expected scores in cells
containing tests with the lowest cultural loadings and linguistic demands, as the language demands
increase the attenuation of performance is likely to be much more than what would be expected of an
English learner without a speech-language problem. Tests with the highest level of cultural loadings
and linguistic demands would be even further attenuated by what is essentially a double whammy
effect: the combined influence of linguistic difference compounded by the presence of a speech-
language problem.
Another pattern that may show a similar decline in scores, but still reflect a possible disability,
would include cases in which a diverse individual has some type of Pervasive Developmental
Disorder or Mental Retardation. In these cases, the delays in development are across the board,
affecting each ability more or less equally. Unlike an individual with only a learning disability or
speech-language impairment, the effect of the disability would attenuate virtually all scores, not just
those related to a specific deficit or those related to language. Consequently, it would not be
unexpected to find score patterns that, although indicative of a systematic decline relative to cultural
and linguistic differences, nonetheless fall far below what could possibly be considered average
scores for diverse individuals.
It is for this reason that practitioners need to remember the importance of gathering data that help
to define the background and experiences of the individual being evaluated so as to create the
appropriate context for determining expected levels of performance. Individuals with diverse
backgrounds have often been collapsed in research so that there is a presumption of homogeneity
among English learners. But such a group is more a conglomeration of acculturative experiences and
dual-language proficiencies that can vary widely, resulting in very different levels of performance.
Brigham's (1923) work, discussed earlier in this chapter, is a good example of this. Practitioners must therefore know to what extent the individual they seek to evaluate differs from the mainstream
expectations set by individuals on whom the test was normed. Figure 5.14 provides some general
guidelines for expected patterns of test performance for diverse individuals. The guidelines are based
on the identification of the extent of difference from the mainstream. Individuals who are markedly
different from the mainstream tend to be first generation; they have resided in the United States for a very short period of time, are very limited in English proficiency, have little or no education in their native language, have parents who are also poorly educated, and have low SES. As a group, they
typically will score the lowest on standardized tests given the greater attenuation induced by the
severe limitations in acculturative knowledge and linguistic comprehension and development. On the
other hand, individuals who are only slightly different (third to fourth generation) have significantly
better English-language proficiency (albeit not equal to that of native speakers), have resided in the
United States for a long period of time, have received more formal education, have parents with more
education, and come from higher SES backgrounds. Such individuals tend to score closer to the mean
of monolingual, native English speakers, although their scores are still attenuated. Because research
has focused primarily on groups of diverse learners who have reasonably good English language
proficiency (enough to be tested in English), the performance of this group tends to represent the
average or composite attenuation described in the literature: about one full standard deviation (15
points) from the norm on the tests with the highest cultural loadings and linguistic demands.
Individuals who are moderately different tend to be second generation and fall in between the other
two groups on the various dimensions described. Practitioners may use the information contained in
Figure 5.14 to assist in determining the nature and extent of score attenuation in results that are
evaluated through the C-LIM. Because of the developmental aspects of acculturation and language
acquisition, these guidelines help to formulate an appropriate idea regarding expected levels of
performance, which can then be evaluated directly via the C-LIM. Knowing what to expect in the first
place is a crucial step toward recognizing and understanding the type of bias that is often present in
the testing of diverse individuals.

Figure 5.14 General Guidelines for Expected Patterns of Test Performance for Diverse
Individuals


SUMMARY

The cultural and linguistic extensions to XBA described in this chapter do not offer a complete
solution to all of the problems inherent in the process of fairly and equitably evaluating the cognitive
abilities or intelligence of individuals who are culturally or linguistically diverse. It was made clear
that this approach addresses only those issues involved in test selection and interpretation, and that
there are numerous other sources of potential bias that can affect any given individual's performance
on standardized tests. Nevertheless, with due consideration of these issues combined with well-
reasoned application of the C-LTC and C-LIM, practitioners should be able to select an appropriate
set of tests that, in addition to having a strong empirical base, can also reduce the potential
discriminatory aspects involved in their use with diverse populations, and that provide a defensible,
systematic framework for evaluating the relative influence of cultural and linguistic differences that
may affect interpretation. Although the other potential sources of bias found throughout the
assessment process (e.g., inappropriate cross-cultural transactions, failure to use culture as the context
for framing behavior; see Figure 5.1) are not specifically attended to by this approach, use of XBA
methods along with the C-LTC and C-LIM represents a significant advancement in the practice of
bilingual, cross-cultural, nondiscriminatory assessment that is well within the professional reach of
most practitioners.
The treatment in this chapter of the issues involved in the assessment of individuals from diverse
backgrounds is admittedly brief and lacks much detail. The primary focus has been on issues related
to test selection and interpretation, but there are numerous other substantive issues that simply fall
beyond the limits of this chapter, including delineation of a comprehensive framework for
nondiscriminatory assessment that addresses bias and discrimination on many levels. In short, fair
and equitable assessment of individuals who are culturally and linguistically diverse rests primarily
on a thorough knowledge and understanding of the manner in which tests of intelligence or cognitive
ability may be affected by variables involving level of acculturation and language proficiency. Tests
in the array of cognitive ability batteries and special purpose tests available to practitioners today, as
well as those now emerging, tend in general to be very sophisticated and well designed. Nevertheless,
their use with individuals from diverse backgrounds continues to be plagued by assumptions that tend
to have discriminatory effects, in particular with issues involving level of acculturation and language
proficiency. Practitioners are well advised to remain aware that the research that overwhelmingly
supports the notion that such tests are not biased is based on definitions of bias that ignore the
developmental aspects of the variables and fail to understand their relationship to issues of validity.
Bias is not a function simply of item content, factor structure, or racial differences. Bias is more a
function of differences in experience that are due to factors involving many variables, including
culture and language. Moreover, it was made clear in this chapter that tests possess cultural and
linguistic characteristics to varying degrees that differentially affect the performance of individuals
who are experientially different in ways that do not occur with individuals who are experientially
comparable. According to Sattler:

Probably no test can be created that will entirely eliminate the influence of learning and cultural
experiences. The test content and materials, the language in which the questions are phrased, the
test directions, the categories for classifying the responses, the scoring criteria, and the validity
criteria are all culture bound. (1992, p. 579)

Whenever standardized, norm-referenced tests are used with individuals from diverse
backgrounds, the possibility that what is actually being measured is acculturation or English-language
proficiency, rather than ability, always exists. The information presented in this chapter provides
assistance to practitioners by guiding decisions regarding test selection in the process of conducting
evaluations of diverse individuals through XBA. Because this approach to assessment is built upon an
empirical knowledge base of theory and research concerning the relationship between cognitive
abilities and academic achievement, it allows practitioners to construct individualized test batteries
that are more responsive to the unique demands of any given assessment. Application and use of the
C-LTC and C-LIM represents a viable method for advancing assessment of diverse individuals in
ways that seek to enhance the meaning of the collected data and that provide a mechanism for
systematic and defensible evaluation of the effects of cultural and linguistic differences on test
performance.
REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Psychological Association. (1990). Guidelines for providers of psychological services to
ethnic, linguistic, and culturally diverse populations. Washington, DC: Author.
Bialystok, E. (1991). Language processing in bilingual children. New York: Cambridge University
Press.
Brigham, C. C. (1923). A study of American intelligence. Princeton, NJ: Princeton University Press.
Cole, M., & Cole, S. R. (1993). The development of children. New York: Scientific American Books.
Cummins, J. C. (1984). Bilingual and special education: Issues in assessment and pedagogy. Austin,
TX: PRO-ED.
Ehrman, M. E. (1996). Understanding second language learning difficulties. Thousand Oaks, CA:
Sage.
Esparza Brown, J. (2006, January 19). Using the Culture-Language Interpretive framework for
distinguishing between disabled and different. Workshop presented at The American Association for
Bilingual Education, Phoenix, AZ.
Figueroa, R. A. (1990a). Assessment of linguistic minority group children. In C. R. Reynolds & R. W.
Kamphaus (Eds.), Handbook of psychological and educational assessment of children: Vol. 1:
Intelligence and achievement (pp. 671- 696). New York: Guilford.
Figueroa, R. A. (1990b). Best practices in the assessment of bilingual children. In A. Thomas & J.
Grimes (Eds.), Best practices in school psychology ( Vol. 2, pp. 93-106). Washington, DC: National
Association of School Psychologists.
Figueroa, R. A., Delgado, G. L., & Ruiz, N. T. (1984). Assessment of Hispanic children: Implications
for Hispanic hearing-impaired children. In G. L. Delgado (Ed.), The Hispanic deaf: Issues and
challenges for bilingual special education (pp. 124-153). Washington, DC: Gallaudet College Press.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. O. (2000). The Wechsler intelligence scales and CHC
theory: A contemporary approach to interpretation. Boston: Allyn & Bacon.
Flanagan, D. P., & Miranda, A. H. (1995). Working with culturally different families. In A. Thomas &
J. Grimes (Eds.), Best practices in school psychology (Vol. 3, pp. 1039- 1060). Washington, DC:
National Association of School Psychologists.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). The achievement test desk
reference-second edition. New York: Wiley.
Goddard, H. H. (1917). Mental tests and the immigrant. Journal of Delinquency, 2, 243- 277.
Greenfield, P. M. (1998). The cultural evolution of IQ. In U. Neisser (Ed.), The rising curve: Long-
term gains in IQ and related measures. Washington, DC: American Psychological Association.
Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185-
244.
Jensen, A. R. (1976). Construct validity and test bias. Phi Delta Kappan, 58, 340-346.
Kamphaus, R. W. (1993). Clinical assessment of children's intelligence. Boston: Allyn & Bacon.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Lopez, E. C. (1997). The cognitive assessment of limited English proficient and bilingual children. In
D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment:
Theories, tests, and issues (pp. 506-516). New York: Guilford.
Matsumoto, D. (1994). Cultural influences on research methods and statistics. Pacific Grove, CA:
Brooks/Cole.
McCallum, R. S., & Bracken, B. A. (1997). The Universal Nonverbal Intelligence Test. In D. P.
Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories,
tests, and issues (pp. 268-280). New York: Guilford.
Mercer, J. R. (1979). System of multicultural pluralistic assessment: Technical manual. New York:
Psychological Corporation.
Muñoz-Sandoval, A. F., Cummins, J., Alvarado, C. G., Ruef, M., & Schrank, F. A. (2005). Bilingual verbal ability tests: Normative update. Itasca, IL: Riverside Publishing.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin,
J. C., Perloff, R., Sternberg, R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American
Psychologist, 51, 77-101.
Nieves-Brull, A. I., Ortiz, S. O., Flanagan, D. P., & Chaplin, W. F. (2005). Evaluation of the Culture-
Language Matrix: A validation study of test performance in monolingual English speaking and
bilingual English/Spanish speaking populations. Unpublished doctoral dissertation, St. John's
University, Jamaica, New York.
Ochoa, S. H., Powell, M. P., & Robles-Piña, R. (1996). School psychologists' assessment practices
with bilingual and limited-English-proficient students. Journal of Psychoeducational Assessment, 14,
250-275.
Ochoa, S. H., Riccio, C. A., Jimenez, S., Garcia de Alba, R., & Sines, M. (2004). Psychological
assessment of limited English proficient and/or bilingual students: An investigation of school
psychologists' current practices. Journal of Psychoeducational Assessment, 22, 93-105.
Ortiz, S. O. (2001). Assessment of cognitive abilities in Hispanic children. Seminars in Speech and
Language, 22, 17-37.
Ortiz, S. O. (2002). Best practices in nondiscriminatory assessment. In A. Thomas & J. Grimes (Eds.),
Best practices in school psychology IV (pp. 1321-1336). Washington, DC: National Association of
School Psychologists.
Ortiz, S. O., & Dynda, A. M. (2005). Use of intelligence tests with culturally and linguistically diverse
populations. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (pp. 545-
556). New York: Guilford.
Ortiz, S. O., & Flanagan, D. P. (1998). Enhancing cognitive assessment of culturally and linguistically
diverse individuals: Selective Gf-Gc cross-battery assessment. The School Psychologist, 52 (1), 6-9.
Ortiz, S. O., & Ochoa, S. H. (2005a). Advances in cognitive assessment of culturally and linguistically
diverse individuals: A nondiscriminatory interpretive approach. In D. P. Flanagan & P. L. Harrison
(Eds.), Contemporary intellectual assessment (pp. 234-250). New York: Guilford.
Ortiz, S. O., & Ochoa, S. H. (2005b). Cognitive assessment of culturally and linguistically diverse
individuals: An integrated approach. In R. Rhodes, S. H. Ochoa, & S. O. Ortiz (Eds.), Assessing
culturally and linguistically diverse students: A practical guide (pp. 168-201). New York: Guilford.
Ortiz, S. O., & Ochoa, S. H. (2005c). Conceptual measurement and methodological issues in cognitive
assessment of culturally and linguistically diverse individuals. In R. Rhodes, S. H. Ochoa, & S. O.
Ortiz (Eds.), Assessing culturally and linguistically diverse students: A practical guide (pp. 153-167).
New York: Guilford.
Rhodes, R. L., Ochoa, S. H., & Ortiz, S. O. (2005). Assessing culturally and linguistically diverse
students: A practical guide. New York: Guilford.
Salvia, J., & Ysseldyke, J. (1991). Assessment in special and remedial education (5th ed.). Boston:
Houghton-Mifflin.
Samuda, R. J., Kong, S. L., Cummins, J., Pascual-Leone, J., & Lewis, J. (1991). Assessment and
placement of minority students. New York: C. J. Hogrefe/Intercultural Social Sciences.
Sanchez, G. (1934). Bilingualism and mental measures: A word of caution. Journal of Applied
Psychology, 18, 765-772.
Sandoval, J., Frisby, C. L., Geisinger, K. F., Scheuneman, J. D., & Grenier, J. R. (Eds.). (1998). Test
interpretation and diversity: Achieving equity in assessment. Washington, DC: American
Psychological Association.
Sattler, J. (1992). Assessment of children (Rev. 3rd ed.). San Diego: Author.
Scarr, S. (1978). From evolution to Larry P., or what shall we do about IQ tests? Intelligence, 2, 325-
342.
Valdés, G., & Figueroa, R. A. (1994). Bilingualism and testing: A special case of bias. Norwood, NJ:
Ablex.
Vazquez-Nuttall, E. V., Li, C., Dynda, A. M., Ortiz, S. O., Armengol, C., Walton, J., & Phoenix, K. (in
press). Cognitive assessment of culturally and linguistically diverse students. In G. Esquivel, E. Lopez,
& S. Nahari (Eds.), Handbook of multicultural school psychology. New York: Erlbaum.
Vukovich, D., & Figueroa, R. A. (1982). The validation of the system of multicultural pluralistic
assessment: 1980-1982. Unpublished manuscript, University of California at Davis, Department of
Education.
Wechsler, D. (2005). Wechsler Intelligence Scale for Children-Fourth Edition, Spanish. San Antonio,
TX: Harcourt Assessment.
Wechsler, D., & Naglieri, J. (2006). Wechsler Nonverbal Scale of Ability. San Antonio, TX: Harcourt
Assessment.
Woodcock, R. W., Muñoz-Sandoval, A. F., McGrew, K. S., & Mather, N. (2005). Batería III Woodcock-Muñoz. Itasca, IL: Riverside Publishing.
Woodcock, R. W., Muñoz-Sandoval, A. F., Ruef, M. L., & Alvarado, C. G. (2005). Woodcock-Muñoz Language Survey-Revised, English. Itasca, IL: Riverside Publishing.
Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Washington, DC:
National Academy of Sciences.

TEST YOURSELF

1. When an examiner speaks the same language and comes from the same cultural
background as the examinee, this is sufficient in order to ensure nondiscriminatory
assessment. True or False?
2. Use of the Culture-Language Test Classifications and Interpretive Matrix assists in

(a) reducing bias related to test selection.


(b) reducing bias related to test interpretation.
(c) determining the influence of cultural/linguistic factors on test performance.
(d) all of the above.

3. Use of an interpreter during the administration of a test

(a) eliminates the problem with bias in language.


(b) remains problematic even when the examiner and examinee are well trained.
(c) represents best practices.
(d) minimally affects the psychometric properties of the test.
(e) helps maintain the standardization of the test.

4. Standardized, norm-referenced assessments often show no psychometric bias because

(a) they predict equally well for different ethnic groups.


(b) the factor structure does not change as a function of race or ethnicity.
(c) they rarely contain any items that differentially affect performance.
(d) all of the above.

5. Cultural bias is the same as cultural loading, just as linguistic bias is the same as
language loading. True or False?
6. Nonverbal tests

(a) despite their claims, are neither entirely culture free nor language free.
(b) may be as culturally loaded as, or possibly more so than, verbal tests.
(c) do not circumvent issues related to communication between examiner and
examinee.
(d) have norm samples that remain problematic and are based on monolingual
speakers.
(e) all of the above.

7. Which of the following is not true in selecting tests for XBA with culturally and
linguistically diverse individuals?

(a) Tests with reduced language demands should be selected over equivalent tests
with higher language demands.
(b) Tests with reduced cultural loadings should be selected over equivalent tests with
higher cultural loadings.
(c) Tests should only be selected with respect to the broad ability they measure.
(d) Selection of tests represents striking a balance between XBA guiding principles,
specific referral concerns, and the unique background characteristics of the
examinee.

8. When using the Culture-Language Interpretive Matrix, the expected pattern of
performance for culturally and linguistically diverse individuals should, in general,

(a) decrease diagonally from the top left to the bottom right.
(b) increase diagonally from the top left to the bottom right.
(c) remain close to the mean for all cells.
(d) be about one standard deviation below the mean for all cells.

9. When using the Culture-Language Interpretive Matrix, which of the following
general interpretive statements is incorrect?

(a) When scores are consistent across all cells, this indicates a possible effect of both
culture and language differences.
(b) When scores decrease across cells from left to right, this indicates a possible
effect of language difference.
(c) When scores decrease across cells from top to bottom, this indicates a possible
effect of cultural difference.
(d) When scores decrease across cells from left to right and from top to bottom, this
indicates a possible effect of both culture and language differences.

10. The Culture-Language Test Classifications and Interpretive Matrix described in
this chapter offer a complete solution to all of the problems inherent in the process of
fairly and equitably evaluating the cognitive abilities or intelligence of individuals
who are culturally or linguistically diverse. True or False?

Answers: 1. False; 2. d; 3. b; 4. d; 5. False; 6. e; 7. c; 8. a; 9. a; 10. False

Six

STRENGTHS AND WEAKNESSES OF THE CROSS-BATTERY APPROACH


It is interesting that strengths and weaknesses of a particular assessment approach or instrument are
not always based entirely on their inherent properties. In some cases there may be significant external
influences that determine the nature of some of these strengths and weaknesses and ultimately how the
approach or instrument is viewed by practitioners. Approaches that are consistent with mainstream
ideas and practices are generally well accepted, despite the fact that they may have significant
weaknesses and limitations. The application of discrepancy models in evaluation of SLD is an
excellent example of this phenomenon. Approaches that run counter to prevailing thought and
practice are often not used widely, even if they are psychometrically and theoretically sound. This is
because the paradigmatic shift that is typically necessary to open practitioners up to unfamiliar ways
of thinking and doing occurs slowly, and sometimes not at all (Kuhn, 1996).
We believe the principles and procedures that characterize the XBA approach fall into this latter
category. This is not to imply, however, that all the potential weaknesses or limitations of the
approach are external to it. As will be described in this chapter, there are some issues particular to the
approach that certainly merit discussion. But there are other issues, often associated with the
approach, that are best viewed within the context of developments in the field of psychological
assessment.
Much of the impetus for the development of the XBA approach came from a recognition of the
relatively poor manner in which purported cognitive ability constructs were represented on many of
the major intelligence batteries (Flanagan & McGrew, 1997; Flanagan, McGrew, & Ortiz, 2000;
Flanagan & Ortiz, 2001; McGrew & Flanagan, 1998). In addition, most intelligence batteries did not
measure more than two to four broad cognitive abilities/processes. Finally, because many intelligence
batteries also lacked a clear, empirically supported theoretical foundation, test interpretation was not
always defensible (Carroll, 1998). Thus, XBA emerged as both a measurement and interpretation
approach that was intended to bring theoretical and empirical rigor to the practice of assessing and
interpreting psychological constructs, activities that had long been based on little more than clinical
lore and pseudoscientific mathematics (Kamphaus, Winsor, Rowe, & Kim, 2005).
Strong, and sometimes vehement, objections to some of the principles and practices that comprised
the XBA approach when it was first presented to the field (Flanagan & McGrew, 1997; Flanagan &
Ortiz, 2001; McGrew & Flanagan, 1998) were not only encountered, but were expected. A great deal
of the alarm has been effectively quelled, however, now that CHC theory and its ever-mounting
research base have made tremendous inroads into the consciousness of researchers, trainers, and
practitioners. That is, there are significantly fewer knee-jerk reactions to XBA, allowing us to focus
on the more substantive issues that warrant consideration. Despite initial criticisms, the XBA approach
has served as a likely catalyst for the kinds of dramatic changes that have occurred in test construction
over the past 5 or more years (Alfonso, Flanagan, & Radwan, 2005; McGrew, 2005).
Prior to the publication of the WJ III (Woodcock, McGrew, & Mather, 2001), tests that were based
on clear, modern, and empirically supported intelligence theory were virtually nonexistent. Test
construction in particular did not appear to pay enough attention to issues regarding construct over-
representation, construct underrepresentation, and construct-irrelevant variance and such issues were
rarely addressed in test manuals. History tells us that test development generally proceeds at a glacial
pace. For example, users of the original Wechsler-Bellevue (Wechsler, 1939) would not have found
the WAIS-III (Wechsler, 1997) to be at all unfamiliar despite the passing of nearly 6 decades, during
which time tremendous advancements were made in theory and research on cognitive development
and the structure of cognitive abilities. Nevertheless, with the advent of the new millennium, the field
has been introduced to a host of revisions to the vast majority of major intelligence batteries,
including the WJ III (Woodcock et al., 2001), SB5 (Roid, 2003), WPPSI-III (Wechsler, 2002), WISC-
IV (Wechsler, 2003), KABC-II (Kaufman & Kaufman, 2004), and DAS-II (Elliott, 2007). For an
industry that has been historically conservative about modifying its tests, for whatever reasons, this
sudden change in stance is a rather stunning development, one that we believe is due in no small way
to the influence of the XBA approach (Alfonso et al., 2005; McGrew, 2005). In fact, the XBA approach
was the first operationalization of contemporary CHC theory (i.e., the integration of the Cattell-Horn
and Carroll frameworks; see Flanagan, McGrew, & Ortiz, 2000). Woodcock, McGrew, and Mather
then used the integrated theory proposed in Flanagan et al.'s (2000) book as the foundation for the WJ III, which was published in 2001 (McGrew, 2005). Shortly thereafter, most major intelligence
batteries followed suit.
The majority of batteries previously noted do not reflect the same kind of meager cosmetic
changes and normative updates characteristic of prior versions (e.g., WISC-III; see 1993 Journal of
Psychoeducational Assessment Monograph). Rather, some of the changes have been profound indeed.
For example, one of our colleagues (Kevin McGrew) was fond of noting that David Wechsler did not "bring the 12 Wechsler subtests down from the mountain" (à la Moses). His comment was in
reference to the tendency to see the very same hallowed subtests in revision after revision. Yet the
current WISC-IV maintained only 7 of the original 12 subtests, of which 3 were relegated to
supplemental status. More importantly, 5 new tests were added to the WISC-IVmore than the total
number of new tests added to all earlier revisions combined. But perhaps the most dramatic change is
evident in the measurement/interpretation arena, where the ubiquitous VIQ and PIQ, long-standing
staples of the Wechsler IQ family, no longer exist. It frightens us to think of how many individuals
have been diagnosed as having a specific learning disability on the basis of a VIQ/PIQ discrepancy.
The degree of change evident in the current versions of the major intelligence tests is substantial
indeed and most likely not coincidental. That the KABC-II, SB5, and DAS-II were all constructed
based upon aspects of CHC theory and that the Wechslers, despite the use of some proprietary terms,
also follow the basic precepts of CHC theory is more than a coincidence. It is the direct result of
overwhelming evidence, indeed an entire network of validity evidence, in support of CHC theory
(e.g., Carroll, 1993). While there have been batteries based on earlier versions of Fluid-Crystallized
theory (e.g., SB:FE, KAIT), the WJ-R was the only battery that was consistent with the most current
evidence of the structure of cognitive abilities/processes. Richard Woodcock, author of the WJ-R,
consulted with both John Horn (the H in CHC theory) and John Carroll (the second C in CHC theory)
during the development of the WJ-R and WJ III. In addition, the authors of the WJ III (including Kevin
McGrew, who, along with Dawn Flanagan, developed the cross-battery approach) extended the
theoretical basis of the test, which was based in part on the principles underlying the XBA approach.
In short, with few exceptions, issues related to theory and construct representation at best played a
minor role in the development and revision of intelligence batteries. Presently, however, nearly all
test authors and publishers highlight these factors when touting the newest incarnations of their
intelligence batteries. There is little question that the XBA approach was one of the major
contributions to the literature that facilitated this monumental shift in test development and
construction (see McGrew, 2005, for a comprehensive historical account of such contributions).
We cite the connection between the XBA approach and the significant changes that have taken place
and that are continuing in the field of test development, not so much to highlight our own
achievements, but rather to underscore the equally dramatic changes in the strengths and weaknesses
of the approach itself. For example, a little over 5 years ago, practitioners were saddled with tests that
were largely inadequate in many ways and that, at that time, required frequent crossing of at least two
batteries, sometimes more, to achieve adequate representation of about seven broad cognitive
abilities/processes. The emergence of better batteries has significantly reduced this need, although it
has not eliminated it. As a consequence, we have made alterations to the XBA approach in response to
these types of extrinsic events. It is, therefore, important that practitioners understand the nature of the
strengths, weaknesses, and misconceptions highlighted in the following within the context of the
larger changes in the field of psychological testing. That is, the differences between what were
previous and what are current strengths and weaknesses of the approach are based more on the
broader impact that the approach has had on the field as it relates to test development and
interpretation, and not so much on significant flaws or problems inherent in the approach.
Strengths

Modern Theory
The XBA approach was designed to address problems in assessment-related fields as they relate to
both measurement and interpretation. The changes cited in current test development have reduced
many of the measurement problems, but not all of them. Specifying exactly how to address
difficulties that exist in traditional, and many current, approaches to test interpretation continues to be
one of the more compelling features of the XBA approach. In either case, XBA continues to be based
on the most empirically supported and well-validated theory of the structure of cognitive
abilities/processes, namely Cattell-Horn-Carroll (CHC) theory. Despite more than 7 decades of
systematic research, we are only just beginning to feel the impact of CHC theory, particularly on test
development. Its influence is both undeniable and overwhelming and is responsible in large part for
advancing the knowledge base of abilities/processes that practitioners routinely seek to measure and
understand. By utilizing this theoretical paradigm, the XBA approach has the advantage of being
current and in line with the best available scientific evidence on intelligence and cognitive
abilities/processes.

Communication
During the development of the XBA approach, McGrew (1997) and McGrew and Flanagan (1998)
compiled CHC classifications for the subtests comprising all intelligence batteries and numerous
special purpose tests of cognitive abilities/processes. This CHC (then Gf-Gc) classification system set
the stage for improving communication among professionals. Most scientific disciplines have a
standard nomenclature (i.e., a common set of terms and definitions) that facilitates communication
and guards against misinterpretation. For example, the standard nomenclature in chemistry is
reflected in the Periodic Table; in biology, it is reflected in the classification of animals according to
kingdom, phylum, class, and so on; in psychology and psychiatry, it is reflected in the Diagnostic and
Statistical Manual of Mental Disorders; and in medicine, it is reflected in the International
Classification of Diseases. Underlying the XBA approach is a standard nomenclature or Table of
Human Cognitive Abilities that currently includes classifications of over 500 cognitive and
achievement tests according to the broad and narrow CHC abilities/processes they measure. The XBA
classification system has had a positive impact on communication among practitioners, has improved
research on the relations between cognitive and academic abilities, and has resulted in substantial
improvements in the measurement of cognitive constructs, as is seen in the design and structure of
current intelligence batteries.

Evaluation of Specific Learning Disability (SLD) and Culture-Language Differences
It stands to reason that if abilities/processes are understood within an empirically supported
theoretical framework, and if there is less confusion and more precision about what an
ability/process is, then all aspects of assessment are likely to benefit. This is perhaps best exemplified
in the assessment of SLD and in cases in which the examinee is culturally and linguistically different
(CLD). The former issue was discussed at length in Chapter 4. The advantages evident in the
evaluation of suspected SLD using XBA methods include (a) clear specification of the relations
between abilities/processes and academic outcomes; (b) elimination of atheoretical notions regarding
the cognitive abilities/processes underlying the disorder; (c) application of empirical research in
understanding patterns of test performance; and (d) more rigorous measurement of constructs.
Likewise, application and use of the Culture-Language Interpretive Matrix (see Chapter 5), provides
practitioners with the means to evaluate the potential attenuating influence of cultural and linguistic
factors on test performance. This method focuses attention squarely on the issue of validity so that it
can be directly evaluated prior to any interpretation of test results.

Flexibility
A particular advantage of XBA rests with its flexibility in being able to respond to the particular
referral concerns and practitioner needs in assessment. Few of the questions that prompt
psychologists to conduct a given evaluation can be answered by the administration of a single battery.
Likewise, if the focus of an evaluation is comprehensive in nature, not every individual battery will be
able to provide adequate measurement of all the broad and narrow CHC abilities/processes
considered germane to any given referral. The principles and practices inherent in XBA allow
practitioners to obtain any type of data, including information about general ability, broad
abilities/processes, and specific narrow abilities/processes for whatever the purpose of assessment
(e.g., comprehensive, selective, diagnostic, screening).

Automation
Perhaps the most significant advantage and newest development of the XBA approach can be found on
the CD-ROM that accompanies this book. On this disc, practitioners will find three programs that
provide a degree of automation never before available for engaging in XBA. The main program,
Cross-Battery Assessment Data Management and Interpretive Assistant (XBA DMIA) replaces the
need for manual calculations that were required on the worksheets found in Appendix A in the first
edition of this book. In addition to conducting all the necessary calculations, the program analyzes
scores, reports valid clusters, and graphs results automatically, all of which greatly streamline the
process and increase the efficiency with which XBA is accomplished. Additional programs available
on the CD-ROM include the Specific Learning Disability Assistant (SLD Assistant) and the Culture-
Language Interpretive Matrix (C-LIM). The former program is designed to inform decisions
pertaining to the issue of establishing underachievement in SLD evaluations. That is, rather than
relying on a discrepancy analysis, the program assists in evaluating whether an individuals intact
abilities/processes comprise a general pattern of otherwise normal ability in areas not strongly
related to the identified academic deficits. Prior to this program, no such quantitative method existed.
The purpose of the latter program is to provide a graphical structure useful in evaluating data from
standardized tests to determine the relative influence of limited English proficiency and level of
acculturation on test performance. The previous version of the Culture-Language Interpretive Matrix
required manual entry of test names and scores and had no provision for graphing the results. The C-
LIM provides a systematic and automated method for evaluation of cultural and linguistic factors that
may be present in the evaluation of individuals from diverse backgrounds.
Weaknesses

Norm Samples
In the original conceptualization of XBA presented in the first edition of this book, significant
criticism was raised regarding the fact that there was no internal norm group associated with XBAs
and thus the aggregation of data across batteries was invalid. Such criticism has proven to be rather
short-sighted and specious in that it runs counter to basic scientific principles and research
conventions. For example, if it were accepted that conclusions and inferences drawn from test data are
particular only to a single battery, then we could never generalize findings or correlate performance
on tests that ostensibly measure the same thing. Any test manual that states that results cannot be
compared to results from any other test due to differences in the norm groups is not a particularly
useful test. Many applied psychologists, neuropsychologists in particular, have built an entire career
on the practice of collating and interpreting data across a diverse set of instruments that were normed
at different times, on different samples, and so forth. Such is the customary practice of psychological
assessment and it is parallel to that which occurs in the research arena. That is, when one researcher
publishes findings from an experiment involving a particular group, the findings are generalizable to
the extent that the sample population approximates or represents well the true population. Thus, when
another researcher seeks to replicate the findings, the same effect should be observed in spite of the
fact that the second experiment is carried out with individuals who are different than those included in
the original experiment but who nonetheless are drawn from the same population. Were this not a
valid process, all experiments and replications would need to be carried out on the very same subjects
each and every time. Generalizability of results would extend only to those actually involved in the
experiment and no inferences could be made to others even if they presumably would be represented
by the sample population.
This is, of course, not the way science works. When a test publisher creates a norm sample, great
effort is made to assemble a group that very closely approximates several characteristics of the
general population on whom the test is designed to be used. If the test developer succeeds in creating
just such a representative sample, then the utility and validity of the test for use in the general
population is established. Likewise, the validity of inferences and interpretations that may be drawn
for a given individual in comparison to the performance of the norm sample is also established. To
say that there is something psychometrically wrong with making inferences from test scores that
come from different batteries but that measure similar constructs is akin to saying that one of those
norm samples is not actually representative of the intended target population. Major intelligence
batteries published in the United States are designed to be used with nearly any individual residing in
the United States. Maintaining this utility means that each battery's developers necessarily constructed a representative sample of the U.S. population. What could be more equivalent? Thus, to the extent that
each battery indeed succeeded in creating an adequate norm sample, and we have not seen any
evidence to the contrary, we are confident in saying that XBA needs no internal norm group given the
stellar representation provided by the norm samples of each battery. As a means of controlling for
spurious differences that may be found as a direct result of differences in the characteristics of norm
samples, the XBA approach includes guidelines stating that, when crossing batteries, examiners
should use tests that were normed within a few years of one another. As such, all the tests included in
this book were normed within the span of 10 years. As a general rule of thumb, the closer the
publication year between batteries, the greater the likelihood that the norm samples are more similar
than they are different.

Complicated
One of the most frequent criticisms we have heard about the XBA approach is that it is more
complicated than traditional methods of assessment and interpretation. This perception may be due to
the fact that XBA employs a high degree of theoretical and psychometric rigor and, therefore, a clear
understanding of these issues is necessary on the part of the practitioner. It seems ironic then that
criticism should take the form of complexity when the issue perhaps rests more with insufficient or
inadequate training. The use of standardized, norm-referenced tools is one of the hallmark activities
of psychological practice. The implications and consequences of their use are significant and serious
in nature. It would seem that all psychologists would aspire to the highest levels of competency that
would ensure such practices are carried out with the type of precision and defensibility necessary to
justify the decisions that will ultimately emanate from their use. Thus, we find it hard to apologize for
expecting practitioners to be well versed in modern cognitive theory and applied psychometrics. In
reality, current tests do a reasonably good job of providing the necessary adequacy in terms of
construct measurement and scaling. What remains a complicated issue is the lack of interpretive
guidelines for test results, something that test developers have long avoided and left to individual
clinical judgment.
With respect to the process of interpretation, there are few other methods available that provide the
kind of guidance found in XBA for making sense of collected data and creating defensible positions
relative to an individual's functioning. Interpretation is never a simple issue and rightly remains a key
component of psychological practice that justifies extensive training and continued education. To say
that XBA requires practitioners to be knowledgeable and highly competent is true. But the same can
be said of the process of interpretation in general, irrespective of the data obtained or the approach
used in the assessment.

Time Consuming
There may have been some validity to this criticism in the past. However, as noted in the previous
section, the inclusion of new automated programs for the XBA approach has essentially rendered this
point moot. Practitioners now have the capability to engage in XBA procedures with no more
expenditure in time and effort than what is commonly used in current single-battery assessment
practices. In fact, the nature of the test battery revisions noted at the outset of this chapter has resulted
in a significant reduction in many of the XBA activities that were previously necessary. Less effort is
necessary to form adequate representations for a given broad ability/process and batteries tend to
measure more broad abilities/processes than before. On the whole, XBA is rapidly becoming as
efficient as any other approach a practitioner may choose to employ in assessment.

Subtest Order
One concern that has been raised repeatedly about XBA is the issue of giving tests outside of the
comprehensive framework of standardization. The concern is one of whether, in cases in which a test
battery was standardized using a particular order of subtest administration, using and administering
only selected tests necessarily violates standardization because the original order was not maintained.
In other words, in the absence of explicit instructions from the publisher that allow for the
administration of only specific subtests, are we legally or ethically required to administer the entire
test, even if we may be interested in only a particular set of tests or composite scores? The answer is
no, and it is our contention that, in a wide variety of cases, individuals have routinely been given only
a portion of certain batteries because other portions were deemed invalid or inappropriate for that
individual. For example, giving only the Wechsler Performance subtests to culturally or linguistically
diverse children has been a common practice for decades and remains so to the present day (albeit,
like the discrepancy approach, it is not the best approach and has limitations). Similar selective administrations are
often done with children who are blind, deaf, motor impaired, and so on, and selective test
administration has been the foundation of many past and current procedures used in the field of
neuropsychology. Therefore, unless a test's manual specifically states that the validity and reliability of the test are maintained only when every single subtest is given, we see no reason why XBA methods
should violate standardization any more or less than do the other selective procedures just mentioned
that are in common practice today. In fact, in their respective standardizations, the WJ III (Woodcock et al., 2001), the SB5 (Roid, 2003), and the DAS-II (Elliott, 2007) all utilized varied administration
sequences. Moreover, not only do test publishers routinely omit prescriptions against using selected
subtests or portions of the test, they also do not state that the use of the battery and interpretation of
resulting scores are valid only if all subtests are administered. Therefore, it would seem that any
alternative use of a test (whether with particular populations or through different theoretical
foundations) is left up to the professional judgment of the examiner and that the examiner, in such
cases, is wholly liable for providing a suitable (i.e., defensible) rationale for whatever decisions were
made and actions taken. Clearly, the emphasis on both systematic evaluation and empirically based
decision making inherent in XBA make it a method that is significantly more defensible than most
other accepted procedures.
MYTHS AND MISCONCEPTIONS ABOUT XBA

Although a discussion regarding strengths and weaknesses is always appropriate, we believe that
additional valuable information for practitioners can be generated via a discussion of the various
myths and misconceptions that have surrounded the XBA approach. To that end, the following section
offers a list of some of the more common misconceptions we often encounter along with a brief
response that we hope will serve to clarify the issue and provide practitioners with information that is
useful in evaluating the merits of the approach for themselves.

Misconception 1. In order to conduct XBAs, I will need access to all the major intelligence batteries or
at least most of them.
Response: Actually, at most, only two intelligence batteries are needed to conduct XBA, even when a
comprehensive assessment is desired. Access to additional batteries or special purpose tests may be
required if more in-depth assessment in a given domain, especially in narrow ability/processing
domains, is necessary or if the chosen core battery does not offer adequate construct representation.

Misconception 2. The CHC theory underlying the XBA approach is supported by only factor-analytic
evidence and, therefore, is limited.
Response: The reality is that CHC theory is supported by a network of validity evidence
(developmental, neurocognitive, etc.) that stretches across 70 years and exceeds the evidence in support of any other psychometric theory of multiple cognitive abilities. Therefore, within the psychometric
tradition, it is the principal theory around which cognitive functioning should be interpreted. See
McGrew (2005) and Horn and Blankson (2005) for summaries of validity evidence for the CHC
constructs.

Misconception 3. Some abilities do not correlate highly with g (or general intelligence), such as Gs,
and therefore are not important to measure.
Response: The importance of assessing certain CHC broad abilities/processes should be guided by
referral concerns as well as their established relations to outcome criteria (e.g., academic
achievement). It is incorrect, however, to assume that an ability is unimportant solely on the basis of
its relation to g (see Flanagan, Ortiz, Alfonso, & Mascolo, 2006). For example, processing speed (Gs)
has been found to be a significant predictor of reading fluency (see Chapter 2) and should not
therefore be summarily dismissed because of its historically low g-loading.

Misconception 4. Combining tests from different intelligence batteries is an inappropriate and invalid
use of intelligence tests.
Response: The cluster scores derived from XBA are more reliable and valid than scores from
individual tests (i.e., subtests) because the XBA composites are based on the aggregate of two
qualitatively different and empirically strong narrow ability/processing indicators of a particular
broad ability construct. Whether these composites are drawn from a single battery or from across
batteries does not diminish their inherently superior psychometric properties. The issue of norm-
sample differences was addressed in the previous section, where it was shown that it creates no
significant limitation or problem for XBA. Indeed, because of the greater precision in measurement
and the application of theory to guide interpretation in a systematic way, XBA data will, in some cases, support more valid inferences than the results obtained from a single intelligence battery.
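To make the logic of such composites concrete, the brief sketch below shows one plausible way a cross-battery composite could be computed: each indicator is first placed on a common standard-score metric (M = 100, SD = 15) and the two indicators are then averaged. This is a minimal illustration under those assumptions only; the function names are ours, and the actual XBA procedures implemented in the XBA DMIA involve additional steps and classifications that this sketch does not reproduce.

# Illustrative sketch only (not the actual XBA DMIA computations): forming a
# composite from two qualitatively different indicators of the same CHC broad
# ability, whether drawn from one battery or from two different batteries.

def scaled_to_standard(scaled_score: float) -> float:
    # Convert a subtest scaled score (M = 10, SD = 3) to the standard-score
    # metric (M = 100, SD = 15).
    return 100 + 15 * (scaled_score - 10) / 3

def xba_composite(standard_score_a: float, standard_score_b: float) -> float:
    # Average two indicators (both on the standard-score metric) of the same
    # broad ability to form a composite estimate.
    return (standard_score_a + standard_score_b) / 2

# Hypothetical example: two Fluid Reasoning (Gf) indicators from different
# batteries, reported as scaled scores of 5 and 4.
gf_estimate = xba_composite(scaled_to_standard(5), scaled_to_standard(4))
print(gf_estimate)  # 72.5, which falls within the Below Average range

Because the two indicators are averaged rather than interpreted in isolation, subtest-specific measurement error is reduced, which is the psychometric basis for the claim that such composites are more reliable than individual subtest scores.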

Misconception 5. Because most current intelligence tests are grounded in CHC theory, there is not as
great a need to combine tests from different batteries to form broad ability/processing clusters.
Response: Most intelligence batteries yield broad ability/processing clusters that are derived from
qualitatively different aspects of the constructs presumed to underlie them. Apart from the WJ III,
however, no single battery provides adequate representation of all the major CHC broad
abilities/processes; and none of them, including the WJ III, comes close to measuring all the narrow
abilities/processes adequately. This, of course, would be impractical anyway. But there may well be
times when a referral necessitates the measurement of specific narrow abilities/processes that are
germane to an understanding of the presumptive cause of academic skill deficits. In these instances it
is likely that such abilities/processes are either poorly measured or not measured at all by a particular
battery, necessitating the use of an alternative battery or test. Crossing batteries in general, therefore,
may simply involve administering a core battery and supplementing it with broad ability/processing
clusters from another battery; that is, clusters that represent abilities not measured by the core battery. In
cases in which aberrant findings result, as well as in those instances in which measurement of specific
narrow abilities/processes is considered necessary, it is likely that there will be a need to use more
than what is offered by a single battery.

Misconception 6. Traditional assessment data are easily interpreted through the plotting of test scores
and well-established intra- and inter-individual discrepancy procedures. XBA data are not as easily
interpreted.
Response: XBA data may be interpreted in much the same way as traditional assessment data.
However, the XBA-integrated inter-individual analysis procedures are far in advance of those
associated with traditional batteries (e.g., ipsative) because they are psychometrically and theoretically
defensible and are conducted as part of a systematic approach to assessment. Also, XBA results can
now be plotted on a graph automatically via use of the XBA DMIA.
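As a concrete illustration of the discrepancy logic involved, the sketch below computes a conventional critical difference between two standard scores from their reliability coefficients, using the familiar standard-error-of-difference formula found in most test manuals. The reliability values shown are hypothetical, and this generic formula is offered only as an illustration; it is not necessarily the exact difference procedure or set of critical values applied by the XBA DMIA.

import math

def critical_difference(r_xx: float, r_yy: float, sd: float = 15.0, z: float = 1.96) -> float:
    # Critical difference at the chosen confidence level (z = 1.96 for p < .05),
    # based on the standard error of the difference between two scores.
    se_diff = sd * math.sqrt(2 - r_xx - r_yy)
    return z * se_diff

# Hypothetical example: two composites with reliabilities of .90 and .85.
print(round(critical_difference(0.90, 0.85), 1))  # 14.7

Under these assumptions, two composites would need to differ by roughly 15 standard-score points before the difference could be considered statistically significant, which is the same kind of judgment practitioners already make when interpreting traditional battery data.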

Misconception 7. The language of the XBA approach is not user friendly.
Response: The acceptance of CHC terminology into the common vernacular of psychological testing
means that the language used in XBA reports is no more confusing or difficult than that used in a
standard or traditional Wechsler report. Although the former uses CHC terminology (as opposed to
verbal/nonverbal or simultaneous/successive terminology), it is the ethical responsibility of the
practitioner to communicate the meaning of any psychological term, particularly to non-
psychologists. In point of fact, it is no more difficult to describe Visual Processing (a broad ability
within the CHC framework) than it is to describe Simultaneous Processing or Nonverbal ability.
Moreover, because nearly all current intelligence tests are based on CHC theory and detailed CHC
interpretive approaches have been developed for those that are not (e.g., WISC-IV; Flanagan &
Kaufman, 2004), all professionals engaged in the assessment of cognitive abilities and processes
should now be well-versed in CHC terminology.
REFERENCES

Alfonso, V. C., Flanagan, D. P., & Radwan, W. (2005). The impact of Cattell-Horn-Carroll theory on
test development and interpretation of cognitive and academic abilities. In D. P. Flanagan & P. L.
Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 185-202). New
York: Guilford.
Bracken, B. A., & McCallum, R. S. (Eds.). (1993). Wechsler Intelligence Scale for Children- Third
Edition: A monograph of the Journal of Psychoeducational Assessment. Knoxville, TN:
Psychoeducational Corporation.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge,
England: Cambridge University Press.
Carroll, J. B. (1998). Foreword. In K. S. McGrew & D. P. Flanagan, The intelligence test desk
reference (ITDR): Gf-Gc cross-battery assessment (pp. xi-xii). Boston: Allyn & Bacon.
Elliott, C. (2007). Differential Ability Scales-Second Edition. San Antonio, TX: PsychCorp.
Flanagan, D. P., & Kaufman, A. S. (2004). Essentials of WISC-IV assessment. New York: Wiley.
Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting
cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 314-325). New York: Guilford.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. O. (2000). The Wechsler intelligence scales and Gf-Gc
Theory: A contemporary approach to interpretation. Needham Heights, MA: Allyn & Bacon.
Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross-battery assessment. New York: Wiley.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference:
A guide to learning disability assessment (2nd ed.). New York: Wiley.
Horn, J. L., & Blankson, N. (2005). Foundation for better understanding of cognitive abilities. In D. P.
Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 41-68). New York: Guilford.
Kamphaus, R. W., Winsor, A. P., Rowe, E. W., & Kim, S. (2005). The history of test interpretation. In
D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and
issues (pp. 23-38). New York: Guilford.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines,
MN: American Guidance Service.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children- Second Edition.
Circle Pines, MN: AGS Publishing.
Kuhn, T. S. (1996). The structure of scientific revolutions. Chicago: University of Chicago Press.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 151-180). New York: Guilford.
McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities: Past, present and future.
In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and
issues (pp. 136-182). New York: Guilford.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-
battery assessment. Boston: Allyn & Bacon.
Roid, G. H. (2003). Stanford-Binet Intelligence Scales-Fifth Edition. Itasca, IL: Riverside Publishing.
Wechsler, D. (1939). Wechsler-Bellevue Scale of Intelligence. New York: The Psychological
Corporation.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2002). Wechsler Preschool and Primary Scale of Intelligence-Third Edition. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The
Psychological Corporation.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson Psycho-Educational Battery-Revised.
Allen, TX: DLM Teaching Resources.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Cognitive
Abilities. Itasca, IL: Riverside Publishing.

TEST YOURSELF

1. New developments in science are readily appreciated and quickly embraced by members of the established professional community. True or False?
2. The XBA approach emerged initially in response to the

(a) lack of theoretical foundation characteristic of nearly all intelligence tests.


(b) lack of or poor representation of cognitive constructs in many intelligence tests.
(c) absence of any systematic guidelines for interpretation from intelligence test
publishers.
(d) all of the above.

3. The most likely reason for the dramatic changes that have occurred in the
development and structure of intelligence tests in the past 5 years is due to

(a) a sudden collective psychometric epiphany by test publishers.


(b) psychometric and theoretical inadequacies having been exposed by XBA
methods.
(c) the Flynn Effect.
(d) unprecedented advances in measurement and intelligence theory.

4. Among the first intelligence tests to be based on a theoretical conceptualization of
human cognitive abilities were the

(a) WJ-R and K-ABC.


(b) WAIS and WISC.
(c) Stanford Binet: FE and Stanford Binet:5.
(d) Army Alpha and Beta.

5. The current guidelines for XBA require knowledge of at least one core and one
supplemental intelligence battery. True or False?
6. The introduction of a standard or common nomenclature for cognitive
abilities/processes promoted by XBA methods

(a) facilitates communication among professionals even across different disciplines.


(b) helps to ensure that operationalization of a given construct is more consistent.
(c) significantly improves the understanding and measurement of concepts such as
specific learning disability (SLD).
(d) all of the above.

7. Perhaps the most recent significant development in XBA relates to

(a) refinement in construct representation.


(b) advances in psychometric theory.
(c) the creation of software programs that automate and facilitate the entire process.
(d) lack of convergence in modern intelligence theory.

8. The XBA approach has sometimes been criticized (often unfairly) for which of the
following reasons

(a) being too complicated.


(b) being too time consuming.
(c) not being based on an internal norm sample.
(d) all of the above.

9. Although current intelligence batteries have improved significantly with respect to
coverage of CHC abilities, XBA remains an important option for practitioners
because

(a) it can assist in understanding and untangling aberrant test findings.


(b) no single battery provides options for measuring all of the abilities that may
become the focus of a given evaluation.
(c) guidelines for interpretation of cognitive abilities/processes from test publishers
are still lacking or atheoretical.
(d) all of the above.

10. Although the XBA approach allows for measurement of a wide range of cognitive abilities/processes, some subtests have low g loadings, so it is not really important that they be included in most assessments. True or False?

Answers: 1. False; 2. d; 3. b; 4. a; 5. True; 6. d; 7. c; 8. d; 9. d; 10. False

Seven

CROSS-BATTERY ASSESSMENT CASE REPORTS


This chapter includes two psychoeducational evaluations that were carried out following the methods
described in this book. The first case involves a first-grade student named Bobby Mandell, who has
deficits in speech as well as significant learning difficulties, despite having received remedial
intervention services. The evaluation was conducted to determine whether this student meets
eligibility requirements for special education services and to gain the information necessary to
develop an alternative remedial education plan. The second case involves a 14-year-old bilingual student of Hispanic heritage named Juan Pablo. He was referred for evaluation by his seventh-grade teacher, who reported concerns with behavior, but more so with academic skills, particularly in language arts. His teacher indicated that he appears to be well below grade level in his ability to absorb and apply written material and expresses himself poorly in writing. She is concerned about his work particularly because she believes he speaks English very well, with no accent, and that his academic difficulties are therefore not the result of having English as a second language. Despite
numerous intervention attempts to help Juan improve his schoolwork, including direct tutoring
before and after class, he has made very little progress. Members of the prereferral team
acknowledge that there are some family issues that may be affecting his classroom behavior and
academic performance; however, these factors are not seen as the primary reason for his poor
schoolwork. Consequently, the prereferral team has begun to suspect that a learning disability might
be present and therefore made a formal referral for special education evaluation. Juan was assessed
using the KABC-II as the primary instrument and the WJ III as the supplemental instrument to ensure a
comprehensive evaluation of CHC broad abilities. In addition, C-LIM was used to assess the degree to
which cultural and linguistic factors may have affected his test scores so that results could be
interpreted in the least discriminatory manner possible.


Case Study 1

CONFIDENTIAL PSYCHOLOGICAL REPORT


Name: Bobby Mandell
Evaluation Date(s): August 2006
School: Astor Elementary
Parents/Guardian: Joan and John Mandell
DOB: 04/19/2000
Address: 123 Smith Road
Jamesburg, NY 01111
Grade: 1 (2006-2007 school year)

REASON FOR REFERRAL

Bobby was referred by his parents for an independent evaluation of cognitive and academic
functioning. They reported that Bobby demonstrated significant academic difficulties throughout his
kindergarten year and, therefore, was identified as a candidate for retention. Bobby was also found
eligible for academic intervention services in the areas of reading and mathematics and received
these services in kindergarten.
Although school personnel suggested that Bobby's learning difficulties might be attributed primarily to visual issues (i.e., inadequate eyeglass prescription strength), his parents believed that his learning problems extended well beyond visual concerns. Nevertheless, Mr. and Mrs. Mandell had Bobby evaluated by a vision specialist at St. Joseph's Hospital. Bobby was diagnosed with bilateral amblyopia and, although his ophthalmologist suggested that his learning difficulties might be partly related to this condition, he reported that it probably does not fully explain such difficulties.
Having ruled out visual issues as a primary factor in his learning difficulties, Bobby's parents
continued to seek evaluations in an effort to better understand his struggles. Specifically, Mr. and Mrs.
Mandell had Bobby evaluated by a speech-language pathologist and an audiologist at a university-
based clinic. The results of both evaluations suggested specific cognitive and language deficits and
the speech-language evaluation, in particular, resulted in a diagnosis of Speech Impaired.
The present evaluation was undertaken to extend previous findings regarding Bobby's cognitive difficulties as well as to provide information regarding Bobby's current level of academic functioning. Bobby's parents also expressed an interest in translating relevant findings from the
present evaluation into educational recommendations that might assist him in meeting the academic
demands of the first-grade curriculum within an appropriate educational program.
BACKGROUND INFORMATION

Although the present evaluation was conducted during the summer, following his kindergarten school
year, Bobby has recently entered the first grade at Astor Elementary School. Bobby's rate of progress
in kindergarten was considered inadequate, as he was identified as a candidate for retention. He
received pull-out reading support in kindergarten and, according to school records, was scheduled to
receive Math and Language Arts support during first grade. In addition to academic difficulties noted
by school personnel, Mr. and Mrs. Mandell reported that Bobby has difficulty with retention and
recall of information and exhibits significant articulation problems (e.g., some words are slurred with
a lisp; he has difficulty producing certain sounds; some words are unintelligible).
Despite the specific academic difficulties described by school personnel and his parents, Bobby
reportedly pays attention in class and puts forth good effort during learning activities. Mr. Mandell
indicated that Bobby's motivation to learn is seen at home. For example, Bobby often asks his father
to complete academically related workbook pages with him. Despite motivation for learning, Mr. and
Mrs. Mandell are concerned that Bobby's attitude toward school will be adversely affected if his
learning needs are not adequately addressed. Although they appreciate the pull-out support offered by
the school district, they do not believe that it is sufficient. For example, Bobby became frustrated
recently by the first-grade curriculum, as evidenced by his crying in class when he did not understand
the task at hand.
DEVELOPMENTAL/HEALTH HISTORY

Information regarding Bobby's developmental and health history was obtained via reviews of prior evaluations and interviews with his parents. Bobby's parents reported that his physical development is within normal limits for his age. With respect to Bobby's developmental history, his mother indicated that pregnancy and birth were normal and uneventful. Although feeding difficulties were noted in infancy, developmental milestones, with the exception of speech and language development, were reportedly attained within normal limits. Information regarding Bobby's health history revealed chronic middle ear infections, bilateral myringotomies at age 2, and a diagnosis of asthma at age 3. Bobby began wearing glasses at age 4. Information obtained from recent evaluations suggests that Bobby's present health status is good. Bobby's parents deny any familial history of neurological, hearing, or psychological impairment. Notwithstanding, Bobby's two older brothers
(ages 12 and 15) have documented learning disabilities and are currently receiving special education
services.
ASSESSMENT/EVALUATION PROCEDURES

Kaufman Assessment Battery for Children-Second Edition (KABC-II) 8/29/2006
Kaufman Test of Educational Achievement-Second Edition (KTEA-II) selected subtests 8/31/2006
Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV) selected subtests 8/31/2006
Woodcock-Johnson III Tests of Cognitive Abilities (WJ III) selected subtest (Visual-Auditory Learning)
Review of Work Samples
Review of Select School Records
Review of Previous Testing

BEHAVIORAL OBSERVATIONS

Bobby was cooperative and appeared to put forth his best effort throughout the present testing.
Rapport was gradually established and steadily improved throughout testing. Bobby readily
responded to questions presented by the examiner and provided subjective evaluations of his school
experience and the present testing (e.g., interactions with teachers, difficult assignments, favorite
tasks). Bobby also initiated spontaneous conversation with this examiner and spoke about
nonacademic interests (e.g., his siblings, his rock collection). Bobby reported that he was not looking
forward to the upcoming school year and stated that he did not like school because the teachers did
not help him. When asked to expand upon this, Bobby reported that when he asked for help, he was
sometimes told that he needed to do his own work.
Throughout the evaluation, particularly during tasks that required a significant amount of
expressive language, Bobby exhibited noticeable articulation problems. For example, Bobby
pronounced the word "guitar" as "sar" and the word "picture" as "pixer." Nevertheless, Bobby readily
attended to the tasks presented but often asked for repetition and clarification at certain points
throughout the evaluation. He especially liked a bag filled with tangible reinforcers that was used by
the examiner throughout the evaluation. He was able to choose an item from this bag following the
completion of a few subtests and immediately preceding a break. He responded well to rewards for
working hard.
In general, Bobby persisted on items that he found difficult, needing only minimal encouragement
to continue. The present results appear to represent a reliable and valid estimate of Bobby's current
level of cognitive and academic functioning in the areas assessed.
ASSESSMENT FINDINGS

Academic Functioning

Bobby was administered a select set of tests from the Kaufman Test of Educational Achievement-
Second Edition (KTEA-II). Specifically, Bobby was administered a reading decoding test, a
prewriting and basic writing skills test, and a measure assessing basic math skills (e.g., counting,
number matching). Scores on these three tests yielded a Comprehensive Achievement Composite
(CAC) of 74 (69-79), which is ranked at the 4th percentile and is classified as Below Average. A
discussion of Bobbys performance within each academic domain follows.
Reading

Bobby's performance in the area of basic reading skills (e.g., decoding) was estimated to be Below
Average and is considered a Normative Weakness (KTEA-II Letter & Word Recognition standard
score = 79; 75-83, 8th percentile). His performance on this task is generally consistent with school
data (i.e., language arts eligibility letter), suggesting that Bobby requires assistance in this domain. In
addition to school records, Bobby himself reported, in the context of the present evaluation, that he is
unable to read.
An analysis of Bobby's performance on the KTEA-II reading task revealed difficulties with letter
recognition, sound-symbol relationships, and production of letter sounds. First, Bobby incorrectly
identified an E as a G, a G as an N, and a lowercase v as an E. When asked to listen to letter sounds,
Bobby identified the f sound as corresponding to the lowercase letter g and identified the short a
sound as corresponding to the printed letter m. Finally, when asked to produce the sound that various
letters make, Bobby instead merely named the letters that were printed on the page. Given Bobby's numerous errors with the letter identification and sound matching/production items, he did not reach the items that required him to identify words. In addition to Bobby's letter identification and phonemic awareness difficulties, he was unable to provide an accurate oral reproduction of the alphabet. For example, on two occasions, Bobby spontaneously recited the alphabet, each time omitting the letters G and J. In addition, he stated "CUB" for "TUV."
Writing

Similar to his performance in the area of basic reading, Bobby's basic writing skills were estimated
to be Below Average and are also considered a Normative Weakness as compared to same-age peers
from the general population (KTEA-II Written Expression standard score = 73; 64-82, 4th percentile).
When asked to write his name, Bobby only provided his first name and did not correctly form the
letters y or b. He was able to trace the letter S and reproduce letters from near-point copy, namely, L,
S, T, and W. Although he could write the letters A and H without a visual model, he was unable to write
the letters F and N.
Beyond the inconsistency in his knowledge of letters, an analysis of Bobby's writing suggests difficulty with the mechanical aspects of writing, namely, spacing and letter formation. Even when provided with line guides to form upper- and lowercase letters, Bobby's letters often extended beyond
these lines and were formed, in one instance, backward and upside down.
Mathematics

As with reading and writing, Bobby's performance in math was Below Average and also considered a
Normative Weakness (KTEA-II Math Concepts and Applications standard score = 76; 68-84, 5th
percentile). Bobby was able to demonstrate knowledge of some basic number concepts, such as
elementary counting skills, number identification, and concepts related to size (smallest, tallest).
Notwithstanding, Bobby had difficulty with counting skills that were paired with a motor requirement (e.g., "touch each item as you are counting it"). He also had difficulty with simple concepts of addition and subtraction when represented by verbal concepts ("I have one candy bar and you have two; how many do we have altogether?"; "If 1 person leaves this line of 4 people, how many people will be left in this line?").
Listening Comprehension

Bobby's performance in the area of listening comprehension is classified as Below Average/Normative Weakness (KTEA-II Listening Comprehension standard score = 78; 68-88, 7th
percentile). The listening comprehension task required Bobby to listen to a series of passages
presented on a CD and respond to either literal or inferential questions that were presented orally
following each passage.
Although Bobby appeared to actively attend to the stories as they were being presented, he had
difficulty with both literal and inferential questions and, quite often, appeared to rely on his own
knowledge base (rather than story content) to answer questions. For instance, if presented with a story
about a little boy who did not like to eat vegetables, and then asked which vegetable the boy would not
eat, Bobby might say "broccoli" because that is his least favorite vegetable. Thus, although Bobby
appeared to process the questions asked, he often answered the questions without reference to the
story.
Cognitive Performance

In response to the present referral, Bobby was administered the Kaufman Assessment Battery for
Children-Second Edition (KABC-II) and select tests from the Wechsler Intelligence Scale for
Children-Fourth Edition (WISC-IV). Ten scores from the KABC-II combine to yield a measure of
overall cognitive functioning, called the Fluid-Crystallized Index (FCI). Bobby was not able to complete all 10 subtests; therefore, the FCI could not be calculated. His current level of cognitive functioning is thus best understood by examining the separate Scales of the KABC-II, including
Knowledge/Gc, Learning/Glr, Simultaneous/Gv, and Short-Term Memory/Gsm. Follow-up testing
within the domains of Gc, Gv, and Gf was conducted with the WISC-IV and is discussed in the
pertinent sections that follow. The subtests comprising the WISC-IV Processing Speed (Gs) Index
were administered to Bobby to obtain information regarding the efficiency with which he processes
information. Auditory Processing (Ga) was not assessed in the present evaluation because several
auditory processing measures were administered as part of Bobby's speech/language and audiological evaluations. A description of Bobby's cognitive performance follows.

Crystallized Intelligence (Gc)


Crystallized Intelligence (Gc) refers to an individual's breadth and depth of knowledge, including verbal communication, general information, and reasoning with previously learned procedures. Bobby's ability within this domain was assessed through tasks that required him to provide the names for a series of pictured objects (KABC-II Expressive Vocabulary SS = 8) and to point to a picture of (or provide a verbal label for) a concrete or abstract verbal concept that matched characteristics stated by the examiner (KABC-II Riddles SS = 7). These subtests provide a measure of Bobby's vocabulary
knowledge (Expressive Vocabulary) as well as his general development and understanding of words
and sentences (not requiring reading), in spoken language (Riddles). The difference in these scores
was not statistically significant, indicating that Bobby performed similarly on both subtests. Overall,
Bobby obtained a Crystallized Intelligence (Gc) composite score of 88 (79-99), which is ranked at the
21st percentile and is classified as Average Range/Within Normal Limits.
Although Bobby's performance on the KABC-II verbal subtests was estimated to be within normal
limits, he was only required to give one-word responses. To determine how Bobby performs in the
Gc domain when required to give more lengthy responses either by answering questions or defining
words, the WISC-IV subtests that comprise the Verbal Comprehension Index (VCI) were administered.
On these tasks, Bobby was required to provide names for pictured objects or to provide definitions
for words stated by the examiner (WISC-IV Vocabulary SS = 7), answer questions based on his
understanding of general principles and social situations (WISC-IV Comprehension SS = 3), and
draw conceptual similarities between words (WISC-IV Similarities SS = 2). The difference between
his highest and lowest performances on these tests is statistically significant, rendering the VCI
nonunitary and noninterpretable. To better understand Bobby's functioning on the VCI subtests, his
scores were examined using XBA interpretive guidelines. Analysis of his scores within this
framework indicated that a Crystallized Intelligence (Gc) cluster can be formed based on the
aggregate of the Comprehension and Similarities subtest scores. Bobby earned a Gc cluster of 60,
which is ranked at the 1st percentile and is classified as Lower Extreme/Normative Weakness.
Conversely, Bobby's performance on the Vocabulary subtest was higher than his Gc cluster and
consistent with his KABC-II Gc performance. When Bobby was required to respond to verbal tasks
with one word (e.g., naming common objects), his performance was Within Normal Limits. However,
when asked to reason with words and general information and provide more lengthy responses,
Bobby's performance was significantly lower. Overall, Bobby's ability to reason with verbal
information is deficient.

Short-Term Memory (Gsm)


Short-Term Memory (Gsm) is the ability to hold information in immediate awareness and then use it
within a few seconds. Bobby's Gsm ability was assessed through tasks that required him to repeat a
series of numbers in the same order presented by the examiner (KABC-II Number Recall = 5) and
touch a series of silhouetted objects (printed on a card) in the same order as the examiner named them
(KABC-II Word Order = 3). These tasks primarily measure auditory memory span. Bobby obtained a
Gsm cluster of 66 (59-77), which is ranked at the 1st percentile, and is classified as Lower
Extreme/Normative Weakness.
Although Bobby reported that he perceived the numbers task as "good," he was only able to recall
two-digit number sequences. His ability to recall three-digit sequences was inconsistent and he was
unable to recall any of the four-digit sequences. Errors involved omissions only. Numbers were
always recalled in the correct sequence (e.g., if the number string was 5-2-1-4, Bobby might say 2-
4). On the word task, Bobby repeated the stimuli as they were being said to him, and, upon being
shown the symbol card, quickly put one or more fingers simultaneously on all the stimuli that he
believed were named. Bobby even commented that it was hard to touch them all when the symbols
were not close together. Although Bobby was reminded not to touch the symbols at the same time but,
rather, to touch them in the order that he heard them, he continued to have difficulty. It was not until
this examiner demonstrated several times what touching symbols in order meant that he began to
touch the target symbols separately. Bobby's extreme difficulty with sequencing as demonstrated on
these Gsm tasks is consistent with information provided in his speech-language evaluation. Overall,
Bobby has a disorder in the basic psychological process of short-term memory. This process is
strongly related to performance in reading, spelling, writing, and the retention of oral directions.

Long-Term Retrieval (Glr)


Long-Term Retrieval (Glr) is defined as an individual's ability to store information efficiently and retrieve it later through association. On tasks that measured Bobby's Glr, his performance was in the
Lower Extreme/Normative Weakness range when required to learn the word or concept associated
with a particular rebus (or drawing) and then read phrases and sentences composed of these rebuses
(KABC-II Rebus = 3) and in the Average Range when required to learn several nonsense names of
various shells, plants, and fish, and correctly point to each of them when each is presented among an
array of previously learned stimuli as well as unknown stimuli (KABC-II Atlantis = 9). The difference
between his performances on these tests is statistically significant, rendering his overall Glr cluster nonunitary and noninterpretable. To better assess and understand Bobby's functioning in this domain, a second test involving rebuses was administered (i.e., WJ III Visual-Auditory Learning). On this task, Bobby's performance again fell in the Normative Weakness range (standard score = 70). Thus, a cluster was formed based on the aggregate of KABC-II Rebus and WJ III Visual-Auditory Learning (i.e., 68; Lower Extreme/Normative Weakness).
On the Learning/Glr subtests, successful performance is a function of the efficiency with which
information is transferred, stored, and recalled, all of which is dependent upon (a) contextual
demands; (b) sequential demands; and (c) the nature and frequency of feedback provided. Bobby's
performance on Glr tasks was within normal limits only when the information to be retrieved was
limited and decontextualized (e.g., single- versus multiword response), when there was no demand to
sequence responses, and when immediate, corrective feedback was provided throughout the task.

Visual Processing (Gv)


Visual Processing (Gv) includes spatial orientation, the ability to analyze and synthesize visual
stimuli, and the ability to hold and manipulate mental images. Bobby's Gv ability was assessed
through a task that required him to view a set of pictures and identify the picture that did not belong
with the others (KABC-II Conceptual Thinking, SS = 8). In addition, Bobby was required to move a
toy dog to a bone on a checkerboard-like grid using the path involving the smallest number of moves
(KABC-II Rover = 7). Bobby obtained a Gv cluster of 87 (82-92), which is ranked at the 19th
percentile and is classified as Average Range/Within Normal Limits.

Fluid Reasoning (Gf)


Fluid Reasoning (Gf) involves the ability to reason and solve problems that often include unfamiliar
information or procedures. Gf is generally manifested in the reorganization, transformation, and
extrapolation of information. Bobby's Gf ability was assessed through tasks that required him to
analyze the parts of an incomplete logic puzzle and complete the pattern by selecting the correct
stimulus from an array of options at the bottom of the page (KABC-II Pattern Reasoning = 5) and
view two or three rows of pictures and select one picture from each row to form a group with a
common characteristic (WISC-IV Picture Concepts = 4). Specifically, these tasks assessed Bobby's
ability to reason both inductively and deductively. Bobby obtained a Gf cluster score of 73 (68-78),
which is ranked at the 3rd percentile and is classified as Below Average/Normative Weakness.
Bobby's ability in this domain is deficient and therefore can negatively impact his performance on
academic tasks that require him to make inferences and draw conclusions, especially when such
abstractions must be derived from visual information (e.g., making predictions using visual stimuli
such as graphs and charts).
Processing Speed (Gs)


Processing Speed (Gs) involves the speed and efficiency in performing automatic or very simple
cognitive tasks. Bobby's Gs ability was assessed through tasks that required him to copy symbols that were paired with simple geometric shapes using a key (WISC-IV Coding A = 7) and to search a group of symbols and indicate, by marking a "yes" or "no" box, whether a target symbol is in the group
(WISC-IV Symbol Search = 2). These tasks primarily measured the speed at which Bobby could make
visual discriminations among symbols and perform tasks that are relatively easy or that require very
simple decisions.
On Coding, Bobby referenced the key for each item copied. He did not appear to retain any of the
symbol-shape pairings until the very end of the task, when he was observed to independently form
one symbol without reference to the key. On Symbol Search, Bobby indicated quickly that he could
not read. He was assured that there was no reading required other than marking either "yes" or "no" for each item. Additionally, the "yes" and "no" boxes were identified for Bobby within sample and
practice item sessions. Notwithstanding, of 25 items attempted, Bobby inaccurately identified 12 items
as either present or absent from the search group. Although these inaccuracies appeared to stem
partly from scanning difficulties, it is conceivable that Bobby may have also had difficulty
remembering the written labels (i.e., "yes" and "no"). As a result of the statistically significant difference between Bobby's Coding and Symbol Search performances, the WISC-IV PSI is
nonunitary and noninterpretable. A third measure of Gs was therefore administered to better
understand Bobby's functioning in this domain. On the WISC-IV Cancellation subtest, which did not require word recognition, Bobby's performance was within the Average Range and consistent with his performance on Coding. Therefore, Bobby's broad Gs ability was based on the aggregate of his
performances on Coding and Cancellation. He earned a Gs cluster of 88, which is ranked at the 21st
percentile and is classified as Average Range/Within Normal Limits.
Figures 7.1 through 7.3 show the three tabs of the XBA DMIA on which data were entered for
Bobby. Figures 7.4 and 7.5 show two additional tabs of the XBA DMIA that include graphs of Bobby's
performances.
SUMMARY AND DATA INTEGRATION

Bobby was referred for a psychoeducational evaluation at the request of his parents. The measures
administered to respond to this referral indicated that Bobby's functioning ranged from Lower Extreme/Normative Weakness to Average Range/Within Normal Limits across the various academic and cognitive domains that were evaluated. A review of Bobby's prior evaluations and school records revealed specific difficulties with basic reading, writing, math, and listening comprehension skills.
Based on a review of cognitive findings, Bobby's academic difficulties appear to be explained in part by specific cognitive weaknesses in the areas of Auditory Processing (Ga; based on previous evaluations), Short-Term Memory (Gsm), and Fluid Reasoning (Gf) with both verbal and visual information. The totality of Bobby's cognitive weaknesses helps to explain his difficulties in specific academic domains. For example, Bobby's weaknesses in the Crystallized Intelligence (Gc) domain inhibit his reading acquisition and development of listening comprehension.
In addition, his Gf difficulties detrimentally affect his reading, writing, and math skills. That is, Bobby's reasoning difficulties constrain the degree to which he can draw inferences, make
predictions, or abstract information. As such, his weakness in this area limits his ability to
comprehend text beyond the literal level. Furthermore, thinking tasks that involve abstractions or that
require making predictions (e.g., thinking of a title for a story, solving word problems) will also
present difficulty for Bobby.
Another area that likely inhibits Bobby's academic performance is Gs. For example, Bobby is often
slow to complete tasks and needs an extensive amount of time to process incoming information,
particularly because information needs to be repeated several times before it is used in the manner
intended.
While Gsm and Gf performances were consistent and represented Normative Weaknesses, Bobby's performance on Glr, Gc, and Gs tasks was variable. First, Bobby's retrieval ability ranged from
Lower Extreme to Average Range and suggested that his ability to remember and retrieve
information is partly linked to the contextual and sequencing demands of a task and the provision of
immediate, corrective feedback. In the present evaluation, Bobby performed better when required to
retrieve limited verbal information (single names) versus connected stimuli (sentences), when there
were minimal to no constraints on the sequence of the stimuli retrieved, and when he received
corrective feedback consistently and immediately upon committing an error. While Bobby
demonstrated the ability to process and retrieve brief instructions, as instructions became more
lengthy or were presented only once, and via one modality (orally), his ability to retain information
deteriorated. Additionally, Bobby's generally slow processing speed constrains his ability to process
instructions initially and in their totality.
In regard to retrieval ability, it is important to realize that Bobby's ability to retrieve information is
directly related to his ability to initially encode information. As such, Bobby needs to be given
sufficient time to encode information and needs to be offered strategies to make the information to be
encoded salient (e.g., through repeated rehearsal of information, use of mnemonic devices, direct
teaching regarding strategies that make stimuli more meaningful, experiential learning).
Despite Bobby's limited or variable performances within specific cognitive domains, his visual
processing was within the Average Range. This finding suggests that the visual modality may be a
useful adjunct to orally presented material. Although there is a history of visual difficulties and
Bobby's eyeglass prescription has changed recently, it is this examiner's understanding that Bobby
should be able to deal with near-point visual information normally and requires only minimal
accommodation (e.g., preferential seating) to ensure that he can fully perceive distant visual stimuli
(e.g., writing on a chalkboard).
Diagnostic Impressions

Bobby meets criteria for Specific Learning Disability. He has a constellation of Below Average
academic aptitudes (i.e., memory, reasoning with verbal information, auditory processing) that are
consistent with his academic skill deficits. See Rapid Reference 7.1. Because Bobby also has a Speech
Impairment, he meets criteria for the educationally handicapping condition of Multiply Disabled in
New York State.
Based on the findings of this evaluation, it seems clear that the regular education interventions
implemented and proposed by Bobby's school thus far are useful, but are insufficient to adequately
address his underlying cognitive deficits. A review of these data coupled with data from previous
evaluations suggests the need for remediation, interventions, and accommodations that are outside the
regular education environment.

Rapid Reference 7.1


SLD Assistant

Although the SLD Assistant yielded a g-value of .83 (see Figure 7.6), indicating that his
underachievement does not in fact occur within an otherwise normal ability profile, his cognitive
performances were clearly attenuated by a comorbid condition, namely Speech Impairment. As
such, a diagnosis of Specific Learning Disability is warranted.

RECOMMENDATIONS

A primary goal of cognitive assessment is to generate information that may be utilized to develop
effective interventions. Irrespective of the types of interventions developed, routine monitoring of
their effectiveness must play an integral role in whatever recommendations are developed to address
the observed academic difficulties and concerns. The integration of data from standardized tests,
work samples, evaluation of prior assessment results, and parent and teacher reports provided the
basis for the following recommendations for Bobby.
Vocabulary Development

Extend Bobby's language in the context of routine conversations (e.g., if Bobby comments that a question was "really easy," one could say, "yes, that one was really simple"). Providing Bobby with semantically similar words in a naturalistic manner will aid in expanding his vocabulary.
Increase Bobby's exposure to more age- and grade-appropriate vocabulary. Although Bobby's reading skills appear to still be at the level of letter identification, he can be exposed to grade-appropriate books by utilizing a "books on tape" method. It is also important to allow Bobby to choose high-interest books to increase his likelihood of attending for substantial lengths of time. Listening to such books may serve to improve Bobby's receptive vocabulary while
controlling for any issues Bobby may have decoding text above his instructional reading
level. At the same time, exposure to words would be couched within a contextual format,
which may provide Bobby cues as to the meaning of newly encountered words.

General Language

Consider recommendations outlined in speech/language and audiological evaluations.


Teach and reinforce positional (e.g., first/last), directional (e.g., over/under), and temporal
(e.g., before/after) concepts.
Within the classroom, Bobby's teachers should be aware that his initial responses might not always be reflective of his knowledge of a topic. He may need prompting to provide an answer; he may need encouragement to expand upon an answer; or he may need additional context to help him organize and retrieve his thoughts (e.g., if asking about how someone felt in a story, rather than ask, "How did Sally feel?" provide additional context: "After Sally's friends left her in the park, how did she feel?").

Long-Term Retrieval

Pair new concepts or information to be learned with meaningful stimuli or overlearned material. This meaningful pairing and linking of new knowledge to previously learned
material might facilitate the degree to which Bobby initially encodes, and subsequently
retrieves, information.
Provide Bobby with sufficient time to demonstrate his knowledge of learned information. This
can likely be accomplished with the provision of extended time on tests and completion of
classroom tasks.
Consider using mnemonic devices to assist Bobby in the encoding and retrieval of information
(e.g., using the acronym HOMES to facilitate Bobby's ability to remember and retrieve the
names of the Great Lakes). Encourage Bobby to participate in the creation of such mnemonic
devices.
Before introducing new stories or reading material, try to access prior knowledge of the topic
and include new vocabulary (e.g., "Have you ever been to the zoo, Bobby? Well, there is an animal that we are going to read about, called a lemur, that kind of acts like a monkey"). Such
activities will help Bobby associate the information from his reading with prior experiences
and known information.

Memory

Teach Bobby a variety of strategies (e.g., chunking, use of mnemonic devices, visualization) to
increase the likelihood of him remembering specific information.
Consider extended time on tests and classroom tasks.
Attempt to break down information that Bobby is required to process into smaller pieces or
present the information following a sequential structure (e.g., first, next, finally), which would
allow him to more readily identify and focus on smaller units of information at a time.
Use game-like formats (e.g., classic memory games such as Concentration ) to work on
improving memory skills and to point out strategies for remembering information such as
specific visual features (e.g., color) or verbal mediation (e.g., "this silly hat goes with this silly hat").
Allow Bobby wait time when requesting a response (e.g., ask a question and direct Bobby to
think about it before coming back to him for an answer). Alternatively, build in wait time by
recognizing when he is ready to respond before calling on him.

Writing

Allow Bobby to focus on one aspect of writing at a time (e.g., organizing thoughts, developing
content, focusing on basic mechanics).
Ensure that feedback provided to Bobby is direct and concrete.
Allow Bobby the opportunity to check his own work.
To work on fine motor skills, have Bobby engage in visual-motor (e.g., tracing) and motor
planning activities (e.g., mazes).
Have Bobby write words using different forms of media (e.g., finger paints, a tray of salt or
sand, clay) to add a tactile dimension to his visual and auditory perception of words and
letters.

Reading

To work on Bobby's letter identification and formation skills, consider using a text involving a
multisensory approach such as Alphabet Art: With A to Z Animal Art and Fingerplays, by Judy
Press, published by Vanwell Publishing Co., Ltd. This book, which is appropriate for children
ages 2 through 6, works on building alphabet knowledge through a hands-on approach.
Activities include ABC letter hunts, fingerplays, reading rebus picture sentences, and creating
make-and-touch letters (e.g., bumpy b's, fluffy f's, quilted q's, and zig-zag z's).
Consider utilizing books on tape for selective reading assignments (e.g., have a teacher or
parent tape record a class book for Bobby). This technology can provide Bobby with the
opportunity to process information in multiple modalities as well as provide him with a model
to follow when attempting to decode words and build a basic sight-word vocabulary.
Ensure that Bobby has sufficient time to review words contained within class writing/lecture
materials. Encourage him to review a glossary of terms with an adult prior to reading (or
listening to stories) to increase comprehension.
Read to Bobby at home. Encourage him to use story readers or books on tape in this
environment as well.
Read preprimer books to Bobby that use repetitive patterns and visual cues so that he can try to
memorize them and read along.
Consider using an oral cloze-type procedure when reading to offer Bobby opportunities for success (e.g., "the cat in the __________"; "the itsy bitsy spider climbed up the spout").
When reading to Bobby, ask periodic questions throughout the story, especially those relating to order (e.g., "What came first?" "What happened before this?" "What happened at the end?").

Math

Encourage Bobby to utilize large graph paper for tasks and tests involving mathematical
computations.
Consider providing Bobby with a list of key math terms to assist him in the completion of
math word problems. This list should be constructed so as to contain operational symbols and
various words that denote the use of those symbols. Although Bobby may be unable to
independently read the words, he might be able to visually match them and then reference
which operation is to be used.
Highlight important instructions and/or operational symbols (e.g., "simplify answers," "circle final response," "show all work") on Bobby's math tests.
For newly introduced conceptual or computational tasks, assist Bobby with the first few items
on a task, when necessary, and provide him with a permanent model to follow.
Allow Bobby to use manipulatives for simple counting tasks and encourage a "count and move" system (e.g., count a block, then move it to the side).
Use songs and hands-on activities to teach basic math concepts (e.g., days of week, names of
shapes).

Miscellaneous

In the context of the present evaluation, Bobby demonstrated the ability to expand upon his
initial responses with minimal encouragement. As such, when Bobby provides a partially
complete written or oral response during academic tasks, encourage him to expand upon his
initial response prior to providing corrected or teacher-directed responses.
Continue to perform checks for understanding as Bobby is completing academic tasks to ensure
that he has understood, and is continuing to follow, initial task instructions.
If television viewing is allowable in the home, try to incorporate shows that focus on early
literacy skills (e.g., Sesame Street, Between the Lions).
Check for Bobby's understanding of task requirements by asking him to repeat or paraphrase
the instructions required to complete a task. Gradually reduce this repetition/paraphrase
requirement and simply ask Bobby if he understands the task following the presentation of
oral/written directions. An ultimate goal would be to encourage Bobby to independently, and
directly, request assistance with a task.

Jennifer T. Mascolo, PsyD


Licensed Psychologist


Figure 7.1 Data Entry on WISC-IV Tab of XBA DMIA


Figure 7.2 Data Entry on KABC-II Tab of XBA DMIA


Figure 7.3 Data Entry on CHC Tab of XBA DMIA


Figure 7.4 Graph of Bobby's KABC-II and KTEA-II Data from the XBA DMIA


Figure 7.5 Graph of Bobby's Cross-Battery Data and KTEA-II Data from the XBA DMIA


Figure 7.6 g-value for Bobby from the SLD Assistant


Case Study 2

Name: Juan Pablo


Birthdate: 12/01/1992
Grade: 7
Ethnicity: Hispanic/Ecuadorian
Evaluation Date(s): 01/07-02/07
Report Date: 02/01/2007
Chronological Age: 14-2
School: Greenpoint Elementary
Language of Instruction: English
Native Language: Spanish

PSYCHOLOGICAL ASSESSMENT-FOR-INTERVENTION REPORT

An effective psychological assessment report accomplishes four things: (a) it identifies and describes
the significant elements in the individual's environment and experience that may be related to the reported difficulties; (b) it assesses and describes the present status of the individual's functioning in those areas suspected to be problematic; (c) it offers an opinion regarding the possible reasons for the individual's reported difficulties; and (d) it links assessment results with specific strategies for
intervention and remediation.
In order to accomplish these goals and provide information that is helpful in understanding the
nature of and reasons for an individual's current functioning, this psychological assessment report uses a
narrative, plain English approach that minimizes irrelevant information and technical terms and
language. The plain language format of this report assists in providing information in such a way that
it may be useful to anyone who may be involved in making decisions regarding intervention by
avoiding needless and confusing psychological jargon. In addition, attached at the end of this report is
a graphical summary of test results that facilitates comprehension of their meaning or significance.
REASON AND PURPOSE OF ASSESSMENT

Juan was referred for an evaluation by his mother, Ms. Pablo, who was concerned about his poor
academic performance and emotional well-being. According to his mother, and despite repeated
attempts at intervention and remediation, Juan has a history of poor grades and has been required to
attend summer school on several occasions in order to be promoted from grade to grade. She also
states that Juan appears to have a difficult time studying and trouble following the class content and
subject matter. Juan has been given direct instruction in both small groups and individually, but has
not demonstrated significant improvement. In addition, Juan's feelings and emotions at home worry
Ms. Pablo greatly. For example, she indicated that he prefers to be left alone, yet, he frequently
confronts other family members and tends to react in an angry manner that is inappropriate to the
situation.
Additional concerns were raised by Juan's therapist, Ms. Agarosa, regarding his academic
performance. She states that Juan started receiving counseling about 3 months ago to address the
emotional issues described previously. She also reported that at times it is very difficult to understand
his speech. She feels that the way he speaks may be negatively affecting his social interactions. Ms.
Agarosa also indicated that Juan's mother is concerned about the possibility that he might become a
high school drop-out if the situation continues without resolution.
Juan's academic and emotional difficulties have made Juan's mother and his therapist wonder
whether the underlying cause of his problems might be some kind of learning disability.
Therefore, this assessment was conducted specifically to evaluate the nature of Juan's low
achievement and emotional functioning and determine whether he might have a disorder in a
particular cognitive ability or process. Results from this assessment are to be used to guide the
decision-making process in developing recommendations and intervention strategies, as may be
necessary and appropriate in this case.
DESCRIPTION OF PROCEDURES

This assessment was conducted in a systematic manner by first collecting information from multiple
sources such as a review of records, interviews, actual work samples, general health screening
results, and informal testing. This information helped in finding out whether there are any
environmental, instructional, or experiential factors present that could be the cause of the reported
difficulties. If it can be reliably determined that there do not appear to be any such external reasons
primarily responsible for the reported difficulties, additional procedures are then employed to
evaluate the possibility that some type of internal dysfunction is present. These methods include
focused formal testing, such as scales, questionnaires, or standardized tests and batteries, and are
given to generate additional information with which to assess those abilities and processes that may
be the primary cause of the reported difficulties. Overall, this process helps generate specific and
relevant information while avoiding needless, invasive, and redundant testing.
STATEMENT OF VALIDITY OF ASSESSMENT RESULTS

The ecological methods and procedures used in the course of this assessment are specifically
intended to enhance the meaning of patterns seen in the data as well as reduce potential bias and
discrimination inherent in the interpretation of the meaning of any single test score or combination
of scores. In general, the following steps were used in order to increase the validity of the findings:
(a) testing was conducted in English with consideration regarding any exposure or experience with a
second language; (b) when using norm-referenced measures, those with the most appropriate norms
were selected; (c) tests that focus on assessing the specific constructs (abilities and processes) in
question were utilized rather than those that provide only broad general information; (d) less
culturally and linguistically biased assessment methods were used (e.g., evaluation of work samples);
(e) results were interpreted within the context of Juan's unique cultural and linguistic background; and
(f) conclusions were based on multiple sources of information and not any single score or
procedure.
In this case, the area of most concern with respect to bias involves the use of norm-referenced tests,
because it is unclear whether such tests or assessment tools are fair and equitable or have norms that
are adequately representative of Juan's Spanish/English linguistic background and
Hispanic/Ecuadorian cultural heritage and experience. As such, strict interpretations made on the
basis of such test results cannot be considered conclusive or completely reliable estimates of
Juan's actual or true functioning. Valid interpretations of results from norm-referenced
cognitive ability/processing tests used in this evaluation were achieved in three ways: (1) through the
collection and use of information about how other children like Juan typically perform on such tests;
(2) through use of the Culture-Language Interpretive Matrix, where appropriate; and (3) through use
of tests that display the smallest average differences for culturally and linguistically diverse
individuals, notably the KABC-II. In this manner, the conclusions and opinions offered regarding
Juan's functioning and described in this report are believed to be as valid and as nondiscriminatory as
possible.
EVALUATION OF INFLUENCES ON LEARNING

Careful examination of such things as cultural/linguistic difference, environmental or economic
disadvantage, level of acculturation, effectiveness and appropriateness of instruction, and educational
experiences indicates that Juan's reported difficulties can be at least partially attributed to one or more
of these factors, particularly the fact that Juan is not a native English speaker and that there are two
languages spoken in the household. In addition, there appears to be a significant family history of
conflict that also seems to be playing a role in Juan's reported difficulties.
During an interview, Juan's mother indicated that Juan has learned Spanish at home as his native
language since birth. Moreover, Ms. Pablo indicated that although Spanish is spoken most frequently in
the home, Juan refuses to speak in Spanish and instead insists on answering back in English only. It
seems reasonable that Juan's current difficulties in school may be due partly to his dual-language
exposure. Exposure to two languages can significantly change the trajectory and pattern of an
individual's language development in each. This happens because the splitting of time and
experience between the two languages in the early years causes a very slight delay in the acquisition
of both. If both languages continue to be developed at age-appropriate levels, for example through
formal instruction or sufficient language modeling by parents, academic development and
achievement have a good chance of becoming age and grade appropriate. If, however, development in
one or both languages is interrupted, for example if he is instructed in English only and his parents are
unable to maintain academic development in Spanish at home, then a significant delay will develop,
one that cannot be overcome through general school instruction. Unfortunately, it appears that Juan's
school may not fully understand these principles of dual-language acquisition and has apparently
paid no attention to, and provided no instruction in, his native language. Instead, Juan is immersed in
English instruction that, although it promotes easy acquisition of conversational skills in English,
creates a learning gap between him and his monolingual English-speaking peers. The gap is severe,
and the ability to communicate in English that Juan will acquire is simply not enough to foster the
age- or grade-appropriate proficiency necessary to be competitive in school. Therefore, it seems
reasonable that Juan's current difficulties in the classroom may be, to some degree, due to this
linguistic and cultural difference. It is well known that the cognitive, linguistic, and academic
developmental patterns of English learners vary considerably from those of native English speakers
and lead to patterns of achievement that are also consistently lower, not because of a lack of ability but
because of inappropriate instruction.
Other familial factors also appear to have a role in the difficulties reported by Juan's mother and
therapist. Ms. Pablo reports that she and Juan's father are divorced and that Mr. Pablo, Juan's father, is
back in his country of origin, Ecuador. Ms. Pablo reported that Juan's father was an alcoholic and was
both physically and emotionally abusive to her throughout the marriage. She reports that Juan
witnessed the abuse on several occasions, including one in which the family had to leave the house
for their safety. Ms. Pablo divorced Mr. Pablo in 2000. Juan had inconsistent contact with his father
until 2005, when his father left the country to return to Ecuador. Ms. Pablo stated that she remarried
about 2 years ago. She indicated that Juan has a fine relationship with his stepfather, but they do not
interact very much. However, she reports that Juan has a poor relationship with his older sister's
boyfriend, who lives in the house with them. These family circumstances appear to be having a major
impact on Juan's emotional well-being and behavior both within the school and at home. Stressful
events such as a history of abuse, divorce, and new family members may well be adversely affecting
Juan's feelings and his motivation to perform well in school.
EVALUATION OF HEALTH AND DEVELOPMENTAL FACTORS

According to the information provided by Juan's mother, Juan wears glasses to correct his eyesight,
but his vision and hearing are within normal limits for success in all daily life activities, including
learning in school. There is no history of pregnancy complications, and labor and delivery were
normal. Juan appears to have met his developmental milestones (walking, talking, toilet training, etc.)
normally and without any delays. Juan's mother indicated that Juan does have a heart murmur that is
being monitored by their family physician. However, this condition does not appear to adversely
affect any aspect of his educational performance. In general, there do not appear to be any health or
developmental factors related to, or sufficient to account for, the problems that have been reported in
this case.
OBSERVATION OF CURRENT BEHAVIOR AND PERFORMANCE

Evaluation of Juans behavior and performance during formal and informal interactions over the
course of this assessment provided support for some of the difficulties reported by his mother and
therapist. It seems clear that Juan's inability to do better in school is at least partially related to his
personal preferences and a lack of motivation. Juan was observed during several reading, writing,
and math tasks, on which he appeared to work diligently but at a somewhat slow pace. However, once
the task became more challenging, he quickly gave up and appeared uninterested. Initially, this pattern
of behavior was particularly apparent on math tasks, which appeared to be more difficult for him than
other subject areas. When asked about it, Juan reported that he just preferred to stop responding when
he no longer understood the questions well. When asked how he was able to jump from a math grade
of 65 on his third reporting period to a grade of 88 in the fourth reporting period, he indicated that he
really tried to improve his grade because he did not want to fail or have to attend summer school. He
reported that, basically, he completed all of his homework and classwork and studied more.
Consequently, it appears that Juan is able to succeed in school, even in areas that are difficult for him,
when he is sufficiently motivated. Despite his attempts to improve his grades in language arts, he
reports that he has not been as successful as he has been in math and finds reading and writing to be
more difficult subjects than math in general.
With respect to social interactions, Juan appeared to be rather timid and quiet. His speech was very
low in volume and at times a bit difficult to comprehend. He also demonstrated inconsistent eye
contact. On several occasions it was necessary to ask him to repeat himself in order to be clear about
what he was saying. This observation is consistent with the report of his therapist, Ms. Agarosa, that
Juan appears to mumble at times, making his words very difficult to understand. Juan was very
reluctant to share any personal information about himself. He tended to use short answers and to
volunteer little else beyond what was asked of him. He reported that he has no problems either at
school or at home, and he was willing to find ways to improve his grades, although he indicated that he
does not want to stay after school for extra help. Not surprisingly, Juan asked to be spoken to in
English rather than in Spanish. However, he was clearly able to understand Spanish conversations that
he overheard during the assessment period, for example between this examiner and his mother, and
would interject appropriately at times.
In general, Juan put forth reasonable effort during testing and displayed motivation that was
comparable to other students at his grade level. It was clear, however, that he struggled more with
tasks that assessed reading and writing skills. He tended to work very deliberately and his slow-paced
manner appeared to be mostly a function of preference and lack of interest and motivation rather than
poor comprehension or a lack of understanding. Juan's approach to, and strategies used in, completing
tasks were generally comparable to those of other students, and his basic understanding of concepts that
were being measured was similar to that of other children in the same grade. When adequately
motivated and confident in his understanding of a given task, Juan was able to produce good results
from his efforts with no more prompting or querying than that required for any other examinee.
EVALUATION OF ACADEMIC ACHIEVEMENT

Because the KABC-II was selected as the primary cognitive instrument for this assessment (as a
function of Juan's cultural and linguistic background), the KTEA-II was chosen as the most
appropriate test to evaluate Juan's academic achievement, particularly because it is conormed with the
KABC-II. Examination of the results from a CHC Cross-Battery perspective revealed that valid
clusters were formed for all composites measured; that is, none of the tests that comprise the clusters
were significantly discrepant from each other. The Comprehensive Achievement Composite, a
measure of Juan's general academic functioning, fell in the lower part of the Average range (SS = 86;
17th percentile). Although his overall performance was within normal limits, certain academic skill
areas were below average and fell within the Normative Weakness range. For example, although
Juan's Math Composite (SS = 96; 40th percentile) was in the Average range, his Reading Composite
(SS = 84; 14th percentile), Written Language Composite (SS = 80; 9th percentile), and Oral Language
Composite (SS = 84; 14th percentile) were all below average and in the Normative Weakness range.
Ordinarily this would suggest that Juan demonstrates intact abilities in math with comparatively
deficient abilities in language arts. However, as noted previously, Juan is an English learner and his
educational programming along with his cultural and linguistic background are likely to be the
reasons why his performance appears to be very low. Tests of reading, writing, and language are by
nature loaded with cultural content and demand language development that is age- and grade-
appropriate. For example, a review of the obtained scores indicates that his lowest performance
occurred on tests that measure spelling, oral expression, and phonological awareness. These skills
rely heavily on early home and school experiences with the language. Thus, these findings are not
believed to be an indication of a disability so much as they are a reflection of Juan's linguistic and
cultural differences and, to some extent, his motivational state.
A graph that summarizes and compares these results against the performance of other children of
the same age or grade is attached at the end of this report (see Figure 7.7).
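As a brief technical aside for readers who wish to see how the standard scores and percentile ranks
reported above correspond, the following minimal sketch (not part of the KTEA-II scoring materials
or the XBA DMIA) converts standard scores with a mean of 100 and a standard deviation of 15 into
approximate normal-curve percentile ranks. Published norm tables may differ from these values by a
point or two.

```python
# Minimal sketch: approximate percentile rank for a standard score
# (mean = 100, SD = 15) under a normal-curve assumption. Illustrative only;
# publisher norm tables are the authoritative source.
from math import erf, sqrt

def percentile_rank(standard_score: float, mean: float = 100.0, sd: float = 15.0) -> int:
    """Return the approximate normal-curve percentile rank for a standard score."""
    z = (standard_score - mean) / sd
    return round(0.5 * (1.0 + erf(z / sqrt(2.0))) * 100)

# Composite standard scores cited in this section of the report
for ss in (80, 84, 86, 96):
    print(f"SS = {ss}: approximately the {percentile_rank(ss)}th percentile")
```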
EVALUATION OF COGNITIVE PROCESSES AND INTELLECTUAL
FUNCTIONING

Because the KABC-II does not measure or provide adequate representation of all seven major
cognitive abilities/processes specified by CHC theory, it was supplemented by use of the WJ III using
the CHC Cross-Battery Assessment approach. Thus, apart from the subtests on the KABC-II that are
appropriate for Juan's age, four additional tests from the WJ III were administered: two (Sound
Blending and Auditory Attention) to measure Juan's Auditory Processing (Ga), and two (Decision
Speed and Visual Matching) to measure Juan's Processing Speed (Gs).
Examination of the results from a CHC Cross-Battery perspective revealed that valid norm-based
global ability scores were formed. That is, none of the tests that comprise the General Ability clusters
were significantly different from each other. The Fluid-Crystallized Index (FCI), the broadest measure
of Juan's general intellectual functioning, fell just below average and in the Normative Weakness
range (SS = 84; 14th percentile). However, Juan's Mental Processing Index (MPI), a broad measure of
intellectual functioning that is less culturally and linguistically loaded than the FCI, fell within the
Average range (SS = 91; 27th percentile). This suggests that Juan's true general cognitive ability is
underestimated by the FCI due to cultural and linguistic factors.
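To illustrate the general cross-battery logic described above, the following simplified sketch shows
how two subtests drawn from another battery might be combined into a single broad-ability estimate
only when their scores cohere. The 15-point (1 SD) cohesion rule, the simple averaging, and the
sample scores are illustrative assumptions only; the XBA approach and the XBA DMIA rely on
battery-specific norms and critical values rather than this rule of thumb.

```python
# Simplified, hypothetical sketch of two-subtest cluster formation in a
# cross-battery assessment. The 15-point (1 SD) rule and the averaging are
# illustrative stand-ins for the norm-based procedures actually used.
def two_subtest_cluster(score_a: int, score_b: int, max_difference: int = 15):
    """Return (cluster_estimate, is_unitary) for two subtest standard scores."""
    is_unitary = abs(score_a - score_b) < max_difference
    cluster_estimate = round((score_a + score_b) / 2)
    return cluster_estimate, is_unitary

# Illustrative values only (not Juan's actual subtest scores)
estimate, unitary = two_subtest_cluster(82, 88)
print(f"Broad-ability estimate: {estimate}, interpretable as unitary: {unitary}")
```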
Evaluation of Juan's psychological abilities/processes again revealed unitary norm-based clusters
in all areas except one, which enables interpretation without the need for supplemental testing. Of the
abilities/processes measured, Short-Term Memory (SS = 87; 19th percentile), Fluid Reasoning (SS =
87; 19th percentile), Long-Term Retrieval (SS = 98; 45th percentile), Processing Speed (SS = 90;
25th percentile), and Visual Processing (SS = 90; 25th percentile) were all within the Average range
and are evidence of intact functioning in these areas. Conversely, his functioning in the areas of
Crystallized Ability (Gc; SS = 78; 8th percentile) and Auditory Processing (Ga; SS = 83; 13th
percentile) was below average and in the Normative Weakness range. These results indicate that
Juan's overall functioning in most broad ability/processing domains is intact.
Graphs that summarize and compare these results against the performance of other children of the
same age or grade are attached at the end of this report. (See Figures 7.7 and 7.8.)
EVALUATION OF BEHAVIOR, SOCIAL, AND EMOTIONAL FUNCTIONING

Aspects of Juan's personality and emotional functioning were formally assessed using projective
techniques, behavior rating scales, observation, and interview. Based on these measures and by his
own report, Juan appears to be well adjusted in most areas; however, he does appear to be
experiencing a lack of motivation and some emotional distress, which are likely attributable to his
difficulties in school as well as possibly the family history of abuse and the family changes he has
experienced. According to Juan's mother, he appears to be withdrawn and sad. Ms. Pablo also
reported that Juan demonstrates conduct problems and some aggression, as well as poor social skills.
Ms. Pablo indicated that Juan has frequent unresolved fights with several family members and tends to
respond angrily when frustrated. Moreover, his mother indicated that Juan tends to keep to himself
and sometimes shows a lack of care in his personal appearance. This is consistent with his therapist's
report that Juan fails to communicate his feelings, has a difficult time relating with family members
and peers, and had a hard time establishing rapport with her. Ms. Agarosa added that, since starting
therapy, Juan has shown progress in some areas, particularly personal care.
Juan's pattern of responses on self-report questionnaires indicated that he is not experiencing any
significant difficulties with interpersonal relationships, self-reliance, or overall personal adjustment.
However, the responses provided by Juan's mother and therapist indicate that he is displaying
significant difficulties in several aspects of behavior and social-emotional functioning, particularly
family relations. Moreover, their responses indicate that he lacks any motivation to change the current
situation. Although quite high in some cases, the ratings provided by Juan's mother likely reflect more
an immediate concern and sense of urgency than any serious dysfunction.
A table of scores and a graph that summarize and compare these results against the behavior and
functioning of other children of the same age are attached at the end of this report (see Figure 7.9).
OPINIONS AND IMPRESSIONS

The basic question addressed by this evaluation is whether a disability exists that can account for the
reported or observed problems. Broadly speaking, identification of a disability involves two main
components: (a) verifying the presence of a psychological or physical deficiency; and (b) evidence of
substantial impairment in activities of daily life functioning, such as learning.
Within the context of Juan's unique experiential background, including cultural and linguistic
factors, nondiscriminatory analysis of the patterns seen in all of the data collected during this
assessment suggests that Juan does not have a learning disability. The majority of information
collected during the course of this assessment appears to suggest that Juan's potential for school
success is within the Average range and that he is capable of performing academically in accordance
with what would be expected given what is known about his developmental, educational, cultural, and
linguistic background. Moreover, Juan demonstrated that, when adequately motivated, he is able to
perform at grade level. It would seem that Juan's current learning difficulties can best be ascribed to
the circumstantial limitations imposed by being a second-language learner and life experiences that
have negatively affected his motivational level rather than to the presence of any disability.
Cognitive and academic skills frequently lag behind grade- and age-level expectations in English
learners not because they are disabled but because they began the learning process on a different
trajectory and at a different point in time compared to their monolingual English-speaking peers.
Even though Juan has difficulties with certain tasks, the tasks that were less culturally and
linguistically loaded (e.g., Gv, Gs) tended to be those on which he did best. This conclusion is
supported by the analysis of Juan's cognitive scores using the Culture-Language Interpretive Matrix
(see Figures 7.10 and 7.11), which shows a rather clear decline in performance as a function of level
of acculturation and English-language proficiency. Unlike his monolingual English-speaking
peers, Juan has been exposed to two languages, which changes the trajectory and pattern of his language
development in each. For example, the fact that Juan tends to work slowly at times is probably more a
function of the fact that he may need to do some internal translation between languages and thus he
uses more time to process information than would otherwise be expected. When compounded by an
instructional program that does not facilitate development of both languages, Juan, like many other
English learners, will be at a disadvantage in school as well as on any task on which he is compared
to native English speakers.
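The general logic of the Culture-Language Interpretive Matrix referred to above can be summarized
in a brief sketch: tests are grouped by their degree of cultural loading and linguistic demand, and a
systematic decline in mean scores as loading and demand increase is taken as evidence that cultural
and linguistic factors, rather than disability, account for depressed performance. The cell assignments,
scores, and the simple monotonic-decline check below are hypothetical illustrations, not the actual
C-LIM classifications or decision rules.

```python
# Hypothetical sketch of the C-LIM logic: do mean scores decline as the
# cultural loading and linguistic demand of the tests increase?
from statistics import mean

# Illustrative scores grouped by (cultural loading, linguistic demand)
cells = {
    ("low", "low"): [95, 92],
    ("moderate", "moderate"): [90, 87],
    ("high", "high"): [80, 78],
}
diagonal = [("low", "low"), ("moderate", "moderate"), ("high", "high")]
cell_means = [mean(cells[cell]) for cell in diagonal]
declining = all(a > b for a, b in zip(cell_means, cell_means[1:]))

print(f"Cell means from least to most loaded: {cell_means}")
print(f"Systematic decline consistent with cultural/linguistic influence: {declining}")
```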
It should also be noted that, because he is an English learner, some of Juan's current academic
difficulties may be the result of a lack of full comprehension in the classroom, which limits his
understanding in ways that appear to fall below the expectations of his teachers and school, who
believe that he should be able to catch up to his native English-speaking peers, something that is not
actually possible. Although Juan's current behavior and performance in school are most likely
explained by second-language issues, it is clear that his current emotional functioning and
motivational state are also contributing to his reported problems in academics and behavior.
Following is a description of the formal diagnoses that are applicable in this case.

Axis I: V62.4 Acculturation Problem and V61.20 Parent-Child Relational Problem


Axis II: None
Axis III: None

RECOMMENDATIONS FOR INTERVENTION AND REMEDIATION

Juan's reported academic difficulties appear to be due to a combination of cultural/linguistic factors
as well as some emotional distress and motivational problems resulting from frustration and
problems in the home. Consequently, these problems are being manifested as feelings of anger,
withdrawal, and lack of interest, which, in turn, interfere with his ability to perform well in school. Although
Juan does not appear to have any type of disability, it is clear, given the findings presented previously,
that modifications to his learning environment are necessary in order to improve his academic
success and provide an opportunity for greater success in the classroom. Juan has intact
abilities/processes in all areas; however, it will be important to recognize that his performance is
likely to be weaker in areas that rely more heavily on cultural knowledge and English language
development (e.g., Gc and Ga). These factors should be carefully considered and addressed in the
development of ideas for modifying Juan's instructional program. However, the nature of
interventions that will likely prove to be of the most benefit to his learning will be those that deal
directly with his current feelings, emotional functioning, and motivational state.
Following are some suggestions for discussion and consideration in the planning of an appropriate
educational program for Juan.

Juan should continue in counseling to address his feelings regarding his father and present
family situation. At some point, additional family therapy is recommended to deal with
relational problems and a history of conflict. In addition, it is recommended that counseling
address his lack of motivation and work with him to establish clear goals and expectations.
Juan's teachers should find ways to better motivate him, for example, by providing him with
extra computer time or a homework pass when he reaches an agreed-upon goal.
Teachers should provide additional explanations and instructions of new concepts and ideas,
including a breakdown of steps and procedures into small units so that he may understand the
significance and importance of each unit more readily. Increasing the comprehensibility of
instruction will be vital to helping Juan succeed in school.
At home and at school, Juan should be provided with more structured time and a dedicated
place to complete schoolwork and study. It is recommended that positive feedback be provided
for completing schoolwork in a timely manner.
A speech screening is recommended to determine whether Juan's articulation is unusual or
problematic.

E. Surri
Certified School Psychologist


Figure 7.7 Graph of Juan's KABC-II Cognitive Scores and KTEA-II Achievement Clusters from
the XBA DMIA


Figure 7.8 Graph of Juan's Supplemental WJ III Cognitive Scores from the XBA DMIA



Figure 7.9 Results of Behavioral, Socio-Emotional Functioning for Juan


Figure 7.10 Culture-Language Interpretive Matrix for Juan's KABC-II and WJ III XBA Scores
from C-LIM DMIA


Figure 7.11 Graph of Juan's Cognitive Scores from the C-LIM

APPENDIX A

The Cattell-Horn-Carroll (CHC) Theory of Cognitive Abilities



CHC THEORY DEFINED

The Cattell-Horn-Carroll theory represents the most recent integration of the original and widely
influential works of Raymond Cattell, John Horn, and John Carroll. In comparison to other well-
known theories of intelligence and cognitive abilities, CHC theory is the most comprehensive and
empirically supported psychometric theory of the structure of cognitive and academic abilities
(McGrew, 2005). Given its impressive level of empirical support and contemporary representation of
cognitive abilities, it is our contention that CHC theory should serve as a foundation for the selection
and interpretation of both intelligence and achievement batteries.
Although CHC theory has been the guiding framework for cognitive test development in the new
millennium (WJ III, KABC-II, SB5, DAS-II) and has served as the foundation for selecting,
organizing, and interpreting intelligence batteries (e.g., Flanagan & Kaufman, 2004; Flanagan,
McGrew, & Ortiz, 2000; Flanagan & Ortiz, 2001; Kaufman & Kaufman, 2004), it has only recently
been used to inform decisions regarding how academic abilities should be defined and measured (i.e.,
Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006). In order to implement a CHC-based approach to
assessing and interpreting both cognitive and academic abilities, it is necessary to understand how the
theory evolved as well as the major components of the theory.
A BRIEF HISTORICAL PERSPECTIVE OF FLUID-CRYSTALLIZED (Gf-Gc)
THEORY

In 1941, Cattell postulated a dichotomous fluid-crystallized (or Gf-Gc) theory of cognitive abilities.
Fluid intelligence (Gf ) encompassed inductive and deductive reasoning abilities that were thought to
be influenced by both biological and neurological factors and incidental learning through interaction
with the environment (Taylor, 1994). Crystallized intelligence (Gc) consisted primarily of abilities
(especially acquired knowledge) that were thought to reflect the influences of acculturation (namely,
verbal-conceptual knowledge; Gustafsson, 1984; Taylor, 1994). Thus, the original Gf-Gc theory was a
dichotomous conceptualization of human cognitive ability, but quite distinct from the prevailing
verbal-performance dichotomy that was ushered in by the Wechsler scales and that remains in use to
the present day. Although Gf-Gc theory has not been conceived of as a dichotomy since the 1960s
(Gustafsson & Undheim, 1996; Horn & Noll, 1997; Woodcock, 1993), the Gf-Gc label was retained as
the acronym for this theory until just recently. As a result, Gf-Gc theory was often misunderstood to
be a two-factor model, rather than multiple-factor model, of the structure of abilities.
In the mid-1960s, Horn (1965) expanded the Gf-Gc model to include four additional abilities:
visual perception or processing (Gv), short-term memory (Short-Term Acquisition and Retrieval,
or SAR/Gsm), long-term storage and retrieval (Tertiary Storage and Retrieval, or TSR/Glr), and
speed of processing (Gs). By 1968, additional analyses led Horn to add auditory processing
ability (Ga) to the theoretical model and refine the definitions of Gv, Gs, and Glr. More recently, a
factor representing a person's quickness in reacting (reaction time) and making decisions (decision
speed; called Gt by Horn, 1991, and Correct Decision Speed [CDS] by Carroll, 1993) was added to the
Gf-Gc model. In addition, factors representing a person's quantitative ability or knowledge (Gq) and
facility with reading and writing (Grw; Horn, 1985, 1988, 1991; McGrew, 1997; Woodcock, 1994)
emerged from further research and were added to the model, resulting in a 10-factor ability structure.
Noteworthy is the fact that these last two abilities (i.e., Gq and Grw) are often conceived of by
practitioners, who routinely conduct psychoeducational assessments, as academic achievements rather
than cognitive abilities.
THE HIERARCHICAL STRUCTURE OF ABILITIES

In his review of the extant factor-analytic research literature, Carroll (1993) differentiated factors or
abilities into three strata that varied according to the relative variety and diversity of variables
(Carroll, 1997, p. 124) included at each level. The various G abilities are the most prominent and
recognized abilities of the model. They are classified as broad or stratum II abilities and include
abilities such as Gf and Gc, the two original factors. According to Carroll (1993), broad abilities
represent basic constitutional and long standing characteristics of individuals that can govern or
influence a great variety of behaviors in a given domain and they vary in their emphasis on process,
content, and manner of response (p. 634). Broad abilities, like Gf and Gc, subsume a large number of
narrow or stratum I abilities of which approximately 70 have been identified (Carroll, 1993, 1997).
Narrow abilities represent greater specializations of abilities, often in quite specific ways that reflect
the effects of experience and learning, or the adoption of particular strategies of performance
(Carroll, 1993, p. 634). The hierarchical structure of Gf-Gc theory is demonstrated for the domain of
crystallized intelligence (Gc) in Figure A.1.
In the Gf-Gc taxonomy, Gc is classified as a broad stratum II cognitive ability. The 11 narrow or
stratum I crystallized abilities that comprise Gc demonstrate the broadness or breadth of this factor.
Figure A.1 shows that 11 different narrow or specialized crystallized abilities have been identified in
the literature. The broad Gc ability and the narrow abilities it encompasses are defined later in this
chapter, as are the remaining Gf-Gc broad and narrow abilities that comprise CHC theory. The
significant, moderate-to-high intercorrelations displayed by the narrow (Gc) abilities suggest the
presence of a broader factor or construct that accounts for this shared and (as depicted in Figure A.1)
supposed crystallized intelligence variance. The broad Gc factor is hypothesized to represent this
higher-order explanatory construct and is believed to exert a significant common effect (reflected by
the direction of the arrows in Figure A.1) on the narrow abilities. When this concept is extended to the
9 other broad ability domains, each of which also subsumes a number of narrow abilities, it is clear
that Gf-Gc theory is quite comprehensive.

Figure A.1 Narrow Abilities Subsumed Under the Broad Ability, Crystallized Intelligence (Gc)
Source: Reproduced with permission from Flanagan, D. P., McGrew, K. S., and Ortiz, S. O. Copyright
2002.


The broadest or most general level of ability in the Gf-Gc model is represented by stratum III,
located at the apex of Carroll's (1993) hierarchy. This single cognitive ability, which subsumes both
broad (stratum II) and narrow (stratum I) abilities, is interpreted as representing a general factor (i.e.,
g ) that is involved in complex higher-order cognitive processes (Gustafsson & Undheim, 1996;
Jensen, 1997; McGrew & Woodcock, 2001). It is important to understand that the abilities within each
level of the hierarchical Gf-Gc model typically display nonzero positive intercorrelations (Carroll,
1993; Gustafsson & Undheim, 1996). For example, similar to the previous Gc discussion, the
different stratum I (narrow) abilities that define the various Gf-Gc domains are correlated positively
and to varying degrees. These intercorrelations give rise to and allow for the estimation of the
stratum II (broad) ability factors. Likewise, the positive nonzero correlations among the stratum II
(broad) Gf-Gc abilities allow for the estimation of the stratum III (general) g factor. The positive
factor intercorrelations within each level of the Gf-Gc hierarchy indicate that the different Gf-Gc
abilities do not reflect completely independent (uncorrelated or orthogonal) traits. However, they can,
as is evident from the vast body of literature that supports their existence, be reliably distinguished
from one another and therefore represent unique, albeit related, abilities (see Keith, 2005).
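For readers who find it helpful to see the three-stratum structure laid out programmatically, the
following minimal sketch represents a small fragment of the hierarchy as nested data: the general
factor (stratum III) subsumes broad abilities (stratum II), each of which subsumes narrow abilities
(stratum I). Only a few broad and narrow abilities are shown, so the fragment should not be read as
the complete CHC taxonomy.

```python
# Illustrative fragment of the three-stratum hierarchy: g (stratum III) ->
# broad abilities (stratum II) -> narrow abilities (stratum I).
chc_fragment = {
    "Gf": ["Induction", "General Sequential Reasoning", "Quantitative Reasoning"],
    "Gc": ["Lexical Knowledge", "Language Development", "General Information",
           "Listening Ability"],
    "Gv": ["Visualization", "Spatial Relations"],
}

# Walk the hierarchy from the apex (g) down to the narrow abilities.
for broad_ability, narrow_abilities in chc_fragment.items():
    print(f"g -> {broad_ability} -> {', '.join(narrow_abilities)}")
```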
SIMILARITIES AND DIFFERENCES BETWEEN THE CATTELL-HORN
MODEL AND THE CARROLL MODEL

Simplified versions of the Cattell-Horn and Carroll models of the structure of abilities (i.e., where the
narrow abilities are omitted) are presented together in Figure A.2. A review of Figure A.2 shows a
number of important similarities and differences between the two models. In general, these models
are similar in that they both include some form of fluid intelligence (Gf ), crystallized intelligence
(Gc ), short-term memory and learning (Gsm or Gy), visual perception or processing (Gv), auditory
perception or processing (Ga or Gu), long-term retrieval (Glr or Gr ), processing speed (Gs), and
decision and reaction time speed (CDS or Gt ) abilities. Although there are some differences in the
broad-ability definitions, as well as in the specific narrow abilities that are subsumed by the respective
broad Gf-Gc abilities, the major structural differences between the two models are primarily fourfold
(McGrew, 1997, 2005).

Figure A.2 Correspondence between the Cattell-Horn and Carroll Gf-Gc Models

First, the Cattell-Horn and the Carroll models differ in their inclusion of g (global or general
ability) at stratum III. According to Carroll (1993, 1997, 2005), the general intelligence factor at the
apex of his three-stratum theory is analogous to Spearman's g. The off-center placement of g (to the
left side of Figure A.2) in the Carroll model is intended to reflect the strength of the relations between
g and the respective broad Gf-Gc abilities. As represented in Carroll's model in Figure A.2 (i.e., the
top half of the figure), Gf has the strongest association with g, followed by Gc, and continuing on
through the remaining abilities to the two broad abilities that are weakest in association with g (i.e., Gs
and Gt ).
Although Carroll (1997) has stated that the evidence for g is overwhelming, Horn disagrees
strongly, believing g to be primarily a statistical artifact (see Horn, 1991; Horn & Noll, 1997; Horn &
Blankson, 2005). Accordingly, Horn posits a truncated hierarchical modelthat is, a model that does
not contain a single g factor at the apex (Jensen, 1998). Debates about the nature and existence of g
have waxed and waned for decades and have been some of the liveliest debates in differential
psychology (Gustafsson & Undheim, 1996; Jensen, 1997). Much of the debate has been theoretical in
nature, with definitions of g ranging from an index of neural cognitive efficiency, general reasoning
ability, or mental energy to a mere statistical irregularity (Neisser et al., 1996). After being more or
less banned from the scientific scene (Gustafsson & Undheim, 1996), the prominent position of g in
contemporary psychometric models of the structure of abilities (e.g., Carroll's three-stratum model
and Jensen's [1998] g factor treatise) has helped it to take center stage once again in intelligence
research and dialogue. Interested readers are directed to the writings of Carroll (1993, 1997), Horn
and colleagues, Horn and Noll (1997), and Jensen (1997, 1998) for further discussion of g-related
issues and research.
Second, in the Cattell-Horn model, quantitative knowledge and quantitative reasoning abilities
together represent a distinct broad ability, as depicted by the Gq rectangle in the bottom half of Figure
A.2. Carroll (1993), however, views quantitative ability as an inexact, unanalyzed popular concept
that has no scientific meaning unless it is referred to the structure of abilities that compose it. It cannot
be expected to constitute a higher-level ability (p. 627). As such, Carroll classified quantitative
reasoning as a narrow ability subsumed by Gf, as indicated by the arrow leading from the Gq
rectangle in the Cattell-Horn model to the Gf rectangle in the Carroll model in Figure A.2.
Furthermore, Carroll included mathematics achievement and mathematics knowledge factors in a
separate chapter in his book, which described a variety of knowledge and achievement abilities (e.g.,
technical and mechanical knowledge, knowledge of behavioral content) that are not included in his
theoretical model.
Third, recent versions of the Cattell-Horn model have included a broad English-language reading
and writing ability (Grw) that is depicted in the bottom half of Figure A.2 (Flanagan et al., 2002;
McGrew, 1997; Woodcock, 1993). Carroll, however, considers reading and writing to be narrow
abilities subsumed under the broad ability of Gc, as reflected by the arrow leading from the Grw
rectangle in the Cattell-Horn model to the Gc rectangle in the Carroll model in Figure A.2.
Fourth, the Cattell-Horn and the Carroll models differ in their treatment of certain narrow memory
abilities. Carroll combined both short-term memory and the narrow abilities of associative,
meaningful, and free-recall memory (defined later in this chapter) with learning abilities under his
General Memory and Learning factor (Gy). Horn (1991) made a distinction between immediate
apprehension (e.g., short-term memory span) and storage and retrieval abilities. The reader is
referred to McGrew (1997, 2005) for a more complete discussion of these differences.
Notwithstanding the important differences between the Cattell-Horn and the Carroll models, in
order to realize the practical benefits of using theory to guide test selection, organization, and
interpretation, it is necessary to define a single taxonomy, one that can be used to classify the
individual tests of psychoeducational batteries, including tests of cognitive ability and tests of
academic achievement. A first effort to create a single taxonomy for this purpose was an integrated
Cattell-Horn and Carroll model proposed by McGrew (1997). McGrew and Flanagan (1998)
subsequently presented a slightly revised integrated model, which was further refined by Flanagan et
al. (2000). The model presented in Flanagan and colleagues (2000), which became known as the
Cattell-Horn-Carroll (CHC) theory, is described briefly in the following section.
A TAXONOMY FOR UNDERSTANDING SPECIFIC COGNITIVE AND
ACADEMIC ABILITIES: CHC THEORY

The integration of the Cattell-Horn and the Carroll models, or CHC theory, is presented in Figure
A.3. This figure depicts the current structure of contemporary CHC theory and reflects the manner in
which the Cattell-Horn and Carroll models have been integrated. In this figure, CHC theory includes
10 broad cognitive abilities, which subsume more than 70 narrow abilities. The abilities
printed in italic in Figure A.3 are those that were not included in Carrolls three-stratum model but
that were included by Carroll in his definitions of knowledge and achievement (Carroll, 1993). The
abilities printed in bold in Figure A.3 are those that were placed under CHC broad abilities in a
differing manner than that proposed by Carroll. These changes (i.e., integrations of the
Cattell-Horn and the Carroll models) are based on the most recent developments of and refinements
to the Cattell-Horn model (e.g., Horn & Noll, 1997) and recent factor-analysis research (e.g.,
Woodcock, McGrew, & Mather, 2001; see also Flanagan et al., 2000; McGrew, 1997; McGrew &
Flanagan, 1998, for a review). The interested reader is referred to McGrew (2005) for a more
comprehensive description of the specific ways in which CHC theory represents an integration of the
Cattell-Horn and Carroll models.

Figure A.3 The Cattell-Horn-Carroll (CHC) Theory of Cognitive Abilities

The exclusion of g in Figure A.3 does not mean that the integrated model used in this text does not
subscribe to a separate general human ability or that g does not exist. Rather, g was omitted by
McGrew (1997) and Flanagan et al. (2000) because it was judged to have little practical relevance to
the selection and organization of tests around referral concerns, particularly those involving
suspected learning disability (LD), and the interpretation of cognitive and academic capabilities via
cross-battery principles and procedures. That is, new methods of LD evaluation and the XBA
approach espoused by Flanagan and colleagues were designed specifically to improve the reliability
of LD diagnosis and the validity of psychoeducational assessment practice, respectively, by
describing the unique patterns of cognitive and academic capabilities of individuals.
CHC theory represents the culmination of more than 60 years of factor-analytic research in the
psychometric tradition. However, in addition to structural evidence, there are other sources of validity
evidence, some quite substantial, that support CHC theory. Prior to defining the broad and narrow
abilities that comprise CHC theory, a brief overview of the validity evidence in support of this
structure of cognitive abilities is presented.
A NETWORK OF VALIDITY EVIDENCE IN SUPPORT OF CHC THEORY

It is beyond the scope of this chapter to provide a fully detailed account and review of all the validity
evidence currently available in support of the CHC structural model as well as the broad and narrow
ability constructs it encompasses. The interested reader is referred to Carroll (1993, 2005), Flanagan
et al. (2000), Horn and Blankson (2005), Horn and Noll (1997), and McGrew (1997, 2005) for a more
thorough discussion.
Briefly, the CHC structure of abilities is supported by factor-analytic (i.e., structural) evidence as
well as developmental, neurocognitive, and heritability evidence. Additionally, there is a mounting
body of research available on the relations between the broad cognitive CHC abilities and many
academic outcomes (summarized in Chapter 2) as well as occupational outcomes (Ackerman &
Heggestad, 1997; McGrew & Flanagan, 1998). Furthermore, studies have shown that the factor
structure of CHC theory is invariant across the lifespan (Bickley, Keith, & Wolfe, 1995; Keith, 2005;
Woodcock et al., 2001) and across gender, ethnic, and cultural groups (e.g., Carroll, 1993; Gustafsson
& Balke, 1993; Keith, 1997, 1999). In general, CHC theory is based on a more extensive network of
validity evidence than other contemporary multidimensional ability models (see Daniel, 1997;
Kranzler & Keith, 1999; McGrew, 2005; McGrew & Flanagan, 1998; Messick, 1992; Sternberg &
Kaufman, 1998).
Given the breadth of empirical support for the CHC structure of intelligence, it provides one of the
most useful frameworks for designing and evaluating psychoeducational batteries, including
intelligence and achievement tests (Carroll, 1997, 1998; Flanagan, 2000; Flanagan & McGrew, 1997;
Kaufman, 2000; Kranzler & Keith, 1999; Keith, Kranzler, & Flanagan, 2001; Keith & Witta, 1997;
Kranzler, Keith, & Flanagan, 2000; McGrew, 1997; Woodcock, 1990; Ysseldyke, 1990). Moreover, in
light of the well-established structural validity of CHC theory, external validity support for the
various CHC constructs, derived through sound research methodology, can be used confidently to
guide test interpretation (see Benson, 1998; Evans, Floyd, McGrew, & Leforgee, 2002; Floyd, Evans,
& McGrew, 2003; Flanagan, 2000; McGrew, Flanagan, Keith, & Vanderwood, 1997; Vanderwood,
McGrew, Flanagan, & Keith, 2002).
It is important to recognize that research related to CHC theory is not static. Rather, research on the
hierarchical structure of abilities (within the Gf-Gc and now CHC framework) has been systematic,
steady, and mounting for decades. Even within the context of this book, the attempt to fully integrate
academically related abilities within the CHC theoretical framework has necessitated significant
rethinking of the model (e.g., differences in types of abilities subsumed under Gc and the nature of
Grw as a broad variable representing distinct academic skills) and forced decisions to be made that
previously were either not relevant or not considered.
BROAD AND NARROW CHC ABILITY DEFINITIONS

In this section the definitions of the broad and narrow abilities included in the CHC model are
provided. These definitions are consistent with those presented in Flanagan et al. (2000), Flanagan and
Ortiz (2001), and McGrew and Flanagan (1998). They were derived from an integration of the
writings of Carroll (1993), Gustafsson and Undheim (1996), Horn (1991), McGrew (1997), McGrew,
Werder, and Woodcock (1991), and Woodcock (1994). The narrow ability definitions are presented in
Tables A.1 through A.10.
Fluid Intelligence (Gf )


Fluid intelligence refers to mental operations that an individual uses when faced with a relatively
novel task that cannot be performed automatically. These mental operations may include forming and
recognizing concepts, perceiving relationships among patterns, drawing inferences, comprehending
implications, problem solving, extrapolating, and reorganizing or transforming information.
Inductive and deductive reasoning are generally considered to be the hallmark narrow ability
indicators of Gf. This broad ability also subsumes more specific types of reasoning, most notably
Quantitative Reasoning (RQ). Unlike the other narrow Gf abilities, RQ is more directly related to
formal instruction and classroom-related experiences. Definitions of the narrow abilities subsumed
by Gf are presented in Table A.1.

Table A.1 Narrow Gf Stratum I Ability Definitions


Crystallized Intelligence (Gc)


Crystallized intelligence refers to the breadth and depth of a person's acquired knowledge of a culture
and the effective application of this knowledge. This store of primarily verbal or language-based
knowledge represents those abilities that have been developed largely through the investment of other
abilities during educational and general life experiences (Horn & Blankson, 2005; Horn & Noll,
1997).
Gc includes both declarative (static) and procedural (dynamic) knowledge. Declarative knowledge
is held in long-term memory and is activated when related information is in working memory (Gsm).
Declarative knowledge includes factual information, comprehension, concepts, rules, and
relationships, especially when the information is verbal in nature. Declarative knowledge refers to
knowledge that something is the case, whereas procedural knowledge is knowledge of how to do
something (Gagne, 1985, p. 48). Procedural knowledge refers to the process of reasoning with
previously learned procedures in order to transform knowledge. For example, a child's knowledge of
his or her street address would reflect declarative knowledge, whereas a child's ability to find his or
her way home from school would require procedural knowledge (Gagne, 1985). As mentioned
earlier, most comprehensive academic achievement batteries measure many different aspects of Gc.
For example, the WJ III (Woodcock et al., 2001) includes a Verbal Comprehension subtest, which is a
measure of Lexical Knowledge and Language Development (VL and LD, respectively; i.e., narrow Gc
abilities). The breadth of Gc is apparent from the number of narrow abilities (12) it subsumes (see
Table A.2).
A rather unique aspect of Gc, not seen in the other broad abilities, is that it appears to be both a store
of acquired knowledge (e.g., lexical knowledge, general information, information about culture) and
a collection of processing abilities (e.g., oral production and fluency, listening ability).
Although Gc is most often conceptualized, much like Gq and Grw, as an ability that is highly dependent
on learning experiences (especially formal, classroom-type experiences), it also seems to encompass
abilities that are more process oriented. The narrow ability of General Information (K0), for
example, is clearly a repository of learned information, whereas the narrow Listening Ability (LS)
appears to represent the ability to effectively comprehend and process information presented orally.
Although comprehension is of course dependent on knowledge of the words being presented, the
nature of these two Gc abilities is clearly not identical. Although research is needed to discern the
nature of acquired knowledge versus processing abilities within the Gc domain, assessment of Gc
should pay close attention to the nature of the narrow abilities that define this broad domain. Despite
the interrelatedness of all narrow abilities under Gc, there may well be times when focus on the
abilities that are more process oriented as opposed to those that are knowledge oriented is most
important, and vice versa.

Table A.2 Narrow Gc Stratum I Ability Definitions

Quantitative Knowledge (Gq)


Quantitative knowledge represents an individual's store of acquired quantitative, declarative, and
procedural knowledge. The Gq store of acquired knowledge represents the ability to use quantitative
information and manipulate numeric symbols. Gq abilities are typically measured by achievement
tests. For example, most comprehensive tests of achievement include measures of math calculation,
applied problems (or math problem solving), and general math knowledge. Some specialized
achievement batteries measure Gq exclusively, such as KeyMath-Revised/Normative Update (KM-
R/NU; Connolly, 1998). Although some intelligence batteries measure aspects of Gq (e.g., Arithmetic
on the Wechsler Scales and Quantitative Reasoning on the SB5), they typically do not measure this
ability comprehensively.
It is important to understand the difference between Gq and the Quantitative Reasoning (RQ) ability
that is subsumed by Gf. On the whole, Gq represents an individual's store of acquired mathematical
knowledge, including the ability to perform mathematical calculations (i.e., procedural knowledge).
Quantitative Reasoning represents only the ability to reason inductively and deductively when solving
quantitative problems. Recall that RQ is a narrow ability that is typically found under Gf. However,
because RQ is dependent on possession of basic mathematical concepts and knowledge, it seems to be
as much related to Gq as it is related to Gf. Gq is most evident when a task requires mathematical
skills (e.g., addition, subtraction, multiplication, division) and general mathematical knowledge (e.g.,
knowing what the square-root symbol means). RQ, on the other hand, would be required to solve for
a missing number in a number series task (e.g., 3, 6, 9, __). Three narrow abilities are listed and defined
under Gq in Table A.3.

Table A.3 Narrow Gq Stratum I Ability Definitions


Reading/Writing Ability (Grw)


Reading/writing ability is an acquired store of knowledge that includes basic reading, reading
fluency, and writing skills required for the comprehension of written language and the expression of
thought via writing. It includes both basic abilities (e.g., reading decoding and fluency, spelling) and
complex abilities (e.g., comprehending written discourse and writing a story). Like Gq, Grw is
considered to be an achievement domain, and, therefore, has been measured traditionally (and almost
exclusively) by tests of academic achievement. In Carroll's (1993) three-stratum model, eight narrow
reading and writing abilities are subsumed by Gc in addition to other abilities. In the CHC model,
these eight narrow abilities define the broad Grw ability. These Grw narrow abilities are defined in
Table A.4.

Table A.4 Narrow Grw Stratum I Ability Definitions


Short-Term Memory (Gsm)


Short-term memory is the ability to apprehend and hold information in immediate awareness and then
use it within a few seconds. Gsm is a limited-capacity system, as most individuals can retain only
seven chunks of information (plus or minus two chunks) in this system at one time. An example of
Gsm is the ability to remember a telephone number long enough to dial it, or the ability to retain a
sequence of spoken directions long enough to complete the tasks specified in the directions. Given the
limited amount of information that can be held in short-term memory, information is typically
retained for only a few seconds before it is lost. As most individuals have experienced, it is difficult
to remember an unfamiliar telephone number for more than a few seconds unless one consciously
uses a cognitive learning strategy (e.g., continually repeating or rehearsing the numbers) or other
mnemonic device. When a new task requires an individual to use his or her Gsm abilities to store new
information, the previous information held in short-term memory is either lost or must be stored in
the acquired stores of knowledge (i.e., Gc, Gq, Grw) through the use of Glr.
In the CHC model, Gsm subsumes the narrow ability of working memory, which has received
considerable attention recently in the cognitive psychology literature (see Kane, Bleckley, Conway, &
Engle, 2001). Working memory is considered to be the mechanism responsible for the temporary
storage and processing of information (Richardson, 1996, p. 23). It has been referred to as the
mind's scratchpad (Jensen, 1998, p. 220) and most models of working memory postulate a number
of subsystems or temporary buffers. The phonological or articulatory loop processes auditory-
linguistic information, whereas the visuospatial sketch/scratchpad (Baddeley, 1986, 1992; Logie,
1996) is the temporary buffer for visually processed information. Most working memory models
also posit a central executive or processor mechanism that coordinates and manages the activities and
subsystems in working memory.
Carroll (1993) was skeptical of the working memory construct, as reflected in his conclusion that "although some evidence supports such a speculation, one must be cautious in accepting it because as yet there has not been sufficient work on measuring working memory, and the validity and generality of the concept have not yet been well established in the individual differences research" (p. 647).
Notwithstanding, the working memory construct has been related empirically to a variety of different
outcomes, including many specific reading and math skills. Therefore, despite the questions that have
been raised regarding its validity as a measurable construct, Flanagan et al. (2000) and Woodcock et
al. (2001) included working memory in the CHC taxonomy in light of the current literature that
argues strongly for its predictive utility (e.g., Ackerman, Beier, & Boyle, 2002; Hitch, Towse, &
Hutton, 2001). Nevertheless, given that Carroll has raised questions about the validity of the construct
of working memory, it is important to remember that this construct was included in current CHC
theory primarily for practical application and ease of communication. Additional research is
necessary before definitive decisions can be reached about the inclusion or exclusion of working
memory in CHC theory. The narrow Gsm abilities are defined in Table A.5.

Table A.5 Narrow Gsm Stratum I Ability Definitions

Visual Processing (Gv)


Visual processing (Gv) is the ability to generate, perceive, analyze, synthesize, store, retrieve,
manipulate, transform, and think with visual patterns and stimuli (Lohman, 1994). These abilities are
measured frequently by tasks that require the perception and manipulation of visual shapes and forms,
usually of a figural or geometric nature (e.g., a standard block design task). An individual who can
mentally reverse and rotate objects effectively, interpret how objects change as they move through
space, perceive and manipulate spatial configurations, and maintain spatial orientation would be
regarded as having a strength in Gv abilities. Most widely recognized achievement batteries do not
measure Gv abilities directly, although these abilities have been found to be related significantly to
higher-level mathematics achievement (e.g., geometry and trigonometry; Casey, Nuttall, & Pezaris,
1997; Hegarty & Kozhevnikov, 1999). The various narrow abilities subsumed by Gv are listed and
defined in Table A.6.

Table A.6 Narrow Gv Stratum I Ability Definitions


Auditory Processing (Ga)


In the broadest sense, auditory abilities are "cognitive abilities that depend on sound as input and on the functioning of our hearing apparatus" (Stankov, 1994, p. 157) and reflect "the degree to which the individual can cognitively control the perception of auditory stimulus inputs" (Gustafsson & Undheim, 1996, p. 192). Auditory processing is the ability to perceive, analyze, and synthesize patterns
among auditory stimuli, and to discriminate subtle nuances in patterns of sound (e.g., complex
musical structure) and speech when presented under distorted conditions. Although Ga abilities do not
require the comprehension of language (Gc) per se, they are important in the development of
language skills (Liberman, Shankweiler, Fischer, & Carter, 1974; Wagner & Torgesen, 1987). Ga
subsumes most of those abilities referred to as phonological awareness/processing. Tests that
measure these abilities (e.g., phonetic coding tests) are found typically on achievement batteries, such
as the Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte,
1999), Test of Phonological Awareness-Second Edition: Plus (TOPA-2+; Torgesen & Bryant, 2004),
and Test of Phonological Awareness Skills (TOPAS; Newcomer & Barenbaum, 2003). The only
major intelligence test that provides adequate measurement of Ga (viz., phonetic coding, sound
discrimination, resistance to auditory distortion) is the WJ III. Noteworthy is the fact that the number
of tests specifically designed to measure phonological processing has increased significantly in
recent years, presumably as a result of the consistent finding that phonological awareness/ processing
appears to be the core deficit in individuals with reading difficulties (e.g., Morris et al., 1998;
Vellutino, Scanlon, & Lyon, 2000; Vellutino and Scanlon, 2002). However, as can be seen from the list
of narrow abilities subsumed by Ga (Table A.7), this domain is very broad, extending far beyond
phonetic coding ability.

Table A.7 Narrow Ga Stratum I Ability Definitions

Long-Term Storage and Retrieval (Glr)


Long-term storage and retrieval is the ability to store information in and fluently retrieve new or
previously acquired information (e.g., concepts, ideas, items, names) from long-term memory. Glr
abilities have been prominent in creativity research, where they have been referred to as idea
production, ideational fluency, or associative fluency. It is important not to confuse Glr with Gc, Gq,
and Grw, which represent to a large extent an individual's stores of acquired knowledge. Specifically,
Gc, Gq, and Grw represent what is stored in long-term memory, whereas Glr is the efficiency with
which this information is initially stored in and later retrieved from long-term memory.
It is also important to note that different processes are involved in Glr and Gsm. Although the word
long-term frequently carries with it the connotation of days, weeks, months, and years in the clinical
literature, long-term storage processes can begin within a few minutes or hours of performing a task.
Therefore, the time lapse between the initial task performance and the recall of information related to
that task is not necessarily of critical importance in defining Glr. More important is the occurrence of
an intervening task that engages short-term memory during the interim before the attempted recall of
the stored information (e.g., Gc; Woodcock, 1993). Although Glr is measured more consistently and
directly by intelligence, rather than achievement batteries, some achievement tests measure Glr. For
example, the WJ III Tests of Achievement include Story Recall-Delayed, a measure of Glr (namely, Meaningful Memory). One Glr ability that has been receiving increased attention in the literature is
Naming Facility, or the ability to produce names for concepts rapidly. This ability, often referred to
as Rapid Automatized Naming (RAN), has been found to predict reading achievement significantly.
However, Naming Facility or RAN is measured only by a select few achievement tests, such as Rapid
Object Naming on the CTOPP (Wagner et al., 1999) and the rapid naming subtests of the Rapid
Automatized Naming and Rapid Alternating Stimulus Tests (RAN/RAS; Wolf & Denckla, 2005; see
Flanagan et al., 2006, for a review). In the present CHC model, 13 narrow memory and fluency abilities
are included under Glr (see Table A.8).

Table A.8 Narrow Glr Stratum I Ability Definitions

Processing Speed (Gs)


Processing speed, or mental quickness, is often mentioned when talking about intelligent behavior
(Nettelbeck, 1994). Processing speed is the ability to fluently and automatically perform cognitive
tasks, especially when under pressure to maintain focused attention and concentration. Attentive
speediness encapsulates the essence of Gs, which is measured typically by fixed-interval timed tasks
that require little in the way of complex thinking or mental processing (e.g., the Wechsler Symbol
Search, Cancellation, and Digit Symbol/Coding tests).
Recent interest in information processing models of cognitive functioning has resulted in a
renewed focus on Gs (Kail, 1991; Lohman, 1989; McGrew, 2005). A central construct in information
processing models is the idea of limited processing resources (e.g., the limited capacities of short-
term and working memory). Many cognitive activities require a person's deliberate efforts, and
people are limited in the amount of effort they can allocate. In the face of limited processing
resources, the speed of processing is critical because it determines in part how rapidly limited
resources can be reallocated to other cognitive tasks (Kail, 1991). Woodcock (1993) likens Gs to a
valve in a water pipe. The rate at which water flows in the pipe (i.e., Gs) increases when the valve is
opened wide and it decreases when the valve is partially closed. Four different narrow speed-of-
processing abilities are subsumed by Gs in the present CHC model (see Table A.9).

Table A.9 Narrow Gs Stratum I Ability Definitions


Decision Speed/Reaction Time (Gt)


In addition to Gs, both Carroll and Horn included a second broad speed ability in their respective
models of the structure of abilities. Processing Speed or Decision Speed/Reaction Time (Gt), as
proposed by Carroll, subsumes narrow abilities that reflect an individuals quickness in reacting
(reaction time) and making decisions (decision speed). Correct Decision Speed (CDS), proposed by
Horn as a second speed ability (Gs being the first), is typically measured by recording the time an
individual requires to provide an answer to problems on a variety of tests (e.g., letter series,
classifications, vocabulary; Horn, 1988, 1991). Because Correct Decision Speed appeared to be a
much narrower ability than Gt, it is subsumed by Gt in CHC theory.
It is important not to confuse Gt with Gs. Gt abilities reflect the immediacy with which an
individual can react to stimuli or a task (typically measured in seconds or fractions of seconds),
whereas Gs abilities reflect the ability to work quickly over a longer period of time (typically
measured in intervals of 2 to 3 minutes). Being asked to read a passage (on a self-paced scrolling
video screen) as quickly as possible and, in the process, touch the word "the" with a stylus pen each time it appears on the screen, is an example of Gs. The individual's Gs score would reflect the number of
correct responses (taking into account errors of omission and commission). In contrast, Gt may be
measured by requiring a person to read the same text at his or her normal rate of reading and press
the space bar as quickly as possible whenever a light is flashed on the screen. In this latter paradigm,
the individual's score is based on the average response latency, or the time interval between the onset
of the stimulus and the individuals response. Table A.10 includes descriptions of the narrow abilities
subsumed by Gt.
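
To make the scoring contrast concrete, the following minimal Python sketch implements the two hypothetical paradigms just described; the penalty rule, counts, and latencies are illustrative assumptions, not the scoring procedures of any published instrument.

import statistics

def gs_score(correct: int, omissions: int, commissions: int) -> int:
    # Gs-style score for a fixed-interval speeded task: correct responses,
    # adjusted for errors of omission and commission (assumed penalty rule).
    return max(correct - omissions - commissions, 0)

def gt_score(latencies_ms: list) -> float:
    # Gt-style score: average latency (in milliseconds) between stimulus
    # onset and the examinee's response.
    return statistics.mean(latencies_ms)

# A hypothetical examinee touches "the" correctly 42 times, misses 3, and
# makes 1 false touch; key presses follow each light flash by about 0.3 s.
print(gs_score(correct=42, omissions=3, commissions=1))    # 38
print(round(gt_score([310.0, 295.0, 342.0, 301.0]), 1))    # 312.0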

Table A.10 Narrow Gt Stratum I Ability Definitions


REFERENCES

Ackerman, P. L., Beier, M. E., & Boyle, M. B. (2002). Individual differences in working memory
within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental
Psychology: General, 131, 567-605.
Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for
overlapping traits. Psychological Bulletin, 121, 219-245.
Baddeley, A. (1986). Working memory. Oxford: Oxford University Press.
Baddeley, A. (1992). Is working memory working? The fifteenth Bartlett Lecture. Quarterly Journal
of Experimental Psychology, 44A, 1-31.
Benson, J. (1998). Developing a strong program of construct validation: A test anxiety example.
Educational Measurement: Issues and Practice, 17, 10-22.
Bickley, P. G., Keith, T. Z., & Wolfe, L. M. (1995). The three-stratum theory of cognitive abilities: Test
of the structure of intelligence across the life span. Intelligence, 20, 309-328.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, UK:
Cambridge University Press.
Carroll, J. B. (1997). The three-stratum theory of cognitive abilities. In D. P. Flanagan, J. L. Genshaft,
& P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 122-
130). New York: Guilford.
Carroll, J. B. (1998). Foreword. In K. S. McGrew & D. P. Flanagan (Eds.), The intelligence test desk
reference (ITDR): Gf-Gc cross-battery assessment (pp. xi-xii). Boston: Allyn & Bacon.
Carroll, J. B. (2005). The three-stratum theory of cognitive abilities. In D. P. Flanagan, J. L. Genshaft,
& P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 69-76).
New York: Guilford.
Casey, M. B., Nuttall, R. L., & Pezaris, E. (1997). Mediators of gender differences in mathematics
college entrance test scores: A comparison of spatial skills with internalized beliefs and anxieties.
Developmental Psychology, 33, 669-680.
Cattell, R. B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 38,
592.
Connolly, A. J. (1998). Key Math-Revised: Normative Update. Circle Pines, MN: American Guidance
Service.
Daniel, M. H. (1997). Intelligence testing: Status and trends. American Psychologist, 52, 1038-1045.
Evans, J., Floyd, R., McGrew, K., & Leforgee, M. (2002). The relations between measures of Cattell-
Horn-Carroll (CHC) cognitive abilities and reading achievement during childhood and adolescence.
School Psychology Review, 31(2), 246-262.
Flanagan, D. P. (2000). Wechsler-based CHC cross-battery assessment and reading achievement:
Strengthening the validity of interpretations drawn from Wechsler test scores. School Psychology
Quarterly, 15(3), 295 -329.
Flanagan, D. P., & Kaufman, A. S. (2004). Essentials of WISC-IV assessment. Hoboken, NJ: Wiley.
Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting
cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 314 -325). New York: Guilford.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. O. (2000). The Wechsler intelligence scales and CHC
theory: A contemporary approach to interpretation. Boston: Allyn & Bacon.
Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross battery assessment. New York: Wiley.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2002). Achievement test desk reference:
Comprehensive assessment of learning disabilities. Boston: Allyn & Bacon.
Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference:
A guide to learning disability identification (2nd ed.). New York: Wiley.
Floyd, R. G., Evans, J. J., & McGrew, K. S. (2003). Relations between measures of Cattell-Horn-
Carroll (CHC) cognitive abilities and mathematics achievement across the school-age years.
Psychology in the Schools, 40, 155-171.
Gagne, E. D. (1985). The cognitive psychology of school learning. Boston: Little & Brown.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8,
179-203.
Gustafsson, J. E., & Balke, G. (1993). General and specific abilities as predictors of school
achievement. Multivariate Behavioral Research, 28, 407- 434.
Gustafsson, J. E., & Undheim, J. O. (1996). Individual differences in cognitive functions. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 186-242). New York:
Macmillan.
Hegarty, M., & Kozhevnikov, M. (1999). Types of visual-spatial representations and mathematical
problem solving. Journal of Educational Psychology, 91, 684 - 689.
Hitch, G. J., Towse, J. N., & Hutton, U. M. Z. (2001). What limits working memory span? Theoretical
accounts and applications for scholastic development. Journal of Experimental Psychology: General,
130, 184 -198.
Horn, J. L. (1965). Fluid and crystallized intelligence : A factor analytic study of the structure among
primary mental abilities. Unpublished doctoral dissertation, University of Illinois, Champaign.
Horn, J. L. (1985). Remodeling old models of intelligence: Gf-Gc theory. In B. B. Wolman (Ed.),
Handbook of intelligence (pp. 267-300). New York: Wiley.
Horn, J. L. (1988). Thinking about human abilities. In J. R. Nesselroade & R. B. Cattell (Eds.),
Handbook of multivariate psychology (Rev. ed., pp. 645-685). New York: Academic.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K.
Werder, & R. W. Woodcock (Eds.), Woodcock-Johnson technical manual (pp. 197-232). Chicago:
Riverside Publishing.
Horn, J. L., & Blankson, N. (2005). Foundations for better understanding of cognitive abilities. In D. P.
Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories,
tests, and issues (pp. 41- 68). New York: Guilford.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L.
Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues
(pp. 53 -91). New York: Guilford.
Jensen, A. R. (1997, July). What we know and don't know about the g factor. Keynote address
delivered at the bi-annual convention of the International Society for the Study of Individual
Differences. Aarhus, Denmark.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Kail, R. (1991). Developmental changes in speed of processing during childhood and adolescence.
Psychological Bulletin, 109, 490-501.
Kane, M. J., Bleckley, M. K., Conway, A. R. A., & Engle, R. W. (2001). A controlled-attention view of
working-memory capacity. Journal of Experimental Psychology: General, 130, 169-183.
Kaufman, A. S. (2000). Foreword. In D. P. Flanagan, K. S. McGrew, & S. O. Ortiz (Eds.), The Wechsler
intelligence scales and Gf-Gc theory: A contemporary approach to interpretation. Boston: Allyn &
Bacon.
Kaufman, A. S., & Kaufman, N. (2004). Kaufman Assessment Battery for Children-Second Edition.
Circle Pines, MN: AGS Publishing.
Keith, T. Z. (1997). Using confirmatory factor analysis to aid in understanding the constructs
measured by intelligence tests. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and issues (pp. 373-402). New York: Guilford.
Keith, T. Z. (1999). Effects of general and specific abilities on student achievement: Similarities and
differences across ethnic groups. School Psychology Quarterly, 14, 239-262.
Keith, T. Z. (2005). Using confirmatory factor analysis to aid in understanding the constructs
measured by intelligence tests. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary
intellectual assessment: Theories, tests, and issues (2nd ed., pp. 581- 615). New York: Guilford.
Keith, T. Z., Kranzler, J., & Flanagan, D. P. (2001). What does the cognitive assessment system (CAS)
measure? Joint confirmatory factor analysis of the cognitive assessment system (CAS) and the
Woodcock-Johnson tests (3rd ed.). School Psychology Review, 30, 89-119.
Keith, T. Z., & Witta, E. L. (1997). Hierarchical and cross-age confirmatory factor analysis of the
WISC-III: What does it measure? School Psychology Quarterly, 89-107.
Kranzler, J. H., & Keith, T. Z. (1999). Independent confirmatory factor analysis of the Cognitive
Assessment System (CAS): What does the CAS measure? School Psychology Review, 28, 117-144.
Kranzler, J. H., Keith, T. Z., & Flanagan, D. P. (2000). Independent examination of the factor structure
of the cognitive assessment system (CAS): Further evidence disputing the construct validity of the
CAS. Journal of Psychoeducational Assessment, 18, 143-159.
Liberman, I., Shankweiler, D., Fischer, F. W., & Carter, B. (1974). Explicit syllable and phoneme
segmentation in the young child. Journal of Experimental Child Psychology, 18, 201-212.
Logie, R. (1996). The seven ages of working memory. In J. Richardson, R. Engle, L. Hasher, R. Logie,
E. Stoltzfus, & R. Zacks (Eds.), Working memory and human cognition (pp. 31- 65). New York:
Oxford.
Lohman, D. F. (1989). Human intelligence: An introduction to advances in theory and research.
Review of Educational Research, 59, 333-373.
Lohman, D. F. (1994). Spatial ability. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence (pp.
1000 -1007). New York: Macmillan.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 151-180). New York: Guilford.
McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities: Past, present, and
future. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual
assessment: Theories, tests, and issues (pp. 136-182). New York: Guilford.
McGrew, K. S., & Flanagan, D. P. (1998). The intelligence test desk reference (ITDR): Gf-Gc cross-
battery assessment. Boston: Allyn & Bacon.
McGrew, K. S., Flanagan, D. P., Keith, T. Z., & Vanderwood, M. (1997). Beyond g: The impact of Gf-
Gc specific cognitive abilities research on the future use and interpretation of intelligence tests in the
schools. School Psychology Review, 26, 189-210.
McGrew, K. S., Werder, J. K., & Woodcock, R. W. (1991). Woodcock-Johnson Psycho-Educational
Battery-Revised technical manual. Chicago: Riverside Publishing.
McGrew, K. S., & Woodcock, R. W. (2001). Woodcock-Johnson III technical manual. Itasca, IL:
Riverside Publishing.
Messick, S. (1992). Multiple intelligences or multilevel intelligence? Selective emphasis on distinctive
properties of hierarchy: On Gardner's Frames of Mind and Sternberg's Beyond IQ in the context of
theory and research on the structure of human abilities. Psychological Inquiry, 3, 365 -384.
Morris, R. D., Stuebing, K. K., Fletcher, J. M., Shaywitz, S. E., Lyon, G. R., Shankweiler, D. P., Katz, L.,
Francis, D. J., & Shaywitz, B. A. (1998). Subtypes of reading disability: Variability around a
phonological core. Journal of Educational Psychology, 90, 347-373.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin,
J. C., Perloff, R., Sternberg, R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American
Psychologist, 51, 77-101.
Nettelbeck, T. (1994). Speediness. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence (pp.
1014 -1019). New York: Macmillan.
Newcomer, P., & Barenbaum, E. (2003). Test of phonological awareness skills. Austin, TX: PRO-ED.
Richardson, J. (1996). Evolving concepts of working memory. In J. Richardson, R. Engle, L. Hasher,
R. Logie, E. Stoltzfus, & R. Zacks (Eds.), Working memory and human cognition (pp. 3 -30). New
York: Oxford.
Stankov, L. (1994). Auditory abilities. In R. J. Sternberg (Ed.), Encyclopedia of human intelligence (pp.
157-162). New York: Macmillan.
Sternberg, R. J., & Kaufman, J. C. (1998). Human abilities. Annual Review of Psychology, 49, 479-502.
Taylor, T. R. (1994). A review of three approaches to cognitive assessment, and a proposed integrated
approach based on a unifying theoretical framework. South African Journal of Psychology, 24, 183-
193.
Torgesen, J. K., & Bryant, B. R. (2004). Test of Phonological Awareness-Second Edition. Austin, TX:
PRO-ED.
Vanderwood, M. L., McGrew, K. S., Flanagan, D. P., & Keith, T. Z. (2002). The contribution of general
and specific cognitive abilities to reading achievement. Learning and Individual Differences, 13, 159-
188.
Vellutino, F., Scanlon, D., & Lyon, G. R. (2000). Differentiating between difficult-to-remediate and
readily remediated poor readers: More evidence against IQ-ACHIEVEMENT discrepancy definitions
of reading disability. Journal of Learning Disabilities, 33, 223 -238.
Vellutino, F. R., & Scanlon, D. M. (2002). The interactive strategies approach to reading intervention.
Contemporary Educational Psychology, 27, 573-635.
Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its causal role in
the acquisition of reading skills. Psychological Bulletin, 101, 192-212.
Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive Test of Phonological
Processing. Austin, TX: PRO-ED.
Woodcock, R. W. (1990). Theoretical foundations of the WJ-R measures of cognitive ability. Journal
of Psychoeducational Assessment, 8, 231-258.
Woodcock, R. W. (1993). An information processing view of Gf-Gc theory. Journal of
Psychoeducational Assessment Monograph Series: Advances in Psychoeducational Assessment.
Woodcock-Johnson Psychoeducational Battery-Revised (pp. 80-102). Brandon, VT: Clinical
Psychology Publishing.
Woodcock, R. W. (1994). Measures of fluid and crystallized intelligence. In R. J. Sternberg (Ed.), The
encyclopedia of intelligence (pp. 452-456). New York: Macmillan.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson Psychoeducational
Battery-Third Edition. Itasca, IL: Riverside Publishing.
Wolf, M., & Denckla, M. B. (2005). Rapid automatized naming and rapid alternating stimulus test.
Austin, TX: PRO-ED.
Ysseldyke, J. (1990). Goodness of fit of the Woodcock-Johnson Psycho-Educational Battery-Revised
to the Horn-Cattell Gf-Gc theory. Journal of Psychoeducational Assessment, 8, 268 -275.
Appendix B

CHC Broad and Narrow Ability Classification Tables For Tests Published
between 1996 and 2007

The following information pertains to all CHC classifications included in this Appendix.

Tests printed in BOLD, UPPERCASE LETTERS are strong measures of the abilities listed as
defined empirically via factor analysis.
Tests printed in Bold, Lowercase Letters are moderate measures of the abilities listed as
defined empirically via factor analysis.
Tests printed in REGULAR-FACE, UPPERCASE LETTERS are measures of the abilities listed as
classified empirically via an expert consensus process.
Tests printed in Regular-face, Lowercase Letters are measures of the abilities listed as classified via author consensus.
Tests printed in Bold Italics, Lowercase Letters are mixed measures of the abilities as defined
empirically via factor analysis.
Tests printed in Regular-face italics, Lowercase Letters are mixed measures of the abilities as
defined empirically via expert consensus or by author consensus.
A number of listed subtests also measure a second narrow ability, and this classification is
reported in parentheses, along with the primary narrow ability.
This test has received a secondary classification via author consensus over and above its
primary classification as derived from factor analysis (see Table B.1 for results).
*This test had different factor loadings depending on age (see Table B.2 for results).

Note: For a more complete description of the subtests included herein, see Appendix C. Subtests
published prior to 2001 that were classified as mixed measures are not included in this appendix.
Although confirmatory factor analyses for the KABC-II subtests were not conducted at age 3 years,
the tables herein include the entire age range for which the KABC-II subtests are applicable.

CRYSTALLIZED INTELLIGENCE (Gc)
The breadth and depth of a person's acquired knowledge of a culture and the effective application of
this knowledge.


FLUID INTELLIGENCE (Gf)
Mental operations that an individual may use when faced with a relatively novel task that cannot be
performed automatically.


VISUAL PROCESSING (Gv)
The ability to generate, perceive, analyze, synthesize, manipulate, transform, and think with visual
patterns and stimuli.


LONG-TERM RETRIEVAL (Glr)
Ability to store information (e.g., concepts, ideas, items, names) in long-term memory and to fluently
retrieve it later through association.


SHORT-TERM MEMORY (Gsm)
The ability to apprehend and hold information in immediate awareness and then use it within a few seconds.


AUDITORY PROCESSING (Ga)
The ability to perceive, analyze, and synthesize patterns among auditory stimuli.


PROCESSING SPEED (Gs)

Ability to fluently perform cognitive tasks automatically, especially when under pressure to maintain
focused attention and concentration.



QUANTITATIVE KNOWLEDGE (Gq)
Represents an individual's store of acquired quantitative declarative and procedural knowledge. It
involves the ability to use quantitative information and manipulate numeric symbols.


READING ABILITY (Grw-R)

An acquired store of knowledge that includes basic reading skills required for the comprehension of
written language.



WRITING ABILITY (Grw-W)
An acquired store of knowledge that includes basic writing skills required for the expression of
language via writing.


Table B.1 CHC Broad and Narrow Ability Classifications Based on Factor Analyses and Author
Consensus


Note: Classifications in the factor-analysis column correspond to those reported in Appendix B.
Classifications in the last column of this table were based on author consensus. These classifications
reflect abilities that the present authors believe the tests measure in addition to those indicated by
factor analyses. See Appendix A for CHC broad and narrow ability definitions. DAS-II = Differential
Ability Scales-Second Edition (Elliott, 2007); KABC-II = Kaufman Assessment Battery for Children-
Second Edition (Kaufman & Kaufman, 2004); SB5 = Stanford-Binet Intelligence Scales-Fifth Edition
(Roid, 2003); WJ III = Woodcock-Johnson III Tests of Cognitive Abilities (Woodcock, McGrew, &
Mather, 2001); WJ III DS = Woodcock-Johnson III Diagnostic Supplement to the Tests of Cognitive
Abilities (Woodcock, McGrew, Mather, & Schrank, 2003).

Table B.2 CHC Broad Ability Classifications Based on the Results of Factor Analyses in Tests
Technical Manuals


APPENDIX C

Descriptions of Cognitive Ability/Processing and Academic Achievement Subtests


by CHC Domain




REFERENCES

Barenbaum, E., & Newcomer, P. (1996). Test of Children's Language. Austin, TX: PRO-ED.
Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test. Itasca, IL: Riverside
Publishing.
Brown, L., Sherbenou, R. J., & Johnson, S. K. (1997). Test of Nonverbal Intelligence-Third Edition.
Austin, TX: PRO-ED.
Brownell, R. (2002). Phonics-Based Reading Test. Novato, CA: Academic Therapy.
Bryant, B. R., Wiederholt, J. L., & Bryant, P. B. (2004). Gray Diagnostic Reading Tests- Second
Edition. Austin, TX: PRO-ED.
Carrow-Woolfolk, E. (1996). Oral and Written Language Scales: Written Expression. Circle Pines,
MN: American Guidance Service.
Cohen, M. J. (1997). Children's Memory Scale. San Antonio, TX: The Psychological Corporation.
Connolly, A. J. (1998). Key Math-Revised: A Diagnostic Inventory of Essential Mathematics
/Normative Update. Circle Pines, MN: American Guidance Service.
Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test III. Circle Pines, MN: American
Guidance Service.
Elliott, C. (2007). Differential Ability Scales-Second Edition. San Antonio, TX: The Psychological
Corporation.
Erford, B. T., & Boykin, R. R. (1996). Slosson-Diagnostic Math Screener. East Aurora, NY: Slosson
Educational.
French, J. (2001). Pictorial Test of Intelligence-Second Edition. Austin, TX: PRO-ED.
Glutting, J., Adams, W., & Sheslow, D. (2000). Wide Range Intelligence Test. Wilmington, DE : Wide
Range.
Hammill, D. D. (1998). Detroit Tests of Learning Aptitude-Fourth Edition. Austin, TX: PRO-ED.
Hammill, D. D., Hresko, W. P., Ammer, J. J., Cronin, M. E., & Quinby, S. S. (1998). Hammill
Multiability Achievement Test. Austin, TX: PRO-ED.
Hammill, D. D., & Larsen, S. C. (1996). Test of Written Language-Third Edition. Austin, TX: PRO-ED.
Hammill, D. D., Mather, N., & Roberts, R. (2001). Illinois Test of Psycholinguistic Abilities-Third
Edition. Austin, TX: PRO-ED.
Hammill, D. D., Pearson, N. A., & Wiederholt, J. L. (1998). Comprehensive Test of Nonverbal
Intelligence. Austin, TX: PRO-ED.
Hammill, D. D., Wiederholt, J. L., & Allen, A. A. (2005). Test of Silent Contextual Reading Fluency.
Austin, TX: PRO-ED.
Hresko, W. P., Herron, S. R., & Peak, P. K. (1998). Test of Early Written Language-Second Edition.
Austin, TX: PRO-ED.
Hresko, W. P., Schlieve, P. L., Herron, S. R., Swain, C., & Sherbenou, R. J. (2003). Comprehensive
Mathematical Abilities Test. Austin, TX: PRO-ED.
Hresko, W. P., Peak, P., Herron, S., & Bridges, D. (2004). Young Childrens Achievement Test. Austin,
TX: PRO-ED.
Hresko, W. P., Reid, K., & Hammill, D. D. (1999). Test of Early Language Development- Third Edition.
Pearson Assessments.
Kaufman, A. S., & Kaufman, N. L. (2004a). Kaufman Assessment Battery for Children- Second Edition.
Circle Pines, MN: AGS Publishing.
Kaufman, A. S., & Kaufman, N. L. (2004b). Kaufman Brief Intelligence Test-Second Edition. Circle
Pines, MN: AGS Publishing.
Kaufman, A. S., & Kaufman, N. L. (2004c). Kaufman Test of Educational Achievement- Second
Edition. Circle Pines, MN: AGS Publishing.
Keith, R. W. (2000). SCAN-C Test for Auditory Processing Disorders in Children-Revised. San
Antonio, TX: The Psychological Corporation.
Korkman, M., Kirk, U., & Kemp, S. (1998). NEPSY: A developmental neuropsychological assessment.
San Antonio, TX: The Psychological Corporation.
Larsen, S. C., Hammill, D. D., & Moats, L. C. (1999). Test of Written Spelling-Fourth Edition. Austin,
TX: PRO-ED.
Markwardt, F. C., Jr. (1997). Peabody Individual Achievement Test-Revised/Normative Update. Circle
Pines, MN: American Guidance Service.
Martin, N., & Brownell, R. (2005). Test of Auditory-Perceptual Skills-Third Edition. Novato, CA:
Academic Therapy Publications.
Mather, N., Hammill, D. D., Allen, A. A., & Roberts, R. (2004). Test of Silent Word Reading Fluency.
Austin, TX: PRO-ED.
Naglieri, J. A., & Das, J. P. (1997). Cognitive Assessment System. Itasca, IL: Riverside Publishing.
Newcomer, P. L. (1999). Standardized Reading Inventory-Second Edition. Austin, TX: PRO-ED.
Newcomer, P. L. (2001). Diagnostic Achievement Battery-Third Edition. Austin, TX: PRO-ED.
Newcomer, P. L., & Barenbaum, E. (2003). Test of Phonological Awareness Skills. Austin, TX: PRO-
ED.
Newcomer, P. L., & Hammill, D. D. (1997). Test of Language Development-Primary-Third Edition.
Austin, TX: PRO-ED.
Reid, D. K., Hresko, W. P., & Hammill, D. D. (2001). Test of Early Reading Ability-Third Edition.
Austin, TX: PRO-ED.
Reynolds, C. R., & Kamphaus, R. W. (2003). Reynolds Intellectual Assessment Scales. Lutz, FL:
Psychological Assessment Resources.
Robertson, C., & Salter, W. (1997). The Phonological Awareness Test. East Moline, IL : LinguiSystems.
Robertson, G. J. (2001). Wide Range Achievement Test-Expanded. Lutz, FL: PAR Inc.
Roid, G. H. (2003). Stanford-Binet Intelligence Scales-Fifth Edition. Itasca, IL: Riverside Publishing.
Roid, G. H., & Miller, L. J. (1998). Leiter International Performance Scale-Revised. Wood Dale, IL:
Stoelting Co.
Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals-
Fourth Edition. San Antonio, TX: The Psychological Corporation.
Sheslow, D., & Adams, W. (2003). Wide Range Assessment of Memory and Learning-Second Edition.
Lutz, FL : Psychological Assessment Resources.
Slosson, R. L., & Nicholson, C. L. (2002). Slosson Oral Reading Test-Revised. East Aurora, NY:
Slosson Educational Publications.
Torgesen, J. K., & Bryant, B. R. (2004). Test of Phonological Awareness-Second Edition: PLUS.
Austin, TX: PRO-ED.
Torgesen, J. K., Wagner, R. K., & Rashotte, C. A. (1999). Test of Word Reading Efficiency. Austin, TX:
PRO-ED.
Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive Test of Phonological
Processing. Austin, TX: PRO-ED.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale-Third Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (1997). Wechsler Memory Scale-Third Edition. San Antonio, TX: The Psychological
Corporation.
Wechsler, D. (2001). Wechsler Individual Achievement Test-Second Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2002). Wechsler Preschool and Primary Scale of Intelligence-Third Edition. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (2004). Wechsler Intelligence Scale for Children-Fourth Edition. San Antonio, TX: The
Psychological Corporation.
Wechsler, D., & Naglieri, J. A. (2005). Wechsler Nonverbal Scale of Ability. San Antonio, TX: The
Psychological Corporation.
Wiederholt, J. L., & Blalock, G. (2000). Gray Silent Reading Tests. Austin, TX: PRO-ED.
Wiederholt, J. L., & Bryant, B. R. (2001). Gray Oral Reading Test-Fourth Edition. Austin, TX: PRO-
ED.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide Range Achievement Test-Fourth Edition. Lutz, FL :
Psychological Assessment Resources.
Williams, K. T. (1997). Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.
Wilson, B. A., & Felton, R. H. (2002). Word Identification and Spelling Test. Austin, TX: PRO-ED.
Woodcock, R. W. (1998). Woodcock Reading Mastery Tests-Revised/Normative Update. Circle Pines,
MN: American Guidance Service.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Tests of Cognitive
Abilities. Itasca, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., Mather, N., & Schrank, F. A. (2003). Diagnostic Supplement to
the Woodcock-Johnson III Tests of Cognitive Abilities. Itasca, IL : Riverside Publishing.
Woodcock, R. W., Mather, N., & Schrank, F. A. (2004). Woodcock-Johnson III Diagnostic Reading
Battery. Itasca, IL : Riverside Publishing.
Appendix D

Test-Specific Culture-Language Matrices



Matrix of Cultural Loading and Linguistic Demand Classifications of the WAIS-III Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the WISC-IV Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the WPPSI-III Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the WJ III Subtests



Matrix of Cultural Loading and Linguistic Demand Classifications of the SB5 Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the DAS-II Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the KABC-II Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the UNIT Subtests



Matrix of Cultural Loading and Linguistic Demand Classifications of the Leiter-R Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the CAS Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the KBIT-2 Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the RIAS Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the WRIT Subtests



Matrix of Cultural Loading and Linguistic Demand Classifications of the PTI-2 Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the DTLA-4 Subtests



Matrix of Cultural Loading and Linguistic Demand Classifications of the NEPSY Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the CTOPP Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the WRAML Subtests


Matrix of Cultural Loading and Linguistic Demand Classifications of the CMS Subtests



Matrix of Cultural Loading and Linguistic Demand Classifications of the WMS-III Subtests

Appendix E

Percentile Rank and Standard Score Conversion Table

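
Because the table itself is not reproduced here, the following minimal Python sketch shows the relationship that underlies such conversions, assuming a normal distribution of standard scores with a mean of 100 and a standard deviation of 15; published tables may differ slightly, particularly at the extremes.

from math import erf, sqrt

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    # Approximate percentile rank of a standard score under an assumed
    # normal distribution (normal CDF computed via the error function).
    z = (standard_score - mean) / sd
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))

for ss in (70, 85, 100, 115, 130):
    print(ss, round(percentile_rank(ss), 1))
# Prints approximately: 70 2.3, 85 15.9, 100 50.0, 115 84.1, 130 97.7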

Appendix F

Critical Values Required for Statistical Significance among Subtests within


Composites for Seven Intelligence Batteries


Sources: All critical value data for the WISC-IV, WAIS-III, and WPPSI-III were taken from Wechsler
(2003), Wechsler (1997), and Wechsler (2002), respectively. All critical value data for the KABC-II
were taken from Kaufman and Kaufman (2004) except for the FCI and MPI which were taken from
Kaufman, Lichtenberger, Fletcher-Janzen, and Kaufman (2005). All critical value data for the WJ III
were taken from Volker and Smykowski (2006). All critical value data for the SB5 were taken from Roid (2003). All critical value data for the DAS-II were taken from Elliott (2007).

Note: Critical values in regular-face type are based on the difference between the designated subtest
or composite scores at the .05 level of statistical significance. Critical values in bold-face type are
associated with a base rate criterion of approximately 10% (i.e., approximately 10% of the
standardization sample at the designated age(s) demonstrated a difference between the highest and
lowest subtest or composite scores of the magnitude reported here).
WISC-IV = Wechsler Intelligence Scale for Children-Fourth Edition; WAIS-III = Wechsler Adult
Intelligence Scale-Third Edition; WPPSI-III = Wechsler Preschool and Primary Scale of Intelligence-
Third Edition; KABC-II = Kaufman Assessment Battery for Children-Second Edition; WJ III =
Woodcock-Johnson III Tests of Cognitive Abilities; SB5 = Stanford-Binet Intelligence Scales-Fifth
Edition; DAS-II = Differential Ability Scales-Second Edition.
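
As a rough illustration of where such critical values come from (not the publishers' exact computations, which depend on the reliabilities reported in each manual), a critical difference between two scores can be estimated from the standard error of the difference. The minimal Python sketch below uses assumed reliabilities for illustration.

from math import sqrt

def critical_value(sd, r_xx, r_yy, z=1.96):
    # Critical difference between two scores on the same metric:
    # z * SD * sqrt(2 - r_xx - r_yy); z = 1.96 gives the two-tailed .05 level.
    return z * sd * sqrt(2.0 - r_xx - r_yy)

# Two scaled-score subtests (SD = 3) with assumed reliabilities of .88 and .85:
print(round(critical_value(sd=3.0, r_xx=0.88, r_yy=0.85), 2))    # ~3.06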

REFERENCES

Elliott, C. (2007). Differential Ability Scales-Second Edition: Normative data tables manual. San
Antonio, TX: PsychCorp.
Kaufman, A. S., & Kaufman, N. L. (2004). Manual for the Kaufman Assessment Battery for Children-
Second Edition. Circle Pines, MN: American Guidance Service.
Kaufman, A. S., Lichtenberger, E. O., Fletcher-Janzen, E., & Kaufman, N. L. (2005). Essentials of
KABC-II assessment. New York: Wiley.
Roid, G. (2003). Stanford-Binet Intelligence Scales-Fifth Edition: Technical manual. Itasca, IL:
Riverside Publishing.
Volker, M. A., & Smykowski, T. (2006). An alternative paired subtest comparison method for
interpreting WJ III Cognitive Extended Battery composites. Manuscript submitted for publication.
Wechsler, D. (1997). WAIS-III administration and scoring manual. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2002). WPPSI-III administration and scoring manual. San Antonio, TX: The
Psychological Corporation.
Wechsler, D. (2003). WISC-IV administration and scoring manual. San Antonio, TX: The
Psychological Corporation.
Index


A posteriori assumptions, data interpretation and
specification of
A priori assumptions, data interpretation and
specification of
Ability, SLD evaluation and
Ability-achievement discrepancy: IQ and
SLD and
Absolute Pitch (UP)
Academic function, case reports
Accessibility, misconceptions about XBA and
Acculturation, level of
Acquired knowledge, SLD evaluation and, stores of
Administration, test
incorrect test, C-LIM and
interpretation and
Alphabet Art: With A to Z Animal Art and Fingerplays
American Heritage Dictionary
Aptitude, SLD evaluation and
Army Beta test
Assessment, interpretation and hypothesis-driven
Associational Fluency (FA), defined
Associative Memory (MA): defined
Long-Term Retrieval and
Attentive speediness
Attenuation, C-LIM and
Auditory Processing (Ga): classification table
defined
Phonetic Coding: Analysis subtests and
Phonetic Coding: Synthesis subtests and
Resistance to Auditory Stimulus Distortion subtests and
Sound Localization subtests and
Speech/General Sound Discrimination subtests and
Auditory Stimulus Distortion (UR), Resistance to
Automation, as a strength of XBA
Average range, interpretation and


Batería-III
Behavioral functioning, case reports
Bender Visual-Motor Gestalt test, diverse individuals and
Bias:
culture
language
narrow definitions of
Bilingual Verbal Ability Tests-Normative Update (BVAT)
Bilingualism
vs. monolingualism
Binet Scales, diversity and
Brigham, Carl
Buffers


Carroll, John
Case reports
academic achievement, evaluation of
assessment/evaluation procedures
description of
assessment findings
statement of validity
assessment for intervention report
behavior, social, and emotional functioning
behavioral observations
cognitive processes and intellectual functioning
developmental/health history
learning influences, evaluation of
opinions and impressions
reasons/purpose for assessment
recommendations
referrals
summary and data integration
Cattell-Horn-Carroll (CHC) theory of cognitive abilities
ability structure, hierarchical
broad abilities
classification
data interpretation
guidelines for
three scores
defined
representation of intelligence batteries
XBAs for
(see also specific ability)
case reports
Cattell-Horn vs. Carroll model
classification
author consensus and factor analyses
factor analyses in technical manuals
tables
defined
development of new intelligence batteries and
foundations of XBA approach and

historical perspective
misconceptions about XBA and
modernity of XBA approach
narrow abilities
classification
data interpretation, guidelines for
defined
(see also specific ability)
research knowledge base
SLD Assistant
as a taxonomy for understanding
validity of
Cell Averages, C-LIM
Data Manager
Children's Memory Scale (CMS):
culture-language matrix
Long-Term Retrieval and
Short-Term Memory and
Visual Processing and
Choice Reaction Time (R2), defined
Classification system, contemporary descriptive
Classroom performance, SLD evaluation and
Clinical Evaluation of Language Fundamentals-Fourth Edition (CELF-4), Auditory Processing and
Clinical psychology, needs addressed by XBA
Closure Speed (CS): defined
Visual Processing and
Cloze Ability (CZ): defined
Reading Ability and
Cluster:
nonunitary
criteria used to determine
unitary
two-subtest
Cognitive Assessment System (CAS): abilities/processes and, representation prior to 2000
Auditory Processing and
CHC theory
influence of
and XBA CHC test classifications
Crystallized Intelligence and
culture-language matrix
Fluid Intelligence and

limited measurements of
Long-Term Retrieval and
Processing Speed and
Short-Term Memory and
Visual Processing and
Cognitive performance, case reports
Communication:
cultural bias and nonverbal
XBA approach and
Communication Ability (CM), defined
Comparability, assumption of
Comparison Speed (R7), Mental, defined
Complication, as a weakness of XBA
Comprehensive Mathematical Abilities Test (CMAT), Quantitative Knowledge and
Comprehensive Test of Nonverbal Intelligence (CTONI), Fluid Intelligence and
Comprehensive Test of Phonological Processing (CTOPP)
Auditory Processing and
culture-language matrix
entering scores
Confirmatory bias
Consistency, SLD evaluation and
Construction: assessment battery
Construct-irrelevant variance. See Variance, construct-irrelevant
Construct-relevant variance. See Variance, construct-relevant
Construct representation. See Representation, construct
Core tests, new features of XBA approach
Cross-Battery Assessment (XBA)
application, guiding principles
CHC theory and
broad abilities
narrow abilities
Data Manager and Interpretive Assistant (XBA DMIA)
defined
diverse populations
framework for
needs addressed by
rationale
practice
research
test development
step-by-step implementation

Crystallized Intelligence (Gc)
case reports
classification table
General Information subtests and
Language Development subtests and
Lexical Knowledge subtests and
Listening Ability subtests and
Culturally and Linguistically Diverse (CLD), applications of XBA approach
Culture:
differences, influence of
information about
test selection and mainstream
See also Diverse populations
Culture-Language Interpretive Matrix (C-LIM):
applications of XBA approach
automation and
new features of XBA approach
purpose of
See also Diversity, C-LIM
Culture-Language Test Classifications (C-LTC):
applications of XBA approach and
referral information, utilization of
See Diversity, C-LTC


Data interpretation
guidelines
three score
two score
hypothesis-driven
integrating
Decision Speed/Reaction Time (Gt), defined
Declarative knowledge
Deduction
Deficit, normative
Developmental Disorder, Pervasive
Developmental history, case reports
Diagnostic Achievement Battery-Third Edition (DAB-3):
Auditory Processing and
Quantitative Knowledge and
Reading Ability and
Writing Ability and
Diagnostic impressions, case reports
Diagnostic and Statistical Manual of Mental Disorders:
communication and
identifying SLD and
importance of nomenclature
Differential Ability Scales (DAS): abilities/processes and, representation prior to 2000
CHC theory and XBA CHC test classifications, influence of
limited measurements of
Processing Speed and
Differential Ability Scales-Second Edition (DAS-II):
Auditory Processing and
broad CHC abilities
representation of
XBAs for
CHC classification, factor analyses and author consensus
CHC theory and XBA CHC test classifications after 2000
influence of
critical values required for significance among subtests
Crystallized Intelligence and
culture-language matrix
data interpretation

guidelines
nonunitary or noninterpretable cluster
Fluid Intelligence and
influence of XBA on revisions
Long-Term Retrieval and
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
Processing Speed and
Short-Term Memory and
Visual Processing and
XBA
DMIA and
weakness and subtest order
Differential Ability Scales-Third Edition (DAS-III), Auditory Processing and
Disability determination, data interpretation and
Disorder, defined. See also specific type
Diversity:
C-LIM and
caveats of interpretation
examples of interpretation
C-LTC and
cognitive ability tests and
assessment approaches
culture bias vs. culture loading
language bias vs. linguistic demands
evaluation of culture-language differences
overview
referral information, utilization of
SLD evaluation and
use of XBA and
Draw-A-Person test, diverse individuals and
Detroit Tests of Learning Aptitude-Fourth Edition (DTLA-4):
Crystallized Intelligence and
culture-language matrix
Fluid Intelligence and
Short-Term Memory and
Visual Processing and


Education system, English literacy and
Emotional difficulty/disturbance:
case reports
C-LIM and
SLD evaluation and
Engagement, test selection and
English language learners (ELLs)
English as a Second Language (ESL) referral information and
English Usage Knowledge (EU): defined
Writing Ability and
Experiential background
C-LIM and
Expressional Fluency (FE), defined
Expressive Vocabulary Test (EVT), Crystallized Intelligence and


Fatigue, C-LIM and
Figural Flexibility (FX), defined
Figural Fluency (FF): defined
Long-Term Retrieval and
Fine-motor impairments, referral information
Flanagan, Dawn
Flexibility, as a strength of XBA
Flexibility of Closure (CF): defined
Visual Processing and
Fluid Crystallized Index (FCI), case reports
Fluid-Crystallized theory
influence of XBA on test revisions
Fluid Intelligence (Gf)
classification table
General Sequential Reasoning subtests and
Induction subtests and
Quantitative Reasoning subtests and
Fluid Reasoning (Gf), case reports
Foreign Language Aptitude (LA), defined
Foreign Language Proficiency (KL), defined
Free-Recall Memory (M6):
defined
Long-Term Retrieval and


General Ability Index (GAI), data interpretation
General Information (K0):
Crystallized Intelligence and
defined
General Science Information (K1), defined
General Sequential Reasoning (RG): defined
Fluid Intelligence and
General Sound Discrimination (U3), defined
Generalizability, as a weakness of XBA
Geography Achievement (A5), defined
Gf-Gc theory, development of new intelligence batteries and
Grammatical Sensitivity (MY), defined
Gray Diagnostic Reading Tests-Second Edition (GDRT-2):
Auditory Processing and
Reading Ability and
Gray Oral Reading Tests-Fourth Edition (GORT-4), Reading Ability and
Gray Silent Reading Tests (GSRT), Reading Ability and
Guckenberger v. Boston University
Guidelines for Providers of Psychological Services to Ethnic, Linguistic, and Culturally Diverse
Populations


Hammill Multiability Achievement Test (HAMAT):
Quantitative Knowledge and
Reading Ability and
Writing Ability and
Health history case reports
Hearing and Speech Threshold factors (UA, UT, UU), defined
Horn, John
Human Cognitive Abilities, Table of


Ideational Fluency (FI):
defined
Long-Term Retrieval and
Imagery (IM), defined
Individuals with Disabilities Education Improvement Act (IDEA 2004):
discrepancy and
diversity and
RTI and
SLD evaluation and
acquired knowledge and
normative deficit
prereferral issues test selection and
Induction (I):
defined
Fluid Intelligence and
Information about Culture (K2), defined
Instruction, insufficient
Integration, case reports
Intelligence:
essence of
testing
Intelligence quotient (IQ), as a predictor for achievement
Interference, with functioning and SLD evaluation
International Classification of Diseases
importance of nomenclature
Interpretive statements, new features of XBA approach
Intervention:
SLD evaluation and individual response to
See also Response to Intervention (RTI)
Intra-individual discrepancy analysis
Ipsative discrepancy analysis
criticism of
Illinois Test of Psycholinguistic Abilities-Third Edition (ITPA-3):
Auditory Processing and
Reading Ability and
Writing Ability and


Jordan, Michael


Kaufman Adolescent and Adult Intelligence Test (KAIT):
abilities/processes and, representation prior to 2000
CHC theory and XBA CHC test classifications, influence of
limited measurements of
Kaufman Assessment Battery for Children (K-ABC): abilities/processes and, representation prior to
2000
CHC theory and XBA CHC test classifications, influence of
limited measurements of
Kaufman Assessment Battery for Children-Second Edition (KABC-II):
broad CHC abilities
representation of
XBAs for
case reports
CHC theory
abilities classification
author consensus and factor analyses
factor analyses in technical manuals
XBA CHC test classifications
after 2000
influence of
C-LIM and
construct representation
critical values required for significance among subtests
Crystallized Intelligence and
culture-language matrix
data interpretation
nonunitary or noninterpretable cluster
three score
two-subtest unitary cluster and
entering scores
Fluid Intelligence and
influence of XBA on revisions
Long-Term Retrieval and
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
referral information, utilization of
Short-Term Memory and
test selection and
Visual Processing and
XBA DMIA and
Kaufman Brief Intelligence Test-Second Edition (KBIT-2):
Crystallized Intelligence and
culture-language matrix
Fluid Intelligence and
Kaufman Test of Educational Achievement-Second Edition (KTEA-II):
Auditory Processing and
case reports
Quantitative Knowledge and
Reading Ability and
Writing Ability and
KeyMath-Revised/Normative Update (KM-R/NU)
Quantitative Knowledge and


Language:
acquisition
disability vs. limited proficiency
elimination of requirements for spoken-
expression, test selection and
linguistic demands
classification and
culture-language matrix and
requirements, test selection and
See also Diversity
Language Development (LD):
defined
Crystallized Intelligence and
Language dominance, bilingualism and
Learning Abilities (L1), defined
Learning difficulties, intelligence batteries and
Learning disability (LD), CHC theory
Learning efficiency, SLD evaluation, Level II-A
Legal issues, organization of XBA and
Leiter-R:
Auditory Processing and
culture-language matrix
Fluid Intelligence and
Long-Term Retrieval and
Processing Speed and
Visual Processing and
Length Estimation (LE), defined
Lexical Knowledge (VL):
Crystallized Intelligence and
defined
Linguistics:
demand, test specific culture-language matrices of
See also Diversity; Language
Listening Ability (LS):
Crystallized Intelligence and
defined
Listening Comprehension, case reports
Loading: culture
classification and

culture-language matrix and
degree of
test specific culture-language matrices
Long-Term Retrieval (Glr):
Associative Memory subtests and
case reports
classification table
Figural Fluency subtests and
Free Recall Memory subtests and
Ideational Fluency subtests and
Meaningful Memory subtests and
Naming Facility subtests and
Long-Term Storage and Retrieval (Glr), defined


Maintaining and Judging Rhythm (U8), defined
Mathematical Achievement (A3):
defined
Quantitative Knowledge and
referral information, utilization of
subtests measuring narrow CHC abilities/processes related to
Mathematical Knowledge (KM):
defined
Quantitative Knowledge and
McGrew, Kevin
Meaningful Memory (MM):
defined
Long-Term Retrieval and
Meaningfulness, SLD and clinical
Memory, case reports. See also Short-Term Memory
Memory for Sound Patterns (UM), defined
Memory Span (MS):
defined
Short-Term Memory and
Mental Comparison Speed (R7), defined
Mental Processing Index (MPI), case reports
Mental Retardation:
C-LIM caveats when interpreting
SLD evaluation and
exclusionary factors
Mild Mental Retardation
Misconceptions, XBA
Motivation, lack of:
C-LIM and
SLD evaluation and
Musical Discrimination and Judgment (U1, U9), defined


Naming Facility (NA):
defined
Long-Term Retrieval and
Nelson Denny Reading Test (NDRT), SLD evaluation and
Neuropsychological (NEPSY):
Auditory Processing and
Crystallized Intelligence and
culture-language matrix
Long-Term Retrieval and
Short-Term Memory and
Visual Processing and
Neuropsychology, needs addressed by XBA
Nondiscriminatory assessment, a model for comprehensive. See also Diversity
Nonunitary cluster. See Cluster, nonunitary
Nonverbal tests, cultural and linguistic diversity
Normal limits, interpretation and
Normative classification, SLD
Norm samples, as a weakness of XBA
Null hypothesis:
assessment and interpretation
data interpretation and
guidelines
Number Facility (N):
defined
Processing Speed and


Objectivity, intelligence testing and
Observations, behavioral, case reports
Oral Production and Fluency (OP), defined
Organization:
data management and storing
implementation
administering
CHC broad abilities and
scores
test selection
integration, guiding principles of
overview
referral information and
Originality/Creativity (FO), defined
Outcome criteria, cognitive abilities and
Oral and Written Language Scales (OWLS), Writing Ability and


Pantomime
Peabody Individual Achievement Test-Revised/Normative Update (PIAT-R/NU):
Quantitative Knowledge and
Reading Ability and
Writing Ability and
Peabody Picture Vocabulary Test-Third Edition (PPVT-III), Crystallized Intelligence and
Percentile rank, and standard score conversion table
Perceptual Alternations (PN)
Perceptual Illusions (IL)
Perceptual impairments. See Sensory information
Perceptual Speed (P):
defined
Processing Speed and
Performance anxiety, SLD evaluation and
Periodic Table:
communication and
importance of nomenclature
Person-relative comparison. See Intra-individual discrepancy analysis
Phonetic Coding (PC):
Analysis, defined
Synthesis, defined
Phonetic Coding: Analysis (PC:A), Auditory Processing and
Phonetic Coding: Synthesis (PC:S), Auditory Processing and
Phonics-Based Reading Test (PRT), Reading Ability and
The Phonological Awareness Test (TPAT), Auditory Processing and
Piagetian Reasoning (RP), defined
Pictorial Test of Intelligence- Second Edition (PTI-2):
Crystallized Intelligence and
culture-language matrix
Visual Processing and
Press, Judy
Procedural knowledge
Processing Speed (Gs):
defined
case reports
classification table
Number Facility subtests and
Perceptual Speed subtests and

Rate-of-Test-Taking subtests and
Semantic, defined
Speed of Reasoning and
Psychiatric disorders, SLD evaluation and
Psychological report:
data interpretation
nonunitary cluster
unitary cluster
incorporate XBA results
Psychometric theory, SLD evaluation and current


Quantitative Knowledge (Gq):
defined
Cattell-Horn model and
classification table
Mathematical Achievement subtests and
Mathematical Knowledge subtests and
Quantitative Reasoning (RQ):
defined
Cattell-Horn model and
Fluid Intelligence and
Rapid Automatized Naming (RAN)
Rate-of-Test-Taking (R9):
defined
Processing Speed and
Reaction Time:
Choice
Simple
Reading, case reports
Reading Ability:
classification table
Cloze Ability subtests and
Reading Comprehension subtests and
Reading Decoding subtests and
Reading Speed subtests and
Verbal Language Comprehension subtests and
Reading Achievement:
referral information and, utilization of
subtests measuring narrow CHC abilities/processes related to
Reading Comprehension (RC):
defined
Reading Ability and
Reading Decoding (RD):
defined
Reading Ability and
Reading difficulty, data interpretation and
Reading Speed (RS):
defined
Reading Ability and

Reading/Writing Ability (Grw), defined
Recommendations, case reports
Referral information, utilization of
Reliability, SLD evaluation and
Representation, construct
Resistance to Auditory Stimulus Distortion (UR), defined
Response to Intervention (RTI): SLD and
evaluation and exclusionary factors
prereferral issues
See also Intervention
Retrieval. See Long-Term Retrieval
Reynolds Intellectual Assessment Scales (RIAS): CHC theory
influence of
XBA CHC test classifications
after 2000
influence of
Crystallized Intelligence and
culture-language matrix
Short-Term Memory and
Visual Processing and
Rhythm (U8), Maintaining and Judging, defined
SCAN-C Test for Auditory Processing Disorders in Children-Revised (SCAN-C-R), Auditory Processing and
School psychology, needs addressed by XBA
School records, case reports
Science Information, General
Scoring:
conversion and percentile rank table
incorrect, C-LIM and
test
entering
interpretation and
Screeners, SLD evaluation and over-reliance on
Self-report questionnaires
Semantic Processing Speed (R4), defined
Sensitivity to Problems (SP), defined
Sensory impairments:
referral information, utilization of
SLD evaluation and
Serial Perceptual Integration (PI), defined
Short-Term Memory (Gsm):
case reports
classification table
defined
Memory Span subtests and
Working Memory subtests and
Simple Reaction Time (R1), defined
Sine qua non criterion
Slosson Diagnostic Math Screener (S-DMS), Quantitative Knowledge and
Slosson Oral Reading Test-Revised 3 (SORT-R3), Reading Ability and
Social functioning, case reports
Socioeconomic status (SES), C-LIM and
Sound-Frequency Discrimination (U5), defined
Sound-Intensity/Duration Discrimination (U6), defined
Sound Localization (UL):
Auditory Processing and
defined
Spatial Relations (SR)
Spatial Scanning (SS):
defined
Visual Processing and
Special education eligibility
Specific Learning Disability (SLD) Assistant
automation and
Specific learning disability (SLD) evaluation
aptitude vs. ability
current theory and research
definition of
academic skills and acquired knowledge
operational
prereferral issues
exclusionary factors
interference, functioning
ipsative or intra-individual discrepancies
IQ issues
measurements of abilities/ processes and aptitudes
relative vs. normative weakness
severe discrepancy calculation
single subtests and screening instruments
statistics
underachievement
Speech/General Sound Discrimination, Auditory Processing and
Speech impaired, case reports
Speech Sound Discrimination (US), defined
Speed of Reasoning (RE):
defined
Processing Speed and
Spelling Ability (SG):
defined
Writing Ability and
Standardized Reading Inventory-Second Edition (SRI-2), Reading Ability and
Standards for Educational and Psychological Testing
Stanford-Binet Intelligence Scales-Fourth Edition (SB:FE):
abilities/processes and, representation prior to 2000
CHC theory and XBA CHC test classifications, influence of
limited measurements of
Stanford-Binet Intelligence Scales-Fifth Edition (SB5): broad CHC abilities
representation of
XBAs for
CHC theory and XBA CHC test classifications
abilities classification
after 2000
influence of
critical values required for significance among subtests
Crystallized Intelligence and
culture-language matrix
data interpretation
nonunitary or noninterpretable cluster
two-subtest unitary cluster and
Fluid Intelligence and
influence of XBA on revisions
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
referral information, utilization of
Short-Term Memory and
Visual Processing and
XBA DMIA and
XBA weakness and subtest order
Statistical significance, SLD and
A Study of American Intelligence
Subtest order, as a weakness of XBA


Table of Human Cognitive Abilities
Temporal Tracking (UK), defined
Test of Auditory-Perceptual Skills-Third Edition (TAPS-3), Auditory Processing and
Test of Children's Language (TOCL):
Auditory Processing and
Reading Ability and
Writing Ability and
Test of Early Reading Ability-Third Edition (TERA-3):
Reading Ability and
Writing Ability and
Test of Early Written Language- Second Edition (TEWL-2), Writing Ability and
Testing history, case reports
Test of Language Development-Primary: Third Edition (TOLD-P:3), Auditory Processing and
Test of Nonverbal Intelligence-Third Edition (TONI-3), Fluid Intelligence and
Test of Phonological Awareness Skills (TOPAS)
Auditory Processing and
Test of Phonological Awareness- Second Edition (TOPA-2), Auditory Processing and
Test of Silent Contextual Reading Fluency (TOSCRF), Reading Ability and
Test of Silent Word Reading Fluency (TOSWRF)
Test of Word Reading Efficiency (TOWRE), Reading Ability and
Test of Written Language-Third Edition (TOWL-3), Writing Ability and
Test of Written Spelling-Fourth Edition (TWS-4), Writing Ability and
Time consuming, as a weakness of XBA
Training programs, diversity and
Truncated hierarchical model
T scores, DAS-II and


Underachievement:
ability-achievement discrepancy and
SLD evaluation and
Unitary cluster. See Cluster, unitary
Universal Nonverbal Intelligence Test (UNIT):
culture-language matrix
diverse individuals and
Fluid Intelligence and
Visual Processing and


Validity:
CHC theory and
C-LIM and
misconceptions about XBA and
statement of, case reports
Variance:
construct-irrelevant
construct-relevance
Verbal Language Comprehension (V):
defined
Reading Ability and
Visualization (Vz): defined
Visual Processing and
Visual Memory (MV):
defined
Visual Processing and
Visual Processing (Gv):
case reports
classification table
Closure Speed subtests and
defined
Flexibility of Closure and
Spatial Relations subtests and
Spatial Scanning and
Visualization subtests and
Visual Memory subtests and
Vocabulary development, case reports


Wait time
Wechsler, David
Wechsler Adult Intelligence Scale-Revised (WAIS-R):
abilities/processes and, representation prior to 2000
CHC theory and XBA CHC test classifications, influence of
Wechsler Adult Intelligence Scale-Third Edition (WAIS-III):
broad CHC abilities
representation of
XBAs for
CHC theory
classification using factor analyses in technical manuals
influence of
XBA CHC test classifications
after 2000
influence of
construct representation
construct-irrelevant variance
critical values required for significance among subtests
culture-language matrix
data interpretation
levels of
nonunitary or noninterpretable cluster
influence of XBA on revisions
ipsative, person-relative discrepancy analysis and
narrow CHC abilities, math achievement and
Processing Speed and
Short-Term Memory and
Visual Processing and
XBA DMIA and
Wechsler-Bellevue, influence of XBA on revisions
Wechsler Individual Achievement Test-Second Edition (WIAT-II):
narrow CHC abilities
Quantitative Knowledge and
Reading Ability and
Writing Ability and
Wechsler Intelligence Scale for Children-Revised (WISC-R), diverse individuals and
Wechsler Intelligence Scale for Children-Third Edition (WISC-III):
broad abilities/processes and, representation prior to 2000

CHC theory and XBA CHC test classifications
diverse individuals and
limited measurements of
Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV):
broad CHC abilities
representation of
XBAs for
case reports
cognitive performance
CHC theory
classification using factor analyses in technical manuals
influence of
XBA CHC test classifications after 2000
influence of
C-LIM and
critical values required for significance among subtests
culture-language matrix
data interpretation
hypothesis evaluation
nonunitary or noninterpretable cluster
three score
two-subtest unitary cluster and
diverse individuals and
Fluid Intelligence and
influence of XBA on revisions
ipsative, person-relative discrepancy analysis and
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
Processing Speed and
Short-Term Memory and
Spanish
XBA DMIA and
Wechsler Memory Scale-Third Edition (WMS-III):
culture-language matrix
Long-Term Retrieval and
Short-Term Memory and
Visual Processing and
Wechsler Nonverbal Scale of Ability (WNV):
Fluid Intelligence and

Processing Speed and
Short-Term Memory and
Visual Processing and
Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R):
abilities/processes and, representation prior to 2000
CHC theory and XBA CHC test classifications, influence of
limited measurements of
Wechsler Preschool and Primary Scale of Intelligence-Third Edition (WPPSI-III):
broad CHC abilities
representation of
XBAs for
CHC theory
classification using factor analyses in technical manuals
influence of
XBA CHC test classifications
after 2000
influence of
critical values required for significance among subtests
Crystallized Intelligence and
culture-language matrix
data interpretation, nonunitary or noninterpretable cluster
influence of XBA on revisions
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
Visual Processing and
XBA DMIA and
Wechsler Scales (WECH):
Auditory Processing and
CHC abilities classification
Crystallized Intelligence and
diverse individuals and
Fluid Intelligence and
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
Processing Speed and
Short-Term Memory and
Visual Processing and

Whammy effect, double
Wide Range Achievement Test-Expanded (WRAT Exp)
Wide Range Achievement Test-Fourth Edition (WRAT4):
Quantitative Knowledge and
Reading Ability and
Writing Ability and
Wide Range Assessment of Memory and Learning-Second Edition (WRAML-2):
culture-language matrix
Long-Term Retrieval and
Short-Term Memory and
Visual Processing and
Wide Range Intelligence Test (WRIT): CHC theory
XBA CHC test classifications
after 2000
influence of
Crystallized Intelligence and culture-language matrix
Fluid Intelligence and
Visual Processing and
Woodcock, Richard
Woodcock-Johnson Psychoeducational Battery-Revised (WJ-R):
abilities/processes and, representation prior to 2000
SLD evaluation and
Woodcock-Johnson Psychoeducational Battery- Third Edition (WJ III):
Auditory Processing and
broad CHC abilities
representation of
XBAs for
case reports
CHC theory
classification
author consensus and factor analyses
factor analyses in technical manuals
XBA CHC test classifications
after 2000
influence of
construct underrepresentation
critical values required for significance among subtests
Crystallized Intelligence and
culture-language matrix
data interpretation
hypothesis evaluation

levels of
nonunitary or noninterpretable cluster
three score
two-subtest unitary cluster and
entering scores
Fluid Intelligence and
influence of XBA on revisions
Long-Term Retrieval and
misconceptions about XBA and
narrow CHC abilities
math achievement and
reading achievement and
writing achievement and
Processing Speed and
Quantitative Knowledge and
Reading Ability and
Short-Term Memory and
test selection and
Visual Processing and
Writing Ability and
XBA DMIA and
XBA weakness and subtest order
Woodcock-Johnson Psychoeducational Battery- Third Edition, Diagnostic Reading Battery (WJ III
DRB):
Auditory Processing and
Reading Ability and
Writing Ability and
Woodcock-Johnson Psychoeducational Battery- Third Edition, Diagnostic Supplement (WJ III DS):
CHC abilities classification
author consensus and factor analyses
factor analyses in technical manuals
Crystallized Intelligence and
Fluid Intelligence and
Long-Term Retrieval and
Processing Speed and
Short-Term Memory and
Visual Processing and
Woodcock Reading Mastery Tests-Revised/Normative Update (WRMT-R/NU), Reading Ability and
Word Fluency (FW), defined
Word Identification and Spelling Test (WIST):
Reading Ability and
Writing Ability and
Work samples, case reports
Working Memory (MW):
defined
Short-Term Memory and
Writing, case reports
Writing Ability (WA):
classification table
defined
English Usage Knowledge subtests and
Spelling Ability subtests and
Writing ability subtests and
Writing achievement:
referral information, utilization of
subtests measuring narrow CHC abilities/processes related to


Young Children's Achievement Test (YCAT):
Quantitative Knowledge and
Reading Ability and
Writing Ability and

Acknowledgments

Few works such as this ever get completed without the help of a number of key people. Our book is
no different and we owe a tremendous debt of gratitude to some incredibly talented friends and
colleagues without whose own sweat and tears this volume would never have seen the light of day. Of
course one of the most notable features of the book is the CD-ROM on which rest some wonderful
programs designed to make the life of the applied psychologist significantly easier. The main
program on the disc is the Cross-Battery Assessment Data Management and Interpretive Assistant
(XBA DMIA), which was designed and programmed by our esteemed colleague, Dr. Elizabeth O.
Lichtenberger. Liz possesses phenomenal knowledge of spreadsheet programming and the XBA
DMIA is a testament to her skill. Despite being put through the wringer many times over, Liz always
came through and provided a program in the end that is not only useful and professional, but that
greatly exceeded our hopes and expectations. Liz, we simply cannot thank you enough for all your
work and dedication to this project and for making the program a true work of art.
We owe a great deal of thanks and appreciation to our long-time supporter, Robert Misak. Robert is
responsible for a good many things related to XBA, notable among them the new SLD Assistant,
which is also found on the accompanying CD-ROM. The inspiration to use g-weights in addressing a
difficult question in SLD evaluation was sheer genius on Robert's part and, with some tweaking on
our part, he authored a beautiful program that is of enormous value to those seeking to use XBA
methods in the evaluation of SLD. For all of this and so much more before that, Robert, we thank you.
There are others who deserve mention in one respect or another, particularly the staff at Wiley and
our editor, Lisa Gebo. We greatly appreciate all that you have done, Lisa, to guide us and this book
through some rough waters, especially during times when we know your heart and mind were on
more important matters. Thank you for your help, without which this grand book would have
suffered in scope and quality.
We must also thank our series editors, Drs. Alan and Nadeen Kaufman. It is extremely difficult to
put into words the depth of our appreciation for the level and intensity of your support for this
second, ground-breaking version in the Essentials series: the first volume to include a CD-ROM.
Our thanks, however, extend far beyond this book because we recognize that both of you have been
staunch supporters of our work for a long time. Your openness to and acceptance of our work has had
a profound influence on our own professional success and the manner in which some of our ideas
(even the more radical ones) have been received by the field. What more could we have asked for
than this? To say that we are appreciative simply belies the degree to which we feel so very fortunate
to have your professional collegiality and, more importantly, your friendship.
A very special thank you goes out to our incredible colleague and personal friend, Dr. Jennifer T.
Mascolo. If there is a better clinician and practicing psychologist out there, we have never met her.
Her acumen and skill in assessment are particularly evident in the case study she contributed to the
book, which exemplifies the kind of evaluations that we all aspire to conduct. For your help, your
commitment to perfection in every detail, and your continuing friendship we thank you deeply.
And last, we want to offer a very, very special thank you to Agnieszka Dynda, whom we have come
to call, with extreme affection, "Aggy." The writing of a book such as this invariably involves a
million and one tasks, the majority of which are often tedious, time consuming, and relatively
thankless. Aggy provided the kind of organization, attention to detail, and responsibility that far
exceeded anything we could have hoped for throughout the preparation of this book. Her extensive
and selfless investment in terms of time, effort, and assistance beyond those required of her own full-
time job and studies exemplifies the quality and nature of her impeccable character. Her name may
not appear on the cover of this book, but her fingerprints are definitely all over it. And so for your
invaluable assistance in every aspect of this book's preparation, for your company on the
innumerable working weekend marathons, and for your steadfast and entirely volunteer efforts we
extend to you our most heartfelt thanks. Not only are you every bit as responsible as we are for the
completion of this book, but you made the race to the finish line much more enjoyable. You are the
BEST!

D.P.F.
S.O.O.
V.C.A.
About the Authors

Dr. Dawn P. Flanagan is Professor of Psychology at St. John's University in Queens, NY. She is also
Clinical Assistant Professor at Yale Child Study Center, School of Medicine. In addition to her
teaching responsibilities in the areas of intellectual assessment, psychoeducational assessment,
learning disability, and professional issues in school psychology, she serves as an expert witness,
learning disability consultant, and psychoeducational test/ measurement consultant and trainer for
organizations both nationally and internationally. She is a widely published author of books and
articles. Her most recent books include the Essentials of WISC-IV Assessment and the second editions
of Contemporary Intellectual Assessment: Theories, Tests, and Issues and The Achievement Test Desk
Reference: A Guide to Learning Disability Identification. Dr. Flanagan is Fellow of APA and
Diplomate of the American Board of Psychological Specialties.

Dr. Samuel O. Ortiz is Associate Professor of Psychology at St. John's University in Queens, NY. He
holds a Ph.D. in clinical psychology from the University of Southern California and a credential in
school psychology with postdoctoral training in bilingual school psychology from San Diego State
University. He has served previously as Visiting Professor and Research Fellow at Nagoya
University, Japan and as Vice President for Professional Affairs of APA Division 16 (School
Psychology). He currently serves on APA's Committee on Psychological Tests and Assessment and
the Coalition for Psychology in Schools and Education. Dr. Ortiz conducts research and publishes in
a variety of areas including nondiscriminatory assessment of diverse individuals, cross-battery
assessment, and learning disabilities. His recent books include Assessment of Culturally and
Linguistically Diverse Students: A practical guide and The Achievement Test Desk Reference, second
edition: A guide to learning disability identification. Dr. Ortiz is bilingual (Spanish) and bicultural
(Puerto Rican).

Dr. Vincent C. Alfonso is Professor and Associate Dean for Academic Affairs in the Graduate
School of Education at Fordham University in New York City. His research interests include
psychoeducational assessment, early childhood assessment, training issues, and psychometrics. Dr.
Alfonso has published his work in a variety of journals including School Psychology Review, The
Journal of Psychoeducational Assessment, and Psychology in the Schools. Most recently he co-
authored The Achievement Test Desk Reference: A Guide to Learning Disability Identification with
Dawn P. Flanagan, Samuel O. Ortiz, and Jennifer T. Mascolo published by Wiley. Dr. Alfonso is a
certified school psychologist and licensed psychologist in New York State. He is considered an expert
in early childhood and learning disability assessment.
About the CD-ROM

INTRODUCTION

This appendix provides you with information on the contents of the CD that accompanies this book.
For the latest information, please refer to the ReadMe file located at the root of the CD.
SYSTEM REQUIREMENTS

Make sure that your computer meets the minimum system requirements listed in this section. If your
computer doesn't match up to most of these requirements, you may have a problem using the contents
of the CD.

For Microsoft Windows OS (including 98/Me/2000/XP/Vista):

A computer with a processor running at 120 MHz or faster


At least 32 MB of total RAM installed on your computer; for best performance, we recommend
at least 64 MB
A CD-ROM drive


For Macintosh:

Mac OS computer with a 68040 or faster processor running OS X 10.3 or later


384 MB of memory
A CD-ROM drive

NOTE: Many popular spreadsheet programs are capable of reading Microsoft Excel files. For
Macintosh users, NeoOffice is provided on this CD. However, users should be aware that some of the
formatting and functionality might be lost when using a program other than Microsoft Excel.

ALSO NOTE: The "Save" command will not work on this CD. To save data or individual subject
information, you must use the "Save As" command, supply a new file name, and save the file to your
hard drive. The read-only status of this CD ensures that the file cannot be saved except when re-named
and saved to your hard drive. This will also ensure that the original file remains intact and ready for
data entry each and every time it is used.

Regarding the XBA DMIA specifically: When the program is opened initially, please click on the button
to enable macros, as not doing so will reduce the functionality of the program. You will only see this
option if you have the security setting in Excel at "medium" or a higher level. If your setting is "low,"
you will not see this screen upon opening.
USING THE CD

To view the interface on the CD, follow these steps:

1. Insert the CD into your computer's CD-ROM drive.


Note to Windows users: The interface won't launch if you have autorun disabled. In that case,
click Start → Run. In the dialog box that appears, type D:\start.exe. (Replace D with the
proper letter if your CD drive uses a different letter. If you don't know the letter, see how your
CD drive is listed under My Computer.) Click OK.
Note to Mac users: The CD icon will appear on your desktop; double-click the icon to open
the CD and double-click the Start icon.
2. The CD-ROM interface will appear. The interface provides a simple point-and-click way to
use and explore the programs on the CD.

WHAT'S ON THE CD

The following sections provide a summary of the software and other materials you'll find on the CD.
Programs


This CD-ROM consists of spreadsheet files written and programmed in Microsoft Excel that allow
readers to enter specific test data and information and have it analyzed. Pre-programmed formulae on
the spreadsheets take the data entered and apply cross-battery assessment (XBA) principles to conduct
the analyses. The program then displays the results within the context of CHC theory. The
spreadsheets also allow for the analysis of data to answer questions pertaining to SLD evaluation as
well as the test performance of individuals from culturally and linguistically diverse backgrounds.
These tools represent a convenient way to use the principles and methods described in the book and
are a tremendous time-saver as well.
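
To make the idea of pre-programmed formulae concrete, the brief Python sketch below mimics the kind of arithmetic the worksheets automate: averaging two qualifying subtest standard scores into a broad CHC ability cluster and flagging a wide split between them. It is an illustration only, not the XBA DMIA's actual code; the helper names and the 15-point (1 SD) flag are assumptions made for this sketch.

```python
# Rough illustration of the kind of arithmetic the XBA worksheets automate.
# Not the XBA DMIA itself; the helper names and the 15-point (1 SD) flag are
# assumptions made for this sketch.

def two_subtest_cluster(score_a: float, score_b: float) -> float:
    """Average two standard scores (mean = 100, SD = 15) into a broad-ability cluster."""
    return (score_a + score_b) / 2.0

def scores_diverge(score_a: float, score_b: float, criterion: float = 15.0) -> bool:
    """Flag clusters whose component scores differ by the criterion (1 SD here) or more."""
    return abs(score_a - score_b) >= criterion

if __name__ == "__main__":
    gf_1, gf_2 = 98, 104                        # two Fluid Reasoning subtest scores (hypothetical)
    print(two_subtest_cluster(gf_1, gf_2))      # 101.0
    print(scores_diverge(gf_1, gf_2))           # False -> the cluster looks cohesive
```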

Names of the CD-ROM's three main components:

a) Cross-Battery Assessment Data Management and Interpretive Assistant (XBA DMIA) v1.0,
programmed by Elizabeth O. Lichtenberger, as conceived by Flanagan, Ortiz, and Alfonso.
This component relates to Chapters 2 & 3 of the book.
b) Specific Learning Disability (SLD) Assistant v1.0, programmed by Robert Misak in
collaboration with Flanagan, Ortiz, and Alfonso. This component relates to Chapter 4 of the
book.
c) Culture-Language Interpretive Matrix (C-LIM) v1.0, programmed by Agnieszka M. Dynda, as
conceived by Flanagan, Ortiz, and Alfonso. This component relates to Chapter 5 of the book.

Excel Worksheets

Examples of the worksheets on this CD are presented as figures in the book. These automated
worksheets are provided on the CD for your convenience in using, applying and understanding
the principles described in the book. They are meant to facilitate and expedite analysis and
interpretation of obtained data.


These programs do not convert raw scores to any metric. Users of these programs are responsible
for following the respective test publishers' administration and scoring guidelines. That is, all
scores entered into these programs must be derived from the norms and procedures provided by
the respective test publishers.
TOOLS

The following applications are on the CD:

Excel Viewer
Excel Viewer is a freeware viewer that allows you to view, but not edit, most Microsoft Excel
spreadsheets. Certain features of Microsoft Excel documents may not work as expected from within
Excel Viewer.

NeoOffice
NeoOffice is a free Mac OS X office productivity suite. It is similar to Microsoft Office or Lotus
SmartSuite, but NeoOffice is absolutely free. It includes word processing, spreadsheet, presentation,
and drawing applications that enable you to create professional documents, newsletters, reports, and
presentations. It supports most file formats of other office software. You should be able to edit and
view any files created with other office solutions.

Shareware programs are fully functional, trial versions of copyrighted programs. If you like
particular programs, register with their authors for a nominal fee and receive licenses, enhanced
versions, and technical support.
Freeware programs are copyrighted games, applications, and utilities that are free for personal use.
Unlike shareware, these programs do not require a fee or provide technical support.
GNU software is governed by its own license, which is included inside the folder of the GNU
product. See the GNU license for more details.
Trial, demo, or evaluation versions are usually limited either by time or functionality (such as being
unable to save projects). Some trial versions are very sensitive to system date changes. If you alter
your computer's date, the programs will time out and no longer be functional.
TROUBLESHOOTING

If you have difficulty using any of the materials on the companion CD, try the following solutions:

Disable any antivirus software that you may have running. Programs sometimes mimic
virus activity and can make your computer incorrectly believe that it is being infected by a
virus. (Be sure to enable antivirus software later.)
Close all running programs. The more programs you're running, the less memory is
available to other programs.
Reference the ReadMe: Refer to the ReadMe file located at the root of the CD-ROM for the
latest product information (if any) at the time of publication.

Customer Care


If you have trouble with the CD-ROM, please call the Wiley Product Technical Support phone number
at (800) 762-2974. Outside the United States, call 1(317) 572-3994. You can also contact Wiley
Product Technical Support at http://support.wiley.com. John Wiley & Sons will provide technical
support only for installation and other general quality control items. For technical support on the
applications themselves, consult the program's vendor or author.

To place additional orders or to request information about other Wiley products, please call (877)
762-2974.
CUSTOMER NOTE: IF THIS BOOK IS ACCOMPANIED BY
SOFTWARE, PLEASE READ THE FOLLOWING BEFORE
OPENING THE PACKAGE.

This software contains files designed to help you utilize the principles and methods described in
the accompanying book. By opening this package, you are agreeing to be bound by the
following agreement:


This software product is protected by copyright and all rights are reserved by the author, John
Wiley & Sons, Inc., or their licensors. You are licensed to use this software on a single computer.
Copying the software to another medium or format for use on a single computer does not
violate the U.S. Copyright Law. Copying the software for any other purpose is a violation of the
U.S. Copyright Law.

This software product is sold "as is" without warranty of any kind, either express or implied,
including but not limited to the implied warranty of merchantability and fitness for a particular
purpose. Neither Wiley nor its dealers or distributors assumes any liability for any alleged or
actual damages arising from the use of or the inability to use this software. (Some states do not
allow the exclusion of implied warranties, so the exclusion may not apply to you.)


1
Das and Naglieri developed the CAS from PASS theory; therefore, their test is based on an
information-processing theory, rather than any specific theory within the psychometric tradition.
2
Classifications of cognitive ability tests as strong, moderate, or mixed measures of CHC abilities
were based on the following criteria: A classification of strong was given to a test that had a
substantial factor loading (> .50) on a primary factor and a secondary factor loading (if present) that
was equal to or less than one-half of its loading on the primary factor. A classification of moderate was
given to a test that had a primary factor loading of < .50 and a secondary factor loading (if present)
that was less than one-half of the primary loading, or any primary factor loading with a secondary loading
between one-half and two-thirds of the primary loading. A classification of mixed was given to a test that had a factor
loading on a secondary factor that was greater than two-thirds of its loading on the primary factor. These
criteria were derived from Woodcock (1990).
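
The decision rule in this note can be read as a small classification routine. The Python sketch below is an illustrative reading only, assuming one-half and two-thirds cutoffs for the ratio of the secondary to the primary loading; the function name and the example loadings are hypothetical, and Woodcock (1990) remains the authoritative source for the criteria.

```python
from typing import Optional

# Illustrative reading of the strong/moderate/mixed rule described in note 2.
# The one-half and two-thirds cutoffs are assumptions for this sketch; see
# Woodcock (1990) for the authoritative criteria.

def classify_measure(primary: float, secondary: Optional[float] = None) -> str:
    """Classify a test as a 'strong', 'moderate', or 'mixed' measure of a CHC ability."""
    if secondary is None:
        # No secondary loading reported: classification rests on the primary loading alone.
        return "strong" if primary >= 0.50 else "moderate"
    ratio = secondary / primary
    if ratio > 2 / 3:
        return "mixed"        # secondary loading too large relative to the primary
    if primary >= 0.50 and ratio <= 1 / 2:
        return "strong"
    return "moderate"         # weak primary, or secondary between 1/2 and 2/3 of the primary

# Hypothetical loadings, for illustration only
print(classify_measure(0.62, 0.20))  # strong
print(classify_measure(0.45, 0.15))  # moderate
print(classify_measure(0.55, 0.48))  # mixed
```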
3
A test such as the Comprehensive Test of Phonological Processing (Wagner, Torgesen, & Rashotte,
1999) could have been used to supplement the WPPSI-III in the area of Ga. Noteworthy is the fact that
speech/language pathologists, reading specialists, and learning disability diagnosticians often give
tests that contain good measures of Ga. Therefore, professionals should work together to ensure that
the most appropriate batteries are being utilized and to avoid unnecessary redundancy in
measurement of certain abilities/processes.
4
In cases in which anomalous results are obtained, it is important that the examiner provide actual
reasons for such results. Examples of factors that may lead to anomalous results include lack of
motivation, fatigue, interruption of the task, violation of standardization, and so forth. In general,
whenever two or more measures do not converge as expected (e.g., two measures of the same narrow
ability/process), an explanation is warranted.
5
In cases in which anomalous results are obtained, it is important that the examiner provide actual
reasons for such results. Examples of factors that may lead to anomalous results include lack of
motivation, fatigue, interruption of the task, violation of standardization, and so forth. In general,
whenever two or more measures do not converge as expected (e.g., two measures of the same narrow
ability/process), an explanation is warranted.
6
If these sources (Verbal Knowledge = 13; Riddles = 7) were entered into the CHC tab, the XBA DMIA
would calculate and report a cluster. However, given that the lower score in this composite may
suggest that the individual has difficulty reasoning with verbal information, it may be necessary to
conduct additional assessment to test this hypothesis, depending on the referral concerns.
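
As a rough check on the arithmetic in this note, the sketch below rescales the two scaled scores (mean = 10, SD = 3) to the standard-score metric (mean = 100, SD = 15) and averages them into the kind of cluster the XBA DMIA would report. The helper name is hypothetical, and the rescaling shown is ordinary psychometric conversion, not the program's internal code.

```python
# Worked arithmetic for the Verbal Knowledge = 13, Riddles = 7 example above.
# A rough sketch only -- not the XBA DMIA's internal code.

def scaled_to_standard(scaled: float) -> float:
    """Rescale a scaled score (mean = 10, SD = 3) to a standard score (mean = 100, SD = 15)."""
    return 100 + 15 * (scaled - 10) / 3

verbal_knowledge = scaled_to_standard(13)     # 115.0
riddles = scaled_to_standard(7)               # 85.0
cluster = (verbal_knowledge + riddles) / 2    # 100.0 -- the cluster the program would report

# The 30-point (2 SD) split between the two scores is what prompts the
# follow-up assessment suggested in the note.
print(cluster, verbal_knowledge - riddles)
```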
7
Note: Strictly speaking, a score must be outside and below the average range or normal limits of
functioning to be indicative of a deficit. However, there may be cases where standard scores at or near
the cutoff of 85 might also suggest deficiency if there are other data to support the presence of such
impairment.
8
Although the integrated Cattell-Horn and Carroll model presented in Flanagan et al. (2000) was used
by the WJ III authors in the development of their battery, it is important to recognize that Woodcock's
1989 cognitive battery, the WJ-R (Woodcock & Johnson, 1989), was the first operationalization of
modern Gf-Gc theory. Also noteworthy here is the fact that Alan and Nadeen Kaufman were the first
to develop a cognitive battery based on theory (i.e., K-ABC, Kaufman & Kaufman, 1983).
9
Diagnoses and codes are based on criteria from the Diagnostic and Statistical Manual of Mental
Disorders, Fourth Edition (DSM-IV).
10
Classifications based on factor analyses were derived from the individual tests' technical manuals.
11
Classifications were based on factor analyses from the WISC-IV technical and interpretive manual
(The Psychological Corporation, 2003) and from the results of factor analyses reported in Keith,
Fine, Taub, Reynolds, and Kranzler (2006).
12
The copying task relies upon normal visual-motor integration and fine motor control; therefore, it is
not a purely cognitive task.
