Simms, L. J., Zelazny, K., Williams, T. F., & Bernstein, L. (2019, March 14). Does the Number of
Response Options Matter? Psychometric Perspectives Using Personality Questionnaire Data.
Psychological Assessment. Advance online publication. http://dx.doi.org/10.1037/pas0000648
Psychological Assessment
© 2019 American Psychological Association 2019, Vol. 1, No. 999, 000
1040-3590/19/$12.00 http://dx.doi.org/10.1037/pas0000648
Psychological tests typically include a response scale whose purpose it is to organize and constrain the options available to respondents and facilitate scoring. One such response scale is the Likert scale, which initially was introduced to have a specific 5-point form. In practice, such scales have varied considerably in the nature and number of response options. However, relatively little consensus exists regarding several questions that have emerged regarding the use of Likert-type items. First, is there a “psychometrically optimal” number of response options? Second, is it better to include an even or odd number of response options? Finally, do visual analog items offer any advantages over Likert-type items? We studied these questions in a sample of 1,358 undergraduates who were randomly assigned to groups to complete a common personality measure using response scales ranging from 2 to 11 options, and a visual analog condition. Results revealed attenuated psychometric precision for response scales with 2 to 5 response options; interestingly, however, the criterion validity results did not follow this pattern. Also, no psychometric advantages were revealed for any response scales beyond 6 options, including visual analogs. These results have important implications for psychological scale development.
Measurement tools in psychology and the social sciences more generally—including a wide variety of self-report, interview-based, and observational methods—typically include a response scale whose purpose it is to organize and constrain the options available to respondents and facilitate scoring. Today, response scales vary considerably in terms of the nature and number of response options. Arguably the most common response scale used is that introduced by Rensis Likert in 1932, as part of his doctoral dissertation, to measure a range of psychological attitudes. The original “Likert” scale (pronounced “lick urt,” /ˈlɪk.ərt/) included five symmetrical and balanced options reflecting degree of agreement: strongly agree, agree, undecided/neither, disagree, or strongly disagree. However, despite the ubiquitous place that response scales occupy in psychological measurement, the Likert scale has been extended and elaborated in numerous ways in the years that have passed since its original introduction, often with little empirical justification. Although a small (but inconsistent) literature exists regarding the nature and number of response options to include in psychological measures, measurement lore rather than data typically guides the choices scale developers and researchers make regarding response scales.

Given this backdrop, a number of questions frequently are posed regarding the use of Likert-type items on psychological questionnaires. First, is there a “psychometrically optimal” number of response options? Second, is it better to include an even or odd number of response options? And finally, do visual analog scales—in which respondents simply make a mark (or move a slider) along a line ranging from agree to disagree—offer any advantages over traditional Likert-type items (e.g., permitting the possibility of finer distinctions along the response scale)? In this article, we briefly review the literatures related to each of these questions and then offer a summary of fresh data that were collected to address each.

Editor’s Note. Yossef S. Ben-Porath, Editor, served as the sole action editor for this submission.

Leonard J. Simms, Kerry Zelazny, Trevor F. Williams, and Lee Bernstein, Department of Psychology, University at Buffalo, The State University of New York. This study was supported by a research grant to Leonard J. Simms from the National Institute of Mental Health (R01MH080086). Correspondence concerning this article should be addressed to Leonard J. Simms, Department of Psychology, University at Buffalo, The State University of New York, Park Hall 218, Buffalo, NY 14221. E-mail: ljsimms@buffalo.edu

Is There a “Psychometrically Optimal” Number of Response Options?

Despite the central importance of the response scale to most questionnaire and rating scale measures of psychological constructs, little consensus has emerged in the literature regarding the number of points to include in a Likert-type rating scale. Likert’s (1932) original scale included five options, as described above, but even a casual look at the measures used in practice and research reveals that measures range widely in the number of response options they include.

Consider, for example, the item “I would like the work of a librarian,” rated on a 10-point Likert scale ranging from 1 = very strongly disagree to 10 = very strongly agree (see Table 1 for an example of all point labels across such a scale). How finely can a respondent discriminate along this scale? Is a response of 9 (strongly agree) reliably different than a response of 10 (very strongly agree)? Although an empirical question, anecdotal evidence from years of developing and using scales of various lengths in our laboratory suggests that there is a point of diminishing returns with respect to the number of response options.

Some literature is consistent with this perspective. Bendig (1953) reported equal reliability for three, five, six, or nine response options but a reliability decrease for 11 options. More recently,
Table 1
Summary of Likert Response Labels Used in This Study (labels listed in order from option 1 to option k)

2-point: Disagree; Agree
3-point: Disagree; Neither Agree nor Disagree; Agree
4-point: Strongly Disagree; Disagree; Agree; Strongly Agree
5-point: Strongly Disagree; Disagree; Neither Agree nor Disagree; Agree; Strongly Agree
6-point: Strongly Disagree; Disagree; Slightly Disagree; Slightly Agree; Agree; Strongly Agree
7-point: Strongly Disagree; Disagree; Slightly Disagree; Neither Agree nor Disagree; Slightly Agree; Agree; Strongly Agree
8-point: Very Strongly Disagree; Strongly Disagree; Disagree; Slightly Disagree; Slightly Agree; Agree; Strongly Agree; Very Strongly Agree
9-point: Very Strongly Disagree; Strongly Disagree; Disagree; Slightly Disagree; Neither Agree nor Disagree; Slightly Agree; Agree; Strongly Agree; Very Strongly Agree
10-point: Very Strongly Disagree; Strongly Disagree; Disagree; Mostly Disagree; Slightly Disagree; Slightly Agree; Mostly Agree; Agree; Strongly Agree; Very Strongly Agree
11-point: Very Strongly Disagree; Strongly Disagree; Disagree; Mostly Disagree; Slightly Disagree; Neither Agree nor Disagree; Slightly Agree; Mostly Agree; Agree; Strongly Agree; Very Strongly Agree
Visual analog: a continuous line anchored by Disagree and Agree
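The formats in Table 1 put raw responses on different numeric ranges (1–2 up to 1–11), so raw scores cannot be compared across formats without rescaling. One conventional rescaling, percent of maximum possible (POMP) scoring, is sketched below as an illustration only; it is not an analysis the article itself reports:

```python
def pomp(score, k):
    """Rescale a response on a 1..k Likert scale to the percent of maximum
    possible (POMP) metric: 0 at the scale minimum, 100 at the maximum."""
    if k < 2 or not 1 <= score <= k:
        raise ValueError("need k >= 2 and a score within 1..k")
    return 100.0 * (score - 1) / (k - 1)
```

For example, pomp(4, 6) and pomp(7, 11) both return 60.0: option 4 of 6 (Slightly Agree) and option 7 of 11 (Slightly Agree) occupy the same relative position despite different raw values.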
In contrast to the preceding paragraphs, a number of studies also have suggested no clear trends in scale psychometrics as a function of the number of response options (e.g., Bendig, 1954; Capik & Gozum, 2015; Matell & Jacoby, 1972). Moreover, some recent studies have painted a contradictory picture of response option effects. Finn et al. (2015), following up on similar studies by Cox and colleagues (2012) and Cox, Courrégé, Feder, and Weed (2017), compared 2- and 4-point response scales as applied to the items of the restructured form of the Minnesota Multiphasic Personality Inventory (MMPI)-2 (MMPI-2-RF; Tellegen & Ben-Porath, 2008/2011). In this line of research, increasing the number of response options resulted in improved internal consistency but, interestingly, no advantage in convergent validity. However, it is an open

Aside from these considerations, Garland (1991) suggested that midpoint responses may be related to socially desirable responding. Velez and Ashworth (2007) argued that midpoint endorsement decreases with clearer items. Hernández, Drasgow, and González-Roma (2004) identified different classes of individuals in terms of how they use middle response options, and suggested small personality differences across these groups. Thus, the reasons for and impact of allowing a middle-point option have been considered elsewhere in the literature, albeit infrequently and without clear conclusions about the question under consideration here: Should scale developers use even or odd numbers of response options for Likert-type items? Thus, our second aim is to directly address this question.
numbers of response options, and (c) visual analog scales versus traditional Likert scales. Our study extends previous work by simultaneously assessing each of these three questions using a well-powered study with adequate resolution to yield more conclusive results. Given the previous (albeit inconclusive) work in this area, as well as considerations drawn from basic psychometric theory, we predicted that measurement precision (and subsequent markers of validity) would (a) asymptote after six to seven response options, (b) show no advantage for odd numbers of response options relative to matched evens, and (c) be no stronger for visual analog scales relative to traditional Likert scales.

Method

Agreeableness, and Openness). Benet-Martínez and John (1998) reported good internal consistency for BFI scales and good convergence between BFI scales and other established measures of the Big Five model. As noted above, we manipulated the response scale across groups for the BFI, and the resultant reliabilities and validities are presented below, representing the primary analyses for this article.

Personality Inventory for DSM–5 (PID-5; Krueger et al., 2012). The PID-5 includes 220 self-report items assessing the 25 maladaptive traits and five higher order domains of the Alternative Model of Personality Disorder (American Psychiatric Association, 2013). Responses are provided on a four-point Likert scale from 0 (very false or often false) to 3 (very true or often true), and scale
Figure 1. Cohen’s d effect size differences (based on item means) as a function of scale and number of response options. VA = visual analog scale.
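Figure 1 summarizes group differences in item means as Cohen’s d, the mean difference scaled by the pooled standard deviation. As a reminder, the standard two-group formula can be sketched as follows (a textbook form, not the authors’ own code):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, scaled by the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)
```

With equal group standard deviations of 1.0, mean item scores of 3.5 versus 3.1 give d = 0.4 regardless of the (equal) group sizes.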
alphas decreased slightly from eight to nine response options, M alphas = .82 and .81, respectively. The same pattern was observed from 10 to 11 response options, M alphas = .83 and .82, respectively. Finally, the visual analog scale yielded a slightly lower alpha compared to most other alphas between six and 11 response options, M α = .81. Taken together, these results suggest a slight decrement in alpha for odd numbers of response options beyond six, and an additional slight decrement for visual analog scales. That said, the differences between matched even and odd numbers of response options were small enough that the practical impact likely is minimal.

We also tested for significant differences among the alpha coefficients described above, using methods described by Feldt, Woodruff, and Salih (1987) and implemented in “cocron,” an open-source program on the R platform (Diedenhofen & Musch, 2016). These results—presented in Supplemental Table 2—revealed several patterns. First, there were relatively few significant differences among the alphas presented in Figures 2 and 3. Second, the bulk of significant alpha differences involved the 2-point scale, which was significantly lower than the 3-point scale for four of five traits (i.e., all but conscientiousness), χ²s(1) ranged from 10.1 to 16.6, p < .01. Similarly, the 2-point scale yielded alphas that were significantly lower than those for all more differentiated scales for extraversion, agreeableness, and openness, χ²s(1) ranged from 3.6 to 16.6, p < .05, whereas the 2-point scale was less consistently different than other response scales for conscientiousness and neuroticism. Finally, with respect to alpha differences between even and odd numbers of response options, 7 of 25 matched comparisons yielded alpha differences in the predicted direction, χ²s(1) ranged from 5.2 to 16.6, p < .05, whereas the remaining tests were not significant. Thus, these significance tests painted a more conservative picture of the differences than was evident through visual inspection of the figures; through this lens, the most robust finding is that 2-point scales are impoverished relative to more differentiated scales with respect to internal consistency reliability.

Figure 2. Internal consistency reliabilities as a function of scale and number of response options. VA = visual analog scale.

Figure 3. Internal consistency reliabilities as a function of scale and number of response options, highlighting matched even-odd pairs. VA = visual analog scale.

We next studied short-term retest correlations within each group. Given the short interval—approximately 30 min—between test and retest, these coefficients are akin to dependability correlations (see Watson, 2004). These results—which appear in Figure 4—showed that retest correlations (a) all were strong and greater than .84, and (b) asymptoted for the six versus seven, eight versus nine, and 10 versus 11 comparisons, M retest correlations = .895, .892, and .896, respectively. Full retest results are presented in Supplemental Table 3, including Z tests for significant differences among the correlations. Although the figures suggest no advantage for any scale beyond six response options, the significance testing revealed a more conservative pattern: short-term retest coefficients were significantly lower for the two-to-three group relative to all other groups for Neuroticism, Extraversion, and Openness, Zs ranged from 2.01 to 2.68, p < .05. No other differences among these correlations were significant.

Figure 4. Short-term retest correlations as a function of scale and number of response options.

Finally, we studied convergent validity as a function of scale and number of response options. Results for the BFI Extraversion, Agreeableness, Conscientiousness, and Neuroticism scales are presented in Figure 5 and Supplemental Tables 4 to 8. For each BFI scale, we calculated mean convergent validity correlations against the scales of the PID-5. Convergent validities were included in the means only if at least one absolute correlation was greater than or equal to .40. No correlates met this threshold for BFI Openness. Given the premise that unreliability should attenuate validity, we predicted increases in convergent validity as the number of response options increased. Results varied across scales, with no single pattern of validities emerging across scales as a function of response scale. The mean validity data—represented by a bold, dashed line in Figure 5—revealed a relatively flat line across response formats, with two exceptions. First, there was a slight dip in validity for those who completed the 4- and 5-point BFI scales, mean rs = .42 and .41, respectively, compared to the mean validity across all scales and response scales, mean r = .45. Second, there was a notable drop in validity for those completing the BFI using the visual analog format, mean r = .41. Notably, however, none of the mean validity correlations presented in Figure 5 were significantly different from one another, all Zs < 1.96, ps > .05.

Figure 5. Mean convergent validity correlations, measured against the Personality Inventory for DSM–5, as a function of scale and number of response options. For each BFI scale, convergent validities were included in the means only if at least one correlation was greater than or equal to .40. All correlations are absolute values. VA = visual analog scale.

Discussion

In this article, we have attempted to bring clarity to three questions involved in determining a response scale for psychological test items: (a) is there a psychometrically optimal number of response options to include for Likert-type items, (b) is there any psychometric advantage or disadvantage associated with using odd or even numbers of response options, and (c) how does the visual analog format compare to traditional Likert-type items with respect to the resultant psychometric characteristics? To address these questions, we examined descriptive statistics, internal consistency reliability, short-term retest correlations, and convergent validity across different numbers of response options and with five broad personality scales. In this discussion, we summarize and consider the implications of our results related to each of these three questions.

How Many Response Options?

As noted previously, traditional Likert scales (Likert, 1932) included five symmetrical response options ranging from strongly disagree to strongly agree, but many variations on that original structure have emerged since the scale’s original publication, often with little or no evidence offered to support the deviation. Our results revealed several themes regarding the ideal or optimal number of response options to include on such a scale. First, although we did not make any specific prediction regarding descriptive statistics, our results show that changing the number of response options has a non-negligible impact on basic scale norms. Large differences in scale means were revealed for smaller numbers of response options. And although these differences stabilized a bit for scales with four or more response options, item means continued to decline at a smaller rate. Thus, for example, if response scales are modified for a given research study or clinical application, simple proration of published norms—to account for changes in the number of response options—appears to be a problematic exercise.

Second, response scales with two and three response options (and to a lesser extent four and five response options) generally attenuated the psychometric precision associated with these five BFI scales, which suggests that scale developers wishing to adopt such response scales do so at some risk of reducing measurement precision. Practically speaking, to have scales with minimally acceptable measurement precision (e.g., α = .80; see Clark & Watson, 1995), using fewer response options will require scale developers to include more items. Thus, these results reveal a trade-off between response scale simplicity and measurement imprecision that must be adequately reconciled at the scale development stage. Some developers may desire simpler response scales for nonpsychometric reasons (e.g., readability, simplicity), but more items will be needed for such scales if minimally acceptable measurement precision is desired.
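Two kinds of significance tests recur in the Results: comparisons of independent coefficient alphas (computed in the article with the chi-square procedure implemented in cocron) and Z tests comparing independent correlations. The article does not print its code, so the sketch below uses standard textbook forms instead: coefficient alpha, Feldt’s F ratio for two independent alphas, and Fisher’s r-to-z comparison of correlations.

```python
import math

import numpy as np
from scipy import stats

def cronbach_alpha(item_scores):
    """Coefficient alpha for an (n_respondents, n_items) array of item scores."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

def feldt_test(alpha1, n1, alpha2, n2):
    """Feldt's test of H0: equal alphas in two independent samples.
    W = (1 - alpha1) / (1 - alpha2) is approximately F(n1 - 1, n2 - 1) under H0."""
    w = (1.0 - alpha1) / (1.0 - alpha2)
    p = 2.0 * min(stats.f.sf(w, n1 - 1, n2 - 1), stats.f.cdf(w, n1 - 1, n2 - 1))
    return w, min(p, 1.0)

def compare_independent_rs(r1, n1, r2, n2):
    """Two-sided Z test for two correlations from independent samples,
    via Fisher's r-to-z transformation."""
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p
```

Note that feldt_test is a simpler two-sample relative of the cocron procedure; for comparing several alphas at once, the chi-square tests implemented in cocron are more appropriate.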
Third, no improvements in psychometric precision were identified past six response options. Thus, there appears to be little psychometric basis to arguments that additional numbers of response options will translate to increased scale reliability while holding the number of items constant. Going beyond six options may confuse participants, who perhaps have difficulty perceiving differences between similarly worded response options (e.g., strongly agree vs. very strongly agree). Alternatively, more differentiated response scales may pose important challenges to the ability of humans to make fine-grained distinctions regarding responses to relatively coarse psychological test items. That is, there likely are important cognitive information processing variables (e.g., memory, perception, discrimination, intelligence) at play that have not been adequately studied with respect to how they interact with participants’ approaches to psychological test items. More work is needed in this area.

Interestingly, the criterion validity results defied the pattern of attenuated psychometrics for smaller numbers of response options, such that the lower psychometric precision associated with smaller numbers of response options did not translate into attenuated validity as would be expected from classical test theory (e.g., Crocker & Algina, 1986; Gulliksen, 1950; McDonald, 1999). This extends recent work suggesting that moving from two to four response options results in no improvement in convergent validity correlations (Cox et al., 2012, 2017; Finn et al., 2015). In the present study, we replicated this finding and extended it to more differentiated response formats up to 11 options (and the visual analog scale). Several possibilities could explain this pattern of results. First, the BFI scales we used for the validity analyses are psychometrically strong scales with a long literature showing evidence of their reliability and validity (e.g., see John & Srivastava, 1999). In particular, it is notable that even in our own results, the short-term retest correlations, although slightly lower for less differentiated scales, did not reveal dramatic differences in scale dependability. Arguably, measurement imprecision related to temporal differences (i.e., retest reliability or dependability; Watson, 2004) might be more detrimental to criterion validity than imprecision related to content heterogeneity (i.e., internal consistency reliability). If so, then our failure to see a consistent impact of response options on criterion validity is not as surprising.

Alternatively, unknown sampling differences across our groups could have affected these findings, despite large samples and random assignment. Also, our criterion variables were limited to the scales of another personality questionnaire with known limitations (e.g., Al-Dajani, Gralnick, & Bagby, 2015). Given that both our primary measure and convergent validity measure were self-report questionnaires with similar formats, the impact of shared method variance could not be disentangled in the current study (e.g., Courrégé & Weed, 2018). Taken together, it would be important to study the impact on validity of different numbers of response options using a broader range of test and nontest criteria, such as behavioral observations or real-world indicators of personality. Regardless, although the precision results point to an asymptotic point of diminishing returns following six response options, the criterion validity results limit the strength of this conclusion. This result deserves further study because it calls into question long-standing tenets of classical test theory regarding the relation between reliability and validity. All things being equal, measurement imprecision should attenuate validity correlations (e.g., Gulliksen, 1950). That did not occur in this study.

Another point that deserves discussion is that statistical significance testing showed a more conservative pattern than the visual pattern of results presented in the figures. The most conservative conclusion based on significance testing is that 2- and 3-point response formats are impoverished with respect to measurement precision (but not validity). Differentiation beyond four response options generally was not supported by significance testing. However, the visual patterns appear to show replicable additional advantages, albeit small and nonsignificant, between four and five and six and seven options. Thus, we maintain that scale developers should strongly consider using six or seven response options unless pilot testing or other considerations make such a differentiated format undesirable.

Notably, some readers may find this recommendation curious given that the validity results failed to support the need for such a differentiated response scale. We acknowledge this discrepancy. However, the decision regarding how many options to offer on a personality questionnaire like the BFI involves more than a consideration of validity. Deficits in measurement precision alone are directly related to increases in the standard error of measurement of a scale (and thus the confidence intervals around scores), which has implications for the precision of point estimates of scores used in clinical and applied settings. Thus, even in the absence of validity implications (about which the conclusions are still unclear), it is our view that response scales should be adequately differentiated to maximize measurement precision.

Is It Better to Have an Even or Odd Number of Options?

Given the ambiguity associated with the use of the middle option on odd-numbered response scales (Kulas & Stachowski, 2013), we predicted that odd-numbered Likert scales would show no advantage, psychometrically speaking, over matched even-numbered scales. This result was generally supported, as alphas and criterion validity correlations revealed no advantage for odd-numbered scales relative to matched even-numbered scales. Taken together with the conclusions presented above, these results suggest little psychometric justification for Likert scales with more than six or seven response options. Moreover, although the psychometric differences between six and seven response options were small to nonexistent, we would argue that six response options are preferable to seven options on the grounds of parsimony. That said, the most honest appraisal of our results is that it probably doesn’t matter much. Those unconcerned with the ambiguity of the middle option may continue to use odd-numbered Likert scales given that there is no clear and unequivocal psychometric penalty for doing so.

Do Visual Analog Scales Offer Any Advantages Over Traditional Likert-Type Items?

Visual analog scales offer the promise of added measurement precision given that they do not rely on a limited number of anchor points and rather allow participants to simply mark their response anywhere they wish along a continuous line. That said, our results failed to show any psychometric advantage for visual analog items relative to traditional Likert-type items. In fact, most of our analyses appeared to reveal small nonsignificant decrements in psychometric performance, including criterion validity, for scales comprised of visual analog items. Similar to that which was discussed above, it appears that the promise of added psychometric precision is not realized in practice with scales based on visual analog items, perhaps because humans are unable to reliably make meaningful and valid fine-grained distinctions for coarse items reflecting complex psychological characteristics. Although the differences in results for visual analogs generally were not large or statistically significant, nothing in our results would lead us to recommend their use as the basis for psychological test items. Moreover, although not directly assessed in the present study, the present results also call into question the common practice of using visual analog scales as single-item scales to reflect underlying latent traits and characteristics.

Limitations and Future Directions

Strengths of our study include a large sample size, resolution permitting us to examine all common numbers of response options between two and 11 (including a visual analog option), and the use of a widely used and psychometrically strong measure of a prominent model of personality as the basis for the study. In addition, we examined the impact of response scale on both reliability and validity, something not often done in this literature. That said, our results must be interpreted in the context of several limitations. First, our results, by design, apply only to traditional Likert-type agree-disagree scales. Other metrics—such as scales related to frequency, intensity, or similarity—might yield different results and thus must be studied in a manner similar to what we presented in this article. That said, the basic finding—that there is a point of diminishing returns in terms of the number of response options—is likely to generalize to all possible response scales for psychological scales, although the exact point might be a function of the nature of the response scale under consideration. For example, Likert scales are inherently bipolar in nature; unipolar scales (e.g., not at all to extremely) may not yield as many meaningfully different anchor points. All such variations in response scales ought to be carefully studied during the scale development process.

Second, our analyses and results were confined to a single multiscale measure of personality. Although the BFI is a strong and widely studied measure of a prominent model of personality, it does not fully represent all possible constructs psychological researchers and clinicians may wish to measure. Taken together with the finding that our patterns of results appeared to vary somewhat even within the BFI scales, generalizing the present findings to other measures and different constructs must be done with caution until the present results are replicated with a broader range of measures and constructs, including different domains,

College samples typically are comprised of relatively well-educated participants with average to above-average intellectual ability and reading levels. Given the cognitive information processing demands that test items likely place on those providing responses, college samples likely represent an upper bound on the amount of complexity that it is reasonable to include in an item response format. Thus, samples comprised of participants with lower cognitive ability might yield different results from those identified here.

Final Thoughts

Despite the above caveats, the present study offers important guidance to aid scale developers who wish to base their work on evidence-based practices. We conclude that a 6-point response scale is the most reasonable format based on the present results, especially for measures of personality constructs like those we assessed in this study. Going much beyond six response options seems to challenge humans’ basic ability to make fine-grained distinctions about the complex psychological constructs we tend to study. We also made suggestions regarding the use of odd numbers of response options and visual analog items. However, an important take-home message from this article is that the choice of response format should be given the same amount of thought and examination as typically is put into the development of the test items themselves. If you’re building a new scale and wish to deviate from the suggestions listed here, we strongly recommend pilot testing.

References

Aitken, R. C. (1969). Measurement of feelings using visual analogue scales. Proceedings of the Royal Society of Medicine, 62, 989–993.

Al-Dajani, N., Gralnick, T. M., & Bagby, R. M. (2015). A psychometric review of the Personality Inventory for DSM–5 (PID-5): Current status and future directions. Journal of Personality Assessment, 98, 1–20.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.

Bendig, A. W. (1953). The reliability of self-ratings as a function of the amount of verbal anchoring and of the number of categories on the scale. Journal of Applied Psychology, 37, 38–41. http://dx.doi.org/10.1037/h0057911

Bendig, A. W. (1954). Reliability and the number of rating-scale categories. Journal of Applied Psychology, 38, 38–40. http://dx.doi.org/10.1037/h0055647

Benet-Martínez, V., & John, O. P. (1998). Los Cinco Grandes across cultures and ethnic groups: Multitrait multimethod analyses of the Big Five in Spanish and English. Journal of Personality and Social Psychology, 75, 729–750. http://dx.doi.org/10.1037/0022-3514.75.3.729

Bergman, R. D. (2009). Testing the measurement invariance of the Likert and graphic rating scales under two conditions of scale numeric pre-
such as psychopathology, mood/affect, attitudes, values, and con- sentation (Doctoral dissertation). Retrieved from Proquest Dissertations
structs used in other areas of the social sciences (e.g., political or & Theses Global. (Order No. 3360158)
moral attitudes, market research evaluations, etc.). Capik, C., & Gozum, S. (2015). Psychometric features of an assessment
Finally, our sample, although large, was comprised solely of instrument with Likert and dichotomous response formats. Public
Health Nursing, 32, 81– 86. http://dx.doi.org/10.1111/phn.12156
college undergraduates who completed the study to fulfill a course
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in
research requirement. Replicating the present work in a broader, objective scale development. Psychological Assessment, 7, 309 –319.
more ecologically valid community and psychiatric samples would http://dx.doi.org/10.1037/1040-3590.7.3.309
be important aims for future work in this area. Although this is a Courrégé, S. C., & Weed, N. C. (2018). The role of common method
common limitation in empirical studies involving student samples, variance in MMPI-2-RF response option augmentation. Unpublished
there are specific reasons to highlight this limitation in the present manuscript.
Cox, A. C., Courrégé, S. C., Feder, A. H., & Weed, N. C. (2017). Effects of augmenting response options of the MMPI-2-RF: An extension of previous findings. Cogent Psychology, 4, 1323988. http://dx.doi.org/10.1080/23311908.2017.1323988
Cox, A., Pant, H., Gilson, A. N., Rodriguez, J. L., Young, K. R., Kwon, S., & Weed, N. C. (2012). Effects of augmenting response options on MMPI-2 RC scale psychometrics. Journal of Personality Assessment, 94, 613–619. http://dx.doi.org/10.1080/00223891.2012.700464
Cox, E. P. (1980). The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 17, 407–422. http://dx.doi.org/10.2307/3150495
Crocker, L., & Algina, J. (1986). Introduction to classical & modern test theory. Orlando, FL: Holt, Rinehart and Winston.
Diedenhofen, B., & Musch, J. (2016). cocron: A web interface and R
Research in Personality, 47, 254–262. http://dx.doi.org/10.1016/j.jrp.2013.01.014
Lee, J., & Paek, I. (2014). In search of the optimal number of response categories in a rating scale. Journal of Psychoeducational Assessment, 32, 663–673. http://dx.doi.org/10.1177/0734282914522200
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 5–53.
Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 4, 73–79. http://dx.doi.org/10.1027/1614-2241.4.2.73
Matell, M. S., & Jacoby, J. (1972). Is there an optimal number of alternatives for Likert-scale items? Effects of testing time and scale properties. Journal of Applied Psychology, 56, 506–509. http://dx.doi.org/10.1037/