
Agree-Disagree:

A Strongly Disagreeable Response Scale

Many textbooks on questionnaire design advise survey developers to avoid using
agree-disagree response scales, i.e., those that range from "strongly disagree" to
"strongly agree" (e.g., Converse & Presser, 1986; Fowler, 2002). A wide range of
reasons is offered as to why these types of response scales may negatively affect
questionnaire responses. In spite of the number of rationales provided for avoiding these
scales and the compelling logic behind them, little empirical work exists to
buttress this advice. Perhaps for this reason, the use of these response scales persists.
This proposed paper investigates four hypotheses as to why agree-disagree
response scales might be less than desirable. The ultimate goal of this research is to
begin to provide an empirical foundation to help survey designers determine what types
of response scales will be most effective in helping respondents minimize error in their
self-reports.
One possibility as to why these response scales might be less effective in
minimizing error is that "strongly" is an unfortunately ambiguous modifier. Converse
and Presser (1986) argue that scales using "strongly disagree" and "strongly agree"
confound extremity of opinion (where respondents place themselves on a continuum)
with intensity or certainty of opinion (how sure respondents are of their opinion,
regardless of its location on the continuum). In other words, "strongly" may be
interpreted by some respondents as referring to extremity, while other respondents
interpret it as certainty. If this hypothesis were true, respondents might not clearly
distinguish "strongly disagree" from "disagree" or "strongly agree" from "agree," and
the resulting scale might be unevenly spaced.
The second hypothesis investigates the possibility that, because agree-disagree
response scales are bipolar (i.e., ranging from a negative to a positive), they do not
achieve the precision of unipolar scales (e.g., "do not agree at all" to "completely
agree"). In other words, respondents may feel as though they simply do not have enough
choices to accurately indicate where they lie on the continuum. If this were the case, the
variance of the scale as a whole might be restricted and/or prone to floor or ceiling effects.
Third, respondents may be more likely to satisfice (Krosnick, 1991) when
presented with agree-disagree response scales. In other words, something about the
nature of answering agree-disagree response scales may reduce respondents'
motivation to give thoughtful, accurate answers. Specifically, it seems plausible that
when respondents have to place themselves along a continuum (as is the case when
forced-choice questions are used), they must search through their memories to determine
what their opinion truly is. However, when merely agreeing (or disagreeing) with the
opinion stated on a survey, they might simply make a snap judgment as to whether the
statement seems plausible. If true, agree-disagree response scales might result in
respondents completing these surveys more quickly than surveys using forced-choice
questions.
Fourth, Fowler (2002) indicates that these response scales might encourage
acquiescence. In other words, respondents who might be inclined to give low ratings to a
forced-choice question might end up agreeing when items are put into the agree-
disagree format. Extending his logic further, asking respondents to agree or disagree
with a series of items might introduce a measurement confound: items might partially
measure the intended construct, but they might also measure how agreeable respondents
are. After all, agreeableness is one of the foundational Big Five traits in personality
psychology (John, Donahue, & Kentle, 1991). If this were the case, a scale using
agree-disagree anchors might correlate much more positively with a measure of
agreeableness than would a scale using different response anchors.

Methods:
To investigate these hypotheses, participants (N = 331) were randomly assigned to
one of three different forms of a survey. On the scale of interest, Form 1 (n = 109) listed
statements with which respondents agreed or disagreed using the response anchors
strongly disagree, disagree, neither agree nor disagree, agree, and strongly agree.
Respondents to Form 2 (n = 95) answered the same statements on the scale of interest
but used the response anchors do not agree at all, slightly agree, somewhat agree,
mostly agree, and completely agree. Form 3 asked forced-choice questions with a
response scale of almost never, once in a while, sometimes, often, and almost always.
For this third form, item wording was identical except for the trivial changes necessary
to render each statement as a grammatically correct question.
The scale of interest (i.e., the one that varied across survey forms) assessed
respondents' social perspective taking (SPT) propensity, i.e., how frequently they tried
to figure out the thoughts and feelings of others. This scale was adapted from Davis's
(1983) original measure. A second measure consisted of a scale assessing the personality
trait of agreeableness (John et al., 1991). Finally, 15 items asked participants to assess
each of the response anchors through visual analog scales (DeVellis, 2003). In other
words, respondents placed an "X" on a 10 cm line that provided the following anchors at
each end of the line: "100% disagreement" to "100% agreement" for Form 1, "0%
agreement" to "100% agreement" for Form 2, and "0% of the time" to "100% of the
time" for Form 3.
Participants were drawn from high school, college, and graduate classes at various
educational institutions in New England. The sample was 66% female (several of the
classes were in schools of education) and 82% white. Respondents ranged from 14 to 53
years old (M = 19.5). Instructions for completing the surveys were read aloud before
respondents began the survey and were printed on the front page of their booklets.
Respondents started the survey at the same time and completed it at their own pace. The
only unusual aspect of the administration was that participants were asked to record how
long it took them to complete the survey by noting the time on a timer that was projected
onto an overhead screen.

Results:
Before assessing the four hypotheses, it is helpful to get an overview of how the
three different forms of this scale functioned through an examination of some descriptive
statistics. Form 1 had the lowest reliability (α = .78) and the highest mean (M = 3.61, SD
= .60). Form 2 was the most reliable (α = .86; M = 3.47, SD = .74). Form 3 had a
reliability of α = .83 (M = 3.43, SD = .64). None of these reliability coefficients are
significantly different from one another as assessed by Barnette's (2005) confidence
interval approach. However, much larger sample sizes are generally needed to detect
differences in internal consistency coefficients such as these. Congruent with
expectations, the agree-disagree scale appears to be the least reliable.
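
For readers who wish to compute reliability figures like these for their own data, the following is a minimal sketch of coefficient alpha from an items-by-respondents score matrix. The function name and the simulated responses are illustrative assumptions, not the study's data or analysis code.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative call on simulated 5-point responses (not the study data):
rng = np.random.default_rng(0)
fake = rng.integers(1, 6, size=(109, 10)).astype(float)
print(round(cronbach_alpha(fake), 2))
```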
The first hypothesis posed the possibility that agree-disagree response scales
are problematic because respondents do not adequately distinguish between "strongly
disagree" and "disagree" or between "strongly agree" and "agree." Thus, these
particular anchors might create uneven or inconsistent spacing along the response scale.
Results provided some support for this notion. Figure 1 shows boxplots from the 15
items in which respondents placed an "X" along a continuum to indicate the relative
extremity of different modifiers. Forms 2 and 3 show relatively even spacing between
their anchors. However, the figure shows that respondents made almost no distinction
between "strongly disagree" and "disagree" and only a small distinction between
"agree" and "strongly agree."
The second hypothesis, that respondents lacked an adequate number of response
options to accurately map their opinions onto the response scale, garnered very little
support. Although Form 1 did have the highest mean and lowest standard deviation of
the three scales, the distribution showed no signs of being skewed or kurtotic, and all of
the individual item means fell between 3 and 4. Thus, there was no evidence of floor or
ceiling effects.
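
Distributional checks of this kind are easy to reproduce. The sketch below uses a simulated item-response matrix, an assumption standing in for the unavailable raw data, and mirrors the checks described above: item means near the scale endpoints would signal floor or ceiling effects, and skewness and excess kurtosis near zero suggest neither problem.

```python
import numpy as np
from scipy import stats

# Simulated Form 1 item matrix (109 respondents x 10 items on a 1-5 scale);
# stands in for the study's raw data, which were not reported.
rng = np.random.default_rng(2)
items = np.clip(np.round(rng.normal(3.6, 0.9, size=(109, 10))), 1, 5)

totals = items.mean(axis=1)
print("item means:", items.mean(axis=0).round(2))  # near 1 or 5 = floor/ceiling
print("skewness:", round(stats.skew(totals), 2))
print("excess kurtosis:", round(stats.kurtosis(totals), 2))
```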
Similarly, minimal support emerged for the third hypothesis (that respondents
were more likely to satisfice on Form 1 and would consequently finish their surveys
faster). Although students did complete the form with the agree-disagree response
scale slightly faster than the other forms (Form 1 = 431 s, Form 2 = 443 s, and Form 3 =
440 s), the differences were not significant (F(2, 319) = 0.321, ns).
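
A comparison of this form can be sketched as a one-way ANOVA on the three groups' completion times. The timing data below are simulated: only the group means were reported, and the n of 118 for Form 3 is an assumption chosen so that the degrees of freedom match the reported F(2, 319).

```python
import numpy as np
from scipy import stats

# Hypothetical completion times in seconds; group means (431, 443, 440 s)
# come from the text, spreads and the Form 3 n are assumptions.
rng = np.random.default_rng(1)
times = [rng.normal(m, 90, n) for m, n in [(431, 109), (443, 95), (440, 118)]]

f_stat, p_val = stats.f_oneway(*times)
print(f"F(2, {sum(len(t) for t in times) - 3}) = {f_stat:.3f}, p = {p_val:.3f}")
```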
The fourth hypothesis received substantial support. The data indicated that Forms
1 and 2, which emphasized the idea of agreement in their response scales, correlated
strongly with the agreeableness scale. Pearson correlations were r(109) = .45, p < .001,
for Form 1 and r(95) = .50, p < .001, for Form 2. For Form 3, SPT and agreeableness
were not significantly related (r(125) = .13, p = .157). A Fisher's z transformation
(Howell, 1997) was used to assess whether the correlation found on Form 3 differed
significantly from either of the other two correlations. The z values (zs = 2.69 and 3.01,
respectively) indicate that the correlation from Form 3 was significantly different from
the correlations on both of the other forms.
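
This comparison can be reproduced, at least approximately, from the reported statistics alone. The sketch below assumes the parenthesized values are the group ns; with those inputs the test statistics land near the reported 2.69 and 3.01, with small discrepancies attributable to rounding in the reported correlations.

```python
import numpy as np
from scipy import stats

def compare_correlations(r1, n1, r2, n2):
    """z test for two independent Pearson rs via Fisher's r-to-z transform."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)    # Fisher transform of each r
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    return (z1 - z2) / se

# Reported correlations; ns are assumed from the parenthesized values above.
for r, n in [(0.45, 109), (0.50, 95)]:
    z = compare_correlations(r, n, 0.13, 125)
    p = 2 * stats.norm.sf(abs(z))              # two-tailed p value
    print(f"z = {z:.2f}, p = {p:.4f}")
```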

Discussion:
Overall, these data provide empirically grounded explanations for why agree-
disagree response scales may inhibit the collection of accurate survey data. Because the
results emerged from a specific survey scale and a particular (though relatively
diverse) population, they will need to be replicated before one can have confidence in
their generalizability. On the other hand, there is no particular reason to think that this
scale or this population would diminish the generalizability of the experiment's results.
The results for the first hypothesis indicated that there is some evidence that
"strongly" is a problematic modifier and that respondents do not easily distinguish
between "strongly disagree" and "disagree" or between "strongly agree" and "agree."
The data do manifest one curiosity, however. On the visual analog scales, a surprising
number of respondents located "strongly disagree" very close to "100% agreement." It
would seem that respondents simply experienced some confusion on this item, except
that very little confusion emerged for the comparable "strongly agree" item. Even if the
individuals who placed themselves within 2 cm of "100% agreement" for "strongly
disagree" or "disagree" are regarded as outliers, the two means are still quite close (M =
0.76 and M = 1.38 for "strongly disagree" and "disagree," respectively). Whether this
result stems from the confusion between extremity and certainty of opinion suggested
by Converse and Presser (1986) is a question for future research. Given the data from the
present study, it seems reasonable to conclude that respondents have some difficulty
consistently distinguishing between the extreme and more moderate points at the ends of
these response scales.
A second reason why these response scales might not work as well is that they
confound the construct being measured with agreeableness. This evidence seemed
particularly strong given the magnitude of the respective correlations. Because most
survey designers who develop attitude and opinion scales wish to measure one and only
one construct, these data alone may be enough to dissuade researchers from using
agree-disagree response scales.
The hypotheses presented here are by no means an exhaustive list; other reasons
exist for avoiding these scales. Although using disagree-agree scales to test those other
reasons will make for informative research, in the meantime, the rest of us may wish to
avoid these response scales in our scholarship.

References:

Barnette, J. J. (2005). ScoreRel CI: An Excel program for computing confidence intervals
for commonly used score reliability coefficients. Educational and Psychological
Measurement, 65(6), 980-983.
Converse, J. M., & Presser, S. (1986). Survey questions: Handcrafting the standardized
questionnaire. Beverly Hills, CA: Sage.
Davis, M. H. (1983). Measuring individual differences in empathy: Evidence for a
multidimensional approach. Journal of Personality and Social Psychology, 44(1),
113-126.
DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Newbury
Park, CA: Sage.
Fowler, F. J. (2002). Survey research methods (3rd ed.). Thousand Oaks, CA: Sage.
Howell, D. C. (1997). Statistical methods for psychology (4th ed.). Belmont, CA:
Duxbury Press.
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Inventory: Versions 4a
and 54. Berkeley, CA: University of California.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of
attitude measures in surveys. Applied Cognitive Psychology, 5(3), 213-236.
Figure 1a–1c: Distributions of three different response scales for Forms 1, 2, and 3
(boxplots of respondents' placements along the 10 cm visual analog line).
[Form 1 anchors: strongly disagree, disagree, neither agree nor disagree, agree, strongly agree]
[Form 2 anchors: do not agree at all, slightly agree, somewhat agree, mostly agree, completely agree]
[Form 3 anchors: almost never, once in a while, sometimes, often, almost always]