Measuring the Whole or the Parts?
Validity, Reliability, and Responsiveness of the Disabilities of the Arm, Shoulder and Hand Outcome Measure in Different Regions of the Upper Extremity
Dorcas E. Beaton, BScOT, MSc, PhD
Institute for Work and Health, Toronto, Ontario, Canada; Department of Occupational Therapy, Graduate Department of Rehabilitation Sciences, and Clinical Epidemiology and Health Care Research Program, University of Toronto; St. Michael's Hospital, Toronto

Claire Bombardier, MD, FRCP
Institute for Work and Health, Toronto; Graduate Department of Rehabilitation Sciences, Clinical Epidemiology and Health Care Research Program, Department of Medicine, and Department of Public Health Sciences, University of Toronto; The University Health Network, Toronto General Hospital; Mt. Sinai Hospital, Toronto

Jeffrey N. Katz, MD, MS
Institute for Work and Health, Toronto; Graduate Department of Rehabilitation Sciences, University of Toronto; Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts

Anne H. Fossel
Brigham and Women's Hospital, Boston, Massachusetts

James G. Wright, MD, FRCSC, MPH
R. B. Salter Chair in Surgical Research, Department of Surgery, University of Toronto; Graduate Department of Rehabilitation Sciences, Clinical Epidemiology and Health Care Research Program, and Department of Public Health Sciences, University of Toronto; The Hospital for Sick Children, Toronto

Valerie Tarasuk, PhD
Department of Nutritional Sciences, University of Toronto

ABSTRACT: The Disabilities of the Arm, Shoulder and Hand (DASH) outcome measure was developed to evaluate disability and symptoms in single or multiple disorders of the upper limb at one point or at many points in time. Purpose: The purpose of this study was to evaluate the reliability, validity, and responsiveness of the DASH in a group of diverse patients and to compare the results with those obtained with joint-specific measures. Methods: Two hundred patients with either wrist/hand or shoulder problems were evaluated by use of questionnaires before treatment, and 172 (86%) were re-evaluated 12 weeks after treatment. Eighty-six patients also completed a test-retest questionnaire three to five days after the initial (baseline) evaluation. The questionnaire package included the DASH, the Brigham (carpal tunnel) questionnaire, the SPADI (Shoulder Pain and Disability Index), and other markers of pain and function. Correlations or t-tests between the DASH and the other measures were used to assess construct validity. Test-retest reliability was assessed using the intraclass correlation coefficient and other summary statistics. Responsiveness was described using standardized response means, receiver operating characteristics curves, and correlations between change in DASH score and change in scores of other measures. Standard response means were used to compare DASH responsiveness with that of the Brigham questionnaire and the SPADI in each region. Results: The DASH was found to correlate with other measures (r > 0.69) and to discriminate well, for example, between patients who were working and those who were not (p < 0.0001). Test-retest reliability (ICC = 0.96) exceeded guidelines. The responsiveness of the DASH (to self-rated or expected change) was comparable with or better than that of the joint-specific measures in the whole group and in each region. Conclusions: Evidence was provided of the validity, test-retest reliability, and responsiveness of the DASH. This study also demonstrated that the DASH had validity and responsiveness in both proximal and distal disorders, confirming its usefulness across the whole extremity.
J HAND THER. 2001;14:128-146.

This work was supported by research grants from the American Society for Surgery of the Hand, Rosemont, Illinois, and the Institute for Work and Health, Toronto, Ontario, Canada; by a PhD fellowship in health research from the Medical Research Council of Canada (Dr. Beaton); by a scientist award from the Medical Research Council of Canada (Dr. Wright); and by grant AR36308 from the U.S. National Institutes of Health and the U.S. National Arthritis Foundation (Dr. Katz).

Address correspondence and reprint requests to Dorcas Beaton, BScOT, MSc, PhD, Institute for Work and Health, 250 Bloor Street East, Suite 702, Toronto, Ontario, Canada M4W 1E6; e-mail: <dbeaton@iwh.on.ca>.
128 JOURNAL OF HAND THERAPY

The measurement of disability or capacity to function is critical to a comprehensive assessment of outcome following an injury in the upper limb. The fluid motion of a swimmer, artist, or musician attests to the coordinated kinetic chain along the extremity which allows for such expression and function. However, measuring disability in patients with upper-limb disorders poses practical challenges. For example, distinct questionnaires have been developed for the different regions of the upper limb7-11 and for various disorders in the limb.12-1 Given that many patients could have multiple disorders or multiple affected regions, the choice between available measures is difficult. The Disabilities of the Arm, Shoulder and Hand outcome measure (the DASH) provides one possible solution. It is a questionnaire designed to be used for single or multiple disorders in the upper limb, providing the possibility of a single questionnaire for measuring disability for any upper-limb region.5,15,16 The intent is that the DASH be used no matter what region or regions are affected.

The development of the DASH has been documented elsewhere.15-17 The DASH is a 30-item questionnaire that evaluates symptoms and physical function (at the level of disability1,3,18), with a five-response option for each item. Scoring is done by summing up the circled responses and subtracting 30. (Subtracting 30 anchors the score with a base of 0, a correction required because the response scale is 1 to 5 and needs to be changed to a 0 to 4 equivalent.) This figure is then divided by 1.2 to get a DASH function/symptom score out of a possible 100.

A higher score on the DASH reflects greater disability. Missing responses to items (up to three items, or 10% of items) are replaced by the mean value of the responses to the other items before summing. If responses to more than three items are missing, the overall score cannot be calculated.

Preliminary work on the validity (against constructs of function and pain) and reliability (alpha, 0.96 15; test-retest reliability, 0.92 19) has been carried out directly by those involved in the development15-17 as well as indirectly by others who used the DASH for comparison with another instrument.14,20 Kirkley et al.14 and MacDermid et al.20 also demonstrated that the DASH was responsive, although slightly less so, in comparison with more joint- or disorder-specific measures (a shoulder instability instrument and a wrist-specific instrument in these studies, respectively).

These studies provided initial evidence of the validity and reliability of the DASH scores; however, additional work was needed to compare the DASH in patients with disorders in different upper-limb regions. Of particular interest was how the DASH would do in evaluating change over time (its intended role) in patients with different affected parts of the extremity, a role that requires evidence of construct validity, test-retest reliability and responsiveness.21-25 Of the articles mentioned above that used the DASH, only three collected data over time for the study,14,19,20 none focused on the DASH per se, and none provided information on all three attributes. The purpose of this study was to evaluate the validity, test-retest reliability, and responsiveness to change of the DASH in a longitudinal study of patients with various upper-limb disorders.

METHODS

A convenience sample of patients waiting for treatment of upper-limb conditions at one of two teaching centers (St. Michael's Hospital in Toronto and Brigham and Women's Hospital in Boston) was invited to participate in this prospective cohort study. The study did not affect their care but rather noted their progress over a three-month period by taking measures (through questionnaires only) before and after treatment.

Four groups of patients, two with proximal disorders and two with distal disorders, were targeted. The patients with proximal disorders included those with glenohumeral arthritis who were undergoing joint replacement and those with soft tissue disorders around the shoulder (predominantly rotator cuff tendinitis). Patients with distal disorders included those undergoing carpal tunnel release and those receiving treatment for a tendon disorder in the wrist or hand (predominantly trigger finger and tendinitis). Acute injuries such as tendon lacerations or fractures were not included, because pre-treatment measures of disability cannot be obtained in these conditions.

No guidelines were available for calculating sample sizes for studies of responsiveness. A traditional paired sample calculation (alpha at 0.025, a Bonferroni correction to allow for multiple comparisons, and beta at 0.10) was used.27

The amount of change we wanted to be able to detect was calculated using data from a worksite study and was defined as the difference in improvement on the DASH scores between those who said they were much better and those who said they were somewhat better between testings one year apart.28,29 The former group had change scores of 5.79 out of 100 (SD 11.2) on average, and the latter had change scores of 2.42 out of 100 (SD 12.3). The difference between them was 3.37 out of 100. The average change in those who said they had not changed between testings was 0.89 out of 100 (SD 9.9). The SD in this group was used as the variance in the sample size calculation, as suggested by Guyatt et al.30 and Rossner.

A sample of 113 patients was required. A 20% correction was added to allow for missing or unusable questionnaires, which raised the requirement to 142 patients. (The target of 113 patients was assumed to
April-June 2001 129

equal 80% of the data collected.) We anticipated that up to 20% of participants would not complete the follow-up package; hence, recruitment was targeted for 178 patients. We continued recruitment until it was apparent that we would have follow-up for at least this number and that it represented at least 80% of the baseline sample.

Patients at the St. Michael's Hospital site were approached in person by study personnel (either at a surgeon's clinic or in a pre-admission clinic). The study was explained to them, and they were asked to sign a consent form if they wished to participate. At that time they were given the baseline package to complete and return as well as a second package to be completed three to five days later (depending on the date of surgery). Two follow-up packages were mailed out to the subjects with stamped return envelopes. This was done 4 and 12 weeks after treatment. (Only 12-week data are presented in this paper.) Up to two reminder packages and phone calls were made to encourage response.

In Boston, study personnel identified patients by reviewing ICD-9-CM diagnostic codes in billing data and looking for patients receiving or awaiting surgical or nonsurgical care for the target disorders. These potential subjects were sent detailed letters explaining the study and inviting them to participate. The baseline questionnaire package and a stamped return envelope were also included. Participation (returning the baseline questionnaire package) was considered consent. Subjects from Boston were sent their second package by mail at 12 weeks. Again, phone calls and reminder packages (up to two) were sent as necessary.

The study was approved by the Research Ethics Board at both sites.

MEASURES AND ANALYSIS

Each completed questionnaire package was checked by research staff for such things as missing items and duplicate responses and was then entered into a customized database at each site. These data were converted into SAS (version 6.12) data sets (SAS Statistical Analysis Systems, Cary, North Carolina) and merged. All analysis was done in SAS.

Sample Description

Baseline demographics for the whole cohort were analyzed descriptively. This was repeated separately for subjects from each of the two data collection sites. The variables described included age, gender, education, and clinical variables (comorbidity, pain medication use, duration of symptoms, etc.). The SF-36 generic health status measure32,33 was also used to describe baseline overall health in the cohort. Means and medians for each dimension of the SF-36 were calculated for the entire cohort and plotted against mean values for the general U.S. population.

Construct Validity

Several different comparisons were done, using recommended methods, to evaluate the construct validity of the DASH scores.25,34

We hypothesized that the DASH scores would be sensitive to the range of disability in our sample. This was verified by looking at the distributions of baseline scores (whole sample, proximal, distal, surgical, nonsurgical) and specifically looking for floor or ceiling effects (patients with scores at either extreme of the scale), which would indicate a lack of sensitivity to the disability experience in this sample. Floor and ceiling effects would also lead to difficulties in trying to measure change (for instance, if everyone scores at the maximum score, a ceiling effect, there is no place to move to on the scale if they improve).35

We also felt that the DASH scores should be lower (indicating less disability) in the following groups: those working full duty rather than not and those able to cope and do what they want rather than not. These contrasts were tested with an unpaired Student's t-test, at a 0.05 level of error.

We posited that the DASH should also correlate at least moderately (Pearson correlations greater than 0.5) with visual analog scales of function, pain, and ability to work as well as with established joint-specific measures, specifically the Shoulder Pain and Disability Index (SPADI) for patients with shoulder conditions7,36,37 and the Brigham questionnaire (the Brigham)12,38 for patients with wrist and hand conditions.

Furthermore, if the joint-specific measures are indeed specific to a particular joint, we should see lower correlations between the DASH and the Brigham in the patients with shoulder conditions as well as between the DASH and the SPADI in the patients with wrist conditions. The disability or function scores of the SPADI and the Brigham were the focus of this analysis.

Test-Retest Reliability

Test-retest reliability was analyzed using data from patients who had completed two measures before treatment began (three to five days apart) and said that their arm problem had not changed (in response to the question, "How is your problem now compared to before your treatment/surgery?"). Mean change scores and associated paired t statistics (and p values) were calculated. Correlation coefficients were obtained using both the Pearson method (parametric, for normally distributed data) and the Spearman method (non-parametric, using ranks). These correlations indicate whether scores for a given patient are high at baseline and also high at follow-up but not whether the scores are identical. Intraclass correlation coefficients (ICCs) provide an

estimate of how closely the numeric scores for each patient were to each other (called concordance) and are therefore considered a stronger statistic for describing reliability.2,39,40 Specifically, we used a Shrout and Fleiss (2,1) model derived from a two-way analysis of variance. By adopting this particular model, we are saying that the testing framework in this particular study (three to five days apart, pre-treatment) is assumed to be only one of many possible ways the test-retest reliability could have been assessed. We considered a coefficient between 0.90 and 0.95 a minimum standard for reliability, based on the guidelines of Lohr et al.25 and others,34,41 for the ability to interpret questionnaire scores in individual patients.

The final estimate of reliability, the minimally detectable change (MDC),42-46 was calculated using the test-retest reliability coefficient to estimate the standard error of measurement (SEM) for the difference score. The SEM was therefore the SD at baseline (σ_base) times the square root of (1 - R_xx), where R_xx is the test-retest reliability. Christensen and Mendoza and others41,47 suggest multiplying the (1 - R_xx) by 2, to adjust for the fact that when looking at a change score two samples are being used (time-1 and time-2), each with measurement error. The SD is therefore increased by this factor to account for this.

From this calculation of the SEM, it is possible to describe the amount of change that would need to be observed on the questionnaire to exceed this measurement error. In health measurement, the terms "minimally detectable change"45,48 and "smallest detectable difference or change"47 have been used for this change score. It is calculated by multiplying the SEM by the appropriate z value (depending on the level of confidence desired). Thus, for a minimal detectable change at the 95% confidence level, the formula would be

\[ \mathrm{MDC}_{95} = 1.96 \times \sigma_{\mathrm{base}} \times \sqrt{2\,(1 - R_{xx})} \]

The subscript 95 is used with the label because it is possible to calculate this change score at different levels, including 90%48 and 67%.46 The MDC95 yields a threshold, a minimum change score, that allows you to be 95% confident that when you observe a change score that is greater than this value, it is likely to indicate a real change in your patient, rather than measurement error (for that instrument in that population) alone. The MDC provides a unique opportunity to translate the test-retest reliability coefficient (ICC) into units of change in the instrument.

It should be noted that the ways others calculate these values vary. In early work, Jacobson et al.43 did not double the error term; however, Jacobson has since changed this position and adopted the adjusted formula. Wyrwich et al., in their work on bounds of relevant change,46,52 do not use the adjustment to the error term and replace the test-retest reliability coefficient with the Cronbach alpha coefficient. The use of the alpha moves away from longitudinal stability as the source of the variance and favors instead an instrument with high correlations between items, a cross-sectional strength. Wyrwich et al. argue, and we agree, that in cases of very high sample sizes, the Cronbach alpha and test-retest coefficients will be almost the same34,46,52; however, in our experience in clinical research, such large samples (N - 300)34 are rarely found. Test-retest coefficients derived from similar patients are preferable. Caution should be used to avoid confusing the Wyrwich coefficient with what appear to be similar coefficients under the rubric of minimal detectable change or reliability change indexes.

Responsiveness to Change

The ability to detect change when it has occurred,53,54 or responsiveness, is often incorrectly felt to be a fixed property of an instrument (e.g., the XXX questionnaire is "responsive"). However, it is more like construct validity, in which we are validating the application of the instrument in a specific test situation, not the instrument in and of itself. Evaluation of responsiveness requires that some sort of change has occurred (and that it can be verified in some way), and then the questionnaire's scores are tested against that change. It is possible that a given questionnaire could be responsive or sensitive to a particular type of change but not to another type.55,56 Responsiveness needs to be described in relation to the relevant type of change.

In our study, we looked at three types of change known to have occurred between baseline and 12 weeks after the beginning of treatment. First, we assumed that patients would likely begin to show improvement with each of the treatments in the study (i.e., total shoulder replacement, carpal tunnel release), and therefore we compared pre-treatment and 12-week post-treatment scores in the whole group. We recognize that this does not reflect full recovery, but patients are likely to have had small improvements from their pre-treatment states by this time.

Second, we looked at patients who said that their upper-limb problem was better. We determined this by their response to the 11-point scale asking, "Compared to before your treatment/surgery, how much has your arm (either shoulder or wrist/hand) problem changed?" Those indicating more than 6 on the scale (where 5 indicates no change and 10 indicates much better) were considered to have improved. Different cut-off points could have been used; however, we chose this point on the basis of an a priori consensus of four of the authors (D.B., C.B., J.G.W., and J.N.K.). Similarly, an indication of improvement of more than 6 in ability to function in

daily activities was used as an external indication that change had occurred, and responsiveness analysis was carried out on that subsample of patients who indicated more than 6 of 10 on the scale.

TABLE 1. Sociodemographic Data for the Study Participants

                                      Whole Cohort   Boston      Toronto
                                      (n=200)        (n=91)      (n=109)
Follow-up rate, no. (%) completing
  baseline and 12-wk follow-up        172 (86%)      78 (86%)    94 (86%)
Age (mean)                            53.6           54.4        52.9
Gender:
  Male                                86             29          57
  Female                              113            62          51
Marital status:
  Married/living                      146            65          81
  Divorced/separated                  17             8           9
  Widowed                             14             7           7
  Single                              21             11          10
Schooling:
  Less than grade 8                   10             1           9
  Some high school                    28 (14%)       3 (3%)      25 (23%)
  High school                         31             11          20
  Some college or university          29             14          15
  Graduated from college or
    university                        98 (50%)       61 (67%)    37 (34%)
Work status:
  Full time                           78             39          39
  Part time                           20             14          6
  Disabled because of UE              23 (12%)       2 (2%)      21 (19%)
  Disabled for other reason           8              3           5
  Homemaker                           15             7           8
  Retired                             49             22          27
  Student                             2              2           0
Worker's compensation:
  No, not on worker's compensation    156            82          74
  Yes, but not yet                    14             2           12
  Yes, receiving it                   14             0           14
  Yes, but no longer                  9              4           5
Lawyer for UE? (% yes)                13 (6%)        6 (7%)      7 (6%)

NOTE: The first column represents the whole cohort, the second and third that portion of the cohort coming from each of the sites. The differences between the Boston and Toronto cohorts are noted, although they are not likely to affect analysis of the questionnaires in the whole cohort. Analysis across study sites was not done.

In all cases, responsiveness was summarized using the following statistics: change scores (the mathematical difference between baseline and follow-up scores), effect size57 (mean change divided by the SD of baseline scores) and the standardized response mean58-60 (mean change divided by the SD of change scores, or SRM). A comparison was then carried out contrasting the responsiveness (using the SRM) of the DASH, the Brigham, and the SPADI in each of the subcohorts: patients with shoulder conditions and patients with wrist or hand conditions. We thus evaluated the responsiveness of a shoulder questionnaire to improvements in wrist patients and vice versa. It was hypothesized that the responsiveness of the DASH should be comparable with that of the joint-specific measures but that the highest responsiveness would be found for the joint-specific measures for patients in whom that joint was involved.

Responsiveness was also described by correlating change scores on the DASH with changes in pain intensity, function, and severity of the problem. Two approaches were used to measure these attributes. First, numeric rating scales were used to gather these ratings at each testing time (we called these "status measures"), and the differences between the status measures at 12 weeks and at baseline (we called these "differences in status measures") were correlated with the difference in DASH scores. Second, we also asked patients (at 12 weeks) to rate the amount these same attributes had changed since baseline (pre-treatment). We called this "a transition approach." It would be hypothesized that both approaches should lead to at least moderate correlations with changes in the DASH scores. Slightly lower correlations were expected because the comparison measures are made up of only one or two items and hence are prone to more measurement error, which would lower the correlation coefficient (attenuation of correlation). The correlations are also reduced because the change scores are derived from two samples (pre- and post-treatment), which also adds to the error in the measure (as discussed above) and causes further attenuation of the correlation. We therefore considered an r value of about 0.4 indicative of a moderate correlation.

We also constructed receiver operating characteristics (ROC) curves, as described by others.61,62 These curves demonstrate how accurately different change scores on the questionnaire distinguish those who are better from those who are not (as defined by some other criterion). In our study we used an affirmative answer to the question "Can you cope with your problem and do what you would like to do?" at follow-up, given an inability to cope at baseline, as the criterion of improvement. We selected this question

on the basis of a qualitative study in which this was described as a threshold type of indicator of being "better."63 This seemed to be a reasonable marker for change; however, it is not a "gold standard," and other markers could also have been used.

Changes in DASH scores of -1, -5, -7, -10, -15, and -20 were considered* and were compared with the external marker to see how well each change score correctly corresponds to the external marker. The true-positive rate (the percentage of people who had a change score of at least that amount and were also now able to cope) and the false-positive rate (1 minus specificity, or the percentage of people who had a change score of at least that amount but had not shifted from being unable to being able to cope) were calculated and plotted on a graph, a ROC curve.66 The area under the curve represents the responsiveness; the larger the area, the more responsive the instrument, because the different change scores accurately discriminate between improved and non-improved patients as defined by our external marker. In a ROC curve for one instrument, the point highest to the upper left might be considered the change score most able to discriminate between those who have shifted to coping with their condition and those who have not.

RESULTS

Sample Description

Two hundred patients were enrolled and completed the baseline portion of this study. One hundred and seventy-two completed the 12-week follow-up questionnaire (86% follow-up rate). The description of the samples differed between sampling sites. However, because we are making within-person, and not between-site, analyses, the differences should not affect the results.

The description of the sample is shown in Table 1. The first column reflects the findings in the whole cohort, the second and third columns the Boston (n = 91) and Toronto (n = 109) subsamples, respectively. The mean age and marital status were similar across sites. The average age was 42 years, and the majority of people were married. The split between men and women was fairly even in the whole sample, but most of the male patients came from Toronto.

The level of education of patients differed between sites. In Boston 67% had graduated from university or college, whereas in Toronto only 34% had done so. In contrast, a larger proportion of the Toronto sample indicated that their highest educational level was "completed some high school" (25%) than the Boston sample (3%). Level of employment differed, with more people being off work due to their upper-limb problem in Toronto (19%) than in Boston (2%). Likewise, the Toronto group had a higher proportion of persons on worker's compensation.

Table 2 summarizes some of the clinical findings for the whole cohort as well as for each site. The majority of the patients with shoulder conditions were from Toronto, especially those with osteoarthritis (undergoing shoulder replacement). The mean duration of

TABLE 2. Clinical Characteristics of the Study Participants

Characteristic                  Whole Sample     Boston      Toronto
                                (n=200)          (n=91)      (n=109)
Region affected:
  Shoulder                      138              61          77
  Wrist/hand                    62               30          32
Duration of symptoms
  (mean weeks)                  193 (SD 374)     178         206
Medication use:
  Aspirin, NSAID                117              60          57
  Tylenol (OTC)                 90               39          51
  Narcotics                     59               20          39
  Other                         45               20          25
  Not taking medication         17               9           8
Comorbidity:
  Hypertension                  52               23          29
  Asthma                        23               12          11
  Diabetes                      13               4           9
  Ulcers                        29               10          19
  Depression                    51 (26%)         21 (23%)    30 (27%)
  Cancer                        16               8           8
  Arthritis                     83               41          42
  Low back pain                 124 (62%)        59 (65%)    65 (60%)
No. of comorbid conditions:
  None                          29               11          18
  One                           48               23          25
  Two                           57               25          32
  Three                         29               18          11
  Four                          25               9           16
  Five or more                  12               5           7

*The reasons these change scores were selected for the cut-offs are as follows: 1 is closest to the smallest change detectable on the DASH (0.83).63 A change of 5 is equivalent to one standard error of measurement, which Wyrwich et al.46 suggest is close to a minimally clinically important difference; the cut-off suggested by Redelmeier and Lorig64 is 7 (or 7% change). A cut-off of 10 is selected for convenience only. A cut-off of 15 corresponds to the minimally detectable change derived from the work of Turchin et al.19 but also corresponds to the suggestion of Redelmeier et al.65 that the criterion for an important change is 0.5 points per item in a questionnaire (therefore, 30 x 0.5). Finally, we selected a cut-off of 20 to represent an extreme change score.
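The cut-offs listed in the footnote can be turned into ROC points mechanically. What follows is a minimal sketch in Python, not the SAS analysis used in the study. The function names (`roc_points`, `auc_trapezoid`) and the toy data are hypothetical. The sketch assumes that a change score is follow-up minus baseline (so improvement is negative) and that the external marker is a yes/no flag for having shifted from unable to able to cope.

```python
def roc_points(changes, improved, cutoffs):
    """Return (false_positive_rate, true_positive_rate) for each cut-off.

    changes  -- DASH change scores (follow-up minus baseline; negative = better)
    improved -- external marker per patient (True = shifted to coping)
    cutoffs  -- positive magnitudes, e.g. 5 means "improved by at least 5 points"
    """
    n_pos = sum(1 for flag in improved if flag)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and non-improved patients")
    points = []
    for cut in cutoffs:
        # A patient is flagged when the DASH score dropped by at least `cut`.
        flagged = [c <= -cut for c in changes]
        tp = sum(1 for f, y in zip(flagged, improved) if f and y)
        fp = sum(1 for f, y in zip(flagged, improved) if f and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoid rule, anchored at (0,0) and (1,1)."""
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data, invented for illustration only (not study data).
changes = [-25, -18, -12, -8, -6, -3, 0, 2, -11, -1]
improved = [True, True, True, True, False, False, False, False, False, True]
cutoffs = [1, 5, 7, 10, 15, 20]
points = roc_points(changes, improved, cutoffs)
area = auc_trapezoid(points)
```

With only six cut-offs the curve is coarse, so the trapezoid area is an approximation of the full ROC area; a finer grid of cut-offs would trace the curve more closely.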

[Figure 1: plot of SF-36 scores on a vertical axis from 0 to 100; horizontal axis categories PF, SF, RP, RE, MH, VT, BP, GH, PCS, MCS.]
FIGURE 1. SF-36 health status: Results for entire cohort compared with U.S. general population data (solid squares),32 expressed as baseline mean scores (gray line) and median scores (solid triangles) in the eight SF-36 dimensions, and as two component scores. The eight SF-36 dimensions are physical function (PF), social function (SF), role-physical (RP), role-emotional (RE), mental health (MH), vitality (VT), bodily pain (BP), and general health perceptions (GH); the component scores are the physical component score (PCS) and the mental component score (MCS). A score of 100 indicates good health. Population norms for the United States were done on SF-36-US with a 4-week window. The study data were gathered on the Canadian version of SF-36 Acute.

[Figure 2: three histograms of baseline DASH scores (vertical axis: number of patients; horizontal axis: DASH score). Whole cohort: mean = 43.9, SD = 22.9, median = 44.6. Shoulder patients: mean = 48.4, SD = 21.2, median = 50.0. Wrist/hand patients: mean = 34.2, SD = 23.69, median = 31.25.]
FIGURE 2. Baseline distribution of DASH scores (out of 100) for the entire sample (top) and for the subsamples defined by the location of patients' problems: shoulder (bottom left) or wrist/hand (bottom right). A DASH score of 100 indicates greater disability.
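The 0 to 100 scores plotted in Figure 2 follow the scoring rule described in the introduction: sum the 30 item responses, subtract 30, divide by 1.2, and substitute the mean of the answered items for up to three missing items. A minimal sketch in Python is given below. It assumes missing items are represented as `None`; the function name `dash_score` is hypothetical and is not part of any official scoring tool.

```python
def dash_score(responses):
    """Score a 30-item DASH questionnaire on a 0-100 scale.

    responses -- list of 30 entries, each an int from 1 to 5, or None if missing.
    Returns None when more than three items are missing.
    """
    if len(responses) != 30:
        raise ValueError("the DASH has 30 items")
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("responses must be between 1 and 5")
    n_missing = 30 - len(answered)
    if n_missing > 3:
        return None  # the overall score cannot be calculated
    # Replace each missing item with the mean of the answered items.
    item_mean = sum(answered) / len(answered)
    total = sum(answered) + n_missing * item_mean
    # Subtracting 30 shifts the 1-5 scale to 0-4; dividing by 1.2 rescales 0-120 to 0-100.
    return (total - 30) / 1.2
```

For example, a respondent who circles 3 on every item gets (90 - 30) / 1.2 = 50, and the same respondent with three items left blank still scores 50 because the blanks are filled with the item mean of 3.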
TABLE 3. Construct Validity: Pearson/Spearman Correlations Between the DASH and Other Measures of Upper Extremity Function

                             Whole Cohort   Shoulder     Wrist/Hand
                             (n=200)        (n=138)      (n=62)
Overall rating of problem    0.71/0.69      0.69/0.68    0.68/0.74
Pain severity                0.72/0.72      0.73/0.71    0.67/0.71
Ability to function          0.79/0.79      0.84/0.85    0.75/0.78
Ability to work              0.76/0.77      0.76/0.76    0.69/0.74
SPADI pain                   0.82/0.82      0.79/0.76    0.84/0.85
SPADI function               0.88/0.87      0.85/0.83    0.92/0.88
Brigham symptoms             0.71/0.70      0.73/0.71    0.70/0.73
Brigham function             0.89/0.89      0.90/0.90    0.92/0.92

symptoms was 193 weeks, more than three years, and both sites offered secondary or tertiary levels of care. The majority of the sample were taking some sort of medication to manage symptoms preoperatively, including 59 patients (of the 200 total) who stated that they were taking narcotics for their pain. Of interest is the high proportion of patients with histories of depression (23% in Boston and 27% in Toronto) and low back pain (65% in Boston and 60% in Toronto). The general health of the cohort is shown in Figure 1, where the mean and median scores in each dimension of the SF-36 health status measure are plotted with data from the general population of the United States.

Construct Validity

The baseline scores on the DASH appear to be normally distributed (Figure 2), with mean of 43.9 and median of 44.6. Only one patient was at the "ceiling" (with a score of 0), indicating perfect health, and no one was at the floor (with a score of 100, indicating maximum disability on the scale). The distribution for the patients with shoulder conditions and patients with wrist or hand conditions, shown in the same figure, demonstrates the less severe disability in the patients with wrist and hand conditions, described by the DASH scores.
Discriminative validity was confirmed. Those currently working with their upper limb condition and able to continue doing so had significantly lower disability than those who were not able to work (26.8 vs. 50.7, t = -7.51, P < 0.0001). (This analysis contrasted only these two subgroups and did not analyze the responses from those who were retired or not working for reasons other than their upper limb condition.) Statistically significant differences in DASH scores were also found between those who were able to do all they want to do as opposed to those who were not able to do so (23.6 vs. 47.1, t = -5.81, P < 0.0001). Similar discrimination was found within the patients with shoulder conditions and those with wrist or hand conditions when these groups were analyzed separately. Thus, the difference was in the anticipated direction (patients who were unable to do what they wanted and those who were unable to work had more disability and higher DASH scores), and the difference was statistically significant.
Convergent construct validity of the DASH was demonstrated by finding correlations in the expected direction and of the expected magnitude with other measures of upper limb function and symptoms. Table 3 summarizes the results. In the whole cohort, all correlations exceeded 0.70 (Pearson). Correlations were highest with the measure of function as well as with the function scores on the Brigham and the SPADI. Correlations between the DASH and these joint-specific instruments were found even in the opposite joint.

Test-Retest Reliability

Fifty-six of the 86 people completing the test-retest reliability package (three to five days after baseline) indicated that they had no change in their problem (no change ± 1 response category on an 11-point transitional scale). The mean change score in this group was -0.15 (median, 0), with an SD of 6.08. The difference was not significant (paired t statistic, -0.176; P value, 0.86). The correlation between baseline and retest was 0.96 (Pearson correlation) and 0.95 (Spearman ranked correlation). The ICC(2,1) on this sample (n = 56) was 0.96 (95% confidence interval, 0.93-0.98) for the DASH, indicating excellent agreement.25,34,41,67 The SEM is 4.6 DASH points, which led to a minimal detectable change (MDC95) of 12.75 (SD at baseline, 23.02) on a 100-point scale. A 90% MDC (MDC90) would be 10.7 of 100 DASH points.

Responsiveness

The DASH questionnaire was able to demonstrate change in all situations in which change was presumed to have occurred: before and after treatment (SRM, 0.74-0.80) and in those patients who either said that their problem was better overall or that their ability to function had improved (SRM, 0.92-1.40) (Table 4). Standards for a "good" or "large" responsiveness statistic have little meaning, because they are dependent on the type of change being examined. The distributions of the change scores are shown in Figure 3. The large histogram shows the change for all patients before and after treatment (whether they got better or worse) as well as the change for the subgroups of patients who said their problem was better and those who said their function was better.
The DASH was found to have comparable or slightly better responsiveness than the joint-specific measures; Table 5 and Figure 4 summarize these comparisons using the SRM statistic. When comparisons were made at the subgroup level (by region of injury), the DASH remained comparable with (only lower, by 0.04 to 0.08, in patients with wrist or hand conditions) or better than the disease-specific measures.
Correlations between differences in the status measures (self-ratings at follow-up minus self-ratings at baseline) and the change scores on the DASH were moderately high (Pearson r > 0.65) (Table 6). Those with the transitional indexes ("how are you now compared to before") failed to meet our modest standard of 0.4 (correlations were in the range of 0.32-0.39) except in the overall rating of change in their problem (Pearson r = 0.40, Spearman r = 0.43). The correlations between the DASH scores and the differences in state measures were of the hypothesized magnitude; however, those between the transi-
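As a check on the test-retest figures reported above, the SEM and MDC can be reproduced from the baseline SD and the ICC. The following is a minimal sketch that assumes the conventional formulas SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM; these formulas and the function names are assumptions for illustration and are not quoted from this section.

```python
import math


def standard_error_of_measurement(sd_baseline: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC) (assumed conventional formula)."""
    return sd_baseline * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float) -> float:
    """MDC = z * sqrt(2) * SEM; sqrt(2) accounts for error in both test and retest."""
    return z * math.sqrt(2.0) * sem


# Values reported in the text: SD at baseline 23.02, ICC(2,1) 0.96.
sem = standard_error_of_measurement(23.02, 0.96)
sem_reported = round(sem, 1)  # the text reports the SEM to one decimal place
mdc95 = minimal_detectable_change(sem_reported, 1.96)   # z for 95% confidence
mdc90 = minimal_detectable_change(sem_reported, 1.645)  # z for 90% confidence

print(f"SEM   = {sem:.2f} DASH points")
print(f"MDC95 = {mdc95:.2f} DASH points")
print(f"MDC90 = {mdc90:.2f} DASH points")
```

Under these assumptions, using the SEM rounded to 4.6 gives values that agree with the reported MDC95 of 12.75 and MDC90 of 10.7.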

TABLE 4. Responsiveness of the DASH to Clinical Changes, Expressed as Mean (SD)

                                            Baseline      Follow-up     Change Score   Effect Size   SRM
All patients completing baseline
and follow-up (n = 172):
  Observed change                           44.5 (22.7)   30.9 (22.8)   -13.3 (16.9)   0.59          0.78
  Those rating problem as better (>6/10)    42.9 (22.9)   24.9 (20.2)   -17.3 (16.4)   0.75          1.06
  Those rating function as better (>6/10)   40.7 (23.4)   20.2 (19.2)   -19.7 (16.5)   0.84          1.20
Shoulder patients:
  Observed change                           48.3 (21.0)   35.3 (21.3)   -13.4 (16.6)   0.64          0.81
  Those rating function as better (>6/10)   49.1 (21.9)   24.3 (17.6)   -23.7 (16.4)   1.08          1.44
Wrist/hand patients:
  Observed change                           33.8 (22.8)   20.5 (22.9)   -13.0 (17.5)   0.57          0.74
  Those rating function as better (>6/10)   27.4 (19.4)   13.5 (20.3)   -13.2 (14.5)   0.68          0.91

NOTE: The mean (and SD) for the DASH score at baseline and follow-up are shown, with the change score, effect size (mean change divided by SD of baseline), and standardized response mean (mean change divided by SD of change). Data are shown only for patients with data for both baseline and 12-week follow-up. SRM indicates standardized response mean.
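The two statistics defined in the note to Table 4 can be illustrated with one row of the table. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and taking the absolute value of the change is an assumption made because the table reports positive statistics for negative change scores. Results computed from the rounded table entries may differ in the last digit from values the authors computed from unrounded data.

```python
def effect_size(mean_change: float, sd_baseline: float) -> float:
    """Effect size = |mean change| / SD of the baseline scores."""
    return abs(mean_change) / sd_baseline


def standardized_response_mean(mean_change: float, sd_change: float) -> float:
    """SRM = |mean change| / SD of the change scores."""
    return abs(mean_change) / sd_change


# Table 4, wrist/hand patients rating function as better:
# baseline SD 19.4, mean change -13.2, SD of change 14.5.
es = effect_size(-13.2, 19.4)
srm = standardized_response_mean(-13.2, 14.5)

print(f"effect size = {es:.2f}")
print(f"SRM         = {srm:.2f}")
print(f"SRM / effect size = {srm / es:.2f}")
```

For this row the same mean change yields two different responsiveness values only because the denominators differ (SD of baseline versus SD of change).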
[Figure 3 panels: change in DASH (observed change) for all patients (top); in those saying arm problem better (>6) (n = 112) (bottom left); in those saying ability to function better (>6) (n = 79) (bottom right). x-axis: change in DASH in 10-point ranges, with 0 as anchor; y-axis: number of patients. Negative change score = less disability; positive change score = more disability.]

FIGURE 3. Number of subjects falling into 10-point score ranges for change in DASH scores, using 12-week follow-up data, for all patients (top), patients who said their arm problem was better (bottom left), and patients who said they were able to function better (bottom right). A score of 0 (no change) is shown to provide an anchor.

TABLE 5. Comparison of Responsiveness of the DASH and Two Joint-specific Measures, the SPADI and the Brigham, Expressed as Standardized Response Means (SRMs)

                                                 DASH   SPADI Function Score   Brigham FLs
All patients:
  Observed change                                0.78   0.62                   0.64
  Those rating problem as better (>6/10)         1.06   0.84                   0.86
  Those rating their function as better (>6/10)  1.20   0.86                   1.07
Shoulder patients:
  Those rating function as better (>6/10)        1.44   1.13                   1.24
Wrist/hand patients:
  Those rating function as better (>6/10)        0.91   0.54                   0.87

NOTE: The SRM (mean change score divided by SD of the difference) is used as the summary statistic; the SRM values are the same as those shown in Table 4. SPADI indicates Shoulder Pain and Disability Index; FLs, functional limitations.
[Figure 4 panels: whole cohort (top), shoulder patients (bottom left), wrist/hand patients (bottom right). Bars for the DASH, SPADI, and Brigham are grouped by observed change, "problem better," and "function better." x-axis: SRM (0 to 1.4).]
FIGURE 4. Responsiveness of the DASH compared with joint-specific measures (the SPADI and the Brigham function scores) for the entire sample (top) and for patients by affected region: shoulder (bottom left) and wrist/hand (bottom right). Standardized response means are shown for the whole cohort (observed change), those who said their problem was better (n = 112), and those who said they were able to function better (n = 79).

TABLE 6. Correlation Between Change in DASH Score and Change in Self-rated Pain, Function, and Severity of Upper-limb Problem

              Transition Scale                      Status Ratings Difference
Correlation   Change in   Change in   Change in     Change in        Change in     Change in
              Problem     Pain        Function      Problem Status   Pain Status   Function Status
Pearson       0.40        0.38        0.32          0.66             0.65          0.69
Spearman      0.43        0.37        0.39          0.60             0.62          0.63

NOTE: "Correlation" refers to the correlation with the change in DASH scores. Correlation was measured in two ways: first on an 11-point transition scale (from "much worse" to "much better") at 12 weeks and then as the difference in status ratings (pain at 12 weeks minus pain at baseline, each using a 7-point rating scale on both testing occasions). Pearson (parametric) and Spearman (ranked) correlation coefficients are shown.
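The distinction between the Pearson (parametric) and Spearman (ranked) coefficients reported in Table 6 can be sketched as follows. The data are hypothetical and are not taken from the study, and the helper names are illustrative; the Spearman coefficient is computed here as the Pearson coefficient of average ranks, which is the usual definition when ties are present.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: change in DASH score and an 11-point transition rating.
dash_change = [-30.0, -22.5, -15.0, -8.0, -2.5, 0.0, 4.0, 10.0]
transition_rating = [10, 9, 9, 7, 6, 5, 5, 2]

print(f"Pearson  r = {pearson(dash_change, transition_rating):.2f}")
print(f"Spearman r = {spearman(dash_change, transition_rating):.2f}")
```

Because the transition and status ratings are ordinal scales with few categories, ties are common, which is why the rank helper averages tied positions.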
tional indexes and the change in DASH scores were at or just below the expected level of 0.40.
The ROC curves, as shown in Table 7 and Figure 5, also demonstrated that the change scores in the DASH were more sensitive to ability to cope than to just chance alone. In ROC curves, "chance" is equivalent to the diagonal on the graph, where sensitivity equals 1 minus the specificity. Along this line the change score gives no more information about who is better and who is not than would chance alone (e.g., flipping a coin). The point farthest up and off to the left from the diagonal is often considered the most discriminating. In our graphs, a change score of at least -15 or -20 appears to be the most discriminative for the criterion of the patient's becoming able to cope; a score of -15 correctly rated 68% of the sample, and a score of -20 correctly rated 72% (Table 7). Using the self-rated scores (in which a score of more than 6 of 10 indicated improvement), as used in the statistical summary of responsiveness, much lower change scores were found to be the most discriminating (for a change of -1 or a greater negative value, accuracy was 75%, sensitivity 0.87, and specificity 0.44). This highlights the effect of the external marker in this type of analysis.

TABLE 7. Sensitivity and Specificity of Different Levels of Change

x less than:   Sensitivity   Specificity   Accuracy (%)
<-1            0.92          0.29          48
<-5            0.86          0.38          52
<-7            0.82          0.42          54
<-10           0.82          0.53          62
<-15           0.67          0.68          68
<-20           0.59          0.78          72

NOTES: The basis for comparison at different levels of change was a "yes" answer to the question, "Are you able to cope with your problem and do what you would like to do?" at follow-up after a "no" answer to the same question before treatment. x indicates the change in DASH score. Sensitivity is the probability of having a change in DASH score of x or less, given that the patient is better. Specificity is the probability of not having a change in DASH of x or less, given that the patient is not better. Accuracy is the percentage of patients whose change score of x or less correctly classifies them as better or not better. A change score of -15 or -20 was best able to discriminate between those who were better and those who were not better, according to the selected criteria (see footnote on p. 133).
It is important to remember that a negative change score on the DASH means less disability and, therefore, an improvement. Thus, a change of less than -1 means just a little bit of improvement, and a change of less than -20 means a much greater improvement.
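The idea of picking the point farthest from the chance diagonal can be illustrated with the sensitivity and specificity values in Table 7. The sketch below uses Youden's J (sensitivity + specificity − 1), which is proportional to the vertical distance of a point from the diagonal; this is a substituted summary criterion chosen for illustration and is not stated to be the authors' method. The variable and function names are hypothetical.

```python
# Rows of Table 7: (cut-off x, sensitivity, specificity).
TABLE_7 = [
    (-1, 0.92, 0.29),
    (-5, 0.86, 0.38),
    (-7, 0.82, 0.42),
    (-10, 0.82, 0.53),
    (-15, 0.67, 0.68),
    (-20, 0.59, 0.78),
]


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J: 0 on the chance diagonal, 1 for a perfect cut-off."""
    return sensitivity + specificity - 1.0


def most_discriminating_cutoff(rows):
    """Return the cut-off whose ROC point lies farthest above the diagonal."""
    return max(rows, key=lambda row: youden_j(row[1], row[2]))[0]


for cutoff, sens, spec in TABLE_7:
    print(f"change < {cutoff:>3}: J = {youden_j(sens, spec):.2f}")

print("largest J at change <", most_discriminating_cutoff(TABLE_7))
```

On these values the cut-offs of -10, -15, and -20 lie farthest from the diagonal, with -20 marginally ahead, which is broadly consistent with the -15 to -20 range identified in the text on the basis of accuracy.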
[Figure 5 plot: ROC curve with points labeled by change-score cut-off, from <-1 to <-20. x-axis: 1 - Specificity; y-axis: Sensitivity.]
FIGURE 5. Receiver operating characteristics (ROC) curve describing the ability of different amounts of change to differentiate between patients who went from not coping to coping with their disorder (an improvement) and those who did not. The "best" cut-off, in terms of accuracy, would be that shown at the upper left of the curve. Changes of less than -15 or -20 appeared most accurate on this curve. Higher change scores were not assessed.

DISCUSSION

This study has provided evidence of the construct validity, test-retest reliability and responsiveness of the DASH Outcome Measure in patients undergoing treatment for either proximal or distal disorders in the upper limb.

Reliability, Validity, and Responsiveness of the DASH

The DASH outcome measure exceeded recommended standards for test-retest reliability,25,34,41,67 for both individual- and group-level interpretation of the scores. Generally, test-retest coefficients need to exceed 0.90 or 0.95 before their interpretation on an individual level can be considered. For group-level interpretation, lower coefficients (approximately 0.75) are acceptable.34,40 Our results provided a coefficient of 0.96, which is only slightly higher than that of Turchin et al.19 (0.92 in patients with stable elbow conditions), and these are not likely to be significantly different in magnitude.
Evidence of both convergent and known-groups validity of the DASH was also found. Convergent validity was shown by demonstrating moderate to high correlations with other markers of disability and symptoms, and known-groups validity by showing differences between the DASH scores of patients who were working or functioning and the scores of those who were not. The DASH validity was comparable with previous results using this questionnaire in other populations. The findings of Hudak et al.16 and Turchin et al.19 and the results of the field-testing produced similar findings, although against a smaller number of constructs.
The DASH was also responsive to the different types of change designed into this study: specifically, change observed before and after treatment of the target conditions and change in those patients who said they were better.
Our results also highlighted two important issues that concern responsiveness. First, the size of the responsiveness statistic varied with the type of change that was being quantified. The SRMs for observed change were lower in magnitude than those for the change in patients who said they were better. This finding supports the taxonomy for responsiveness that we have presented elsewhere,55 which suggests that instruments are "responsive to" different types of change to different degrees. They do not inherently possess a trait of being "responsive." Comparisons of the responsiveness of different instruments should only be conducted when similar types of change are being tested or, ideally, when the instruments are placed in a head-to-head comparison.
Second, our results also showed, as did Wright and Young,54 that the choice of statistic will affect the description of responsiveness. For the same change in the same patients, we demonstrated a variation up to 1.33-fold in the responsiveness described by the effect size statistic compared with the SRM (0.68 and 0.91, respectively). This difference would also span the often used (or misused) guideline for what Cohen68 calls a moderate vs. a large effect. However, the difference in the numeric estimates we were comparing was attributable to the statistic chosen alone.
Responsiveness was also described by correlating changes14,61 in the DASH with changes in three attributes (pain, function, and problem), each measured in two ways: with transitional scales and with difference-in-status measures performed at baseline and at 12 weeks. Our results suggested a distinct difference between the transitional approach (correlations of 0.32 to 0.43) and the difference-in-status approach (correlations of 0.60 to 0.69). Given that the concepts (pain and function) being measured using the two approaches were the same, the differences might be attributed to the way the questions were asked (transition vs. difference in status), leading to several possible explanations. Differences in a transitional approach vs. differences in status over time could be due to recall bias69,70 or to a change in how people cognitively formulate a response when asked to describe a current state as opposed to recalling a change in that state over time.6, -74 Changes in how people calibrate or define pain, health, and quality of life over time have been described in the literature (e.g., "response shift phenomenon"75-77) and could have influenced our results.75-80 All these things could be possible reasons for the difference in the correlations between the change in DASH score and the two approaches to determining whether change had occurred (difference in serial state measures vs. the transition style of constructs). We are not suggesting which is better; arguments can be made in both directions.69,77,81-83 Like Fischer et al.,84 we have demonstrated the difference in the two approaches and, like them, we do not have evidence to suggest which is better.

The DASH in Comparison with Joint-specific Measures

The DASH had high correlations with the two joint-specific measures, the Brigham and the SPADI, a pattern that persisted when the joint-specific measure was applied in the other region (Brigham vs. DASH in shoulder patients, 0.90; SPADI vs. DASH in wrist patients, 0.92). Therefore, our hypothesis that there would be a difference in the correlations when the joint-specific measures were applied in the wrong joint was not supported.
On a cross-sectional basis, the joint-specific measures performed well in the "wrong" joint. To our knowledge, this is the first time joint-specific measures in the upper limb have been purposefully applied to the "wrong" joint. Our results support our earlier findings that items from joint-specific questionnaires could not be identified as relevant to only one joint in the extremity by a panel of experienced clinicians. In that study, experts classified over 70% of the items they were given (e.g., using a phone) as being potentially affected by impairments across the upper extremity and not just to problems in one joint; that is, using a phone might be a relevant item for patients with shoulder, elbow, or grasp problems. The results of our current study show similar results, but at the whole questionnaire level. The items in the Brigham appear to be sensitive to impairments in the shoulder, and the items in the SPADI appear to be sensitive to impairments in the wrist.
The responsiveness of the DASH, with two exceptions, was also comparable with or better than that of the joint-specific measures. The exceptions were in the group of patients with wrist or hand problems, in which the SRMs for the Brigham were slightly higher than for the DASH for two of the three comparisons (observed change, 0.02 higher for the Brigham; change in those who rated their problem as better, 0.01 higher).
Many would consider the values comparable; the difference would not be statistically significant. Nevertheless, in 16 of 18 comparisons between the DASH and the joint-specific measures, the DASH statistics were larger. The Brigham questionnaire was also responsive to changes in patients with shoulder problems (although its responsiveness was lower in magnitude than that of the DASH). The SPADI showed only moderate SRMs (0.43 to 0.64) in the patients with wrist or hand problems, whereas the SRMs of both the DASH and the Brigham were larger (SRM ≥0.74). Both the Brigham and the SPADI appeared responsive outside their region of specialty. These results suggest that the responsiveness of the DASH was equivalent to that of both joint-specific measures and better than that of either in the whole cohort, affirming its utility for patients with any single or multiple disorders in the upper limb. These findings are slightly different from those of Kirkley et al.14 or MacDermid et al.,20 who found the DASH slightly less responsive than a disease-specific measure (although a different disease-specific measure was used in each case).
The implications of our findings are two-fold. First, the DASH has potential in the role of monitoring physical function and symptoms in shoulder and wrist or hand disorders, as demonstrated by its validity, high test-retest reliability, and responsiveness to even small changes early in the recovery process. This provides a practical solution to the problems of having to use multiple measures in patients with multiple impairments and of having to have multiple upper extremity measures available in a clinic or research setting.
Second, the high levels of reliability (the Cronbach alpha was 0.97 at baseline in this study) indicate that a much shorter measure could have acceptable reliability34 with less respondent burden. The longer version will probably always retain a statistical psychometric advantage; however, the shorter might provide a more useful version for studies that entail a significant respondent burden, such as epidemiologic surveys and detailed outcome studies.
This study has demonstrated that the DASH outcome measure has good construct validity, test-retest reliability, and responsiveness to change (specifically change before and after four groups of treatments as well as change that was estimated to have occurred by the patients). This evidence has been provided for both proximal and distal disorders, which suggests that the DASH has a role as a measure of physical function and symptoms in any single or multiple disorders of the upper limb. Further work will focus on its use with other patient groups and on the interpretation of specific scores at the individual level.

Acknowledgments

The authors thank Ms. Elaine Harniman in Toronto and Ms. Joanne Nicklas in Boston, who implemented the study protocol. They also thank Dr. Robin Richards, Dr. James Mahoney, and Dr. Mike McKee for allowing them access to their patients; Dr. Louis Bessette, who contributed to the conceptualization and design of this project; and the patients who participated in the project.

REFERENCES

1. Jette AM. Physical disablement concepts for physical therapy research and practice. Phys Ther. 1994;74:380-6.
2. Cole B, Finch E, Gowland C, Mayo N. Physical Rehabilitation Outcome Measures. Toronto, Ontario: Canadian Physiotherapy Association, 1994.
3. Verbrugge LM, Jette AM. The disablement process. Soc Sci Med. 1994;38:1-14.
4. World Health Organization. International Classification of Impairment, Disability and Handicap (ICIDH-2 beta version). WHO Web site, 1999. Available at: http://www.who.int/icidh/. Last accessed Mar 20, 2001.
5. Davis AM, Beaton DE, Hudak P et al. Measuring disability of the upper extremity: a rational supporting the use of a regional outcome measure. J Hand Ther. 1999;12:269-74.
6. Brand PW, Hollister A. Clinical Mechanics of the Hand. 2nd ed. Toronto, Ontario: Mosby Year Book, 1993.
7. Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a shoulder pain and disability index. Arthritis Care Res. 1991;4:143-9.
8. Beaton DE, Richards RR. Assessing the reliability and responsiveness of five shoulder questionnaires. J Shoulder Elbow Surg. 1998;7:565-72.
9. Stock SR, Cole DC, Tugwell P, Streiner D. Review of applicability of existing functional status measures to the study of workers with musculoskeletal disorders of the neck and upper limb. Am J Ind Med. 1996;29:679-88.
10. MacDermid JC, Turgeon T, Richards RS, Beadle M, Roth JH.

Patient-rating of wrist pain and disability: a reliable and valid measurement tool. J Orthop Trauma. 1998;12:577-86.
11. MacDermid JC. Development of a scale for patient-rating of wrist pain and disability. J Hand Ther. 1996;9:178-83.
12. Levine DW, Simmons BP, Koris MJ, et al. A self-administered questionnaire for the assessment of severity of symptoms and functional status in carpal tunnel syndrome. J Bone Joint Surg. 1993;75A(11):1585-92.
13. Davis AM, Wright JG, Williams JI, Bombardier C, Griffin A, Bell RS. Development of a measure of physical function for patients with bone and soft tissue sarcoma. Qual Life Res. 1996;5:508-16.
14. Kirkley A, Griffith S, McLintock H, Ng L. The development and evaluation of a disease-specific quality of life measurement tool for shoulder instability: The Western Ontario Shoulder Instability Index (WOSI). Am J Sports Med. 1998;26:764-72.
15. McConnell S, Beaton DE, Bombardier C. The DASH Outcome Measure: A User's Manual. Toronto, Ontario: Institute for Work & Health, 1999.
16. Hudak PL, Amadio PC, Bombardier C, and The Upper Extremity Collaborative Group (UECG). Development of an upper extremity outcome measure: The DASH (Disabilities of the Arm, Shoulder and Head). Am J Ind Med. 1996;29:602-8.
17. Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol. 1999;52:105-11.
18. Verbrugge LM, Ascione FJ. Exploring the iceberg: common symptoms and how people care for them. Med Care. 1987;25(6):539-63.
19. Turchin DC, Beaton DE, Richards RR. Validity of observer-based aggregate scoring systems as descriptors of elbow pain, function, and disability. J Bone Joint Surg. 1998;80A:154-62.
20. MacDermid JC, Richards RS, Donner A, Bellamy N, Roth JH. Responsiveness of the SF-36, DASH, patient-rated wrist evaluation and physical impairments in evaluating recovery after a distal radius fracture. J Hand Surg. 2000;25A:330-40.
21. Guyatt GH, Kirshner B, Jaeschke R. Measuring health status: What are the necessary measurement properties? J Clin Epidemiol. 1992;45(12):1341-5.
22. Hays RD, Anderson R, Revicki D. Psychometric consideration in evaluating health-related quality of life measures. Qual Life Res. 1993;2:441-9.
23. DeVellis RF. A consumer's guide to finding, evaluating, and reporting on measurement instruments. Arthritis Care Res. 1996;9:239-45.
24. Guyatt GH, Naylor CD, Juniper E, Heyland DK, Jaeschke R, Cook DJ. Users' guides to the medical literature: XII. How to use articles about health-related quality of life. JAMA. 1997;277:1232-7.
25. Lohr KN, Aaronson NK, Alonso J, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther. 1996;18:979-92.
26. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. St. Louis, Mo.: Mosby, 1994:1-260.
27. Fleiss JL. The Design and Analysis of Clinical Experiments. New York: Wiley, 1986:1-432.
28. Polanyi M, Cole DC, Beaton DE, et al. Upper limb work-related musculoskeletal disorders among newspaper employees: cross-sectional survey results. Am J Ind Med. 1997;32:620-8.
29. Beaton DE, Cole DC, Bombardier C. Estimating the burden of WMSD: Does case definition make a difference? Toronto, Ontario: Institute for Work & Health, 1999. Working paper no. 60.
30. Guyatt GH, Walter SD, Norman GR. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40(2):171-8.
31. Rosner B. Fundamentals of Biostatistics. 3rd Ed. Boston, Mass.: PWS-Kent, 1990.
32. Ware JE Jr, Kosinski M, Keller S. SF-36 Physical and Mental Health Summary Scales: A User's Manual. Boston, Mass.: The Health Institute, New England Medical Center, 1994:1:1-C:8.
33. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79-93.
34. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York: McGraw-Hill, 1994.
35. Bindman AB, Keane D, Lurie N. Measuring health changes among severely ill patients. Med Care. 1990;28:1142-52.
36. Heald SJ, Riddle DL, Lamb RL. The Shoulder Pain and Disability Index: the construct validity and responsiveness of a region-specific disability measure. Phys Ther. 1997;77:1079-89.
37. William JW, Holleman DR, Simel DL. Measuring shoulder function with the Shoulder Pain and Disability Index. J Rheumatol. 1995;22:727-32.
38. Katz JN, Fossel KK, Simmons BP, Swartz RA, Fossel AH, Koris MJ. Symptoms, functional status, and neuromuscular impairment following carpal tunnel release. J Hand Surg. 1995;20A:549-55.
39. Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Cont Clin Trials. 1991;12:142S-58S.
40. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420-8.
41. McHorney CA, Tarlov AR. Individual patient monitoring in clinical practice: Are available health status surveys adequate? Qual Life Res. 1995;4:293.
42. Christensen L, Mendoza JL. A method of assessing change in a single subject: an alteration of the RC index. Behav Ther. 1986;17:305-8.
43. Jacobson NS, Follette WC, Revenstorf D. Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance. Behav Ther. 1984;15:336-52.
44. Stratford PW, Binkley J, Soloman P, Finch E, Gill C, Moreland J. Defining the minimum level of detectable change for the Roland-Morris questionnaire. Phys Ther. 1996;76:359-68.
45. Stratford PW, Finch E, Solomon P, Binkley J, Gill C, Moreland J. Using the Roland-Morris questionnaire to make decisions about individual patients. Physiother Can. 1996;48:107-10.
46. Wyrwich KW, Nienaber MA, Tierney WM, Wolinsky FD. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999;37:469-78.
47. Ravaud P, Giraudeau B, Auleley GR, Edouard-Noel R, Dougados M, Chastang C. Assessing smallest detectable change over time in continuous structural outcome measures: application to radiological change in knee osteoarthritis. J Clin Epidemiol. 1999;52:1225-30.
48. Stratford PW, Riddle DL, Binkley JM, Spadoni G, Westaway MD, Padfield B. Using the neck disability index to make decisions concerning individual patients. Physiother Can. Spring 1999:107-12.
49. Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. 1991;59:12-9.
50. Speer DC. Clinically significant change: Jacobson and Truax (1991) revisited. J Consult Clin Psychol. 1992;60:402-8.
51. Jacobson NS, Roberts LJ, Berns SB, McGlinchey JB. Methods for defining and determining the clinical significance of treatment effects: description, application, and alternatives. J Consult Clin Psychol. 1999;67:300-7.
52. Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting standard error of measurement based criterion for identifying meaningful intra-individual change in health-related quality of life. J Clin Epidemiol. 1999;52:861-73.
53. De Bruin AF, Diederiks JPM, De Witte LP, Stevens FCJ, Philipsen H. Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol. 1997;50:529-40.

54. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1998;50:239-46.
55. Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy of responsiveness. J Clin Epidemiol. 2001; in press.
56. Wright JG. The minimal important difference: Who's to say what is important? J Clin Epidemiol. 1996;49:1221-2.
57. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 suppl):S178-89.
58. Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments. Med Care. 1992;30(10):917-25.
59. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care. 1990;28(7):632-42.
60. Katz JN, Gelberman RH, Wright EA, Lew RA, Liang MH. Responsiveness of self-reported and objective measures of disease severity in carpal tunnel syndrome. Med Care. 1994;32(11):1127-33.
61. Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis. 1986;39(11):897-906.
62. Stratford PW, Binkley JM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther. 1996;76:1109-23.
63. Beaton DE, Tarasuk V, Katz JN, Wright JG, Bombardier C. Are you better? A qualitative study of the meaning of recovery. Arthritis Care Res. 2001; in press.
64. Redelmeier DA, Lorig K. Assessing the clinical importance of symptomatic improvements: an illustration in rheumatology. Arch Intern Med. 1993;153(11):1337-42.
65. Redelmeier DA, Guyatt GH, Goldstein RS. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol. 1996;49:1215-9.
66. Fletcher RH, Fletcher SW, Wagner EH. Clinical Epidemiology: The Essentials. 2nd Ed. Baltimore, Md.: Williams & Wilkins, 1988:1-246.
67. Staquet M, Hays RD, Fayers PM. Quality of Life Assessments in Clinical Trials: Methods and Practice. New York: Oxford University Press, 1998.
68. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd Ed. Mahwah, NJ: Lawrence Erlbaum, 1988.
69. Herrmann D. Reporting current, past and changed health status: what we know about distortion. Med Care. 1995;33:AS89-94.
70. Mancuso CA, Charlson ME. Does recollection error threaten the validity of cross-sectional studies of effectiveness? Med Care. 1995;33:AS77-88.
71. Sprangers MAG, Hoogstraten J. Pretesting effects in retrospective pretest-posttest designs. J Appl Psychol. 1989;74:265-72.
72. Gibbons FX. Social comparison as a mediator of response shift. Soc Sci Med. 1999;48:1517-30.
73. Ross M. Relation of implicit theories to the construction of personal histories. Psychol Rev. 1989;96:341-57.
74. Allison PJ, Locker D, Feine JS. Quality of life: a dynamic construct. Soc Sci Med. 1997;45:221-30.
75. Sprangers MAG, Schwartz CE. Integrating response shift into health-related quality of life research: a theoretical model. Soc Sci Med. 1999;48:1507-15.
76. Schwartz CE, Sprangers MAG. Introduction to symposium on the challenge of response shift in social science and medicine. Soc Sci Med. 1999;48:1505-6.
77. Armenakis AA. A review of research on the change typology. Res Org Change Devel. 1988;2:163-94.
78. Kind P, Dolan P. The effect of past and present illness experience on the valuations of health states. Med Care. 1999;33:AS255-63.
79. Shaul MP. From early twinges to mastery: the process of adjustment in living with rheumatoid arthritis. Arthritis Care Res. 1995;8:290-7.
80. Reid J, Ewan C, Lowy E. Pilgrimage of pain: the illness experiences of women with repetition strain injury and the search for credibility. Soc Sci Med. 1991;32(5):601-12.
81. Armenakis AA, Zmund R. Interpreting the measurement of change in organizational research. Personnel Psychol. 1979;32:709-23.
82. Howard GS, Dailey PR. Response shift bias: a source of contamination of self-report measures. J Appl Psychol. 1979;64:144-50.
83. Redelmeier DA, Kahneman D. Patients' memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures. Pain. 1996;66:3-8.
84. Fischer D, Stewart AL, Bloch DA, Lorig K, Laurent D, Holman H. Capturing the patient's view of change as a clinical outcome measure. JAMA. 1999;282:1157.
Appendix
THE DASH
The complete text of the Disabilities of the Arm, Shoulder and Hand questionnaire (© 1997) appears on the following pages. It includes 30 items and two optional modules: the high-performance/sports module and the work module. The DASH was developed jointly by the Institute for Work & Health (Toronto, Ontario, Canada) and the American Academy of Orthopaedic Surgeons (Rosemont, Illinois), who both hold the copyright, and it is reprinted here with their permission.


JOURNAL OF HAND THERAPY, April-June 2001


