Rosas Kane, 2012, Quality and Rigor of Concept Mapping Article

Evaluation and Program Planning 35 (2012) 236–245
Quality and rigor of the concept mapping methodology: A pooled study analysis
Scott R. Rosas *, Mary Kane
Concept Systems, Inc., 136 East State Street, Ithaca, NY 14850, United States
ARTICLE INFO

Article history:
Received 8 June 2011
Received in revised form 28 September 2011
Accepted 5 October 2011
Available online 12 October 2011

Keywords:
Concept mapping
Pooled analysis
Quality
Benchmarking
Validity
Reliability

ABSTRACT

The use of concept mapping in research and evaluation has expanded dramatically over the past 20 years. Researchers in academic, organizational, and community-based settings have applied concept mapping successfully without the benefit of systematic analyses across studies to identify the features of a methodologically sound study. Quantitative characteristics and estimates of quality and rigor that may guide future studies are lacking. To address this gap, we conducted a pooled analysis of 69 concept mapping studies to describe characteristics across study phases, generate specific indicators of validity and reliability, and examine the relationship between select study characteristics and quality indicators. Individual study characteristics and estimates were pooled and quantitatively summarized, describing the distribution, variation and parameters for each. In addition, variation in the concept mapping data collection in relation to characteristics and estimates was examined. Overall, results suggest concept mapping yields strong internal representational validity and very strong sorting and rating reliability estimates. Validity and reliability were consistently high despite variation in participation and task completion percentages across data collection modes. The implications of these findings as a practical reference to assess the quality and rigor of future concept mapping studies are discussed.
© 2011 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +1 607 272 1206.
E-mail addresses: srosas@conceptsystems.com, scott.rosas@cortland.edu (S.R. Rosas), mkane@conceptsystems.com (M. Kane).
0149-7189/$ – see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.evalprogplan.2011.10.003

1. Introduction

More than 20 years ago, Trochim and colleagues published a series of papers on concept mapping in a special issue of Evaluation and Program Planning (Trochim, 1989a). In this seminal work, the theoretical and practical features of concept mapping were outlined, making a case for its utility in planning, evaluation, and research. Since then, concept mapping has been applied in a number of fields and contexts, including public and community health (Rao et al., 2005; Risisky et al., 2008; Trochim, Cabrera, Milstein, Gallagher, & Leischow, 2006; Trochim, Milstein, Wood, Jackson, & Pressler, 2004), social work (Petrucci & Quinlan, 2007; Ridings et al., 2008), health care (Trochim & Kane, 2005), human services (Pammer et al., 2001; Paulson & Worth, 2002), and biomedical research and evaluation (Kagan, Kane, Quinlan, Rosas, & Trochim, 2009; Robinson & Trochim, 2007; Trochim, Marcus, Masse, Moser, & Weld, 2008). The publication of the book Concept Mapping for Planning and Evaluation (Kane & Trochim, 2007) provided concept mapping practitioners with a comprehensive methodological resource.

Over the course of two decades, concept mapping has demonstrated value in addressing a variety of practical and theoretical questions. As concept mapping has gained in popularity, so too has the need to define and examine the characteristics of the method’s methodological quality. No published research exists that has systematically assessed the degree to which concept mapping produces valid and reliable results across an array of different studies. The absence of such information limits researchers’ ability to articulate, assess and improve the methodological quality of concept mapping studies. To address this need, we accessed a large sample of concept mapping studies to: (a) quantitatively describe study characteristics across different phases of the process; (b) quantitatively describe specific indicators of validity and reliability; and (c) examine the relationship between select study characteristics and quality indicators. As a context for this study, we provide a succinct overview of concept mapping, followed by a rationale for examining the quality of concept mapping as a mixed-method approach. Finally, we briefly outline an explanation of validity and reliability, as they pertain to concept mapping.

1.1. Concept mapping overview

Concept mapping is a type of structured conceptualization method designed to organize and represent ideas from an identified group. A participatory mixed-methods approach, concept mapping integrates qualitative individual and group processes with multivariate statistical analyses to help a group of individuals describe ideas on any topic of interest and represent
these ideas visually through a series of related two-dimensional maps (Kane & Trochim, 2007; Trochim, 1989a). Concept mapping is used frequently in evaluation as a practical means of addressing stakeholder participation in ways that enhance the relevance, ownership and utilization of evaluation (Cousins & Whitmore, 1998). The multi-phase concept mapping process typically requires participants to first brainstorm a large set of statements relevant to the topic of interest. Second, each participant sorts these statements into piles based on perceived similarity, and rates each statement on one or more scales. Third, multivariate analyses are conducted that include two-dimensional multidimensional scaling (MDS) of the unstructured sort data, a hierarchical cluster analysis of the MDS coordinates, and the computation of average ratings for each statement and cluster of statements. The maps that result show the individual statements in two-dimensional (x, y) space with more similar statements located nearer each other, and show how the statements are grouped into clusters that partition the space on the map. Finally, the group interprets the maps that result from the analyses through a structured interpretation process designed to help them understand the maps and label them in a substantively meaningful way. The quantitative maps reveal how a group discerns the interrelationships between and among items and assigns values to ideas and concepts, thus constructing a basis for further discussion, interpretation, and action. We refer readers to Kane and Trochim (2007) for a more detailed description of the entire concept mapping process.

1.2. Quality in mixed-method applications

Consistent with the arguments for combining qualitative and quantitative methods in a single study (Creswell & Plano Clark, 2007; Sale, Lohfeld, & Brazil, 2002; Tashakkori & Teddlie, 1998), concept mapping blends the two in a complementary and additive manner. Rather than data remaining distinct but connected as with some mixed-method applications, concept mapping integrates data at multiple points of the process. Qualitative and quantitative methods are combined in ways that challenge the distinction between the two, and suggest that they may be more deeply intertwined (Kane & Trochim, 2007). Given the presence of several design typologies that emphasize a range of sequencing and mixing decisions (e.g. Creswell & Plano Clark, 2007; Tashakkori & Teddlie, 1998), addressing the quality of concept mapping is pertinent. However, the absence of a comprehensive set of criteria for critically appraising mixed-method studies (Tashakkori & Teddlie, 1998; Sale & Brazil, 2004) and the conceptual variation of mixed-method quality among evaluators and researchers (Caracelli & Riggin, 1994) further complicate how concept mapping quality should be operationalized. Although generic criteria have been used to assess the quality of mixed-method studies, the need for more specific evaluation criteria, depending on the design and approach, is warranted (Sale & Brazil, 2004). This perspective supports the need to address the methodological quality of concept mapping in ways unique to the approach.

1.3. Validity in concept mapping

The traditional notions of external and internal validity are challenging to operationalize for concept mapping, and are frequently overlooked. As defined by Cook and Campbell (1979), validity is the best available approximation of truth or falsity, both externally and internally, of a given inference, proposition, or conclusion. Because validity can be operationalized differently, we posit that external representational validity and internal representational validity may be analogues for concept mapping. External representational validity is concerned with the extent to which a conceptualized model mirrors the reality it is purported to represent. Analytical strategies for concept mapping to assess the degree to which the conceptual model is recognized as the modal representation for a group have been suggested (Cacy, 1996). These techniques, however, are exploratory and not yet practical. Typically, the assessment of external representational validity is managed as a function of each concept mapping study by seeking verification that the brainstormed statement set represents the topic under inquiry, using multiple data collection and analysis methods, and including independent participants with diverse perspectives. Because uniform data relevant to external validity are not routinely available for individual studies to include in a pooled analysis, external representational validity is not considered in this study.

Internal representational validity, however, is particularly germane to this study. Internal representational validity refers to the degree to which the conceptualized model reflects the judgments made by participants in organizing information to produce the model. In that sense, the question of whether the conceptualized model reveals the same distinctions among groupings made by the average participant is of particular importance. A case has been made that the analytic approach that anchors concept mapping results represents the best fit of the various cognitive structures of participants (Forgas, 1979). Questions have been raised, however, as to whether the final model may obscure some of the finer details, due perhaps to variations in how participants approach the structuring task (Keith, 1989). Thus, determining the overall match between the participant-structured input and the mathematically generated output is central to assessing internal representational validity.

Several data elements common to all concept mapping studies can be used to evaluate the correspondence of the represented model to the original participant structures. First, early work by Dumont (1989) and Trochim (1989b) suggests that the degree of configural similarity between input and output matrices can be measured by computing a Pearson’s Product–Moment correlation coefficient. A second measure, the stress value, is a goodness-of-fit indicator between a given set of dissimilarities as input and the resultant distances in a configuration (Kruskal & Wish, 1978). Finally, the individual sorting input (i.e. the number of sorted piles) can be examined relative to the number of clusters, to understand the relationship between the groupings from each participant and the final partitioning of the content represented in the map. Collectively, these measures computed from data routinely produced for each concept mapping study can be used to estimate the internal representational validity of the conceptualized model.

1.4. Reliability in concept mapping

For concept mapping, the consistency of participant input can be assessed using the sorting and rating data. Reliability of participant ratings on a chosen scale for each of the final statements can be assessed by computing conventional item and rater reliability estimates. The reliability of participant input of the perceived relationships between statements can be assessed by computing a set of estimates that are specific to concept mapping sort data. As suggested by Trochim (1993), the traditional theory of reliability, as typically applied in social science research, does not readily conform to sort data in the concept mapping model. Conventional means for assessing reliability focus on estimating the repeatability of test items or total scores, based on some known or assumed correct response. Sort data in concept mapping is different. Instead of estimating the reliability of items or overall scores of a measure, sorting reliability assessment is more appropriately focused on determining the extent to which the structural arrangements, both individually and collectively, reflect an assumed normatively typical arrangement. Thus, the individual
and aggregated sort configurations (similarity matrices used as input by multidimensional scaling), as well as the resulting distance matrices (the between-item Euclidean distances output generated by multidimensional scaling), provide information to calculate reliability estimates. We refer to Trochim’s (1993) recommended procedures for calculating a set of reliability statistics specific to concept mapping input and output to estimate consistency of sorting within and across specific studies.

2. Method

2.1. Overall study approach

The standardized procedures for data collection, data organization, analysis, and representation in the concept mapping process yield a set of common quantitative data, which can be configured in a comparable statistical form. This uniformity allows for the same quality constructs and relationships to be examined, and thus produces results that are more objective and exact than a narrative review. Our first step in conducting this pooled study analysis was to generate quantitative characteristics and estimates of the common data elements for each concept mapping study in a sample of studies meeting specific inclusion criteria. These characteristics and estimates were then aggregated across the sample of studies, describing the distribution, variation, and parameters for each characteristic and set of estimates. Finally, we conducted quantitative analyses to examine further the relationships between project characteristics.

2.2. Sample

The sample for this quantitative pooled study analysis was sixty-nine (69) individual concept mapping studies conducted within the past 10 years. This set is part of an archive of completed concept mapping studies conducted over that time period by Concept Systems, Inc., the sole proprietor and licensor of the Concept System®.¹ The criteria for sample selection were general and meant to include a wide range of studies. For inclusion in this pooled study analysis, each concept mapping study needed to have a final computed map (i.e. each had to have a final multidimensional scaling analysis result and cluster solution generated), and each study needed to have at least one rating completed by participants.

¹ Details and availability of the technology can be found at http://www.conceptsystems.com.

2.3. Procedure

Once the list of acceptable studies was generated, we extracted data from each specific concept mapping study database. Descriptive characteristics for each study included: data on the number of participants (overall and by concept mapping task), completion rates by task, and the number of statements resulting from brainstorming. Categorical information for each study included: the general field of study, related organizational or institutional support, the general purpose of the study, and the primary data collection mode. For each study, we obtained data for assessing reliability and validity as they pertain to concept mapping and computed estimates.

After generating each concept mapping study’s specific characteristics and estimates, the results were configured, aggregated, and analyzed for the entire sample of studies. Measures of dispersion, central tendency, and interval estimates were computed for each study element for the pooled studies in the sample. We further analyzed these estimates to examine patterns and relationships between different concept mapping study characteristics. Finally, we computed several correlation analyses between concept mapping study characteristics that were expected to be related. Specifically, configural similarity and stress values were correlated with other concept mapping data, including the number of statements, the final number of clusters, and the sort reliability estimates.

3. Results

A majority of the concept mapping studies in the sample were classified as public health oriented (59.4%). Others were in the fields of human services (20.3%), biomedical research (5.8%), social science research (2.9%), and business or human resources (2.9%). Twenty-eight studies were supported through Federal sources (41.5%), with several receiving support from foundations or not-for-profit organizations (20.3%), universities or colleges (17.4%), and state government sources (11.6%). The stated purpose of each study varied considerably, confirming the broad use of concept mapping. Twenty-eight (40.6%) were used for strategic planning purposes, defined as an initiative of an organization to establish a specific strategy or direction and make decisions regarding resources to pursue this strategy. Twelve (17.4%) were conducted for the purpose of developing an action or research agenda, defined as a collaborative effort to outline a strategic direction in a field that extends across organization boundaries. Project purposes also included: evaluation (14.5%), primarily for framework and design development; research (7.2%), primarily for conceptualization and theory development; needs assessment (8.7%); and program or intervention development (8.7%).

Three main types of data collection modes were identified in the sample. Face-to-face or traditional means, whereby the researcher interacts directly with the participants for brainstorming, sorting, and rating tasks, accounted for 14.5% of the studies. Web-based means, whereby information is gathered exclusively through the use of the Internet without direct interaction of the researchers with participants, accounted for 34.8% of the studies. Multi-method means, whereby information is collected through a variety of means including paper forms, face-to-face interaction, and web-based platforms, accounted for 50.7% of the studies.

All pooled analysis results for the sample of concept mapping studies by study characteristic, described in subsequent sections, are presented in Table 1.

3.1. Participants

The total number of participants is the unduplicated count of individuals who provided input during the brainstorming, sorting, or rating tasks. The average total number of participants was 155.78 with a range of 20–649 across studies. The total number of participants for each study varied depending upon the concept mapping data collection mode, and a significant group effect was detected, F(2, 66) = 13.25, p < .001. The average total number of study participants for the web-based data collection mode was significantly larger (M = 243.42, SD = 172.14) than both the face-to-face (M = 62.10, SD = 49.14), p < .001 and multi-method (M = 122.46, SD = 45.76) modes, p < .001. For specific concept mapping tasks, numbers are captured for both sorting and rating participation. Overall, the number of sorters averaged 24.62 (SD = 15.29), well over the recommended number of 15 (Jackson & Trochim, 2002), and nearly 1.7 times (M = 14.62) larger than what was found in Trochim (1993). The smallest study in the sample had 6 sorters and the largest had 90. For the ratings task, the number of participants on the first rating (rating 1) averaged 81.77 (SD = 69.83) and 65.82 (SD = 43.32) on the second (rating 2). The second rating typically has fewer participants, due to attrition,
Table 1
Participants and completion percentages by concept mapping data collection mode.

                      Average number of participants   Average percent completing task
Data collection mode  Sorting    Rating 1   Rating 2   Sorting    Rating 1   Rating 2
Face-to-face          25.7       44.7       33.6       74.1       80.3       72.8
Web-based             27.9       112.8      75.6       52.4       68.7       48.0
Multi-method          22.1       71.1       63.9       43.7       61.1       54.0
Overall               24.6       81.8       65.8       50.1       65.9       51.6
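
As a rough arithmetic check on the mode-level averages in the table above, the drop in the average number of raters between the first and second rating can be tabulated for each data collection mode. This is only an illustrative sketch: the dictionary and variable names are hypothetical, and the inputs are the rounded values printed in the table rather than study-level data.

```python
# Average number of raters (rating 1, rating 2) by data collection mode,
# copied from the rounded values in the table above.
avg_raters = {
    "Face-to-face": (44.7, 33.6),
    "Web-based": (112.8, 75.6),
    "Multi-method": (71.1, 63.9),
}

# Absolute drop in average participation from rating 1 to rating 2.
drops = {mode: r1 - r2 for mode, (r1, r2) in avg_raters.items()}

# Mode with the largest drop (hypothetical helper variable).
largest_drop_mode = max(drops, key=drops.get)

for mode, drop in drops.items():
    print(f"{mode}: {drop:.1f}")
print("Largest drop:", largest_drop_mode)
```

On these rounded averages the web-based mode shows the largest drop (37.2, against 11.1 for face-to-face and 7.2 for multi-method).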
level of participant knowledge, or fatigue by those completing the first rating. The average number of participants for the first rating is nearly 5.8 times (M = 13.94) what was found in Trochim’s (1993) study.

Differences in the number of participants by the data collection mode were observed in this sample and are displayed in Table 2. No meaningful difference was found in the average number of sorters by data collection mode. However, a significant group effect was observed, F(2, 66) = 4.62, p < .05, for rating 1, with significant mean differences in the number of participants between the web-based mode (M = 164.04, SD = 116.70) and both face-to-face (M = 116.31, SD = 43.17), p < .01 and multi-method (M = 55.70, SD = 45.31) modes, p < .05. Interestingly, no group differences were detected in the average number of participants for rating 2 by the data collection mode. On close inspection, the largest decrease in the average number of participants between ratings 1 and 2 was observed for the web-based mode. These findings suggest that the use of the web for concept mapping facilitates greater participation with respect to the rating task. However, the level of attrition is greater from completion of rating 1 to rating 2, when the web is used exclusively for data collection.

Despite the availability of the Internet to increase access to the concept mapping process, sorting remains a fairly intensive activity, intended to capture participant judgments about the relationships between all items in a typically large set. Sorting participation was observed to be fairly consistent across the various data collection modes, and may be perceived to be limited more by the demanding nature of the task, rather than the means of participation.

3.2. Completion rates for sorting and rating

As shown in Table 2, the average percent completion for sorting and rating tasks indicates that, overall, more than half of those who initially agreed to complete the task did so in a manner that produced usable data. The percent completion for sorting differed by data collection mode and a significant group effect was detected, χ²(2) = 111.36, p < .001. As expected, the average percent completion for sorting was highest when done face-to-face (M = 74.06, SD = 19.93) compared to web-based (M = 52.38, SD = 23.88), z = 7.17, p < .001 and multi-method (M = 43.69, SD = 19.86) modes, z = 10.28, p < .001. The average percent completion for web-based was also significantly higher than the multi-method mode, z = 4.70, p < .001. The average percent completion of rating tasks followed a similar pattern with a significant group effect found for rating 1, χ²(2) = 107.50, p < .001, and rating 2, χ²(2) = 66.75, p < .001. As with sorting, the average percent of completion for rating 1 was significantly higher for
Table 2
Concept mapping study characteristics and estimates.

                                                                                    95% CI for mean
Common study elements            M        SE      SD       Mdn      Min     Max     Lower    Upper
Number of statements             96.32    2.07    17.23    98.00    45      132     92.18    100.46
Number of sorters                24.62    1.84    15.30    20.00    6       90      20.95    28.30
Number of raters 1               81.77    8.04    69.83    62.00    18      485     64.99    98.54
Number of raters 2               65.82    5.84    43.32    57.00    5       247     54.11    77.53
Total number of participants     155.78   15.21   126.34   118.00   20      649     125.43   186.13
Percent completing sorting       50.07    2.84    23.59    56.86    10.58   100     48.39    51.75
Percent completing rating 1      65.87    2.43    20.24    70.27    12.79   100     64.87    66.87
Percent completing rating 2      51.64    2.83    20.84    56.00    10.50   100     50.47    52.81
Stress value                     .28      .00     .04      .29      .17     .34     .27      .29
r (configural similarity)^a      .66      .01     .07      .66      .53     .83     .64      .68
r²                               .44      .01     .09      .43      .28     .68     .42      .46
Stress value split-half 1        .30      .00     .03      .30      .20     .36     .29      .31
Stress value split-half 2        .30      .00     .04      .31      .19     .35     .29      .31
rII                              .87      .01     .06      .88      .69     .96     .85      .88
rIT                              .96      .00     .02      .96      .90     .99     .95      .96
rIM                              .91      .00     .04      .92      .80     .98     .90      .92
rSHT                             .86      .01     .07      .87      .65     .97     .85      .88
rSHM                             .63      .02     .17      .61      .26     .95     .59      .67
α for rating 1                   .97      .00     .02      .97      .91     .99     .96      .97
α for rating 2                   .97      .00     .02      .97      .91     .99     .96      .97
AICC for rating 1                .89      .01     .07      .92      .69     .99     .88      .91
AICC for rating 2                .87      .01     .10      .90      .42     .97     .84      .90
Number of map clusters           8.93     .19     1.55     9        6       14      8.56     9.30
Average statements per cluster   11.10    .31     2.58     11.11    5.63    20.67   10.43    11.67
Statements in largest cluster    18.64    .59     4.94     18.00    9       32      17.45    19.82
Statements in smallest cluster   5.49     .23     1.94     5.00     1       10      5.03     5.96
Average number of sorted piles   10.93    .23     1.88     10.90    6.55    15.76   10.47    11.38
Median number of sorted piles    9.93     .27     2.22     10.00    6.00    16.00   9.40     10.46
Largest pile of statements       23.25    1.09    9.04     21.00    11      61      21.0     25.42
Smallest pile of statements      4.68     .15     1.28     5.00     2       8       4.37     4.99

^a Absolute values are reported for this characteristic.
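
To show how the M, SE, SD, and interval columns of such a table can relate to one another, the sketch below recomputes the standard error and a 95% confidence interval for the mean number of statements from the tabled mean and standard deviation and the sample size of 69. The article does not state how its intervals were computed, so the t-based formula is an assumption, as are the function name `mean_ci` and the hard-coded critical value (approximately the two-sided 95% value of Student's t with 68 degrees of freedom).

```python
import math


def mean_ci(mean, sd, n, t_crit):
    """Return (standard error, lower bound, upper bound) for a t-based CI."""
    se = sd / math.sqrt(n)
    half_width = t_crit * se
    return se, mean - half_width, mean + half_width


# Assumption: two-sided 95% critical value of Student's t with n - 1 = 68 df.
T_CRIT_DF68 = 1.9955

# Number of statements: M = 96.32, SD = 17.23, n = 69 studies.
se, lower, upper = mean_ci(96.32, 17.23, 69, T_CRIT_DF68)
print(round(se, 2), round(lower, 2), round(upper, 2))  # 2.07 92.18 100.46
```

Under these assumptions the result agrees with the tabled values for this row (SE = 2.07, 95% CI 92.18 to 100.46). Agreement for other rows is not guaranteed by this sketch.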
face-to-face (M = 80.25, SD = 21.61) than web-based (M = 68.73, SD = 24.61), z = 5.51, p < .001 and multi-method (M = 61.13, SD = 14.17) modes, z = 8.74, p < .001. Similarly, the average percent of completion for rating 2 was significantly higher for face-to-face (M = 72.72, SD = 30.02) than web-based (M = 48.02, SD = 24.34), z = 7.21, p < .001 and multi-method (M = 53.95, SD = 12.58) modes, z = 5.48, p < .001. Interestingly, in both cases the average percent completion for web-based ratings was significantly higher than the multi-method mode (z = 7.09, p < .001 for rating 1 and z = 4.86, p < .001 for rating 2). It is not surprising that the face-to-face mode of data collection yielded higher completion percentages, as the researcher manages the process directly. It may also be that the variety of forms and options to manage in the multi-method mode contributed to the consistently lowest percent completion. Nevertheless, the percent completion for data collected through the web is well above those found in other on-line activities, such as internet-based surveys where completion rates of 20% to 30% are common (Cook, Heath, & Thompson, 2000; Kaplowitz, Hadlock, & Levine, 2004).

3.3. Statements

The sample of studies averaged 96.32 statements (SD = 17.23) with a range of 45 to 132. This average represents approximately a 20% increase over the average initially found by Trochim (1993). No difference was found in the number of statements by data collection mode. Kane and Trochim (2007) report that with the availability of the web-based platform, concept mapping studies commonly yield brainstormed statement sets well over 100 items. Guidance on selection of an appropriate statement set size has emphasized the need to consider participant burden, at the same time working to ensure saturation of the topic (Kane & Trochim, 2007; Trochim, 1989a). These authors recommend a structured process for synthesizing and reducing the set to a manageable size that minimizes burden and maximizes breadth. Thus, despite the propensity for statement sets to be very large when collected via the Internet, size consistency in the final statement set used for sorting and rating was evident.

3.4. Sorting and clusters

The sorting task requests each participant to arrange the set of statements into piles or groups, based on participant-perceived similarity. It is an unstructured sort procedure; there is no pre-determined number of piles into which the participants are expected to sort the statements. Procedurally, concept mapping participants receive a set of instructions and minimum expectations to guide the sorting task. Each participant is directed to sort the statements into an arrangement that makes sense to her or him (Kane & Trochim, 2007). The mean and median number of piles for each concept mapping study was identified and then summarized for the entire sample. The mean average number of piles was 10.93 (SD = 1.88) and the average median number of piles was 9.93 (SD = 2.22). No difference was found in the number of individual participant sorted piles by data collection mode.

The number of clusters for each concept map is selected through a combination of statistical analysis, expert judgment, and participant feedback. There is no single correct number of clusters or a set of mathematical decision criteria for determining the final cluster solution (Kane & Trochim, 2007). The average number of clusters selected for the final concept map in the sample was 8.93 (SD = 1.55) with a range of 6–14. Again, no difference was found in the number of clusters selected for the final concept map in relation to the data collection mode. The distribution of statements across the final number of clusters for each study was also calculated. For each study, an average distribution was computed by dividing the number of statements by the final number of clusters found in the map. The mean average number of statements per cluster was 11.10 (SD = 2.58) for the sample. The number of statements in the largest and smallest clusters for each map was identified for each study, and was averaged to identify the upper and lower levels across the sample. The average number of statements in the largest cluster for each of the studies was 18.64 (SD = 4.94, range: 9–32) and the average number of statements in the smallest cluster for each study was 5.49 (SD = 1.94, range: 1–10), suggesting considerable variation in cluster density across the sample.

Because sorting reflects the judgments made by each participant about the relationships among the statements in a set, and the number of clusters selected reflects the aggregated relationship representation, it is useful to determine the degree of correspondence. A significant Pearson’s Product–Moment correlation of r = .43, p < .001 was found, indicating a moderate relationship between the median number of piles and the final number of clusters in which the map is partitioned. Fig. 1 represents a bivariate plot of the relationship between the mean number of piles and the final number of clusters for each study. This portrays the correspondence between the structural arrangement of the sort data from each participant, on average, and the final partitioning of the multidimensional scaling analysis structure across the sample. Collectively these findings suggest a positive relationship and conceptual consistency between the aggregated groupings from participants and the final groupings found in the map, despite the multiple ways the final number of clusters may be selected.

Fig. 1. Plot of average piles by final number of clusters for 69 concept mapping studies. [x-axis: Final Number of Clusters; y-axis: Average Number of Sorted Piles]

3.5. Stress, fit, and similarity

Stress is a statistic routinely generated and reported in multidimensional scaling (MDS) analyses, reflecting the goodness of fit of the final representation with the original similarity matrix used as input. Stress is the normalized residual variance for a perfect relationship of a monotone regression of distance upon dissimilarity or similarity (Kruskal, 1964). Thus, for any given configuration the stress indicates how well that configuration matches the data.

The average stress value for the sample was .28 (SD = .04, range: .17–.34, 95% CI [.27, .29]). The literature on multidimensional scaling suggests lower stress values are preferred and reflect better congruence between the raw data and the processed data (Davison, 1983; Kruskal, 1964). Stress values found in concept mapping analyses are typically higher than those recommended in
the literature on MDS. Several reasons for the discrepancy have been presented by Trochim (1993) and Kane and Trochim (2007). Comparatively, the stress values across the sample were very similar to those found by Trochim (1993). In fact, nearly the entire set of concept mapping studies in this sample (96%) had a stress value that fell within the 95% CI [.21, .37] originally estimated by Trochim (1993) and reported in Kane and Trochim (2007). Hence, in two pooled analyses using independent samples, nearly identical patterns of stress were observed. It should be noted that a group effect was found, F(2, 66) = 3.62, p < .05, when examining the average stress values by data collection mode. However, the mean difference between the web-based (M = .27, SD = .04) and multi-method (M = .29, SD = .03) modes, p < .05, was very small, and no difference was seen between face-to-face and web-based sorting.

Fig. 2. Plot of stress values by coefficient of determination values for 69 concept mapping studies (x-axis: r² values; y-axis: stress values).

In terms of judging the acceptability of the stress values found across studies in this sample, a previous simulation study by Sturrock and Rocha (2000) can serve as a guide. Based on the distributions of over a half a million randomly created and scaled matrices, these authors found that for two-dimensional MDS
solutions where 100 objects have been scaled, there is a 1% chance the arrangement of the objects in the matrix is random if the stress value is below an upper limit of .39. Thus, multidimensional maps with a stress statistic below this threshold have less than a 1% probability of having either no structure or a random configuration. Since none of the studies produced a stress value above .39 (even those with over 100 objects in the input matrix), it is likely that none of the two-dimensional configurations included in this study were random or without structure.

As a second measure of validity, the configural similarity was calculated for each concept mapping study, reflecting the congruence between data used as input and the final represented form. Configural similarity was estimated by computing the Pearson’s Product–Moment correlation between the original aggregated similarity matrix from participants and the final matrix of Euclidean distances between points on the map. A squared correlation coefficient, r², was also calculated for each study to assess the proportion of shared variance of the input and output data. The average squared correlation of the input similarities and the scaled distances from the MDS coordinates was .44 and statistically significant, t(68) = 5.38, p < .05. This estimate indicates that on average 44% of the variation in the aggregated participant sort was accounted for by the conceptualized model. No difference was found in the proportion of shared variance of the input and output by data collection mode.

Fig. 2 represents a bivariate plot that illustrates the relationship between the stress values and the r² values for each study in the sample. As observed in the figure, the better the fit of the sort data with the statistically represented model (i.e. lower stress values) the greater the proportion the input data are accounted for in the model. Taken together, these results suggest that concept mapping performs well in representing a complex set of multivariate data in two-dimensional space.

An examination of the correlation between the stress value and the number of sorters for each study revealed no linear relationship between the two variables. Nonetheless, it is important to understand further the relationship between sorting participation and stress. To model how stress is affected by the number of sorters, a subset of five studies with the most sorters was identified. Within this subset, sort data from an individual participant was randomly selected from the list of participants who completed the sort task in that study. A second randomly selected sort was then aggregated with the previous, and the stress value calculated. The process continued, and resulted in the inclusion of sort data from all study participants. At every addition of one randomly selected participant, the fit of the map was calculated and assessed.

The results for each of the five studies are displayed in Fig. 3, with number of sorters plotted on the x-axis and stress values plotted on the y-axis. Trends across the five studies reveal a consistent pattern of stress as the number of sorters increases. The variability in the stress values for each study was dramatic when about 15 or fewer sorters were included. As the number of sorters for each of the five studies reached about 35 sorters, substantial improvements in stress (i.e. lower stress values) were observed. However, beyond 40 sorters, only marginal improvements in stress were detected. These findings suggest between 20 and 30 sorters is warranted to maximize the consistency of fit in the concept mapping representation by minimizing the variability in the stress value found with smaller groups of sorters. This range is about twice the number recommended by Trochim (1989a) and Jackson and Trochim (2002). Our observation comports with previous card sorting studies examining adequate sample sizes needed to produce high quality representations (Tullis & Wood, 2004; Wood & Wood, 2008). While smaller numbers of sort participants may still yield acceptable stress values, the likelihood of generating a higher stress value is greater with smaller groups. Thus, consideration of the appropriate sorting sample in designing concept mapping studies is critical.

Fig. 3. Increase in sorters by stress values: five largest concept mapping studies in the pooled sample (x-axis: number of sorters, 1–89; y-axis: stress value, 0.16–0.36; one series each for Study 1 through Study 5).

3.6. Sorting reliability estimates

For each study in the sample, five unique reliability estimates for the sort data were generated using the procedures outlined in Trochim (1993), then averaged. First, the set of sort data from each study was randomly divided into two halves (for odd-numbered groups, one group was randomly assigned one more person than the other). Separate concept maps were computed for each group. The total sort matrices for each split-half group were correlated and the Spearman–Brown correction applied to obtain the split-half reliability of the sorts (rSHT). The Euclidean distances between all pairs of points on the two maps from the split-half samples were also correlated and the Spearman–Brown correction applied to obtain the split-half reliability of the map (rSHM). Second, the sort matrices for each individual were correlated and averaged. The Spearman–Brown correction was applied to yield the Individual-to-Individual Sort Reliability (rII). Third, the sort matrix for each individual was correlated with the total similarity matrix. These correlations were averaged and the Spearman–Brown correction applied to produce the Individual-to-Total Matrix Reliability (rIT). Finally, the sort matrix for each individual was correlated with the
Euclidean distances from the overall, final map. These correlations were averaged and the Spearman–Brown correction applied to produce the individual-to-map reliability (rIM). Of the five reliability estimates calculated, three rely on analysis of the sort data used as input. Overall, the average reliability estimates for the sort data for the sample were high. The average individual-to-individual sort reliability value (rII) was .87, the average individual-to-total matrix value (rIT) was .96, and the average split-half total matrix reliability (rSHT) was .87. No differences were found in the reliability estimates of the sort data in relation to the data collection mode. Two of the reliability estimates include information from the final map for each study. The average reliability between individuals’ sort matrices and the final map configuration (rIM) was .91. The average split-half reliability of the final map configuration (rSHM) was .67. The lower value for rSHM was expected as this reliability estimate is calculated on the split-half sample of analyzed or processed data, rather than the raw data used to generate rSHT (Trochim, 1993). Nevertheless, the average reliability estimates found here were high and slightly above those found in Trochim’s (1993) previous study.

The interrelationships between the sorting reliability estimates were assessed by calculating the Pearson’s Product–Moment correlations for the pairs of estimates. All correlations were significant at the .001 level. The correlations between the sort data estimates (rII, rIT, rIM, rSHT) were strongly positive (range: .94–.99), while correlations between the split-half map reliabilities and all other reliabilities were lower (range: .62–.72). This suggests strong associations between all reliability estimates, even among those assessing the reliability of the output data.

The five sort data reliability estimates were also correlated with the number of statements and the number of sorters, in order further to examine relationships between sort reliability and other study characteristics that presumably affect the estimates. The number of statements was marginally correlated with the reliability estimates, although the correlations were negative. Conversely, the number of sorters was significantly correlated with the reliability estimates, ranging from .44 to .71 (all significant at p < .001). This finding suggests that more sorters may yield more reliable sorting results although the differences may be minor.

3.7. Rating reliability estimates

Reliability estimates of the ratings data were calculated for each of the concept mapping studies in the sample (54 studies had two ratings). For each rating, the internal consistency was assessed by computing the average correlation among items using Cronbach’s alpha. We were also interested in the reliability of different raters averaged together. The reliability of averaged ratings is a more useful statistic in practice due to the considerable variation in reliability among raters which cannot be assessed (MacLennan, 1993). The average measure intraclass correlation (AICC) was calculated to produce an inter-rater reliability coefficient, which is equivalent to the average correlation between all pairs of raters with the Spearman–Brown correction for the number of raters. The average Cronbach’s alpha coefficients for both ratings 1 and 2 were above .96, suggesting the items on the scale are highly intercorrelated and internally consistent (DeVellis, 1991), even for studies where two ratings were administered. A significant group effect was found for rating 1, F(2, 66) = 4.29, p < .05 and rating 2, F(2, 51) = 3.20, p < .05 when examining the average alphas by data collection mode. For rating 1, mean differences were found between the web-based (M = .97, SD = .02) and face-to-face (M = .95, SD = .02) modes, p < .05. Similarly, for rating 2, mean differences were found between the web-based (M = .97, SD = .02) and face-to-face (M = .95, SD = .02) modes, p < .05. However, these differences were very slight and the alphas by group were high enough so as to not be meaningful. The average inter-rater reliability coefficients (AICC) were also high, suggesting that across raters the mean ratings are stable, although the inter-rater reliability coefficients for the second rating were slightly lower. No differences were found in the AICC reliability estimates for rating 1 or rating 2, relative to the data collection mode.
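
As a rough illustration of the reliability calculations described in Sections 3.6 and 3.7, the following Python sketch shows a Spearman–Brown step-up, Cronbach’s alpha, and an average-measure inter-rater coefficient computed as the Spearman–Brown-corrected mean pairwise correlation among raters. It is not the authors’ implementation: the function names, the use of NumPy, and the raters-by-statements layout of the input array are assumptions made for this example.

```python
import numpy as np


def spearman_brown(r, k=2):
    """Step up a correlation r to the reliability of a composite k times as long.

    k=2 is the usual split-half case; k=number of raters gives the
    reliability of the averaged ratings.
    """
    return k * r / (1 + (k - 1) * r)


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix (assumed layout)."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    # Sum of the individual item variances across respondents.
    item_variance_sum = scores.var(axis=0, ddof=1).sum()
    # Variance of each respondent's total score.
    total_variance = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


def average_measure_reliability(ratings):
    """Mean pairwise correlation among raters, stepped up for the number of raters.

    ratings is assumed to be a (raters x statements) array.
    """
    ratings = np.asarray(ratings, dtype=float)
    corr = np.corrcoef(ratings)            # rows are treated as variables (raters)
    n_raters = corr.shape[0]
    upper = np.triu_indices(n_raters, k=1)  # each rater pair counted once
    mean_r = corr[upper].mean()
    return spearman_brown(mean_r, n_raters)
```

For the split-half estimates, the correlation between the two halves would be passed to `spearman_brown` with `k=2`; a split-half correlation of .50, for example, steps up to about .67.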
These reliability estimates were correlated with the number of statements and the number of raters to further examine relationships between rating reliability and other concept mapping characteristics that presumably affect the estimates. Moderately strong correlations between the number of statements and Cronbach’s alphas for rating 1 (r = .49, p < .001) and rating 2 (r = .55, p < .001) were detected. This suggests that larger statement sets yield higher internal consistency estimates. Because ratings used in concept mapping are unidimensional in nature, that is, they measure a single construct across a large number of items (e.g. importance, feasibility, readiness, etc.), it is not surprising that higher alphas were found. Moreover, there is a tendency for large numbers of items to produce higher alpha coefficients (DeVellis, 1991). Similarly, the number of raters was moderately correlated with the inter-rater reliability coefficients for both rating 1 (r = .53, p < .001) and rating 2 (r = .51, p < .001), suggesting larger numbers of raters yield higher inter-rater reliability estimates.

4. Discussion

As with similar mixed-method applications, the concept mapping studies that are the basis of this pooled analysis were conducted to understand complex realities using data from multiple perspectives to combine and present practical information. This quantitative analysis generated useful baseline information to address questions regarding the methodological quality of concept mapping. The study approach suggested several means for determining the quality of concept mapping that are appropriate, defensible, and relevant. In particular, this work considers the significance of validity and reliability as it relates to concept mapping, and reports on these critical aspects. With this emphasis in mind, concept mapping as an integrated mixed-method approach for planning, evaluation, and research appears to generate valid and reliable results. Although the representation of a complex set of input data was limited to two dimensions, the internal representational validity across the set of studies was found to be good, supported by multiple measures of fit and similarity. While better fit and greater similarity between the input data and output representation might be observed using more than two dimensions, the current approach appears appropriate for generating the most parsimonious and interpretable results. The reliability of the sort data was observed to be high, both between sorters and among sets of aggregated sorters. Likewise, the consistency of the rating data, between individuals and among items, was very high. These findings comport with an earlier, more limited, study of the reliability of concept mapping by Trochim (1993), where similar patterns were found using the same reliability calculations.

As observed in this sample, the advent of a web-based platform for conducting concept mapping asynchronously in a virtual environment has expanded the level of participation across all phases of the process. This presents concept mapping users with unique benefits for expanded participation, and notable challenges to establishing and maintaining quality. For example, the use of the web affects the percent of completion across different tasks based on the ability to invite a greater number to take part in the study. However, despite the utilization of web technology and increased access to concept mapping for a broader set of participants, estimates of reliability and validity appear consistent. Indeed, no meaningful differences were found between multiple data collection modes and the estimates of reliability and validity calculated in this study. While questions persist as to how data collected through group processes like brainstorming are affected when generated individually (Dugosh, Paulus, Roland, & Yang, 2000), it appears that quality and rigor can be maintained for concept mapping data collected via the Internet.

Although this sample does not represent the totality of concept mapping studies, this pooled study analysis is the largest and most comprehensive conducted to date. The studies combined for this analysis were diverse in scope and topic, and typical of those found in the published research and evaluation literature. In fact, the results of several concept mapping studies included in this analysis were published previously across a variety of content areas and disciplines. Each study in this sample followed the same procedural steps and was subject to similar constraints during implementation. This process consistency, coupled with data collection and computation standardization at the study level, enabled an analysis that produced findings configured in a comparable statistical form. This study’s systematic approach enabled the identification and analysis of patterns that might otherwise be obscured in a case by case assessment.

Notwithstanding the strengths of the analytical strategy, several limitations to the study are important to note. First, as with any analysis that pools the results of multiple studies, the variability in random and non-random error is of concern. To minimize error, and subsequent overestimation or underestimation of estimates, we employed several strategies. A detailed protocol for calculating the indicators was consistently applied for each study in the sample. In addition, strict inclusion criteria were used to ensure methodological homogeneity in the pooled sample. Nonetheless, for this analysis, advanced statistical methods were not employed when analyzing the sample to mitigate the potential compounding of error found at the study level. Second, the analysis does not capture qualitative distinctions across studies. Each of the individual studies included in this study sample contains extensive detail that offers insight into a particular topic considered in context. The qualitative variation in participant experiences, settings, content, interpretation, and uses of the results were not considered in this pooled study analysis. Furthermore, the assessment of quality of elements that are directly influenced by the qualitative judgments of the researcher was not undertaken. Cluster selection and labeling, for example, is one of those areas that would benefit from further study to better delineate some notion of intra-observer agreement in the determination of clusters and names of concepts represented by each cluster. The typical approach is for the researcher to conduct the analysis and arrive at options for cluster arrays, and then discuss and confirm or change with key study stakeholders. Given that the concept mapping process calls for cooperation and negotiation in the final structure and labeling, capturing the degree to which two reasonable people with knowledge in the field agree or disagree has important implications for quality. Third, relative to the entire set of concept mapping projects completed in the past 20 years, it is possible that this study includes potentially problematic studies. However, this risk is somewhat mitigated by results that were consistent with those found in previous pooled analyses (cf. Trochim, 1993). In addition, the sample consisted of studies completed by the originators and providers of the concept mapping technique, and as such are an exclusive set. Implemented by experts in the conduct of the concept mapping process, these studies may not be representative of studies conducted by those with less experience with the method. A case could be made that studies in the sample represent a high level of quality due to the adherence to the concept mapping process, as outlined by the developers. Finally, this pooled study analysis is fundamentally correlational. This study examined only the linear relationships between select characteristics found in concept mapping. Questions as to whether varying certain features of the concept mapping process, such as participant engagement, may result in changes in other characteristics remain unanswered.
Despite these limitations, three implications of the findings warrant attention. First, the results of this analysis offer a practical reference for researchers and evaluators to judge the quality of concept mapping studies and support their choices related to data collection, analysis, and representation. In establishing a basis for comparison, this study generated empirical data for several characteristics of the concept mapping process, providing realistic estimates of what one might expect in typical field applications of the method. This study establishes a set of benchmarks and ranges that can be used for individual concept mapping studies to gauge the reliability and validity of the results, and can provide concept mapping users a basis for confirming the practical issues of fidelity and integrity related to their work (Bradbury & Reason, 2001).

Second, the results provide critical information that helps to establish expectations for the quality of concept mapping as a social science research method, upon which other researchers, journal reviewers, editors, and dissertation committees can evaluate the utility of the methodology’s processes and outcomes in future studies. Except for basic counts of the number of statements and participants, few published concept mapping studies report critical elements included in this study, such as stress values. This is likely due, in part, to the lack of information about standards for reporting and how the data are generated. By extension, the absence of clear expectations as to what constitutes quality for concept mapping hinders peer reviewers in their appraisal of submissions. Thus, the results of this study provide both researchers and those who review research with information to help evaluate and ensure the rigor of concept mapping studies across a broad base of literature.

Third, the results provide a set of empirically grounded recommendations for different activities within the concept mapping process. Most of the current recommendations for concept mapping found in the literature are general heuristics based on professional experience or logic. The results of this study provide systematically derived information that practitioners can use to make key decisions in the concept mapping process, which will affect quality. Recommendations regarding the appropriate number of sorters or suggestions for interpreting stress values are based on examining the variation of these elements in relation to other indicators. Furthermore, criteria that emerged from this study for judging the integrity of concept mapping are more consistent with the assumptions and realities of the approach. For example, present criteria for determining the acceptability of fit for multidimensional scaling (MDS) applications are based on experimental and synthetic data (Kruskal, 1964). For applied field studies like concept mapping where MDS is used, it seems more appropriate and reasonable to assess the acceptability of fit in relation to results from the study of similar practical applications. Thus, using the results of this analysis as a reference, concept mapping practitioners should routinely report on important indices like stress values, to allow others to judge the relative quality of their studies.

This work represents a foundational step in building a base of evidence to support the methodological quality and expectations of the concept mapping approach. However, several areas of inquiry remain incomplete and suggest opportunities for future pooled study investigation, including: content analysis in areas where multiple concept maps have been produced; examination of concept mapping processes and procedures; and inquiry into participant characteristics across different studies. Future analyses might also include other studies completed by a broader pool of concept mapping practitioners. Pooled study analyses of studies with smaller numbers of participants or those with relatively smaller statement sets might provide useful information not observed in this study. Moreover, the development and standardization of methods for assessing external representational validity and consistency in subjective decisions are needed to further the dialogue on validity in concept mapping.

Ultimately, defining appropriate expectations of quality and rigor for concept mapping, established through systematic study, has value for users of the approach. This study attempted to address academic questions of validity, reliability, and quality while at the same time accounting for the practical and participatory concerns for concept mapping. This study supports those who use concept mapping in their work, to help establish value in participation, measurement, data, and conclusions within a mixed-method, participatory application.

Acknowledgements

The authors would like to acknowledge the support of the staff at Concept Systems, Inc. Specifically, we thank Brenda Pepe for information related to the concept mapping studies included in the analysis, Perry Slack for study level data extraction, and Marie Cope for review and comments on the manuscript. We reference and acknowledge the foundational thinking and work of William Trochim in the methodology’s early applications and summary reliability estimates of almost 20 years ago.

References

Bradbury, H., & Reason, P. (2001). Broadening the bandwidth of validity: Issues and choice-points for improving the quality of action research. In P. Reason & H. Bradbury (Eds.), Handbook of action research: Participative inquiry and practice (pp. 447–456). Thousand Oaks, CA: Sage Publications.
Cacy, J. R. (1996). The reality of stakeholder groups: A study of the validity and reliability of concept maps. Ph.D. dissertation, University of Oklahoma.
Caracelli, V. J., & Riggin, L. J. C. (1994). Mixed-method evaluation: Developing quality criteria through concept mapping. Evaluation Practice, 15(2), 139–152.
Cook, C., Heath, F., & Thompson, R. L. (2000). A meta-analysis of response rates in web- or Internet-based surveys. Educational and Psychological Measurement, 60(6), 821–836.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Houghton Mifflin.
Cousins, J. B., & Whitmore, E. (1998). Framing participatory evaluation. New Directions for Evaluation, 80, 5–23.
Creswell, J. W., & Plano Clark, V. L. (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage Publications.
Davison, M. L. (1983). Multidimensional scaling. New York, NY: John Wiley and Sons.
DeVellis, R. F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage.
Dugosh, K. L., Paulus, P. B., Roland, E. J., & Yang, C.-H. (2000). Cognitive stimulation in brainstorming. Journal of Personality and Social Psychology, 79(5), 722–735.
Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and Program Planning, 12, 81–86.
Forgas, J. P. (1979). Multidimensional scaling: A discovery method in social psychology. In G. P. Ginsburg (Ed.), Emerging strategies in social psychological research (pp. 253–288). New York: Wiley.
Jackson, K., & Trochim, W. (2002). Concept mapping as an alternative approach for the analysis of open-ended survey responses. Organizational Research Methods, 5(4), 307–336.
Kagan, J. M., Kane, M., Quinlan, K. M., Rosas, S., & Trochim, W. M. K. (2009). Developing a conceptual framework for an evaluation system for the NIAID HIV/AIDS clinical trials networks. Health Research Policy and Systems, 7(12).
Kane, M., & Trochim, W. M. K. (2007). Concept mapping for planning and evaluation. Thousand Oaks, CA: Sage Publications.
Kaplowitz, M. D., Hadlock, T. D., & Levine, R. (2004). A comparison of web and mail survey response rates. Public Opinion Quarterly, 68(1), 94–101.
Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning, 12(1), 75–80.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1–27.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
MacLennan, R. N. (1993). Interrater reliability with SPSS for Windows 5.0. The American Statistician, 47(4), 292–296.
Pammer, W., Haney, M., Wood, B. M., Brooks, R. G., Morse, K., Hicks, P., et al. (2001). Use of telehealth technology to extend child protection team services. Pediatrics, 108(3), 584–590.
Paulson, B. L., & Worth, M. (2002). Counseling for suicide: Client perspectives. Journal of Counseling & Development, 80, 86–93.
Petrucci, C. J., & Quinlan, K. M. (2007). Bridging the research-practice gap: Concept mapping as a mixed methods strategy in practice-based research and evaluation. Journal of Social Services Research, 34(2), 25–42.
Rao, J. K., Alongi, J., Anderson, L. A., Jenkins, L., Stokes, G. A., & Kane, M. (2005). Development of public health priorities for end-of-life initiatives. American Journal of Preventive Medicine, 29(5), 453–460.
Ridings, J. W., Powell, D. M., Johnson, J. E., Pullie, C. J., Jones, C. M., Jones, R. L., et al. (2008). Using concept mapping to promote community building: The African American initiative at Roseland. Journal of Community Practice, 16(1), 39–63.
Risisky, D., Hogan, V. K., Kane, M., Burt, B., Dove, C., & Payton, M. (2008). Concept mapping as a tool to engage a community in health disparity identification. Ethnicity & Disease, 18, 77–83.
Robinson, J. M., & Trochim, W. M. K. (2007). An examination of community members’, researchers’ and health professionals’ perceptions of barriers to minority participation in medical research: An application of concept mapping. Ethnicity and Health, 12(5), 521–539.
Sale, J. E., & Brazil, K. (2004). A strategy to identify critical appraisal criteria for primary mixed-method studies. Quality and Quantity, 38, 352–365.
Sale, J. E., Lohfeld, L. H., & Brazil, K. (2002). Revisiting the quantitative–qualitative debate: Implications for mixed methods research. Quality and Quantity, 36, 43–53.
Sturrock, K., & Rocha, J. (2000). A multidimensional scaling stress evaluation table. Field Methods, 12(1), 49–60.
Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches. Applied Social Research Methods Series (Vol. 46). Thousand Oaks, CA: Sage Publications.
Trochim, W. M. K. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12(1), 1–16.
Trochim, W. M. K. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12(1), 87–110.
Trochim, W. M. K. (1993, November). The reliability of concept mapping. Paper presented at the Annual Conference of the American Evaluation Association.
Trochim, W. M. K., Cabrera, D. A., Milstein, B., Gallagher, R. S., & Leischow, S. J. (2006). Practical challenges of systems thinking and modeling in public health. American Journal of Public Health, 96(3), 538–546.
Trochim, W., & Kane, M. (2005). Concept mapping: An introduction to structured conceptualization in health care. International Journal for Quality in Health Care, 17(3), 187–191.
Trochim, W. M. K., Marcus, S. E., Masse, L. C., Moser, R. P., & Weld, P. C. (2008). The evaluation of large research initiatives: A participatory integrative mixed-methods approach. American Journal of Evaluation, 29, 8–28.
Trochim, W. M. K., Milstein, B., Wood, B. J., Jackson, S., & Pressler, V. (2004). Setting objectives for community and systems change: An application of concept mapping for planning a statewide health improvement initiative. Health Promotion Practice, 5(1), 8–19.
Tullis, T., & Wood, L. (2004, June). How many users are enough for a card-sorting study? The Proceedings of Usability Professionals Association Conference.
Wood, J., & Wood, L. (2008). Card sorting: Current practices and beyond. Journal of Usability Studies, 4(1), 1–6.

Scott R. Rosas, PhD is a Senior Consultant at Concept Systems, Inc. where he specializes in the design and use of the concept mapping methodology. His work has focused on conceptualization and measurement in evaluation using concept mapping, with attention to the validity and reliability of the approach. He received his PhD in Human Development and Family Studies from the University of Delaware, with an emphasis on program evaluation. He previously served as Associate Faculty at the Bloomberg School of Public Health at Johns Hopkins University and is currently Adjunct Faculty in the Department of Health at SUNY-Cortland.

Mary Kane, MSLIS is the Chief Executive and Principal Consultant at Concept Systems, Inc. Her consulting experience includes strategic and operational planning, product and program development, education and training design, and program needs assessment and evaluation. She has coauthored several articles on the application of concept mapping across several content areas, including public and community health. Ms. Kane is co-author of the definitive volume on concept mapping: Concept Mapping for Planning and Evaluation. She holds a Masters degree in Library and Information Sciences from Columbia University.


