

Tay, L., & Jebb, A. (2017). Scale development. In S. Rogelberg (Ed.), The SAGE Encyclopedia of Industrial and Organizational Psychology (2nd ed.). Thousand Oaks, CA: Sage.

Scale Development

Scale creation is the process of developing a reliable and valid measure of a construct in order to assess an attribute of interest. Industrial-organizational research involves the measurement of organizational and psychological constructs, which present unique challenges because they are generally unobservable (e.g., work attitudes, perceptions, personality traits). As opposed to observable characteristics (e.g., height, precipitation, velocity), unobservable constructs cannot be measured directly and must be assessed through indirect means, such as self-report. Relatedly, these constructs are often very abstract (e.g., core self-evaluations), making it difficult to determine which items adequately represent them and which ones do so reliably. Finally, these constructs are often complex and may be composed of several different components rather than being a single concept. As a result of these complexities, developing a measurement instrument can be a challenging task, and validation is especially important to the process of scale construction. This article focuses on the principles and best practices of scale creation with regard to self-report scales.

Approaches to scale creation. There are two distinct approaches to scale creation. A deductive approach uses theory and an already-formed conceptualization of the construct to generate items within its domain. This approach is useful when the definition of the construct is known and substantial enough to generate an initial pool of items. By contrast, an inductive approach is useful when there is uncertainty about the definition or dimensionality of the construct. In this case, organizational incumbents are asked to provide descriptions of the concept, and a conceptualization is then derived that forms the basis for generating items.

Construct definition. Regardless of the approach to scale creation, a clear conceptualization of the construct is required. This entails delineating and defining the construct (i.e., stating what it is, and thus what it is not) either through a thorough literature review or through an inductive uncovering of the phenomenon. It is also important to define the level of conceptual breadth of the target construct. A construct that is very broad (e.g., attitudes about working in general) requires different types of items than one that is more specific (e.g., attitudes about performing administrative duties). Another important theoretical step is to specify the likely number of components, or dimensions, that make up the construct. The dimensionality of a construct can be understood as whether the construct is best conceived as a single variable (unidimensional) or as the combination of a number of distinct subcomponents (multidimensional). For instance, job satisfaction has been conceptualized, and thus measured, as both a unidimensional and a multidimensional construct. Hackman and Oldham's three-item scale is a single measure of global job satisfaction, whereas Smith et al.'s Job Descriptive Index is composed of five subscales: satisfaction with pay, the work itself, promotions, coworkers, and supervisors. Although this example shows that the same construct can be validly conceived as both unidimensional and multidimensional, properly specifying the dimensions of a construct is essential, as a distinct scale must be constructed for each one.

A key idea in construct definition is to outline the nomological network, that is, how the focal construct (and its specific dimensions) relates to other constructs. Once the construct is defined, one can begin to specify this nomological network, which entails stating what the construct should be positively related to, negatively related to, and relatively independent of, based on theory. The nomological network is essential to the validation process, as a scale that empirically relates to other established measures in the way predicted by theory displays important types of validity evidence (convergent and divergent validity).

Purposes of the created scale. Before discussing the specific principles of item writing, it is necessary to specify the purpose of the scale. Will the scale be used for research, selection, development, or another purpose? Is the scale intended for the general population, the population of adult workers, or another specific population? Outlining the scale's purpose and future contexts of use will allow one to identify the unique practical concerns related to the scale. This guides item creation in a number of ways, such as (1) determining an appropriate reading level for the target population; (2) identifying whether the items should refer to general or specific contexts and situations (e.g., work contexts); (3) considering differences in how respondents interpret the items (e.g., the different meanings of the term "stress" across national contexts); (4) deciding on the type of scale response format and behavioral anchors, which can potentially affect scale responses; and (5) determining the applicability of reverse scoring, which may not be appropriate for positive constructs such as virtues.

Principles of item writing. When writing items, one aims to create an initial item pool that contains many more items than the final scale (e.g., three to four times larger). This gives the researcher more latitude in retaining only those items that meet psychometric standards in the final scale. Redundancy and over-inclusivity in the initial item pool are also desirable because they can serve to uncover sub-dimensions or closely related but distinct constructs. As for the actual writing of items, recommendations from a wide range of sources agree on the following principles: items should be simple and straightforward; one should avoid slang, jargon, double negatives, ambiguous words, and overly abstract words, favoring specific and concrete words; no double-barreled items (i.e., two different ideas included in a single question); no leading questions or statements (e.g., "Most supervisors are toxic. Please respond to how aggressive your supervisor has been to you"); and items should not be identical re-statements of one another but should state the same idea in different ways. Finally, it is often helpful to provide the construct definition, relevant adjectives, and example scale items to item writers when generating items.

Scale validation research design. As noted in the introduction, validation is supremely important in the development of a self-report scale; in the measurement of unobservable variables, one cannot simply assume that a scale measures what it intends to measure. Such assumptions can lead to false scientific conclusions. Cronbach and Meehl suggested many ways in which scale validation can be conducted. Primary approaches include comparing group differences, assessing correlations with other measures, and examining the change in scale scores over repeated occasions. As mentioned earlier, the specification of the nomological network will help a researcher determine the types of designs and measures to include. Group differences are appropriate when there is an expectation that measures should discriminate between groups (e.g., experts vs. non-experts). On the other hand, establishing correlations with related constructs and criteria is important for assessing convergent-divergent validity and predictive validity. Further, changes over time can help determine the reliability and stability of the operationalized construct. Where possible, the use of multitrait-multimethod designs is more informative than a single-method or single-trait approach to scale validation.
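
To make these designs concrete, the following is a minimal sketch in Python of a known-groups comparison and a test-retest correlation. All scores, group sizes, and distributions are simulated stand-ins rather than data from any actual scale.

```python
# Minimal sketch: known-groups and test-retest validation checks.
# All scores below are simulated; in practice they would be scale totals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Known-groups validity: e.g., experts should outscore non-experts.
experts = rng.normal(4.2, 0.6, 80)   # hypothetical expert scale scores
novices = rng.normal(3.5, 0.7, 80)   # hypothetical non-expert scale scores
t, p = stats.ttest_ind(experts, novices, equal_var=False)  # Welch's t-test
print(f"known-groups t = {t:.2f}, p = {p:.4f}")

# Stability: correlate scale scores across two measurement occasions.
time1 = rng.normal(3.8, 0.7, 120)
time2 = time1 + rng.normal(0.0, 0.3, 120)   # retest with added noise
r, _ = stats.pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")
```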

Regarding sampling, the recommended preliminary sample size for examining the psychometric properties of items is 100-200, with a later confirmatory sample of at least 300. However, this may depend on group differences and the type of analysis one seeks to conduct. One should also seek to match the validation sample to the scale's intended application, based on its theoretical and practical context. For instance, if a scale is meant for use with entrepreneurs, it will be important to obtain a sample from that same subpopulation of interest. Notably, using a broader sample than the target subpopulation can artificially raise the reliability of the scale, because greater sample heterogeneity inflates observed score variance. A recommended best practice is to cross-validate the scale across independent samples to show that scale properties are stable and generalizable.
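
As an illustration, the sketch below splits one simulated data collection into a development subsample for exploratory analyses and an independent subsample of at least 300 cases for confirmatory analyses; the item-response matrix is an assumed stand-in.

```python
# Minimal sketch: splitting one sample into development and confirmation
# subsamples, following the 100-200 / 300+ guideline discussed above.
import numpy as np

rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=(500, 20))  # simulated: 500 people, 20 items, 1-5 scale

idx = rng.permutation(len(responses))
dev_sample = responses[idx[:200]]    # for item statistics and EFA
conf_sample = responses[idx[200:]]   # for CFA on an independent 300 cases
print(dev_sample.shape, conf_sample.shape)
```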

Scale psychometric properties. After data collection, one needs to establish the reliability and validity of the scale items. As a first step, it is critical to identify a good set of items with reasonable psychometric properties. This is usually done by examining the mean, standard deviation, score range, endorsement proportions across all the response options, and the item-total correlation for each item. One should select items that have reasonable item-total correlations (around .20 or higher), appropriate score ranges (i.e., no ceiling or floor effects), and utilization of all response options.
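
As a sketch of this screening step, the code below computes item means, standard deviations, and corrected item-total correlations (each item against the total of the remaining items) on simulated item responses; the .20 cutoff follows the guideline above.

```python
# Minimal sketch: item-level screening statistics on simulated data.
import numpy as np

rng = np.random.default_rng(2)
common = rng.normal(size=(200, 1))                      # shared construct signal
X = 3 + common + rng.normal(scale=1.0, size=(200, 10))  # 10 correlated items

for j in range(X.shape[1]):
    rest = np.delete(X, j, axis=1).sum(axis=1)   # total of the other items
    r_it = np.corrcoef(X[:, j], rest)[0, 1]      # corrected item-total correlation
    flag = "" if r_it >= 0.20 else "  <- review"
    print(f"item {j + 1:2d}: M = {X[:, j].mean():.2f}, "
          f"SD = {X[:, j].std(ddof=1):.2f}, r_it = {r_it:.2f}{flag}")
```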

Based on the selected items, there are different approaches to calculating reliability, but calculation of internal consistency is the most common. In general, the rule of thumb for internal consistency reliability is a minimum of .70, although .90 or higher is recommended for high-stakes decisions (e.g., selection). One should also calculate the reliability of each sub-dimension of the construct.
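
Coefficient alpha, the most common internal consistency index, can be computed directly from the item variance decomposition, as in the short sketch below; the data are simulated, and the helper function is hypothetical rather than part of any particular library.

```python
# Minimal sketch: coefficient alpha from the standard variance decomposition.
import numpy as np

def cronbach_alpha(X):
    """X: respondents x items array of item scores."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)       # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
common = rng.normal(size=(300, 1))                     # shared construct signal
X = 3 + common + rng.normal(scale=1.0, size=(300, 8))  # 8 correlated items
print(f"alpha = {cronbach_alpha(X):.2f}")  # compare against the .70 / .90 rules of thumb
```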

It is important to distinguish reliability from dimensionality, as high reliability does not necessarily indicate unidimensionality. The number of dimensions should be specified by theory and confirmed by exploratory factor analysis (EFA). The number of latent factors/dimensions should equal the number of scales being developed. One may also seek to replicate the factor structure across different subpopulations to ensure its generalizability.
EFA loadings of items on their specified dimensions should be moderate (around .40) to high (closer to 1.0), and one may choose to delete items that load inappropriately on other dimensions or have low loadings overall. After the theoretically based dimensionality is borne out in EFA, confirmatory factor analysis (CFA) should be conducted with a new sample, and the model should be evaluated using a number of fit indices. Although there are many fit indices that can be used, some of the most popular and useful are the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). General standards hold that the minimum standards of good fit for these metrics are CFI ≥ .90, TLI ≥ .90, RMSEA ≤ .08, and SRMR ≤ .08.
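
The sketch below illustrates the EFA step using the third-party Python package factor_analyzer, which is an assumption about tooling; the subsequent CFA is typically run in dedicated SEM software (e.g., lavaan in R). The loading matrix recovers the two-factor structure of the simulated items.

```python
# Minimal sketch: EFA on simulated two-factor data, assuming the
# third-party factor_analyzer package (pip install factor_analyzer).
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(4)
f1 = rng.normal(size=(400, 1))
f2 = rng.normal(size=(400, 1))
X = np.hstack([f1 + rng.normal(scale=0.6, size=(400, 4)),   # items 1-4: factor 1
               f2 + rng.normal(scale=0.6, size=(400, 4))])  # items 5-8: factor 2

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(X)
print(np.round(fa.loadings_, 2))  # retain items loading ~.40+ on their own factor
```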

After establishing reliability and factorial validity, a researcher would continue providing validation evidence based on the scale validation design. This may include examining group differences on scale scores, or convergent and divergent validity based on correlations with other related measures. This involves examining how the new construct empirically relates to the other constructs in its nomological network, and this overall process is a test of both the scale and the underlying theory driving it.
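
A minimal sketch of such convergent-divergent checks appears below: the new scale's scores are correlated with an established measure they should relate to and with a measure they should be relatively independent of. All scores are simulated stand-ins.

```python
# Minimal sketch: convergent and divergent correlations on simulated scores.
import numpy as np

rng = np.random.default_rng(5)
new_scale = rng.normal(size=300)
convergent = 0.7 * new_scale + rng.normal(scale=0.7, size=300)  # related measure
divergent = rng.normal(size=300)                                # unrelated measure

print(f"convergent r = {np.corrcoef(new_scale, convergent)[0, 1]:.2f}")  # expect sizable
print(f"divergent  r = {np.corrcoef(new_scale, divergent)[0, 1]:.2f}")   # expect near zero
```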

Scale revision. It is common to conduct several rounds of scale revision to improve on the initial items. There are several reasons for this, including poor reliability, divergence between the theoretical and empirical structure, and inadequate construct representation. Revising a scale requires analyzing items with poor item-total correlations or low loadings to discern possible sources of poor item functioning. Where needed, one would also revise and write additional items to tap into specific dimensions that were not adequately measured.
Recommended Reading
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale
development. Psychological Assessment, 7, 309-319.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications.
Journal of Applied Psychology, 78, 98-104.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological
Bulletin, 52, 281-302.
DeVellis, R. F. (2012). Scale development: Theory and applications. Newbury Park, CA: Sage.
Drasgow, F., Nye, C. D., & Tay, L. (2010). Indicators of quality assessment. In J. C. Scott & D.
H. Reynolds (Eds.), Handbook of workplace assessment: Evidence-based practices for
selecting and developing organizational talent (pp. 27-60). San Francisco, CA: John
Wiley & Sons.
Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey
questionnaires. Organizational Research Methods, 1, 104-121.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons'
responses and performances as scientific inquiry into score meaning. American
Psychologist, 50, 741-749.
Peterson, C., & Park, N. (2004). Classification and measurement of character strengths:
Implications for practice. In P. A. Linley & S. Joseph (Eds.), Positive psychology in
practice (pp. 433-446). Hoboken, NJ: Wiley and Sons Inc.
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision.
Psychological Assessment, 12, 287-297.
Schwarz, N. (1999). Self-reports: How the questions shape the answer. American Psychologist,
54, 93-105.
Schwarz, N., Knauper, B., Hippler, H.-J., Noelle-Neumann, E., & Clark, L. (1991). Rating
scales: Numeric values may change the meaning of scale labels. Public Opinion
Quarterly, 55, 570-582.
Smith, G. T., & McCarthy, D. M. (1995). Methodological considerations in the refinement of
clinical assessment instruments. Psychological Assessment, 7, 300-308.

