Slide Share Scale PDF

CH. 7: Scaling
 Scaling is a procedure for the assignment of numbers (or other
symbols) to a property of objects in order to impart some of the
characteristics of numbers to the properties in question.
Four Scales of Measurement:
Nominal Scales, Ordinal Scales, Interval Scales, Ratio Scales
Measurement and Scaling
 A scale is a mechanism by which individuals are distinguished as
to how they differ from one another on the variables of interest.
 A scale is a continuous series of categories and has been
defined as any series of items that are arranged
progressively according to value or magnitude, into which
an item can be placed according to its quantification.
 Four popular scales in business research are:
1. Nominal scales
2. Ordinal scales
3. Interval scales
4. Ratio scales
SCALES
 Nominal Scales: split data into groups, e.g., men, women
 Ordinal Scales: rank data in some order, e.g.,
exercising for 20 minutes is good, for 30 minutes is
better, for 40 minutes is best
 Interval Scales: set data on a continuum, e.g.
1 (very low)   2   3   4   5 (very high)
 Ratio Scales: start with absolute zero and indicate
proportion, e.g.
0   5   10 (ten is twice as big as five)
 A Nominal Scale is the simplest of the four scale types. The
numbers or letters assigned to objects serve only as labels for
identification or classification.
 Example: variable of gender
 Males = 1, Females = 2
 Sales Zone A = Islamabad, Sales Zone B = Rawalpindi
 Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Mirinda
 An Ordinal Scale is one that arranges objects or
alternatives according to their magnitude.
 Examples:
 Career Opportunities = Moderate, Good, Excellent
 Investment Climate = Bad, Inadequate, Fair, Good, Very good
 Merit = A grade, B grade, C grade, D grade
A problem with ordinal scales is that the difference
between categories on the scale is hard to quantify, i.e.,
excellent is better than good, but how much better?
 An Interval Scale allows us to perform certain arithmetical
operations on the data collected from respondents. This scale
measures the distance between any two points on the scale.
 It taps the differences and the magnitudes of the differences in
the variable.
 A Ratio Scale is a scale that possesses absolute
rather than relative qualities and has an absolute
zero point.
 Examples:
 Money
 Weight
 Distance
 Temperature on the Kelvin Scale
Interval scales allow comparisons of the differences of
magnitude (e.g. of attitudes) as well as determinations of
the actual strength of the magnitude.
Type of Scale | Numerical Operation | Descriptive Statistics
Nominal | Counting | Frequency in each category, percentage in each category, mode
Ordinal | Rank ordering | Median, range, percentile ranking
Interval | Arithmetic operations on intervals between numbers | Mean, standard deviation, variance
Ratio | Arithmetic operations on actual quantities | Geometric mean, coefficient of variation
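The table can be read as a lookup from scale level to permitted statistics. The following is a minimal sketch using Python's standard `statistics` module; the names `PERMITTED`, `ORDER`, and `describe` are hypothetical, and the rule that a higher scale level also permits the statistics of the lower levels is an assumption added here, not something the table states.

```python
from statistics import mean, median, mode, geometric_mean

# One representative statistic per level, taken from the table above.
PERMITTED = {
    "nominal": {"mode": mode},
    "ordinal": {"median": median},
    "interval": {"mean": mean},
    "ratio": {"geometric_mean": geometric_mean},
}
ORDER = ["nominal", "ordinal", "interval", "ratio"]


def describe(values, level):
    """Apply every statistic permitted at `level` or at any lower level.

    Assumption: levels are cumulative (ratio data also allow the mean,
    median and mode).
    """
    stats = {}
    for lower in ORDER[: ORDER.index(level) + 1]:
        for name, fn in PERMITTED[lower].items():
            stats[name] = fn(values)
    return stats


# Hypothetical data: jersey numbers are labels, so only the mode is meaningful.
print(describe([7, 8, 3, 7], "nominal"))
# Finishing times in seconds have a true zero, so all four statistics apply.
print(describe([15.2, 14.1, 13.4], "ratio"))
```

Asking for the mean of jersey numbers is blocked by construction: `describe(values, "nominal")` never calls `mean`.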
Example: runners at the finish
Nominal: numbers assigned to runners: 7, 8, 3
Ordinal: rank order of winners: third place, second place, first place
Interval: performance rating on a 0 to 10 scale: 8.2, 9.1, 9.6
Ratio: time to finish, in seconds: 15.2, 14.1, 13.4
Classification of Scaling Techniques:
Scales
Nominal: Dichotomous, Category
Ordinal: Fixed sum, Graphic rating
Interval: Likert, Semantic differential, Numerical, Itemized rating, Stapel
Ratio
Nominal scales require the respondent only to provide some type of
descriptor as the raw response.
Example.
Please indicate your current marital status.
__ Married __ Single __ Single, never married __ Widowed
Ordinal scales allow the respondent to express “relative magnitude”
between the raw responses to a question.
Example.
Which one statement best describes your opinion of an Intel PC
processor?
__ Higher than AMD’s PC processor
__ About the same as AMD’s PC processor
__ Lower than AMD’s PC processor
Interval scales demonstrate the absolute differences between each
scale point.
Example.
How likely are you to recommend the new phone to a friend?
Definitely will not 1 2 3 4 5 6 7 Definitely will
Ratio scales allow for the identification of absolute differences
between each scale point, and absolute comparisons between raw
responses.
Example 1.
Please circle the number of children under 18 years of age
currently living in your household.
0 1 2 3 4 5 6 7 (if more than 7, please specify ___.)
Chapter 7
MEASUREMENT:
SCALING, RELIABILITY,
VALIDITY
Methods of Scaling;
Rating scales
 Have several response categories and are used to obtain
responses with regard to the object, event, or person studied.
Ranking scales
 Make comparisons between or among objects, events, or
persons and extract the preferred choices and ranking
among them.
Rating Scales;
Measurement scales that allow a respondent to register
the degree (or amount) of a characteristic or attribute
possessed by an object directly on the scale.
Types of rating scale formats:
1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scales
5. Semantic differential scale
6. Itemized rating scale
7. Constant sum scale
8. Stapel scale
9. Graphic scale
10. Consensus scale
Rating Scales Formats;
Dichotomous scale
 Is used to obtain a Yes or No answer.
 Nominal scale
Do you own a car?
 Yes
 No
Category scale
 Uses multiple items to elicit a single response.
 Nominal scale
A category rating scale is one in which the response options
provided for a closed-ended question are labeled
with specific verbal descriptions.
Example:
Please rate car model A on each of the following
dimensions:
                    Poor Fair Good V. good Excellent
a) Durability        [ ]  [ ]  [ ]   [ ]     [ ]
b) Fuel consumption  [ ]  [ ]  [ ]   [ ]     [ ]
A simple category scale has only two response categories
(or scale points), both of which are labeled.
Example:
Please rate brand A on each of the following dimensions:
                    Poor Excellent
a) Durability        [ ]    [ ]
b) Fuel consumption  [ ]    [ ]
Likert scale
 Is designed to examine how strongly subjects
agree or disagree with statements on a
5-point scale.
 Interval scale
The Likert Scale (Summated Ratings Scale)
 A multiple item rating scale in which the degree of an attribute
possessed by an object is determined by asking respondents to
agree or disagree with a series of positive and/or negative
statements describing the object.
 Example: Attitude toward buying from the Internet
Response options for each statement: Totally disagree, Disagree,
Neutral, Agree, Totally agree
a) Shopping takes much longer on the Internet [ ] [ ] [ ] [ ] [ ]
b) It is a good thing that Saudi consumers have the opportunity
to buy products through the Internet [ ] [ ] [ ] [ ] [ ]
c) Buying products over the Internet is not a
sensible thing to do [ ] [ ] [ ] [ ] [ ]
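A summated rating can be sketched in a few lines. This is an illustration only: the function name `summated_score` and the sample responses are hypothetical, and reverse-coding the negatively worded statements (here a and c) before summing is a common convention assumed for the sketch rather than something the slide specifies.

```python
def summated_score(responses, negative_items, points=5):
    """Sum 1..points responses, reverse-coding negatively worded items.

    responses: dict mapping item label -> chosen scale point
    negative_items: labels of negatively worded statements (assumed)
    """
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"{item}: response {value} is outside 1..{points}")
        # On a 5-point scale, reverse coding maps 1->5, 2->4, ..., 5->1.
        total += (points + 1 - value) if item in negative_items else value
    return total


# Hypothetical respondent: disagrees with both negative statements (a, c)
# and agrees with the positive one (b).
responses = {"a": 2, "b": 4, "c": 1}
score = summated_score(responses, negative_items={"a", "c"})
print(score)  # 4 + 4 + 5 = 13
```

A higher total then consistently indicates a more favourable attitude, whichever direction an individual statement was worded in.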
Likert scale
My work is very interesting
 Strongly disagree
 Disagree
 Neither agree nor disagree
 Agree
 Strongly agree
Semantic differential scale
 Several bipolar attributes are identified at the
extremes of the scale, and respondents are asked to
indicate their attitudes.
 Interval scale
A semantic differential rating scale is one in which bipolar
adjectives are placed at both ends (or poles) of the scale, and
response options are expressed as “semantic” space.
Example:
Please rate car model A on each of the following dimensions:
Durable ---:-X-:---:---:---:---:--- Not durable
Low fuel consumption ---:---:---:---:---:-X-:--- High fuel consumption
Numerical scale
 Similar to the semantic differential scale, with the difference
that numbers on a 5-point or 7-point scale are provided, with
bipolar adjectives at both ends.
 Interval scale
Durability: Poor 1 2 3 4 5 6 7 Excellent
Durability: Durable 1 2 3 4 5 6 7 Not Durable
Itemized rating scale
 A 5-point or 7-point scale with anchors, as needed, is
provided for each item and the respondent states the
appropriate number on the side of each item, or circles the
relevant number against each item.
 Interval scale
I will be changing my job within the next 12 months
1 = Very Unlikely, 2 = Unlikely, 3 = Neither Unlikely Nor Likely,
4 = Likely, 5 = Very Likely
Fixed or constant sum scale
 The respondents are here asked to distribute a given number
of points across various items.
 Ordinal scale
 A constant-sum rating scale is one in which respondents divide a
constant sum among different attributes of an object (usually to
indicate the relative importance of each attribute).
 Assumed to have ratio level properties.
 Example: Divide 100 points among the following dimensions to
indicate their level of importance to you when you purchase a
car:
Durability ___
Fuel Consumption ___
Total 100
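A constant-sum response is only usable if the points add up to the stated total. The sketch below checks that and converts the allocation into relative importance shares; the function name `importance_shares` and the 60/40 allocation are hypothetical.

```python
def importance_shares(allocation, total=100):
    """Validate a constant-sum response and return each attribute's share.

    allocation: dict mapping attribute -> points given by the respondent
    """
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points must not be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"points must add up to {total}")
    return {attr: points / total for attr, points in allocation.items()}


# Hypothetical respondent who weights durability above fuel consumption.
shares = importance_shares({"Durability": 60, "Fuel Consumption": 40})
print(shares)
```

Because the shares are proportions of a fixed total, statements such as "durability matters one and a half times as much as fuel consumption" become possible, which is the sense in which the slide says the scale is assumed to have ratio level properties.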
Stapel scale
 This scale simultaneously measures both the direction and
intensity of the attitude toward the items under study.
 A simplified version of the semantic differential scale in which
a single adjective or descriptive phrase is used instead of
bipolar adjectives.
 Interval data
Model A
-3 -2 -1 Durable Car 1 2 3
-3 -2 -1 Good Fuel Consumption 1 2 3
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.
SEARS
+5 +5
+4 +4
+3 +3
+2 +2X
+1 +1
HIGH QUALITY POOR SERVICE
-1 -1
-2 -2
-3 -3
-4X -4
-5 -5
The data obtained by using a Stapel scale can be analyzed in the
same way as semantic differential data.
Graphic rating scale
 A graphical representation helps the respondents to indicate
on this scale their answers to a particular question by placing a
mark at the appropriate point on the line.
 Rating scales in which respondents rate an object on a
graphic continuum, usually a straight line.
 Modified versions are the ladder scale and happy face scale.
 Ordinal scale
Paired Comparison
 Used when, among a small number of objects, respondents are
asked to choose between two objects at a time.
Example: Choose any combination
Package A: 512 kbps, 8 GB, Rs. 750
Package B: 1 Mbps, 8 GB, Rs. 850
Package C: 512 kbps, 12 GB, Rs. 900
Package D: 1 Mbps, 12 GB, Rs. 1000
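With four packages, a respondent faces 4 × 3 / 2 = 6 pairs. The sketch below enumerates the pairs and turns one respondent's choices into a rank order by counting wins; the `choices` data and the `tally` helper are hypothetical, and ranking by number of wins is one simple way to summarise paired comparisons, assumed here for illustration.

```python
from itertools import combinations

packages = ["A", "B", "C", "D"]

# Every unordered pair of packages is presented once: n(n-1)/2 = 6 pairs.
pairs = list(combinations(packages, 2))


def tally(choices):
    """Rank packages by how often they were preferred.

    choices: dict mapping each presented pair -> the package chosen
    """
    wins = {p: 0 for p in packages}
    for pair, winner in choices.items():
        if winner not in pair:
            raise ValueError(f"{winner} was not offered in pair {pair}")
        wins[winner] += 1
    return sorted(wins, key=wins.get, reverse=True)


# Hypothetical answers from a single respondent.
choices = {
    ("A", "B"): "B", ("A", "C"): "A", ("A", "D"): "D",
    ("B", "C"): "B", ("B", "D"): "B", ("C", "D"): "D",
}
ranking = tally(choices)
print(len(pairs), ranking)
```

The number of pairs grows quadratically with the number of objects, which is why the slide restricts the method to a small number of objects.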

Ranking Scales Formats;
Forced Choice
 Enable respondents to rank objects relative to one another,
among the alternatives provided.
Comparative Scale
 Provides a benchmark or a point of reference to assess
attitudes toward the current object, event, or situation under
study.
Characteristics of Different Types of Rating Scales
Rating Scale | Subject must | Advantages | Disadvantages
2. Category scale | Indicate a response category | Flexible, easy to respond | Ambiguous items, few categories, only gross distinction
3. Likert scale | Evaluate statements on a 5-point scale | Easiest scale to construct | Hard to judge what a single score means
4. Semantic differential and numerical scales | Choose points between bipolar adjectives on relative dimensions | Easy to construct, norms exist for comparison, e.g. profile analysis | Bipolar adjectives must be found, data may be ordinal, not interval
5. Constant sum scale | Divide a constant sum among response alternatives | Scale approximates an interval measure | Difficult for respondents with low education levels
6. Stapel scale | Choose point on scale with 1 center adjective | Easier to construct than semantic differential | Endpoints are numerical, not verbal
7. Graphic scale | Choose a point on a continuum | Visual impact, unlimited scale points | No standard answers
8. Graphic scale-picture response | Choose a visual picture | Visual impact, easy for poor readers | Hard to attach a verbal explanation to response
Goodness of Measures
Understanding Validity and Reliability
Figure 8.1 Illustrations of Possible Reliability and Validity Situations in
Measurement
Situation 1: Neither Reliable nor Valid
Situation 2: Highly Reliable but Not Valid
Situation 3: Highly Reliable and Valid
Testing Goodness of Measures: Forms of Reliability and Validity.
Goodness of data
Reliability (accuracy in measurement)
  Stability: Test-retest reliability, Parallel-form reliability
  Consistency: Interitem consistency reliability, Split-half reliability
Validity (are we measuring the right thing?)
  Logical validity (content): Face validity
  Criterion-related validity: Predictive, Concurrent
  Congruent validity (construct): Convergent, Discriminant
 It is important to make sure that the instrument that we develop to
measure a particular concept is indeed accurately measuring the
variable, and that in fact, we are actually measuring the concept
that we set out to measure.
 This ensures that in operationally defining perceptual and
attitudinal variables, we have not overlooked some important
dimensions and elements or included some irrelevant ones.
Item Analysis
 Item analysis is done to see if the items in the instrument belong
there or not.
 Each item is examined for its ability to discriminate between those
subjects whose total scores are high and those with low scores.
 In item analysis, the means of the high-score group and the
low-score group are tested to detect significant differences
through the t-values.
 The items with a high t-value (the test identifies the
highly discriminating items in the instrument) are then included in
the instrument.
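The procedure above can be sketched as follows. Several details are assumptions made for the illustration, since the slide does not fix them: the high and low groups are taken as the top and bottom halves by total score, the t-value is Welch's unequal-variance statistic, and the response matrix is invented. The names `t_value` and `item_t_values` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_value(high, low):
    """Welch's t statistic for the difference between two group means."""
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se


def item_t_values(rows, fraction=0.5):
    """Return one t-value per item, comparing high and low scorers.

    rows: one list of item scores per respondent
    fraction: share of respondents placed in each group (assumed 0.5)
    """
    ranked = sorted(rows, key=sum)          # order respondents by total score
    n = int(len(ranked) * fraction)
    if n < 2:
        raise ValueError("each group needs at least two respondents")
    low, high = ranked[:n], ranked[-n:]
    n_items = len(rows[0])
    return [
        t_value([r[i] for r in high], [r[i] for r in low])
        for i in range(n_items)
    ]


# Hypothetical responses of six subjects to three 5-point items.
rows = [
    [5, 4, 3],
    [4, 5, 2],
    [5, 5, 4],
    [1, 2, 3],
    [2, 1, 4],
    [1, 1, 2],
]
t_values = item_t_values(rows)
print([round(t, 2) for t in t_values])
```

In this invented data set the first two items separate high and low scorers sharply, while the third has identical means in both groups and would be a candidate for removal.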
Reliability
 The reliability of a measure indicates the extent to which it
is without bias (error free) and hence ensures consistent
measurement across time and across the various items in
the instrument.
 In other words, the reliability of a measure is an indication
of the stability and consistency with which the instrument
measures the concept and helps to assess the “goodness”
of a measure.
Stability of Measures
 The ability of a measure to remain the same over time —despite
uncontrollable testing conditions or the state of the respondents
themselves—is indicative of its stability and low vulnerability to
changes in the situation.
 This attests to its “goodness” because the concept is stably
measured, no matter when it is done. Two tests of stability are
test-retest reliability and parallel-form reliability.


Test-Retest Reliability
 The reliability coefficient obtained with a repetition of the same
measure on a second occasion is called test-retest reliability.
 That is, when a questionnaire is administered to a set of
respondents now, and again to the same respondents, say,
several weeks to 6 months later, then the correlation between
the scores obtained at the two different times from one and the
same set of respondents is called the test-retest coefficient.
 The higher it is, the better the test-retest reliability, and
consequently, the stability of the measure across time.
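The test-retest coefficient is a correlation between two administrations of the same questionnaire. The sketch below assumes the Pearson product-moment correlation, which the slide does not name explicitly, and uses invented scores for five respondents; `pearson_r`, `time1`, and `time2` are hypothetical names.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of the same five respondents on two occasions.
time1 = [20, 25, 30, 35, 40]
time2 = [22, 24, 31, 33, 41]

test_retest = pearson_r(time1, time2)
print(round(test_retest, 3))
```

The same function applies to parallel-form reliability if `time2` is replaced by the scores on the second, comparable form.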
Parallel-Form Reliability
 When responses on two comparable sets of measures tapping
the same construct are highly correlated, we have parallel-form
reliability.
 Both forms have similar items and the same response format, the
only changes being the wordings and the order or sequence of
the questions.
 What we try to establish here is the error variability resulting
from wording and ordering of the questions.
 If two such comparable forms are highly correlated the measures
are reasonably reliable.
Interitem Consistency Reliability
 This is a test of the consistency of respondents’ answers to all
the items in a measure.
 To the degree that items are independent measures of the same
concept, they will be correlated with one another.
 The most popular test of interitem consistency reliability is
Cronbach’s coefficient alpha (Cronbach’s alpha; Cronbach,
1946), which is used for multipoint-scaled items, and the
Kuder-Richardson formulas (Kuder & Richardson, 1937), used for
dichotomous items.
 The higher the coefficients, the better the measuring instrument.
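Cronbach's alpha can be computed directly from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The slide does not give the formula, so it is supplied here as background; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-respondent item-score lists."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to three 5-point items.
rows = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]
alpha = cronbach_alpha(rows)
print(round(alpha, 3))
```

When the items move together across respondents, the variance of the totals is large relative to the summed item variances and alpha approaches 1.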

Split-Half Reliability
 Split-half reliability reflects the correlations between two halves
of an instrument.
 The estimates would vary depending on how the items in the
measure are split into two halves.
 Split-half reliabilities could be higher than Cronbach’s alpha only
in the circumstance of there being more than one underlying
response dimension tapped by the measure and when certain
other conditions are met as well.
 Hence, in almost all cases, Cronbach’s alpha can be considered
a perfectly adequate index of the interitem consistency reliability.
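The dependence on how the items are split can be seen in a small sketch. The data, the two splits, and the helper names are hypothetical. The Spearman-Brown step-up, 2r / (1 + r), is a commonly used correction for the halved test length and is added here as background; the slide itself only mentions the correlation between halves.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(rows, half_a, half_b):
    """Correlate respondents' scores on two halves of the instrument."""
    a = [sum(r[i] for i in half_a) for r in rows]
    b = [sum(r[i] for i in half_b) for r in rows]
    return pearson_r(a, b)


def spearman_brown(r):
    """Step a half-test correlation up to full test length (added here)."""
    return 2 * r / (1 + r)


# Hypothetical answers of five respondents to four 5-point items.
rows = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]
odd_even = split_half_r(rows, half_a=[0, 2], half_b=[1, 3])
first_last = split_half_r(rows, half_a=[0, 1], half_b=[2, 3])
print(round(odd_even, 3), round(first_last, 3))
print(round(spearman_brown(odd_even), 3))
```

The two splits of the same data give different coefficients, which illustrates why the estimate varies with the split.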

Understanding Validity and Reliability
5. Validity
Several types of validity tests are used to test the goodness of measures and
writers use different terms to denote them. For the sake of clarity, we may
group validity tests under three broad headings: content validity,
criterion-related validity, and construct validity.
5.1 Content Validity
Content validity ensures that the measure includes an adequate and
representative set of items that tap the concept. The more the scale items
represent the domain or universe of the concept being measured, the greater
the content validity. To put it differently, content validity is a function of how
well the dimensions and elements of a concept have been delineated.
Face validity is considered by some as a basic and a very minimum index of
content validity. Face validity indicates that the items that are intended to
measure a concept do, on the face of it, look like they measure the concept.
5.2 Criterion-Related Validity
 Criterion-related validity is established when the measure differentiates
individuals on a criterion it is expected to predict. This can be done by
establishing concurrent validity or predictive validity.
 Concurrent validity is established when the scale discriminates individuals
who are known to be different; that is, they should score differently on the
instrument.
5.3 Construct Validity
Construct validity testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed. This is assessed
through convergent and discriminant validity, which are explained below.
Convergent validity is established when the scores obtained with two different
instruments measuring the same concept are highly correlated.
Discriminant validity is established when, based on theory, two variables are
predicted to be uncorrelated, and the scores obtained by measuring them are
indeed empirically found to be so.
Thanks
Chapter 9:
Measurement: Scaling, Reliability, Validity
Table 9.1 Types of Validity
Validity | Description
Content validity | Does the measure adequately measure the concept?
Face validity | Do “experts” validate that the instrument measures what its name suggests it measures?
Criterion-related validity | Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity | Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity | Does the measure differentiate individuals in a manner that helps to predict a future criterion?
Construct validity | Does the instrument tap the concept as theorized?
Convergent validity | Do two instruments measuring the concept correlate highly?
Discriminant validity | Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?
Reliability
 Indicates the extent to which it is without bias (error
free) and hence ensures consistent measurement
across time and across the various items in the
instrument.
Goodness of Measures-Reliability
Stability of measures:
 Test-retest reliability
 Parallel-form reliability
 Correlation
Internal consistency of measures:
 Interitem consistency reliability
 Cronbach’s alpha
 Split-half reliability
 Correlation
Goodness of Measures-Validity
Validity
 Ensures the ability of a scale to measure the intended concept.
 Content validity
 Criterion related validity
 Construct validity
Content validity
 Ensures that the measure includes an adequate and
representative set of items that tap the concept.
 A panel of judges
Criterion-related validity
 Is established when the measure differentiates individuals on a
criterion it is expected to predict
 Concurrent validity: established when the scale differentiates
individuals who are known to be different
 Predictive validity: indicates the ability of the measuring
instrument to differentiate among individuals with reference
to a future criterion
 Correlation
Construct validity
 Testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed.
 Convergent validity: established when the scores obtained
with two different instruments measuring the same concept are
highly correlated
 Discriminant validity: established when, based on theory, two
variables are predicted to be uncorrelated, and the scores
obtained by measuring them are indeed empirically found to
be so
 Correlation, factor analysis, convergent-discriminant
techniques, multitrait-multimethod analysis
Diagram 9.1 Goodness of data: reliability (accuracy in measurement), covering
stability and consistency, and validity (are we measuring the right thing?)

