
CH. 7
Scaling

 Scaling is a procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question.
Four Scales of Measurement;

Nominal Scales
Ordinal Scales
Interval Scales
Ratio Scales
Measurement and Scaling

 A scale is a mechanism by which individuals are distinguished as to how they differ from one another on the variables of interest.
 A scale is a continuous series of categories and has been defined as any series of items that are arranged progressively according to value or magnitude, into which an item can be placed according to its quantification.
 Four popular scales in business research are:

1. Nominal scales
2. Ordinal scales
3. Interval scales
4. Ratio scales

SCALES

 Nominal Scales: split data into groups, e.g., men, women
 Ordinal Scales: rank data in some order, e.g., exercising for 20 minutes is good, for 30 minutes is better, for 40 minutes is best
 Interval Scales: set data on a continuum, e.g.
very low 1 2 3 4 5 very high
 Ratio Scales: start with absolute zero and indicate proportion, e.g.
0 5 10 (ten is twice as big as five)
Measurement and Scaling

 A Nominal Scale is the simplest of the four scale types; the numbers or letters assigned to objects serve as labels for identification or classification.

 Example: variable of gender

 Males = 1, Females = 2
 Sales Zone A = Islamabad, Sales Zone B = Rawalpindi
 Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Mirinda

Measurement and Scaling

 An Ordinal Scale is one that arranges objects or alternatives according to their magnitude.
 Examples:
 Career Opportunities = Moderate, Good, Excellent
 Investment Climate = Bad, Inadequate, Fair, Good, Very good
 Merit = A grade, B grade, C grade, D grade

A problem with ordinal scales is that the difference between categories on the scale is hard to quantify, i.e., excellent is better than good, but how much better is excellent?

Measurement and Scaling
 An Interval Scale allows us to perform certain arithmetical operations on the data collected from respondents. This scale measures the distance between any two points on the scale.
 It taps the differences and the magnitudes of the differences in the variable.

Measurement and Scaling

 A Ratio Scale is a scale that possesses absolute rather than relative qualities and has an absolute zero point.
 Examples:
 Money
 Weight
 Distance
 Temperature on the Kelvin Scale
Ratio scales allow comparisons of the differences of magnitude (e.g. of attitudes) as well as determinations of the actual strength of the magnitude.
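The role of the absolute zero point can be illustrated with temperature. The sketch below is only an illustration: the two Celsius readings are invented, and Celsius is brought in as a familiar interval scale that the slide itself does not mention. Differences agree on both scales, but only the Kelvin values support a statement such as "twice as much".

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Hypothetical readings, chosen only for illustration.
warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales and agree with each other.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# The Celsius quotient depends on an arbitrary zero, so it is not meaningful.
naive_ratio = warm_c / cool_c

# The Kelvin quotient is meaningful because zero Kelvin is an absolute zero.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(kelvin_ratio, 3))
```

The quotient of the Celsius values suggests "twice as warm", while the Kelvin quotient shows the two temperatures differ by only a few percent.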

Measurement and Scaling

Type of Scale | Numerical Operation | Descriptive Statistics
Nominal | Counting | Frequency in each category, percentage in each category, mode
Ordinal | Rank ordering | Median, range, percentile ranking
Interval | Arithmetic operations on intervals between numbers | Mean, standard deviation, variance
Ratio | Arithmetic operations on actual quantities | Geometric mean, coefficient of variation
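The table above can be read cumulatively: a higher scale level keeps the statistics of the levels below it. That cumulative reading is an assumption added here rather than something the table states, and the names `STATISTICS_BY_SCALE` and `permissible_statistics` are hypothetical. A minimal sketch:

```python
# Statistics introduced at each level, copied from the table above.
STATISTICS_BY_SCALE = {
    "nominal": ["frequency", "percentage", "mode"],
    "ordinal": ["median", "range", "percentile ranking"],
    "interval": ["mean", "standard deviation", "variance"],
    "ratio": ["geometric mean", "coefficient of variation"],
}

# Levels ordered from the weakest to the strongest scale.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]


def permissible_statistics(level):
    """Return the statistics usable at `level`, including those of lower levels."""
    position = SCALE_ORDER.index(level)
    usable = []
    for name in SCALE_ORDER[: position + 1]:
        usable.extend(STATISTICS_BY_SCALE[name])
    return usable


print(permissible_statistics("ordinal"))
```

Under this reading, a mean is not reported for ordinal data, while a mode can still be reported for ratio data.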

Four Scales of Measurement;

Scale | What is recorded | Runner 7 | Runner 8 | Runner 3
Nominal | Numbers assigned to runners | 7 | 8 | 3
Ordinal | Rank order of winners | Third place | Second place | First place
Interval | Performance rating on a 0 to 10 scale | 8.2 | 9.1 | 9.6
Ratio | Time to finish, in seconds | 15.2 | 14.1 | 13.4
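The runner example can be traced in a few lines. The sketch below uses the finishing times from the example and shows that the ordinal ranking can be derived from the ratio data, but not the other way round. The variable names are illustrative only.

```python
# Finishing times in seconds, keyed by the runner's (nominal) number.
finish_times = {7: 15.2, 8: 14.1, 3: 13.4}

# Ordinal information: sorting by time keeps only the order of finish.
ranking = sorted(finish_times, key=finish_times.get)  # fastest runner first
places = {runner: place for place, runner in enumerate(ranking, start=1)}

# Ratio information: times have a true zero, so quotients are meaningful.
slowest_to_fastest = finish_times[7] / finish_times[3]

print(places)
print(round(slowest_to_fastest, 3))
```

The runner numbers themselves carry no order: runner 3 finishes first even though 3 is the smallest label.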
Classification of Scaling Techniques;

Scales
Nominal: Dichotomous, Category
Ordinal: Fixed sum, Graphic rating
Interval: Likert, Semantic differential, Numerical, Itemized rating, Stapel
Ratio: none listed
Four Scales of Measurement;

Nominal scales require the respondent only to provide some type of descriptor as the raw response.

Example.
Please indicate your current marital status.
__ Married __ Single __ Single, never married __ Widowed
Four Scales of Measurement;

Ordinal scales allow the respondent to express “relative magnitude” between the raw responses to a question.

Example.
Which one statement best describes your opinion of an Intel PC processor?
__ Higher than AMD’s PC processor
__ About the same as AMD’s PC processor
__ Lower than AMD’s PC processor
Four Scales of Measurement;

 Interval scales demonstrate the absolute differences between each scale point.

Example.
How likely are you to recommend the new phone to a friend?
Definitely will not 1 2 3 4 5 6 7 Definitely will
Four Scales of Measurement;

Ratio scales allow for the identification of absolute differences between each scale point, and absolute comparisons between raw responses.

Example 1.
Please circle the number of children under 18 years of age currently living in your household.
0 1 2 3 4 5 6 7 (if more than 7, please specify ___.)
Chapter 7

MEASUREMENT: SCALING, RELIABILITY, VALIDITY
Methods of Scaling;

Rating scales

 Have several response categories and are used to obtain responses with regard to the object, event, or person studied.

Ranking scales

 Make comparisons between or among objects, events, or persons and extract the preferred choices and ranking among them.
Rating Scales;
Measurement scales that allow a respondent to register
the degree (or amount) of a characteristic or attribute
possessed by an object directly on the scale.

Types of rating scale formats:

1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scales
5. Semantic differential scale
6. Itemized rating scale
7. Constant sum scale
8. Stapel scale
9. Graphic scale
10. Consensus scale
Rating Scales Formats;

Dichotomous scale
 Is used to obtain a Yes or No answer.
 Nominal scale

Do you own a car?

 Yes
 No
Rating Scales Formats;

Category scale
 Uses multiple items to elicit a single response.
 Nominal scale
Rating Scales Formats;

A Category rating scale is one in which the response options provided for a closed-ended question are labeled with specific verbal descriptions.
Example:
Please rate car model A on each of the following dimensions:
Poor Fair Good V. good Excellent

a) Durability [ ] [ ] [ ] [ ] [ ]

b) Fuel consumption [ ] [ ] [ ] [ ] [ ]
Rating Scales Formats;

A simple category scale has only two response categories (or scale points), both of which are labeled.

Example:
Please rate brand A on each of the following dimensions:
poor excellent
a) Durability [ ] [ ]
b) Fuel consumption [ ] [ ]
Rating Scales Formats;

Likert scale
 Is designed to examine how strongly subjects
agree or disagree with statements on a
5-point scale.

 Interval scale
Rating Scales Formats;

The Likert Scale (Summated Ratings Scale)

 A multiple-item rating scale in which the degree of an attribute possessed by an object is determined by asking respondents to agree or disagree with a series of positive and/or negative statements describing the object.
 Example:

Attitude toward buying from the Internet

Response options for each statement: Totally disagree | Disagree | Neutral | Agree | Totally agree

a) Shopping takes much longer on the Internet [ ] [ ] [ ] [ ] [ ]
b) It is a good thing that Saudi consumers have the opportunity to buy products through the Internet [ ] [ ] [ ] [ ] [ ]
c) Buying products over the Internet is not a sensible thing to do [ ] [ ] [ ] [ ] [ ]
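A summated rating can be sketched as follows. Coding the answers 1 to 5 and reverse-scoring the negatively worded statements (a and c in the example) are common practice but are assumptions here, since the slide does not say how the items are scored. The function name and the sample answers are hypothetical.

```python
SCALE_POINTS = 5  # 1 = totally disagree ... 5 = totally agree (assumed coding)

# True marks statements worded against Internet buying (a and c in the example).
NEGATIVELY_WORDED = {"a": True, "b": False, "c": True}


def summated_score(responses):
    """Add up one respondent's answers, reverse-scoring negative statements."""
    total = 0
    for item, answer in responses.items():
        if not 1 <= answer <= SCALE_POINTS:
            raise ValueError(f"answer for item {item!r} is outside 1..{SCALE_POINTS}")
        if NEGATIVELY_WORDED[item]:
            # Reverse scoring maps 1 to 5, 2 to 4, and so on.
            answer = SCALE_POINTS + 1 - answer
        total += answer
    return total


# Hypothetical respondent who is favourable toward Internet buying.
respondent = {"a": 2, "b": 4, "c": 1}
print(summated_score(respondent))  # 4 + 4 + 5 = 13
```

With this coding a higher total always indicates a more favourable attitude, whichever way the individual statements are worded.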
Rating Scales Formats;

Likert scale
My work is very interesting

 Strongly disagree
 Disagree
 Neither agree nor disagree
 Agree
 Strongly agree
Rating Scales Formats;

Semantic differential scale


 Several bipolar attributes are identified at the
extremes of the scale, and respondents are asked to
indicate their attitudes.
 Interval scale
Rating Scales Formats;

A Semantic Differential rating scale is one in which bipolar adjectives are placed at both ends (or poles) of the scale, and response options are expressed as “semantic” space.

Example:
Please rate car model A on each of the following dimensions:
Durable ---:-X-:---:---:---:---:--- Not durable
Low fuel consumption ---:---:---:---:---:-X-:--- High fuel consumption
Rating Scales Formats;

Numerical scale
 Similar to the semantic differential scale, with the difference
that numbers on a 5-point or 7-point scale are provided, with
bipolar adjectives at both ends.
 Interval scale

Durability: Poor 1 2 3 4 5 6 7 Excellent

Durability: Durable 1 2 3 4 5 6 7 Not Durable
Rating Scales Formats;

Itemized rating scale


 A 5-point or 7-point scale with anchors, as needed, is
provided for each item and the respondent states the
appropriate number on the side of each item, or circles the
relevant number against each item.
 Interval scale

I will be changing my job within the next 12 months

1 = Very Unlikely
2 = Unlikely
3 = Neither Unlikely Nor Likely
4 = Likely
5 = Very Likely
Rating Scales Formats;

Fixed or constant sum scale


 The respondents are here asked to distribute a given number
of points across various items.
 Ordinal scale
Rating Scales Formats;

 A Constant-Sum rating scale is one in which respondents divide a constant sum among different attributes of an object (usually to indicate the relative importance of each attribute).
 Assumed to have ratio level properties.

 Example: Divide 100 points among the following dimensions to indicate their level of importance to you when you purchase a car:
Durability
Fuel Consumption
Total 100
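A constant-sum response is only usable if the points add up to the required total. The sketch below checks one response and turns it into importance shares. The function name and the 60/40 allocation are hypothetical.

```python
def check_constant_sum(allocation, total=100):
    """Validate a constant-sum response and return each attribute's share."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points must not be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"points must add up to {total}")
    return {attribute: points / total for attribute, points in allocation.items()}


# Hypothetical respondent who weights durability above fuel consumption.
shares = check_constant_sum({"Durability": 60, "Fuel Consumption": 40})
print(shares)
```

Because the points are assumed to have ratio properties, this respondent can be said to rate durability one and a half times as important as fuel consumption.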
Rating Scales Formats;

Stapel scale
 This scale simultaneously measures both the direction and intensity of the attitude toward the items under study.
 A simplified version of the semantic differential scale in which
a single adjective or descriptive phrase is used instead of
bipolar adjectives.
 Interval data

Model A
-3 -2 -1 Durable Car 1 2 3
-3 -2 -1 Good Fuel Consumption 1 2 3
Rating Scales Formats;
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.

SEARS

+5 +5
+4 +4
+3 +3
+2 +2X
+1 +1
HIGH QUALITY POOR SERVICE
-1 -1
-2 -2
-3 -3
-4X -4
-5 -5

The data obtained by using a Stapel scale can be analyzed in the same way as semantic differential data.
Rating Scales Formats;

Graphic rating scale


 A graphical representation helps the respondents to indicate on this scale their answers to a particular question by placing a mark at the appropriate point on the line.
 Rating scales in which respondents rate an object on a
graphic continuum, usually a straight line.
 Modified versions are the ladder scale and happy face scale.

 Ordinal scale
Rating Scales Formats;

Paired Comparison
 Used when, among a small number of objects, respondents are
asked to choose between two objects at a time.
Example; Choose any combination

Package -A 512 kbps 8 GB Rs: 750

Package -B 1 Mbps 8 GB Rs: 850

Package -C 512 kbps 12 GB Rs: 900

Package -D 1 Mbps 12 GB Rs: 1000
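The number of choices a respondent must make grows quickly with the number of objects, which is why the technique suits a small set. The sketch below enumerates the pairs for the four packages above using `itertools.combinations`. The choices recorded in `preferred` are invented purely for illustration.

```python
from collections import Counter
from itertools import combinations

packages = ["A", "B", "C", "D"]

# Every unordered pair is shown to the respondent once: n * (n - 1) / 2 pairs.
pairs = list(combinations(packages, 2))

# Hypothetical answers: the package the respondent picked from each pair.
preferred = {
    ("A", "B"): "B",
    ("A", "C"): "A",
    ("A", "D"): "D",
    ("B", "C"): "B",
    ("B", "D"): "B",
    ("C", "D"): "D",
}

# Counting how often each package was chosen gives a simple preference order.
wins = Counter(preferred[pair] for pair in pairs)

print(len(pairs))
print(wins.most_common())
```

Four packages need 6 comparisons, whereas ten would already need 45.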


Ranking Scales Formats;

Forced Choice
 Enable respondents to rank objects relative to one another,
among the alternatives provided.
Ranking Scales Formats;

Comparative Scale
 Provides a benchmark or a point of reference to assess
attitudes toward the current object, event, or situation under
study.
Characteristics of Different Types of Rating Scales

Rating Scale | Subject must | Advantages | Disadvantages
2. Category scale | Indicate a response category | Flexible, easy to respond | Ambiguous items, few categories, only gross distinction
3. Likert scale | Evaluate statements on a 5-point scale | Easiest scale to construct | Hard to judge what a single score means
4. Semantic differential and numerical scales | Choose points between bipolar adjectives on relative dimensions | Easy to construct, norms exist for comparison, e.g. profile analysis | Bipolar adjectives must be found, data may be ordinal, not interval
5. Constant sum scale | Divide a constant sum among response alternatives | Scale approximates an interval measure | Difficult for respondents with low education levels
6. Stapel scale | Choose point on scale with 1 center adjective | Easier to construct than semantic differential | Endpoints are numerical, not verbal
7. Graphic scale | Choose a point on a continuum | Visual impact, unlimited scale points | No standard answers
8. Graphic scale-picture response | Choose a visual picture | Visual impact, easy for poor readers | Hard to attach a verbal explanation to response
Goodness of Measures;

Goodness of Measures
Understanding Validity and Reliability
Figure 8.1 Illustrations of Possible Reliability and Validity Situations in Measurement

Situation 1: Neither Reliable nor Valid
Situation 2: Highly Reliable but Not Valid
Situation 3: Highly Reliable and Valid
Testing Goodness of Measures: Forms of Reliability and Validity.

Goodness of data
  Reliability (accuracy in measurement)
    Stability
      Test-retest reliability
      Parallel-form reliability
    Consistency
      Interitem consistency reliability
      Split-half reliability
  Validity (are we measuring the right thing?)
    Logical validity (content)
      Face validity
    Criterion-related validity
      Predictive
      Concurrent
    Congruent validity (construct)
      Convergent
      Discriminant


Goodness of Measures;

Goodness of Measures
 It is important to make sure that the instrument that we develop to
measure a particular concept is indeed accurately measuring the
variable, and that in fact, we are actually measuring the concept
that we set out to measure.

 This ensures that in operationally defining perceptual and attitudinal variables, we have not overlooked some important dimensions and elements or included some irrelevant ones.
Goodness of Measures;

Item Analysis
 Item analysis is done to see if the items in the instrument belong there or not.
 Each item is examined for its ability to discriminate between those subjects whose total scores are high and those with low scores.
 In item analysis, the means of the high-score group and the low-score group are tested for significant differences through the t-values.
 The items with a high t-value (the t-test identifies the highly discriminating items in the instrument) are then included in the instrument.
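The comparison of group means can be sketched as below. The slide does not say which form of the t-test is meant, so the unequal-variance (Welch) t-value is used here as an assumption, and all scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_value(high_group, low_group):
    """Unequal-variance (Welch) t-value for the difference of two group means."""
    standard_error = sqrt(
        variance(high_group) / len(high_group) + variance(low_group) / len(low_group)
    )
    return (mean(high_group) - mean(low_group)) / standard_error


# Hypothetical answers to two items, split by respondents' total scores.
item1_high, item1_low = [5, 4, 5, 4, 5], [2, 1, 2, 3, 2]
item2_high, item2_low = [3, 4, 3, 2, 3], [3, 3, 4, 2, 3]

print(round(t_value(item1_high, item1_low), 2))  # large: item 1 discriminates
print(round(t_value(item2_high, item2_low), 2))  # zero: item 2 does not
```

Item 1 separates the two groups clearly and would be kept, while item 2 is answered the same way by both groups and is a candidate for removal.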
Goodness of Measures;

Reliability
 The reliability of a measure indicates the extent to which it
is without bias (error free) and hence ensures consistent
measurement across time and across the various items in
the instrument.
 In other words, the reliability of a measure is an indication
of the stability and consistency with which the instrument
measures the concept and helps to assess the “goodness”
of a measure.
Goodness of Measures;

Stability of Measures
 The ability of a measure to remain the same over time —despite
uncontrollable testing conditions or the state of the respondents
themselves—is indicative of its stability and low vulnerability to
changes in the situation.
 This attests to its “goodness” because the concept is stably
measured, no matter when it is done. Two tests of stability are
test-retest reliability and parallel-form reliability.


Goodness of Measures;

Test-Retest Reliability
 The reliability coefficient obtained with a repetition of the same
measure on a second occasion is called test-retest reliability.

 That is, when a questionnaire is administered to a set of respondents now, and again to the same respondents, say, several weeks to 6 months later, the correlation between the scores obtained at the two different times from one and the same set of respondents is called the test-retest coefficient.

 The higher it is, the better the test-retest reliability and, consequently, the stability of the measure across time.
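The test-retest coefficient is an ordinary correlation between the two administrations. The sketch below assumes the Pearson product-moment correlation, which the slide does not name explicitly. The helper `pearson_r` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Hypothetical total scores of the same five respondents at two points in time.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

test_retest = pearson_r(first_administration, second_administration)
print(round(test_retest, 2))
```

The same function applies to parallel-form reliability: the two lists would then hold scores on the two comparable forms rather than on two occasions.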
Goodness of Measures;

Parallel-Form Reliability
 When responses on two comparable sets of measures tapping
the same construct are highly correlated, we have parallel-form
reliability.
 Both forms have similar items and the same response format, the
only changes being the wordings and the order or sequence of
the questions.
 What we try to establish here is the error variability resulting
from wording and ordering of the questions.
 If two such comparable forms are highly correlated the measures
are reasonably reliable.
Goodness of Measures;

Interitem Consistency Reliability


 This is a test of the consistency of respondents’ answers to all
the items in a measure.
 To the degree that items are independent measures of the same
concept, they will be correlated with one another.
 The most popular tests of interitem consistency reliability are Cronbach’s coefficient alpha (Cronbach’s alpha; Cronbach, 1946), which is used for multipoint-scaled items, and the Kuder-Richardson formulas (Kuder & Richardson, 1937), used for dichotomous items.
 The higher the coefficients, the better the measuring instrument.
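Cronbach's alpha can be computed from the item variances and the variance of the total score, using alpha = k / (k - 1) * (1 - sum of item variances / variance of totals), where k is the number of items. The sketch below applies this standard formula to invented data. The function name and the responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the measure
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three 5-point items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The items in this made-up data set rise and fall together across respondents, so alpha comes out close to 1. Items that varied independently of one another would pull it toward 0.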


Goodness of Measures;

Split-Half Reliability
 Split-half reliability reflects the correlations between two halves
of an instrument.
 The estimates would vary depending on how the items in the
measure are split into two halves.
 Split-half reliabilities could be higher than Cronbach’s alpha only
in the circumstance of there being more than one underlying
response dimension tapped by the measure and when certain
other conditions are met as well.
 Hence, in almost all cases, Cronbach’s alpha can be considered
a perfectly adequate index of the interitem consistency reliability.
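The dependence of the estimate on the split can be seen with a small example. The sketch below correlates the two half-scores under two different splits of the same four items. The data, the helper names and the use of a plain Pearson correlation without any length correction are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariation / sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )


def split_half(rows, first_half, second_half):
    """Correlate respondents' sums over two disjoint sets of item positions."""
    half_a = [sum(row[i] for i in first_half) for row in rows]
    half_b = [sum(row[i] for i in second_half) for row in rows]
    return pearson_r(half_a, half_b)


# Hypothetical answers of five respondents to four items.
responses = [
    [4, 5, 4, 3],
    [2, 3, 2, 4],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 3, 2],
]

r_first_vs_last = split_half(responses, [0, 1], [2, 3])
r_odd_vs_even = split_half(responses, [0, 2], [1, 3])
print(round(r_first_vs_last, 2), round(r_odd_vs_even, 2))
```

The two splits give different coefficients for the same data, which is one reason a single index such as Cronbach's alpha is usually preferred.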


Understanding Validity and Reliability
Goodness of Measures;

5. Validity
Several types of validity tests are used to test the goodness of measures and
writers use different terms to denote them. For the sake of clarity, we may
group validity tests under three broad headings: content validity,
criterion-related validity, and construct validity.
5.1 Content Validity
Content validity ensures that the measure includes an adequate and
representative set of items that tap the concept. The more the scale items
represent the domain or universe of the concept being measured, the greater
the content validity. To put it differently, content validity is a function of how
well the dimensions and elements of a concept have been delineated.

Face validity is considered by some as a basic and very minimum index of content validity. Face validity indicates that the items that are intended to measure a concept do, on the face of it, look like they measure the concept.
Goodness of Measures;

5.2 Criterion-Related Validity
 Criterion-related validity is established when the measure differentiates individuals on a criterion it is expected to predict. This can be done by establishing concurrent validity or predictive validity.
 Concurrent validity is established when the scale discriminates individuals who are known to be different; that is, they should score differently on the instrument.
Goodness of Measures;

5.3 Construct Validity


Construct validity testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed. This is assessed through convergent and discriminant validity, which are explained below.

Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated.

Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
Thanks
Chapter 9:
Measurement: Scaling, Reliability, Validity

Table 9.1 Types of Validity

Validity | Description
Content validity | Does the measure adequately measure the concept?
Face validity | Do “experts” validate that the instrument measures what its name suggests it measures?
Criterion-related validity | Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity | Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity | Does the measure differentiate individuals in a manner that helps to predict a future criterion?
Construct validity | Does the instrument tap the concept as theorized?
Convergent validity | Do two different instruments measuring the same concept correlate highly?
Discriminant validity | Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?
Goodness of Measures

Reliability
 Indicates the extent to which it is without bias (error
free) and hence ensures consistent measurement
across time and across the various items in the
instrument.
Goodness of Measures-Reliability
Stability of measures:
 Test-retest reliability
 Parallel-form reliability
 Correlation
Internal consistency of measures:
 Interitem consistency reliability
 Cronbach’s alpha
 Split-half reliability
 Correlation
Goodness of Measures-Validity

Validity
 Ensures the ability of a scale to measure the intended concept.
 Content validity
 Criterion related validity
 Construct validity
Goodness of Measures-Validity

Content validity
 Ensures that the measure includes an adequate and
representative set of items that tap the concept.
 A panel of judges
Goodness of Measures-Validity

Criterion related validity


 Is established when the measure differentiates individuals on a
criterion it is expected to predict
 Concurrent validity: established when the scale differentiates
individuals who are known to be different
 Predictive validity: indicates the ability of measuring
instrument to differentiate among individuals with reference
to future criterion
 Correlation
Goodness of Measures-Validity

Construct validity
 Testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed.
 Convergent validity: established when the scores obtained with two different instruments measuring the same concept are highly correlated
 Discriminant validity: established when, based on theory, two
variables are predicted to be uncorrelated, and the scores
obtained by measuring them are indeed empirically found to
be so
 Correlation, factor analysis, convergent-discriminant
techniques, multitrait-multimethod analysis
