
Chap 3

1-1
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Location
–Mode
–Median
–Mean
–Percentiles
–Quartiles

3-2
Mode
• The most frequently occurring value in a data
set
• Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)

• Bimodal -- Data sets that have two modes


• Multimodal -- Data sets that contain more
than two modes
3-3
Median
• Middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Unaffected by extremely large and extremely
small values.

3-4
Arithmetic Mean
• Commonly called ‘the mean’
• The average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including
extreme values
• Computed by summing all values in the data set
and dividing the sum by the number of values in
the data set
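
A minimal sketch of the mode, median, and mean described above, using Python's standard `statistics` module. The data values are hypothetical; 100 is included only to show how an extreme value moves the mean but not the median.

```python
from statistics import mean, median, multimode

# Hypothetical ungrouped data; 100 is a deliberately extreme value.
data = [3, 5, 5, 7, 9, 11, 100]

modes = multimode(data)   # most frequent value(s); two entries would mean bimodal
middle = median(data)     # middle value of the ordered array
average = mean(data)      # sum of all values divided by the number of values

print(modes, middle, average)
```

Here the median stays at 7 while the mean is pulled up to 20 by the single extreme value.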

3-5
Percentiles
• Measures of central tendency that divide a group
of data into 100 parts

Quartiles
• Measures of central tendency that divide a group
of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second
quartile
• Q3: 75% of the data set is below the third
quartile
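
A small sketch of computing quartiles, assuming Python's `statistics.quantiles` with `method="inclusive"`. The data are hypothetical, and textbooks and software packages use slightly different conventions for locating quartiles, so other tools may give somewhat different values.

```python
from statistics import quantiles

# Hypothetical ordered data set of nine values.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# n=4 splits the data into four subgroups and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
```

Passing `n=100` to the same function would return the 99 percentile cut points instead.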

4-6
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread or the
dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
3-7
Range
• The difference between the largest and the
smallest values in a set of data
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
Population Variance
• Average of the squared deviations from the
arithmetic mean
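
The three definitions above can be sketched as follows. The data set is hypothetical and is treated as a whole population, which is why `pvariance` and `pstdev` (the population versions) are used.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical population of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)
data_range = max(data) - min(data)                    # largest minus smallest
mad = sum(abs(x - mu) for x in data) / len(data)      # mean absolute deviation
variance = pvariance(data)                            # average squared deviation
std_dev = pstdev(data)                                # square root of the variance

print(data_range, mad, variance, std_dev)
```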

3-8
Interquartile Range

• Range of values between the first and third
quartiles
• Range of the “middle half”
• Less influenced by extremes

Interquartile Range = Q3 - Q1

3-9
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion


$C.V. = \frac{\sigma}{\mu} (100)$
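
As an illustration of the ratio described above (standard deviation divided by the mean, times 100), here is a short sketch with hypothetical population data.

```python
from statistics import mean, pstdev

# Hypothetical population of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)        # population mean
sigma = pstdev(data)   # population standard deviation

# Coefficient of variation, expressed as a percentage.
cv = 100 * sigma / mu

print(cv)
```

Because the result is unit-free, it can be used to compare the relative dispersion of data sets measured on different scales.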

3-10
Measures of Central Tendency
and Variability: Grouped Data
• Measures of Central Tendency
–Mean
–Median
–Mode
• Measures of Variability
–Variance
–Standard Deviation

3-11
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness

3-12
Skewness

[Figure: three distribution shapes -- negatively skewed, symmetric (not skewed), and positively skewed]

3-13
Coefficient of Skewness
• Summary measure for skewness

$S = \frac{3(\mu - M_d)}{\sigma}$
Where: $\mu$ = mean, $M_d$ = median, $\sigma$ = standard deviation

• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
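
A hedged sketch of the sign rule above, assuming the Pearsonian form of the coefficient, S = 3(mean - median) / standard deviation. The data set and the `describe` helper are hypothetical.

```python
from statistics import mean, median, pstdev

# Hypothetical population of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Assumed Pearsonian coefficient of skewness.
s = 3 * (mean(data) - median(data)) / pstdev(data)

def describe(s):
    """Translate the sign of S into the wording used on the slide."""
    if s < 0:
        return "negatively skewed"
    if s > 0:
        return "positively skewed"
    return "symmetric"

print(s, describe(s))
```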

3-14
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves]

3-15
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
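
The fence formulas above can be sketched as a pair of small helpers. The quartile values passed in are hypothetical, and the labels "mild outlier" and "extreme outlier" are a common convention assumed here; the slide itself does not name them.

```python
def fences(q1, q3):
    """Return (inner, outer) fences as (lower, upper) tuples."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

def classify(x, q1, q3):
    """Label a value by where it falls relative to the fences (assumed labels)."""
    inner, outer = fences(q1, q3)
    if x < outer[0] or x > outer[1]:
        return "extreme outlier"
    if x < inner[0] or x > inner[1]:
        return "mild outlier"
    return "within inner fences"

# Hypothetical quartiles: Q1 = 30, Q3 = 70, so IQR = 40.
print(fences(30, 70))
print(classify(150, 30, 70))
```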

3-16
Chap 4

1-17
Methods of Assigning Probabilities

• Classical method of assigning probability
(rules and laws)
• Relative frequency of occurrence
(cumulated historical data)
• Subjective Probability (personal intuition or
reasoning)

4-18
Classical Probability

• Number of outcomes leading to
the event divided by the total
number of outcomes possible
• Each outcome is equally likely
• Determined a priori -- before
performing the experiment
• Applicable to games of chance
• Objective -- everyone correctly
using the method assigns an
identical probability

$P(E) = \frac{n_e}{N}$
Where:
$N$ = total number of outcomes
$n_e$ = number of outcomes in E
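
A minimal sketch of the classical method, assuming a hypothetical game of chance: one roll of a fair six-sided die, with the event "an even number comes up".

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}
event = {x for x in sample_space if x % 2 == 0}   # outcomes in E

# Number of outcomes in E divided by the total number of outcomes.
p_event = Fraction(len(event), len(sample_space))

print(p_event)
```

The probability is known before any roll is made, which is what "a priori" means here.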

4-19
Relative Frequency Probability
• Based on historical data
• Computed after performing the
experiment
• Number of times an event occurred divided
by the number of trials
• Objective -- everyone correctly using the
method assigns an identical probability

$P(E) = \frac{n_e}{N}$
Where:
$N$ = total number of trials
$n_e$ = number of outcomes producing E
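
A short sketch of the relative frequency method. The historical record below is invented for illustration: each entry marks whether a hypothetical shipment arrived late.

```python
from fractions import Fraction

# Hypothetical historical data: True means the event (late arrival) occurred.
history = [True, False, False, True, False, False, False, True, False, False]

n_e = sum(history)    # number of trials producing the event
N = len(history)      # total number of trials

p_late = Fraction(n_e, N)

print(p_late)
```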

4-20
Subjective Probability
• Comes from a person’s intuition or
reasoning
• Subjective -- different individuals may
(correctly) assign different numeric
probabilities to the same event
• Degree of belief
• Useful for unique (single-trial) experiments
– New product introduction
– Initial public offering of common stock
– Site selection decisions
– Sporting events

4-21
Structure of Probability
• Experiment
• Event
• Elementary Events
• Sample Space
• Unions and Intersections
• Mutually Exclusive Events
• Independent Events
• Collectively Exhaustive Events
• Complementary Events
4-22
Experiment
• Experiment: a process that produces outcomes
– More than one possible outcome
– Only one outcome per trial
• Trial: one repetition of the process
• Elementary Event: cannot be decomposed or
broken down into other events
• Event: an outcome of an experiment
– may be an elementary event, or
– may be an aggregate of elementary events
– usually represented by an uppercase letter, e.g., A,
E1
4-23
Examples
• Interviewing 20 randomly selected consumers
and asking them which brand of appliance
they prefer
• Sampling every 200th bottle of ketchup from
an assembly line and weighing the contents
• Auditing every 10th account to detect any
errors

4-24
Sample Space
• The set of all elementary events for an
experiment
• Methods for describing a sample space
– listing
– tree diagram
– set builder notation
– Venn diagram

4-25
Sample Space: Tree Diagram for
Random Sample of Two Families
[Tree diagram: first family selected, followed by the possible second families]
A: B, C, D
B: A, C, D
C: A, B, D
D: A, B, C
4-26
Union of Sets
• The union of two sets contains an instance
of each element of the two sets.
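$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cup Y = \{1, 2, 3, 4, 5, 6, 7, 9\}$
$C = \{\text{IBM}, \text{DEC}, \text{Apple}\}$
$F = \{\text{Apple}, \text{Grape}, \text{Lime}\}$
$C \cup F = \{\text{IBM}, \text{DEC}, \text{Apple}, \text{Grape}, \text{Lime}\}$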

4-27
Intersection of Sets
• The intersection of two sets contains only
those elements common to the two sets.

$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cap Y = \{4\}$
$C = \{\text{IBM}, \text{DEC}, \text{Apple}\}$
$F = \{\text{Apple}, \text{Grape}, \text{Lime}\}$
$C \cap F = \{\text{Apple}\}$
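
The union and intersection examples from these two slides map directly onto Python's built-in `set` type. This is only an illustrative sketch using the same sets.

```python
# The sets used on the slides.
X = {1, 4, 7, 9}
Y = {2, 3, 4, 5, 6}
C = {"IBM", "DEC", "Apple"}
F = {"Apple", "Grape", "Lime"}

union_xy = X | Y          # every element of either set, listed once
intersection_xy = X & Y   # only the elements common to both sets
union_cf = C | F
intersection_cf = C & F

print(sorted(union_xy), intersection_xy)
print(sorted(union_cf), intersection_cf)
```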
4-28
Mutually Exclusive Events
• Events with no common outcomes
• Occurrence of one event precludes the
occurrence of the other event

$C = \{\text{IBM}, \text{DEC}, \text{Apple}\}$
$F = \{\text{Grape}, \text{Lime}\}$
$C \cap F = \emptyset$
$X = \{1, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cap Y = \emptyset$
$P(X \cap Y) = 0$
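
A brief sketch of checking mutual exclusivity with the slide's sets. The sample space of the integers 1 to 9 with equally likely outcomes is an assumption added here so that a probability can be computed.

```python
from fractions import Fraction

X = {1, 7, 9}
Y = {2, 3, 4, 5, 6}

# Assumed sample space: integers 1..9, all equally likely.
sample_space = set(range(1, 10))

mutually_exclusive = X.isdisjoint(Y)                   # no common outcomes
p_joint = Fraction(len(X & Y), len(sample_space))      # P(X and Y)

print(mutually_exclusive, p_joint)
```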
4-29
Independent Events
• Occurrence of one event does not affect the
occurrence or nonoccurrence of the other
event
• The conditional probability of X given Y is
equal to the marginal probability of X.
• The conditional probability of Y given X is
equal to the marginal probability of Y.
$P(X \mid Y) = P(X)$ and $P(Y \mid X) = P(Y)$

4-30
Complementary Events

• All elementary events not in the event ‘A’ are
in its complementary event.

$P(\text{Sample Space}) = 1$
$P(\bar{A}) = 1 - P(A)$

4-31
Counting the Possibilities
• mn Rule
• Sampling from a Population with Replacement
• Combinations: Sampling from a Population
without Replacement

4-32
mn Rule
• If an operation can be done m ways and a
second operation can be done n ways, then
there are mn ways for the two operations to
occur in order.
• A cafeteria offers 5 salads, 4 meats, 8
vegetables, 3 breads, 4 desserts, and 3 drinks.
A meal is two servings of vegetables, which
may be identical, and one serving each of the
other items. How many meals are available?
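
One way to work the cafeteria question with the mn rule, assuming the two vegetable servings are chosen in order so that each can be any of the 8 vegetables. That reading is an assumption consistent with "occur in order" above.

```python
from math import prod

# Number of ways to fill each slot of the meal.
choices = {
    "salad": 5,
    "meat": 4,
    "vegetable 1": 8,
    "vegetable 2": 8,   # may repeat the first vegetable
    "bread": 3,
    "dessert": 4,
    "drink": 3,
}

# mn rule extended to several operations: multiply the counts.
meals = prod(choices.values())

print(meals)
```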

4-33
Sampling from a Population with Replacement

• A tray contains 1,000 individual tax returns. If
3 returns are randomly selected with
replacement from the tray, how many possible
samples are there?
• $N^n = (1,000)^3 = 1,000,000,000$

4-34
Sampling from a Population without
Replacement- Combinations
• A tray contains 1,000 individual tax returns. If
3 returns are randomly selected without
replacement from the tray, how many
possible samples are there?
$\binom{N}{n} = \frac{N!}{n!(N-n)!} = \frac{1000!}{3!(1000-3)!} = 166,167,000$
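
Both tax-return counts can be checked with a few lines of Python. This is only a sketch: `math.comb` computes the number of combinations, and the with-replacement count is a plain power.

```python
from math import comb

N = 1000   # returns in the tray
n = 3      # returns selected

with_replacement = N ** n          # each draw has N possibilities
without_replacement = comb(N, n)   # N! / (n! (N - n)!)

print(with_replacement, without_replacement)
```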

4-35
Four Types of Probability
• Marginal Probability
• Union Probability
• Joint Probability
• Conditional Probability

4-36
Four Types of Probability

• Marginal: $P(X)$ -- the probability of X occurring
• Union: $P(X \cup Y)$ -- the probability of X or Y occurring
• Joint: $P(X \cap Y)$ -- the probability of X and Y occurring
• Conditional: $P(X \mid Y)$ -- the probability of X occurring given that Y
has occurred

4-37
General Law of Addition

$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$
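
A small numeric check of the general law of addition, P(X or Y) = P(X) + P(Y) - P(X and Y). The sample space of the integers 1 to 9 with equally likely outcomes is an assumption; the sets X and Y are the ones used on the earlier set slides.

```python
from fractions import Fraction

# Assumed sample space: integers 1..9, all equally likely.
S = set(range(1, 10))
X = {1, 4, 7, 9}
Y = {2, 3, 4, 5, 6}

def p(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event & S), len(S))

lhs = p(X | Y)                    # probability of the union, counted directly
rhs = p(X) + p(Y) - p(X & Y)      # general law of addition

print(lhs, rhs)
```

Subtracting the joint probability removes the outcome 4, which would otherwise be counted twice.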

4-38
Special Law of Addition

If X and Y are mutually exclusive,
$P(X \cup Y) = P(X) + P(Y)$

4-39
Special Law of Multiplication
for Independent Events

• General Law
$P(X \cap Y) = P(X) \cdot P(Y \mid X) = P(Y) \cdot P(X \mid Y)$

• Special Law
If events X and Y are independent,
$P(X) = P(X \mid Y)$, and $P(Y) = P(Y \mid X)$.
Consequently,
$P(X \cap Y) = P(X) \cdot P(Y)$
4-40
Law of Conditional Probability
• The conditional probability of X given Y is the
joint probability of X and Y divided by the
marginal probability of Y.

$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{P(Y \mid X) \cdot P(X)}{P(Y)}$
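
A sketch that evaluates both forms of the conditional probability law on a hypothetical experiment: one roll of a fair die, with X = "even number" and Y = "greater than 3".

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}   # even number
Y = {4, 5, 6}   # greater than 3

def p(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event & S), len(S))

# Joint probability divided by the marginal probability of Y.
p_x_given_y = p(X & Y) / p(Y)

# Equivalent form: P(Y | X) * P(X) / P(Y).
p_y_given_x = p(X & Y) / p(X)
via_second_form = p_y_given_x * p(X) / p(Y)

print(p_x_given_y, via_second_form)
```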

4-41
Independent Events
• If X and Y are independent events, the
occurrence of Y does not affect the probability
of X occurring.
• If X and Y are independent events, the
occurrence of X does not affect the probability
of Y occurring.
• If X and Y are independent events,
$P(X \mid Y) = P(X)$, and
$P(Y \mid X) = P(Y)$.
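
An illustrative independence check on a hypothetical experiment: one roll of a fair die, with X = "even number" and Y = "4 or less". The events were chosen so that the conditional and marginal probabilities coincide.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}      # even number
Y = {1, 2, 3, 4}   # 4 or less

def p(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event & S), len(S))

p_x_given_y = p(X & Y) / p(Y)
p_y_given_x = p(X & Y) / p(X)

# Independent when each conditional probability equals the marginal one.
independent = (p_x_given_y == p(X)) and (p_y_given_x == p(Y))

print(p_x_given_y, p(X), p_y_given_x, p(Y), independent)
```

For these events the special law of multiplication also holds: P(X and Y) equals P(X) times P(Y), both 1/3.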
4-42
Chap 5

1-43
Reasons for Sampling

• Sampling can save money.
• Sampling can save time.
• For given resources, sampling can broaden the
scope of the data set.
• Because the research process is sometimes
destructive, the sample can save product.
• If accessing the population is impossible,
sampling is the only option.

5-44
Reasons for Taking a Census

• Eliminate the possibility that a random sample
is not representative of the population.
• The person authorizing the study is
uncomfortable with sample information.

5-45
Population Frame
• A list, map, directory, or other source used to represent
the population

• Overregistration -- the frame contains all members of the
target population and some additional elements
Example: using the chamber of commerce
membership directory as the frame for a target
population of member businesses owned by women.

• Underregistration -- the frame does not contain all
members of the target population.
Example: using the chamber of commerce
membership directory as the frame for a target
population of all businesses.

5-46
Random Versus Nonrandom Sampling
• Random Sampling
– Every unit of the population has the same probability of being
included in the sample.
– A chance mechanism is used in the selection process.
– Eliminates bias in the selection process
– Also known as probability sampling
• Nonrandom Sampling
– Not every unit of the population has the same probability of
being included in the sample.
– Open to selection bias
– Not an appropriate data collection method for most statistical methods
– Also known as nonprobability sampling

5-47
Random Sampling Techniques
• Simple Random Sample
• Stratified Random Sample
– Proportionate
– Disproportionate
• Systematic Random Sample
• Cluster (or Area) Sampling

5-48
Simple Random Sample
• Number each frame unit from 1 to N.
• Use a random number table or a random
number generator to select n distinct numbers
between 1 and N, inclusively.
• Easier to perform for small populations
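
A minimal sketch of the random number generator route, assuming a frame numbered 1 to N. The values N = 30 and n = 6 and the fixed seed are illustrative choices only.

```python
import random

N = 30   # frame units numbered 1..N (illustrative)
n = 6    # desired sample size (illustrative)

# A seeded generator makes the illustration reproducible; omit the seed in practice.
rng = random.Random(42)

# random.sample draws n distinct numbers, i.e. sampling without replacement.
sample_ids = sorted(rng.sample(range(1, N + 1), n))

print(sample_ids)
```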

5-49
Simple Random Sample:
Numbered Population Frame

01 Alaska Airlines    11 DuPont             21 Lucent
02 Alcoa              12 Exxon Mobil        22 Mattel
03 Ashland            13 General Dynamics   23 Mead
04 Bank of America    14 General Electric   24 Microsoft
05 BellSouth          15 General Mills      25 Occidental Petroleum
06 Chevron            16 Halliburton        26 JCPenney
07 Citigroup          17 IBM                27 Procter & Gamble
08 Clorox             18 Kellogg            28 Ryder
09 Delta Air Lines    19 KMart              29 Sears
10 Disney             20 Lowe’s             30 Time Warner

5-50
Simple Random Sampling:
Random Number Table

9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8
5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6
8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7
8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9
6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6
5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1
8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3

• N = 30
• n=6
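
One possible way to read a random number table for N = 30 and n = 6 is sketched below. The reading convention is an assumption, since the slide does not state one: start at the top left, take digits in two-digit groups from left to right, and skip any group outside 01 to 30 or already selected. Only the first two rows of the table are needed under that convention.

```python
# First two rows of the random number table, copied from the slide.
rows = [
    "9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8",
    "5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6",
]

N, n = 30, 6

digits = "".join(rows).replace(" ", "")
# Assumed convention: consecutive two-digit groups, read left to right.
pairs = [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]

selected = []
for number in pairs:
    if 1 <= number <= N and number not in selected:
        selected.append(number)
    if len(selected) == n:
        break

print(selected)
```

A different starting point or reading direction would yield a different, equally valid sample.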

5-51
Simple Random Sample:
Sample Members

01 Alaska Airlines    11 DuPont             21 Lucent
02 Alcoa              12 Exxon Mobil        22 Mattel
03 Ashland            13 General Dynamics   23 Mead
04 Bank of America    14 General Electric   24 Microsoft
05 BellSouth          15 General Mills      25 Occidental Petroleum
06 Chevron            16 Halliburton        26 JCPenney
07 Citigroup          17 IBM                27 Procter & Gamble
08 Clorox             18 Kellogg            28 Ryder
09 Delta Air Lines    19 KMart              29 Sears
10 Disney             20 Lowe’s             30 Time Warner

• N = 30
• n=6

5-52
Stratified Random Sample
• Population is divided into nonoverlapping
subpopulations called strata
• A random sample is selected from each stratum
• Potential for reducing sampling error
• Proportionate -- the percentage of the sample taken
from each stratum is proportionate to the percentage
that each stratum is within the population
• Disproportionate -- proportions of the strata within the
sample are different than the proportions of the strata
within the population
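
A hedged sketch of proportionate allocation: each stratum receives a share of the sample equal to its share of the population. The stratum sizes and the total sample size are hypothetical.

```python
# Hypothetical population counts per age stratum.
strata_sizes = {"20-30": 500, "30-40": 300, "40-50": 200}
n = 100   # hypothetical total sample size

N = sum(strata_sizes.values())

# Sample size per stratum, proportional to the stratum's share of the population.
allocation = {name: round(n * size / N) for name, size in strata_sizes.items()}

print(allocation)
```

With less convenient numbers the rounded allocations may not add up to n exactly, so a real design would need a rule for the remainder. A random sample of the allocated size is then drawn within each stratum.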

5-53
Stratified Random Sample: Population
of FM Radio Listeners

Stratified by Age
• 20 - 30 years old: homogeneous (alike) within
• 30 - 40 years old: homogeneous (alike) within
• 40 - 50 years old: homogeneous (alike) within
• Heterogeneous (different) between strata

5-54
Systematic Sampling
• Convenient and relatively
easy to administer
• Population elements are an
ordered sequence (at least,
conceptually).
• The first sample element is
selected randomly from the
first k population elements.
• Thereafter, sample elements
are selected at a constant
interval, k, from the ordered
sequence frame.

$k = \frac{N}{n}$,
where:
n = sample size
N = population size
k = size of selection interval

5-55
Systematic Sampling: Example
• Purchase orders for the previous fiscal year are
serialized 1 to 10,000 (N = 10,000).
• A sample of fifty (n = 50) purchase orders is
needed for an audit.
• k = 10,000/50 = 200
• First sample element randomly selected from the
first 200 purchase orders. Assume the 45th
purchase order was selected.
• Subsequent sample elements: 245, 445, 645, . . .
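
The purchase-order example above can be reproduced with a short sketch. The starting point of 45 is taken from the slide; in practice it would be drawn at random from 1 to k.

```python
N = 10_000   # purchase orders serialized 1..N
n = 50       # required sample size

k = N // n   # size of the selection interval
start = 45   # first element, assumed selected at random from 1..k

# Every k-th order after the random start.
sample = list(range(start, N + 1, k))

print(k, sample[:4], sample[-1], len(sample))
```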

5-56
Cluster Sampling
• Population is divided into nonoverlapping
clusters or areas
• A subset of the clusters is selected
randomly for the sample.
• If the number of elements in the subset of
clusters is larger than the desired value of
n, these clusters may be subdivided to form
a new set of clusters and subjected to a
random selection process.
5-57
Nonrandom Sampling
• Convenience Sampling: sample elements
are selected for the convenience of the
researcher
• Judgment Sampling: sample elements are
selected by the judgment of the researcher
• Quota Sampling: sample elements are
selected until the quota controls are
satisfied
• Snowball Sampling: survey subjects are
selected based on referral from other
survey respondents

5-58
