Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Location
–Mode
–Median
–Mean
–Percentiles
–Quartiles
Mode
• The most frequently occurring value in a data
set
• Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)
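As a minimal sketch of the definition above, Python's standard `statistics` module can report the most frequent value(s). The data values here are made up for illustration.

```python
from statistics import multimode

# Hypothetical nominal data: the mode is defined even for category labels.
colors = ["red", "blue", "red", "green", "blue", "red"]
# Hypothetical numeric data in which two values tie for highest frequency.
scores = [2, 3, 3, 5, 7, 7, 9]

color_modes = multimode(colors)  # list of the most frequent label(s)
score_modes = multimode(scores)  # every value tied for the highest count

print(color_modes)
print(score_modes)
```

`multimode` returns a list, so a data set with more than one mode is reported in full.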
Arithmetic Mean
• Commonly called ‘the mean’
• The average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including
extreme values
• Computed by summing all values in the data set
and dividing the sum by the number of values in
the data set
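The computation described above, and the sensitivity of the mean to extreme values, can be sketched as follows. The data are hypothetical; the median is shown only for contrast.

```python
from statistics import mean, median

# Hypothetical data set.
values = [10, 12, 11, 13, 14]
# Same data with one deliberately extreme value appended.
values_with_outlier = values + [100]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

print(manual_mean, mean(values))
# The single extreme value pulls the mean far more than the median.
print(mean(values_with_outlier), median(values_with_outlier))
```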
Percentiles
• Measures of central tendency that divide a group
of data into 100 parts
Quartiles
• Measures of central tendency that divide a group
of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second
quartile
• Q3: 75% of the data set is below the third
quartile
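A rough sketch of quartiles and percentiles using the standard library follows. The data are hypothetical, and the `"inclusive"` interpolation method is an assumption: textbooks and software use several slightly different location rules, so hand-computed values may differ.

```python
from statistics import quantiles

# Hypothetical data set of nine values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# n=100 returns 99 cut points; index 89 holds the 90th percentile.
percentiles = quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

print(q1, q2, q3, p90)
```

Under this rule the second quartile coincides with the 50th percentile, which is the median.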
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread or the
dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
• Range
– The difference between the largest and the smallest values in a set of data
• Mean Absolute Deviation
– Average of the absolute deviations from the mean
• Population Variance
– Average of the squared deviations from the arithmetic mean
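The three definitions above can be worked through on a small data set. This is a minimal sketch with hypothetical values, treating the data as a full population.

```python
from statistics import mean, pvariance

# Hypothetical population of five values.
data = [5, 9, 16, 17, 18]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

mu = mean(data)

# Mean absolute deviation: average of |x - mean|.
mad = sum(abs(x - mu) for x in data) / len(data)

# Population variance: average of (x - mean)^2.
pop_variance = sum((x - mu) ** 2 for x in data) / len(data)

print(data_range, mad, pop_variance)
# The library's population variance should agree with the manual result.
print(pvariance(data))
```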
Interquartile Range
$\text{Interquartile Range} = Q_3 - Q_1$
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
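To illustrate why the coefficient of variation measures relative dispersion, the sketch below compares two hypothetical data sets whose standard deviations differ by a factor of ten but whose spread relative to the mean is the same. The function name is illustrative.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return 100 * pstdev(values) / mean(values)

# Hypothetical data sets: the second is the first scaled by 10.
small_scale = [2, 4, 4, 4, 5, 5, 7, 9]
large_scale = [20, 40, 40, 40, 50, 50, 70, 90]

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(pstdev(small_scale), pstdev(large_scale))  # absolute spread differs
print(cv_small, cv_large)                        # relative spread matches
```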
Measures of Central Tendency
and Variability: Grouped Data
• Measures of Central Tendency
–Mean
–Median
–Mode
• Measures of Variability
–Variance
–Standard Deviation
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values on one side of a distribution
• Kurtosis
– Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal shape
– Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Skewness
Coefficient of Skewness
• Summary measure for skewness
$S = \frac{3(\mu - M_d)}{\sigma}$
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not
skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
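The sign rule above can be checked with a short sketch. It assumes the coefficient is three times the difference between the mean and the median, divided by the population standard deviation; the data sets are hypothetical.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """S = 3 * (mean - median) / population standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # long tail to the right
left_tailed = [-x for x in right_tailed]     # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(right_tailed))  # positive
print(pearson_skewness(left_tailed))   # negative
print(pearson_skewness(symmetric))     # zero
```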
Kurtosis
• Peakedness of a distribution
– Leptokurtic: high and thin
– Mesokurtic: normal in shape
– Platykurtic: flat and spread out
[Figure: leptokurtic, mesokurtic, and platykurtic distribution curves]
Box and Whisker Plot
• Five specific values are used:
– Median, Q2
– First quartile, Q1
– Third quartile, Q3
– Minimum value in the data set
– Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
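The fence formulas above can be sketched in a few lines. The data are hypothetical, the quartile rule (`method="inclusive"`) is an assumption, and flagging values beyond the inner and outer fences as mild and extreme outliers is a common convention rather than something stated on the slide.

```python
from statistics import quantiles

def box_plot_fences(values):
    """Return Q1, Q3, IQR and the inner/outer fences for a data set."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

# Hypothetical data with two unusually large values.
data = [1, 2, 3, 4, 5, 6, 7, 15, 30]
f = box_plot_fences(data)

# Values outside each pair of fences.
beyond_inner = [x for x in data if not f["inner"][0] <= x <= f["inner"][1]]
beyond_outer = [x for x in data if not f["outer"][0] <= x <= f["outer"][1]]

print(f)
print(beyond_inner, beyond_outer)
```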
Chap 4
Methods of Assigning Probabilities
Classical Probability
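$P(E) = \frac{n_e}{N}$
where N is the total number of possible outcomes and $n_e$ is the number of outcomes in event E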
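The classical approach counts outcomes: the probability of an event is the number of outcomes in the event divided by the total number of equally likely outcomes. A minimal sketch, using a fair six-sided die as a hypothetical experiment:

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
# Event E: the roll is even.
event_even = {x for x in sample_space if x % 2 == 0}

def classical_probability(event, space):
    """Outcomes in the event divided by total equally likely outcomes."""
    return Fraction(len(event & space), len(space))

p_even = classical_probability(event_even, sample_space)
print(p_even)
```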
Subjective Probability
• Comes from a person’s intuition or
reasoning
• Subjective -- different individuals may
(correctly) assign different numeric
probabilities to the same event
• Degree of belief
• Useful for unique (single-trial) experiments
– New product introduction
– Initial public offering of common stock
– Site selection decisions
– Sporting events
Structure of Probability
• Experiment
• Event
• Elementary Events
• Sample Space
• Unions and Intersections
• Mutually Exclusive Events
• Independent Events
• Collectively Exhaustive Events
• Complementary Events
Experiment
• Experiment: a process that produces outcomes
– More than one possible outcome
– Only one outcome per trial
• Trial: one repetition of the process
• Elementary Event: cannot be decomposed or
broken down into other events
• Event: an outcome of an experiment
– may be an elementary event, or
– may be an aggregate of elementary events
– usually represented by an uppercase letter, e.g., A,
E1
Examples
• Interviewing 20 randomly selected consumers
and asking them which brand of appliance
they prefer
• Sampling every 200th bottle of ketchup from
an assembly line and weighing the contents
• Auditing every 10th account to detect any
errors
Sample Space
• The set of all elementary events for an
experiment
• Methods for describing a sample space
– listing
– tree diagram
– set builder notation
– Venn diagram
Sample Space: Tree Diagram for
Random Sample of Two Families
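• First family A: second family B, C, or D
• First family B: second family A, C, or D
• First family C: second family A, B, or D
• First family D: second family A, B, or C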
Union of Sets
• The union of two sets contains an instance
of each element of the two sets.
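$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cup Y = \{1, 2, 3, 4, 5, 6, 7, 9\}$
$C = \{\text{IBM}, \text{DEC}, \text{Apple}\}$
$F = \{\text{Apple}, \text{Grape}, \text{Lime}\}$
$C \cup F = \{\text{IBM}, \text{DEC}, \text{Apple}, \text{Grape}, \text{Lime}\}$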
Intersection of Sets
• The intersection of two sets contains only
those elements common to the two sets.
$X = \{1, 4, 7, 9\}$
$Y = \{2, 3, 4, 5, 6\}$
$X \cap Y = \{4\}$
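The union and intersection examples from the slides map directly onto Python's built-in `set` type, as this short sketch shows.

```python
# The sets used in the slide examples.
X = {1, 4, 7, 9}
Y = {2, 3, 4, 5, 6}
C = {"IBM", "DEC", "Apple"}
F = {"Apple", "Grape", "Lime"}

# Union: one instance of every element found in either set.
x_union_y = X | Y
c_union_f = C | F

# Intersection: only the elements common to both sets.
x_intersect_y = X & Y

print(sorted(x_union_y))
print(sorted(c_union_f))
print(x_intersect_y)
```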
Complementary Events
$P(\text{Sample Space}) = 1$
$P(A') = 1 - P(A)$
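A small sketch of the complement rule follows, assuming a fair six-sided die as the experiment and an arbitrary event A chosen for illustration.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2}                       # event A: roll a 1 or a 2
A_complement = sample_space - A  # every outcome not in A

p_A = Fraction(len(A), len(sample_space))
p_not_A = Fraction(len(A_complement), len(sample_space))

print(p_A, p_not_A, p_A + p_not_A)
```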
Counting the Possibilities
• mn Rule
• Sampling from a Population with Replacement
• Combinations: Sampling from a Population
without Replacement
mn Rule
• If an operation can be done m ways and a
second operation can be done n ways, then
there are mn ways for the two operations to
occur in order.
• A cafeteria offers 5 salads, 4 meats, 8
vegetables, 3 breads, 4 desserts, and 3 drinks.
A meal is two servings of vegetables, which
may be identical, and one serving each of the
other items. How many meals are available?
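One way to work the cafeteria question is to apply the mn rule across every selection in the meal. This sketch assumes the two vegetable servings are counted as two ordered choices of 8 each (so identical servings are allowed), which is how the mn rule applies most directly.

```python
from math import prod

# Number of ways to make each selection in one meal.
choices = {
    "salad": 5,
    "meat": 4,
    "vegetable, first serving": 8,
    "vegetable, second serving": 8,  # may repeat the first serving
    "bread": 3,
    "dessert": 4,
    "drink": 3,
}

# mn rule extended to several operations: multiply the counts together.
meals = prod(choices.values())
print(meals)
```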
Sampling from a Population with Replacement
Sampling from a Population without
Replacement: Combinations
• A tray contains 1,000 individual tax returns. If
3 returns are randomly selected without
replacement from the tray, how many
possible samples are there?
$\binom{N}{n} = \frac{N!}{n!(N - n)!} = \frac{1000!}{3!(1000 - 3)!} = 166{,}167{,}000$
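The tax-return count can be confirmed with the standard library. The with-replacement figure is included for contrast and assumes the usual count of N^n ordered selections; the small enumeration at the end is only a sanity check on a hypothetical five-item population.

```python
from itertools import combinations
from math import comb

N, n = 1000, 3

# Without replacement, order ignored: N! / (n! (N - n)!).
samples_without_replacement = comb(N, n)

# With replacement, counted as ordered selections: N ** n.
samples_with_replacement = N ** n

# Sanity check by explicit enumeration on a tiny population.
tiny = len(list(combinations(range(5), 3)))

print(samples_without_replacement, samples_with_replacement, tiny)
```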
Four Types of Probability
• Marginal Probability
• Union Probability
• Joint Probability
• Conditional Probability
Four Types of Probability
• Marginal: $P(X)$, the probability of X occurring
• Union: $P(X \cup Y)$, the probability of X or Y occurring
• Joint: $P(X \cap Y)$, the probability of X and Y occurring
• Conditional: $P(X \mid Y)$, the probability of X occurring given that Y has occurred
General Law of Addition
$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$
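The general law of addition can be checked against a direct count. This sketch assumes a fair six-sided die and two overlapping events chosen for illustration.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}  # event X: the roll is even
Y = {4, 5, 6}  # event Y: the roll is greater than 3

def p(event):
    """Classical probability of an event within the sample space."""
    return Fraction(len(event), len(sample_space))

# General law: subtract the joint probability so the overlap is not
# counted twice.
by_law = p(X) + p(Y) - p(X & Y)
# Direct count of the union for comparison.
by_count = p(X | Y)

print(by_law, by_count)
```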
Special Law of Addition
If X and Y are mutually exclusive,
$P(X \cup Y) = P(X) + P(Y)$
Special Law of Multiplication
for Independent Events
• General Law
$P(X \cap Y) = P(X) \cdot P(Y \mid X) = P(Y) \cdot P(X \mid Y)$
• Special Law
If events X and Y are independent,
$P(X) = P(X \mid Y)$, and $P(Y) = P(Y \mid X)$.
Consequently,
$P(X \cap Y) = P(X) \cdot P(Y)$
Law of Conditional Probability
• The conditional probability of X given Y is the
joint probability of X and Y divided by the
marginal probability of Y.
$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{P(Y \mid X) \cdot P(X)}{P(Y)}$
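Both forms of the conditional probability law can be verified from a table of counts. The 2 x 2 table below is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical counts of 100 observations cross-classified by X and Y.
counts = {
    ("X", "Y"): 20, ("X", "not Y"): 30,
    ("not X", "Y"): 10, ("not X", "not Y"): 40,
}
total = sum(counts.values())

# Marginal and joint probabilities read off the table.
p_x = Fraction(counts[("X", "Y")] + counts[("X", "not Y")], total)
p_y = Fraction(counts[("X", "Y")] + counts[("not X", "Y")], total)
p_x_and_y = Fraction(counts[("X", "Y")], total)

# First form: joint probability divided by the marginal of Y.
p_x_given_y = p_x_and_y / p_y

# Second form: P(Y | X) * P(X) / P(Y).
p_y_given_x = p_x_and_y / p_x
second_form = p_y_given_x * p_x / p_y

print(p_x_given_y, second_form)
```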
Independent Events
• If X and Y are independent events, the
occurrence of Y does not affect the probability
of X occurring.
• If X and Y are independent events, the
occurrence of X does not affect the probability
of Y occurring.
• If X and Y are independent events,
$P(X \mid Y) = P(X)$, and
$P(Y \mid X) = P(Y)$.
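Independence can be tested by comparing a conditional probability with the corresponding marginal probability. The sketch below assumes a fair six-sided die; the events are hypothetical and chosen so that one pair is independent and another is not.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}     # the roll is even
Y = {1, 2, 3, 4}  # the roll is at most 4
Z = {4, 5, 6}     # the roll is greater than 3

def p(event):
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return p(a & b) / p(b)

def independent(a, b):
    """True when knowing b leaves the probability of a unchanged."""
    return conditional(a, b) == p(a)

print(independent(X, Y))  # P(X | Y) equals P(X)
print(independent(X, Z))  # P(X | Z) differs from P(X)
```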
Chap 5
Reasons for Sampling
Reasons for Taking a Census
Population Frame
• A list, map, directory, or other source used to represent
the population
Random Versus Nonrandom Sampling
• Random sampling
– Every unit of the population has the same probability of being included in the sample.
– A chance mechanism is used in the selection process.
– Eliminates bias in the selection process
– Also known as probability sampling
• Nonrandom sampling
– Not every unit of the population has the same probability of being included in the sample.
– Open to selection bias
– Not an appropriate data collection method for most statistical methods
– Also known as nonprobability sampling
Random Sampling Techniques
• Simple Random Sample
• Stratified Random Sample
– Proportionate
– Disproportionate
• Systematic Random Sample
• Cluster (or Area) Sampling
Simple Random Sample
• Number each frame unit from 1 to N.
• Use a random number table or a random
number generator to select n distinct numbers
between 1 and N, inclusively.
• Easier to perform for small populations
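With a random number generator, the procedure above reduces to a few lines. This is a sketch only: N = 30 and n = 6 are illustrative sizes, and the seed is fixed so the run is repeatable.

```python
import random

N = 30  # number of units in the frame, numbered 1 to N
n = 6   # desired sample size

# A seeded generator keeps the illustration reproducible.
rng = random.Random(42)

# random.sample draws without replacement, so all n numbers are distinct.
sample_ids = rng.sample(range(1, N + 1), n)

print(sorted(sample_ids))
```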
Simple Random Sample:
Numbered Population Frame
Simple Random Sampling:
Random Number Table
9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8
5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6
8 0 8 8 0 6 3 1 7 1 4 2 8 7 7 6 6 8 3 5 6 0 5 1 5 7 0 2 9 6 5 0 0 2 6 4 5 5 8 7
8 6 4 2 0 4 0 8 5 3 5 3 7 9 8 8 9 4 5 4 6 8 1 3 0 9 1 2 5 3 8 8 1 0 4 7 4 3 1 9
6 0 0 9 7 8 6 4 3 6 0 1 8 6 9 4 7 7 5 8 8 9 5 3 5 9 9 4 0 0 4 8 2 6 8 3 0 6 0 6
5 2 5 8 7 7 1 9 6 5 8 5 4 5 3 4 6 8 3 4 0 0 9 9 1 9 9 7 2 9 7 6 9 4 8 1 5 9 4 1
8 9 1 5 5 9 0 5 5 3 9 0 6 8 9 4 8 6 3 7 0 7 9 5 5 4 7 0 6 2 7 1 1 8 2 6 4 4 9 3
• N = 30
• n=6
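One common way to use such a table is sketched below. The reading rule is an assumption not stated on the slide: digits are read left to right in two-digit groups, and a group is kept only if it falls between 01 and N and has not already been selected. Only the first two rows of the table are needed here.

```python
# First two rows of the random number table, copied as printed.
rows = [
    "9 9 4 3 7 8 7 9 6 1 4 5 7 3 7 3 7 5 5 2 9 7 9 6 9 3 9 0 9 4 3 4 4 7 5 3 1 6 1 8",
    "5 0 6 5 6 0 0 1 2 7 6 8 3 6 7 6 6 8 8 2 0 8 1 5 6 8 0 0 1 6 7 8 2 2 4 5 8 3 2 6",
]

N, n = 30, 6

# Join the rows into one digit string, dropping the spaces.
digits = "".join(row.replace(" ", "") for row in rows)

selected = []
# Step through the digits two at a time.
for i in range(0, len(digits) - 1, 2):
    number = int(digits[i:i + 2])
    # Keep only frame numbers 1..N that have not been drawn yet.
    if 1 <= number <= N and number not in selected:
        selected.append(number)
    if len(selected) == n:
        break

print(selected)
```

Under this reading rule the sample consists of frame units 16, 18, 1, 27, 8, and 15.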
Simple Random Sample:
Sample Members
• N = 30
• n=6
Stratified Random Sample
• Population is divided into nonoverlapping
subpopulations called strata
• A random sample is selected from each stratum
• Potential for reducing sampling error
• Proportionate -- the percentage of the sample taken
from each stratum is proportionate to the percentage
that each stratum is within the population
• Disproportionate -- proportions of the strata within the
sample are different from the proportions of the strata
within the population
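Proportionate allocation can be sketched as follows. The stratum sizes, the sample size, and the function name are hypothetical; the sizes are chosen so that the allocation comes out in whole numbers.

```python
import random

def proportionate_allocation(stratum_sizes, n):
    """Give each stratum a share of n equal to its share of the population."""
    total = sum(stratum_sizes.values())
    # Rounding may leave the total slightly off n for awkward sizes.
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

# Hypothetical population of 1,000 listeners split into three age strata.
stratum_sizes = {"20-30": 500, "30-40": 300, "40-50": 200}
allocation = proportionate_allocation(stratum_sizes, n=100)

# Draw a simple random sample of unit numbers within each stratum.
rng = random.Random(0)
sample = {
    name: rng.sample(range(1, stratum_sizes[name] + 1), k)
    for name, k in allocation.items()
}

print(allocation)
```

A disproportionate design would replace the allocation step with sample sizes chosen on some other basis.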
Stratified Random Sample: Population
of FM Radio Listeners
Stratified by Age
• Strata: 20 - 30 years old, 30 - 40 years old, 40 - 50 years old
• Homogeneous (alike) within each stratum
• Heterogeneous (different) between strata
Systematic Sampling
• Convenient and relatively easy to administer
• Population elements are an ordered sequence (at least, conceptually).
• The first sample element is selected randomly from the first k population elements.
• Thereafter, sample elements are selected at a constant interval, k, from the ordered sequence frame.
$k = \frac{N}{n}$, where:
n = sample size
N = population size
k = size of selection interval
Systematic Sampling: Example
• Purchase orders for the previous fiscal year are
serialized 1 to 10,000 (N = 10,000).
• A sample of fifty (n = 50) purchase orders is
needed for an audit.
• k = 10,000/50 = 200
• First sample element randomly selected from the
first 200 purchase orders. Assume the 45th
purchase order was selected.
• Subsequent sample elements: 245, 445, 645, . . .
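The purchase-order example can be reproduced with a short function. This is a sketch: the function name and the optional `start` argument are illustrative, and the starting point is fixed at 45 to match the example rather than drawn at random.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Select every k-th element, where k = N // n, from a random start."""
    k = N // n
    if start is None:
        # Random start among the first k population elements.
        start = rng.randint(1, k)
    return [start + i * k for i in range(n)]

# N = 10,000 purchase orders, n = 50, first element assumed to be the 45th.
sample = systematic_sample(10_000, 50, start=45)

print(sample[:4], sample[-1])
```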
Cluster Sampling
• Population is divided into nonoverlapping
clusters or areas
• A subset of the clusters is selected
randomly for the sample.
• If the number of elements in the subset of
clusters is larger than the desired value of
n, these clusters may be subdivided to form
a new set of clusters and subjected to a
random selection process.
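A single-stage version of the idea can be sketched as below. The cluster names and their members are made up, and the sketch stops at selecting whole clusters; the optional second stage of subdividing the chosen clusters is not shown.

```python
import random

# Hypothetical population divided into four nonoverlapping areas.
clusters = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}

rng = random.Random(1)

# Randomly choose a subset of the clusters (here, 2 of the 4).
chosen = rng.sample(sorted(clusters), 2)

# Every element of each chosen cluster enters the sample.
sample = [unit for name in chosen for unit in clusters[name]]

print(chosen, sample)
```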
Nonrandom Sampling
• Convenience Sampling: sample elements
are selected for the convenience of the
researcher
• Judgment Sampling: sample elements are
selected by the judgment of the researcher
• Quota Sampling: sample elements are
selected until the quota controls are
satisfied
• Snowball Sampling: survey subjects are
selected based on referral from other
survey respondents