Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
7/6/2019
Sampling Distributions 4
7/6/2019
Sampling Distributions 5
An element is the entry on which data are collected.
A population is the collection of all the elements of
interest.
A sample is a subset of the population.
Idea is to gather information on population using
sample.
Examples:
a) 400 samples of voters and 160 voters in favor of one
candidate, showing 0.4 in favor.
• A tire manufacturer is considering producing a new tire
designed to provide an increase in mileage over the firm’s
current line of tires. For this purpose, the company
produced a sample of 120 tires for testing. The test results
provided a sample mean of 40,000 miles. Hence, an
estimate of the mean useful life for the population of new
tires was 40,000 miles 7/6/2019
Sample Statistics as Estimators of6
Population Parameters
• A sample statistic is a A population parameter is
numerical measure of a a numerical measure of a
summary characteristic summary characteristic of
of a sample. a population.
7/6/2019
Sampling Error 8
The sampling errors are the difference b/w
point estimator and population parameter or
|x | for sample mean
7/6/2019
Conducting a Census 9
• Population Mean 𝒙𝒊
𝝁=
𝑵
X i
2
• Population Standard Deviation
N
X
• Population Proportion p N
x
• Sample Proportion p$ n
7/6/2019
Population and Sample Proportions
10
• The population proportion is equal to the
number of elements in the population belonging
to the category of interest, divided by the total
number of elements in the population:
𝒂 𝒃
Population Proportion a+b = n; p = or or
𝒏 𝒏
‘a’ means trained and ‘b’ means untrained persons,
lets consider that out of 2500 managers only 1500 got
training, then p = 1500/2500 = 0.6
• The sample proportion is the number of
elements in the sample belonging to the category
of interest, divided by the sample size: p$ x
n 7/6/2019
A Population Distribution, a Sample11
from a Population, and the Population
and Sample Means
Population mean ()
Frequency distribution
of the population
X X X X X X X
X X X X X X X
X X X X
Sample points
Sample mean ( X )
7/6/2019
Sampling Methods 12
7/6/2019
Stratified Sampling 13
7/6/2019
Stratified Sampling 14
7/6/2019
Stratified Sampling 15
size of whole sample
Sample size for each layer = ∗ size of layer
size of population
7/6/2019
Cluster Sampling 16
The researcher divides the population into separate groups,
called clusters. Then, a simple random sample of clusters is
selected from the population.
• Properties:
1. The population is divided into N groups, called clusters.
2. The researcher randomly selects n clusters to include in
the sample.
3. The number of observations within each cluster Mi is
known, and M = M1 + M2 + M3 + ... + MN-1 + MN.
4. Each element of the population can be assigned to one,
and only one, cluster
Types
• One-stage sampling. All of the elements within selected
clusters are included in the sample.
• Two-stage sampling. A subset of elements within selected
clusters are randomly selected for inclusion in the sample. 7/6/2019
Cluster Sampling 17
7/6/2019
Cluster Sampling 18
7/6/2019
Systemic Sampling 19
Systemic sampling is a type of probability sampling method in
which sample members from a larger population are selected
according to a random starting point and a fixed periodic
interval. This interval, called the sampling interval, is calculated
by dividing the population size by the desired sample size
• Example: Suppose a sample of 50 pistons is required from the list
of 1000 pistons available in the stock of manufacturing section.
• The sample fraction will be 50/1000 = 1/20, thus k =20.
• The first piston from the list is selected randomly between 1 and
20.
• The first and every 20th piston is subsequently selected as sample
members.
7/6/2019
Systemic Sampling 20
7/6/2019
Sampling Distributions 21
• The sampling distribution of a statistic is the
probability distribution of all possible values the
statistic may assume, when computed from
random samples of the same size, drawn from a
specified population.
• The sampling distribution of X is the probability
distribution of all possible values the random
variable may assume when a sample of size ‘n’
is taken from a specified population.
7/6/2019
Sampling Distributions 22
Uniform population of integers from 1 to 8:
P(X)
5 0.125 0.625 0.5 0.25 0.03125 0 .1
6 0.125 0.750 1.5 2.25 0.28125
7 0.125 0.875 2.5 6.25 0.78125
8 0.125 1.000 3.5 12.25 1.53125
E(X) = = 4.5
V(X) = 2 = 5.25
SD(X) = = 2.2913
7/6/2019
Sampling Distributions 23
There are 8*8 = 64 different but Each of these samples has a
equally-likely samples of size 2 that sample mean. For example, the
can be drawn (with replacement) mean of the sample (1,4) is 2.5,
from a uniform population of the and the mean of the sample (8,4)
integers from 1 to 8: is 6.
Samples of Size 2 from Uniform (1,8) Sample Means from Uniform (1,8), n = 2
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
1 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
2 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
3 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
4 4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 4 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
5 5,1 5,2 5,3 5,4 5,5 5,6 5,7 5,8 5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5
6 6,1 6,2 6,3 6,4 6,5 6,6 6,7 6,8 6 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
7 7,1 7,2 7,3 7,4 7,5 7,6 7,7 7,8 7 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5
8 8,1 8,2 8,3 8,4 8,5 8,6 8,7 8,8 8 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
7/6/2019
Sampling Distributions 24
The probability distribution of the sample mean is called
the sampling distribution of the sample mean.
0.1 0
1.0 0.015625 0.015625 -3.5 12.25 0.191406
1.5 0.031250 0.046875 -3.0 9.00 0.281250
P(X)
2.0 0.046875 0.093750 -2.5 6.25 0.292969
2.5 0.062500 0.156250 -2.0 4.00 0.250000 0.0 5
3.0 0.078125 0.234375 -1.5 2.25 0.175781
3.5 0.093750 0.328125 -1.0 1.00 0.093750
4.0 0.109375 0.437500 -0.5 0.25 0.027344
0.0 0
4.5 0.125000 0.562500 0.0 0.00 0.000000 1.0 1 .5 2.0 2 .5 3.0 3 .5 4.0 4 .5 5 .0 5.5 6 .0 6.5 7 .0 7.5 8 .0
5.0 0.109375 0.546875 0.5 0.25 0.027344 X
5.5 0.093750 0.515625 1.0 1.00 0.093750
6.0 0.078125 0.468750 1.5 2.25 0.175781
6.5
7.0
0.062500
0.046875
0.406250
0.328125
2.0
2.5
4.00
6.25
0.250000
0.292969 E ( X ) X 4.5
V ( X ) X2 2.625
7.5 0.031250 0.234375 3.0 9.00 0.281250
8.0 0.015625 0.125000 3.5 12.25 0.191406
7/6/2019
Properties of the Sampling Distribution
25
of the Sample Mean
Comparing the population distribution Uniform Distribution (1, 8)
and the sampling distribution of the 0.2
mean:
The sampling distribution is more
P(X)
0.1
P(X)
Standardized sample mean 0.05
X
Z
0.00
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
X
/ n 7/6/2019
Sampling Distribution of the Mean…
26
A fair die is thrown infinitely many times,
with the random variable X = # of spots on any throw.
x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6
7/6/2019
Sampling Distribution of Two Dice27
A sampling distribution is created by looking at all
samples of size n=2 (i.e. two dice) and their means…
6/36
P( )
1.0 1/36 5/36
1.5 2/36
2.0 3/36
4/36
)
2.5 4/36
3.0 5/36
3/36
P(
3.5 6/36
4.0 5/36
4.5 4/36 2/36
5.0 3/36
5.5 2/36
6.0 1/36 1/36
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
7/6/2019
Sampling Distribution of the Sample Mean 29
1.
2.
7/6/2019
Sampling Distribution of the Sample Mean 30
The summaries above assume that the population is infinitely
large. However if the population is finite the standard error is
Nn
x
n N 1
Nn
Finite Population Correction (FPC) Factor =
N 1
7/6/2019
Sampling Distribution of the Sample Mean 31
If the population size is large relative to the sample size the
finite population correction factor is close to 1 and can be
ignored.
V(X) 2
X
X
n
The standard deviation of the sample mean, known as the
standard error of the mean, is equal to the population standard
deviation divided by the square root of the sample size:
SD( X ) X
X
n
7/6/2019
Sampling from a Normal Population33
When sampling from a normal population with mean and
standard deviation , the sample mean, X, has a normal sampling
distribution: 2
X ~ N ( , )
n
Sampling Distribution of the Sample Mean
This means that, as the sample
size increases, the sampling 0.4
f(X)
0.2
Normal population7/6/2019
Central Limit Theorem… 34
If the population is normal, then X is normally distributed
for all values of n.
7/6/2019
The Central Limit Theorem 35
0.25
n=5
When sampling from a population 0.20
0.15
P(x)
with mean and finite standard 0.10
0.05
P(x)
n 0.1
Large n
For “large enough” n: 0.4
0.3
f(X)
0.2
X ~ N ( , / n)
2 0.1
0.0
-
X
7/6/2019
The Central Limit Theorem Applies to Sampling36
Distributions from Any Population
Population
n=2
n = 30
X X X X
7/6/2019
The Central Limit Theorem (Example)37
Mercury makes a 2.4 liter V-6 engine used in speedboats. The
company’s engineers believe the engine delivers an average
power of 220 horsepower and that the standard deviation of
power delivered is 15 HP. A potential buyer intends to sample
100 engines (each engine is to be run a single time). What is the
probability that the sample mean will be less than 217 HP?
7/6/2019
The Central Limit Theorem (Example)38
X 217
P ( X 217) P
n n
217 220 Z 217 220
P Z P
15 15
100 10
P ( Z 2) 0.0228
7/6/2019
Student’s t Distribution 39
If the population standard deviation, , is unknown, replace
with the sample standard deviation, s. If the population is
normal, the resulting statistic: X
t
s/ n
has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and
symmetric distributions, one for each
number of degree of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but
approaches 1 as the number of degrees of
freedom increases. The t is flatter and has
fatter tails than does the standard normal.
• The t distribution approaches a standard
normal as the number of degrees of freedom
increases. 7/6/2019
The Sampling Distribution of the Sample 40
Proportion, p$
The sample proportion is the percentage of successes
in n binomial trials. It is the number of successes, X,
divided by the number of trials, n.
X
Sample proportion: p$
n
7/6/2019
Sample Proportion (Example) 41
In recent years, convertible sports coupes have become very popular in
Japan. Toyota is currently shipping Celicas to Los Angeles, where a
customizer does a roof lift and ships them back to Japan. Suppose that
25% of all Japanese in a given income and lifestyle category are
interested in buying Celica convertibles. A random sample of 100
Japanese consumers in the category of interest is to be selected. What is
the probability that at least 20% of those in the sample will express an
interest in a Celica convertible?
n= 100
p = 0.25
np = (100)((0.25) = 25 = E(p)
7/6/2019
Estimators and Their Properties 42
An estimator of a population parameter is a sample statistic used to
estimate the parameter. The most commonly-used estimator of the:
Population Parameter Sample Statistic
Mean () is the Mean (X)
Variance (2) is the Variance (s2)
Standard Deviation () is the Standard Deviation (s)
Proportion (p) is the Proportion ( p̂)
7/6/2019
Unbiasedness 43
An estimator is said to be unbiased if its expected value is equal
to the population parameter it estimates.
7/6/2019
Unbiased and Biased Estimators 44
{
Bias
7/6/2019
Efficiency 45
An estimator is efficient if it has a relatively small variance (and
standard deviation).
Consistency
n = 10 n = 100
7/6/2019
Properties of the Sample Variance 48
The sample variance (the sum of the squared deviations from the
sample mean divided by (n-1) is an unbiased estimator of the
population variance. In contrast, the average squared deviation
from the sample mean is a biased (though consistent) estimator of the
population variance.
E(s ) E
( x x ) 2
2
2
(n 1)
(x x)2
E
2
n
7/6/2019
Degrees of Freedom 49
Consider a sample of size n=4 containing the following data
points:
x 12 14 x x4
3
x= 14
n 4
12 14 x x4 56
3
7/6/2019
Degrees of Freedom (Continued) 51
The number of degrees of freedom is equal to the total number of
measurements (these are not always raw data points), less the total
number of restrictions on the measurements. A restriction is a
quantity computed from the measurements.
The sample mean is a restriction on the sample measurements, so
after calculating the sample mean there are only (n-1) degrees of
freedom remaining with which to calculate the sample variance.
The sample variance is based on only (n-1) free data points:
7/6/2019
Example 52
A sample of size 10 is given below. We are to choose three different numbers
from which the deviations are to be taken. The first number is to be used for the
first five sample points; the second number is to be used for the next three sample
points; and the third number is to be used for the last two sample points.
Sample # 1 2 3 4 5 6 7 8 9 10
Sample 93 97 60 72 96 83 59 66 88 53
Point
SSD x x
2
• Note:
7/6/2019
Example (continued) 53
Solution: Choose the means of the corresponding sample points. These are:
83.6, 69.33, and 70.5.
Solution: df = 10 – 3 = 7.
7/6/2019
Example (continued) 54
Sample # Sample Point Mean Deviations Deviation
Squared
1 93 83.6 9.4 88.36
2 97 83.6 13.4 179.56
3 60 83.6 -23.6 556.96
4 72 83.6 -11.6 134.56
5 96 83.6 12.4 153.76
6 83 69.33 13.6667 186.7778
7 59 69.33 -10.3333 106.7778
8 66 69.33 -3.3333 11.1111
9 88 70.5 17.5 306.25
10 53 70.5 -17.5 306.25
SSD 2030.367
SSD/df 290.0524
7/6/2019
Self Assessment Example 55
The foreman of a bottling plant has observed that the amount
of soda in each “32-ounce” bottle is actually a normally
distributed random variable, with a mean of 32.2 ounces and
a standard deviation of 0.3 ounce.
a) If a customer buys one bottle, what is the probability that
the bottle will contain more than 32 ounces?
b) If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will
be greater than 32 ounces?
7/6/2019
Exercise Problem 56
Do practice by solving exercise problem 68 by C. R. Kothari
7/6/2019