
PS II 2015

Problem Set 2: Sampling Distributions

Solutions

1. Discuss whether normal tables can be of use in each of the following situations:
a) Weights of a group of adults are approximately normally distributed with mean 70 kgs and
standard deviation 25 kgs. We want to know the probability that the average weight of 10
randomly selected people is more than 100 kgs.
b) Salaries at a large corporation have a mean of 40,000 and a standard deviation of 20,000.
We want to know the probability that a randomly selected employee makes more than
50,000.
c) A club has 50 members, 10 of which want the president to be deposed. We want to compute
the probability that if we select 20 members of the club at random, 20% or more in our sample
would want the president to be deposed.
The answer is a) only. For an approximately normal population, the sample mean is also
approximately normal regardless of the sample size, so the normal table can be used.
For b), the shape of the population distribution may not be normal and the sample size is 1
(only one person is selected), so the CLT does not apply and the normal table cannot be
used.
For c), p = 10/50 = 0.2 and n = 20; hence np = 4 < 10. So the CLT does not apply and the
normal table cannot be used.
2. Suppose in a lottery game, you bet 1 on a number between 0 and 9 that you pick at random.
If you are correct, you win 5; otherwise you win nothing. Suppose you play a game every
weekday, i.e. Monday through Friday. (You do not play on the weekends.)
a) Over 52 weeks, what is the exact sampling distribution of your total profit (i.e. income
minus cost)? Can the sampling distribution of your total profit over 52 weeks be approximated
by a normal distribution?
Let X be the number of wins over 52 weeks. Then X~Bin(260, 0.1). Hence, profit is 5X-260.
Since np = 260*.1 = 26 >10 and n(1-p) = 260*.9 = 234 >10, CLT holds and hence X can be
approximated by a Normal distribution. Hence, 5X-260 can also be approximated by a normal
distribution.
b) What is the approximate probability that your total profit over 52 weeks is more than 0?
As mentioned above, the sampling distribution of 5X-260 can be approximated by a normal
distribution with mean 5*26-260 = -130 and variance 25*23.4 = 585, i.e. 5X-260 ~ N(-130, 585)
approximately [the mean and variance of X being 260*.1 = 26 and 260*.1*.9 = 23.4
respectively]. Thus, the required probability of it being greater than 0 is approximately

$P(5X - 260 > 0) = P\left(Z > \frac{130}{\sqrt{585}}\right) = P(Z > 5.37) \approx 0$, where Z is the N(0,1) variable.
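The arithmetic in part b) can be checked with a short Python sketch. It uses only the standard library (`statistics.NormalDist` and `math.comb`); the variable names are illustrative. The exact binomial tail is included only as a cross-check and is not part of the original solution: a positive profit means 5X > 260, i.e. X ≥ 53.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 260, 0.1                      # 5 plays a week for 52 weeks, win probability 1/10
mean_profit = 5 * n * p - n          # E(5X - 260) = -130
var_profit = 25 * n * p * (1 - p)    # Var(5X - 260) = 585

# Normal approximation: P(5X - 260 > 0) = P(Z > 130 / sqrt(585))
z = (0 - mean_profit) / sqrt(var_profit)
p_normal = 1 - NormalDist().cdf(z)

# Exact binomial tail (cross-check): profit > 0 means X >= 53
p_exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(53, n + 1))

print(f"z = {z:.2f}")
print(f"normal approximation: {p_normal:.2e}")
print(f"exact binomial tail:  {p_exact:.2e}")
```

Both numbers are tiny, which is consistent with the answer of approximately 0.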

c) What is the probability that in a week your profit is more than 0? What is the exact sampling
distribution of the number of weeks your profit is more than 0? Can the sampling distribution
of the number of weeks your profit is more than 0 be approximated by a normal
distribution?
The number of successes in a week (say S) ~ Bin(5, 0.1). Hence, the weekly profit will be 5S-5,
which will be positive only if the number of successes is > 1 (i.e. at least 2). The probability of
this happening is
$P(S \ge 2) = 1 - P(S < 2) = 1 - P(S \le 1) = 1 - 0.9^5 - \binom{5}{1}(0.1)(0.9^4) = 0.08146$.
Thus, the number of weeks my profit is more than 0 (say W) will have a Binomial distribution
with n = 52 and p = .08146.
Since np = 52*.08146 = 4.24 < 10, the above distribution cannot be approximated by a normal one.
d) What is the probability that the number of weeks (out of 52) your profit is more than 0 is
positive?
The required probability is $P(W > 0) = 1 - P(W = 0)$ = 1 - P(profit is never more than 0) =
$1 - (1 - 0.08146)^{52} = 0.9879$
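The binomial calculations in parts c) and d) can be reproduced with the sketch below. The helper `binom_pmf` is a hypothetical name introduced here for illustration; it is just the binomial probability mass function written with `math.comb`.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Part c): weekly profit 5S - 5 is positive only when S >= 2, with S ~ Bin(5, 0.1)
p_week = 1 - binom_pmf(0, 5, 0.1) - binom_pmf(1, 5, 0.1)

# Rule-of-thumb check for W ~ Bin(52, p_week): np should be at least 10
expected_weeks = 52 * p_week

# Part d): probability of at least one profitable week out of 52
p_at_least_one_week = 1 - (1 - p_week) ** 52

print(f"P(weekly profit > 0) = {p_week:.5f}")
print(f"np = {expected_weeks:.2f}")
print(f"P(W > 0) = {p_at_least_one_week:.4f}")
```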
3. A waiter believes that his tips from various customers have a right skewed distribution with
a mean of 100 and standard deviation of 25.
a) With the above information, can we obtain the exact probability, or a reasonable
approximation, that the average tip from 15 customers will be at least 130? Why or why not?
Since the population distribution is not Normal and the sample size is less than 30, we cannot
apply the CLT (and hence the normal approximation to the sampling distribution of $\bar{X}$).
Hence, we cannot find a solution to this problem, at least with the tools we have.
b) What is the approximate probability that the average tip from 35 customers will be at least
130?
Since $n \ge 30$, we can apply the CLT as follows: $P(\bar{X} \ge 130) = P\left(Z \ge \frac{130-100}{25/\sqrt{35}}\right) = P(Z \ge 7.1) \approx 0$
c) What is the approximate probability that the average tip from 35 customers will be between
90 and 150?
This will be $P(90 \le \bar{X} \le 150) = P\left(\frac{90-100}{25/\sqrt{35}} \le Z \le \frac{150-100}{25/\sqrt{35}}\right)$
$= P(-2.37 \le Z \le 11.83) = P(Z \le 11.83) - P(Z \le -2.37) = 1 - 0.0089 \approx 0.99$
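Parts b) and c) both standardize the sample mean with the standard error 25/√35. A minimal sketch of that calculation, assuming the CLT approximation applies and using `statistics.NormalDist` from the standard library in place of the normal table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 25, 35
se = sigma / sqrt(n)                 # standard error of the sample mean
xbar = NormalDist(mu, se)            # CLT approximation to the distribution of the mean

# Part b): P(mean tip >= 130)
z_b = (130 - mu) / se
p_b = 1 - xbar.cdf(130)

# Part c): P(90 <= mean tip <= 150)
z_lo = (90 - mu) / se
z_hi = (150 - mu) / se
p_c = xbar.cdf(150) - xbar.cdf(90)

print(f"b) z = {z_b:.2f}, probability = {p_b:.2e}")
print(f"c) z from {z_lo:.2f} to {z_hi:.2f}, probability = {p_c:.4f}")
```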
4. Suppose the probability that Barry Bonds, a famous baseball player, gets a hit at bat is 0.3.
a) If Barry has 400 at-bats in a single season, what are the mean and standard error of the
sampling distribution of the sample proportion of hits at bat?
The mean of the sample proportion will be the population proportion, i.e. 0.3, while the standard
error will be $\sqrt{0.3 \times 0.7 / 400} = 0.023$. As 400*.3 and 400*.7 are both greater than 10, the
distribution of $\hat{p}$ can be approximated by a normal distribution.

b) What is the probability that Barry will have at least 35 % hits at bat?
This will be $P(\hat{p} \ge 0.35) = P\left(Z \ge \frac{0.35-0.30}{0.023}\right) = P(Z \ge 2.17) = 0.015$
c) What is the probability that Barry will have at most 65% hits at bat?
This will be $P(\hat{p} \le 0.65) = P\left(Z \le \frac{0.65-0.30}{0.023}\right) = P(Z \le 15.22) \approx 1$
d) What is the probability that Barry will have between 40% and 70% hits at bat?
This will be $P(0.40 < \hat{p} < 0.70) = P(4.364 < Z < 17.457) \approx 0$
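The four parts of this question can be checked together with the sketch below, which again uses `statistics.NormalDist` as a stand-in for the normal table. It keeps the unrounded standard error (about 0.0229), so the z-score in part b) comes out near 2.18 rather than the 2.17 obtained with the rounded value 0.023; the probabilities agree to the precision quoted in the solutions.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.3, 400
se = sqrt(p * (1 - p) / n)           # standard error of the sample proportion
phat = NormalDist(p, se)             # normal approximation for the sample proportion

p_at_least_35 = 1 - phat.cdf(0.35)               # part b)
p_at_most_65 = phat.cdf(0.65)                    # part c)
p_between_40_70 = phat.cdf(0.70) - phat.cdf(0.40)  # part d)

print(f"a) mean = {p}, standard error = {se:.4f}")
print(f"b) {p_at_least_35:.4f}")
print(f"c) {p_at_most_65:.4f}")
print(f"d) {p_between_40_70:.2e}")
```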
5. Sometimes I visit Falafal, the fruit and juice store in our campus to have a drink. The
probability that on a given day I visit Falafal is 0.25. Every day I decide independently whether
I am going to Falafal or not. When I go there, I always buy watermelon juice, which costs me
Rs. 30.
a) What is the distribution of my daily expenditure at Falafal?
Let X be my daily expenditure at Falafal. The distribution would be:
X = 30 with probability 0.25
X = 0 with probability 0.75
b) Obtain the expectation and variance of the distribution of my daily expenditure at Falafal.
E(X) = 30*.25 + 0*.75 = 7.5.
Var(X) = E(X^2) - [E(X)]^2 = (30*30*.25 + 0*.75) - 7.5*7.5 = 225 - 56.25 = 168.75
c) Obtain the sampling distribution of my average daily expenditure at Falafal over two days.
Let Z be the average daily expenditure at Falafal over 2 days. The possible values of Z will be as
follows:

Day 1 expenditure   Day 2 expenditure   Z
0                   0                   0
0                   30                  15
30                  0                   15
30                  30                  30

Hence, sampling distribution of Z:

z         0        15       30
P(Z=z)    0.5625   0.375    0.0625

d) Obtain the expectation and variance of my average daily expenditure at Falafal over two days.
From the above table, it is easy to show that E(Z) = 7.5 and Var(Z) = 84.375
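The two-day sampling distribution and its moments can be rebuilt by enumerating all four (Day 1, Day 2) outcomes. The sketch below does this with exact fractions so no rounding enters; the names `daily` and `dist` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Daily expenditure: 0 with probability 3/4, 30 with probability 1/4
daily = {0: Fraction(3, 4), 30: Fraction(1, 4)}

# Enumerate both days and accumulate the probability of each two-day average
dist = defaultdict(Fraction)
for (x1, p1), (x2, p2) in product(daily.items(), repeat=2):
    dist[Fraction(x1 + x2, 2)] += p1 * p2

mean_z = sum(z * pr for z, pr in dist.items())
var_z = sum((z - mean_z) ** 2 * pr for z, pr in dist.items())

for z in sorted(dist):
    print(f"P(Z = {z}) = {float(dist[z])}")
print(f"E(Z) = {float(mean_z)}, Var(Z) = {float(var_z)}")
```

Note that Var(Z) is half of Var(X) = 168.75, as expected for the mean of two independent days.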
e) Obtain the expectation and variance of my average daily expenditure at Falafal over 80 days.

Suppose I go to Falafal on S days out of 80 days. Then, S ~ Bin(80, .25). Each time I go to
Falafal, I spend Rs 30. Hence, my total expenditure over the 80 days will be 30S, and my average
daily expenditure in 80 days will be 30S/80 = T, say. So,
E(T) = 30/80*E(S) = 30/80*(80*.25) = 30*.25 = 7.5
Similarly, Var(T) = (30/80)(30/80)*Var(S) = (30/80)(30/80)(80*.25*.75) = 2.109
f) What is the sampling distribution of my average daily expenditure at Falafal over 80 days?
As mentioned above, the exact sampling distribution of my daily average expenditure over 80
days will be (30/80)S where S ~ Bin(80, .25).
However, since np and n(1-p) are both greater than 10, CLT holds and hence the above quantity
approximately follows a Normal distribution with mean 7.5 and standard error 1.452.
g) What is the probability that my total expenditure at Falafal over 80 days is more than Rs.880?
Total expenditure at Falafal = 80T.
So, P(80T > 880) = P(T>11) = P(Z > (11-7.5)/1.452) = P(Z > 2.41) = .008
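Parts e) to g) can be verified numerically as follows. The normal-approximation part mirrors the solution above. The exact binomial tail is an added cross-check that is not in the original solution: a total above Rs. 880 means 30S > 880, i.e. S ≥ 30.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, price = 80, 0.25, 30

# Average daily expenditure T = 30 S / 80 with S ~ Bin(80, 0.25)
mean_t = price * p
var_t = (price / n) ** 2 * n * p * (1 - p)
se_t = sqrt(var_t)

# Part g) by normal approximation: P(80 T > 880) = P(T > 11)
p_normal = 1 - NormalDist(mean_t, se_t).cdf(880 / n)

# Exact binomial tail (cross-check): 30 S > 880 means S >= 30
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(30, n + 1))

print(f"E(T) = {mean_t}, Var(T) = {var_t:.3f}, SE = {se_t:.3f}")
print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial tail:  {p_exact:.4f}")
```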
6. A popular news magazine wanted to write an article on how much Indians know about
geography. They devised a test that lists 100 cities in India, all of them mentioned in the news
magazine in the last year. Each respondent was supposed to tell the state in which the city
can be found. Some examples were Bhopal, Siliguri and Madurai. Each correct answer
earned one point, for a maximum of 100. The random sample of 5000 people had a
distribution of scores that was normally distributed with mean 62 and standard deviation 12.
a) The central ninety-five percent of the people in this sample can identify how many states
correctly?
a) 38-86    b) 50-86    c) 50-74    d) 26-98

By the Empirical rule, approximately 95% of the observations will fall within 2 standard
deviations of the mean, i.e. between (62 - 2*12, 62 + 2*12) = (38, 86).
b) What percentage of those sampled scored between 50 and 74 points?
a) 68.5%    b) 95%    c) 90%    d) 82%

(50, 74) correspond to the 1 standard deviation interval about the mean because 62-12 = 50
and 62+12 = 74. By the Empirical rule, approximately 68% of the observations lie within this
interval. The closest answer is 68.5%.
As an alternative approach, you can compute P(50 < X < 74) = P((50-62)/12 < Z < (74-62)/12)
= P(-1 < Z < 1). The area between -1 and 1 under the Z curve is 0.8413 - 0.1587 =
0.6826 = 68.3%, which is close to 68.5%. This is NOT a question about the sample mean
score, so we use the standard deviation, NOT the standard error.
c) What kinds of scores will the top 5% of people achieve?
a) 78 or better    b) 81.74 or better    c) 90.25 or better    d) 98 or better

This is the score X with 0.05 area to its right under the normal curve or with area 0.95 to its
left. The Z score with area 0.05 to its right is 1.645. So, the required X score will be 62 +
12*1.645 = 81.74.
d) Correctly matching 45 of 100 cities to states is considered a poor performance. What
percentage of respondents in this sample scored this low?
a) 9.93%    b) 7.78%    c) 6.55%    d) 5%

We want P(X < 45) = P(Z < (45-62)/12) = P(Z < -1.42). The area to the left of -1.42 under the Z
curve is 0.0778 = 7.78%.
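All four parts of this question are lookups on a single N(62, 12) distribution, so they can be checked in a few lines. The sketch below uses `statistics.NormalDist` instead of the Z table; because it does not round z-scores to two decimals, part d) comes out at about 7.8% rather than exactly 7.78%.

```python
from statistics import NormalDist

scores = NormalDist(mu=62, sigma=12)

# a) Empirical rule: central 95% lies within 2 standard deviations of the mean
central_95 = (62 - 2 * 12, 62 + 2 * 12)

# b) Share of respondents scoring between 50 and 74 (within 1 standard deviation)
pct_50_74 = scores.cdf(74) - scores.cdf(50)

# c) Score with 5% of the area to its right (95th percentile)
top_5_cutoff = scores.inv_cdf(0.95)

# d) Share of respondents scoring below 45
pct_below_45 = scores.cdf(45)

print(f"a) {central_95}")
print(f"b) {pct_50_74:.4f}")
print(f"c) {top_5_cutoff:.2f}")
print(f"d) {pct_below_45:.4f}")
```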
